Search Results (55)

Search Parameters:
Keywords = Scale-Sensitive Cross Attention

25 pages, 4296 KiB  
Article
StripSurface-YOLO: An Enhanced Yolov8n-Based Framework for Detecting Surface Defects on Strip Steel in Industrial Environments
by Haomin Li, Huanzun Zhang and Wenke Zang
Electronics 2025, 14(15), 2994; https://doi.org/10.3390/electronics14152994 - 27 Jul 2025
Viewed by 314
Abstract
Recent advances in precision manufacturing and high-end equipment technologies have imposed ever more stringent requirements on the accuracy, real-time performance, and lightweight design of online steel strip surface defect detection systems. To reconcile the persistent trade-off between detection precision and inference efficiency in complex industrial environments, this study proposes StripSurface-YOLO, a novel real-time defect detection framework built upon YOLOv8n. The core architecture integrates an Efficient Cross-Stage Local Perception module (ResGSCSP), which synergistically combines GSConv lightweight convolutions with a one-shot aggregation strategy, thereby markedly reducing both model parameters and computational complexity. To further enhance multi-scale feature representation, this study introduces an Efficient Multi-Scale Attention (EMA) mechanism at the feature-fusion stage, enabling the network to more effectively attend to critical defect regions. Moreover, conventional nearest-neighbor upsampling is replaced by DySample, which produces deeper, high-resolution feature maps enriched with semantic content, improving both inference speed and fusion quality. To heighten sensitivity to small-scale and low-contrast defects, the model adopts Focal Loss, dynamically adjusting to sample difficulty. Extensive evaluations on the NEU-DET dataset demonstrate that StripSurface-YOLO reduces FLOPs by 11.6% and parameter count by 7.4% relative to the baseline YOLOv8n, while achieving respective improvements of 1.4%, 3.1%, 4.1%, and 3.0% in precision, recall, mAP50, and mAP50:95. Under adverse conditions (contrast variations, brightness fluctuations, and Gaussian noise), StripSurface-YOLO outperforms the baseline model, delivering improvements of 5.0% in mAP50 and 4.7% in mAP50:95, attesting to the model’s robust interference resistance. These findings underscore the potential of StripSurface-YOLO to meet the rigorous performance demands of real-time surface defect detection in the metal forging industry. Full article
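
The abstract above attributes part of the model's sensitivity to small, low-contrast defects to Focal Loss, which down-weights easy examples during training. As a reference point, here is a minimal sketch of the standard binary focal loss in PyTorch; the function name `focal_loss` and the `gamma`/`alpha` defaults are illustrative conventions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    well-classified (easy) samples contribute little to the gradient."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)           # prob. of the true class
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()

# toy usage: four predictions against binary labels
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(focal_loss(logits, labels))
```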

19 pages, 2468 KiB  
Article
A Dual-Branch Spatial-Frequency Domain Fusion Method with Cross Attention for SAR Image Target Recognition
by Chao Li, Jiacheng Ni, Ying Luo, Dan Wang and Qun Zhang
Remote Sens. 2025, 17(14), 2378; https://doi.org/10.3390/rs17142378 - 10 Jul 2025
Viewed by 380
Abstract
Synthetic aperture radar (SAR) image target recognition has important application values in security reconnaissance and disaster monitoring. However, due to speckle noise and target orientation sensitivity in SAR images, traditional spatial domain recognition methods face challenges in accuracy and robustness. To effectively address these challenges, we propose a dual-branch spatial-frequency domain fusion recognition method with cross-attention, achieving deep fusion of spatial and frequency domain features. In the spatial domain, we propose an enhanced multi-scale feature extraction module (EMFE), which adopts a multi-branch parallel structure to effectively enhance the network’s multi-scale feature representation capability. Combining frequency domain guided attention, the model focuses on key regional features in the spatial domain. In the frequency domain, we design a hybrid frequency domain transformation module (HFDT) that extracts real and imaginary features through Fourier transform to capture the global structure of the image. Meanwhile, we introduce a spatially guided frequency domain attention to enhance the discriminative capability of frequency domain features. Finally, we propose a cross-domain feature fusion (CDFF) module, which achieves bidirectional interaction and optimal fusion of spatial-frequency domain features through cross attention and adaptive feature fusion. Experimental results demonstrate that our method achieves significantly superior recognition accuracy compared to existing methods on the MSTAR dataset. Full article
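
The abstract describes cross-attention in which spatial-domain features attend to frequency-domain features built from the real and imaginary parts of a Fourier transform. The sketch below is a generic, hypothetical version of that idea using `torch.fft.fft2` and `nn.MultiheadAttention`; the module name, projection sizes, and head count are placeholders, not the paper's CDFF design.

```python
import torch
import torch.nn as nn

class SpatialFreqCrossAttention(nn.Module):
    """Hypothetical sketch: spatial tokens (queries) attend to frequency-domain
    tokens (keys/values) formed from the real and imaginary FFT components."""
    def __init__(self, channels, dim=128, heads=4):
        super().__init__()
        self.spat_proj = nn.Linear(channels, dim)
        self.freq_proj = nn.Linear(2 * channels, dim)    # real + imaginary parts
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat):                             # feat: (B, C, H, W)
        spat = feat.flatten(2).transpose(1, 2)           # (B, H*W, C)
        fft = torch.fft.fft2(feat, norm="ortho")         # complex spectrum
        freq = torch.cat([fft.real, fft.imag], dim=1)    # (B, 2C, H, W)
        freq = freq.flatten(2).transpose(1, 2)           # (B, H*W, 2C)
        q, kv = self.spat_proj(spat), self.freq_proj(freq)
        fused, _ = self.attn(q, kv, kv)                  # cross-attention
        return fused                                     # (B, H*W, dim)

x = torch.randn(2, 32, 16, 16)
print(SpatialFreqCrossAttention(32)(x).shape)            # torch.Size([2, 256, 128])
```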

21 pages, 2238 KiB  
Article
DMLU-Net: A Hybrid Neural Network for Water Body Extraction from Remote Sensing Images
by Ziqiang Xu, Mingfeng Li and Haixiang Guo
Appl. Sci. 2025, 15(14), 7733; https://doi.org/10.3390/app15147733 - 10 Jul 2025
Viewed by 214
Abstract
The delineation of aquatic features from satellite remote sensing data is vital for environmental monitoring and disaster early warning. However, existing water body detection models struggle with cross-scale feature extraction, often failing to resolve blurred boundaries, and they under-detect small water bodies in complex landscapes. To tackle these challenges, in this study, we present DMLU-Net, a U-shaped neural network integrated with a dynamic multi-kernel large-scale attention mechanism. The model employs a dynamic multi-kernel large-scale attention module (DMLKA) to enhance cross-scale feature capture; a spectral–spatial attention module (SSAM) in the decoder to boost water region sensitivity; and a dynamic upsampling module (DySample) in the encoder to restore image details. DMLU-Net is tested against six comparison models on two publicly available Chinese remote sensing datasets. The results show that the F1-scores of DMLU-Net on the two datasets are 94.50% and 86.86%, and the IoU (Intersection over Union) values are 90.46% and 77.74%, both the best among the compared models. Notably, the model significantly reduces water boundary artifacts, and it improves overall prediction accuracy and small water body recognition, thus verifying its generalization ability and practical application potential in real-world scenarios. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
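
DMLU-Net is evaluated above with F1 and IoU on binary water masks. For reference, a small NumPy sketch of how those two pixel-wise metrics are conventionally computed from predicted and ground-truth masks follows; the variable names and toy masks are illustrative only.

```python
import numpy as np

def mask_metrics(pred, gt):
    """Pixel-wise IoU and F1 for binary masks (1 = water, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-9)
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return iou, f1

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(mask_metrics(pred, gt))   # ~(0.5, 0.667)
```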

28 pages, 14588 KiB  
Article
CAU2DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data
by Xi Lyu, Chenyu Zhang, Lizhi Miao, Xiying Sun, Xinxin Zhou, Xinyi Yue, Zhongchang Sun and Yueyong Pang
Remote Sens. 2025, 17(14), 2359; https://doi.org/10.3390/rs17142359 - 9 Jul 2025
Viewed by 244
Abstract
The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) methods mainly face challenges such as limited receptive fields and insufficient sensitivity to spatial locations when integrating multi-source remote sensing data, and high-quality datasets that integrate multi-spectral and geoscientific indicators to support them are scarce. In response to these issues, this study proposes a DL model (coordinate-attentive U2-DeepLab network [CAU2DNet]) that integrates multi-source remote sensing data. The model integrates the multi-scale feature extraction capability of U2-Net with the global receptive field advantage of DeepLabV3+ through a dual-branch architecture. Thereafter, the spatial semantic perception capability is enhanced by introducing the CoordAttention mechanism, and ConvNextV2 is adopted to optimize the backbone network of the DeepLabV3+ branch, thereby improving the modeling capability of low-resolution geoscientific features. The two branches adopt a decision-level fusion mechanism for feature fusion, which means that the results of each are weighted and summed using learnable weights to obtain the final output feature map. Furthermore, this study constructs the São Paulo slums dataset for model training due to the lack of a multi-spectral slum dataset. This dataset covers 7978 samples of 512 × 512 pixels, integrating high-resolution RGB images, Normalized Difference Vegetation Index (NDVI)/Modified Normalized Difference Water Index (MNDWI) geoscientific indicators, and POI infrastructure data, which can significantly enrich multi-source slum remote sensing data. Experiments have shown that CAU2DNet achieves an intersection over union (IoU) of 0.6372 and an F1 score of 77.97% on the São Paulo slums dataset, indicating a significant improvement in accuracy over the baseline model. The ablation experiments verify that the improvements made in this study have resulted in a 16.12% increase in precision. Moreover, CAU2DNet also achieved the best results in all metrics during the cross-domain testing on the WHU building dataset, further confirming the model’s generalizability. Full article
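
The abstract states that the two branches are fused at the decision level by a weighted sum with learnable weights. Below is a minimal sketch of that idea; the softmax normalisation of the weights and the module name are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class DecisionLevelFusion(nn.Module):
    """Weighted sum of two branch score maps with learnable weights
    (normalised here with a softmax so they stay positive and sum to 1)."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2))    # one logit per branch

    def forward(self, out_branch_a, out_branch_b):
        w = torch.softmax(self.w, dim=0)
        return w[0] * out_branch_a + w[1] * out_branch_b

fuse = DecisionLevelFusion()
a = torch.rand(1, 1, 64, 64)     # e.g. U2-Net-style branch probability map
b = torch.rand(1, 1, 64, 64)     # e.g. DeepLabV3+-style branch probability map
print(fuse(a, b).shape)          # torch.Size([1, 1, 64, 64])
```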

20 pages, 1166 KiB  
Article
MSP-EDA: Multivariate Time Series Forecasting Based on Multiscale Patches and External Data Augmentation
by Shunhua Peng, Wu Sun, Panfeng Chen, Huarong Xu, Dan Ma, Mei Chen, Yanhao Wang and Hui Li
Electronics 2025, 14(13), 2618; https://doi.org/10.3390/electronics14132618 - 28 Jun 2025
Viewed by 325
Abstract
Accurate multivariate time series forecasting remains a major challenge in various real-world applications, primarily due to the limitations of existing models in capturing multiscale temporal dependencies and effectively integrating external data. To address these issues, we propose MSP-EDA, a novel multivariate time series forecasting framework that integrates multiscale patching and external data enhancement. Specifically, MSP-EDA utilizes the Discrete Fourier Transform (DFT) to extract dominant global periodic patterns and employs an adaptive Continuous Wavelet Transform (CWT) to capture scale-sensitive local variations. In addition, multiscale patches are constructed to capture temporal patterns at different resolutions, and a specialized encoder is designed for each scale. Each encoder incorporates temporal attention, channel correlation attention, and cross-attention with external data to capture intra-scale temporal dependencies, inter-variable correlations, and external influences, respectively. To fuse information from different temporal scales, we introduce a trainable global token that serves as a variable-wise aggregator across scales. Extensive experiments on four public benchmark datasets and three real-world vector database datasets that we collect demonstrate that MSP-EDA consistently outperforms state-of-the-art methods, achieving particularly notable improvements on vector database workloads. Ablation studies further confirm the effectiveness of each module and the adaptability of MSP-EDA to complex forecasting scenarios involving external dependencies. Full article
(This article belongs to the Special Issue Machine Learning in Data Analytics and Prediction)
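
MSP-EDA uses the Discrete Fourier Transform to extract dominant global periodic patterns before multiscale patching. The snippet below shows one common way to pick the top-k periods from an FFT amplitude spectrum; the simple top-k amplitude rule is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def dominant_periods(series, k=3):
    """Return the k periods with the largest FFT amplitudes (ignoring the DC term)."""
    n = len(series)
    amp = np.abs(np.fft.rfft(series - series.mean()))
    freqs = np.fft.rfftfreq(n, d=1.0)           # cycles per time step
    top = np.argsort(amp[1:])[-k:] + 1          # skip index 0 (DC component)
    return sorted((1.0 / freqs[top]).round(1), reverse=True)

t = np.arange(0, 512)
x = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)  # daily + weekly cycle
print(dominant_periods(x))   # periods near 168 and 24 dominate
```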

24 pages, 2802 KiB  
Article
MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification
by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun
Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025
Viewed by 374
Abstract
The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency. Full article
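
The abstract describes parallel 3D convolution branches with different kernel sizes and dilation rates over hyperspectral patches. A generic sketch of such a multi-branch 3D block is shown below; the channel counts, kernel sizes, and dilations are placeholders rather than the paper's 3D-SSF configuration.

```python
import torch
import torch.nn as nn

class MultiScale3DBlock(nn.Module):
    """Parallel 3D convolutions over (spectral, height, width) at several
    receptive fields, concatenated along the channel axis."""
    def __init__(self, in_ch=1, out_ch=8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),                  # fine detail
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=2, dilation=2),      # wider context
            nn.Conv3d(in_ch, out_ch, kernel_size=(7, 3, 3), padding=(3, 1, 1)),  # longer spectral span
        ])
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: (B, 1, bands, H, W)
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

patch = torch.randn(2, 1, 30, 11, 11)           # 30 spectral bands, 11x11 spatial patch
print(MultiScale3DBlock()(patch).shape)         # torch.Size([2, 24, 30, 11, 11])
```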

20 pages, 4391 KiB  
Article
GDS-YOLOv7: A High-Performance Model for Water-Surface Obstacle Detection Using Optimized Receptive Field and Attention Mechanisms
by Xu Yang, Lei Huang, Fuyang Ke, Chao Liu, Ruixue Yang and Shicheng Xie
ISPRS Int. J. Geo-Inf. 2025, 14(7), 238; https://doi.org/10.3390/ijgi14070238 - 23 Jun 2025
Viewed by 311
Abstract
Unmanned ships, equipped with self-navigation and image processing capabilities, are progressively expanding their applications in fields such as mining, fisheries, and marine environments. Along with this development, issues concerning waterborne traffic safety are gradually emerging. To address the challenges of navigation and obstacle detection on the water’s surface, this paper presents GDS-YOLOv7, an enhanced obstacle-detection framework for aquatic environments, architecturally evolved from YOLOv7. The proposed system implements three key innovations: (1) architectural optimization through replacement of the Spatial Pyramid Pooling Cross Stage Partial Connections (SPPCSPC) module with GhostSPPCSPC for expanded receptive field representation; (2) integration of a parameter-free attention mechanism (SimAM) with refined pooling configurations to boost multi-scale detection sensitivity; and (3) strategic deployment of depthwise separable convolutions (DSC) to reduce computational complexity while maintaining detection fidelity. Furthermore, we develop a Spatial–Channel Synergetic Attention (SCSA) mechanism to counteract feature degradation in convolutional operations, embedding this module within the Extended Effective Long-Range Aggregation Network (E-ELAN) to enhance contextual awareness. Experimental results reveal the model’s superiority over the baseline YOLOv7, achieving +4.9% mean average precision@0.5 (mAP@0.5), +4.3% precision (P), and +6.9% recall (R), alongside a 22.8% reduction in Giga Floating-point Operations Per Second (GFLOPS). Full article
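
One of the changes above is replacing standard convolutions with depthwise separable convolutions (DSC) to cut computation. The sketch below shows the usual depthwise-then-pointwise decomposition and compares parameter counts against a plain 3x3 convolution; the channel sizes are arbitrary examples, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

dsc = DepthwiseSeparableConv(128, 256)
std = nn.Conv2d(128, 256, 3, padding=1)
print(n_params(dsc), n_params(std))            # ~34k vs ~295k parameters
print(dsc(torch.randn(1, 128, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```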

18 pages, 2488 KiB  
Article
An Improved Segformer for Semantic Segmentation of UAV-Based Mine Restoration Scenes
by Feng Wang, Lizhuo Zhang, Tao Jiang, Zhuqi Li, Wangyu Wu and Yingchun Kuang
Sensors 2025, 25(12), 3827; https://doi.org/10.3390/s25123827 - 19 Jun 2025
Cited by 1 | Viewed by 569
Abstract
Mine ecological restoration is a critical process for promoting the sustainable development of resource-dependent regions, yet existing monitoring methods remain limited in accuracy and adaptability. To address challenges such as small-object recognition, insufficient multi-scale feature fusion, and blurred boundaries in UAV-based remote sensing imagery, this paper proposes an enhanced semantic segmentation model based on Segformer. Specifically, a multi-scale feature-enhanced feature pyramid network (MSFE-FPN) is introduced between the encoder and decoder to strengthen cross-level feature interaction. Additionally, a selective feature aggregation pyramid pooling module (SFA-PPM) is integrated into the deepest feature layer to improve global semantic perception, while an efficient local attention (ELA) module is embedded into lateral connections to enhance sensitivity to edge structures and small-scale targets. A high-resolution UAV image dataset, named the HUNAN Mine UAV Dataset (HNMUD), is constructed to evaluate model performance, and further validation is conducted on the public Aeroscapes dataset. Experimental results demonstrated that the proposed method exhibited strong performance in terms of segmentation accuracy and generalization ability, effectively supporting the image analysis needs of mine restoration scenes. Full article
(This article belongs to the Section Remote Sensors)

18 pages, 19551 KiB  
Article
FAD-Net: Automated Framework for Steel Surface Defect Detection in Urban Infrastructure Health Monitoring
by Nian Wang, Yue Chen, Weiang Li, Liyang Zhang and Jinghong Tian
Big Data Cogn. Comput. 2025, 9(6), 158; https://doi.org/10.3390/bdcc9060158 - 13 Jun 2025
Viewed by 594
Abstract
Steel plays a fundamental role in modern smart city development, where its surface structural integrity is decisive for operational safety and long-term sustainability. While deep learning approaches show promise, their effectiveness remains limited by inadequate receptive field adaptability, suboptimal feature fusion strategies, and insufficient sensitivity to small defects. To overcome these limitations, we propose FAD-Net, a deep learning framework specifically designed for surface defect detection in steel materials within urban infrastructure. The network incorporates three key innovations: the RFCAConv module, which leverages dynamic receptive field construction and coordinate attention mechanisms to enhance feature representation for defects with long-range spatial dependencies and low-contrast characteristics; the MSDFConv module, which employs multi-scale dilated convolutions with optimized dilation rates to preserve fine details while expanding the receptive field; and an Auxiliary Head, which introduces hierarchical supervision to improve the detection of small-scale defects. Experiments on the GC10-DET dataset showed that FAD-Net achieved 5.0% higher mAP@0.5 than baseline models. Cross-dataset validation with NEU and RDD2022 further confirmed its robustness. These results demonstrate FAD-Net’s effectiveness for automated infrastructure health monitoring. Full article

20 pages, 7892 KiB  
Article
Tissue Distribution and Pharmacokinetic Characteristics of Aztreonam Based on Multi-Species PBPK Model
by Xiao Ye, Xiaolong Sun, Jianing Zhang, Min Yu, Nie Wen, Xingchao Geng and Ying Liu
Pharmaceutics 2025, 17(6), 748; https://doi.org/10.3390/pharmaceutics17060748 - 6 Jun 2025
Viewed by 669
Abstract
Background/Objectives: As a monocyclic β-lactam antibiotic, aztreonam has regained attention recently because combining it with β-lactamase inhibitors helps fight drug-resistant bacteria. This study aimed to systematically characterize the plasma and tissue concentration-time profiles of aztreonam in rats, mice, dogs, monkeys, and humans by developing a multi-species, physiologically based pharmacokinetic (PBPK) model. Methods: A rat PBPK model was optimized and validated using plasma concentration-time curves determined by liquid chromatography–tandem mass spectrometry (LC-MS/MS) following intravenous administration, with reliability confirmed through another dose experiment. The rat model characteristics, modeling experience, ADMET Predictor (11.0) software prediction results, and allometric scaling were used to extrapolate to mouse, human, dog, and monkey models. The tissue-to-plasma partition coefficients (Kp values) were predicted using GastroPlus (9.0) software, and the sensitivity analyses of key parameters were evaluated. Finally, the cross-species validation was performed using the average fold error (AFE) and absolute relative error (ARE). Results: The cross-species validation showed that the model predictions were highly consistent with the experimental data (AFE < 2, ARE < 30%), but the deviation of the volume of distribution (Vss) in dogs and monkeys suggested the need to supplement the species-specific parameters to optimize the prediction accuracy. The Kp values revealed a high distribution of aztreonam in the kidneys (Kp = 2.0–3.0), which was consistent with its clearance mechanism dominated by renal excretion. Conclusions: The PBPK model developed in this study can be used to predict aztreonam pharmacokinetics across species, elucidating its renal-targeted distribution and providing key theoretical support for the clinical dose optimization of aztreonam, the assessment of target tissue exposure in drug-resistant bacterial infections, and the development of combination therapy strategies. Full article
(This article belongs to the Section Pharmacokinetics and Pharmacodynamics)
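
The study above validates cross-species predictions with the average fold error (AFE) and extrapolates between species via allometric scaling. The snippet below shows the conventional AFE formula and a standard body-weight-based allometric scaling of clearance with the usual 0.75 exponent; the numerical inputs are made-up examples, not data from the paper.

```python
import numpy as np

def average_fold_error(pred, obs):
    """AFE = 10 ** mean(log10(pred/obs)); values near 1 indicate unbiased predictions."""
    return 10 ** np.mean(np.log10(np.asarray(pred) / np.asarray(obs)))

def allometric_clearance(cl_animal, bw_animal, bw_human, exponent=0.75):
    """Scale clearance by body weight with the conventional 3/4-power exponent."""
    return cl_animal * (bw_human / bw_animal) ** exponent

# hypothetical predicted vs. observed plasma concentrations (mg/L)
print(average_fold_error([12.0, 8.5, 3.1], [10.0, 9.0, 3.5]))    # ~1.0, i.e. little bias
# hypothetical rat clearance (L/h) scaled to a 70 kg human
print(allometric_clearance(cl_animal=0.12, bw_animal=0.25, bw_human=70.0))
```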

35 pages, 16759 KiB  
Article
A Commodity Recognition Model Under Multi-Size Lifting and Lowering Sampling
by Mengyuan Chen, Song Chen, Kai Xie, Bisheng Wu, Ziyu Qiu, Haofei Xu and Jianbiao He
Electronics 2025, 14(11), 2274; https://doi.org/10.3390/electronics14112274 - 2 Jun 2025
Viewed by 514
Abstract
Object detection algorithms have evolved from two-stage to single-stage architectures, with foundation models achieving sustained improvements in accuracy. However, in intelligent retail scenarios, small object detection and occlusion issues still lead to significant performance degradation. To address these challenges, this paper proposes an improved model based on YOLOv11, focusing on resolving insufficient multi-scale feature coupling and occlusion sensitivity. First, a multi-scale feature extraction network (MFENet) is designed. It splits input feature maps into dual branches along the channel dimension: the upper branch performs local detail extraction and global semantic enhancement through secondary partitioning, while the lower branch integrates CARAFE (content-aware reassembly of features) upsampling and SENet (squeeze-and-excitation network) channel weight matrices to achieve adaptive feature enhancement. The three feature streams are fused to output multi-scale feature maps, significantly improving small object detail retention. Second, a convolutional block attention module (CBAM) is introduced during feature fusion, dynamically focusing on critical regions through channel–spatial dual attention mechanisms. A fuseModule is designed to aggregate multi-level features, enhancing contextual modeling for occluded objects. Additionally, the extreme-IoU (XIoU) loss function replaces the traditional complete-IoU (CIoU), combined with XIoU-NMS (extreme-IoU non-maximum suppression) to suppress redundant detections, optimizing convergence speed and localization accuracy. Experiments demonstrate that the improved model achieves a mean average precision (mAP50) of 0.997 (0.2% improvement) and mAP50-95 of 0.895 (3.5% improvement) on the RPC product dataset and the 6th Product Recognition Challenge dataset. The recall rate increases to 0.996 (0.6% improvement over baseline). Although frames per second (FPS) decreased compared to the original model, the improved model still meets real-time requirements for retail scenarios. The model exhibits stable noise resistance in challenging environments and achieves 84% mAP in cross-dataset testing, validating its generalization capability and engineering applicability. Video streams were captured using a Zhongweiaoke camera operating at 60 fps, satisfying real-time detection requirements for intelligent retail applications. Full article
(This article belongs to the Special Issue Emerging Technologies in Computational Intelligence)
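
The improved model above inserts a convolutional block attention module (CBAM) during feature fusion. Below is a compact sketch of CBAM in its commonly published form (channel attention from average/max-pooled descriptors, then a 7x7 spatial attention); the reduction ratio and kernel size follow the usual defaults and are not taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (shared MLP over avg/max-pooled features) followed by
    spatial attention (7x7 conv over channel-wise avg/max maps)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                        # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))              # spatial attention

x = torch.randn(1, 64, 40, 40)
print(CBAM(64)(x).shape)   # torch.Size([1, 64, 40, 40])
```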

17 pages, 1535 KiB  
Article
Attention-Based Multi-Scale Graph Fusion Hashing for Fast Cross-Modality Image–Text Retrieval
by Jiayi Li and Gengshen Wu
Symmetry 2025, 17(6), 861; https://doi.org/10.3390/sym17060861 - 1 Jun 2025
Viewed by 487
Abstract
In recent years, hashing-based algorithms have garnered significant attention as vital technologies for cross-modal retrieval tasks. They leverage the inherent symmetry between different data modalities (e.g., text, images, or audio) to bridge their semantic gaps by embedding them into a unified representation space. This symmetry-preserving approach would greatly enhance retrieval performance. However, challenges persist in mining and enriching multi-modal semantic feature information. Most current methods use pre-trained models for feature extraction, which limits information representation during hash code learning. Additionally, these methods map multi-modal data into a unified space, but this mapping is sensitive to feature distribution variations, potentially degrading cross-modal retrieval performance. To tackle these challenges, this paper introduces a novel method called Attention-based Multi-scale Graph Fusion Hashing (AMGFH). This approach first enhances the semantic representation of image features through multi-scale learning via an image feature enhancement network. Additionally, graph convolutional networks (GCNs) are employed to fuse multi-modal features, where the self-attention mechanism is incorporated to enhance feature representation by dynamically adjusting the weights of less relevant features. By optimizing a combination of loss functions and addressing the diverse requirements of image and text features, the proposed model demonstrates superior performance across various dimensions. Extensive experiments conducted on public datasets further confirm its outstanding performance. For instance, AMGFH exceeds the most competitive baseline by 3% and 4.7% in terms of mean average precision (MAP) when performing image-to-text and text-to-image retrieval tasks at 32 bits on the MS COCO dataset. Full article
(This article belongs to the Section Computer)
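
Like most cross-modal hashing methods, the approach above ultimately maps both modalities to short binary codes and ranks candidates by Hamming distance. The sketch below illustrates only that final retrieval step (sign binarisation plus Hamming ranking); it is a generic illustration, not the AMGFH pipeline.

```python
import numpy as np

def to_hash(embeddings):
    """Binarise real-valued embeddings to +/-1 codes with the sign function."""
    return np.where(embeddings >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query (smaller = closer).
    For +/-1 codes, d_H = (bits - dot(q, d)) / 2."""
    bits = db_codes.shape[1]
    dists = (bits - db_codes @ query_code) // 2
    return np.argsort(dists), dists

rng = np.random.default_rng(0)
img_emb = rng.standard_normal((5, 32))      # e.g. 32-bit codes for 5 database images
txt_emb = rng.standard_normal(32)           # one text query
order, dists = hamming_rank(to_hash(txt_emb), to_hash(img_emb))
print(order, dists[order])                  # nearest database items first
```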

25 pages, 11680 KiB  
Article
ETAFHrNet: A Transformer-Based Multi-Scale Network for Asymmetric Pavement Crack Segmentation
by Chao Tan, Jiaqi Liu, Zhedong Zhao, Rufei Liu, Peng Tan, Aishu Yao, Shoudao Pan and Jingyi Dong
Appl. Sci. 2025, 15(11), 6183; https://doi.org/10.3390/app15116183 - 30 May 2025
Viewed by 635
Abstract
Accurate segmentation of pavement cracks from high-resolution remote sensing imagery plays a crucial role in automated road condition assessment and infrastructure maintenance. However, crack structures often exhibit asymmetry, irregular morphology, and multi-scale variations, posing significant challenges to conventional CNN-based methods in real-world environments. In this work, we present ETAFHrNet, a novel attention-guided segmentation network designed to address the limitations of traditional architectures in detecting fine-grained and asymmetric patterns. Specifically, ETAFHrNet focuses on two predominant pavement-distress morphologies, linear cracks (transverse and longitudinal) and alligator cracks, and has been empirically validated on their intersections and branching patterns over both asphalt and concrete road surfaces. ETAFHrNet integrates Transformer-based global attention and multi-scale hybrid feature fusion, enhancing both contextual perception and detail sensitivity. The network introduces two key modules: the Efficient Hybrid Attention Transformer (EHAT), which captures long-range dependencies, and the Cross-Scale Hybrid Attention Module (CSHAM), which adaptively fuses features across spatial resolutions. To support model training and benchmarking, we also propose QD-Crack, a high-resolution, pixel-level annotated dataset collected from real-world road inspection scenarios. Experimental results show that ETAFHrNet significantly outperforms existing methods, including U-Net, DeepLabv3+, and HRNet, in both segmentation accuracy and generalization ability. These findings demonstrate the effectiveness of interpretable, multi-scale attention architectures in complex object detection and image classification tasks, making our approach relevant for broader applications, such as autonomous driving, remote sensing, and smart infrastructure systems. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)

23 pages, 1862 KiB  
Article
The Influence of Consumption Purpose on Consumer Preferences for Fruit Attributes: The Moderating Effect of Color Perception
by Yihan Wang, Lingying Liu and Yangyang Wei
Foods 2025, 14(11), 1902; https://doi.org/10.3390/foods14111902 - 27 May 2025
Viewed by 552
Abstract
With the increasing awareness of health among residents, consumers are paying more attention to their eating purposes and food safety when choosing fruits. This study aims to explore the impact of eating purpose on consumers’ preferences for fruits and fruit products under the mediation of color perception. The study obtained experimental data from 489 urban consumers in China through the Credamo data collection platform. Furthermore, four experimental groups were set up to propose six hypotheses based on the influence of eating purpose on consumer preferences for fruits and their products. The study utilized Likert scale questionnaires, chi-square tests, and variance analysis for data mining and cross-validation. The results indicate that the visual characteristics of fruits (especially color) affect the purchase preferences of consumers with different eating purposes. Approximately 65% of health-oriented consumers are highly sensitive to the color and nutritional value of fruits. They believe that fresh fruits are rich in natural nutrients and play an important role in maintaining health and preventing diseases. Meanwhile, around 62% of consumers with specific nutritional needs prefer processed fruit products, such as fruit preserves or dried fruits. These consumers have a weaker perception of color and focus primarily on the functionality of the fruits. Additionally, the study found that safety/taste preferences acted as a mediator and associative learning as a moderating variable. Around 58% of consumers indicated that their purchase preferences are influenced by safety and taste, and the relative importance of safety and taste preferences significantly mediated the relationship between eating purpose and purchase preferences. Under the moderating effect of associative learning, health-oriented consumers, when associative learning is activated, are about 45% more likely to choose fresh fruits. The study highlights consumers’ health-conscious perceptions in fruit selection, focusing on how color perception moderates the preference choices of different consumer groups based on their eating purposes. It emphasizes the need for businesses to adjust product positioning and marketing strategies according to consumer perceptions to promote broader healthy eating behaviors. Full article
(This article belongs to the Section Sensory and Consumer Sciences)

21 pages, 4967 KiB  
Article
Evaluation of MODIS and VIIRS BRDF Parameter Differences and Their Impacts on the Derived Indices
by Chenxia Wang, Ziti Jiao, Yaowei Feng, Jing Guo, Zhilong Li, Ge Gao, Zheyou Tan, Fangwen Yang, Sizhe Chen and Xin Dong
Remote Sens. 2025, 17(11), 1803; https://doi.org/10.3390/rs17111803 - 22 May 2025
Viewed by 518
Abstract
Multi-angle remote sensing observations play an important role in the remote sensing of solar radiation absorbed by land surfaces. Currently, the Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) teams have successively applied the Ross–Li kernel-driven bidirectional reflectance distribution function (BRDF) model to integrate multi-angle observations to produce long time series BRDF model parameter products (MCD43 and VNP43), which can be used for the inversion of various surface parameters and the angle correction of remote sensing data. Even though the MODIS and VIIRS BRDF products originate from sensors and algorithms with similar designs, the consistency between BRDF parameters for different sensors is still unknown, and this likely affects the consistency and accuracy of various downstream parameter inversions. In this study, we applied BRDF model parameter time-series data from the overlapping period of the MODIS and VIIRS services to systematically analyze the temporal and spatial differences between the BRDF parameters and derived indices of the two sensors from the site scale to the region scale in the red band and NIR band, respectively. Then, we analyzed the sensitivity of the BRDF parameters to variations in the Normalized Difference Hotspot–Darkspot (NDHD) and examined the spatiotemporal distribution of zero-valued pixels in the BRDF parameter products generated by the constraint method in the Ross–Li model from both sensors, assessing their potential impact on NDHD derivation. The results confirm that among the three BRDF parameters, the isotropic scattering parameters of MODIS and VIIRS are more consistent, whereas the volumetric and geometric-optical scattering parameters are more sensitive and variable; this pattern is more pronounced in the red band. The indices derived from the MODIS and VIIRS BRDF parameters were compared, revealing increasing discrepancies between the albedo and typical directional reflectance and the NDHD. The isotropic scattering parameter and the volumetric scattering parameter show responses that are very sensitive to increases in the equal interval of the NDHD, indicating that the differences between the MODIS and VIIRS products may strongly influence the consistency of NDHD estimation. In addition, both MODIS and VIIRS have a large proportion of zero-valued pixels (volumetric and geometric-optical parameter layers), whereas the spatiotemporal distribution of zero-valued pixels in VIIRS is more widespread. While the zero-valued pixels have a minor influence on reflectance and albedo estimation, such pixels should be considered with attention to the estimation accuracy of the vegetation angular index, which relies heavily on anisotropic characteristics, e.g., the NDHD. This study reveals the need to optimize the Clumping Index (CI)-NDHD algorithm to produce a VIIRS CI product and highlights the importance of considering BRDF product quality flags for users in their specific applications. The method used in this study also helps improve the theoretical framework for cross-sensor product consistency assessment and clarify the uncertainty in high-precision ecological monitoring and various remote sensing applications. Full article
(This article belongs to the Special Issue Remote Sensing of Solar Radiation Absorbed by Land Surfaces)
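
The study above works with the three Ross–Li kernel-driven BRDF parameters (isotropic, volumetric, geometric-optical) and the NDHD index derived from hotspot and darkspot reflectances. The sketch below only reassembles reflectance from the three parameters and already-computed kernel values and then forms the NDHD; the kernel values and parameter numbers are hypothetical inputs, since the RossThick/LiSparse-R kernel functions themselves are not reproduced here.

```python
def ross_li_reflectance(f_iso, f_vol, f_geo, k_vol, k_geo):
    """Kernel-driven BRDF model: R = f_iso + f_vol * K_vol + f_geo * K_geo."""
    return f_iso + f_vol * k_vol + f_geo * k_geo

def ndhd(rho_hotspot, rho_darkspot):
    """Normalized Difference Hotspot-Darkspot index."""
    return (rho_hotspot - rho_darkspot) / (rho_hotspot + rho_darkspot)

# hypothetical red-band BRDF parameters and kernel values at hotspot/darkspot geometries
f_iso, f_vol, f_geo = 0.05, 0.02, 0.01
hot = ross_li_reflectance(f_iso, f_vol, f_geo, k_vol=0.60, k_geo=0.10)
dark = ross_li_reflectance(f_iso, f_vol, f_geo, k_vol=-0.10, k_geo=-1.20)
print(ndhd(hot, dark))   # larger values indicate stronger hotspot anisotropy
```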
