Search Results (46)

Search Parameters:
Keywords = multi-scale fusion retrieval

27 pages, 3776 KB  
Article
An Efficient Method for Retrieving Citrus Orchard Evapotranspiration Based on Multi-Source Remote Sensing Data Fusion from Unmanned Aerial Vehicles
by Zhiwei Zhang, Weiqi Zhang, Chenfei Duan, Shijiang Zhu and Hu Li
Agriculture 2025, 15(19), 2058; https://doi.org/10.3390/agriculture15192058 - 30 Sep 2025
Abstract
Severe water scarcity has become a critical constraint to global agricultural development. Enhancing both the timeliness and accuracy of crop evapotranspiration (ETc) retrieval is essential for optimizing irrigation scheduling. Addressing the limitations of conventional ground-based point-source measurements in rapidly acquiring two-dimensional ETc information at the field scale, this study employed unmanned aerial vehicle (UAV) remote sensing equipped with multispectral and thermal infrared sensors to obtain high spatiotemporal resolution imagery of a representative citrus orchard (Citrus reticulata Blanco cv. ‘Yichangmiju’) in western Hubei at different phenological stages. In conjunction with meteorological data (air temperature, daily net radiation, etc.), ETc was retrieved using two established approaches: the thermal-infrared-driven Seguin–Itier (S-I) model, which relates canopy–air temperature differences to ETc, and the multispectral-driven single crop coefficient method, which estimates ETc by combining vegetation indices with reference evapotranspiration. A multi-source fusion model integrating the two data streams was then constructed. The findings indicate that: (1) both the S-I model and the single crop coefficient method achieved satisfactory ETc estimation accuracy, with the latter performing slightly better (accuracy of 80% and 85%, respectively); (2) the proposed multi-source fusion model consistently demonstrated high accuracy and stability across all phenological stages (R2 = 0.9104, 0.9851, and 0.9313 for the fruit-setting, fruit-enlargement, and coloration–sugar-accumulation stages, respectively; all significant at p < 0.01), significantly enhancing the precision and timeliness of ETc retrieval; and (3) the model was successfully applied to ETc retrieval during the main growth stages in the Cangwubang citrus-producing area of Yichang, providing practical support for irrigation scheduling and water resource management at the regional scale. This multi-source fusion approach offers effective technical support for precision irrigation control in agriculture and holds broad application prospects.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
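The two baseline retrieval routes named in this abstract are simple enough to sketch. Below is a minimal Python illustration of the single crop coefficient method (ETc = Kc × ET0) with a hypothetical linear NDVI-to-Kc mapping, plus a Seguin-Itier-style daily estimate driven by the canopy–air temperature difference; all coefficients are placeholders, not the paper's calibrated values.

```python
import numpy as np

def kc_from_ndvi(ndvi, kc_min=0.15, kc_max=1.10):
    """Hypothetical linear NDVI-to-Kc mapping (bounds are placeholders)."""
    return kc_min + (kc_max - kc_min) * np.clip(ndvi, 0.0, 1.0)

def etc_single_kc(ndvi, et0):
    """Single crop coefficient method: ETc = Kc * ET0 (mm/day)."""
    return kc_from_ndvi(ndvi) * et0

def etc_seguin_itier(rn, tc, ta, a=1.0, b=0.25):
    """Seguin-Itier-style daily estimate, ETc = Rn + A - B * (Tc - Ta),
    with Rn in mm/day evaporation equivalent; A and B are placeholder
    calibration constants, not the paper's fitted values."""
    return rn + a - b * (tc - ta)

ndvi = np.array([0.45, 0.62, 0.71])   # per-pixel values from the multispectral camera
et0 = 4.8                             # reference evapotranspiration (mm/day)
print(etc_single_kc(ndvi, et0))       # per-pixel ETc, mm/day
print(etc_seguin_itier(rn=5.2, tc=31.5, ta=28.0))  # 5.325 mm/day
```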

19 pages, 2533 KB  
Article
Effective Identification of Aircraft Boarding Tools Using Lightweight Network with Large Language Model-Assisted Detection and Data Analysis
by Anan Zhao, Jia Yin, Wei Wang, Zhonghua Guo and Liqiang Zhu
Electronics 2025, 14(13), 2702; https://doi.org/10.3390/electronics14132702 - 4 Jul 2025
Viewed by 399
Abstract
Frequent and complex boarding operations require an effective management process for specialized tools. Traditional manual statistical analysis exhibits low efficiency, poor accuracy, and a lack of electronic records, making it difficult to meet the demands of modern aviation manufacturing. In this study, we propose an efficient and lightweight network designed for the recognition and analysis of professional tools. We employ a combination of knowledge distillation and pruning techniques to construct a compact network optimized for the target dataset and constrained deployment resources. We introduce a self-attention mechanism (SAM) for multi-scale feature fusion within the network to enhance its feature segmentation capability on the target dataset. In addition, we integrate a large language model (LLM), enhanced by retrieval-augmented generation (RAG), to analyze tool detection results, enabling the system to rapidly provide relevant information about operational tools for management personnel and facilitating intelligent monitoring and control. Experimental results on multiple benchmark datasets and professional tool datasets validate the effectiveness of our approach, demonstrating superior performance.
(This article belongs to the Special Issue Computer Vision and Image Processing in Machine Learning)
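Of the techniques this abstract combines, knowledge distillation is the most standard; the sketch below shows the usual temperature-softened distillation loss such a compression pipeline builds on. The function name and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard KD objective: temperature-softened KL term against the
    teacher plus a cross-entropy term on hard labels. T and alpha are
    illustrative settings, not the paper's."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# toy usage with random logits for an 8-sample, 10-class batch
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels).item())
```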

22 pages, 3237 KB  
Article
Local Polar Coordinate Feature Representation and Heterogeneous Fusion Framework for Accurate Leaf Image Retrieval
by Mengjie Ye, Yong Cheng, Yongqi Yuan, De Yu and Ge Jin
Symmetry 2025, 17(7), 1049; https://doi.org/10.3390/sym17071049 - 3 Jul 2025
Viewed by 316
Abstract
Leaf shape is a crucial visual cue for plant recognition. However, distinguishing among plants with high inter-class shape similarity remains a significant challenge, especially among cultivars within the same species where shape differences can be extremely subtle. To address this issue, we propose a novel shape representation and an advanced heterogeneous fusion framework for accurate leaf image retrieval. Specifically, based on the local polar coordinate system, multiscale analysis, and statistical histograms, we first propose local polar coordinate feature representation (LPCFR), which captures spatial distribution from two orthogonal directions while encoding local curvature characteristics. Next, we present heterogeneous feature fusion with exponential weighting and ranking (HFER), which enhances the compatibility and robustness of fused features by applying exponential weighted normalization and ranking-based encoding within neighborhood distance measures. Extensive experiments on both species-level and cultivar-level leaf datasets demonstrate that the proposed representation effectively captures shape features, and the fusion framework successfully integrates heterogeneous features, outperforming state-of-the-art (SOTA) methods.
(This article belongs to the Section Computer)
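The HFER fusion step is described only at a high level, so the following sketch is one plausible reading: exponentially normalize each feature's distance list, blend the resulting similarities, and re-rank. It is not the authors' exact formulation.

```python
import numpy as np

def exp_normalize(dists, alpha=1.0):
    """Map raw distances to similarities with exponential weighting;
    alpha controls how sharply small distances dominate (placeholder)."""
    d = np.asarray(dists, dtype=float)
    return np.exp(-alpha * d / (d.std() + 1e-9))

def fuse_rankings(dists_a, dists_b, w=0.5):
    """Toy heterogeneous fusion: normalize each feature's distances,
    blend, and return database indices ranked best-first. Our reading
    of 'exponential weighting + ranking', not the published HFER."""
    score = w * exp_normalize(dists_a) + (1.0 - w) * exp_normalize(dists_b)
    return np.argsort(-score)

shape_d = np.array([0.2, 0.8, 0.5, 0.1])   # e.g. LPCFR-style shape distances
deep_d = np.array([0.6, 0.3, 0.4, 0.2])    # e.g. distances from another descriptor
print(fuse_rankings(shape_d, deep_d))      # ranked candidate indices
```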

17 pages, 1535 KB  
Article
Attention-Based Multi-Scale Graph Fusion Hashing for Fast Cross-Modality Image–Text Retrieval
by Jiayi Li and Gengshen Wu
Symmetry 2025, 17(6), 861; https://doi.org/10.3390/sym17060861 - 1 Jun 2025
Cited by 1 | Viewed by 718
Abstract
In recent years, hashing-based algorithms have garnered significant attention as vital technologies for cross-modal retrieval tasks. They leverage the inherent symmetry between different data modalities (e.g., text, images, or audio) to bridge their semantic gaps by embedding them into a unified representation space. This symmetry-preserving approach can greatly enhance retrieval performance. However, challenges persist in mining and enriching multi-modal semantic feature information. Most current methods use pre-trained models for feature extraction, which limits information representation during hash code learning. Additionally, these methods map multi-modal data into a unified space, but this mapping is sensitive to feature distribution variations, potentially degrading cross-modal retrieval performance. To tackle these challenges, this paper introduces a novel method called Attention-based Multi-scale Graph Fusion Hashing (AMGFH). This approach first enhances the semantic representation of image features through multi-scale learning via an image feature enhancement network. Additionally, graph convolutional networks (GCNs) are employed to fuse multi-modal features, where the self-attention mechanism is incorporated to enhance feature representation by dynamically adjusting the weights of less relevant features. By optimizing a combination of loss functions and addressing the diverse requirements of image and text features, the proposed model demonstrates superior performance across various dimensions. Extensive experiments conducted on public datasets further confirm its outstanding performance. For instance, AMGFH exceeds the most competitive baseline by 3% and 4.7% in terms of mean average precision (MAP) when performing image-to-text and text-to-image retrieval tasks at 32 bits on the MS COCO dataset.
(This article belongs to the Section Computer)
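The retrieval half of any such hashing method reduces to binarizing the learned embeddings and ranking by Hamming distance; a generic sketch of that step (not the AMGFH model itself) follows.

```python
import numpy as np

def to_hash_codes(embeddings):
    """Binarize real-valued fusion embeddings into +/-1 hash codes,
    the usual final step after a tanh activation in deep hashing."""
    return np.where(embeddings >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.
    For +/-1 codes, Hamming distance = (bits - dot product) / 2."""
    bits = db_codes.shape[1]
    dists = (bits - db_codes @ query_code) // 2
    return np.argsort(dists), np.sort(dists)

rng = np.random.default_rng(0)
db = to_hash_codes(rng.standard_normal((1000, 32)))   # 32-bit codes, as in the quoted setting
q = to_hash_codes(rng.standard_normal(32))
order, dists = hamming_rank(q, db)
print(order[:5], dists[:5])                           # five nearest codes
```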

28 pages, 3438 KB  
Article
Optimizing Remote Sensing Image Retrieval Through a Hybrid Methodology
by Sujata Alegavi and Raghvendra Sedamkar
J. Imaging 2025, 11(6), 179; https://doi.org/10.3390/jimaging11060179 - 28 May 2025
Cited by 1 | Viewed by 717
Abstract
The contemporary challenge in remote sensing lies in the precise retrieval of increasingly abundant and high-resolution remotely sensed images (RS images) stored in expansive data warehouses. The heightened spatial and spectral resolutions, coupled with accelerated image acquisition rates, necessitate advanced tools for effective data management, retrieval, and exploitation. The classification of large-sized images at the pixel level generates substantial data, escalating the workload and search space for similarity measurement. Semantic-based image retrieval remains an open problem due to limitations in current artificial intelligence techniques. Furthermore, on-board storage constraints compel the application of numerous compression algorithms to reduce storage space, intensifying the difficulty of retrieving substantial, sensitive, and target-specific data. This research proposes an innovative hybrid approach to enhance the retrieval of remotely sensed images. The approach leverages multilevel classification and multiscale feature extraction strategies to enhance performance. The retrieval system comprises two primary phases: database building and retrieval. Initially, the proposed Multiscale Multiangle Mean-shift with Breaking Ties (MSMA-MSBT) algorithm selects informative unlabeled samples for hyperspectral and synthetic aperture radar images through an active learning strategy. Addressing the scaling and rotation variations in image capture, a flexible and dynamic algorithm, modified Deep Image Registration using Dynamic Inlier (IRDI), is introduced for image registration. Given the complexity of remote sensing images, feature extraction occurs at two levels. Low-level features are extracted using the modified Multiscale Multiangle Completed Local Binary Pattern (MSMA-CLBP) algorithm to capture local texture features, while high-level features are obtained through a hybrid CNN structure combining pretrained networks (AlexNet, CaffeNet, VGG-S, VGG-M, VGG-F, VGG-VDD-16, VGG-VDD-19) and a fully connected dense network. Fusion of low- and high-level features facilitates final class distinction, with soft thresholding mitigating misclassification issues. A region-based similarity measurement enhances matching percentages. Results, evaluated on high-resolution remote sensing datasets, demonstrate the effectiveness of the proposed method, outperforming traditional algorithms with an average accuracy of 86.66%. The hybrid retrieval system exhibits substantial improvements in classification accuracy, similarity measurement, and computational efficiency compared to state-of-the-art scene classification and retrieval methods.
(This article belongs to the Topic Computational Intelligence in Remote Sensing: 2nd Edition)
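The low/high-level fusion step can be illustrated generically: normalize the texture histogram and the CNN embedding, damp weak responses with a soft threshold, and concatenate. Where exactly the paper applies its soft threshold (features or class scores) is not stated, so the placement, descriptor sizes, and threshold below are our assumptions.

```python
import numpy as np

def soft_threshold(x, t=0.05):
    """Soft-thresholding: shrink toward zero and zero out weak,
    noise-like responses (threshold t is a placeholder)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fuse_features(lbp_hist, cnn_feat):
    """Toy low/high-level fusion: L2-normalize each descriptor,
    soft-threshold, and concatenate. A stand-in for the paper's
    MSMA-CLBP + hybrid-CNN fusion, not its actual pipeline."""
    a = lbp_hist / (np.linalg.norm(lbp_hist) + 1e-9)
    b = cnn_feat / (np.linalg.norm(cnn_feat) + 1e-9)
    return np.concatenate([soft_threshold(a), soft_threshold(b)])

rng = np.random.default_rng(1)
lbp = rng.random(59)             # e.g. a CLBP-style texture histogram
cnn = rng.standard_normal(512)   # e.g. a pretrained-CNN embedding
print(fuse_features(lbp, cnn).shape)   # (571,) fused descriptor
```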

27 pages, 2222 KB  
Article
Venous Thrombosis Risk Assessment Based on Retrieval-Augmented Large Language Models and Self-Validation
by Dong He, Hongrui Pu and Jianfeng He
Electronics 2025, 14(11), 2164; https://doi.org/10.3390/electronics14112164 - 26 May 2025
Viewed by 777
Abstract
Venous thromboembolism (VTE) is a disease with a high incidence and fatality rate, and coverage of prevention and treatment in China remains insufficient. Addressing the low efficiency, strong subjectivity, and poor extraction and utilization of electronic medical record data in traditional evaluation methods, this study proposes a multi-scale adaptive evaluation framework based on retrieval-augmented generation. In this framework, we first optimize the knowledge base construction through entity–context dynamic association and Milvus vector retrieval. Next, the Qwen2.5-7B large language model is fine-tuned with clinical knowledge via Low-Rank Adaptation technology. Finally, a generation–verification closed-loop mechanism is designed to suppress model hallucination. Experiments show that the accuracy of the framework on the Caprini, Padua, Wells, and Geneva scales is 79.56%, 88.32%, 90.51%, and 84.67%, respectively, and its comprehensive performance is better than that of clinical expert evaluation, especially in complex cases. Ablation experiments confirmed that the entity–context association and self-verification augmentation mechanisms contributed significantly to the improvement in evaluation accuracy. This study provides a high-precision, traceable intelligent tool for VTE clinical decision-making and validates its technical feasibility; future work will explore multi-modal data fusion and incremental learning to optimize dynamic risk assessment.
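The generation–verification closed loop is the framework's control-flow core; the runnable toy below keeps only that loop, with trivial stand-ins for the Milvus retrieval, the LoRA-tuned Qwen2.5-7B generator, and the self-validation check. Only the control flow reflects the abstract; everything else is hypothetical.

```python
# Toy sketch of a retrieve -> generate -> self-verify loop.
KB = {
    "recent surgery": "Caprini: surgery within 30 days scores 2 points.",
    "age over 60": "Caprini: age 61-74 scores 2 points.",
}

def retrieve(record):
    """Stand-in for vector retrieval: return KB passages matching the record."""
    return [text for key, text in KB.items() if key in record]

def generate(record, passages):
    """Stand-in for the LLM draft: sum the points cited in the evidence."""
    points = sum(int(p.split("scores ")[1].split(" ")[0]) for p in passages)
    return f"Caprini score contribution: {points}"

def verify(record, passages, draft):
    """Stand-in for self-validation: accept only if the draft matches the evidence."""
    expected = sum(int(p.split("scores ")[1].split(" ")[0]) for p in passages)
    return draft.endswith(str(expected))

def assess(record, max_rounds=3):
    passages = retrieve(record)
    draft = generate(record, passages)
    for _ in range(max_rounds):          # closed loop: regenerate until verified
        if verify(record, passages, draft):
            return draft
        draft = generate(record, passages)
    return draft                          # fall back to the last draft

print(assess("patient, age over 60, recent surgery"))
```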

22 pages, 6758 KB  
Article
Retrieval of Passive Seismic Virtual Source Data Under Non-Ideal Illumination Conditions Based on Enhanced U-Net
by Wensha Huang, Pan Zhang, Binghui Zhao, Donghao Zhang and Liguo Han
Remote Sens. 2025, 17(11), 1813; https://doi.org/10.3390/rs17111813 - 22 May 2025
Viewed by 616
Abstract
Seismic interferometry using ambient noise provides an effective approach for subsurface imaging through reconstructing passive virtual source (PVS) responses. Traditional crosscorrelation (CC) seismic interferometry relies on a uniform, dense distribution of passive sources in the subsurface, which is often challenging in practice. The multidimensional deconvolution (MDD) method alleviates reliance on passive-source distribution but requires wavefield decomposition of the original data, which is difficult to achieve accurately for uncorrelated noise sources and leads to non-physical artifacts in the reconstructed PVS data. To address this issue, this study proposes a method to improve the accuracy of PVS data reconstruction using an enhanced U-Net. This data-driven approach circumvents the challenge of noise wavefield decomposition encountered in traditional MDD. By integrating a feature fusion module into U-Net, multi-scale sampling information is leveraged to improve the network’s ability to capture detailed PVS data features. The combination of active-source data constraints and the modified MDD further optimizes PVS data retrieval during training. Numerical tests show that the proposed method effectively recovers waveform information in PVS retrieval records with non-ideally distributed sources, suppressing coherent noise and false events. The reconstructed recordings have a clear advantage in reverse time migration (RTM) imaging results, with strong generalization performance across various velocity models.
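For reference, the classical crosscorrelation baseline that the enhanced U-Net improves upon can be written in a few lines: correlate the ambient-noise recordings at two receivers and stack over noise realizations, turning one receiver into a virtual source. This is the CC step, not the paper's network.

```python
import numpy as np

def cc_virtual_source(rec_a, rec_b):
    """Crosscorrelation interferometry: correlate the noise records at
    receivers A and B and stack, yielding the virtual-source response
    from A recorded at B."""
    n_realizations, n_t = rec_a.shape
    out = np.zeros(2 * n_t - 1)
    for a, b in zip(rec_a, rec_b):          # stack correlations over noise panels
        out += np.correlate(b, a, mode="full")
    return out / n_realizations

rng = np.random.default_rng(2)
noise_a = rng.standard_normal((50, 256))    # toy ambient-noise records at A
noise_b = np.roll(noise_a, 7, axis=1)       # B sees the same field delayed 7 samples
vs = cc_virtual_source(noise_a, noise_b)
print(np.argmax(vs) - (256 - 1))            # peak lag ~ +7, the A->B traveltime
```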

23 pages, 13542 KB  
Article
A Lightweight Neural Network for Denoising Wrapped-Phase Images Generated with Full-Field Optical Interferometry
by Muhammad Awais, Younggue Kim, Taeil Yoon, Wonshik Choi and Byeongha Lee
Appl. Sci. 2025, 15(10), 5514; https://doi.org/10.3390/app15105514 - 14 May 2025
Cited by 1 | Viewed by 909
Abstract
Phase wrapping is a common phenomenon in optical full-field imaging or measurement systems. It arises from large phase retardations and results in wrapped-phase maps that contain essential information about surface roughness and topology. However, these maps are often degraded by noise, such as speckle and Gaussian noise, which reduces measurement accuracy and complicates phase reconstruction. Denoising such data is a fundamental problem in computer vision and plays a critical role in biomedical imaging modalities like Full-Field Optical Interferometry. In this paper, we propose WPD-Net (Wrapped-Phase Denoising Network), a lightweight deep-learning-based neural network specifically designed to restore phase images corrupted by high noise levels. The network architecture integrates a shallow feature extraction module, a series of Residual Dense Attention Blocks (RDABs), and a dense feature fusion module. The RDABs incorporate attention mechanisms that help the network focus on critical features and suppress irrelevant noise, especially in high-frequency or complex regions. Additionally, WPD-Net employs a growth-rate-based feature expansion strategy to enhance multi-scale feature representation and improve phase continuity. We evaluate the model’s performance on both synthetic and experimentally acquired datasets and compare it with other state-of-the-art deep-learning-based denoising methods. The results demonstrate that WPD-Net achieves superior noise suppression while preserving fine structural details, even with mixed speckle and Gaussian noise. The proposed method is expected to enable fast image processing, allowing unwrapped biomedical images to be retrieved in real time.
(This article belongs to the Special Issue Computer-Vision-Based Biomedical Image Processing)
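A quick way to see the network's input domain is to synthesize one such image: a smooth absolute-phase surface with retardation well beyond 2π, wrapped into (−π, π] and corrupted by Gaussian plus speckle-like noise. The surface shape and noise levels below are illustrative, not the paper's dataset recipe.

```python
import numpy as np

def wrap(phi):
    """Wrap absolute phase into (-pi, pi] via the complex exponential."""
    return np.angle(np.exp(1j * phi))

x, y = np.meshgrid(np.linspace(-3, 3, 256), np.linspace(-3, 3, 256))
absolute_phase = 12.0 * np.exp(-(x**2 + y**2) / 4.0)   # >> 2*pi, so fringes wrap

rng = np.random.default_rng(3)
gaussian = 0.4 * rng.standard_normal(absolute_phase.shape)
speckle = absolute_phase * 0.05 * rng.standard_normal(absolute_phase.shape)

noisy_wrapped = wrap(absolute_phase + gaussian + speckle)   # network input
clean_wrapped = wrap(absolute_phase)                        # training target
print(noisy_wrapped.min(), noisy_wrapped.max())             # bounded by +/- pi
```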

27 pages, 10620 KB  
Article
Multi-Decision Vector Fusion Model for Enhanced Mapping of Aboveground Biomass in Subtropical Forests Integrating Sentinel-1, Sentinel-2, and Airborne LiDAR Data
by Wenhao Jiang, Linjing Zhang, Xiaoxue Zhang, Si Gao, Huimin Gao, Lin Sun and Guangjian Yan
Remote Sens. 2025, 17(7), 1285; https://doi.org/10.3390/rs17071285 - 3 Apr 2025
Cited by 2 | Viewed by 1027
Abstract
The accurate estimation of forest aboveground biomass (AGB) is essential for effective forest resource management and carbon stock assessment. However, the estimation accuracy of forest AGB is often constrained by scarce in situ measurements and the limitations of using a single data source or retrieval model. This study proposes a multi-source data integration framework using Sentinel-1 (S-1) and Sentinel-2 (S-2) data along with eight predictive models (i.e., multiple linear regression—MLR; Elastic-Net; support vector regression (with a linear kernel and polynomial kernel); k-nearest neighbor; back-propagation neural network—BPNN; random forest—RF; and gradient-boosting tree—GBT). With airborne light detection and ranging (LiDAR)-derived AGB as a reference, a three-stage optimization strategy was developed, including stepwise feature selection (SFS), hyperparameter optimization, and multi-decision vector fusion (MDVF) model construction. Initially, the optimal feature subsets for each model were identified using SFS, followed by hyperparameter optimization through a grid search strategy. Finally, eight models were evaluated, and MDVF was implemented to integrate outputs from the top-performing models. The results revealed that LiDAR-derived AGB demonstrated a strong performance (R2 = 0.89, RMSE = 20.27 Mg/ha, RMSEr = 15.90%), validating its effectiveness as a supplement to field measurements, particularly in subtropical forests where traditional inventories are challenging. SFS could adaptively select optimal variable subsets for different models, effectively alleviating multicollinearity. Satellite-based AGB estimation using the MDVF model yielded robust results (R2 = 0.652, RMSE = 31.063 Mg/ha, RMSEr = 20.4%) through the synergy of S-1 and S-2, with R2 increasing by 4.18–7.41% and the RMSE decreasing by 3.55–5.89% compared to the four top-performing models (BPNN, GBT, RF, MLR) in the second optimization stage. This study aims to provide a cost-effective and precise strategy for large-scale and spatially continuous forest AGB mapping, demonstrating the potential of integrating active and passive satellite imagery with airborne LiDAR to enhance AGB mapping accuracy and support further ecological monitoring and forest carbon accounting.
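The MDVF construction is not spelled out in the abstract; one simple reading, sketched below, fuses the top models' prediction vectors with weights proportional to their validation R². The authors' actual weighting scheme may differ.

```python
import numpy as np

def mdvf_predict(preds, val_r2):
    """Fuse prediction vectors from the top-performing models with
    weights proportional to each model's validation R^2. A plain
    reading of 'multi-decision vector fusion', for illustration only."""
    w = np.asarray(val_r2, dtype=float)
    w = w / w.sum()
    return np.average(np.asarray(preds), axis=0, weights=w)

# toy AGB predictions (Mg/ha) from four models (e.g. BPNN, GBT, RF, MLR)
preds = [
    [152.0, 98.5, 210.3],
    [148.7, 101.2, 205.8],
    [150.9, 95.4, 214.0],
    [157.3, 104.8, 201.1],
]
val_r2 = [0.61, 0.63, 0.62, 0.58]   # placeholder second-stage scores
print(mdvf_predict(preds, val_r2))  # fused AGB estimates per plot
```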

19 pages, 21661 KB  
Article
U-SwinFusionNet: High Resolution Snow Cover Mapping on the Tibetan Plateau Based on FY-4A
by Xi Kan, Xu Liu, Zhou Zhou, Jing Wang, Linglong Zhu, Lei Gong and Jiangeng Wang
Water 2025, 17(5), 706; https://doi.org/10.3390/w17050706 - 28 Feb 2025
Viewed by 559
Abstract
The Qinghai–Tibet Plateau (QTP), one of China’s most snow-rich regions, has an extremely fragile ecosystem, with drought being the primary driver of ecological degradation. Given that the water resources in this region predominantly exist in the form of snow, high-spatiotemporal-resolution snow mapping is essential for understanding snow distribution and managing snow water resources effectively. However, although FY-4A/AGRI is capable of obtaining wide-area remote sensing data, only the first to third bands have a resolution of 1 km, which greatly limits its ability to produce high-resolution snow maps. This study proposes U-SwinFusionNet (USFNet), a deep learning-based snow cover retrieval algorithm that leverages the multi-scale advantages of FY-4A/AGRI remote sensing data in the shortwave infrared and visible bands. By integrating 1 km and 2 km resolution remote sensing imagery with auxiliary terrain information, USFNet effectively enhances snow cover mapping accuracy. The proposed model innovatively combines Swin Transformer and convolutional neural networks (CNNs) to capture both global contextual information and local spatial details. Additionally, an Attention Feature Fusion Module (AFFM) is introduced to align and integrate features from different modalities through an efficient attention mechanism, while the Feature Complementation Module (FCM) facilitates interactions between the encoded and decoded features. As a result, USFNet produces snow cover maps with a spatial resolution of 1 km. Experimental comparisons with Artificial Neural Networks (ANNs), Random Forest (RF), U-Net, and ResNet-FSC demonstrate that USFNet exhibits superior robustness, enhanced snow cover continuity, and lower error rates. The model achieves a correlation coefficient of 0.9126 and an R2 of 0.7072. Compared to the MOD10A1 snow product, USFNet demonstrates an improved sensitivity to fragmented and low-snow-cover areas while ensuring more natural snow boundary transitions.
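A simplified stand-in for the AFFM idea, channel-attention gating that blends two aligned feature branches (say, the 1 km and upsampled 2 km streams), can be written compactly in PyTorch; the published module is more elaborate than this sketch.

```python
import torch
import torch.nn as nn

class AttentionFeatureFusion(nn.Module):
    """Minimal attention-based fusion of two feature maps: a channel
    attention gate decides, per channel, how much of each branch to
    keep. A simplified stand-in for AFFM, not the paper's architecture."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global context
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                     # per-channel weight in [0, 1]
        )

    def forward(self, a, b):
        w = self.gate(torch.cat([a, b], dim=1))
        return w * a + (1.0 - w) * b                          # convex per-channel blend

fuse = AttentionFeatureFusion(channels=32)
hi_res = torch.randn(1, 32, 64, 64)    # e.g. features from the 1 km branch
lo_res = torch.randn(1, 32, 64, 64)    # e.g. upsampled features from the 2 km branch
print(fuse(hi_res, lo_res).shape)      # torch.Size([1, 32, 64, 64])
```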

27 pages, 17923 KB  
Article
A Semantically Guided Deep Supervised Hashing Model for Multi-Label Remote Sensing Image Retrieval
by Bowen Liu, Shibin Liu and Wei Liu
Remote Sens. 2025, 17(5), 838; https://doi.org/10.3390/rs17050838 - 27 Feb 2025
Cited by 1 | Viewed by 987
Abstract
With the rapid growth of remote sensing data, efficiently managing and retrieving large-scale remote sensing images has become a significant challenge. Specifically, for multi-label image retrieval, single-scale feature extraction methods often fail to capture the rich and complex information inherent in these images. Additionally, the sheer volume of data creates challenges in retrieval efficiency. Furthermore, leveraging semantic information for more accurate retrieval remains an open issue. In this paper, we propose a multi-label remote sensing image retrieval method based on an improved Swin Transformer, called Semantically Guided Deep Supervised Hashing (SGDSH). The method aims to enhance feature extraction capabilities and improve retrieval precision. By utilizing multi-scale information through an end-to-end learning approach with a multi-scale feature fusion module, SGDSH effectively integrates both shallow and deep features. A classification layer is introduced to assist in training the hash codes, incorporating RS image category information to improve retrieval accuracy. The model is optimized for multi-label retrieval through a novel loss function that combines classification loss, pairwise similarity loss, and hash code quantization loss. Experimental results on three publicly available remote sensing datasets, with varying sizes and label distributions, demonstrate that SGDSH outperforms state-of-the-art multi-label hashing methods in terms of average accuracy and weighted average precision. Moreover, SGDSH returns more relevant images with higher label similarity to query images. These findings confirm the effectiveness of SGDSH for large-scale remote sensing image retrieval tasks and provide new insights for future research on multi-label remote sensing image retrieval.
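The three-part loss (classification + pairwise similarity + quantization) is concrete enough to sketch. The exact terms and weights below are illustrative stand-ins, not SGDSH's published objective.

```python
import torch
import torch.nn.functional as F

def three_part_hash_loss(h, logits, labels, lam_cls=1.0, lam_sim=1.0, lam_q=0.1):
    """Sketch of a deep-hashing objective in the spirit described above:
    multi-label classification on an auxiliary head, pairwise similarity
    on relaxed codes h, and quantization pushing h toward +/-1."""
    cls = F.binary_cross_entropy_with_logits(logits, labels)
    # pairs sharing any label count as similar; align code inner products
    sim = (labels @ labels.t() > 0).float()
    inner = h @ h.t() / h.shape[1]            # normalized to [-1, 1]
    pair = F.mse_loss(inner, 2.0 * sim - 1.0)
    quant = (h.abs() - 1.0).pow(2).mean()     # relaxed codes should be near binary
    return lam_cls * cls + lam_sim * pair + lam_q * quant

h = torch.tanh(torch.randn(8, 64))           # relaxed 64-bit codes for a batch
logits = torch.randn(8, 17)                  # e.g. 17 scene categories
labels = (torch.rand(8, 17) > 0.8).float()   # toy multi-label ground truth
print(three_part_hash_loss(h, logits, labels).item())
```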

13 pages, 1130 KB  
Article
Content-Based Histopathological Image Retrieval
by Camilo Nuñez-Fernández, Humberto Farias and Mauricio Solar
Sensors 2025, 25(5), 1350; https://doi.org/10.3390/s25051350 - 22 Feb 2025
Viewed by 993
Abstract
Feature descriptors in histopathological images are an important challenge for the implementation of Content-Based Image Retrieval (CBIR) systems, an essential tool to support pathologists. Deep learning models like Convolutional Neural Networks and Vision Transformers improve the extraction of these feature descriptors. These models typically generate embeddings by leveraging deeper single-scale linear layers or advanced pooling layers. However, these embeddings, by focusing on local spatial details at a single scale, miss out on the richer spatial context from earlier layers. This gap suggests the development of methods that incorporate multi-scale information to enhance the depth and utility of feature descriptors in histopathological image analysis. In this work, we propose the Local–Global Feature Fusion Embedding Model. This proposal is composed of three elements: (1) a pre-trained backbone for feature extraction from multiple scales, (2) a neck branch for local–global feature fusion, and (3) a Generalized Mean (GeM)-based pooling head for feature descriptors. In our experiments, the model’s neck and head were trained on the ImageNet-1k and PanNuke datasets employing the Sub-center ArcFace loss and evaluated against the state of the art on the Kimia Path24C dataset for histopathological image retrieval, achieving a Recall@1 of 99.40% for test patches.
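The GeM pooling head, unlike the rest of the pipeline, follows a standard published formula: the generalized mean (mean of x^p)^(1/p) over spatial positions, with a learnable p. A standalone PyTorch version (without the paper's backbone or neck):

```python
import torch
import torch.nn as nn

class GeM(nn.Module):
    """Generalized Mean pooling: (mean over spatial positions of x^p)^(1/p).
    p = 1 recovers average pooling, large p approaches max pooling;
    p is learnable and conventionally initialized at 3."""

    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):                                # x: (B, C, H, W)
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(-2, -1)).pow(1.0 / self.p)    # -> (B, C) descriptor

pool = GeM()
feats = torch.randn(2, 512, 7, 7)    # e.g. a backbone's last feature map
print(pool(feats).shape)             # torch.Size([2, 512])
```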

22 pages, 24659 KB  
Article
A Multi-Scale Fusion Deep Learning Approach for Wind Field Retrieval Based on Geostationary Satellite Imagery
by Wei Zhang, Yapeng Wu, Kunkun Fan, Xiaojiang Song, Renbo Pang and Boyu Guoan
Remote Sens. 2025, 17(4), 610; https://doi.org/10.3390/rs17040610 - 11 Feb 2025
Cited by 1 | Viewed by 1904
Abstract
Wind field retrieval, a crucial component of weather forecasting, has been significantly enhanced by recent advances in deep learning. However, existing approaches that are primarily focused on wind speed retrieval are limited by their inability to achieve real-time, full-coverage retrievals at large scales. To address this problem, we propose a novel multi-scale fusion retrieval (MFR) method, leveraging geostationary observation satellites. At the mesoscale, MFR incorporates a cloud-to-wind transformer model, which employs local self-attention mechanisms to extract detailed wind field features. At large scales, MFR employs a multi-encoder coordinate U-net model, which combines multiple encoders and coordinate information to fuse meso- to large-scale features, enabling accurate and regionally complete wind field retrievals while reducing the computational resources required. The MFR method was validated using Level 1 data from the Himawari-8 satellite, covering a geographic range of 0–60°N and 100–160°E, at a resolution of 0.25°. Wind field retrieval was accomplished within seconds using a single graphics processing unit. The mean absolute error of wind speed obtained by the MFR was 0.97 m/s, surpassing the accuracy of the CFOSAT and HY-2B Level 2B wind field products. The mean absolute error for wind direction achieved by the MFR was 23.31°, outperforming CFOSAT Level 2B products and aligning closely with HY-2B Level 2B products. The MFR represents a pioneering approach for generating initial fields for large-scale grid forecasting models.
(This article belongs to the Special Issue Image Processing from Aerial and Satellite Imagery)
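A wind-direction MAE like the 23.31° quoted here only makes sense with circular differencing (359° vs 1° is a 2° error, not 358°); the snippet below implements that standard treatment, which may or may not match the paper's exact evaluation protocol.

```python
import numpy as np

def wind_direction_mae(pred_deg, true_deg):
    """Mean absolute error for wind direction, taking the shorter way
    around the circle so the error never exceeds 180 degrees."""
    diff = np.abs(np.asarray(pred_deg) - np.asarray(true_deg)) % 360.0
    return np.minimum(diff, 360.0 - diff).mean()

print(wind_direction_mae([350.0, 10.0, 180.0], [5.0, 355.0, 160.0]))  # ~16.67 deg
```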

18 pages, 3690 KB  
Article
Text Removal for Trademark Images Based on Self-Prompting Mechanisms and Multi-Scale Texture Aggregation
by Wenchao Zhou, Xiuhui Wang, Boxiu Zhou and Longwen Li
Appl. Sci. 2025, 15(3), 1553; https://doi.org/10.3390/app15031553 - 4 Feb 2025
Viewed by 1075
Abstract
With the rapid development of electronic business, there has been a surge in incidents of trademark infringement, making it imperative to improve the accuracy of trademark retrieval systems as a key measure to combat such illegal behaviors. Evidently, the textual information encompassed within trademarks substantially influences the precision of search results. Considering the diversity of trademark text and the complexity of its design elements, accurately locating and analyzing this text poses a considerable challenge. Against this background, this research has developed an original self-prompting text removal model, denoted as “Self-prompting Trademark Text Removal Based on Multi-scale Texture Aggregation” (abbreviated as MTF-STTR). This model astutely applies a text detection network to automatically generate the required input cues for the Segment Anything Model (SAM) while incorporating the technological benefits of diffusion models to attain a finer level of trademark text removal. To further elevate the performance of the model, we introduce two innovative architectures to the text detection network: the Integrated Differentiating Feature Pyramid (IDFP) and the Texture Fusion Module (TFM). These mechanisms are capable of efficiently extracting multilevel features and multiscale textual information, which enhances the model’s stability and adaptability in complex scenarios. The experimental validation has demonstrated that the trademark text erasure model designed in this paper achieves a peak signal-to-noise ratio as high as 40.1 dB on the SCUT-Syn dataset, which is an average improvement of 11.3 dB compared with other text erasure models. Furthermore, the text detection network component of the designed model attains an accuracy of up to 89.9% on the CTW1500 dataset, representing an average enhancement of 10 percentage points over other text detection networks.
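The 40.1 dB headline figure is plain PSNR; for reference, a generic implementation (our own, with an 8-bit peak assumed):

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((clean.astype(float) - restored.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

rng = np.random.default_rng(4)
img = rng.integers(0, 256, (64, 64)).astype(np.uint8)
noisy = np.clip(img + rng.normal(0, 2, img.shape), 0, 255).astype(np.uint8)
print(round(psnr(img, noisy), 1))   # around 42 dB for sigma-2 noise
```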

15 pages, 1946 KB  
Article
Enhanced Image Retrieval Using Multiscale Deep Feature Fusion in Supervised Hashing
by Amina Belalia, Kamel Belloulata and Adil Redaoui
J. Imaging 2025, 11(1), 20; https://doi.org/10.3390/jimaging11010020 - 12 Jan 2025
Cited by 3 | Viewed by 1623
Abstract
In recent years, deep-network-based hashing has gained prominence in image retrieval for its ability to generate compact and efficient binary representations. However, most existing methods predominantly focus on high-level semantic features extracted from the final layers of networks, often neglecting structural details that are crucial for capturing spatial relationships within images. Achieving a balance between preserving structural information and maximizing retrieval accuracy is the key to effective image hashing and retrieval. To address this challenge, we introduce Multiscale Deep Feature Fusion for Supervised Hashing (MDFF-SH), a novel approach that integrates multiscale feature fusion into the hashing process. The hallmark of MDFF-SH lies in its ability to combine low-level structural features with high-level semantic context, synthesizing robust and compact hash codes. By leveraging multiscale features from multiple convolutional layers, MDFF-SH ensures the preservation of fine-grained image details while maintaining global semantic integrity, achieving a harmonious balance that enhances retrieval precision and recall. Our approach demonstrated a superior performance on benchmark datasets, achieving significant gains in the Mean Average Precision (MAP) compared with the state-of-the-art methods: 9.5% on CIFAR-10, 5% on NUS-WIDE, and 11.5% on MS-COCO. These results highlight the effectiveness of MDFF-SH in bridging structural and semantic information, setting a new standard for high-precision image retrieval through multiscale feature fusion.
(This article belongs to the Special Issue Recent Techniques in Image Feature Extraction)
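The MAP numbers quoted in this and several abstracts above follow the standard retrieval definition: average precision per query (mean of precision@k at each rank where a relevant item appears), averaged over queries. A minimal reference implementation:

```python
import numpy as np

def average_precision(relevant, ranking):
    """AP for one query; MAP is the mean of AP over all queries."""
    hits, precisions = 0, []
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)   # precision@k at each hit
    return float(np.mean(precisions)) if precisions else 0.0

ranking = [3, 7, 1, 9, 4]     # database ids, best Hamming distance first
relevant = {3, 9, 4}          # ids sharing a label with the query
print(round(average_precision(relevant, ranking), 3))  # (1/1 + 2/4 + 3/5)/3 = 0.7
```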
