Search Results (23,098)

Search Parameters:
Keywords = image datasets

22 pages, 11216 KB  
Article
A Multi-Scale Remote Sensing Image Change Detection Network Based on Vision Foundation Model
by Shenbo Liu, Dongxue Zhao and Lijun Tang
Remote Sens. 2026, 18(3), 506; https://doi.org/10.3390/rs18030506 - 4 Feb 2026
Abstract
As a key technology in the intelligent interpretation of remote sensing, remote sensing image change detection aims to automatically identify surface changes from images of the same area acquired at different times. Although vision foundation models have demonstrated outstanding capabilities in image feature representation, their inherent patch-based processing and global attention mechanisms limit their effectiveness in perceiving multi-scale targets. To address this, we propose a multi-scale remote sensing image change detection network based on a vision foundation model, termed SAM-MSCD. This network integrates an efficient parameter fine-tuning strategy with a cross-temporal multi-scale feature fusion mechanism, significantly improving change perception accuracy in complex scenarios. Specifically, the Low-Rank Adaptation mechanism is adopted for parameter-efficient fine-tuning of the Segment Anything Model (SAM) image encoder, adapting it for the remote sensing change detection task. A bi-temporal feature interaction module (BIM) is designed to enhance the semantic alignment and the modeling of change relationships between feature maps from different time phases. Furthermore, a change feature enhancement module (CFEM) is proposed to fuse and highlight differential information from different levels, achieving precise capture of multi-scale changes. Comprehensive experimental results on four public remote sensing change detection datasets, namely LEVIR-CD, WHU-CD, NJDS, and MSRS-CD, demonstrate that SAM-MSCD surpasses current state-of-the-art (SOTA) methods on several key evaluation metrics, including the F1-score and Intersection over Union (IoU), indicating its broad prospects for practical application. Full article
(This article belongs to the Section AI Remote Sensing)
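
The LoRA step above is the most portable part of the pipeline. As a rough illustration only, a minimal PyTorch sketch of a LoRA-wrapped linear layer follows; the rank, scaling, and placement inside the SAM encoder are assumptions, not the authors' released code.

```python
# Minimal LoRA wrapper for a frozen linear layer (PyTorch).
# Illustrative sketch only: rank, scaling, and where it is applied
# (e.g., attention projections of the SAM image encoder) are assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        # frozen pretrained path + low-rank trainable update
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```
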
22 pages, 4029 KB  
Article
Anomaly Detection Algorithm of Meter Reading Messages for Power Line Communication Networks
by Zhixiong Chen, Yufan Yan, Ziyi Wu and Jiajing Li
Appl. Sci. 2026, 16(3), 1584; https://doi.org/10.3390/app16031584 - 4 Feb 2026
Abstract
To address the mining of abnormal electricity-meter data in the PLC application area, an intelligent measurement network architecture integrating protocol message interaction and an anomaly detection module is designed. Based on an improved convolutional neural network (ICNN), abnormal messages during transmission and reception are monitored to enhance the reliability of power information collection. Firstly, common anomalies during meter reading operations are analyzed using protocol analysis tools, including abnormal power data, excessive delay, and out-of-order messages. Subsequently, a dataset containing these anomalies in a preset proportion is constructed and, through data splicing and matrix processing, transformed into a two-dimensional image set to optimize the recognition performance of the convolutional neural network. Ultimately, an anomaly detection algorithm based on the ICNN is developed. Gray wolf optimization (GWO) is adopted to improve the algorithm's performance, and the algorithm is integrated into the anomaly detection module. The experimental results demonstrate that, compared with the CNN-LSTM and CNN-SVM algorithms, the proposed algorithm offers an advantage in terms of complexity while achieving an accuracy rate of 98.8%, providing a reliable anomaly detection solution for metering network measurement systems. Full article
(This article belongs to the Special Issue AI Technologies Applied to Energy Systems and Smart Grids)
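
To illustrate the "data splicing and matrix processing" step, here is a minimal sketch that packs a raw message byte string into a square grayscale matrix for a CNN; the 32 × 32 size and zero padding are assumptions, since the paper's exact layout is not given here.

```python
# Sketch: splice 1-D meter-reading message bytes into a square
# grayscale "image" for a CNN. Size and padding are illustrative.
import numpy as np

def message_to_image(payload: bytes, side: int = 32) -> np.ndarray:
    buf = np.frombuffer(payload, dtype=np.uint8)
    buf = buf[: side * side]                        # truncate long messages
    buf = np.pad(buf, (0, side * side - buf.size))  # zero-pad short ones
    return buf.reshape(side, side).astype(np.float32) / 255.0

img = message_to_image(b"\x68\x17\x00\x43\x05" * 40)
print(img.shape)  # (32, 32)
```
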
22 pages, 6152 KB  
Article
Adaptive Localization of Picking Points for Safflower Filaments Across Full Growth Stages in Unstructured Field Environments
by Bangbang Chen, Liqiang Wang, Jijing Lin, Baojian Ma and Lingfang Chen
Horticulturae 2026, 12(2), 198; https://doi.org/10.3390/horticulturae12020198 - 4 Feb 2026
Abstract
To address the challenges of low manual harvesting efficiency and high difficulty in automated picking of safflower filaments in the unstructured field environments of Xinjiang, this study proposes an intelligent harvesting method that integrates lightweight visual detection and adaptive localization. Firstly, a safflower image dataset covering multiple scenarios and growth stages was constructed. An improved lightweight detection model, named SSO-YOLO, was proposed based on the YOLOv11n model. By introducing the StarNet backbone network, the SEAttention mechanism, and structural optimization, this model achieves a high detection accuracy (mAP@0.5 of 97.4%) while reducing the model size by 29.4% to 3.94 MB, significantly enhancing its deployment feasibility on mobile devices. Secondly, based on the detection results, an adaptive localization algorithm for picking points was developed. This algorithm achieves precise localization of picking points at the filament–flower head junction by integrating geometric analysis of filament growth posture, dynamic judgment of connection conditions, and intersection calculation of rotated bounding boxes. Experimental results demonstrate that this algorithm achieves an average localization success rate of 87.3% across various unstructured scenarios such as occlusion and backlighting, representing an improvement of approximately 10.7 percentage points over traditional methods. The estimation error for filament posture angle is merely 0.6°, and the localization success rate remains above 90% across the entire growth cycle. This study provides an efficient and robust visual solution for the automated harvesting of safflower filaments and offers valuable insights for advancing intelligent harvesting technologies for specialty cash crops. Full article
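
The picking-point step can be pictured with a small geometric sketch: take the centroid of the intersection of two rotated bounding boxes (shapely), one for the filament and one for the flower head. Coordinates and angles below are invented toy values, not the paper's method verbatim.

```python
# Sketch: approximate the filament/flower-head junction as the centroid
# of the intersection of two rotated bounding boxes.
from shapely.geometry import box
from shapely import affinity

filament = affinity.rotate(box(40, 10, 60, 80), 15)   # rotated filament box
head     = affinity.rotate(box(30, 60, 75, 100), -5)  # rotated flower-head box

overlap = filament.intersection(head)
if not overlap.is_empty:
    pick = overlap.centroid                           # candidate picking point
    print(f"picking point ~ ({pick.x:.1f}, {pick.y:.1f})")
```
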
27 pages, 16403 KB  
Article
Unsupervised Tree Detection from UAV Imagery and 3D Point Clouds via Distance Transform-Based Circle Estimation and AIC Optimization
by Smaragda Markaki and Costas Panagiotakis
Remote Sens. 2026, 18(3), 505; https://doi.org/10.3390/rs18030505 - 4 Feb 2026
Abstract
This work proposes a novel tree detection methodology, named DTCD (Distance Transform Circle Detection), based on a fast circle detection method via Distance Transform and Akaike Information Criterion (AIC) optimization. More specifically, a visible-band vegetation index (RGBVI) is calculated to enhance canopy regions, followed by morphological filtering to delineate individual tree crowns. The Euclidean Distance Transform is then applied, and the local maxima of the smoothed distance map are extracted as candidate tree locations. The final detections are iteratively refined using the AIC to optimize the number of trees with respect to canopy coverage efficiency. Additionally, this work introduces DTCD-PC, a modified algorithm tailored for point clouds, which significantly enhances detection accuracy in complex environments. This work makes a significant contribution to tree detection in the following ways: (1) by creating a tree detection framework entirely based on an unsupervised technique, which outperforms state-of-the-art unsupervised and supervised tree detection methods; (2) by introducing a new urban dataset, named AgiosNikolaos-3, that consists of orthomosaics and photogrammetrically reconstructed 3D point clouds, allowing the assessment of the proposed method in complex urban environments. The proposed DTCD approach was evaluated on the Acacia-6 dataset, consisting of UAV images of six-month-old Acacia trees in Southeast Asia, demonstrating superior detection performance compared to existing state-of-the-art techniques, both unsupervised and supervised. Additional experiments were conducted on the custom-developed AgiosNikolaos-3 urban dataset, confirming the robustness and generalizability of the DTCD-PC method in heterogeneous environments. Full article
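
A minimal sketch of the DTCD core (distance transform, smoothing, local maxima) using SciPy follows; the toy canopy mask, smoothing sigma, and peak window are illustrative assumptions, and the RGBVI thresholding and AIC refinement are omitted.

```python
# Sketch of the core DTCD step: Euclidean Distance Transform of a
# binary canopy mask, smoothing, and local maxima as candidate centers.
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter, maximum_filter

mask = np.zeros((100, 100), bool)
mask[20:45, 20:45] = True          # toy "canopy" blobs
mask[55:85, 50:90] = True

dist = gaussian_filter(distance_transform_edt(mask), sigma=2)
peaks = (dist == maximum_filter(dist, size=9)) & (dist > 1.0)
print(np.argwhere(peaks))          # candidate tree locations (row, col)
```
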
27 pages, 1572 KB  
Article
Dynamic Interval Prediction of Subway Passenger Flow Using a Symmetry-Enhanced Hybrid FIG-ICPO-XGBoost Model
by Qingling He, Yifan Feng, Lin Ma, Xiaojuan Lu, Jiamei Zhang and Changxi Ma
Symmetry 2026, 18(2), 288; https://doi.org/10.3390/sym18020288 - 4 Feb 2026
Abstract
To address the challenges of characterizing subway passenger flow fluctuations and overcoming the slow convergence and significant errors of existing intelligent optimization algorithms in tuning deep learning parameters for flow prediction, this study proposes a novel subway passenger flow fluctuation interval prediction model based on a Symmetry-Enhanced FIG-ICPO-XGBoost model. The core innovation is an Improved Cheetah Optimization Algorithm (ICPO), which incorporates enhancements including Circle mapping for population initialization, a hybrid strategy of dimension-by-dimension pinhole imaging opposition-based learning and Cauchy mutation to escape local optima, and adaptive variable spiral search with inertia weight to balance exploration and exploitation. The construction of this methodology embodies the concept of symmetry in algorithm design. For instance, Circle mapping achieves uniformity and ergodicity in the initial distribution of the population within the solution space, reflecting the symmetric principle of spatial coverage. Dimension-by-dimension pinhole imaging opposition-based learning generates opposite solutions through the principle of mirror symmetry, effectively expanding the search space. The adaptive variable spiral search strategy dynamically adjusts the spiral shape, simulating the symmetric relationship of dynamic balance between exploration and exploitation. Utilizing fuzzy-granulated passenger flow data (LOW, R, UP) from Harbin, the ICPO was employed to optimize XGBoost hyperparameters. Experimental results demonstrate the superior performance of the ICPO on 12 benchmark functions. The ICPO-XGBoost model achieves mean MAE, RMSE, and MAPE values of 10,291, 10,612, and 5.8%, respectively, for the predictions of the LOW, R, and UP datasets. Compared to existing models such as CPO-XGBoost, PSO-BiLSTM, GA-BP, and CNN-LSTM, these values represent improvements ranging from 4541 to 13,161 for MAE, 5258 to 14,613 for RMSE, and 2.6% to 7.2% for MAPE. The proposed model provides a reliable theoretical and data-driven foundation for optimizing subway train schedules and station passenger flow management. Full article
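
The Circle-mapping initialization is easy to illustrate. The sketch below spreads an initial population over the search bounds with the circle chaotic map; the map constants follow a common convention in the metaheuristics literature and are assumptions about the paper's settings.

```python
# Sketch: Circle chaotic map used to spread an initial population
# uniformly over the search space (constants 0.2, 0.5 are conventional).
import numpy as np

def circle_map_init(pop: int, dim: int, lo: float, hi: float, x0: float = 0.7):
    x = np.empty((pop, dim))
    v = x0
    for i in range(pop):
        for j in range(dim):
            v = (v + 0.2 - (0.5 / (2 * np.pi)) * np.sin(2 * np.pi * v)) % 1.0
            x[i, j] = lo + v * (hi - lo)   # map chaotic value into bounds
    return x

print(circle_map_init(pop=5, dim=3, lo=-10, hi=10))
```
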
19 pages, 1576 KB  
Article
LGH-YOLOv12n: Latent Diffusion Inpainting Data Augmentation and Improved YOLOv12n Model for Rice Leaf Disease Detection
by Shaowei Mi, Cheng Li, Kui Fang, Xinghui Zhu and Gang Chen
Agriculture 2026, 16(3), 368; https://doi.org/10.3390/agriculture16030368 - 4 Feb 2026
Abstract
Detecting rice leaf diseases in real-world field environments remains challenging due to varying lesion sizes, diverse lesion morphologies, complex backgrounds, and the limited availability of high-quality annotated datasets. Existing detection models often suffer from performance degradation under these conditions, particularly when training data lack sufficient diversity and structural realism. To address these challenges, this paper proposes a Latent Diffusion Inpainting (LDI) data augmentation method and an improved lightweight detection model, LGH-YOLOv12n. Unlike conventional diffusion-based augmentation methods that generate full images or random patches, LDI performs category-aware latent inpainting, synthesizing realistic lesion patterns by jointly conditioning on background context and disease categories, thereby enhancing data diversity while preserving scene consistency. Furthermore, LGH-YOLOv12n improves upon the YOLOv12n baseline by introducing GSConv in the backbone to reduce channel redundancy and enhance lesion localization, and integrating Hierarchical Multi-head Attention (HMHA) into the neck network to better distinguish disease features from complex field backgrounds. Experimental results demonstrate that LGH-YOLOv12n achieves an F1 of 86.1% and an mAP@50 of 88.3%, outperforming the YOLOv12n model trained without data augmentation by 3.3% and 5.0%, respectively. Moreover, when trained on the LDI-augmented dataset, LGH-YOLOv12n consistently outperforms YOLOv8n, YOLOv10n, YOLOv11n, and YOLOv12n, with mAP@50 improvements of 4.6%, 5.2%, 1.9%, and 2.1%, respectively. These results indicate that the proposed LDI augmentation and LGH-YOLOv12n model provide an effective and robust solution for rice leaf disease detection in complex field environments. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
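
For orientation, a generic latent-diffusion inpainting call (Hugging Face diffusers) is sketched below as a stand-in for the paper's category-aware LDI; the checkpoint, prompt wording, and file names are assumptions, and the authors' background/category conditioning is not reproduced.

```python
# Generic latent-diffusion inpainting: synthesize lesion content into a
# masked region while keeping the background leaf intact. Illustrative only.
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image
import torch

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

leaf = Image.open("rice_leaf.png").convert("RGB").resize((512, 512))
mask = Image.open("lesion_mask.png").convert("L").resize((512, 512))

# Category-aware prompt: one disease class per augmentation pass.
aug = pipe(prompt="rice leaf blast lesion, field photo",
           image=leaf, mask_image=mask).images[0]
aug.save("rice_leaf_augmented.png")
```
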
17 pages, 784 KB  
Article
A Wideband Oscillation Classification Method Based on Multimodal Feature Fusion
by Yingmin Zhang, Yixiong Liu, Zongsheng Zheng and Shilin Gao
Electronics 2026, 15(3), 682; https://doi.org/10.3390/electronics15030682 - 4 Feb 2026
Abstract
With the increasing penetration of renewable energy sources and power-electronic devices, modern power systems exhibit pronounced wideband oscillation characteristics with large frequency spans, strong modal coupling, and significant time-varying behaviors. Accurate identification and classification of wideband oscillation patterns have therefore become critical challenges for ensuring the secure and stable operation of “dual-high” power systems. Existing methods based on signal processing or single-modality deep-learning models often fail to fully exploit the complementary information embedded in heterogeneous data representations, resulting in limited performance when dealing with complex oscillation patterns. To address these challenges, this paper proposes a multimodal attention-based fusion network for wideband oscillation classification. A dual-branch deep-learning architecture is developed to process Gramian Angular Difference Field images and raw time-series signals in parallel, enabling collaborative extraction of global structural features and local temporal dynamics. An improved Inception module is employed in the image branch to enhance multi-scale spatial feature representation, while a gated recurrent unit network is utilized in the time-series branch to model dynamic evolution characteristics. Furthermore, an attention-based fusion mechanism is introduced to adaptively learn the relative importance of different modalities and perform dynamic feature aggregation. Extensive experiments are conducted using a dataset constructed from mathematical models and engineering-oriented simulations. Comparative studies and ablation studies demonstrate that the proposed method significantly outperforms conventional signal-processing-based approaches and single-modality deep-learning models in terms of classification accuracy, robustness, and generalization capability. The results confirm the effectiveness of multimodal feature fusion and attention mechanisms for accurate wideband oscillation classification, providing a promising solution for advanced power system monitoring and analysis. Full article
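
The image-branch input can be reproduced with the pyts library: the sketch below converts a synthetic oscillation signal into a Gramian Angular Difference Field image; the signal and image size are illustrative, not the paper's data.

```python
# Sketch: 1-D oscillation signal -> Gramian Angular Difference Field image.
import numpy as np
from pyts.image import GramianAngularField

t = np.linspace(0, 1, 256)
sig = np.sin(2 * np.pi * 12 * t) + 0.3 * np.sin(2 * np.pi * 47 * t)

gadf = GramianAngularField(image_size=64, method="difference")
img = gadf.fit_transform(sig.reshape(1, -1))[0]
print(img.shape)  # (64, 64), fed to the Inception image branch
```
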
20 pages, 1298 KB  
Article
Optimizing the Accuracy and Efficiency of Camera Trap Image Analysis: Evaluating AI Model Performance and a Semi-Automated Workflow
by Kelly Hitchcock, Simon Tollington, Richard W. Yarnell, Leah J. Williams, Kat Hamill and Paul Fergus
Remote Sens. 2026, 18(3), 502; https://doi.org/10.3390/rs18030502 - 4 Feb 2026
Abstract
The widespread adoption of camera trap surveys for wildlife monitoring has generated a substantial volume of ecological data, yet processing constraints persist due to the time-consuming process of manual image classification and the reliability of automated systems. This study assesses the performance of Conservation AI’s UK Mammals model in classifying three species—Western European hedgehogs (Erinaceus europaeus), red foxes (Vulpes vulpes), and European badgers (Meles meles)—from a subsample of 234 records from camera trap images collected through a citizen science initiative across residential gardens. This analysis was repeated after retraining the model to assess improvement in model performance. Initial model outputs demonstrated high precision (>0.80) for foxes and hedgehogs but low recall (<0.50) for hedgehogs, with the lowest recall probability of 0.12 at the 95% confidence threshold (CT). Following retraining, model performance improved substantially across all metrics, with average F1-scores (weighted average of precision and recall across the three species tested) improving at all CTs, though discrepancies with human classifications remained statistically significant. Based on performance results from this study, we present a semi-automated, three-step workflow incorporating an artificially intelligent (AI) generalist object detector (MegaDetector), an AI species-specific classifier (Conservation AI), and manual validation. Where privacy concerns restrict citizen science contributions, our pipeline offers an alternative that accelerates camera trap data analysis whilst maintaining classification accuracy. The findings provide baseline performance estimates of Conservation AI’s UK Mammals model and present an approach that offers a practical solution to improve the efficiency of using camera traps in ecological research and conservation planning. We also highlight the importance of continuous AI model training, the value of citizen science in expanding training datasets, and the need for adaptable workflows in camera trap studies. Full article
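
The three-step workflow reduces to a simple triage loop, sketched below with hypothetical detector/classifier callables and a placeholder 0.95 threshold; this is not the Conservation AI or MegaDetector API.

```python
# Pseudocode-style sketch of the semi-automated workflow: a generalist
# detector drops empty frames, a species classifier labels animal crops,
# and low-confidence records are routed to manual review.
CONFIDENCE_THRESHOLD = 0.95

def triage(images, detect_animals, classify_species):
    auto_accepted, review_queue = [], []
    for img in images:
        detections = detect_animals(img)           # step 1: generalist detector
        if not detections:
            continue                               # discard empty frames
        for crop in detections:
            label, conf = classify_species(crop)   # step 2: species classifier
            if conf >= CONFIDENCE_THRESHOLD:
                auto_accepted.append((img, label, conf))
            else:
                review_queue.append((img, label, conf))  # step 3: manual check
    return auto_accepted, review_queue
```
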
22 pages, 1659 KB  
Article
Lightweight Depression Detection Using 3D Facial Landmark Pseudo-Images and CNN-LSTM on DAIC-WOZ and E-DAIC
by Achraf Jallaglag, My Abdelouahed Sabri, Ali Yahyaouy and Abdellah Aarab
BioMedInformatics 2026, 6(1), 8; https://doi.org/10.3390/biomedinformatics6010008 - 4 Feb 2026
Abstract
Background: Depression is a common mental disorder, and early, objective diagnosis remains challenging. Advances in deep learning show promise for processing audio and video content when screening for depression. Nevertheless, most current methods rely on raw video processing or multimodal pipelines, which are computationally costly, difficult to interpret, and raise privacy issues, restricting their use in actual clinical settings. Methods: To overcome these constraints, we introduce, for the first time, a purely visual, lightweight deep learning framework based solely on spatiotemporal 3D facial landmarks extracted from clinical interview videos in the DAIC-WOZ and Extended DAIC-WOZ (E-DAIC) datasets. Our method uses no raw video and no multimodal fusion. Whereas raw video streams are computationally expensive and poorly suited to isolating specific variables, we take a temporal series of 3D landmarks, convert it to pseudo-images (224 × 224 × 3), and process these within a CNN-LSTM framework, which jointly captures the spatial configuration and temporal dynamics of facial behavior. Results: The experimental results indicate macro-average F1 scores of 0.74 on DAIC-WOZ and 0.762 on E-DAIC, demonstrating robust performance under heavy class imbalance, with a variability of ±0.03 across folds. Conclusion: These results indicate that landmark-based spatiotemporal modeling is a promising route to lightweight, interpretable, and scalable automatic depression detection, and suggest opportunities for embedding ADI systems within real-world MHA frameworks. Full article
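
The pseudo-image construction can be sketched directly: stack the landmark sequence as a (time × landmark × xyz) array, min-max normalize, and resize to 224 × 224 × 3. The normalization and resize choices below are assumptions about the paper's exact recipe.

```python
# Sketch: pack a temporal sequence of 3-D facial landmarks into a
# 224x224x3 pseudo-image, with x/y/z as channels and (time, landmark)
# as the spatial axes.
import numpy as np
import cv2

def landmarks_to_pseudo_image(seq: np.ndarray) -> np.ndarray:
    """seq: (frames, landmarks, 3) array of x, y, z coordinates."""
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / (hi - lo + 1e-8)             # per-channel min-max
    return cv2.resize(norm.astype(np.float32), (224, 224))

frames = np.random.rand(90, 68, 3)                   # e.g., 3 s of 68 landmarks
print(landmarks_to_pseudo_image(frames).shape)       # (224, 224, 3)
```
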
23 pages, 3997 KB  
Article
Assimilation of ICON/MIGHTI Wind Profiles into a Coupled Thermosphere/Ionosphere Model Using Ensemble Square Root Filter
by Meng Zhang, Xiong Hu, Yanan Zhang, Zhaoai Yan, Hongyu Liang, Junfeng Yang, Cunying Xiao and Cui Tu
Remote Sens. 2026, 18(3), 500; https://doi.org/10.3390/rs18030500 - 4 Feb 2026
Abstract
Precise characterization of the thermospheric neutral wind is essential for comprehending the dynamic interactions within the ionosphere-thermosphere system, as evidenced by the development of models like HWM and the need for localized data. However, numerical models often suffer from biases due to uncertainties in external forcing and the scarcity of direct wind observations. This study examines the influence of incorporating actual neutral wind profiles from the Michelson Interferometer for Global High-resolution Thermospheric Imaging (MIGHTI) on the Ionospheric Connection Explorer (ICON) satellite into the Thermosphere Ionosphere Electrodynamics General Circulation Model (TIE-GCM) via an ensemble-based data assimilation framework. To address the challenges of assimilating real observational data, a robust background check Quality Control (QC) scheme with dynamic thresholds based on ensemble spread was implemented. The assimilation performance was evaluated by comparing the analysis results against independent, unassimilated observations and a free-running model Control Run. The findings demonstrate a substantial improvement in the precision of the thermospheric wind field. This enhancement is reflected in a 45–50% reduction in Root Mean Square Error (RMSE) for both zonal and meridional components. For zonal winds, the system demonstrated effective bias removal and sustained forecast skill, indicating a strong model memory of the large-scale mean flow. In contrast, while the assimilation effectively corrected the meridional circulation by refining the spatial structures and reshaping cross-equatorial flows, the forecast skill for this component dissipated rapidly. This characteristic of “short memory” underscores the highly dynamic nature of thermospheric winds and emphasizes the need for high-frequency assimilation cycles. The system required a spin-up period of approximately 8 h to achieve statistical stability. These findings demonstrate that the assimilation of ICON/MIGHTI data not only diminishes numerical inaccuracies but also improves the representation of instantaneous thermospheric wind distributions. Such a high-fidelity dataset is crucial for advancing the modeling and understanding of the complex interactions within the Earth’s ionosphere-thermosphere system. Full article
(This article belongs to the Section Atmospheric Remote Sensing)
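
For readers unfamiliar with the ensemble square root filter, a textbook scalar-observation update (after Whitaker and Hamill, 2002) is sketched below with synthetic numbers; it is not the study's TIE-GCM assimilation system.

```python
# Minimal EnSRF update for one scalar observation. State size,
# observation error, and values are synthetic toy data.
import numpy as np

def ensrf_update(X, H, y, r):
    """X: (n_state, n_ens) ensemble; H: (n_state,) obs operator;
    y: scalar observation; r: obs error variance."""
    xbar = X.mean(axis=1, keepdims=True)
    Xp = X - xbar                          # perturbations
    hx = H @ X                             # ensemble in observation space
    hxp = hx - hx.mean()
    phh = hxp @ hxp / (X.shape[1] - 1)     # H P H^T (scalar)
    pxh = Xp @ hxp / (X.shape[1] - 1)      # P H^T   (n_state,)
    k = pxh / (phh + r)                    # Kalman gain
    alpha = 1.0 / (1.0 + np.sqrt(r / (phh + r)))
    xbar_a = xbar[:, 0] + k * (y - hx.mean())   # mean update
    Xp_a = Xp - np.outer(alpha * k, hxp)        # square-root perturbation update
    return Xp_a + xbar_a[:, None]

X = np.random.randn(3, 20) * 5 + 10        # 20-member toy ensemble
print(ensrf_update(X, np.array([1.0, 0, 0]), y=12.0, r=4.0).mean(axis=1))
```
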
22 pages, 10079 KB  
Article
FS2-DETR: Transformer-Based Few-Shot Sonar Object Detection with Enhanced Feature Perception
by Shibo Yang, Xiaoyu Zhang and Panlong Tan
J. Mar. Sci. Eng. 2026, 14(3), 304; https://doi.org/10.3390/jmse14030304 - 4 Feb 2026
Abstract
In practical underwater object detection tasks, imbalanced sample distribution and the scarcity of samples for certain classes often lead to insufficient model training and limited generalization capability. To address these challenges, this paper proposes FS2-DETR (Few-Shot Detection Transformer for Sonar Images), a transformer-based few-shot object detection network tailored for sonar imagery. Considering that sonar images generally contain weak, small, and blurred object features, and that data scarcity in some classes can hinder effective feature learning, the proposed FS2-DETR introduces the following improvements over the baseline DETR model. (1) Feature Enhancement Compensation Mechanism: A decoder-prediction-guided feature resampling module (DPGFRM) is designed to process the multi-scale features and subsequently enhance the memory representations, thereby strengthening the exploitation of key features and improving detection performance for weak and small objects. (2) Visual Prompt Enhancement Mechanism: Discriminative visual prompts are generated to jointly enhance object queries and memory, thereby highlighting distinctive image features and enabling more effective feature capture for few-shot objects. (3) Multi-Stage Training Strategy: Adopting a progressive training strategy to strengthen the learning of class-specific layers, effectively mitigating misclassification in few-shot scenarios and enhancing overall detection accuracy. Extensive experiments conducted on the improved UATD sonar image dataset demonstrate that the proposed FS2-DETR achieves superior detection accuracy and robustness under few-shot conditions, outperforming existing state-of-the-art detection algorithms. Full article
(This article belongs to the Section Ocean Engineering)
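
The multi-stage strategy amounts to toggling which parameters train at each stage. The sketch below freezes everything except class-specific heads in the few-shot stage; the module name "class_embed" follows DETR conventions and is an assumption about this implementation.

```python
# Sketch: progressive (multi-stage) training by toggling trainable
# parameters. Stage 1 trains the whole detector on base classes;
# stage 2 fine-tunes only class-specific layers on few-shot data.
import torch.nn as nn

def set_stage(model: nn.Module, stage: str) -> None:
    for name, param in model.named_parameters():
        if stage == "base":
            param.requires_grad = True                    # train everything
        elif stage == "few_shot":
            param.requires_grad = "class_embed" in name   # heads only

# Usage: set_stage(detector, "base"), train on base classes; then
# set_stage(detector, "few_shot"), fine-tune on the novel classes.
```
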
20 pages, 2128 KB  
Article
An Image Deraining Network Integrating Dual-Color Space and Frequency Domain Prior
by Luxia Yang, Yiying Hou and Hongrui Zhang
Technologies 2026, 14(2), 102; https://doi.org/10.3390/technologies14020102 - 4 Feb 2026
Abstract
Image deraining is a crucial preprocessing task for enhancing the robustness of high-level vision systems under adverse weather conditions. However, most of the existing methods are limited to a single RGB color space, and it is difficult to effectively separate high-frequency rain streaks from low-frequency backgrounds, resulting in color distortion and detail loss in the restored image. Therefore, a rain removal network that combines dual-color space and frequency domain priors is proposed. Specifically, the devised network employs a dual-branch Transformer architecture to extract color and structural features from the RGB and YCbCr color spaces, respectively. Meanwhile, a Hybrid Attention Feedforward Block (HAFB) is constructed. HAFB achieves feature enhancement and regional focus through a progressive perception selection mechanism and a multi-scale feature extraction architecture, thereby effectively separating rain streaks from the background. Furthermore, a Wavelet-Gated Cross-Attention module is designed, including a Wavelet-Enhanced Attention Block (WEAB) and a Dual Cross-Attention module (DCA). This design enhances the complementary fusion of structural information and color features through frequency-domain guidance and bidirectional semantic interaction. Finally, experimental results on multiple datasets (i.e., Rain100L, Rain100H, Rain800, Rain12, and SPA-Data) demonstrate that the proposed method outperforms other approaches. Full article
(This article belongs to the Section Information and Communication Technologies)
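
The two priors are standard operations, sketched below: a BT.601 RGB-to-YCbCr conversion and a one-level Haar wavelet split (pywt) that separates high-frequency streak-like content from the low-frequency background. The toy image and wavelet choice are illustrative.

```python
# Sketch of the dual-color-space and frequency-domain priors.
import numpy as np
import pywt

rgb = np.random.rand(64, 64, 3)
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

y  = 0.299 * r + 0.587 * g + 0.114 * b            # luma (BT.601)
cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma

cA, (cH, cV, cD) = pywt.dwt2(y, "haar")           # low/high-frequency split
print(cA.shape, cH.shape)                         # (32, 32) each
```
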
12 pages, 1251 KB  
Article
Inception U-Net for Enhanced Breast Ultrasound Image Segmentation Using Transfer Learning
by Yeonhyo Choi, Myoung Nam Kim and Sungdae Na
Bioengineering 2026, 13(2), 181; https://doi.org/10.3390/bioengineering13020181 - 4 Feb 2026
Abstract
Background: Breast cancer diagnosis increasingly relies on ultrasound imaging, but challenges related to operator dependency and image quality limitations necessitate automated segmentation approaches. Traditional U-Net architectures, while widely used for medical image segmentation, suffer from shallow encoder structures that limit feature extraction capabilities. Methods: This study proposes an enhanced segmentation model that replaces the conventional U-Net encoder with an Inception architecture and employs transfer learning using ImageNet pre-trained weights. The model was trained and evaluated on a dataset of 900 breast ultrasound images from Kyungpook National University Hospital. Performance evaluation utilized multiple metrics including Intersection over Union (IoU), Dice coefficient, precision, and recall scores. Results: The proposed Inception U-Net achieved superior performance with an IoU score of 0.7774, Dice score of 0.8491, precision score of 0.7081, and recall score of 0.7174, demonstrating approximately 5% improvement over baseline U-Net architecture across all evaluation metrics. Conclusions: The integration of Inception modules within the U-Net architecture effectively addresses feature extraction limitations in breast ultrasound segmentation. Transfer learning from ImageNet datasets proves beneficial even across domain differences, establishing a foundation for broader medical imaging applications. Full article
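
For reference, the reported IoU and Dice metrics for binary masks can be computed as below; toy masks only, with conventions such as empty-mask handling assumed rather than taken from the paper.

```python
# Sketch: IoU and Dice coefficient for binary segmentation masks.
import numpy as np

def iou_dice(pred: np.ndarray, gt: np.ndarray):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0       # empty masks treated as perfect
    dice = 2 * inter / total if total else 1.0
    return iou, dice

pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1
gt   = np.zeros((64, 64)); gt[15:45, 12:42] = 1
print(iou_dice(pred, gt))
```
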
21 pages, 3169 KB  
Article
LGD-DeepLabV3+: An Enhanced Framework for Remote Sensing Semantic Segmentation via Multi-Level Feature Fusion and Global Modeling
by Xin Wang, Xu Liu, Adnan Mahmood, Yaxin Yang and Xipeng Li
Sensors 2026, 26(3), 1008; https://doi.org/10.3390/s26031008 - 3 Feb 2026
Abstract
Remote sensing semantic segmentation encounters several challenges, including scale variation, the coexistence of class similarity and intra-class diversity, difficulties in modeling long-range dependencies, and shadow occlusions. Slender structures and complex boundaries present particular segmentation difficulties, especially in high-resolution imagery acquired by satellite and aerial cameras, UAV-borne optical sensors, and other imaging payloads. These sensing systems deliver large-area coverage with fine ground sampling distance, which magnifies domain shifts between different sensors and acquisition conditions. This work builds upon DeepLabV3+ and proposes complementary improvements at three stages: input, context, and decoder fusion. First, to mitigate the interference of complex and heterogeneous data distributions on network optimization, a feature-mapping network is introduced to project raw images into a simpler distribution before they are fed into the segmentation backbone. This approach facilitates training and enhances feature separability. Second, although the Atrous Spatial Pyramid Pooling (ASPP) aggregates multi-scale context, it remains insufficient for modeling long-range dependencies. Therefore, a routing-style global modeling module is incorporated after ASPP to strengthen global relation modeling and ensure cross-region semantic consistency. Third, considering that the fusion between shallow details and deep semantics in the decoder is limited and prone to boundary blurring, a fusion module is designed to facilitate deep interaction and joint learning through cross-layer feature alignment and coupling. The proposed model improves the mean Intersection over Union (mIoU) by 8.83% on the LoveDA dataset and by 6.72% on the ISPRS Potsdam dataset compared to the baseline. Qualitative results further demonstrate clearer boundaries and more stable region annotations, while the proposed modules are plug-and-play and easy to integrate into camera-based remote sensing pipelines and other imaging-sensor systems, providing a practical accuracy–efficiency trade-off. Full article
(This article belongs to the Section Smart Agriculture)
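
The headline metric can be reproduced as follows: mIoU from a per-pixel confusion matrix, shown here on random toy labels with an illustrative class count.

```python
# Sketch: mean Intersection over Union (mIoU) from a confusion matrix.
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, n_classes: int) -> float:
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)    # per-pixel confusion counts
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)
    ious = inter[union > 0] / union[union > 0]      # skip absent classes
    return ious.mean()

gt   = np.random.randint(0, 6, (128, 128))
pred = np.random.randint(0, 6, (128, 128))
print(f"mIoU = {miou(pred, gt, 6):.3f}")
```
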
32 pages, 44878 KB  
Article
SDLS: A Two-Stream Architecture with Self-Distillation and Local Streams for Remote Sensing Image Scene Classification
by Xinliang Ma, Junwei Luo, Shuiping Ni, Xiaohong Zhang and Runze Ding
Remote Sens. 2026, 18(3), 498; https://doi.org/10.3390/rs18030498 - 3 Feb 2026
Abstract
Remote sensing image scene classification holds significant application value and has long been a research hotspot in remote sensing. However, remote sensing images contain diverse objects and complex backgrounds. Reducing background interference while focusing on key target regions in the images remains a challenge, which limits the potential improvement of classification accuracy. In this paper, a local image generation module (LIGM) is proposed to generate weights for the original images. The resulting local images, generated by weighting the original images, effectively focus on key target regions while suppressing background regions. Based on the LIGM, a two-stream architecture with self-distillation and local streams (SDLS) is proposed. The self-distillation stream extracts features from the original images using a convolutional neural network (CNN) and two MobileNetV2 networks. Furthermore, a multiplex-guided attention (MGA) module is introduced into this stream to facilitate cross-network attention-guided learning between the CNN and MobileNetV2 features. In the local stream, a MobileNetV2 network is employed to extract features from the local images. The classification logits produced by the two streams are fused, resulting in the final SDLS classification score. Experimental results demonstrate that SDLS achieves competitive performance on multiple datasets. Full article
(This article belongs to the Section AI Remote Sensing)
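
The final fusion step is a simple combination of the two streams' logits; plain averaging below is one plausible choice, since the paper's exact weighting is not given in this listing.

```python
# Sketch: fusing self-distillation-stream and local-stream logits into
# the final SDLS classification score. Class count is illustrative.
import torch

logits_sd = torch.randn(4, 45)          # self-distillation stream (45 classes)
logits_lc = torch.randn(4, 45)          # local stream
fused = (logits_sd + logits_lc) / 2     # final SDLS score
print(fused.argmax(dim=1))              # predicted scene classes
```
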