Search Results (309)

Search Parameters:
Keywords = training with multi-size images

18 pages, 14590 KB  
Article
VTC-Net: A Semantic Segmentation Network for Ore Particles Integrating Transformer and Convolutional Block Attention Module (CBAM)
by Yijing Wu, Weinong Liang, Jiandong Fang, Chunxia Zhou and Xiaolu Sun
Sensors 2026, 26(3), 787; https://doi.org/10.3390/s26030787 - 24 Jan 2026
Abstract
In mineral processing, visual-based online particle size analysis systems depend on high-precision image segmentation to accurately quantify ore particle size distribution, thereby optimizing crushing and sorting operations. However, due to multi-scale variations, severe adhesion, and occlusion within ore particle clusters, existing segmentation models often exhibit undersegmentation and misclassification, leading to blurred boundaries and limited generalization. To address these challenges, this paper proposes a novel semantic segmentation model named VTC-Net. The model employs VGG16 as the backbone encoder, integrates Transformer modules in deeper layers to capture global contextual dependencies, and incorporates a Convolutional Block Attention Module (CBAM) at the fourth stage to enhance focus on critical regions such as adhesion edges. BatchNorm layers are used to stabilize training. Experiments on ore image datasets show that VTC-Net outperforms mainstream models such as UNet and DeepLabV3 in key metrics, including MIoU (89.90%) and pixel accuracy (96.80%). Ablation studies confirm the effectiveness and complementary role of each module. Visual analysis further demonstrates that the model identifies ore contours and adhesion areas more accurately, significantly improving segmentation robustness and precision under complex operational conditions.
(This article belongs to the Section Sensing and Imaging)
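For reference, the MIoU metric reported above is conventionally computed from a per-class confusion matrix; a minimal NumPy sketch (class count and labels are illustrative, not from the paper's dataset):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union from flattened label arrays."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)  # guard against empty classes
    return iou.mean()

# Toy 2-class example: three of four pixels predicted correctly.
score = mean_iou([0, 0, 1, 1], [0, 0, 1, 0], num_classes=2)
```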

15 pages, 6862 KB  
Article
SLR-Net: Lightweight and Accurate Detection of Weak Small Objects in Satellite Laser Ranging Imagery
by Wei Zhu, Jinlong Hu, Weiming Gong, Yong Wang and Yi Zhang
Sensors 2026, 26(2), 732; https://doi.org/10.3390/s26020732 - 22 Jan 2026
Abstract
To address the challenges of insufficient efficiency and accuracy in traditional detection models caused by minute target sizes, low signal-to-noise ratios (SNRs), and feature volatility in Satellite Laser Ranging (SLR) images, this paper proposes an efficient, lightweight, and high-precision detection model. The core motivation of this study is to fundamentally enhance the model’s capabilities in feature extraction, fusion, and localization for minute and blurred targets through a specifically designed network architecture and loss function, without significantly increasing the computational burden. To achieve this goal, we first design a DMS-Conv module. By employing dense sampling and channel function separation strategies, this module effectively expands the receptive field while avoiding the high computational overhead and sampling artifacts associated with traditional multi-scale methods, thereby significantly improving feature representation for faint targets. Secondly, to optimize information flow within the feature pyramid, we propose a Lightweight Upsampling Module (LUM). Integrating depthwise separable convolutions with a channel reshuffling mechanism, this module replaces traditional transposed convolutions at a minimal computational cost, facilitating more efficient multi-scale feature fusion. Finally, addressing the stringent requirements for small target localization accuracy, we introduce the MPD-IoU Loss. By incorporating the diagonal distance of bounding boxes as a geometric penalty term, this loss function provides finer and more direct spatial alignment constraints for model training, effectively boosting localization precision. Experimental results on a self-constructed real-world SLR observation dataset demonstrate that the proposed model achieves an mAP50:95 of 47.13% and an F1-score of 88.24%, with only 2.57 M parameters and 6.7 GFLOPs. Outperforming various mainstream lightweight detectors in the comprehensive performance of precision and recall, these results validate that our method effectively resolves the small target detection challenges in SLR scenarios while maintaining a lightweight design, exhibiting superior performance and practical value.
(This article belongs to the Section Remote Sensors)
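The corner-distance penalty behind MPD-IoU can be sketched as below. This follows the generic minimum-point-distance IoU formulation for axis-aligned boxes (IoU minus the normalized squared distances between top-left and bottom-right corners); the box format and normalization are assumptions, not the authors' exact implementation:

```python
def mpdiou_loss(box_a, box_b, img_w, img_h):
    """1 - MPDIoU for two (x1, y1, x2, y2) boxes.

    MPDIoU = IoU - d_tl^2 / (w^2 + h^2) - d_br^2 / (w^2 + h^2),
    where d_tl and d_br are the distances between the two boxes'
    top-left and bottom-right corners, and (w, h) is the image size.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / max(union, 1e-9)
    # Normalized corner-distance penalties.
    norm = img_w ** 2 + img_h ** 2
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return 1.0 - (iou - d_tl / norm - d_br / norm)
```

Identical boxes give a loss of zero, and the loss grows with corner displacement even when IoU is unchanged, which is what gives the finer spatial alignment signal the abstract describes.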

20 pages, 13461 KB  
Article
Multi-View 3D Reconstruction of Ship Hull via Multi-Scale Weighted Neural Radiation Field
by Han Chen, Xuanhe Chu, Ming Li, Yancheng Liu, Jingchun Zhou, Xianping Fu, Siyuan Liu and Fei Yu
J. Mar. Sci. Eng. 2026, 14(2), 229; https://doi.org/10.3390/jmse14020229 - 21 Jan 2026
Abstract
The 3D reconstruction of vessel hulls is crucial for enhancing safety, efficiency, and knowledge in the maritime industry. Neural Radiance Fields (NeRFs) are an alternative to 3D reconstruction and rendering from multi-view images; particularly, tensor-based methods have proven effective in improving efficiency. However, existing tensor-based methods typically suffer from a lack of spatial coherence, resulting in gaps in the reconstruction of fine-grained geometric structures. This paper proposes a spatial multi-scale weighted NeRF (MDW-NeRF) for accurate and efficient surface reconstruction of vessel hulls. The proposed method develops a novel multi-scale feature decomposition mechanism that models 3D space by leveraging multi-resolution features, facilitating the integration of high-resolution details with low-resolution regional information. We designed separate color and density weighting, using a coarse-to-fine strategy, for density and a weighted matrix for color to decouple feature vectors from appearance attributes. To boost the efficiency of 3D reconstruction and rendering, we implement a hybrid sampling point strategy for volume rendering, selecting sample points based on volumetric density. Extensive experiments on the SVH dataset confirm MDW-NeRF’s superiority: quantitatively, it outperforms TensoRF by 1.5 dB in PSNR and 6.1% in CD, and shrinks the model size by 9%, with comparable training times; qualitatively, it resolves tensor-based methods’ inherent spatial incoherence and fine-grained gaps, enabling accurate restoration of hull cavities and realistic surface texture rendering. These results validate our method’s effectiveness in achieving excellent rendering quality, high reconstruction accuracy, and timeliness.
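The density-based sample selection mentioned above rests on the standard NeRF volume-rendering weights w_i = T_i (1 - exp(-sigma_i * delta_i)); a minimal NumPy sketch of that computation along one ray (the density values are made up for illustration):

```python
import numpy as np

def render_weights(sigmas, deltas):
    """Per-sample volume-rendering weights along one ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # Accumulated transmittance T_i up to (but excluding) each sample.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    return trans * alphas

sigmas = np.array([0.0, 0.5, 3.0, 0.1])       # hypothetical densities
weights = render_weights(sigmas, np.full(4, 0.25))
# The densest sample dominates the weights, so a hybrid strategy can
# concentrate additional samples where the weights are largest.
```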

28 pages, 8014 KB  
Article
YOLO-UMS: Multi-Scale Feature Fusion Based on YOLO Detector for PCB Surface Defect Detection
by Hong Peng, Wenjie Yang and Baocai Yu
Sensors 2026, 26(2), 689; https://doi.org/10.3390/s26020689 - 20 Jan 2026
Abstract
Printed circuit boards (PCBs) are critical in the electronics industry. As PCB layouts grow increasingly complex, defect detection processes often encounter challenges such as low image contrast, uneven brightness, minute defect sizes, and irregular shapes, making it difficult to achieve rapid and accurate automated inspection. To address these challenges, this paper proposes a novel object detector, YOLO-UMS, designed to enhance the accuracy and speed of PCB surface defect detection. First, a lightweight plug-and-play Unified Multi-Scale Feature Fusion Pyramid Network (UMSFPN) is proposed to process and fuse multi-scale information across different resolution layers. The UMSFPN uses a Cross-Stage Partial Multi-Scale Module (CSPMS) and an optimized fusion strategy. This approach balances the integration of fine-grained edge information from shallow layers and coarse-grained semantic details from deep layers. Second, the paper introduces a lightweight RG-ELAN module, based on the ELAN network, to enhance feature extraction for small targets in complex scenes. The RG-ELAN module uses low-cost operations to generate redundant feature maps and reduce computational complexity. Finally, the Adaptive Interaction Feature Integration (AIFI) module enriches high-level features by eliminating redundant interactions among shallow-layer features. The channel-priority convolutional attention module (CPCA), deployed in the detection head, strengthens the expressive power of small target features. The experimental results show that the new UMSFPN neck can help improve the AP50 by 3.1% and AP by 2% on the self-collected dataset PCB-M, which is better than the original PAFPN neck. Meanwhile, UMSFPN achieves excellent results across different detectors and datasets, verifying its broad applicability. Without pre-training weights, YOLO-UMS achieves an 84% AP50 on the PCB-M dataset, which is a 6.4% improvement over the baseline YOLO11. Comparing results with existing target detection algorithms shows that the algorithm exhibits good performance in terms of detection accuracy. It provides a feasible solution for efficient and accurate detection of PCB surface defects in the industry.
(This article belongs to the Section Physical Sensors)
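Lightweight modules of the kind described here (and in several of the other listed papers) typically replace standard convolutions with factorised ones; the generic parameter arithmetic behind that choice, with hypothetical layer sizes rather than the paper's actual ones:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dws_conv_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Hypothetical 3x3 layer with 256 input and 256 output channels:
full = conv_params(256, 256, 3)       # 589,824 parameters
light = dws_conv_params(256, 256, 3)  # 67,840 parameters
# The factorised form needs roughly one ninth of the parameters here.
```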

16 pages, 328 KB  
Article
SemanticHPC: Semantics-Aware, Hardware-Conscious Workflows for Distributed AI Training on HPC Architectures
by Alba Amato
Information 2026, 17(1), 78; https://doi.org/10.3390/info17010078 - 12 Jan 2026
Abstract
High-Performance Computing (HPC) has become essential for training medium- and large-scale Artificial Intelligence (AI) models, yet two bottlenecks remain under-exploited: the semantic coherence of training data and the interaction between distributed deep learning runtimes and heterogeneous HPC architectures. Existing work tends to optimise multi-node, multi-GPU training in isolation from data semantics or to apply semantic technologies to data curation without considering the constraints of large-scale training on modern clusters. This paper introduces SemanticHPC, an experimental framework that integrates ontology and Resource Description Framework (RDF)-based semantic preprocessing with distributed AI training (Horovod/PyTorch Distributed Data Parallel) and hardware-aware optimisations for Non-Uniform Memory Access (NUMA), multi-GPU and high-speed interconnects. The framework has been evaluated on 1–8 node configurations (4–32 GPUs) on a production-grade cluster. Experiments on a medium-size Open Images V7 workload show that semantic enrichment improves validation accuracy by 3.5–4.4 absolute percentage points while keeping the additional end-to-end overhead below 8% and preserving strong scaling efficiency above 79% on eight nodes. We argue that bringing semantic technologies into the training workflow—rather than treating them as an offline, detached phase—is a promising direction for large-scale AI on HPC systems. We detail an implementation based on standard Python libraries, RDF tooling and widely adopted deep learning runtimes, and we discuss the limitations and practical hurdles that need to be addressed for broader adoption.
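The strong-scaling efficiency quoted above follows the usual definition E(N) = T(1) / (N * T(N)); for example (the timings below are hypothetical, not the paper's measurements):

```python
def strong_scaling_efficiency(t_base, t_n, n_nodes):
    """Speedup relative to one node, divided by the node count."""
    return t_base / (n_nodes * t_n)

# A job that takes 800 min on 1 node and 126 min on 8 nodes:
eff = strong_scaling_efficiency(800.0, 126.0, 8)  # about 0.794, i.e. ~79%
```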

19 pages, 2314 KB  
Article
Occlusion Avoidance for Harvesting Robots: A Lightweight Active Perception Model
by Tao Zhang, Jiaxi Huang, Jinxing Niu, Zhengyi Liu, Le Zhang and Huan Song
Sensors 2026, 26(1), 291; https://doi.org/10.3390/s26010291 - 2 Jan 2026
Abstract
Addressing the issue of fruit recognition and localization failures in harvesting robots due to severe occlusion by branches and leaves in complex orchard environments, this paper proposes an occlusion avoidance method that combines a lightweight YOLOv8n model, developed by Ultralytics in the United States, with active perception. Firstly, to meet the stringent real-time requirements of the active perception system, a lightweight YOLOv8n model was developed. This model reduces computational redundancy by incorporating the C2f-FasterBlock module and enhances key feature representation by integrating the SE attention mechanism, significantly improving inference speed while maintaining high detection accuracy. Secondly, an end-to-end active perception model based on ResNet50 and multi-modal fusion was designed. This model can intelligently predict the optimal movement direction for the robotic arm based on the current observation image, actively avoiding occlusions to obtain a more complete field of view. The model was trained using a matrix dataset constructed through the robot’s dynamic exploration in real-world scenarios, achieving a direct mapping from visual perception to motion planning. Experimental results demonstrate that the proposed lightweight YOLOv8n model achieves a mAP of 0.885 in apple detection tasks, a frame rate of 83 FPS, a parameter count reduced to 1,983,068, and a model weight file size reduced to 4.3 MB, significantly outperforming the baseline model. In active perception experiments, the proposed method effectively guided the robotic arm to quickly find observation positions with minimal occlusion, substantially improving the success rate of target recognition and the overall operational efficiency of the system. The current research outcomes provide preliminary technical validation and a feasible exploratory pathway for developing agricultural harvesting robot systems suitable for real-world complex environments. It should be noted that the validation of this study was primarily conducted in controlled environments. Subsequent work still requires large-scale testing in diverse real-world orchard scenarios, as well as further system optimization and performance evaluation in more realistic application settings, which include natural lighting variations, complex weather conditions, and actual occlusion patterns.

25 pages, 25629 KB  
Article
DSEPGAN: A Dual-Stream Enhanced Pyramid Based on Generative Adversarial Network for Spatiotemporal Image Fusion
by Dandan Zhou, Lina Xu, Ke Wu, Huize Liu and Mengting Jiang
Remote Sens. 2025, 17(24), 4050; https://doi.org/10.3390/rs17244050 - 17 Dec 2025
Abstract
Many deep learning-based spatiotemporal fusion (STF) methods have been proven to achieve high accuracy and robustness. Due to the variable shapes and sizes of objects in remote sensing images, pyramid networks are generally introduced to extract multi-scale features. However, the down-sampling operation in the pyramid structure may lead to the loss of image detail information, affecting the model’s ability to reconstruct fine-grained targets. To address this issue, we propose a novel Dual-Stream Enhanced Pyramid based on Generative Adversarial Network (DSEPGAN) for the spatiotemporal fusion of remote sensing images. The network adopts a dual-stream architecture to separately process coarse and fine images, tailoring feature extraction to their respective characteristics: coarse images provide temporal dynamics, while fine images contain rich spatial details. A reversible feature transformation is embedded in the pyramid feature extraction stage to preserve high-frequency information, and a fusion module employing large-kernel and depthwise separable convolutions captures long-range dependencies across inputs. To further enhance realism and detail fidelity, adversarial training encourages the network to generate sharper and more visually convincing fusion results. The proposed DSEPGAN is compared with widely used and state-of-the-art STF models in three publicly available datasets. The results illustrate that DSEPGAN achieves superior performance across various evaluation metrics, highlighting its notable advantages for predicting seasonal variations in highly heterogeneous regions and abrupt changes in land use.

26 pages, 1838 KB  
Article
Artificial Intelligence in Honey Pollen Analysis: Accuracy and Limitations of Pollen Classification Compared with Palynological Expert Assessment
by Joanna Katarzyna Banach, Bartosz Lewandowski and Przemysław Rujna
Appl. Sci. 2025, 15(24), 13009; https://doi.org/10.3390/app152413009 - 10 Dec 2025
Abstract
Honey authenticity, including its botanical origin, is traditionally assessed by melissopalynology, a labour-intensive and expert-dependent method. This study reports the final validation of a deep learning model for pollen grain classification in honey, developed within the NUTRITECH.I-004A/22 project, by comparing its performance with that of an independent palynology expert. A dataset of 5194 pollen images was acquired from five unifloral honeys, rapeseed (Brassica napus), sunflower (Helianthus annuus), buckwheat (Fagopyrum esculentum), phacelia (Phacelia tanacetifolia) and linden (Tilia cordata), under a standardized microscopy protocol and manually annotated using an extended set of morphological descriptors (shape, size, apertures, exine ornamentation and wall thickness). The evaluation involved training and assessing a deep learning model based solely on the ResNet152 architecture with pretrained ImageNet weights. This model was enhanced by adding additional layers: a global average pooling layer, a dense hidden layer with ReLU activation, and a final softmax output layer for multi-class classification. Model performance was assessed using multiclass metrics and agreement with the expert, including Cohen’s kappa. The AI classifier achieved almost perfect agreement with the expert (κ ≈ 0.94), with the highest accuracy for pollen grains exhibiting spiny ornamentation and clearly thin or thick walls, and lower performance for reticulate exine and intermediate wall thickness. Misclassifications were associated with suboptimal image quality and intermediate confidence scores. Compared with traditional melissopalynological assessment (approx. 1–2 h of microscopic analysis per sample), the AI system reduced the effective classification time to less than 2 min per prepared sample under routine laboratory conditions, demonstrating a clear gain in analytical throughput. The results demonstrate that, under routine laboratory conditions, AI-based digital palynology can reliably support expert assessment, provided that imaging is standardized and prediction confidence is incorporated into decision rules for ambiguous cases.
(This article belongs to the Section Food Science and Technology)
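Cohen's kappa, the agreement statistic used above, can be computed directly from the two raters' label sequences; a self-contained sketch (the example labels are invented, not from the study):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters on the same items."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

expert = ["rape", "rape", "linden", "buckwheat", "linden", "phacelia"]
model  = ["rape", "rape", "linden", "buckwheat", "phacelia", "phacelia"]
kappa = cohens_kappa(expert, model)
```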

20 pages, 5222 KB  
Article
A Real-Time Tractor Recognition and Positioning Method in Fields Based on Machine Vision
by Liang Wang, Dashuang Zhou and Zhongxiang Zhu
Agriculture 2025, 15(24), 2548; https://doi.org/10.3390/agriculture15242548 - 9 Dec 2025
Abstract
Multi-machine collaborative navigation in agricultural machinery can significantly improve field operation efficiency. Most existing multi-machine collaborative navigation systems rely on satellite navigation systems, which are costly and cannot meet the obstacle avoidance needs of field operations. In this paper, a real-time tractor recognition and positioning method in fields based on machine vision was proposed. First, we collected tractor images, annotated them, and constructed a tractor dataset. Second, we implemented lightweight improvements to the YOLOv4 algorithm, incorporating sparse training, channel pruning, layer pruning, and knowledge distillation fine-tuning based on the baseline model training. The test results of the lightweight model show that the model size was reduced by 98.73%, the recognition speed increased by 43.74%, and the recognition accuracy remains largely comparable to that of the baseline high-precision model. Then, we proposed a tractor positioning method based on an RGB-D camera. Finally, we established a field vehicle recognition and positioning experimental platform and designed a test plan. The results indicate that when IYO-RGBD recognized and positioned the leader tractor within a 10 m range, the root mean square (RMS) values of longitudinal and lateral errors during straight-line travel were 0.0687 m and 0.025 m, respectively. During S-curve travel, the RMS values of longitudinal and lateral errors were 0.1101 m and 0.0481 m, respectively. IYO-RGBD can meet the accuracy requirements for recognizing and positioning the leader tractor by the follower tractor in practical autonomous following field operations. Our research outcomes can provide a new solution and certain technical references for visual navigation in multi-machine collaborative field operations of agricultural machinery.
(This article belongs to the Section Agricultural Technology)
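The RMS figures quoted above are the usual root-mean-square of per-sample positioning errors; for reference (the error samples below are invented):

```python
import math

def rms(errors):
    """Root mean square of a sequence of errors (same unit as the input)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical lateral errors in metres along a straight-line run:
lateral_err_m = [0.02, -0.03, 0.01, -0.02]
lateral_rms = rms(lateral_err_m)
```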

19 pages, 2272 KB  
Article
Enhancing PRRT Outcome Prediction in Neuroendocrine Tumors: Aggregated Multi-Lesion PET Radiomics Incorporating Inter-Tumor Heterogeneity
by Maziar Sabouri, Ghasem Hajianfar, Omid Gharibi, Alireza Rafiei Sardouei, Yusuf Menda, Ayca Dundar, Camila Gadens Zamboni, Sanchay Jain, Marc Kruzer, Habib Zaidi, Fereshteh Yousefirizi, Arman Rahmim and Ahmad Shariftabrizi
Cancers 2025, 17(23), 3887; https://doi.org/10.3390/cancers17233887 - 4 Dec 2025
Abstract
Introduction: Peptide Receptor Radionuclide Therapy (PRRT) with [177Lu]Lu-DOTA-TATE is effective in treating advanced Neuroendocrine Tumors (NETs), yet predicting individual response in this treatment remains a challenge due to inter-lesion heterogeneity. There is a lack of standardized, effective methods for using multi-lesion radiomics to predict progression and Time to Progression (TTP) in PRRT-treated patients. This study evaluated how aggregating radiomic features from multiple PET-identified lesions can be used to predict disease progression (event [progression and death] vs. event-free) and TTP. Methods: Eighty-one NETs patients with multiple lesions underwent pre-treatment PET/CT imaging. Lesions were segmented and ranked by minimum Standard Uptake Value (SUVmin) (both descending and ascending), SUVmean, SUVmax, and volume (descending). From each sorting, the top one, three, and five lesions were selected. For the selected lesions, radiomic features were extracted (using the Pyradiomics library) and lesion aggregation was performed using stacked vs. statistical methods. Eight classification models along with three feature selection methods were used to predict progression, and five survival models and three feature selection methods were used to predict TTP under a nested cross-validation framework. Results: The overall appraisal showed that sorting lesions based on SUVmin (descending) yields better classification performance in progression prediction, while aggregating features extracted from all the lesions, as well as from the top five lesions sorted by SUVmean, leads to the highest overall performance in TTP prediction. In the individual appraisal of progression prediction, models trained on the single top lesion sorted by SUVmin (descending) showed the highest recall and specificity despite data imbalance. The best-performing model was the Logistic Regression (LR) classifier with Recursive Feature Elimination (RFE) (recall: 0.75, specificity: 0.77). In TTP prediction, the highest concordance index was obtained using a Random Survival Forest (RSF) trained on statistically aggregated features from the top five lesions ranked by SUVmean, selected via Univariate C-Index (UCI) (C-index = 0.68). Across both tasks, features from the Gray Level Size Zone Matrix (GLSZM) family were consistently among the most predictive, highlighting the importance of spatial heterogeneity in treatment response. Conclusions: This study demonstrates that informed lesion selection and tailored aggregation strategies significantly impact the predictive performance of radiomics-based models for progression and TTP prediction in PRRT-treated NET patients. These approaches can potentially enhance model accuracy and better capture tumor heterogeneity, supporting more personalized and practical PRRT implementation.
(This article belongs to the Section Methods and Technologies Development)
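The concordance index reported for the TTP models counts correctly ordered comparable pairs; a toy sketch that ignores censoring (a real survival C-index, as used with survival models, must also handle censored observations, which this simplified version does not):

```python
from itertools import combinations

def c_index(times, scores):
    """Fraction of comparable pairs where the higher risk score goes with
    the shorter time-to-event; tied scores count as half-concordant."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are not comparable in this toy version
        comparable += 1
        shorter = i if times[i] < times[j] else j
        longer = j if shorter == i else i
        if scores[shorter] > scores[longer]:
            concordant += 1.0
        elif scores[shorter] == scores[longer]:
            concordant += 0.5
    return concordant / comparable
```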

21 pages, 21928 KB  
Article
HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition
by Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li and Shan Yin
Agriculture 2025, 15(23), 2518; https://doi.org/10.3390/agriculture15232518 - 4 Dec 2025
Cited by 1
Abstract
Automated pollen recognition is a foundational tool for diverse scientific domains, including paleoclimatology, biodiversity monitoring, and agricultural science. However, conventional methods create a critical data bottleneck, limiting the temporal and spatial resolution of ecological analysis. Existing deep learning models often fail to achieve the requisite localization accuracy for microscopic pollen grains, which are characterized by their minute size, indistinct edges, and complex backgrounds. To overcome this, we introduce HieraEdgeNet, a novel object detection framework. The core principle of our architecture is to explicitly extract and hierarchically fuse multi-scale edge information with deep semantic features. This synergistic approach, combined with a computationally efficient large-kernel operator for fine-grained feature refinement, significantly enhances the model’s ability to perceive and precisely delineate object boundaries. On a large-scale dataset comprising 44,471 annotated microscopic images containing 342,706 pollen grains from 120 classes, HieraEdgeNet achieves a mean Average Precision of 0.9501 (mAP@0.5) and 0.8444 (mAP@0.5:0.95), substantially outperforming state-of-the-art models such as YOLOv12n and the Transformer-based RT-DETR family in terms of the accuracy–efficiency trade-off. This work provides a powerful computational tool for generating the high-throughput, high-fidelity data essential for modern ecological research, including tracking phenological shifts, assessing plant biodiversity, and reconstructing paleoenvironments. At the same time, we acknowledge that the current two-dimensional design cannot directly exploit volumetric Z-stack microscopy and that strong domain shifts between training data and real-world deployments may still degrade performance, which we identify as key directions for future work. By also enabling applications in precision agriculture, HieraEdgeNet contributes broadly to advancing ecosystem monitoring and sustainable food security.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

29 pages, 39944 KB  
Article
HDR-IRSTD: Detection-Driven HDR Infrared Image Enhancement and Small Target Detection Based on HDR Infrared Image Enhancement
by Fugui Guo, Pan Chen, Weiwei Zhao and Weichao Wang
Automation 2025, 6(4), 86; https://doi.org/10.3390/automation6040086 - 2 Dec 2025
Viewed by 549
Abstract
Infrared small target detection has become a research hotspot in recent years. Due to the small target size and low contrast with the background, it remains a highly challenging task. Existing infrared small target detection algorithms are generally implemented on 8-bit low dynamic [...] Read more.
Infrared small target detection has become a research hotspot in recent years. Due to the small target size and low contrast with the background, it remains a highly challenging task. Existing infrared small target detection algorithms are generally implemented on 8-bit low dynamic range (LDR) images, whereas raw infrared sensing images typically possess a 14–16 bit high dynamic range (HDR). Conventional HDR image enhancement methods do not consider the subsequent detection task. As a result, the enhanced LDR images often suffer from overexposure, noise amplified by contrast stretching, and target distortion or loss. Consequently, discriminative features in HDR images that are beneficial for detection are not effectively exploited, which further increases the difficulty of small target detection. To extract target features under these conditions, existing detection algorithms usually rely on models with large parameter counts, leading to an unsatisfactory trade-off between efficiency and accuracy. To address these issues, this paper proposes a novel infrared small target detection framework based on HDR image enhancement (HDR-IRSTD). Specifically, a multi-branch feature extraction and fusion mapping subnetwork (MFEF-Net) is designed to achieve the mapping from HDR to LDR. This subnetwork effectively enhances small targets and suppresses noise while preserving both detailed features and global information. Furthermore, considering the characteristics of infrared small targets, an asymmetric Vision Mamba U-Net with multi-level inputs (AVM-Unet) is developed, which captures contextual information effectively while maintaining linear computational complexity. During training, a bilevel optimization strategy is adopted to collaboratively optimize the two subnetworks, thereby yielding optimal parameters for both HDR infrared image enhancement and small target detection. Experimental results demonstrate that the proposed method achieves visually favorable enhancement and high-precision detection, with strong generalization ability and robustness, while striking a well-balanced trade-off between performance and efficiency. Full article
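The HDR-to-LDR mapping that MFEF-Net learns replaces conventional, non-learned enhancement. As a point of reference, a minimal percentile-based tone-mapping baseline of the kind such methods improve upon can be sketched as follows; the percentile and gamma values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def tonemap_percentile(hdr, low_pct=1.0, high_pct=99.5, gamma=0.7):
    """Map a 14-16 bit HDR infrared frame to an 8-bit LDR image.

    A conventional (non-learned) baseline: clip to robust percentiles,
    normalize to [0, 1], apply gamma, and quantize to uint8.
    """
    hdr = hdr.astype(np.float64)
    lo, hi = np.percentile(hdr, [low_pct, high_pct])
    ldr = np.clip((hdr - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    ldr = ldr ** gamma  # brightens dim regions but can also amplify noise
    return (ldr * 255.0 + 0.5).astype(np.uint8)

# Synthetic 14-bit-range frame with a dim small target on a noisy background
rng = np.random.default_rng(0)
frame = rng.normal(8000, 300, size=(64, 64))
frame[30:33, 30:33] += 1200  # small bright target
ldr = tonemap_percentile(frame)
print(ldr.dtype, ldr.min(), ldr.max())
```

The clipping step illustrates exactly the failure mode the abstract describes: bright targets near the upper percentile saturate, and gamma stretching raises background noise along with contrast.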

19 pages, 7269 KB  
Article
Fully-Cascaded Spatial-Aware Convolutional Network for Motion Deblurring
by Yinghan Hong, Bishenghui Tao, Qian Wang, Guizhen Mai and Cai Guo
Information 2025, 16(12), 1055; https://doi.org/10.3390/info16121055 - 2 Dec 2025
Viewed by 334
Abstract
Motion deblurring is an ill-posed, challenging problem in image restoration due to non-uniform motion blurs. Although recent deep convolutional neural networks have made significant progress, many existing methods adopt multi-scale or multi-patch subnetworks that involve additional inter-subnetwork processing (e.g., feature alignment and fusion) across different scales or patches, leading to substantial computational cost. In this paper, we propose a novel fully-cascaded spatial-aware convolutional network (FSCNet) that effectively restores sharp images from blurry inputs while maintaining a favorable balance between restoration quality and computational efficiency. The proposed architecture consists of simple yet effective subnetworks connected through a fully-cascaded feature fusion (FCFF) module, enabling the exploitation of diverse and complementary features generated at each stage. In addition, we design a lightweight spatial-aware block (SAB), whose core component is a channel-weighted spatial attention (CWSA) module. The SAB is integrated into both the FCFF module and skip connections, enhancing feature fusion by enriching spatial detail representation. On the GoPro dataset, FSCNet achieves 33.01 dB PSNR and 0.962 SSIM, delivering comparable or higher accuracy than state-of-the-art methods such as HINet, while reducing model size by nearly 80%. Furthermore, when the GoPro-trained model is evaluated on three additional benchmark datasets (HIDE, REDS, and RealBlur), FSCNet attains the highest average PSNR (29.53 dB) and SSIM (0.903) among all compared methods. This consistent cross-dataset superiority highlights FSCNet’s strong generalization and robustness under diverse blur conditions, confirming that it achieves state-of-the-art performance with a favorable performance–complexity trade-off. Full article
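The abstract does not give the exact formulation of the channel-weighted spatial attention (CWSA) module; a minimal NumPy sketch of one plausible reading, in which softmax channel weights derived from global average pooling gate a sigmoid spatial map, might look like this (the function name and every design choice below are assumptions, not the paper's implementation):

```python
import numpy as np

def channel_weighted_spatial_attention(x):
    """Hypothetical sketch of a channel-weighted spatial attention map.

    x: feature tensor of shape (C, H, W). Channel weights come from global
    average pooling; the spatial map is a channel-weighted mean passed
    through a sigmoid, then broadcast back over all channels.
    """
    gap = x.mean(axis=(1, 2))                        # (C,) global average pool
    weights = np.exp(gap) / np.exp(gap).sum()        # softmax channel weights
    spatial = np.tensordot(weights, x, axes=(0, 0))  # (H, W) weighted mean
    attn = 1.0 / (1.0 + np.exp(-spatial))            # sigmoid gate in (0, 1)
    return x * attn[None, :, :]                      # reweight every channel

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 4, 4))
out = channel_weighted_spatial_attention(feat)
print(out.shape)
```

Because the gate lies in (0, 1), the module can only attenuate features, which makes it cheap to drop into skip connections, consistent with the lightweight design the abstract emphasizes.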

24 pages, 3197 KB  
Article
MCP-YOLO: A Pruned Edge-Aware Detection Framework for Real-Time Insulator Defect Inspection via UAV
by Hongbin Sun, Shijun Guo, Xin Pan, Qiuchen Shen, Yaqi Xu, Jianchuan Ma and Zhanpeng Qu
Sensors 2025, 25(22), 7049; https://doi.org/10.3390/s25227049 - 18 Nov 2025
Viewed by 536
Abstract
Unmanned Aerial Vehicle (UAV)-based inspection of transmission line insulators faces significant challenges due to complex backgrounds, variable imaging conditions, and diverse defect characteristics. Existing deep learning approaches often fail to balance detection accuracy with computational efficiency for edge deployment. This paper presents MCP-YOLO (Multi-scale Complex-background Pruned YOLO), a lightweight yet accurate detection framework specifically designed for real-time insulator defect identification. The proposed framework introduces three key innovations: (1) MS-EdgeNet module that enhances multi-granularity edge features through grouped convolution, improving detection robustness in cluttered environments; (2) Dynamic Feature Pyramid Network (DyFPN) that combines dynamic upsampling with re-parameterized multi-branch architecture, enabling effective multi-scale defect detection; (3) Auxiliary detection head that provides additional supervision during training while maintaining inference efficiency. Furthermore, Group SLIM pruning is employed to achieve model compression without sacrificing accuracy. Extensive experiments on a real-world dataset of 3091 UAV-captured images demonstrate that MCP-YOLO achieves 92.1% mAP@0.5, 90.5% precision, and 89.0% recall, while maintaining only 8.65 M parameters. Compared to state-of-the-art detectors, the proposed method achieves superior detection performance with significantly reduced computational overhead, reaching 250 FPS inference speed. The model size reduction of 37.3% from the baseline, coupled with enhanced detection capabilities, validates MCP-YOLO’s suitability for practical deployment in automated power grid inspection systems. Full article
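The abstract does not detail the Group SLIM criterion; slimming-style pruning typically ranks channels by the magnitude of their BatchNorm scale factors and removes the smallest. A minimal sketch of that selection step under that assumption (the 37.3% ratio mirrors the reported model-size reduction; the function name is hypothetical):

```python
import numpy as np

def select_channels_to_prune(bn_gammas, prune_ratio=0.373):
    """Magnitude-based channel selection in the spirit of network slimming.

    bn_gammas: BatchNorm scale factors, one per channel. Channels with the
    smallest |gamma| are marked for removal until the ratio is met.
    """
    gammas = np.abs(np.asarray(bn_gammas, dtype=np.float64))
    n_prune = int(round(len(gammas) * prune_ratio))
    order = np.argsort(gammas)             # ascending by importance
    pruned = np.zeros(len(gammas), dtype=bool)
    pruned[order[:n_prune]] = True
    return pruned

gammas = [0.9, 0.02, 0.5, 0.01, 0.7, 0.03, 0.8, 0.6]
mask = select_channels_to_prune(gammas, prune_ratio=0.375)
print(mask)  # the three smallest-|gamma| channels are marked
```

After selection, the corresponding convolution filters and their downstream input channels would be physically removed and the network fine-tuned, which is how such methods compress a model without sacrificing accuracy.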
(This article belongs to the Section Vehicular Sensing)

19 pages, 4213 KB  
Article
Decision-Support for Restorative Dentistry: Hybrid Optimization Enhances Detection on Panoramic Radiographs
by Gül Ateş, Fuat Türk, Elif Tuba Akçın and Müjgan Güngör
Healthcare 2025, 13(22), 2904; https://doi.org/10.3390/healthcare13222904 - 14 Nov 2025
Viewed by 461
Abstract
Background/Objectives: Artificial intelligence (AI) has been increasingly used to support radiological assessment in dentistry. We benchmarked machine learning (ML), deep learning (DL), and a hybrid optimization-assisted approach for the automatic five-class image-level classification of dental restorations (filling, implant, root canal treatment, fixed partial denture/bridge, crown) on panoramic radiographs. Methods: We analyzed 353 anonymized panoramic images comprising 2137 labeled restorations, acquired on the same device. Images were cropped and enhanced (histogram equalization and CLAHE), and texture features were extracted with GLCM. A three-stage pipeline was evaluated: (i) GLCM-based features classified by conventional ML and a baseline DL model; (ii) Hybrid Grey Wolf–Particle Swarm Optimization (HGWO-PSO) for feature selection followed by SVM; and (iii) a CNN trained end-to-end on raw images. Performance was assessed with an 80/20 per-patient split and 5-fold cross-validation on the training set. While each panoramic radiograph may contain multiple restorations, in this study we modeled the task as single-label, image-level classification (dominant restoration type) due to pipeline constraints; this choice is discussed as a limitation and motivates multi-label, localization-based approaches in future work. The CNN baseline was implemented in TensorFlow 2.12 (CUDA 11.8/cuDNN 8.9) and trained with Adam (learning rate 1 × 10−4), with a batch size of 32 and up to 50 epochs with early stopping (patience 5); data augmentation included horizontal flips, ±10° rotations, and ±15% brightness variation. A post hoc power analysis (G*Power 3.1; α = 0.05, β = 0.2) confirmed sufficient sample size (n = 353, power > 0.84). Results: The HGWO-PSO + SVM configuration achieved the highest accuracy (73.15%), with macro-precision/recall/F1 = 0.728, outperforming the CNN (68.52% accuracy) and traditional ML models (SVM 67.89%; DT 59.09%; RF 58.33%; K-NN 53.70%). Conclusions: On this single-center dataset, the hybrid optimization-assisted classifier moderately improved detection performance over the baseline CNN and conventional ML. Given the dataset size and class imbalance, the proposed system should be interpreted as a decision-supportive tool to assist dentists rather than a stand-alone diagnostic system. Future work will target larger, multi-center datasets and stronger DL baselines to enhance generalizability and clinical utility. Full article
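The GLCM texture features used in the pipeline's first stage can be sketched directly in NumPy. The quantization level, the single (dx, dy) offset, and the two derived features below (contrast and homogeneity) are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix (GLCM) for one pixel offset, plus two
    common texture features derived from it.

    Pixel intensities are quantized into `levels` bins; co-occurrences are
    counted for the (dx, dy) neighbor and normalized to probabilities.
    """
    img = np.asarray(img)
    q = (img.astype(np.float64) / (img.max() + 1e-8) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)
    glcm = np.zeros((levels, levels), dtype=np.float64)
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    glcm /= glcm.sum()  # normalize to a joint probability distribution
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()
    homogeneity = (glcm / (1.0 + (i - j) ** 2)).sum()
    return contrast, homogeneity

flat = np.full((16, 16), 100)                    # uniform patch
checker = np.indices((16, 16)).sum(0) % 2 * 255  # alternating pixels
print(glcm_features(flat), glcm_features(checker))
```

A uniform patch yields zero contrast and maximal homogeneity, while the checkerboard yields high contrast; vectors of such features (often over several offsets and angles) would then feed the HGWO-PSO feature-selection stage.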
(This article belongs to the Section Artificial Intelligence in Healthcare)
