Search Results (39)

Search Parameters:
Keywords = PointNeXt

15 pages, 3040 KB  
Article
CGA-ViT: Channel-Guided Additive Attention for Efficient Vision Recognition
by Yayue Zhao, Jingli Miao, Zhenping Li, Baiyang Li, Anqi Zhuo and Yingxiao Zhao
Appl. Sci. 2026, 16(4), 1740; https://doi.org/10.3390/app16041740 - 10 Feb 2026
Viewed by 162
Abstract
Vision transformers (ViTs) excel at global context modeling with self-attention. However, standard self-attention leads to quadratic computational complexity, which restricts its practical use in high-resolution or latency-sensitive tasks. Existing methods achieve linear complexity via local window constraints or additive approximations. However, they often compromise long-range dependency modeling. To address this issue, we propose the channel-guided additive attention vision transformer (CGA-ViT), which achieves synergistic optimization of multi-scale feature extraction and efficient global context modeling. First, we propose multi-scale dilated feature embedding (MDFE). By designing multi-scale sampling and spatial feature embedding, we can expand the receptive field and capture fine-grained features simply by adjusting the dilation rate in the early stages; second, we design channel-guided additive attention (CGA), dynamically modulating key vectors using query-derived descriptors, enabling long-range semantic interactions while maintaining linear complexity growth. We adopt a hierarchical structure, and in the shallow layers, we use CGA to carry out local-global interactions and use efficient additive attention in deep layers for global integration. Evaluations on ImageNet-1K show that CGA-ViT achieves 84.0% Top-1 accuracy with 4.7 GFLOPs, outperforming Swin-T (81.3%) and ConvNeXt-T (82.1%) by 2.7 and 1.9 percentage points under comparable computational costs. Ablation experiments verify MDFE and CGA, which together contribute to 65.0% of performance gains, with the rest from token-level supervision. Overall, CGA-ViT effectively balances the intrinsic tradeoff between efficiency and global modeling capability, significantly boosts visual recognition performance without extra computational overhead, and provides an efficient solution for lightweight ViT design. Full article
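The linear-complexity additive-attention idea this abstract builds on can be illustrated compactly. The sketch below is a generic NumPy illustration of additive attention with a pooled global query modulating the keys (in the spirit of efficient additive attention), not the authors' CGA module; all names, shapes, and the scaling choice are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(x, w_q, w_k, scale):
    """Additive attention sketch: instead of the O(n^2) query-key matrix,
    a single pooled global query vector modulates the keys, so the cost
    is linear in sequence length n.  x: (n, d) token features."""
    q = x @ w_q                                  # (n, d) queries
    k = x @ w_k                                  # (n, d) keys
    alpha = softmax((q * scale).sum(-1))         # (n,) per-token weight
    global_q = (alpha[:, None] * q).sum(0)       # (d,) pooled global query
    return k * global_q[None, :]                 # (n, d), O(n) overall
```

In the full model a result like this would be added back to the token features through a residual connection; here only the attention core is shown.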
(This article belongs to the Section Computing and Artificial Intelligence)

39 pages, 10760 KB  
Article
Automated Pollen Classification via Subinstance Recognition: A Comprehensive Comparison of Classical and Deep Learning Architectures
by Karol Struniawski, Aleksandra Machlanska, Agnieszka Marasek-Ciolakowska and Aleksandra Konopka
Appl. Sci. 2026, 16(2), 720; https://doi.org/10.3390/app16020720 - 9 Jan 2026
Viewed by 418
Abstract
Pollen identification is critical for melissopalynology (honey authentication), ecological monitoring, and allergen tracking, yet manual microscopic analysis remains labor-intensive, subjective, and error-prone when multiple grains overlap in realistic samples. Existing automated approaches often fail to address multi-grain scenarios or lack systematic comparison across classical and deep learning paradigms, limiting their practical deployment. This study proposes a subinstance-based classification framework combining YOLOv12n object detection for grain isolation, independent classification via classical machine learning (ML), convolutional neural networks (CNNs), or Vision Transformers (ViTs), and majority voting aggregation. Five classical classifiers with systematic feature selection, three CNN architectures (ResNet50, EfficientNet-B0, ConvNeXt-Tiny), and three ViT variants (ViT-B/16, ViT-B/32, ViT-L/16) are evaluated on four datasets (full images vs. isolated grains; raw vs. CLAHE-preprocessed) for four berry pollen species (Ribes nigrum, Ribes uva-crispa, Lonicera caerulea, and Amelanchier alnifolia). Stratified image-level splits ensure no data leakage, and explainable AI techniques (SHAP, Grad-CAM++, and gradient saliency) validate biological interpretability across all paradigms. Results demonstrate that grain isolation substantially improves classical ML performance (F1 from 0.83 to 0.91 on full images to 0.96–0.99 on isolated grains, +8–13 percentage points), while deep learning excels on both levels (CNNs: F1 = 1.000 on full images with CLAHE; ViTs: F1 = 0.99). At the instance level, all paradigms converge to near-perfect discrimination (F1 ≥ 0.96), indicating sufficient capture of morphological information. Majority voting aggregation provides +3–5% gains for classical methods but only +0.3–4.8% for deep models already near saturation. 
Explainable AI analysis confirms that models rely on biologically meaningful cues: blue channel moments and texture features for classical ML (SHAP), grain boundaries and exine ornamentation for CNNs (Grad-CAM++), and distributed attention across grain structures for ViTs (gradient saliency). Qualitative validation on 211 mixed-pollen images confirms robust generalization to realistic multi-species samples. The proposed framework (YOLOv12n + SVC/ResNet50 + majority voting) is practical for deployment in honey authentication, ecological surveys, and fine-grained biological image analysis. Full article
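The aggregation step described above (classify each isolated grain independently, then vote to get an image-level label) reduces to a few lines. This is a generic majority-vote sketch, not the authors' code; the tie-breaking rule (first-seen class wins) is an assumption.

```python
from collections import Counter

def aggregate_by_voting(grain_predictions):
    """Image-level label from per-grain (subinstance) class predictions
    by simple majority vote; ties are broken by first-seen order, which
    is how Counter.most_common orders equal counts."""
    return Counter(grain_predictions).most_common(1)[0][0]
```

For example, an image whose detected grains were classified as `["ripe", "unripe", "ripe"]` would be labeled `"ripe"`.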
(This article belongs to the Special Issue Latest Research on Computer Vision and Image Processing)

24 pages, 8257 KB  
Article
Multi-Satellite Image Matching and Deep Learning Segmentation for Detection of Daytime Sea Fog Using GK2A AMI and GK2B GOCI-II
by Jonggu Kang, Hiroyuki Miyazaki, Seung Hee Kim, Menas Kafatos, Daesun Kim, Jinsoo Kim and Yangwon Lee
Remote Sens. 2026, 18(1), 34; https://doi.org/10.3390/rs18010034 - 23 Dec 2025
Viewed by 721
Abstract
Traditionally, sea fog detection technologies have relied primarily on in situ observations. However, point-based observations suffer from limitations in extensive monitoring in marine environments due to the scarcity of observation stations and the limited nature of measurement data. Satellites effectively address these issues by covering vast areas and operating across multiple spectral channels, enabling precise detection and monitoring of sea fog. Despite the increasing adoption of deep learning in this field, achieving further improvements in accuracy and reliability necessitates the simultaneous use of multiple satellite datasets rather than relying on a single source. Therefore, this study aims to achieve higher accuracy and reliability in sea fog detection by employing a deep learning-based advanced co-registration technique for multi-satellite image fusion and autotuning-based optimization of State-of-the-Art (SOTA) semantic segmentation models. We utilized data from the Advanced Meteorological Imager (AMI) sensor on the Geostationary Korea Multi-Purpose Satellite 2A (GK2A) and the GOCI-II sensor on the Geostationary Korea Multi-Purpose Satellite 2B (GK2B). Swin Transformer, Mask2Former, and SegNeXt all demonstrated balanced and excellent performance across overall metrics such as IoU and F1-score. Specifically, Swin Transformer achieved an IoU of 77.24 and an F1-score of 87.16. Notably, multi-satellite fusion significantly improved the Recall score compared to the single AMI product, increasing from 88.78 to 92.01, thereby effectively mitigating the omission of disaster information. Ultimately, comparisons with the officially operational GK2A AMI Fog and GK2B GOCI-II Marine Fog (MF) products revealed that our deep learning approach was superior to both existing operational products. Full article
(This article belongs to the Section AI Remote Sensing)

31 pages, 25297 KB  
Article
AET-FRAP—A Periodic Reshape Transformer Framework for Rock Fracture Early Warning Using Acoustic Emission Multi-Parameter Time Series
by Donghui Yang, Zechao Zhang, Zichu Yang, Yongqi Li and Linhuan Jin
Sensors 2025, 25(24), 7580; https://doi.org/10.3390/s25247580 - 13 Dec 2025
Viewed by 453
Abstract
The timely identification of rock fractures is crucial in deep subterranean engineering. However, it remains necessary to identify reliable warning indicators and establish effective warning levels. This study introduces the Acoustic Emission Transformer for FRActure Prediction (AET-FRAP) multi-input time series forecasting framework, which employs acoustic emission feature parameters. First, Empirical Mode Decomposition (EMD) combined with Fast Fourier Transform (FFT) is employed to identify and filter periodicities among diverse indicators and select input channels with enhanced informative value, with the aim of predicting cumulative energy. Thereafter, the one-dimensional sequence is transformed into a two-dimensional tensor based on its predominant period via spectral analysis. This is coupled with InceptionNeXt—an efficient multiscale convolution and amplitude spectrum-weighted aggregate—to enhance pattern identification across various timeframes. A secondary criterion is created based on the prediction sequence, employing cosine similarity and kurtosis to collaboratively identify abrupt changes. This transforms single-point threshold detection into robust sequence behavior pattern identification, indicating clearly quantifiable trigger criteria. AET-FRAP exhibits improvements in accuracy relative to long short-term memory (LSTM) on uniaxial compression test data, with R2 approaching 1 and reductions in Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). It accurately delineates energy accumulation spikes in the pre-fracture period and provides advanced warning. The collaborative thresholds effectively reduce noise-induced false alarms, demonstrating significant stability and engineering significance. Full article
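The secondary warning criterion described above (cosine similarity plus kurtosis jointly flagging abrupt changes in the predicted sequence) can be sketched generically. The thresholds and windowing below are illustrative assumptions, not the paper's calibrated trigger values.

```python
import numpy as np

def cosine_similarity(a, b):
    """Shape agreement between two windows, 1.0 = identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def excess_kurtosis(x):
    """Excess kurtosis; large positive values indicate a heavy-tailed spike."""
    x = np.asarray(x, float)
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

def warning_trigger(pred_window, ref_window, cos_thresh=0.9, kurt_thresh=3.0):
    """Joint criterion sketch: flag an abrupt change only when the
    predicted window both diverges in shape from a reference window
    (low cosine similarity) and contains a spike (high kurtosis)."""
    return (cosine_similarity(pred_window, ref_window) < cos_thresh
            and excess_kurtosis(pred_window) > kurt_thresh)
```

Requiring both conditions is what turns single-point threshold detection into a sequence-behavior test: an isolated noisy sample rarely satisfies both at once.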
(This article belongs to the Section Electronic Sensors)

26 pages, 2632 KB  
Article
CAGM-Seg: A Symmetry-Driven Lightweight Model for Small Object Detection in Multi-Scenario Remote Sensing
by Hao Yao, Yancang Li, Wenzhao Feng, Ji Zhu, Haiming Yan, Shijun Zhang and Hanfei Zhao
Symmetry 2025, 17(12), 2137; https://doi.org/10.3390/sym17122137 - 12 Dec 2025
Cited by 2 | Viewed by 475
Abstract
In order to address challenges in small object recognition for remote sensing imagery—including high model complexity, overfitting with small samples, and insufficient cross-scenario generalization—this study proposes CAGM-Seg, a lightweight recognition model integrating multi-attention mechanisms. The model systematically enhances the U-Net architecture: First, the encoder adopts a pre-trained MobileNetV3-Large as the backbone network, incorporating a coordinate attention mechanism to strengthen spatial localization of small targets. Second, an attention gating module is introduced in skip connections to achieve adaptive fusion of cross-level features. Finally, the decoder fully employs depthwise separable convolutions to significantly reduce model parameters. This design embodies a symmetry-aware philosophy, which is reflected in two aspects: the structural symmetry between the encoder and decoder facilitates multi-scale feature fusion, while the coordinate attention mechanism performs symmetric decomposition of spatial context (i.e., along height and width directions) to enhance the perception of geometrically regular small targets. Regarding training strategy, a hybrid loss function combining Dice Loss and Focal Loss, coupled with the AdamW optimizer, effectively enhances the model's sensitivity to small objects while suppressing overfitting. Experimental results on the Xingtai black and odorous water body identification task demonstrate that CAGM-Seg outperforms comparison models in key metrics including precision (97.85%), recall (98.08%), and intersection-over-union (96.01%). Specifically, its intersection-over-union surpassed SegNeXt by 11.24 percentage points and PIDNet by 8.55 percentage points; its F1 score exceeded SegFormer by 2.51 percentage points. 
Regarding model efficiency, CAGM-Seg features a total of 3.489 million parameters, with 517,000 trainable parameters—approximately 80% fewer than the baseline U-Net—achieving a favorable balance between recognition accuracy and computational efficiency. Further cross-task validation demonstrates the model’s robust cross-scenario adaptability: it achieves 82.77% intersection-over-union and 90.57% F1 score in landslide detection, while maintaining 87.72% precision and 86.48% F1 score in cloud detection. The main contribution of this work is the effective resolution of key challenges in few-shot remote sensing small-object recognition—notably inadequate feature extraction and limited model generalization—via the strategic integration of multi-level attention mechanisms within a lightweight architecture. The resulting model, CAGM-Seg, establishes an innovative technical framework for real-time image interpretation under edge-computing constraints, demonstrating strong potential for practical deployment in environmental monitoring and disaster early warning systems. Full article
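The Dice-plus-Focal hybrid loss named in the training strategy above is a standard combination; the NumPy sketch below shows its usual binary form. The 0.5/0.5 weighting and gamma = 2 are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probabilities: 1 - 2|P∩T| / (|P| + |T|).
    Insensitive to class imbalance, which helps with small objects."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy pixels by (1 - p_t)^gamma,
    focusing the gradient on hard (often small-object) pixels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    p_t = np.where(target == 1, pred, 1.0 - pred)
    return float((-((1.0 - p_t) ** gamma) * np.log(p_t)).mean())

def hybrid_loss(pred, target, w_dice=0.5):
    """Weighted Dice + Focal combination; the equal split is an
    illustrative assumption."""
    return w_dice * dice_loss(pred, target) + (1.0 - w_dice) * focal_loss(pred, target)
```

Dice handles the region-overlap objective while Focal handles the pixel-imbalance objective, which is why the two are commonly summed rather than used alone.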

24 pages, 5044 KB  
Article
Research on Fouling Shellfish on Marine Aquaculture Cages Detection Technology Based on an Improved Symmetric Faster R-CNN Detection Algorithm
by Pengshuai Zhu, Hao Li, Junhua Chen and Chengjun Guo
Symmetry 2025, 17(12), 2107; https://doi.org/10.3390/sym17122107 - 8 Dec 2025
Viewed by 375
Abstract
The development of detection and identification technologies for biofouling organisms on marine aquaculture cages is of paramount importance for automating the cleaning processes carried out by Autonomous Underwater Vehicles (AUVs). The present study proposes a method for detecting fouling shellfish on marine aquaculture cages based on an improved symmetric Faster R-CNN: the original Visual Geometry Group 16-layer (VGG16) network is replaced with a 50-layer Residual Network with Aggregated Transformations (ResNeXt50) backbone incorporating a Convolutional Block Attention Module (CBAM) to enhance feature extraction; the anchor box dimensions are optimized jointly with the Intersection over Union (IoU) threshold to adapt to object scale; and the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm is applied for image enhancement. Experiments demonstrate that the enhanced model attains an average precision of 94.27%, a 10.31% improvement over the original model, while requiring only one-fifth of the original model's weights. At an IoU threshold of 0.5, the model attains a mean average precision (mAP) of 93.14%, surpassing numerous prevalent detection models. Furthermore, training detection models on the image-enhanced dataset yields an average precision 11.72 percentage points higher than training on the original dataset. In summary, the technical approach proposed in this paper enables accurate and efficient detection and identification of fouling shellfish on marine aquaculture cages. Full article
(This article belongs to the Special Issue Computer Vision, Robotics, and Automation Engineering)

17 pages, 4983 KB  
Article
TAGNet: A Tidal Flat-Attentive Graph Network Designed for Airborne Bathymetric LiDAR Point Cloud Classification
by Ahram Song
ISPRS Int. J. Geo-Inf. 2025, 14(12), 466; https://doi.org/10.3390/ijgi14120466 - 28 Nov 2025
Viewed by 472
Abstract
Airborne LiDAR bathymetry (ALB) provides dense three-dimensional point clouds that enable the detailed mapping of tidal flat environments. However, surface classification using these point clouds remains challenging due to residual noise, water surface reflectivity, and subtle class boundaries that persist even after standard preprocessing. To address these challenges, this study introduces Tidal flat-Attentive Graph Network (TAGNet), a graph-based deep learning framework designed to leverage both local geometric relationships and global contextual cues for the point-wise classification of tidal flat surface classes. The model incorporates multi-scale EdgeConv layers for capturing fine-grained neighborhood structures and employs squeeze-and-excitation channel attention to enhance global feature representation. To validate TAGNet’s effectiveness, classification was conducted on ALB point clouds collected from adjacent tidal flat regions, focusing on four major surface classes: exposed flat, sea surface, sea floor, and vegetation. In benchmarking tests against baseline models, including Dynamic Graph Convolutional Neural Network, PointNeXt with Single-Scale Grouping, and PointNet Transformer, TAGNet consistently achieved higher macro F1-scores. Moreover, ablation studies isolating positional encoding, attention mechanisms, and detrended Z-features confirmed their complementary contributions to TAGNet’s performance. Notably, the full TAGNet outperformed all baselines by a substantial margin, particularly when distinguishing closely related classes, such as sea floor and exposed flat. These findings highlight the potential of graph-based architectures specifically designed for ALB data in enhancing the precision of coastal monitoring and habitat mapping. Full article

22 pages, 9577 KB  
Article
YOLOv11-4ConvNeXtV2: Enhancing Persimmon Ripeness Detection Under Visual Challenges
by Bohan Zhang, Zhaoyuan Zhang and Xiaodong Zhang
AI 2025, 6(11), 284; https://doi.org/10.3390/ai6110284 - 1 Nov 2025
Cited by 1 | Viewed by 1071
Abstract
Reliable and efficient detection of persimmons provides the foundation for precise maturity evaluation. Persimmon ripeness detection remains challenging due to small target sizes, frequent occlusion by foliage, and motion- or focus-induced blur that degrades edge information. This study proposes YOLOv11-4ConvNeXtV2, an enhanced detection framework that integrates a ConvNeXtV2 backbone with Fully Convolutional Masked Auto-Encoder (FCMAE) pretraining, Global Response Normalization (GRN), and Single-Head Self-Attention (SHSA) mechanisms. We present a comprehensive persimmon dataset featuring sub-block segmentation that preserves local structural integrity while expanding dataset diversity. The model was trained on 4921 annotated images (original 703 + 6 × 703 augmented) collected under diverse orchard conditions and optimized for 300 epochs using the Adam optimizer with early stopping. Comprehensive experiments demonstrate that YOLOv11-4ConvNeXtV2 achieves 95.9% precision and 83.7% recall, with mAP@0.5 of 88.4% and mAP@0.5:0.95 of 74.8%, outperforming state-of-the-art YOLO variants (YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n) by 3.8–6.3 percentage points in mAP@0.5:0.95. The model demonstrates superior robustness to blur, occlusion, and varying illumination conditions, making it suitable for deployment in challenging maturity detection environments. Full article
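Global Response Normalization (GRN), one of the components integrated above, has a compact published form in ConvNeXtV2: a per-channel global L2 norm, divisively normalized across channels, rescales the features, with learnable affine parameters and a residual connection. A NumPy sketch, with gamma and beta reduced to scalars for illustration:

```python
import numpy as np

def grn(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Global Response Normalization sketch for a single feature map.
    x: (H, W, C).  Gx = per-channel global L2 norm; Nx = Gx / mean(Gx);
    output = gamma * (x * Nx) + beta + x (residual)."""
    gx = np.linalg.norm(x, axis=(0, 1))   # (C,) global channel norms
    nx = gx / (gx.mean() + eps)           # divisive normalization
    return gamma * (x * nx) + beta + x
```

The divisive step increases contrast between channels, suppressing redundant ones, which is the feature-collapse remedy GRN was introduced for.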

12 pages, 1779 KB  
Article
Artificial Intelligence Algorithm Supporting the Diagnosis of Developmental Dysplasia of the Hip: Automated Ultrasound Image Segmentation
by Łukasz Pulik, Paweł Czech, Jadwiga Kaliszewska, Bartłomiej Mulewicz, Maciej Pykosz, Joanna Wiszniewska and Paweł Łęgosz
J. Clin. Med. 2025, 14(17), 6332; https://doi.org/10.3390/jcm14176332 - 8 Sep 2025
Viewed by 1391
Abstract
Background: Developmental dysplasia of the hip (DDH), if not treated, can lead to osteoarthritis and disability. Ultrasound (US) is a primary screening method for the detection of DDH, but its interpretation remains highly operator-dependent. We propose a supervised machine learning (ML) image segmentation model for the automated recognition of anatomical structures in hip US images. Methods: We conducted a retrospective observational analysis based on a dataset of 10,767 hip US images from 311 patients. All images were annotated for eight key structures according to the Graf method and split into training (75.0%), validation (9.5%), and test (15.5%) sets. Model performance was assessed using the Intersection over Union (IoU) and Dice Similarity Coefficient (DSC). Results: The best-performing model was based on the SegNeXt architecture with an MSCAN_L backbone. The model achieved high segmentation accuracy (IoU; DSC) for chondro-osseous border (0.632; 0.774), femoral head (0.916; 0.956), labrum (0.625; 0.769), cartilaginous (0.672; 0.804), and bony roof (0.725; 0.841). The average Euclidean distance for point-based landmarks (bony rim and lower limb) was 4.8 and 4.5 pixels, respectively, and the baseline deflection angle was 1.7 degrees. Conclusions: This ML-based approach demonstrates promising accuracy and may enhance the reliability and accessibility of US-based DDH screening. Future applications could integrate real-time angle measurement and automated classification to support clinical decision-making. Full article
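The paired IoU and DSC figures reported above are linked by the identity DSC = 2·IoU/(1 + IoU); both follow from the same intersection counts. A minimal sketch for binary segmentation masks (generic, not the study's evaluation code):

```python
import numpy as np

def iou_and_dice(pred_mask, gt_mask):
    """IoU and Dice for binary masks:
    IoU = |A∩B| / |A∪B|,  DSC = 2|A∩B| / (|A| + |B|).
    Empty-vs-empty is scored 1.0 by convention here (an assumption)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred.sum() + gt.sum()) if (pred.sum() + gt.sum()) else 1.0
    return float(iou), float(dice)
```

This is why Dice values are always at least as large as the corresponding IoU values, as in the pairs quoted in the abstract.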

23 pages, 4190 KB  
Article
Revealing the Power of Deep Learning in Quality Assessment of Mango and Mangosteen Purée Using NIR Spectral Data
by Pimpen Pornchaloempong, Sneha Sharma, Thitima Phanomsophon, Panmanas Sirisomboon and Ravipat Lapcharoensuk
Horticulturae 2025, 11(9), 1047; https://doi.org/10.3390/horticulturae11091047 - 2 Sep 2025
Viewed by 1562
Abstract
The quality control of fruit purée products such as mango and mangosteen is crucial for maintaining consumer satisfaction and meeting industry standards. Traditional destructive techniques for assessing key quality parameters like the soluble solid content (SSC) and titratable acidity (TA) are labor-intensive and time-consuming, prompting the need for rapid, nondestructive alternatives. This study investigated the use of deep learning (DL) models, including Simple-CNN, AlexNet, EfficientNetB0, MobileNetV2, and ResNeXt, for predicting SSC and TA in mango and mangosteen purée and compared their performance with partial least squares regression (PLSR), a conventional chemometric method. Spectral data were preprocessed and evaluated using 10-fold cross-validation. For mango purée, the Simple-CNN model achieved the highest predictive accuracy for both SSC (coefficient of determination of cross-validation (RCV2) = 0.914, root mean square error of cross-validation (RMSECV) = 0.688, ratio of prediction to deviation of cross-validation (RPDCV) = 3.367) and TA (RCV2 = 0.762, RMSECV = 0.037, RPDCV = 2.864), demonstrating a statistically significant improvement over PLSR. For the mangosteen purée, AlexNet exhibited the best SSC prediction performance (RCV2 = 0.702, RMSECV = 0.471, RPDCV = 1.666), though the RPDCV values (<2.0) indicated limited applicability for precise quantification. TA prediction in mangosteen purée showed low variance in the reference values (standard deviation (SD) = 0.048), which may have restricted model performance. These results highlight the potential of DL for improving NIR-based quality evaluation of fruit purée, while also pointing to the need for further refinement to ensure interpretability, robustness, and practical deployment in industrial quality control. Full article
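The RPD figures quoted above follow the usual NIR-calibration convention: the standard deviation of the reference values divided by the prediction RMSE, with RPD > 2 commonly taken as the threshold for quantitative use. A minimal sketch, assuming the sample (n−1) standard deviation:

```python
import numpy as np

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of the reference values
    over the RMSE of the predictions.  Values below ~2.0 are usually
    read as unsuitable for precise quantification."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return float(np.std(y_true, ddof=1) / rmse)
```

This also explains the TA result for mangosteen: when the reference SD itself is tiny (0.048), even a small RMSE yields a low RPD, capping the apparent model performance.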
(This article belongs to the Section Postharvest Biology, Quality, Safety, and Technology)

25 pages, 39901 KB  
Article
A Novel Adaptive Cuboid Regional Growth Algorithm for Trunk–Branch Segmentation of Point Clouds from Two Fruit Tree Species
by Yuheng Cao, Ning Wang, Bin Wu, Xin Zhang, Yaxiong Wang, Shuting Xu, Man Zhang, Yanlong Miao and Feng Kang
Agriculture 2025, 15(14), 1463; https://doi.org/10.3390/agriculture15141463 - 8 Jul 2025
Cited by 3 | Viewed by 1245
Abstract
Accurate acquisition of the phenotypic information of trunk-shaped fruit trees plays a crucial role in intelligent orchard management, pruning during dormancy, and improving fruit yield and quality. However, the precise segmentation of trunks and branches remains a significant challenge, limiting the accurate measurement of phenotypic parameters and high-precision pruning of branches. To address this issue, a novel adaptive cuboid regional growth segmentation (ACRGS) algorithm is proposed in this study. This method integrates a growth vector that is adaptively adjusted based on the growth trend of branches and a growth cuboid that is dynamically regulated according to branch diameters. Additionally, an innovative reverse growth strategy is introduced to enhance the efficiency of the growth process. Furthermore, the algorithm can automatically and effectively identify the starting and ending points of growth based on the structural characteristics of fruit tree branches, solving the problem of where to start and when to stop. Compared with PointNet++, PointNeXt, and Point Transformer, ACRGS achieved superior performance, with F1-scores of 95.75% and 96.21% and mIoU values of 0.927 and 0.933 for apple and cherry trees, respectively. The results show that the method enables high-precision, efficient trunk–branch segmentation, providing data support for fruit tree phenotypic parameter extraction and pruning. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

25 pages, 8202 KB  
Article
Research on Identification Method of Transformer Windings’ Loose Vibration Spectrum Considering a Multi-Load Current Condition
by Jin Fang, Xudong Deng, Yuancan Xia, Chen Wu, Yuehua Li, Xin Li, Kaixin Chen, Fan Wang and Zhanlong Zhang
Appl. Sci. 2025, 15(12), 6949; https://doi.org/10.3390/app15126949 - 19 Jun 2025
Viewed by 1291
Abstract
During transformer operation, long-term vibration causes the winding to loosen axially. When hit by a short-circuit, the winding deforms to different extents. Thus, identifying early looseness faults in transformer windings is vital for power systems’ stability. To address issues including scarce vibration data across multiple load conditions for transformer winding looseness faults, inadequate extraction of two-dimensional spectrogram features, and the inability to boost recognition accuracy caused by overfitting during fault recognition model training, this study constructed a 10 kV power transformer vibration test platform. It measured the vibration signals on the box surface under various winding looseness conditions and built a time–frequency-domain vibration spectrum library for different load currents. Then, a fault identification model based on vibration spectra and ConvNeXt was constructed, and model verification and analysis were carried out. The results indicate that after training, the fault recognition accuracy of the spectrum containing three load conditions is comparable to that of a single load condition. The average recognition accuracy at six box-surface measuring points reaches 97.9%. Moreover, the ConvNeXt model outperforms the traditional ResNet50 by 1.2%. This new model effectively addresses overfitting and offers strong technical support for detecting different transformer winding looseness faults. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

20 pages, 2788 KB  
Article
Powerful Sample Reduction Techniques for Constructing Effective Point Cloud Object Classification Models
by Chih-Lung Lin, Hai-Wei Yang and Chi-Hung Chuang
Electronics 2025, 14(12), 2439; https://doi.org/10.3390/electronics14122439 - 16 Jun 2025
Cited by 1 | Viewed by 2739
Abstract
Due to the large volume of raw data in 3D point clouds, downsampling techniques are crucial for reducing computational load and memory usage when training 3D point cloud models. This study conducts its experiments on the ModelNet40 dataset. Our proposed method is based on the PointNeXt architecture, an improved version of PointNet++ that significantly enhances performance through optimized training strategies and adjusted receptive fields. During model training, we employ farthest point sampling for downsampling. Specifically, we use an improved attention-based point cloud edge sampling (APES) method in which we compute the density of each point and set the neighborhood size K accordingly, so that feature points are effectively retained during downsampling. Our improved method captures edge points more effectively than the original APES. By adjusting the architecture, our method, combined with farthest point sampling, not only reduced average training time by nearly 15% compared to PointNeXt-S but also improved accuracy from 93.11% to 93.57%. Full article
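Farthest point sampling, the baseline downsampling step this abstract builds on, can be sketched as a greedy max-min selection; the random point cloud and sample sizes below are illustrative, not the paper's training configuration.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest point sampling: repeatedly pick the point
    farthest from the set already chosen. points: (N, 3) array."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(k, dtype=int)
    chosen[0] = rng.integers(n)
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(dist))
        new_d = np.linalg.norm(points - points[chosen[i]], axis=1)
        dist = np.minimum(dist, new_d)
    return points[chosen]

cloud = np.random.default_rng(1).random((1024, 3))
sampled = farthest_point_sampling(cloud, 128)
print(sampled.shape)  # (128, 3)
```

Because each iteration maximizes the minimum distance to the selected set, the sample spreads evenly over the object surface, which is why FPS is the standard downsampler in PointNet++-family models.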
(This article belongs to the Section Computer Science & Engineering)

22 pages, 46829 KB  
Article
Waveshift 2.0: An Improved Physics-Driven Data Augmentation Strategy in Fine-Grained Image Classification
by Gent Imeraj and Hitoshi Iyatomi
Electronics 2025, 14(9), 1735; https://doi.org/10.3390/electronics14091735 - 24 Apr 2025
Cited by 2 | Viewed by 1651
Abstract
This paper presents Waveshift Augmentation 2.0 (WS 2.0), an enhanced version of the previously proposed Waveshift Augmentation (WS 1.0), a novel data augmentation technique inspired by light propagation dynamics in optical systems. While WS 1.0 introduced phase-based wavefront transformations under the assumption of an infinitesimally small aperture, WS 2.0 incorporates an additional aperture-dependent hyperparameter that models real-world optical attenuation. This refinement enables broader frequency modulation and greater diversity in image transformations while preserving compatibility with well-established data augmentation pipelines such as CLAHE, AugMix, and RandAugment. Evaluated across a wide range of tasks, including medical imaging, fine-grained object recognition, and grayscale image classification, WS 2.0 consistently outperformed both WS 1.0 and standard geometric augmentation. Notably, when benchmarked against geometric augmentation alone, it achieved average macro-F1 improvements of +1.48 (EfficientNetV2), +0.65 (ConvNeXt), and +0.73 (Swin Transformer), with gains of up to +9.32 points in medical datasets. These results demonstrate that WS 2.0 advances physics-based augmentation by enhancing generalization without sacrificing modularity or preprocessing efficiency, offering a scalable and realistic augmentation strategy for complex imaging domains. Full article
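The phase-based wavefront transformation this abstract describes can be illustrated with a standard angular-spectrum free-space propagation filter applied in the Fourier domain; the wavelength, pixel pitch, and propagation distance below are illustrative assumptions, and this sketch is a generic optics example, not the WS 2.0 implementation.

```python
import numpy as np

def wavefront_shift(img, z=0.02, wavelength=550e-9, pixel=5e-6):
    """Apply an angular-spectrum propagation phase in the Fourier
    domain and return the normalized intensity image. Parameters are
    illustrative, not the paper's hyperparameters."""
    h, w = img.shape
    fx = np.fft.fftfreq(w, d=pixel)
    fy = np.fft.fftfreq(h, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    # Propagating components get a pure phase; evanescent ones (arg < 0)
    # are suppressed, which models optical attenuation.
    H = np.where(arg > 0,
                 np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0))),
                 0)
    field = np.fft.ifft2(np.fft.fft2(img.astype(np.complex128)) * H)
    out = np.abs(field) ** 2
    return out / out.max()

img = np.random.default_rng(0).random((64, 64))
aug = wavefront_shift(img)
print(aug.shape)  # (64, 64)
```

Varying `z` yields a family of physically plausible variants of the same image, which is the general idea behind treating light propagation as a data augmentation.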
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

28 pages, 12803 KB  
Article
Spatiotemporal Trends and Zoning Geospatial Assessment in China’s Offshore Mariculture (2018–2022)
by Zewen Mo, Yulin Chen, Xuan Zhang, Zhipan Wang and Qingling Zhang
Remote Sens. 2025, 17(7), 1227; https://doi.org/10.3390/rs17071227 - 30 Mar 2025
Cited by 2 | Viewed by 951
Abstract
Offshore mariculture is a critical component of China’s aquaculture sector, but its rapid expansion presents significant challenges to sustainable marine resource management. This study utilizes high-resolution remote sensing data (2017–2023) and advanced ConvNeXt V2 algorithms to quantitatively analyze the spatiotemporal dynamics of offshore mariculture and explore its spatial distribution in relation to marine functional zoning policies. Through a detailed classification of six mariculture types, this study reveals significant spatial shifts, with China’s offshore mariculture transitioning from a model characterized by a “coastal, concentrated layout” to a new paradigm of “deep-sea and far-sea expansion, multi-point distribution”. Notably, the area of deep-sea and far-sea mariculture increased by 41.8% in regions with water depths of 50 m or more from 2018 to 2022. However, in 2022, the actual mariculture area accounted for only 0.608% of the designated functional zones, while 61.79% of mariculture activities occurred outside these planned zones, indicating a considerable spatial mismatch between mariculture practices and zoning plans. This study underscores the urgent need to optimize spatial planning and regulatory frameworks to balance economic growth with environmental sustainability, offering novel insights and actionable recommendations for the coordinated development of China’s marine economy. Full article
