Search Results (266)

Search Parameters:
Keywords = segmentation head network

16 pages, 4888 KB  
Article
PGSUNet: A Phenology-Guided Deep Network for Tea Plantation Extraction from High-Resolution Remote Sensing Imagery
by Xiaoyong Zhang, Bochen Jiang and Hongrui Sun
Appl. Sci. 2025, 15(24), 13062; https://doi.org/10.3390/app152413062 - 11 Dec 2025
Viewed by 139
Abstract
Tea, recognized as one of the world's three principal beverages, plays a significant role both economically and culturally. Accurate, large-scale mapping of tea plantations is crucial for quality control, industry regulation, and ecological assessment, yet it is difficult in high-resolution imagery because tea plantations are spectrally similar to other land covers and have intricate boundaries. We introduce the Phenology-Guided SwinUnet (PGSUNet), a semantic segmentation network that combines Swin Transformer encoding with a parallel phenology context branch. An intelligent fusion module within the network generates spatial attention informed by phenological priors, while a dual-head decoder sharpens boundary precision through explicit edge supervision. Using Hangzhou City as the case study, PGSUNet was compared with seven mainstream models, including DeepLabV3+ and SegFormer. It achieved an F1-score of 0.84, outperforming the second-best model by 0.03, and an mIoU of 84.53%, about 2% higher than the next-best result. This study demonstrates that integrating phenological priors with edge supervision significantly improves fine-scale extraction of agricultural land covers from complex remote sensing imagery.
(This article belongs to the Section Agricultural Science and Technology)
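
The dual-head decoder idea (region prediction plus explicit edge supervision) can be illustrated with a minimal PyTorch sketch; the layer sizes, names, and loss weighting below are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadDecoder(nn.Module):
    """Minimal sketch: one shared trunk feeding a region head and an edge head."""
    def __init__(self, in_ch: int, num_classes: int = 2):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True)
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # region prediction
        self.edge_head = nn.Conv2d(64, 1, 1)            # boundary prediction

    def forward(self, feats):
        x = self.shared(feats)
        return self.seg_head(x), self.edge_head(x)

def dual_loss(seg_logits, edge_logits, seg_gt, edge_gt, lam=0.5):
    # Hypothetical weighting: cross-entropy for regions plus BCE for edges.
    return (F.cross_entropy(seg_logits, seg_gt)
            + lam * F.binary_cross_entropy_with_logits(edge_logits, edge_gt))
```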

22 pages, 5462 KB  
Article
Ship Motion State Recognition Using Trajectory Image Modeling and CNN-Lite
by Shuaibing Zhao, Zongshun Tian, Yuefeng Lu, Peng Xie, Xueyuan Li, Yu Yan and Bo Liu
J. Mar. Sci. Eng. 2025, 13(12), 2327; https://doi.org/10.3390/jmse13122327 - 8 Dec 2025
Viewed by 175
Abstract
Intelligent recognition of ship motion states is a key technology for achieving smart maritime supervision and optimized port scheduling. To enhance both the modeling efficiency and recognition accuracy of AIS trajectory data, this paper proposes a ship behavior recognition method that integrates trajectory-to-image conversion with a convolutional neural network (CNN) for classifying three typical motion states: mooring, anchoring, and sailing. Firstly, a multi-step preprocessing pipeline is established, incorporating trajectory cleaning, interpolation-based gap filling, and segmentation to ensure data completeness and consistency. Secondly, dynamic features, including speed, heading, and temporal progression, are encoded into an RGB three-channel image, which preserves the original spatial and temporal information of the trajectory while enriching the image's feature representation. Thirdly, a lightweight CNN architecture (CNN-Lite) is designed to automatically extract spatial motion patterns from these images, with data augmentation techniques further enhancing model robustness and generalization across diverse scenarios. Finally, comprehensive comparative experiments are conducted to evaluate the proposed method. On a real-world AIS dataset, the proposed method achieves an accuracy of 91.54%, precision of 91.51%, recall of 91.54%, and F1-score of 91.52%, demonstrating superior or highly competitive performance compared with SVM, KNN, MLSTM, ResNet-50, and Swin-Transformer in both classification accuracy and model stability. These results confirm that constructing dynamic-feature-enriched RGB trajectory images and designing a lightweight CNN can effectively improve ship behavior recognition performance and provide a practical, efficient technical solution for abnormal anchoring detection, maritime traffic monitoring, and the development of intelligent shipping systems.
(This article belongs to the Special Issue Advanced Ship Trajectory Prediction and Route Planning)
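
The trajectory-to-image step described above can be sketched as follows; the grid size, channel assignments, and normalization are assumptions rather than the paper's exact settings:

```python
import numpy as np

def trajectory_to_rgb(lon, lat, sog, cog, size=64):
    """lon, lat, sog, cog: 1-D numpy arrays of equal length from cleaned AIS points.
    Speed, heading, and time order are normalized to [0, 1] and written into
    the R, G, B channels at each rasterized trajectory position."""
    img = np.zeros((size, size, 3), dtype=np.float32)
    # Map positions into pixel coordinates of a square grid.
    x = ((lon - lon.min()) / (np.ptp(lon) + 1e-9) * (size - 1)).astype(int)
    y = ((lat - lat.min()) / (np.ptp(lat) + 1e-9) * (size - 1)).astype(int)
    t = np.linspace(0.0, 1.0, len(lon))          # temporal progression
    img[y, x, 0] = sog / (sog.max() + 1e-9)      # R: speed over ground
    img[y, x, 1] = cog / 360.0                   # G: course over ground
    img[y, x, 2] = t                             # B: time order
    return img
```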

26 pages, 35268 KB  
Article
TriEncoderNet: Multi-Stage Fusion of CNN, Transformer, and HOG Features for Forward-Looking Sonar Image Segmentation
by Jie Liu, Yan Dong, Guofang Chen, Yimin Chen, Jian Gao and Fubin Zhang
J. Mar. Sci. Eng. 2025, 13(12), 2295; https://doi.org/10.3390/jmse13122295 - 3 Dec 2025
Viewed by 208
Abstract
Forward-looking sonar (FLS) image segmentation is essential for underwater exploration, yet it remains challenging due to low contrast, ambient noise, and complex backgrounds, which existing traditional and deep learning-based methods fail to address effectively. This paper presents TriEncoderNet, a novel model that simultaneously extracts local, global, and edge-related features through three parallel encoders. Specifically, the model integrates a convolutional neural network (CNN) for local feature extraction, a transformer for global context modeling, and a histogram of oriented gradients (HOG) encoder for edge and shape detection. The key innovations of TriEncoderNet are the CrossFusionTransformer (CFT) module, which effectively integrates local and global features to capture both fine details and comprehensive context, and the HOG attention gate (HAG) module, which enhances edge detection and preserves semantic consistency across diverse feature types. Additionally, TriEncoderNet introduces the hierarchical efficient transformer (HETransformer) with a lightweight multi-head self-attention mechanism to reduce computational overhead while maintaining global context modeling capability. Experimental results on the marine debris and UATD datasets demonstrate the superior performance of TriEncoderNet: it achieves an mIoU of 0.793 and mAP of 0.916 on the marine debris dataset, and an mIoU of 0.582 and mAP of 0.687 on the UATD dataset, outperforming state-of-the-art methods in both segmentation accuracy and robustness in challenging underwater environments.
(This article belongs to the Section Ocean Engineering)
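
As a rough illustration of the HOG stream such a tri-encoder design could consume, here is a minimal sketch using scikit-image; the cell and block sizes are assumptions:

```python
import numpy as np
from skimage.feature import hog

def hog_features(sonar_img: np.ndarray) -> np.ndarray:
    """Returns a spatial HOG map emphasizing edges and object shape, which a
    learned fusion module could combine with CNN/transformer features."""
    _, hog_map = hog(
        sonar_img,                  # 2-D grayscale sonar image
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,             # also return a spatial HOG image
        feature_vector=False,
    )
    return hog_map
```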

19 pages, 700 KB  
Article
BiGRMT: Bidirectional GRU–Recurrent Memory Transformer for Efficient Long-Sequence Anomaly Detection in High-Concurrency Microservices
by Ruicheng Zhang, Renzun Zhang, Shuyuan Wang, Kun Yang, Miao Xu, Dongwei Qiao and Xuanzheng Hu
Electronics 2025, 14(23), 4754; https://doi.org/10.3390/electronics14234754 - 3 Dec 2025
Viewed by 270
Abstract
In high-concurrency distributed systems, log data often exhibits sequence uncertainty and redundancy, which pose significant challenges to the accuracy and efficiency of anomaly detection. To address these issues, we propose BiGRMT, a hybrid architecture that integrates a Bidirectional Gated Recurrent Unit (Bi-GRU) with a Recurrent Memory Transformer (RMT). BiGRMT enhances local temporal feature extraction through bidirectional modeling and adaptive noise filtering using the Bi-GRU, while an RMT component significantly extends the model's capacity for long-sequence modeling via segment-level memory. The Transformer's multi-head attention mechanism continues to capture global time dependencies, now with improved efficiency due to the RMT's memory-sharing design. Extensive experiments on three benchmark datasets from LogHub (Spark, BGL (Blue Gene/L), and HDFS (Hadoop Distributed File System)) demonstrate that BiGRMT achieves strong results in terms of precision, recall, and F1-score. It attains a precision of 0.913, outperforming LogGPT (0.487) and slightly exceeding the temporal logical attention network (TLAN) (0.912). Compared to LogPal, which prioritizes detection accuracy, BiGRMT strikes a better balance by significantly reducing computational overhead while maintaining high detection performance. Even under challenging conditions such as a 50% increase in log generation rate or 20% injected noise, BiGRMT maintains F1-scores of 87.4% and 83.6%, respectively, showcasing excellent robustness. These findings confirm that BiGRMT is a scalable and practical solution for automated fault detection and intelligent maintenance in complex distributed software systems.
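
The segment-level memory idea behind an RMT can be sketched as follows; the layer, sizes, and detach-based memory carry-over are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

d_model, n_mem, seg_len = 128, 8, 256
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
memory = torch.zeros(1, n_mem, d_model)            # persistent memory tokens

def process_long_sequence(x: torch.Tensor) -> list:
    """x: (1, T, d_model) embedded log sequence, T possibly very long.
    Each segment attends jointly over itself and the carried memory tokens."""
    global memory
    outputs = []
    for start in range(0, x.size(1), seg_len):
        seg = x[:, start:start + seg_len]
        h = layer(torch.cat([memory, seg], dim=1))  # attend within segment + memory
        memory = h[:, :n_mem].detach()              # updated memory crosses segments
        outputs.append(h[:, n_mem:])
    return outputs
```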

33 pages, 2821 KB  
Article
SwinCAMF-Net: Explainable Cross-Attention Multimodal Swin Network for Mammogram Analysis
by Lakshmi Prasanthi R. S. Narayanam, Thirupathi N. Rao and Deva S. Kumar
Diagnostics 2025, 15(23), 3037; https://doi.org/10.3390/diagnostics15233037 - 28 Nov 2025
Viewed by 321
Abstract
Background: Breast cancer is a leading cause of cancer-related mortality among women, and earlier diagnosis significantly improves treatment outcomes. However, traditional mammography-based systems rely on single-modality image analysis and lack integration of volumetric and clinical context, which limits diagnostic robustness. Deep learning models have shown promising results in lesion identification but are typically restricted to 2D feature extraction and lack cross-modal reasoning capability. Objective: This study proposes SwinCAMF-Net, a multimodal cross-attention Swin transformer network designed to improve joint breast lesion classification and segmentation by integrating multi-view mammography, 3D ROI volumes, and clinical metadata. Methods: SwinCAMF-Net employs a Swin transformer encoder for hierarchical visual representation learning from mammographic views, a 3D CNN volume encoder for lesion depth context modeling, and a clinical projection module to embed patient metadata. A novel cross-attentive fusion (CAF) module selectively aligns multimodal features through query–key attention. The fused feature representation branches into a classification head for malignancy prediction and a segmentation decoder for lesion localization. The model is trained and evaluated on the CBIS-DDSM and RTM benchmark datasets. Results: SwinCAMF-Net achieved accuracy up to 0.978, an AUC-ROC of 0.998, and an F1-score of 0.944 for classification, while segmentation reached a Dice coefficient of 0.931. Ablation experiments confirm that the CAF module improves performance by up to 6.9%, demonstrating its effectiveness in multimodal fusion. Conclusion: SwinCAMF-Net advances breast cancer analysis by providing complementary multimodal evidence through cross-attentive fusion, leading to improved diagnostic performance and clinical interpretability. The framework demonstrates strong potential in AI-assisted screening and radiology decision support.
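
The query–key cross-attentive fusion described above can be sketched in a few lines of PyTorch; the dimensions and the residual form are assumptions:

```python
import torch
import torch.nn as nn

class CrossAttentiveFusion(nn.Module):
    """Sketch: image tokens query the concatenated auxiliary modalities."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, vol_tokens, clin_tokens):
        # Volumetric and clinical embeddings serve as keys/values.
        context = torch.cat([vol_tokens, clin_tokens], dim=1)
        fused, _ = self.attn(query=img_tokens, key=context, value=context)
        return self.norm(img_tokens + fused)   # residual connection
```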

29 pages, 1924 KB  
Article
VT-MFLV: Vision–Text Multimodal Feature Learning V Network for Medical Image Segmentation
by Wenju Wang, Jiaqi Li, Zinuo Ye, Yuyang Cai, Zhen Wang and Renwei Zhang
J. Imaging 2025, 11(12), 425; https://doi.org/10.3390/jimaging11120425 - 28 Nov 2025
Viewed by 186
Abstract
Existing multimodal segmentation methods face limitations in effectively leveraging medical text to guide visual feature learning: they often suffer from insufficient multimodal fusion and inadequate accuracy in fine-grained lesion segmentation. To address these challenges, the Vision–Text Multimodal Feature Learning V Network (VT-MFLV) is proposed. This model exploits the complementarity between medical images and text to enhance multimodal fusion, which in turn improves critical lesion recognition accuracy. VT-MFLV introduces three key modules: a Diagnostic Image–Text Residual Multi-Head Semantic Encoding (DIT-RMHSE) module that preserves critical semantic cues while reducing preprocessing complexity; a Fine-Grained Multimodal Fusion Local Attention Encoding (FG-MFLA) module that strengthens local cross-modal interaction; and an Adaptive Global Feature Compression and Focusing (AGCF) module that emphasizes clinically relevant lesion regions. Experiments are conducted on two publicly available pulmonary infection datasets. On the MosMedData dataset, VT-MFLV achieved Dice and mIoU scores of 75.61 ± 0.32% and 63.98 ± 0.29%; on the QaTa-COV19 dataset, it achieved Dice and mIoU scores of 83.34 ± 0.36% and 72.09 ± 0.30%, both at state-of-the-art levels.
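
For reference, the Dice and mIoU figures reported above are standard overlap metrics; here is a minimal sketch for binary lesion masks (the threshold and smoothing term are conventional choices, not the paper's):

```python
import torch

def dice_score(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) on thresholded probability maps."""
    pred, gt = (pred > 0.5).float(), gt.float()
    inter = (pred * gt).sum()
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))

def iou_score(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    """IoU = |P ∩ G| / |P ∪ G|; mIoU averages this over classes or images."""
    pred, gt = (pred > 0.5).bool(), gt.bool()
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return float((inter + eps) / (union + eps))
```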

22 pages, 2100 KB  
Article
Abrupt Change Detection of ECG by Spiking Neural Networks: Policy-Aware Operating Points for Edge-Level MI Screening
by Youngseok Lee
Appl. Sci. 2025, 15(22), 12210; https://doi.org/10.3390/app152212210 - 18 Nov 2025
Viewed by 472
Abstract
Electrocardiogram (ECG) monitoring on low-power edge devices requires models that balance accuracy, latency, and energy consumption. This study evaluates abrupt change detection in ECG using spiking neural networks (SNNs) trained on spike-encoded signals that preserve salient cardiac dynamics. The evaluation used 4910 ECG segments from 290 subjects (PTB Diagnostic Database; 2.5-s windows at 1 kHz). Under a unified architecture, preprocessing pipeline, and training schedule, we compare two representative neuron models: leaky integrate-and-fire (LIF) and adaptive exponential integrate-and-fire (AdEx). We report balanced accuracy, sensitivity, inference latency, and an energy proxy based on spike-event counts, and we examine robustness to input noise and temporal distortions. Across operating points, AdEx yields the highest overall accuracy and sensitivity, whereas LIF achieves the lowest energy cost and shortest latency, favoring deployment on resource-constrained hardware. Both SNN variants substantially reduce computational events, and hence estimated energy, relative to conventional artificial neural network baselines, supporting their suitability for real-time, on-device diagnostics. These findings provide practical guidance for selecting neuron dynamics and decision thresholds to meet target accuracy–sensitivity trade-offs under energy and latency budgets. Overall, combining spike-encoded ECG with appropriately chosen SNN dynamics enables reliable abrupt change detection with notable efficiency gains, offering a path toward scalable edge-level cardiovascular monitoring. While lightweight CNNs and shallow transformers are important references, we refrain from reporting additional ANN numbers here to keep the scope focused on SNN design choices and policy-aware thresholding for edge constraints; a seed-controlled head-to-head benchmark is reserved for future work.
(This article belongs to the Special Issue Research on Artificial Intelligence in Healthcare)
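
For context, the LIF dynamics compared in this study follow the textbook leak-integrate-spike-reset cycle; a minimal sketch with illustrative parameter values:

```python
import numpy as np

def lif_step(v, i_in, tau=20.0, v_rest=0.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron (array-friendly).
    The membrane potential decays toward rest, integrates input current,
    and emits a spike (then resets) on crossing the threshold."""
    v = v + dt / tau * (v_rest - v) + i_in      # leak + input integration
    spike = v >= v_th
    v = np.where(spike, v_reset, v)             # reset after spiking
    return v, spike
```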

32 pages, 13451 KB  
Article
Hybrid State–Space and Vision Transformer Framework for Fetal Ultrasound Plane Classification in Prenatal Diagnostics
by Sara Tehsin, Hend Alshaya, Wided Bouchelligua and Inzamam Mashood Nasir
Diagnostics 2025, 15(22), 2879; https://doi.org/10.3390/diagnostics15222879 - 13 Nov 2025
Viewed by 509
Abstract
Background and Objective: Accurate classification of standard fetal ultrasound planes is a critical step in prenatal diagnostics, enabling reliable biometric measurements and anomaly detection. Conventional deep learning approaches, particularly convolutional neural networks (CNNs) and transformers, often face challenges such as domain variability, noise artifacts, class imbalance, and poor calibration, which limit their clinical utility. This study proposes a hybrid state–space and vision transformer framework designed to address these limitations by integrating sequential dynamics and global contextual reasoning. Methods: The proposed framework comprises five stages: (i) preprocessing for ultrasound harmonization using intensity normalization, anisotropic diffusion filtering, and affine alignment; (ii) hybrid feature encoding with a state–space model (SSM) for sequential dependency modeling and a vision transformer (ViT) for global self-attention; (iii) multi-task learning (MTL) with anatomical regularization leveraging classification, segmentation, and biometric regression objectives; (iv) gated decision fusion for balancing local sequential and global contextual features; and (v) calibration strategies using temperature scaling and entropy regularization to ensure reliable confidence estimation. The framework was comprehensively evaluated on three publicly available datasets: FETAL_PLANES_DB, HC18, and a large-scale fetal head dataset. Results: The hybrid framework consistently outperformed baseline CNN, SSM-only, and ViT-only models across all tasks. On FETAL_PLANES_DB, it achieved an accuracy of 95.8%, a macro-F1 of 94.9%, and an ECE of 1.5%. On the Fetal Head dataset, the model achieved 94.1% accuracy and a macro-F1 score of 92.8%, along with superior calibration metrics. For HC18, it achieved a Dice score of 95.7%, an IoU of 91.7%, and a mean absolute error of 2.30 mm for head circumference estimation. Cross-dataset evaluations confirmed the model's robustness and generalization capability. Ablation studies further demonstrated the critical role of SSM, ViT, fusion gating, and anatomical regularization in achieving optimal performance. Conclusions: By combining state–space dynamics and transformer-based global reasoning, the proposed framework delivers accurate, calibrated, and clinically meaningful predictions for fetal ultrasound plane classification and biometric estimation. The results highlight its potential for deployment in real-time prenatal screening and diagnostic systems.
(This article belongs to the Special Issue Advances in Fetal Imaging)
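
The gated decision fusion of stage (iv) can be sketched as a learned convex combination of the SSM and ViT feature vectors; the layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch: a sigmoid gate balances sequential vs. global features per sample."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, ssm_feat: torch.Tensor, vit_feat: torch.Tensor):
        g = self.gate(torch.cat([ssm_feat, vit_feat], dim=-1))
        return g * ssm_feat + (1.0 - g) * vit_feat   # convex combination
```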

18 pages, 3175 KB  
Article
AudioFakeNet: A Model for Reliable Speaker Verification in Deepfake Audio
by Samia Dilbar, Muhammad Ali Qureshi, Serosh Karim Noon and Abdul Mannan
Algorithms 2025, 18(11), 716; https://doi.org/10.3390/a18110716 - 13 Nov 2025
Viewed by 689
Abstract
Deepfake audio refers to voice recordings generated by deep neural networks that replicate a specific individual's voice, often for deceptive or fraudulent purposes. Although this has been an area of research for quite some time, deepfakes still pose substantial challenges for reliable speaker authentication. To address the issue, we propose AudioFakeNet, a hybrid deep learning architecture that uses Convolutional Neural Networks (CNNs) together with Long Short-Term Memory (LSTM) units and a Multi-Head Attention (MHA) mechanism for robust deepfake detection. The CNN extracts spatial and spectral features, the LSTM captures temporal dependencies, and MHA sharpens the model's focus on informative audio segments. The model is trained on Mel-Frequency Cepstral Coefficients (MFCCs) from a publicly available dataset and validated on a self-collected dataset, ensuring reproducibility. Performance comparisons with state-of-the-art machine learning and deep learning models show that AudioFakeNet achieves higher accuracy, better generalization, and a lower Equal Error Rate (EER). Its modular design allows for broader adaptability in fake-audio detection tasks, offering significant potential across diverse speech synthesis applications.
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
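
The described CNN, LSTM, and MHA stack over MFCC inputs (MFCCs are commonly computed with a library such as librosa) can be sketched as follows; all layer sizes are placeholders, not the authors' configuration:

```python
import torch
import torch.nn as nn

class AudioFakeNetSketch(nn.Module):
    """Sketch: MFCCs feed a small CNN, an LSTM models temporal structure,
    and multi-head attention re-weights time steps before classification."""
    def __init__(self, n_mfcc: int = 40, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_mfcc, 64, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # real vs. fake

    def forward(self, mfcc):                       # mfcc: (B, n_mfcc, T)
        x = self.cnn(mfcc).transpose(1, 2)         # -> (B, T, 64)
        x, _ = self.lstm(x)                        # temporal dependencies
        x, _ = self.attn(x, x, x)                  # focus on informative frames
        return self.head(x.mean(dim=1))            # pooled classification
```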

37 pages, 4859 KB  
Review
Eyes of the Future: Decoding the World Through Machine Vision
by Svetlana N. Khonina, Nikolay L. Kazanskiy, Ivan V. Oseledets, Roman M. Khabibullin and Artem V. Nikonorov
Technologies 2025, 13(11), 507; https://doi.org/10.3390/technologies13110507 - 7 Nov 2025
Viewed by 2481
Abstract
Machine vision (MV) is reshaping numerous industries by giving machines the ability to understand what they “see” and respond without human intervention. This review brings together the latest developments in deep learning (DL), image processing, and computer vision (CV). It focuses on how these technologies are being applied in real operational environments. We examine core methodologies such as feature extraction, object detection, image segmentation, and pattern recognition. These techniques are accelerating innovation in key sectors, including healthcare, manufacturing, autonomous systems, and security. A major emphasis is placed on the deepening integration of artificial intelligence (AI) and machine learning (ML) into MV. We particularly consider the impact of convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformer architectures on the evolution of visual recognition capabilities. Beyond surveying advances, this review also takes a hard look at the field’s persistent roadblocks, above all the scarcity of high-quality labeled data, the heavy computational load of modern models, and the unforgiving time limits imposed by real-time vision applications. In response to these challenges, we examine a range of emerging fixes: leaner algorithms, purpose-built hardware (like vision processing units and neuromorphic chips), and smarter ways to label or synthesize data that sidestep the need for massive manual operations. What distinguishes this paper, however, is its emphasis on where MV is headed next. We spotlight nascent directions, including edge-based processing that moves intelligence closer to the sensor, early explorations of quantum methods for visual tasks, and hybrid AI systems that fuse symbolic reasoning with DL, not as speculative futures but as tangible pathways already taking shape. Ultimately, the goal is to connect cutting-edge research with actual deployment scenarios, offering a grounded, actionable guide for those working at the front lines of MV today.
(This article belongs to the Section Information and Communication Technologies)

16 pages, 2708 KB  
Article
Comparing Handcrafted Radiomics Versus Latent Deep Learning Features of Admission Head CT for Hemorrhagic Stroke Outcome Prediction
by Anh T. Tran, Junhao Wen, Gaby Abou Karam, Dorin Zeevi, Adnan I. Qureshi, Ajay Malhotra, Shahram Majidi, Niloufar Valizadeh, Santosh B. Murthy, Mert R. Sabuncu, David Roh, Guido J. Falcone, Kevin N. Sheth and Seyedmehdi Payabvash
BioTech 2025, 14(4), 87; https://doi.org/10.3390/biotech14040087 - 2 Nov 2025
Viewed by 697
Abstract
Handcrafted radiomics use predefined formulas to extract quantitative features from medical images, whereas deep neural networks learn de novo features through iterative training. We compared these approaches for predicting 3-month outcomes and hematoma expansion from admission non-contrast head CT in acute intracerebral hemorrhage (ICH). Training and cross-validation were performed using a multicenter trial cohort (n = 866), with external validation on a single-center dataset (n = 645). We trained multiscale U-shaped segmentation models for hematoma segmentation and extracted (i) radiomics from the segmented lesions and (ii) two latent deep feature sets—from the segmentation encoder and a generative autoencoder trained on dilated lesion patches. Features were reduced with unsupervised Non-Negative Matrix Factorization (NMF) to 128 per set and used—alone or in combination—for six machine-learning classifiers to predict 3-month clinical outcomes and (>3, >6, >9 mL) hematoma expansion thresholds. The addition of latent deep features to radiomics numerically increased model prediction performance for 3-month outcomes and hematoma expansion using Random Forest, XGBoost, Extra Trees, or Elastic Net classifiers; however, the improved accuracy only reached statistical significance in predicting >3 mL hematoma expansion. Clinically, these consistent but modest increases in prediction performance may improve risk stratification at the individual level. Nevertheless, the latent deep features show potential for extracting additional clinically relevant information from admission head CT for prognostication in hemorrhagic stroke.
(This article belongs to the Special Issue Advances in Bioimaging Technology)
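
The feature-reduction-plus-classifier stage can be sketched with scikit-learn; the matrix below is a random placeholder standing in for the extracted radiomics or latent deep features, and the classifier settings are assumptions:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((866, 2000))        # placeholder non-negative feature matrix
y = rng.integers(0, 2, 866)        # placeholder binary outcome labels

# Unsupervised NMF compresses each feature set to 128 components,
# which then feed one of the machine-learning classifiers.
X_red = NMF(n_components=128, init="nndsvda", max_iter=500).fit_transform(X)
clf = RandomForestClassifier(n_estimators=300).fit(X_red, y)
```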

25 pages, 5836 KB  
Article
MRSliceNet: Multi-Scale Recursive Slice and Context Fusion Network for Instance Segmentation of Leaves from Plant Point Clouds
by Shan Liu, Guangshuai Wang, Hongbin Fang, Min Huang, Tengping Jiang and Yongjun Wang
Plants 2025, 14(21), 3349; https://doi.org/10.3390/plants14213349 - 31 Oct 2025
Viewed by 545
Abstract
Plant phenotyping plays a vital role in connecting genotype to environmental adaptability, with important applications in crop breeding and precision agriculture. Traditional leaf measurement methods are laborious and destructive, while modern 3D sensing technologies like LiDAR provide high-resolution point clouds but face challenges in automatic leaf segmentation due to occlusion, geometric similarity, and uneven point density. To address these challenges, we propose MRSliceNet, an end-to-end deep learning framework inspired by human visual cognition. The network integrates three key components: a Multi-scale Recursive Slicing Module (MRSM) for detailed local feature extraction, a Context Fusion Module (CFM) that combines local and global features through attention mechanisms, and an Instance-Aware Clustering Head (IACH) that generates discriminative embeddings for precise instance separation. Extensive experiments on two challenging datasets show that our method establishes new state-of-the-art performance, achieving AP of 55.04%/53.78%, AP50 of 65.37%/64.00%, and AP25 of 74.68%/73.45% on Dataset A and Dataset B, respectively. The proposed framework not only produces clear boundaries and reliable instance identification but also provides an effective solution for automated plant phenotyping, as evidenced by its successful implementation in real-world agricultural research pipelines.
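
The paper's Instance-Aware Clustering Head is not specified here, but a classic pull/push discriminative embedding loss illustrates how per-point embeddings can be trained for instance separation; the margins are illustrative, not the paper's values:

```python
import torch

def discriminative_loss(emb, labels, d_pull=0.5, d_push=1.5):
    """emb: (N, D) per-point embeddings; labels: (N,) leaf instance ids.
    Pulls embeddings toward their instance mean; pushes instance means apart."""
    means, pull = [], 0.0
    for inst in labels.unique():
        e = emb[labels == inst]
        mu = e.mean(dim=0)
        means.append(mu)
        pull += ((e - mu).norm(dim=1) - d_pull).clamp(min=0).pow(2).mean()
    means = torch.stack(means)
    dist = torch.cdist(means, means)          # pairwise distances between means
    push = (d_push - dist).clamp(min=0).pow(2)
    push = push.triu(diagonal=1).sum() / max(len(means) * (len(means) - 1) / 2, 1)
    return pull / len(means) + push
```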

33 pages, 9021 KB  
Article
SLA-Net: A Novel Sea–Land Aware Network for Accurate SAR Ship Detection Guided by Hierarchical Attention Mechanism
by Han Ke, Xiao Ke, Zishuo Zhang, Xiangyu Chen, Xiaowo Xu and Tianwen Zhang
Remote Sens. 2025, 17(21), 3576; https://doi.org/10.3390/rs17213576 - 29 Oct 2025
Cited by 3 | Viewed by 889
Abstract
In recent years, deep learning (DL)-based synthetic aperture radar (SAR) ship detection has made significant strides. However, many existing DL-based SAR ship detection methods treat sea regions and land regions equally and thus fail to account for the differences between the two regions during training and testing. This oversight may prevent the network's attention from fully concentrating on valuable regions (i.e., sea regions and ship regions), thereby degrading overall detection accuracy. To address these issues, we propose the Sea–Land Aware Net (SLA-Net), which introduces a novel SLA Hierarchical Attention mechanism to gradually focus the network's attention on sea and ship regions across different stages. Specifically, SLA-Net instantiates the SLA Hierarchical Attention mechanism through three components: the SLA Sea-Attention Backbone, which incorporates sea attention in the feature extraction stage; the SLA Ship-Attention FPN, which implements ship attention in the feature fusion stage; and the SLA Ship-Attention Detection Heads, which enforce ship attention in the detection refinement stage. Moreover, to address the lack of sea–land priors available to the community working on DL-based SAR ship detection, we introduce a sea–land segmentation dataset for SSDD (SL-SSDD). Built upon the well-established SAR ship detection dataset (SSDD), it serves as a synergistic dataset for ship detection when used in conjunction with SSDD. Quantitative experimental results on SSDD and generalization results on HRSID and LS-SSDD demonstrate that SLA-Net achieves superior SAR ship detection performance compared with other methods. Furthermore, SL-SSDD, with its sea–land segmentation information, offers a new perspective for the community working on DL-based SAR ship detection.
(This article belongs to the Special Issue Advances in Miniaturized Radar Systems for Close-Range Sensing)
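
One simple way to realize mask-guided sea attention is sketched below, under the assumption of a soft residual re-weighting; the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def sea_attention(feat: torch.Tensor, sea_prob: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) backbone features; sea_prob: (B, 1, h, w) in [0, 1],
    e.g. from a sea-land segmentation prior such as SL-SSDD supervision."""
    m = F.interpolate(sea_prob, size=feat.shape[-2:], mode="bilinear",
                      align_corners=False)
    return feat * (1.0 + m)    # emphasize sea regions, keep a residual path
```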

26 pages, 1737 KB  
Article
ECG-CBA: An End-to-End Deep Learning Model for ECG Anomaly Detection Using CNN, Bi-LSTM, and Attention Mechanism
by Khalid Ammar, Salam Fraihat, Ghazi Al-Naymat and Yousef Sanjalawe
Algorithms 2025, 18(11), 674; https://doi.org/10.3390/a18110674 - 22 Oct 2025
Cited by 2 | Viewed by 1049
Abstract
The electrocardiogram (ECG) is a vital diagnostic tool used to monitor heart activity and detect cardiac abnormalities, such as arrhythmias. Accurate classification of normal and abnormal heartbeats is essential for effective diagnosis and treatment. Traditional deep learning methods for automated ECG classification primarily focus on reconstructing the original ECG signal and detecting anomalies based on reconstruction errors, which represent abnormal features. However, these approaches struggle with unseen or underrepresented abnormalities in the training data. In addition, other methods rely on manual feature extraction, which can introduce bias and limit adaptability to new datasets. To overcome these problems, this study proposes an end-to-end model called ECG-CBA, which integrates convolutional neural networks (CNNs), a bidirectional long short-term memory network (Bi-LSTM), and a multi-head attention mechanism. ECG-CBA learns discriminative features directly from the original dataset rather than relying on manual feature extraction or signal reconstruction, enabling higher accuracy and reliability in detecting and classifying anomalies. The CNN extracts local spatial features from raw ECG signals, while the Bi-LSTM captures temporal dependencies in sequential data. The attention mechanism enables the model to focus on critical segments of the ECG, thereby improving classification performance. The proposed model is trained on normal and abnormal ECG signals for binary classification. ECG-CBA demonstrates strong performance on the ECG5000 and MIT-BIH datasets, achieving accuracies of 99.60% and 98.80%, respectively. The model surpasses traditional methods across key metrics, including sensitivity, specificity, and overall classification accuracy, offering a robust and interpretable solution for ECG-based anomaly detection and cardiac abnormality classification.
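
The CNN, Bi-LSTM, and attention pipeline over raw 1-D ECG can be sketched in PyTorch; the sizes below are placeholders, not the authors' configuration:

```python
import torch
import torch.nn as nn

class ECGCBASketch(nn.Module):
    """Sketch: CNN for local morphology, Bi-LSTM for temporal dependencies,
    multi-head attention over time steps, binary normal/abnormal head."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(1, 32, 7, padding=3), nn.ReLU(),
                                 nn.MaxPool1d(2))
        self.bilstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(2 * hidden, 2)

    def forward(self, ecg):                    # ecg: (B, 1, T)
        x = self.cnn(ecg).transpose(1, 2)      # -> (B, T/2, 32)
        x, _ = self.bilstm(x)                  # -> (B, T/2, 2*hidden)
        x, _ = self.attn(x, x, x)              # weight critical ECG segments
        return self.head(x.mean(dim=1))
```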

20 pages, 5553 KB  
Article
An Improved Instance Segmentation Approach for Solid Waste Retrieval with Precise Edge from UAV Images
by Yaohuan Huang and Zhuo Chen
Remote Sens. 2025, 17(20), 3410; https://doi.org/10.3390/rs17203410 - 11 Oct 2025
Viewed by 598
Abstract
As a major contributor to environmental pollution in recent years, solid waste has become an increasingly significant concern for sustainable development. Unmanned Aerial Vehicle (UAV) imagery, known for its high spatial resolution, has become a valuable data source for solid waste detection. However, manually interpreting solid waste in UAV images is inefficient, and object detection methods face serious challenges from the patchy distribution, varied textures and colors, and fragmented edges of solid waste. In this study, we propose an improved instance segmentation approach, the Watershed Mask Network for Solid Waste (WMNet-SW), to accurately retrieve solid waste with precise edges from UAV images. The approach combines the well-established Mask R-CNN segmentation framework with the watershed-transform edge detection algorithm. The benchmark Mask R-CNN is improved by optimizing the anchor size and Region of Interest (RoI) and by integrating a new Layer Feature Aggregation (LFA) mask head to initially detect solid waste; the edges of the detected solid waste are then refined by overlaying the segments generated by the watershed transform. Experimental results show that WMNet-SW significantly enhances the performance of Mask R-CNN in solid waste retrieval, increasing the average precision from 36.91% to 58.10%, F1-score from 0.5 to 0.65, and AP from 63.04% to 64.42%. Furthermore, our method recovers fine details of solid waste edges, even compensating for limitations in the training Ground Truth (GT). This study provides a solution for retrieving solid waste with precise edges from UAV images, thereby contributing to the protection of the regional environment and ecosystem health.
(This article belongs to the Section Environmental Remote Sensing)
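
The watershed-based edge refinement can be sketched with scikit-image; the marker count and the majority-coverage rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def refine_mask(image_gray: np.ndarray, coarse_mask: np.ndarray) -> np.ndarray:
    """Overlay watershed segments (computed from the image gradient) on a
    coarse Mask R-CNN mask; keep each segment the mask mostly covers, so the
    refined mask snaps to the watershed edges."""
    gradient = sobel(image_gray)
    regions = watershed(gradient, markers=250, compactness=0.001)
    refined = np.zeros_like(coarse_mask, dtype=bool)
    for r in np.unique(regions):
        seg = regions == r
        if coarse_mask[seg].mean() > 0.5:   # majority vote inside the segment
            refined |= seg
    return refined
```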
