Search Results (325)

Search Parameters:
Keywords = self-built dataset

18 pages, 4165 KiB  
Article
Localization and Pixel-Confidence Network for Surface Defect Segmentation
by Yueyou Wang, Zixuan Xu, Li Mei, Ruiqing Guo, Jing Zhang, Tingbo Zhang and Hongqi Liu
Sensors 2025, 25(15), 4548; https://doi.org/10.3390/s25154548 - 23 Jul 2025
Abstract
Surface defect segmentation based on deep learning has been widely applied in industrial inspection. However, two major challenges persist in specific application scenarios: first, the imbalanced area distribution between defects and the background leads to degraded segmentation performance; second, fine gaps within defects are prone to over-segmentation. To address these issues, this study proposes a two-stage image segmentation network that integrates a Defect Localization Module and a Pixel Confidence Module. In the first stage, the Defect Localization Module performs a coarse localization of defect regions and embeds the resulting feature vectors into the backbone of the second stage. In the second stage, the Pixel Confidence Module captures the probabilistic distribution of neighboring pixels, thereby refining the initial predictions. Experimental results demonstrate that, compared to the baseline network, the improved network achieves gains of 1.58% ± 0.80% in mPA and 1.35% ± 0.77% in mIoU on the self-built Carbon Fabric Defect Dataset, and of 2.66% ± 1.12% in mPA and 1.44% ± 0.79% in mIoU on the public Magnetic Tile Defect Dataset. These enhancements translate to more reliable automated quality assurance in industrial production environments.
(This article belongs to the Section Fault Diagnosis & Sensors)
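A minimal PyTorch sketch of the two-stage idea summarized above, with illustrative module shapes rather than the authors' implementation: a coarse localization branch distills a conditioning vector for the second stage, and a pixel-confidence head refines predictions from the probability distribution of neighboring pixels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseLocalizer(nn.Module):
    """Stage 1: coarse defect localization -> global conditioning vector."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return self.pool(self.conv(x)).flatten(1)  # (B, ch), injected into stage 2

class PixelConfidenceHead(nn.Module):
    """Stage 2 refinement: average class probabilities over a 3x3 neighborhood
    and blend them with the initial prediction via a learned per-pixel gate."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.gate = nn.Conv2d(n_classes, 1, 1)

    def forward(self, logits):
        prob = logits.softmax(dim=1)
        neighbor = F.avg_pool2d(prob, 3, stride=1, padding=1)  # neighborhood distribution
        g = torch.sigmoid(self.gate(prob))                     # pixel confidence gate
        return g * prob + (1 - g) * neighbor

# usage with stand-in tensors
x = torch.randn(1, 3, 64, 64)
vec = CoarseLocalizer()(x)              # conditioning vector for the stage-2 backbone
logits = torch.randn(1, 2, 64, 64)      # stand-in for stage-2 backbone output
refined = PixelConfidenceHead()(logits)
print(vec.shape, refined.shape)
```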

26 pages, 78396 KiB  
Article
SWRD–YOLO: A Lightweight Instance Segmentation Model for Estimating Rice Lodging Degree in UAV Remote Sensing Images with Real-Time Edge Deployment
by Chunyou Guo and Feng Tan
Agriculture 2025, 15(15), 1570; https://doi.org/10.3390/agriculture15151570 - 22 Jul 2025
Abstract
Rice lodging severely affects crop growth, yield, and mechanized harvesting efficiency. The accurate detection and quantification of lodging areas are crucial for precision agriculture and timely field management. However, Unmanned Aerial Vehicle (UAV)-based lodging detection faces challenges such as complex backgrounds, variable lighting, and irregular lodging patterns. To address these issues, this study proposes SWRD–YOLO, a lightweight instance segmentation model that enhances feature extraction and fusion using advanced convolution and attention mechanisms. The model employs an optimized loss function to improve localization accuracy, achieving precise lodging area segmentation. Additionally, a grid-based lodging ratio estimation method is introduced, dividing images into fixed-size grids to calculate local lodging proportions and aggregate them for robust overall severity assessment. Evaluated on a self-built rice lodging dataset, the model achieves 94.8% precision, 88.2% recall, 93.3% mAP@0.5, and 91.4% F1 score, with real-time inference at 16.15 FPS on an embedded NVIDIA Jetson Orin NX device. Compared to the baseline YOLOv8n-seg, precision, recall, mAP@0.5, and F1 score improved by 8.2%, 16.5%, 12.8%, and 12.8%, respectively. These results confirm the model’s effectiveness and potential for deployment in intelligent crop monitoring and sustainable agriculture.
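The grid-based lodging ratio estimation lends itself to a compact sketch. Below is a minimal version, assuming a binary lodging mask and a 64-pixel cell size (both illustrative choices, not the paper's parameters):

```python
import numpy as np

def lodging_ratio(mask: np.ndarray, cell: int = 64):
    """mask: binary array (H, W), 1 = lodged rice pixels."""
    H, W = mask.shape
    ratios = []
    for r in range(0, H, cell):
        for c in range(0, W, cell):
            patch = mask[r:r + cell, c:c + cell]
            ratios.append(patch.mean())        # local lodging proportion per cell
    ratios = np.array(ratios)
    return ratios, ratios.mean()               # per-cell ratios and overall severity

mask = (np.random.rand(512, 512) > 0.7).astype(np.uint8)  # stand-in segmentation mask
cells, overall = lodging_ratio(mask)
print(f"{len(cells)} cells, overall lodging ratio = {overall:.3f}")
```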

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Viewed by 38
Abstract
To address the challenges of low recognition accuracy and substantial model complexity in crop disease identification models operating in complex field environments, this study proposed a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S uniquely introduces three key innovations: First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules were introduced to synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model’s robustness against interferences such as lighting variations and leaf occlusions. This novel combination of an LPU, LMHSA, and an IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, through a phased architecture design, efficient fusion of multi-scale disease features is achieved, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. This model operates with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, which represents improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, the ConvTransNet-S model achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its unique multi-scale feature mechanism can effectively distinguish disease from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
(This article belongs to the Section Plant Modeling)
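Of the three components, the IRFFN has a well-known general shape: a 1x1 expansion, a depthwise 3x3 convolution with a shortcut, and a 1x1 projection. A minimal sketch follows; the expansion ratio and activation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class IRFFN(nn.Module):
    def __init__(self, dim: int, expand: int = 4):
        super().__init__()
        hidden = dim * expand
        self.expand = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.GELU())
        self.dw = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),  # depthwise 3x3
            nn.GELU())
        self.project = nn.Conv2d(hidden, dim, 1)

    def forward(self, x):
        h = self.expand(x)
        h = h + self.dw(h)          # inner residual around the depthwise conv
        return x + self.project(h)  # outer residual preserves the input path

x = torch.randn(1, 64, 32, 32)
print(IRFFN(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```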

17 pages, 1738 KiB  
Article
Multimodal Fusion Multi-Task Learning Network Based on Federated Averaging for SDB Severity Diagnosis
by Songlu Lin, Renzheng Tang, Yuzhe Wang and Zhihong Wang
Appl. Sci. 2025, 15(14), 8077; https://doi.org/10.3390/app15148077 - 20 Jul 2025
Viewed by 330
Abstract
Accurate sleep staging and sleep-disordered breathing (SDB) severity prediction are critical for the early diagnosis and management of sleep disorders. However, real-world polysomnography (PSG) data often suffer from modality heterogeneity, label scarcity, and non-independent and identically distributed (non-IID) characteristics across institutions, posing significant challenges for model generalization and clinical deployment. To address these issues, we propose a federated multi-task learning (FMTL) framework that simultaneously performs sleep staging and SDB severity classification from seven multimodal physiological signals, including EEG, ECG, and respiration. The proposed framework is built upon a hybrid deep neural architecture that integrates convolutional neural network (CNN) layers for spatial representation, bidirectional GRUs for temporal modeling, and multi-head self-attention for long-range dependency learning. A shared feature extractor is combined with task-specific heads to enable joint diagnosis, while the FedAvg algorithm is employed to facilitate decentralized training across multiple institutions without sharing raw data, thereby preserving privacy and addressing non-IID challenges. We evaluate the proposed method across three public datasets (APPLES, SHHS, and HMC) treated as independent clients. For sleep staging, the model achieves accuracies of 85.3% (APPLES), 87.1% (SHHS_rest), and 79.3% (HMC), with Cohen’s Kappa scores exceeding 0.71. For SDB severity classification, it obtains macro-F1 scores of 77.6%, 76.4%, and 79.1% on APPLES, SHHS_rest, and HMC, respectively. These results demonstrate that our unified FMTL framework effectively leverages multimodal PSG signals and federated training to deliver accurate and scalable sleep disorder assessment, paving the way for the development of a privacy-preserving, generalizable, and clinically applicable digital sleep monitoring system.
(This article belongs to the Special Issue Machine Learning in Biomedical Applications)
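The FedAvg step at the core of the framework is simple to state: each institution trains locally, and the server averages parameters weighted by local sample counts, so raw PSG recordings never leave a site. A minimal sketch, with toy clients standing in for institutions:

```python
import copy
import torch

def fedavg(client_states, client_sizes):
    """client_states: list of model state_dicts; client_sizes: samples per client."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        # weighted average of each parameter tensor across clients
        avg[key] = sum(s[key].float() * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg

# usage with three toy "institutions" sharing one architecture
model = torch.nn.Linear(4, 2)
states = [copy.deepcopy(model.state_dict()) for _ in range(3)]  # post-local-training weights
global_state = fedavg(states, client_sizes=[100, 250, 80])
model.load_state_dict(global_state)  # broadcast back to clients for the next round
```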

28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Viewed by 343
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments.
(This article belongs to the Section Sensing and Imaging)
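As one concrete example of the building blocks listed above, the upsampling block combines depth-wise separable convolution with channel shift mixing. The sketch below is an assumption-level reading of that description (shift fraction and layer order are guesses), not the exact SCFUB design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DWSeparableUp(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # depthwise conv
        self.pw = nn.Conv2d(ch, ch, 1)                        # pointwise conv

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = self.pw(self.dw(x))
        # channel-shift mixing: roll a quarter of the channels by one position
        k = x.shape[1] // 4
        x = torch.cat([x[:, :k].roll(1, dims=1), x[:, k:]], dim=1)
        return x

print(DWSeparableUp(16)(torch.randn(1, 16, 8, 8)).shape)  # (1, 16, 16, 16)
```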

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 239
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs, making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and efficient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built for two modalities: skeleton and RGB images. In this paper, we first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems.
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
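The gating mechanism in the fusion module can be illustrated compactly: a learned gate weighs the skeleton-stream and RGB-stream features per dimension before classification. Feature sizes below are assumptions; the class count follows AUTSL's 226 sign categories.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, f_skel, f_rgb):
        g = torch.sigmoid(self.gate(torch.cat([f_skel, f_rgb], dim=-1)))
        fused = g * f_skel + (1 - g) * f_rgb  # per-dimension modality weighting
        return self.head(fused)

f_skel, f_rgb = torch.randn(2, 256), torch.randn(2, 256)  # stand-in stream features
print(GatedFusion(256, n_classes=226)(f_skel, f_rgb).shape)  # (2, 226)
```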

23 pages, 88853 KiB  
Article
RSW-YOLO: A Vehicle Detection Model for Urban UAV Remote Sensing Images
by Hao Wang, Jiapeng Shang, Xinbo Wang, Qingqi Zhang, Xiaoli Wang, Jie Li and Yan Wang
Sensors 2025, 25(14), 4335; https://doi.org/10.3390/s25144335 - 11 Jul 2025
Viewed by 436
Abstract
Vehicle detection in remote sensing images faces significant challenges due to small object sizes, scale variation, and cluttered backgrounds. To address these issues, we propose RSW-YOLO, an enhanced detection model built upon the YOLOv8n framework, designed to improve feature extraction and robustness against environmental noise. A Restormer module is incorporated into the backbone to model long-range dependencies via self-attention, enabling better handling of multi-scale features and complex scenes. A dedicated detection head is introduced for small objects, focusing on critical channels while suppressing irrelevant information. Additionally, the original CIoU loss is replaced with WIoU, which dynamically reweights predicted boxes based on their quality, enhancing localization accuracy and stability. Experimental results on the DJCAR dataset show mAP@0.5 and mAP@0.5:0.95 improvements of 5.4% and 6.2%, respectively, and corresponding gains of 4.3% and 2.6% on the VisDrone dataset. These results demonstrate that RSW-YOLO offers a robust and accurate solution for UAV-based vehicle detection, particularly in urban scenes with dense or small targets.
(This article belongs to the Section Sensors and Robotics)
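The WIoU replacement for CIoU reweights predicted boxes by quality. Below is a sketch in the spirit of WIoU v1, where the plain IoU loss is scaled by a factor that grows with the normalized center distance to the ground truth (normalized by the enclosing box, detached from the graph); this illustrates the loss family, not the exact RSW-YOLO formulation.

```python
import torch

def wiou_v1(pred, gt, eps=1e-7):
    """Boxes as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    # intersection over union
    lt = torch.maximum(pred[:, :2], gt[:, :2])
    rb = torch.minimum(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_g = (gt[:, 2:] - gt[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_g - inter + eps)
    # smallest enclosing box, used only for normalization (no gradient)
    enc_lt = torch.minimum(pred[:, :2], gt[:, :2])
    enc_rb = torch.maximum(pred[:, 2:], gt[:, 2:])
    enc_wh = (enc_rb - enc_lt).detach()
    center_dist2 = ((pred[:, :2] + pred[:, 2:]) / 2 -
                    (gt[:, :2] + gt[:, 2:]) / 2).pow(2).sum(dim=1)
    r = torch.exp(center_dist2 / (enc_wh.pow(2).sum(dim=1) + eps))  # focusing factor
    return (r * (1 - iou)).mean()

pred = torch.tensor([[10., 10., 50., 50.]], requires_grad=True)
gt = torch.tensor([[12., 8., 48., 52.]])
print(wiou_v1(pred, gt))
```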

19 pages, 14033 KiB  
Article
SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection
by Jiahao Tang, Boyuan Gu, Tianyou Li and Ying-Bo Lu
Remote Sens. 2025, 17(14), 2380; https://doi.org/10.3390/rs17142380 - 10 Jul 2025
Viewed by 328
Abstract
Lunar crater detection plays a crucial role in geological analysis and the advancement of lunar exploration. Accurate identification of craters is also essential for constructing high-resolution topographic maps and supporting mission planning in future lunar exploration efforts. However, lunar craters often suffer from insufficient feature representation due to their small size and blurred boundaries. In addition, the visual similarity between craters and surrounding terrain further exacerbates background confusion. These challenges significantly hinder detection performance in remote sensing imagery and underscore the necessity of enhancing both local feature representation and global semantic reasoning. In this paper, we propose a novel Spatial Channel Fusion and Context-Aware YOLO (SCCA-YOLO) model built upon the YOLO11 framework. Specifically, the Context-Aware Module (CAM) employs a multi-branch dilated convolutional structure to enhance feature richness and expand the local receptive field, thereby strengthening the feature extraction capability. The Joint Spatial and Channel Fusion Module (SCFM) is utilized to fuse spatial and channel information to model the global relationships between craters and the background, effectively suppressing background noise and reinforcing feature discrimination. In addition, the improved Channel Attention Concatenation (CAC) strategy adaptively learns channel-wise importance weights during feature concatenation, further optimizing multi-scale semantic feature fusion and enhancing the model’s sensitivity to critical crater features. The proposed method is validated on a self-constructed Chang’e 6 dataset, covering the landing site and its surrounding areas. Experimental results demonstrate that our model achieves an mAP0.5 of 96.5% and an mAP0.5:0.95 of 81.5%, outperforming other mainstream detection models including the YOLO family of algorithms. These findings highlight the potential of SCCA-YOLO for high-precision lunar crater detection and provide valuable insights into future lunar surface analysis.
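The multi-branch dilated-convolution structure attributed to the CAM admits a short sketch: parallel 3x3 branches with increasing dilation enlarge the receptive field, and a 1x1 convolution fuses them. Branch count and dilation rates below are assumptions.

```python
import torch
import torch.nn as nn

class ContextAware(nn.Module):
    def __init__(self, ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # multi-scale context
        return x + self.fuse(feats)   # residual keeps the original local detail

print(ContextAware(32)(torch.randn(1, 32, 40, 40)).shape)  # (1, 32, 40, 40)
```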

26 pages, 3670 KiB  
Article
Video Instance Segmentation Through Hierarchical Offset Compensation and Temporal Memory Update for UAV Aerial Images
by Ying Huang, Yinhui Zhang, Zifen He and Yunnan Deng
Sensors 2025, 25(14), 4274; https://doi.org/10.3390/s25144274 - 9 Jul 2025
Viewed by 198
Abstract
Despite the pivotal role of unmanned aerial vehicles (UAVs) in intelligent inspection tasks, existing video instance segmentation methods struggle with irregular deforming targets, leading to inconsistent segmentation results due to ineffective feature offset capture and temporal correlation modeling. To address this issue, we propose a hierarchical offset compensation and temporal memory update method for video instance segmentation (HT-VIS) with a high generalization ability. Firstly, a hierarchical offset compensation (HOC) module in the form of a sequential and parallel connection is designed to perform deformable offset for the same flexible target across frames, which benefits from compensating for spatial motion features at the time sequence. Next, the temporal memory update (TMU) module is developed by employing convolutional long short-term memory (ConvLSTM) between the current and adjacent frames to establish the temporal dynamic context correlation and update the current frame feature effectively. Finally, extensive experimental results demonstrate the superiority of the proposed HT-VIS method when applied to the public YouTubeVIS-2019 dataset and a self-built UAV-VIS segmentation dataset. On four typical datasets (i.e., Zoo, Street, Vehicle, and Sport) extracted from YouTubeVIS-2019 according to category characteristics, the proposed HT-VIS outperforms the state-of-the-art CNN-based VIS method CrossVIS by 3.9%, 2.0%, 0.3%, and 3.8% in average segmentation accuracy, respectively. On the self-built UAV-VIS dataset, our HT-VIS with PHOC surpasses the baseline SipMask by 2.1% and achieves the highest average segmentation accuracy of 37.4% among the CNN-based methods, demonstrating the effectiveness and robustness of our proposed framework.
(This article belongs to the Section Sensing and Imaging)
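The TMU's use of ConvLSTM between adjacent frames follows the standard ConvLSTM cell: hidden and cell states carry context across frames and update the current-frame feature. A minimal sketch with assumed channel sizes:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        # input, forget, output, and candidate gates from one convolution
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# run a short clip frame by frame, updating the temporal memory
cell = ConvLSTMCell(in_ch=64, hid_ch=64)
h = c = torch.zeros(1, 64, 32, 32)
for frame_feat in torch.randn(5, 1, 64, 32, 32):   # 5 stand-in frame features
    out, (h, c) = cell(frame_feat, (h, c))
print(out.shape)  # updated current-frame feature
```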

15 pages, 1662 KiB  
Article
YOLO-HVS: Infrared Small Target Detection Inspired by the Human Visual System
by Xiaoge Wang, Yunlong Sheng, Qun Hao, Haiyuan Hou and Suzhen Nie
Biomimetics 2025, 10(7), 451; https://doi.org/10.3390/biomimetics10070451 - 8 Jul 2025
Viewed by 326
Abstract
To address the challenges of background interference and limited multi-scale feature extraction in infrared small target detection, this paper proposes YOLO-HVS, a detection algorithm inspired by the human visual system. Based on YOLOv8, we design a multi-scale spatially enhanced attention module (MultiSEAM) using multi-branch depth-separable convolution to suppress background noise and enhance occluded targets, integrating local details and global context. Meanwhile, the C2f_DWR (dilation-wise residual) module with a regional-semantic dual residual structure is designed to significantly improve the efficiency of capturing multi-scale contextual information through dilated convolution and a two-step feature extraction mechanism. We construct the DroneRoadVehicles dataset containing 1028 infrared images captured at 70–300 m, covering complex occlusion and multi-scale targets. Experiments show that YOLO-HVS achieves an mAP50 of 83.4% on the public DroneVehicle dataset and 97.8% on the self-built dataset, improvements of 1.1% and 0.7% over the baseline YOLOv8, while the number of model parameters increases by only 2.3 M and GFLOPs by only 0.1 G. The experimental results demonstrate that the proposed approach exhibits enhanced robustness in detecting targets under severe occlusion and low SNR conditions, while enabling efficient real-time infrared small target detection.
(This article belongs to the Special Issue Advanced Biologically Inspired Vision and Its Application)
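The regional-then-semantic two-step extraction behind the C2f_DWR description suggests the following sketch: a first 3x3 convolution extracts regional features, parallel dilated depthwise convolutions gather multi-scale context, and a residual merges them back. The rates and widths are assumptions.

```python
import torch
import torch.nn as nn

class DWRBlock(nn.Module):
    def __init__(self, ch: int, rates=(1, 3, 5)):
        super().__init__()
        self.regional = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.semantic = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r, groups=ch) for r in rates)
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        reg = self.regional(x)                                   # step 1: regional features
        sem = torch.cat([m(reg) for m in self.semantic], dim=1)  # step 2: dilated context
        return x + self.fuse(sem)

print(DWRBlock(32)(torch.randn(1, 32, 20, 20)).shape)  # (1, 32, 20, 20)
```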

20 pages, 1211 KiB  
Article
Unsupervised Anomaly Detection with Continuous-Time Model for Pig Farm Environmental Data
by Heng Zhou, Seyeon Chung, Malik Muhammad Waqar, Muhammad Ibrahim Zain Ul Abideen, Arsalan Ahmad, Muhammad Ans Ilyas, Hyongsuk Kim and Sangcheol Kim
Agriculture 2025, 15(13), 1419; https://doi.org/10.3390/agriculture15131419 - 30 Jun 2025
Viewed by 372
Abstract
Environmental air anomaly detection is crucial for ensuring the healthy growth of livestock in smart pig farming systems. This study focuses on four key environmental variables within pig housing: temperature, relative humidity, carbon dioxide concentration, and ammonia concentration. Based on these variables, it proposes a novel encoder–decoder architecture for anomaly detection based on continuous-time models. The proposed framework consists of two embedding layers, an encoder module built around a continuous-time neural network, and a decoder composed of multilayer perceptrons. The model is trained in a self-supervised manner and optimized using a reconstruction-based loss function. Extensive experiments are conducted on a multivariate multi-sequence dataset collected from real-world pig farming environments. Experimental results show that the proposed architecture significantly outperforms existing transformer-based methods, achieving 92.39% accuracy, 92.08% precision, 85.84% recall, and an F1 score of 88.19%. These findings highlight the practical value of accurate anomaly detection in smart farming systems; timely identification of environmental irregularities enables proactive intervention, reduces animal stress, minimizes disease risk, and ultimately improves the sustainability and productivity of livestock operations.
(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)
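The reconstruction-based criterion generalizes beyond the specific encoder: a model trained only on normal windows flags a window whose reconstruction error exceeds a threshold calibrated on normal data. A minimal sketch with a GRU standing in for the continuous-time encoder; the stand-in and the 99th-percentile rule are assumptions.

```python
import torch
import torch.nn as nn

class ReconAE(nn.Module):
    def __init__(self, n_vars=4, hid=32):
        super().__init__()
        self.enc = nn.GRU(n_vars, hid, batch_first=True)
        self.dec = nn.Sequential(nn.Linear(hid, hid), nn.ReLU(), nn.Linear(hid, n_vars))

    def forward(self, x):                  # x: (B, T, 4) temp/RH/CO2/NH3 windows
        h, _ = self.enc(x)
        return self.dec(h)                 # per-step reconstruction

model = ReconAE()
train = torch.randn(64, 48, 4)             # normal windows only (self-supervised)
err = (model(train) - train).pow(2).mean(dim=(1, 2))
threshold = err.quantile(0.99)             # calibrate threshold on normal data
test = torch.randn(8, 48, 4)
scores = (model(test) - test).pow(2).mean(dim=(1, 2))
print((scores > threshold).tolist())       # True = flagged anomaly
```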

15 pages, 770 KiB  
Data Descriptor
NPFC-Test: A Multimodal Dataset from an Interactive Digital Assessment Using Wearables and Self-Reports
by Luis Fernando Morán-Mirabal, Luis Eduardo Güemes-Frese, Mariana Favarony-Avila, Sergio Noé Torres-Rodríguez and Jessica Alejandra Ruiz-Ramirez
Data 2025, 10(7), 103; https://doi.org/10.3390/data10070103 - 30 Jun 2025
Viewed by 342
Abstract
The growing implementation of digital platforms and mobile devices in educational environments has generated the need to explore new approaches for evaluating the learning experience beyond traditional self-reports or instructor presence. In this context, the NPFC-Test dataset was created from an experimental protocol conducted at the Experiential Classroom of the Institute for the Future of Education. The dataset was built by collecting multimodal indicators such as neuronal, physiological, and facial data using a portable EEG headband, a medical-grade biometric bracelet, a high-resolution depth camera, and self-report questionnaires. The participants were exposed to a digital test lasting 20 min, composed of audiovisual stimuli and cognitive challenges, during which synchronized data from all devices were gathered. The dataset includes timestamped records related to emotional valence, arousal, and concentration, offering a valuable resource for multimodal learning analytics (MMLA). The recorded data were processed through calibration procedures, temporal alignment techniques, and emotion recognition models. It is expected that the NPFC-Test dataset will support future studies in human–computer interaction and educational data science by providing structured evidence to analyze cognitive and emotional states in learning processes. In addition, it offers a replicable framework for capturing synchronized biometric and behavioral data in controlled academic settings.
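Temporal alignment of such streams typically reduces to nearest-timestamp joins onto a common clock with a bounded tolerance. A minimal sketch with pandas; the column names and sampling rates are illustrative, not the NPFC-Test schema.

```python
import pandas as pd

# stand-in streams: a fast EEG-derived signal and a slower bracelet signal
eeg = pd.DataFrame({"t": pd.date_range("2025-01-01", periods=20, freq="100ms"),
                    "eeg_attention": range(20)})
ppg = pd.DataFrame({"t": pd.date_range("2025-01-01", periods=10, freq="250ms"),
                    "heart_rate": range(70, 80)})

# nearest-timestamp join, dropping matches farther apart than 200 ms
aligned = pd.merge_asof(eeg.sort_values("t"), ppg.sort_values("t"),
                        on="t", direction="nearest",
                        tolerance=pd.Timedelta("200ms"))
print(aligned.head())
```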

27 pages, 5780 KiB  
Article
Utilizing GCN-Based Deep Learning for Road Extraction from Remote Sensing Images
by Yu Jiang, Jiasen Zhao, Wei Luo, Bincheng Guo, Zhulin An and Yongjun Xu
Sensors 2025, 25(13), 3915; https://doi.org/10.3390/s25133915 - 23 Jun 2025
Viewed by 478
Abstract
The technology of road extraction serves as a crucial foundation for urban intelligent renewal and green sustainable development. Its outcomes can optimize transportation network planning, reduce resource waste, and enhance urban resilience. Deep learning-based approaches have demonstrated outstanding performance in road extraction, particularly excelling in complex scenarios. However, extracting roads from remote sensing data remains challenging due to several factors that limit accuracy: (1) Roads often share similar visual features with the background, such as rooftops and parking lots, leading to ambiguous inter-class distinctions; (2) Roads in complex environments, such as those occluded by shadows or trees, are difficult to detect. To address these issues, this paper proposes an improved model based on Graph Convolutional Networks (GCNs), named FR-SGCN (Hierarchical Depth-wise Separable Graph Convolutional Network Incorporating Graph Reasoning and Attention Mechanisms). The model is designed to enhance the precision and robustness of road extraction through intelligent techniques, thereby supporting precise planning of green infrastructure. First, high-dimensional features are extracted using ResNeXt, whose grouped convolution structure balances parameter efficiency and feature representation capability, significantly enhancing the expressiveness of the data. These high-dimensional features are then segmented, and enhanced channel and spatial features are obtained via attention mechanisms, effectively mitigating background interference and intra-class ambiguity. Subsequently, a hybrid adjacency matrix construction method is proposed, based on gradient operators and graph reasoning. This method integrates similarity and gradient information and employs graph convolution to capture the global contextual relationships among features. To validate the effectiveness of FR-SGCN, we conducted comparative experiments using 12 different methods on both a self-built dataset and a public dataset. The proposed model achieved the highest F1 score on both datasets. Visualization results from the experiments demonstrate that the model effectively extracts occluded roads and reduces the risk of redundant construction caused by data errors during urban renewal. This provides reliable technical support for smart cities and sustainable development.
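The hybrid adjacency construction (similarity plus gradient information, followed by graph convolution) can be sketched as follows; the blend weight and the per-node gradient descriptors are assumptions.

```python
import torch
import torch.nn as nn

def hybrid_adjacency(feats, grads, alpha=0.5):
    """feats, grads: (N, D) per-node feature and gradient descriptors."""
    sim = torch.softmax(feats @ feats.T / feats.shape[1] ** 0.5, dim=-1)
    grad_aff = torch.softmax(-torch.cdist(grads, grads), dim=-1)  # similar gradients -> high affinity
    return alpha * sim + (1 - alpha) * grad_aff

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        return torch.relu(self.lin(adj @ x))   # aggregate over the graph, then transform

x, g = torch.randn(100, 64), torch.randn(100, 8)  # 100 stand-in nodes
adj = hybrid_adjacency(x, g)
print(GCNLayer(64, 64)(x, adj).shape)  # (100, 64)
```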

27 pages, 21013 KiB  
Article
Improved YOLO-Goose-Based Method for Individual Identification of Lion-Head Geese and Egg Matching: Methods and Experimental Study
by Hengyuan Zhang, Zhenlong Wu, Tiemin Zhang, Canhuan Lu, Zhaohui Zhang, Jianzhou Ye, Jikang Yang, Degui Yang and Cheng Fang
Agriculture 2025, 15(13), 1345; https://doi.org/10.3390/agriculture15131345 - 23 Jun 2025
Viewed by 530
Abstract
The Lion-Headed Goose is a crucial characteristic waterfowl breed whose egg-laying performance serves as a core indicator for precision breeding. Under large-scale flat rearing and selection practices, high phenotypic similarity among individuals within the same pedigree, combined with the reliance of traditional manual observation and existing automation systems on fixed nesting boxes or RFID tags, has made accurate goose–egg matching in dynamic environments difficult, leading to inefficient individual selection. To address this, this study proposes YOLO-Goose, an improved YOLOv8s-based method, which designs five high-contrast neck rings (DoubleBar, Circle, Dot, Fence, Cylindrical) as individual identifiers. The method constructs a lightweight model with a small-object detection layer, integrates the GhostNet backbone to reduce the parameter count by 67.2%, and employs the GIoU loss function to optimize neck ring localization accuracy. Experimental results show that the model achieves an F1 score of 93.8% and an mAP50 of 96.4% on the self-built dataset, increases of 10.1% and 5% over the original YOLOv8s, with a 27.1% reduction in computational load. The dynamic matching algorithm, incorporating spatiotemporal trajectories and egg positional data, achieves a 95% matching rate, a 94.7% matching accuracy, and a 5.3% mismatching rate. Through lightweight deployment using TensorRT, inference speed is improved 1.4-fold over the PyTorch-1.12.1 implementation, with detection results uploaded to a cloud database in real time. This solution overcomes the technical bottleneck of individual selection in flat rearing environments, providing an innovative computer-vision-based approach for precision breeding of pedigree Lion-Headed Geese and offering significant engineering value for advancing intelligent waterfowl breeding.
(This article belongs to the Special Issue Computer Vision Analysis Applied to Farm Animals)
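The dynamic matching step pairs each egg with the goose whose tracked trajectory passed closest to the egg's position in a time window around the estimated laying time. A minimal sketch, with the window length and distance gate as assumed parameters:

```python
import numpy as np

def match_egg(egg_xy, egg_t, tracks, window=60.0, max_dist=0.5):
    """tracks: {goose_id: array of (t, x, y)}; distances in meters, time in seconds."""
    best_id, best_d = None, np.inf
    for gid, tr in tracks.items():
        near = tr[np.abs(tr[:, 0] - egg_t) <= window]        # temporal gate
        if len(near) == 0:
            continue
        d = np.linalg.norm(near[:, 1:] - egg_xy, axis=1).min()
        if d < best_d:
            best_id, best_d = gid, d
    return best_id if best_d <= max_dist else None            # spatial gate

# toy trajectories keyed by neck-ring identity
tracks = {"Circle": np.array([[0, 1.0, 1.0], [30, 1.2, 0.9]]),
          "Dot": np.array([[0, 3.0, 3.0], [30, 2.8, 3.1]])}
print(match_egg(np.array([1.1, 1.0]), egg_t=25.0, tracks=tracks))  # -> "Circle"
```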

28 pages, 1607 KiB  
Article
Self-Supervised Keypoint Learning for the Geometric Analysis of Road-Marking Templates
by Chayanon Sub-r-pa and Rung-Ching Chen
Algorithms 2025, 18(7), 379; https://doi.org/10.3390/a18070379 - 23 Jun 2025
Viewed by 250
Abstract
Robust visual perception and geometric alignment are crucial for intelligent automation in various domains, such as industrial processes and infrastructure monitoring. Accurately aligning structured visual elements, such as floor markings or road-marking templates, is essential for tasks like automated guidance, verification, and condition assessment. However, traditional feature-based methods struggle with templates that feature simple geometries and lack rich textures, making reliable feature matching and alignment difficult, even under controlled conditions. To address this, we propose GeoTemplateKPNet, a novel self-supervised deep-learning framework, built upon Convolutional Neural Networks (CNNs), designed to learn robust, geometrically consistent keypoints for synthetic template images. The model is trained exclusively on a synthetic template dataset by enforcing equivariance to geometric transformations and utilizing self-supervised losses, including an inside mask loss, a peakiness loss, a repulsion loss, and a keypoint-driven image reprojection loss, thereby eliminating the need for manual keypoint annotations. We evaluate the method on a synthetic template test set, using metrics such as a keypoint-matching comparison, the Inside Mask Rate (IMR), and the Alignment Reconstruction Error (ARE). The results demonstrate that GeoTemplateKPNet successfully learns to predict meaningful keypoints on template structures, enabling accurate alignment between templates and their transformed counterparts. Ablation studies reveal that the number of keypoints (K) impacts the performance, with K = 3 providing the most suitable balance for the overall alignment accuracy, although the performance varies across different template geometries. GeoTemplateKPNet offers a foundational self-supervised solution for the robust geometric analysis of templates, which is crucial for downstream alignment tasks and applications.
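Two of the named losses have natural minimal readings: repulsion penalizes keypoints collapsing onto each other, and peakiness rewards sharp heatmap maxima. The formulations below are illustrative assumptions, not the paper's equations.

```python
import torch

def repulsion_loss(kpts, margin=0.2):
    """kpts: (B, K, 2) normalized keypoint coordinates in [0, 1]."""
    d = torch.cdist(kpts, kpts)                        # (B, K, K) pairwise distances
    K = kpts.shape[1]
    off_diag = ~torch.eye(K, dtype=torch.bool, device=kpts.device)
    return torch.relu(margin - d[:, off_diag]).mean()  # penalize pairs closer than margin

def peakiness_loss(heatmaps):
    """heatmaps: (B, K, H, W); sharp peaks -> max far above the mean response."""
    flat = heatmaps.flatten(2)
    return (1 - (flat.max(dim=2).values - flat.mean(dim=2))).mean()

kpts = torch.rand(4, 3, 2, requires_grad=True)   # K = 3, as in the ablation
hm = torch.rand(4, 3, 32, 32, requires_grad=True)
print(repulsion_loss(kpts).item(), peakiness_loss(hm).item())
```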
