Search Results (997)

Search Parameters:
Keywords = multi-view features

30 pages, 21308 KB  
Article
Angle-Controllable SAR Image Generation and Target Recognition via StyleGAN2
by Ran Yang, Bo Wang, Tao Lai and Haifeng Huang
Remote Sens. 2025, 17(20), 3478; https://doi.org/10.3390/rs17203478 - 18 Oct 2025
Viewed by 102
Abstract
Due to the inherent characteristics of synthetic aperture radar (SAR) imaging, variations in target orientation, and the challenges posed by non-cooperative targets (i.e., targets without cooperative transponders or external markers), limited viewpoint coverage results in a small-sample problem that severely constrains the application of deep learning to SAR image interpretation and target recognition. To address this issue, this paper proposes a multi-target, multi-view SAR image generation method based on conditional information and StyleGAN2, designed to generate high-quality, angle-controllable SAR images of typical targets from limited samples. The proposed framework consists of an angle encoder, a generator, and a discriminator. The angle encoder employs a sinusoidal encoding scheme that combines sine and cosine functions to address the discontinuity inherent in one-hot angle encoding, thereby enabling precise angle control. Moreover, the integration of SimAM and IAAM attention mechanisms enhances image quality, facilitates accurate angle control, and improves the network’s generalization to untrained angles. Experiments conducted on a self-constructed dataset of typical civilian targets and the SAMPLE subset of the MSTAR dataset demonstrate that the proposed method outperforms existing baselines in terms of structural fidelity and feature distribution consistency. The generated images achieve a minimum FID of 6.541 and a maximum MS-SSIM of 0.907, while target recognition accuracy improves by 6.03% and 7.14%, respectively. These results validate the feasibility and effectiveness of the proposed approach for SAR image generation and target recognition tasks.
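The angle encoder's key idea, replacing discontinuous one-hot angle labels with a smooth sine/cosine code, can be sketched in a few lines. This is a minimal illustration of the encoding idea only, not the authors' exact encoder; the multi-frequency depth `num_freqs` is an assumption:

```python
import numpy as np

def encode_angle(theta_deg: float, num_freqs: int = 4) -> np.ndarray:
    """Map an azimuth angle to a smooth sin/cos embedding.

    Unlike one-hot encoding, this representation is continuous across
    0/360 degrees, so nearby angles receive nearby codes.
    """
    theta = np.deg2rad(theta_deg)
    feats = []
    for k in range(num_freqs):
        w = 2.0 ** k  # doubling frequencies, an assumed design choice
        feats.extend([np.sin(w * theta), np.cos(w * theta)])
    return np.asarray(feats, dtype=np.float32)

# 359 deg and 1 deg get close codes, while one-hot codes would be orthogonal.
print(np.linalg.norm(encode_angle(359.0) - encode_angle(1.0)))    # small
print(np.linalg.norm(encode_angle(359.0) - encode_angle(180.0)))  # large
```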

18 pages, 3666 KB  
Article
Reinforcement Learning Enabled Intelligent Process Monitoring and Control of Wire Arc Additive Manufacturing
by Allen Love, Saeed Behseresht and Young Ho Park
J. Manuf. Mater. Process. 2025, 9(10), 340; https://doi.org/10.3390/jmmp9100340 - 18 Oct 2025
Viewed by 145
Abstract
Wire Arc Additive Manufacturing (WAAM) has been recognized as an efficient and cost-effective metal additive manufacturing technique due to its high deposition rate and scalability for large components. However, the quality and repeatability of WAAM parts are highly sensitive to process parameters such as arc voltage, current, wire feed rate, and torch travel speed, requiring advanced monitoring and adaptive control strategies. In this study, a vision-based monitoring system integrated with a reinforcement learning framework was developed to enable intelligent in situ control of WAAM. A custom optical assembly employing mirrors and a bandpass filter allowed simultaneous top and side views of the melt pool, enabling real-time measurement of layer height and width. These geometric features provide feedback to a tabular Q-learning algorithm, which adaptively adjusts voltage and wire feed rate through direct hardware-level control of stepper motors. Experimental validation across multiple builds with varying initial conditions demonstrated that the RL controller stabilized layer geometry, autonomously recovered from process disturbances, and maintained bounded oscillations around target values. While systematic offsets between digital measurements and physical dimensions highlight calibration challenges inherent to vision-based systems, the controller consistently prevented uncontrolled drift and corrected large deviations in deposition quality. The computational efficiency of tabular Q-learning enabled real-time operation on standard hardware without specialized equipment, demonstrating an accessible approach to intelligent process control. These results establish the feasibility of reinforcement learning as a robust, data-efficient control technique for WAAM, capable of real-time adaptation with minimal prior process knowledge. With improved calibration methods and expanded multi-physics sensing, this framework can advance toward precise geometric accuracy and support broader adoption of machine learning-based process monitoring and control in metal additive manufacturing.
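The controller described here is classic tabular Q-learning. A minimal sketch under assumed discretizations: the real state combines measured layer height and width and the actions step voltage and wire feed rate, whereas the bins, toy dynamics, and reward below are placeholders:

```python
import random
from collections import defaultdict

# Actions: (voltage step, wire-feed step); states: discretized geometry-error bins.
ACTIONS = [(dv, df) for dv in (-1, 0, 1) for df in (-1, 0, 1)]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # Q[(state, action_index)], zero-initialized table

def choose_action(state):
    """Epsilon-greedy action selection over the tabular Q-function."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning backup: Q += alpha * (r + gamma * max_a' Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy demo: drive a scalar layer-height error bin toward zero.
state = 3
for _ in range(500):
    a = choose_action(state)
    nxt = max(-3, min(3, state + ACTIONS[a][0]))  # placeholder bead dynamics
    update(state, a, -abs(nxt), nxt)              # reward favors zero error
    state = nxt
print(state)  # typically settles near 0 after learning
```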

20 pages, 11855 KB  
Article
High-Precision Extrinsic Calibration for Multi-LiDAR Systems with Narrow FoV via Synergistic Planar and Circular Features
by Xinbao Sun, Zhi Zhang, Shuo Xu and Jinyue Liu
Sensors 2025, 25(20), 6432; https://doi.org/10.3390/s25206432 - 17 Oct 2025
Viewed by 283
Abstract
Precise extrinsic calibration is a fundamental prerequisite for data fusion in multi-LiDAR systems. However, conventional methods are often encumbered by dependencies on initial estimates, auxiliary sensors, or manual feature selection, which renders them complex, time-consuming, and limited in adaptability across diverse environments. To address these limitations, this paper proposes a novel, high-precision extrinsic calibration method for multi-LiDAR systems with a narrow Field of View (FoV), achieved through the synergistic use of circular and planar features. Our approach commences with the automatic segmentation of the calibration target’s point cloud using an improved VoxelNet. Subsequently, a denoising step, combining RANSAC and a Gaussian Mean Intensity Filter (GMIF), is applied to ensure high-quality feature extraction. From the refined point cloud, planar and circular features are robustly extracted via Principal Component Analysis (PCA) and least-squares fitting, respectively. Finally, the extrinsic parameters are optimized by minimizing a nonlinear objective function formulated with joint constraints from both geometric features. Simulation results validate the high precision of our method, with rotational and translational errors contained within 0.08° and 0.8 cm. Furthermore, real-world experiments confirm its effectiveness and superiority, outperforming conventional point-cloud registration techniques.
(This article belongs to the Section Sensors and Robotics)
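The two feature extractors named in the abstract, PCA plane fitting and least-squares circle fitting, are standard and easy to sketch; the segmentation and denoising stages are omitted here, and the toy data is for illustration only:

```python
import numpy as np

def fit_plane_pca(pts):
    """Fit a plane to Nx3 points: normal = direction of smallest variance."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]          # point on plane, unit normal

def fit_circle_2d(xy):
    """Algebraic least-squares circle fit: solve x^2+y^2 = 2a*x + 2b*y + c."""
    A = np.column_stack([2 * xy[:, 0], 2 * xy[:, 1], np.ones(len(xy))])
    rhs = (xy ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r = np.sqrt(c + a ** 2 + b ** 2)  # since c = r^2 - a^2 - b^2
    return np.array([a, b]), r        # center, radius

# Toy check: a noisy circle of radius 0.5 lying in the z = 0 plane.
t = np.linspace(0, 2 * np.pi, 100)
pts = np.column_stack([0.5 * np.cos(t), 0.5 * np.sin(t), np.zeros_like(t)])
pts += 0.005 * np.random.randn(*pts.shape)
print(fit_plane_pca(pts)[1])      # ~ [0, 0, 1] up to sign
print(fit_circle_2d(pts[:, :2]))  # center ~ (0, 0), radius ~ 0.5
```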

22 pages, 1678 KB  
Article
Image Completion Network Considering Global and Local Information
by Yubo Liu, Ke Chen and Alan Penn
Buildings 2025, 15(20), 3746; https://doi.org/10.3390/buildings15203746 - 17 Oct 2025
Viewed by 129
Abstract
Accurate depth image inpainting in complex urban environments remains a critical challenge due to occlusions, reflections, and sensor limitations, which often result in significant data loss. We propose a hybrid deep learning framework that explicitly combines local and global modelling through Convolutional Neural Networks (CNNs) and Transformer modules. The model employs a multi-branch parallel architecture, where the CNN branch captures fine-grained local textures and edges, while the Transformer branch models global semantic structures and long-range dependencies. We introduce an optimized attention mechanism, Agent Attention, which differs from existing efficient/linear attention methods by using learnable proxy tokens tailored for urban scene categories (e.g., façades, sky, ground). A content-guided dynamic fusion module adaptively combines multi-scale features to enhance structural alignment and texture recovery. The framework is trained with a composite loss function incorporating pixel accuracy, perceptual similarity, adversarial realism, and structural consistency. Extensive experiments on the Paris StreetView dataset demonstrate that the proposed method achieves state-of-the-art performance, outperforming existing approaches in PSNR, SSIM, and LPIPS metrics. The study highlights the potential of multi-scale modeling for urban depth inpainting and discusses challenges in real-world deployment, ethical considerations, and future directions for multimodal integration.
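A generic sketch of the agent-attention pattern the abstract describes: a small set of learnable proxy tokens first summarizes the sequence, then every position reads from those summaries, cutting cost from O(N²) to O(N·M). The paper ties its proxy tokens to urban scene categories; the version below is category-agnostic, and all dimensions are assumptions:

```python
import torch
import torch.nn.functional as F
from torch import nn

class AgentAttention(nn.Module):
    """Attention routed through a small set of learnable proxy (agent) tokens."""
    def __init__(self, dim: int, num_agents: int = 8):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(num_agents, dim) / dim ** 0.5)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                        # x: (B, N, dim)
        B = x.shape[0]
        agents = self.agents.expand(B, -1, -1)   # (B, M, dim)
        # Step 1: agents aggregate the full sequence.
        attn1 = F.softmax(agents @ self.k(x).transpose(1, 2) * self.scale, dim=-1)
        summary = attn1 @ self.v(x)              # (B, M, dim)
        # Step 2: each position queries the agent summaries.
        attn2 = F.softmax(self.q(x) @ agents.transpose(1, 2) * self.scale, dim=-1)
        return attn2 @ summary                   # (B, N, dim)

out = AgentAttention(64)(torch.randn(2, 100, 64))
print(out.shape)  # torch.Size([2, 100, 64])
```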

20 pages, 18957 KB  
Article
Multi-Modal Data Fusion for 3D Object Detection Using Dual-Attention Mechanism
by Mengying Han, Benlan Shen and Jiuhong Ruan
Sensors 2025, 25(20), 6360; https://doi.org/10.3390/s25206360 - 14 Oct 2025
Viewed by 437
Abstract
To address the issue of missing feature information for small objects caused by the sparsity and irregularity of point clouds, as well as the poor detection performance on small objects due to their weak feature representation, this paper proposes a multi-modal 3D object detection method based on an improved PointPillars framework. First, LiDAR point clouds are fused with camera images at the data level, incorporating 2D semantic information to enhance small-object feature representation. Second, a Pillar-wise Channel Attention (PCA) module is introduced to emphasize critical features before converting pillar features into pseudo-image representations. Additionally, a Spatial Attention Module (SAM) is embedded into the backbone network to enhance spatial feature representation. Experiments on the KITTI dataset show that, compared with the baseline PointPillars, the proposed method significantly improves small-object detection performance. Specifically, under the bird’s-eye view (BEV) evaluation metrics, the Average Precision (AP) for pedestrians and cyclists increases by 7.06% and 3.08%, respectively; under the 3D evaluation metrics, these improvements are 4.36% and 2.58%. Compared with existing methods, the improved model also achieves relatively higher accuracy in detecting small objects. Visualization results further demonstrate the enhanced detection capability of the proposed method for small objects with different difficulty levels. Overall, the proposed approach effectively improves 3D object detection performance, particularly for small objects, in complex autonomous driving scenarios.
(This article belongs to the Section Sensing and Imaging)
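The abstract does not detail the Pillar-wise Channel Attention module's internals; one plausible reading is a squeeze-and-excitation-style reweighting over pillar feature channels, sketched here with assumed dimensions rather than the paper's exact design:

```python
import torch
from torch import nn

class PillarChannelAttention(nn.Module):
    """SE-style gate: pool channel statistics over pillars, reweight channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, pillars):   # pillars: (P, C), one feature vector per pillar
        weights = self.mlp(pillars.mean(dim=0, keepdim=True))  # global stats
        return pillars * weights  # emphasize informative channels

feats = torch.randn(12000, 64)   # e.g. 12k pillars, 64 channels (assumed sizes)
print(PillarChannelAttention(64)(feats).shape)
```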

29 pages, 6794 KB  
Article
An Attitude Estimation Method for Space Targets Based on the Selection of Multi-View ISAR Image Sequences
by Junzhi Li, Xin Ning, Dou Sun and Rongzhen Du
Remote Sens. 2025, 17(20), 3432; https://doi.org/10.3390/rs17203432 - 14 Oct 2025
Viewed by 298
Abstract
Multi-view inverse synthetic aperture radar (ISAR) image sequences provide multi-dimensional observation information about space targets, enabling precise attitude estimation that is fundamental to both non-cooperative target monitoring and critical space operations including active debris removal and space collision avoidance. However, directly utilizing all images within an ISAR sequence for attitude estimation can result in a substantial data preprocessing workload and reduced algorithm efficiency. Given the inherent overlap and redundancy in the target information provided by these ISAR images, this paper proposes a novel space target attitude estimation method based on the selection of multi-view ISAR image sequences. The proposed method begins by establishing an ISAR imaging projection model, then characterizes differences in target information through variations in the imaging-plane normal, and proposes an image selection method based on uniform sampling across the elevation and azimuth angles of the imaging-plane normal. On this basis, the method utilizes a high-resolution network (HRNet) to extract the feature points of typical components of the space target, enabling simultaneous feature point extraction and matching association within ISAR images. The attitude estimation problem is subsequently modeled as an unconstrained optimization problem. Finally, the particle swarm optimization (PSO) algorithm is employed to solve this optimization problem, thereby achieving accurate attitude estimation of the space target. Experimental results demonstrate that the proposed methodology effectively filters image data, significantly reducing the number of images required while maintaining high attitude estimation accuracy. The method provides a more informative sequence than conventional selection strategies, and the tailored HRNet + PSO estimator resists performance degradation in sparse-data conditions, thereby ensuring robust overall performance.
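The selection step, uniform sampling over the elevation and azimuth of the imaging-plane normal, can be illustrated directly. This is a minimal sketch; the bin counts and first-per-bin policy are assumptions:

```python
import numpy as np

def select_views(normals, n_el_bins=6, n_az_bins=12):
    """Keep one frame per (elevation, azimuth) bin of the imaging-plane normal.

    normals: (N, 3) unit vectors, one per ISAR frame.
    Returns indices of the retained frames.
    """
    el = np.arcsin(np.clip(normals[:, 2], -1, 1))   # elevation angle
    az = np.arctan2(normals[:, 1], normals[:, 0])   # azimuth angle
    el_bin = np.digitize(el, np.linspace(-np.pi / 2, np.pi / 2, n_el_bins + 1)[1:-1])
    az_bin = np.digitize(az, np.linspace(-np.pi, np.pi, n_az_bins + 1)[1:-1])
    keep = {}
    for i, key in enumerate(zip(el_bin, az_bin)):
        keep.setdefault(key, i)                     # first frame seen in each bin
    return sorted(keep.values())

normals = np.random.randn(500, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(len(select_views(normals)), "of 500 frames retained")
```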

24 pages, 3661 KB  
Article
Real-Time Occluded Target Detection and Collaborative Tracking Method for UAVs
by Yandi Ai, Ruolong Li, Chaoqian Xiang and Xin Liang
Electronics 2025, 14(20), 4034; https://doi.org/10.3390/electronics14204034 - 14 Oct 2025
Viewed by 273
Abstract
To address the failure of unmanned aerial vehicle (UAV) target tracking caused by occlusion and limited field of view in dense low-altitude obstacle environments, this paper proposes a novel framework integrating occlusion-aware modeling and multi-UAV collaboration. A lightweight tracking model based on the Mamba backbone is developed, incorporating a Dilated Wavelet Receptive Field Enhancement Module (DWRFEM) to fuse multi-scale contextual features, significantly mitigating contour fragmentation and feature degradation under severe occlusion. A dual-branch feature optimization architecture is designed, combining the Distilled Tanh Activation with Context (DiTAC) activation function and Kolmogorov–Arnold Network (KAN) bottleneck layers to enhance discriminative feature representation. To overcome the limitations of single-UAV perception, a multi-UAV cooperative system is established. Ray intersection is employed to reduce localization uncertainty, while spherical sampling viewpoints are dynamically generated based on obstacle density. Safe trajectory planning is achieved using a Crested Porcupine Optimizer (CPO). Experiments on the Multi-Drone Multi-Target Tracking (MDMT) dataset demonstrate that the model achieves 84.1% average precision (AP) at 95 Frames Per Second (FPS), striking a favorable balance between speed and accuracy, making it suitable for edge deployment. Field tests with three collaborative UAVs show sustained target coverage in complex environments, outperforming traditional single-UAV approaches. This study provides a systematic solution for robust tracking in challenging low-altitude scenarios.
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
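The ray-intersection step for cooperative localization has a standard closed-form least-squares solution, sketched here under the assumption that each UAV contributes a known position and a bearing direction toward the target:

```python
import numpy as np

def intersect_rays(origins, dirs):
    """Least-squares point closest to a set of 3D rays (origin + t * direction).

    Solves sum_i (I - d_i d_i^T)(x - p_i) = 0, the normal equations of
    minimizing the squared distance from x to each line.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, dirs):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector orthogonal to the ray
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

# Three UAVs at different positions all sighting the target at (10, 5, 1).
target = np.array([10.0, 5.0, 1.0])
origins = np.array([[0, 0, 30], [20, 0, 25], [10, 20, 28]], dtype=float)
dirs = target - origins
print(intersect_rays(origins, dirs))  # ~ [10, 5, 1]
```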

24 pages, 13555 KB  
Article
A Visual Trajectory-Based Method for Personnel Behavior Recognition in Industrial Scenarios
by Houquan Wang, Tao Song, Zhipeng Xu, Songxiao Cao, Bin Zhou and Qing Jiang
Sensors 2025, 25(20), 6331; https://doi.org/10.3390/s25206331 - 14 Oct 2025
Viewed by 392
Abstract
Accurate recognition of personnel behavior in industrial environments is essential for asset protection and workplace safety, yet complex environmental conditions pose a significant challenge to its accuracy. This paper presents a novel, lightweight framework to address these issues. We first enhance a YOLOv8n model with Receptive Field Attention Convolution (RFAConv) and Efficient Multi-scale Attention (EMA) mechanisms, achieving a 6.9% increase in AP50 and a 4.2% increase in AP50:95 over the baseline. Continuous motion trajectories are then generated using the BOT-SORT algorithm and geometrically corrected via perspective transformation to produce a high-fidelity bird’s-eye view. Finally, a set of discriminative trajectory features is classified using a Random Forest model, attaining F1-scores exceeding 82% for all behaviors on our proprietary industrial dataset. The proposed framework provides a robust and efficient solution for real-time personnel behavior recognition in challenging industrial settings. Future work will focus on exploring more advanced algorithms and validating the framework’s performance on edge devices.
(This article belongs to the Section Sensing and Imaging)
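The final stage, classifying summary statistics of bird's-eye-view trajectories with a Random Forest, can be sketched as follows. The specific features and the toy walking-vs-loitering data are assumptions, not the paper's feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def trajectory_features(track):
    """Summarize a (T, 2) bird's-eye-view track with simple motion statistics."""
    steps = np.diff(track, axis=0)
    speeds = np.linalg.norm(steps, axis=1)
    headings = np.arctan2(steps[:, 1], steps[:, 0])
    turn = np.abs(np.diff(headings))
    return [speeds.mean(), speeds.std(), speeds.max(),
            turn.mean() if len(turn) else 0.0,
            np.linalg.norm(track[-1] - track[0])]  # net displacement

# Toy data: "walking" (directed) vs "loitering" (random-walk) tracks.
rng = np.random.default_rng(0)
walk = [np.cumsum(rng.normal([1, 0], 0.1, (50, 2)), axis=0) for _ in range(40)]
loiter = [np.cumsum(rng.normal(0, 0.3, (50, 2)), axis=0) for _ in range(40)]
X = [trajectory_features(t) for t in walk + loiter]
y = [0] * 40 + [1] * 40
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```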

17 pages, 7451 KB  
Article
An Off-Axis Catadioptric Division of Aperture Optical System for Multi-Channel Infrared Imaging
by Jie Chen, Tong Yang, Hongbo Xie and Lei Yang
Photonics 2025, 12(10), 1008; https://doi.org/10.3390/photonics12101008 - 13 Oct 2025
Viewed by 157
Abstract
Multi-channel optical systems can provide more feature information compared to single-channel systems, making them valuable for optical remote sensing, target identification, and other applications. The division of aperture polarization imaging modality allows for the simultaneous imaging of targets in the same field of view with a single detector. To overcome the limitations of conventional refractive aperture-divided systems for miniaturization, this work proposes an off-axis catadioptric aperture-divided technique for polarization imaging. First, the design method of the off-axis reflective telescope structure is discussed. The relationship between optical parameters such as magnification, surface coefficient, and primary aberration is studied. Second, by establishing the division of the aperture optical model, the method of maximizing the field of view and aperture is determined. Finally, an off-axis catadioptric cooled aperture-divided infrared optical system with a single aperture focal length of 60 mm is shown as a specific design example. Each channel can achieve 100% cold shield efficiency, and the overall length of the telescope module can be decreased significantly. The image quality of each imaging channel is close to the diffraction limit, verifying the effectiveness and feasibility of the method. The proposed off-axis catadioptric aperture-divided design method holds potential applications in simultaneous infrared polarization imaging.
(This article belongs to the Section Optical Interaction Science)

15 pages, 2133 KB  
Article
A LiDAR SLAM and Visual-Servoing Fusion Approach to Inter-Zone Localization and Navigation in Multi-Span Greenhouses
by Chunyang Ni, Jianfeng Cai and Pengbo Wang
Agronomy 2025, 15(10), 2380; https://doi.org/10.3390/agronomy15102380 - 12 Oct 2025
Viewed by 485
Abstract
Greenhouse automation has become increasingly important in facility agriculture, yet multi-span glass greenhouses pose both scientific and practical challenges for autonomous mobile robots. Scientifically, solid-state LiDAR is vulnerable to glass-induced reflections, sparse geometric features, and narrow vertical fields of view, all of which undermine Simultaneous Localization and Mapping (SLAM)-based localization and mapping. Practically, large-scale crop production demands accurate inter-row navigation and efficient rail switching to reduce labor intensity and ensure stable operations. To address these challenges, this study presents an integrated localization-navigation framework for mobile robots in multi-span glass greenhouses. In the intralogistics area, the LiDAR Inertial Odometry-Simultaneous Localization and Mapping (LIO-SAM) pipeline was enhanced with reflection filtering, adaptive feature-extraction thresholds, and improved loop-closure detection, generating high-fidelity three-dimensional maps that were converted into two-dimensional occupancy grids for A-Star global path planning and Dynamic Window Approach (DWA) local control. In the cultivation area, where rails intersect with internal corridors, YOLOv8n-based rail-center detection combined with a pure-pursuit controller established a vision-servo framework for lateral rail switching and inter-row navigation. Field experiments demonstrated that the optimized mapping reduced the mean relative error by 15%. At a navigation speed of 0.2 m/s, the robot achieved a mean lateral deviation of 4.12 cm and a heading offset of 1.79°, while the vision-servo rail-switching system improved efficiency by 25.2%. These findings confirm the proposed framework’s accuracy, robustness, and practical applicability, providing strong support for intelligent facility-agriculture operations.
(This article belongs to the Section Precision and Digital Agriculture)
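Global planning on the derived 2D occupancy grid uses A-Star, which is standard enough to sketch. The grid below is a toy map, and the DWA local controller is omitted:

```python
import heapq

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, 1 = occupied), 4-connected."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]
    came, g_best = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came:
            continue            # already expanded with a better cost
        came[cur] = parent
        if cur == goal:         # walk parent links back to the start
            path = []
            while cur:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and g + 1 < g_best.get(nxt, 1e9)):
                g_best[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, cur))
    return None  # goal unreachable

grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # routes around the wall of 1s
```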

22 pages, 2631 KB  
Article
Adversarial Robustness Evaluation for Multi-View Deep Learning Cybersecurity Anomaly Detection
by Min Li, Yuansong Qiao and Brian Lee
Future Internet 2025, 17(10), 459; https://doi.org/10.3390/fi17100459 - 8 Oct 2025
Viewed by 351
Abstract
In the evolving cyberthreat landscape, a critical challenge for intrusion detection systems (IDSs) lies in defending against meticulously crafted adversarial attacks. Traditional single-view detection frameworks, constrained by their reliance on limited and unidimensional feature representations, are often inadequate for identifying maliciously manipulated samples. To address these limitations, this study proposes a key hypothesis: a detection architecture that adopts a multi-view fusion strategy can significantly enhance the system’s resilience to attacks. To validate the proposed hypothesis, this study developed a multi-view fusion architecture and conducted a series of comparative experiments. A two-pronged validation framework was employed. First, we examined whether the multi-view fusion model demonstrates superior robustness compared to a single-view model in intrusion detection tasks, thereby providing empirical evidence for the effectiveness of multi-view strategies. Second, we evaluated the generalization capability of the multi-view model under varying levels of attack intensity and coverage, assessing its stability in complex adversarial scenarios. Methodologically, a dual-axis training assessment scheme was introduced, comprising (i) continuous gradient testing of perturbation intensity, with the ε parameter increasing from 0.01 to 0.2, and (ii) variation in attack density, with sample contamination rates ranging from 80% to 90%. Adversarial test samples were generated using the Fast Gradient Sign Method (FGSM) on the TON_IoT and UNSW-NB15 datasets. Furthermore, we propose a validation mechanism that integrates both performance and robustness testing. The model is evaluated on clean and adversarial test sets, respectively. By analyzing performance retention and adversarial robustness, we provide a comprehensive assessment of the stability of the multi-view model under varying evaluation conditions. The experimental results provide clear support for the research hypothesis: The multi-view fusion model is more robust than the single-view model under adversarial scenarios. Even under high-intensity attack scenarios, the multi-view model consistently demonstrates superior robustness and stability. More importantly, the multi-view model, through its architectural feature diversity, effectively resists targeted attacks to which the single-view model is vulnerable, confirming the critical role of feature space redundancy in enhancing adversarial robustness.
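FGSM itself is a one-line perturbation. A minimal sketch, with a toy network standing in for the IDS classifier over flow features (an assumption); the ε sweep mirrors the paper's 0.01 to 0.2 range:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: step inputs along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Toy stand-in for an IDS model over 20 flow features (assumed dimensions).
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
for eps in (0.01, 0.05, 0.1, 0.2):
    x_adv = fgsm(model, x, y, eps)
    acc = (model(x_adv).argmax(1) == y).float().mean().item()
    print(f"eps={eps:.2f}  adversarial accuracy={acc:.2f}")
```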

22 pages, 1806 KB  
Article
MAMVCL: Multi-Atlas Guided Multi-View Contrast Learning for Autism Spectrum Disorder Classification
by Zuohao Yin, Feng Xu, Yue Ma, Shuo Huang, Kai Ren and Li Zhang
Brain Sci. 2025, 15(10), 1086; https://doi.org/10.3390/brainsci15101086 - 8 Oct 2025
Viewed by 264
Abstract
Background: Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by significant neurological plasticity in early childhood, where timely interventions like behavioral therapy, language training, and social skills development can mitigate symptoms. Contributions: We introduce a novel Multi-Atlas Guided Multi-View Contrast Learning (MAMVCL) framework for ASD classification, leveraging functional connectivity (FC) matrices from multiple brain atlases to enhance diagnostic accuracy. Methodology: The MAMVCL framework integrates imaging and phenotypic data through a population graph, where node features derive from imaging data, edge indices are based on similarity scoring matrices, and edge weights reflect phenotypic similarities. Graph convolution extracts global field-of-view features. Concurrently, a Target-aware attention aggregator processes FC matrices to capture high-order brain region dependencies, yielding local field-of-view features. To ensure consistency in subject characteristics, we employ a graph contrastive learning strategy that aligns global and local feature representations. Results: Experimental results on the ABIDE-I dataset demonstrate that our model achieves an accuracy of 85.71%, outperforming most existing methods and confirming its effectiveness. Implications: The proposed model demonstrates superior performance in ASD classification, highlighting the potential of multi-atlas and multi-view learning for improving diagnostic precision and supporting early intervention strategies.
(This article belongs to the Special Issue Advances in Emotion Processing and Cognitive Neuropsychology)
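The global/local alignment can be sketched with a standard InfoNCE-style contrastive loss. The paper's exact objective and temperature are not given, so treat this as one common formulation under assumed embedding sizes:

```python
import torch
import torch.nn.functional as F

def contrastive_align(global_z, local_z, tau: float = 0.1):
    """InfoNCE-style loss: pull each subject's global and local embeddings
    together while pushing apart embeddings of different subjects."""
    g = F.normalize(global_z, dim=1)
    l = F.normalize(local_z, dim=1)
    logits = g @ l.t() / tau            # (B, B) cosine-similarity matrix
    targets = torch.arange(len(g))      # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Batch of 16 subjects, 128-dim global and local views (assumed sizes).
loss = contrastive_align(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```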

25 pages, 2213 KB  
Article
Multi-Aligned and Multi-Scale Augmentation for Occluded Person Re-Identification
by Xuan Jiang, Xin Yuan and Xiaolan Yang
Sensors 2025, 25(19), 6210; https://doi.org/10.3390/s25196210 - 7 Oct 2025
Viewed by 416
Abstract
Occluded person re-identification (Re-ID) faces significant challenges, mainly due to the interference of occlusion noise and the scarcity of realistic occluded training data. Although data augmentation is a commonly used solution, the current occlusion augmentation methods suffer from the problem of dual inconsistencies: intra-sample inconsistency is caused by misaligned synthetic occluders (an augmentation operation for simulating real occlusion situations); i.e., randomly pasted occluders ignore spatial prior information and style differences, resulting in unrealistic artifacts that mislead feature learning; inter-sample inconsistency stems from information loss during random cropping (an augmentation operation for simulating occlusion-induced information loss); i.e., single-scale cropping strategies discard discriminative regions, weakening the robustness of the model. To address the aforementioned dual inconsistencies, this study proposes the unified Multi-Aligned and Multi-Scale Augmentation (MA–MSA) framework based on the core principle of “synthetic data should resemble real-world data”. First, the Frequency–Style–Position Data Augmentation (FSPDA) module is designed: it ensures consistency in three aspects (frequency, style, and position) by constructing an occluder library that conforms to the real-world distribution, achieving style alignment via adaptive instance normalization and optimizing the placement of occluders using hierarchical position rules. Second, the Multi-Scale Crop Data Augmentation (MSCDA) strategy is proposed. It eliminates the problem of information loss through multi-scale cropping with non-overlapping ratios and dynamic view fusion. In addition, different from the traditional serial augmentation method, MA–MSA integrates FSPDA and MSCDA in a parallel manner to achieve the collaborative resolution of dual inconsistencies. Extensive experiments on Occluded-Duke and Occluded-REID show that MA–MSA achieves state-of-the-art performance of 73.3% Rank-1 (+1.5%) and 62.9% mAP on Occluded-Duke, and 87.3% Rank-1 (+2.0%) and 82.1% mAP on Occluded-REID, demonstrating superior robustness without auxiliary models.
(This article belongs to the Section Sensing and Imaging)
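Style alignment via adaptive instance normalization, the mechanism FSPDA uses to match occluder and scene statistics, reduces to renormalizing per-channel mean and variance. A minimal sketch with assumed tensor shapes:

```python
import torch

def adain(occluder, scene, eps: float = 1e-5):
    """Adaptive instance normalization: give the occluder patch the scene's
    per-channel mean and std so the pasted region matches the image style."""
    o_mu = occluder.mean(dim=(1, 2), keepdim=True)
    o_std = occluder.std(dim=(1, 2), keepdim=True) + eps
    s_mu = scene.mean(dim=(1, 2), keepdim=True)
    s_std = scene.std(dim=(1, 2), keepdim=True)
    return (occluder - o_mu) / o_std * s_std + s_mu

patch = torch.rand(3, 64, 32)    # (C, H, W) occluder crop (assumed size)
image = torch.rand(3, 256, 128)  # pedestrian image (assumed size)
styled = adain(patch, image)
print(styled.mean(dim=(1, 2)), image.mean(dim=(1, 2)))  # channel means now match
```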

23 pages, 5437 KB  
Article
Hierarchical Deep Learning for Abnormality Classification in Mouse Skeleton Using Multiview X-Ray Images: Convolutional Autoencoders Versus ConvNeXt
by Muhammad M. Jawaid, Rasneer S. Bains, Sara Wells and James M. Brown
J. Imaging 2025, 11(10), 348; https://doi.org/10.3390/jimaging11100348 - 7 Oct 2025
Viewed by 280
Abstract
Single-view-based anomaly detection approaches present challenges due to the lack of context, particularly for multi-label problems. In this work, we demonstrate the efficacy of using multiview image data for improved classification using a hierarchical learning approach. Using 170,958 images from the International Mouse Phenotyping Consortium (IMPC) repository, a specimen-wise multiview dataset comprising 54,046 specimens was curated. Next, two hierarchical classification frameworks were developed by customizing ConvNeXT and a convolutional autoencoder (CAE) as CNN backbones, respectively. The customized architectures were trained at three hierarchy levels with increasing anatomical granularity, enabling specialized layers to learn progressively more detailed features. At the top level (L1), multiview (MV) classification performed about the same as single views, with a high mean AUC of 0.95. However, using MV images in the hierarchical model greatly improved classification at levels 2 and 3. The model showed consistently higher average AUC scores with MV compared to single views such as dorsoventral or lateral. For example, at Level 2 (L2), the model divided abnormal cases into three subclasses, achieving AUCs of 0.65 for DV, 0.76 for LV, and 0.87 for MV. Then, at Level 3 (L3), it further divided these into ten specific abnormalities, with AUCs of 0.54 for DV, 0.59 for LV, and 0.82 for MV. A similar performance was achieved by the CAE-driven architecture, with mean AUCs of 0.87, 0.88, and 0.89 at Level 2 (L2) and 0.74, 0.78, and 0.81 at Level 3 (L3), respectively, for DV, LV, and MV views. The overall results demonstrate the advantage of multiview image data coupled with hierarchical learning for skeletal abnormality detection in a multi-label context.
(This article belongs to the Section Medical Imaging)
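A shared backbone with one classifier head per hierarchy level is one way to realize the L1/L2/L3 scheme over fused views. The mean-pooled view fusion, feature size, and class counts below are assumptions rather than the paper's architecture:

```python
import torch
from torch import nn

class HierarchicalHead(nn.Module):
    """One classifier per hierarchy level: L1 normal/abnormal,
    L2 coarse subclasses, L3 specific abnormalities."""
    def __init__(self, feat_dim=256, n_l2=3, n_l3=10):
        super().__init__()
        self.l1 = nn.Linear(feat_dim, 2)
        self.l2 = nn.Linear(feat_dim, n_l2)
        self.l3 = nn.Linear(feat_dim, n_l3)

    def forward(self, view_feats):      # (B, V, feat_dim), V views per specimen
        fused = view_feats.mean(dim=1)  # simple multiview fusion (an assumption)
        return self.l1(fused), self.l2(fused), self.l3(fused)

feats = torch.randn(4, 2, 256)          # 4 specimens, DV + LV backbone features
l1, l2, l3 = HierarchicalHead()(feats)
print(l1.shape, l2.shape, l3.shape)
```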

27 pages, 32995 KB  
Article
Recognition of Wood-Boring Insect Creeping Signals Based on Residual Denoising Vision Network
by Henglong Lin, Huajie Xue, Jingru Gong, Cong Huang, Xi Qiao, Liping Yin and Yiqi Huang
Sensors 2025, 25(19), 6176; https://doi.org/10.3390/s25196176 - 5 Oct 2025
Viewed by 442
Abstract
Currently, the customs inspection of wood-boring pests in timber still primarily relies on manual visual inspection, which involves observing insect holes on the timber surface and splitting the timber for confirmation. However, this method has significant drawbacks such as long detection time, high labor cost, and accuracy relying on human experience, making it difficult to meet the practical needs of efficient and intelligent customs quarantine. To address this issue, this paper develops a rapid identification system based on the peristaltic signals of wood-boring pests through the PyQt framework. The system employs a deep learning model with multi-attention mechanisms, namely the Residual Denoising Vision Network (RDVNet). Firstly, a LabVIEW-based hardware–software system is used to collect pest peristaltic signals in an environment free of vibration interference. Subsequently, the original signals are clipped, converted to audio format, and mixed with external noise. Then signal features are extracted through three cepstral feature extraction methods: Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and RelAtive SpecTrAl-Perceptual Linear Prediction (RASTA-PLP), and input into the model. In the experimental stage, this paper compares the denoising module of RDVNet (de-RDVNet) with four classic denoising models under five noise intensity conditions. Finally, it evaluates the performance of RDVNet and four other noise reduction classification models in classification tasks. The results show that PNCC has the most comprehensive feature extraction capability. When PNCC is used as the model input, de-RDVNet achieves an average peak signal-to-noise ratio (PSNR) of 29.8 and a Structural Similarity Index Measure (SSIM) of 0.820 in denoising experiments, both being the best among the comparative models. In classification experiments, RDVNet has an average F1 score of 0.878 and an accuracy of 92.8%, demonstrating the most excellent performance. Overall, the application of this system in customs timber quarantine can effectively improve detection efficiency and reduce labor costs and has significant practical value and promotion prospects.
(This article belongs to the Section Smart Agriculture)
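Of the three cepstral front-ends, MFCC extraction is directly available in librosa; PNCC and RASTA-PLP would need separate implementations (e.g., the spafe package, an assumption). A sketch on synthetic audio standing in for a clipped creeping-signal recording:

```python
import numpy as np
import librosa

# Synthetic stand-in for a 2 s creeping-signal clip; the real input comes from
# the LabVIEW acquisition rig described in the abstract.
sr = 16000
y = 0.1 * np.random.randn(sr * 2).astype(np.float32)

# 13 Mel-frequency cepstral coefficients per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, num_frames)
```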
