MDPI - Publisher of Open Access Journals

26 pages, 13961 KB

Open Access Article

A UAV–3DGS–VR Workflow for Scenario-Comparable Immersive Review in Heritage Landscapes

by Xintong Li, Wenqi Sheng, Yixuan Tang, Yingwen Yu and Yuyang Peng

Drones 2026, 10(6), 404; https://doi.org/10.3390/drones10060404 (registering DOI) - 23 May 2026

Unmanned aerial vehicles (UAVs) are widely used for documentation, surveying, and 3D modeling in the built environment, yet their outputs often remain difficult to reuse for immersive comparison of alternative construction scenarios. This study presents a low-cost UAV-to-3DGS-to-VR workflow for constructing scenario-comparable immersive environments for built-environment review. The workflow combines multi-angle UAV imagery, point-cloud-based geometric anchoring, 3D Gaussian Splatting (3DGS), and Unity-based virtual reality (VR) to transform drone-captured reality into a reusable scene for controlled scenario comparison. The workflow is demonstrated in Middenbeemster, the central town of the Beemster polder World Heritage property. One present-condition scene (M0) and three alternative construction scenarios (M1 to M3) were created within a shared spatial reference. Reconstruction quality was assessed using PSNR and SSIM, and the VR scenes were further evaluated through eye-tracking, head-motion recording, and subjective ranking. The results indicate that the workflow can generate visually reliable and directly comparable immersive scenes from UAV data in this case study. Behavioral and subjective findings showed a consistent pattern, with M1 appearing more compatible than M2 and M3 in this pilot evaluation. The study contributes a pilot UAV-based workflow that links reality capture, immersive scenario comparison, and supplementary behavioral evidence within one process. Full article

(This article belongs to the Topic 3D Documentation of Natural and Cultural Heritage)
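The abstract above reports reconstruction quality as PSNR and SSIM. As a rough illustration of the first metric only, the following sketch computes PSNR between a reference image and a rendered image. The pixel values, the flat-list image layout, and the 8-bit `max_value` are assumptions made for the example and are not taken from the article.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images given as flat pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-pixel grayscale patches (illustrative values only).
reference = [52, 55, 61, 59, 79, 61, 76, 61]
rendered = [54, 55, 60, 59, 77, 63, 76, 60]
print(f"PSNR: {psnr(reference, rendered):.2f} dB")
```

Higher values mean the rendered view is closer to the reference; SSIM additionally compares local structure and is usually taken from an image-processing library rather than written by hand.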

35 pages, 3324 KB

Open Access Article

POCA-Lite: A Lightweight Change-Detection Architecture with Geometry-Aware Auxiliary Supervision and Feedback Fusion

by Yongqi Shi, Ruopeng Yang, Bo Huang, Zhaoyang Gu, Yiwei Lu, Changsheng Yin, Yongqi Wen and Yihao Zhong

Remote Sens. 2026, 18(10), 1673; https://doi.org/10.3390/rs18101673 - 21 May 2026

Abstract

Building change detection from bi-temporal remote-sensing imagery underpins urban planning, infrastructure monitoring, and disaster assessment. Existing deep-learning methods achieve high accuracy but rely on large parameter counts, while pixel-level supervision provides limited boundary guidance. We propose POCA-lite, a lightweight encoder–decoder with an inference-coupled geometry branch: three geometric prediction heads—distance transform, boundary, and center heatmap—whose outputs are fused back into the decoder via a feedback pathway active at both training and inference. On the LEVIR-CD benchmark under a unified retraining protocol, multi-seed evaluation shows that POCA-lite matches SNUNet in mean F1 while using 47% fewer parameters and 53% fewer FLOPs. Boundary F1 improves by 9.22 pp over the no-geometry baseline. Decomposition ablations reveal two complementary improvement sources: geometric supervision alone recovers 85% of the total gain, while the feedback fusion pathway recovers 92%; their combination achieves the full result. Geometry-aware targets outperform a generic multitask control. Cross-architecture transfer to SNUNet yields +1.06 pp F1. However, cross-dataset evaluation on WHU-CD shows that the method underperforms SNUNet on dense urban morphology, and zero-shot cross-dataset transfer is not established. These results indicate that inference-coupled geometric supervision is effective for lightweight, boundary-sensitive change detection on domains with well-separated building morphology, but its applicability is scope-bounded. Full article

25 pages, 5124 KB

Open Access Article

Aerodynamic Prediction and Optimization of Compressor Stators Based on Deep Learning

by Jiang Zheng, Mingming Yao, Kai Zhan and Qingfei Lu

Appl. Sci. 2026, 16(10), 5062; https://doi.org/10.3390/app16105062 - 19 May 2026

Viewed by 111

Abstract

The aerodynamic performance of compressor stators critically affects aircraft engine efficiency, yet traditional CFD-based evaluation and optimization suffer from high computational cost. This study addresses this gap by developing deep learning surrogate models to predict total pressure loss coefficient and outlet flow angle deviation for compressor stator vanes, using two geometric parameters, stagger angle $\beta_y$ and leading-edge radius ratio $R_{rle}$, and one operational parameter, attack angle $\alpha$. A high-fidelity dataset of 1701 cases was generated via automated CFD simulations using the transitional SST k-ω model. Among the evaluated models (standard CNN, CBAM-CNN, SS-CNN, and CNN-Transformer), SS-CNN achieved the highest accuracy, reducing mean absolute percentage error from 3.56% to 2.03% for loss and from 1.49% to 1.11% for outlet angle, with substantial computational savings. These surrogate models were integrated into a multi-objective optimization framework. The optimized vane, featuring a reduced leading-edge radius ratio within a stable stagger range, reduced total pressure loss by 2.38% (from 0.0570 to 0.0556) at the design attack angle of −2.83°, while the outlet angle deviation decreased from 0.439° to 0.066° (85% reduction), with the outlet angle improvement concentrated near the design condition. This work demonstrates a systematic, data-driven pipeline combining parametric modeling, automated simulation, deep learning-based prediction, and rapid optimization, offering an efficient solution for intelligent compressor blade design. Full article

(This article belongs to the Section Fluid Science and Technology)

18 pages, 317 KB

Open Access Article

Applying Integrated Delphi–AHP to Maintenance Competency Prioritization in Industry 4.0: A Formally Specified Group Decision Framework with Consistency and Sensitivity Diagnostics

by Chin-Wen Liao, Nguyen Van Thanh and Yi-Hsin Tai

Information 2026, 17(5), 500; https://doi.org/10.3390/info17050500 - 19 May 2026

Viewed by 150

Abstract

As Industry 4.0 transforms manufacturing operations, maintenance organizations face a group decision-making problem: how to consolidate diverse expert judgments into a defensible, transparent ranking of the competencies that maintenance personnel most need. This paper applies an integrated Delphi–AHP framework—with explicit notation, operators, and diagnostics—to prioritize maintenance competencies in advanced-manufacturing settings. The Delphi stage consolidates expert-generated items under median–interquartile-range consensus and round-to-round stability rules, while the Analytic Hierarchy Process (AHP) transforms validated pairwise comparisons into ratio-scale priority weights through geometric-mean Aggregation of Individual Judgments (AIJ) and eigenvector derivation. Consistency screening (CI/CR), inter-rater agreement (Kendall’s W), and perturbation-based sensitivity analysis accompany the resulting weight vector. A bounded AI-assisted consistency-check step supports terminology harmonization during Delphi statement consolidation, subject to explicit human-validation constraints. A panel of fifteen industry experts participated in the study; five competency dimensions and twenty-nine indicators were retained through three Delphi rounds. AHP weighting identified Basic Knowledge and Skills as the highest-priority dimension, followed by Safety and Regulation Awareness and Problem-Solving Ability. Aggregated pairwise comparison matrices, local and global weights, and sensitivity results are reported to support reproducibility. The study contributes a rigorously specified application of combined Delphi–AHP to a domain—Industry 4.0 maintenance asset management—where multi-criteria decision analysis has seen limited formal application, and closes common specification gaps in published Delphi–AHP implementations. Full article

(This article belongs to the Special Issue New Applications in Multiple Criteria Decision Analysis, 3rd Edition)
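The AHP stage described in the abstract above (geometric-mean aggregation of individual judgments, eigenvector weights, and CI/CR screening) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two expert matrices are invented, the three criteria are unnamed placeholders, the eigenvector is obtained by plain power iteration, and the random-index table holds the commonly quoted Saaty values.

```python
import math

# Commonly quoted Saaty random-index values, keyed by matrix size.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def aggregate_judgments(matrices):
    """Element-wise geometric mean of the experts' pairwise comparison matrices (AIJ)."""
    n, k = len(matrices[0]), len(matrices)
    return [[math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
            for i in range(n)]


def priority_vector(matrix, iterations=100):
    """Principal eigenvector by power iteration (normalized to sum to 1) and lambda_max."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Two hypothetical experts comparing three placeholder criteria.
expert_a = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
expert_b = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]

group = aggregate_judgments([expert_a, expert_b])
weights, lambda_max = priority_vector(group)
cr = consistency_ratio(lambda_max, len(group))
print([round(w, 3) for w in weights], round(cr, 4))
```

The geometric mean is used for aggregation because it preserves the reciprocal property of the comparison matrices. A CR below 0.10 is the usual acceptance threshold.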

27 pages, 4438 KB

Open Access Article

DOM-MUSE: A Deformable Omnidirectional State Space Architecture for Efficient Speech Enhancement

by Tsung-Jung Li, Bo-Yu Su, Jung-Shan Lin and Jeih-Weih Hung

Electronics 2026, 15(10), 2159; https://doi.org/10.3390/electronics15102159 - 18 May 2026

Viewed by 144

Abstract

Transformer-based speech enhancement (SE) architectures suffer from high computational complexity, while existing lightweight state space model (SSM) approaches are constrained to fixed one-dimensional scanning that cannot fully exploit the two-dimensional time–frequency structure of speech spectrograms. To address these limitations, we propose DOM-MUSE, a lightweight U-Net-style SE framework built upon the Mamba-2 SSM with four targeted innovations. First, a Deformable Feature Extractor (DFE) predicts per-location spatial offsets that warp the feature sampling grid to align with speech formant trajectories and harmonic structures, providing geometrically coherent inputs to the state space model. Second, a DOM Mamba Block with Cross-Dimensional Gated Fusion (CDGF) deploys two parallel Mamba-2 instances scanning the time and frequency axes independently, and uses Taylor Channel Attention (TCA) to derive semantic gates that modulate each SSM output before fusion. Third, a Phase-Guided Feature Conditioner (PGFC) computes local phase-gradient gates that suppress noise-dominated activations prior to the SSM stage, making the feature extraction pathway implicitly phase-aware. Fourth, an Attention-Based Skip Connection (ABSC) replaces conventional concatenation skip connections with a learned channel gate, adaptively controlling the information flow from the encoder to the decoder. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DOM-MUSE outperforms the reproduced MUSE baseline on all five evaluation metrics, namely PESQ (+0.077), CSIG (+0.058), CBAK (+0.026), COVL (+0.070), and STOI (+0.002), while reducing the parameter count by 24% (0.51 M to 0.39 M).
Notably, DOM-MUSE also surpasses MUSE++ on perceptual quality metrics (PESQ +0.061, COVL +0.032) despite MUSE++ employing dynamic SNR augmentation and an augmented multi-objective loss that DOM-MUSE deliberately omits, demonstrating that the proposed architectural innovations yield genuine improvements independent of training strategy. When DOM-MUSE is additionally trained under the same augmented protocol as MUSE++, it achieves PESQ of 3.46 and COVL of 4.22, further confirming the complementary nature of architectural and training improvements. Full article

(This article belongs to the Special Issue Recent Advances in Audio, Speech and Music Processing and Analysis, 2nd Edition)

22 pages, 4222 KB

Open Access Article

Feature Transformer and LightGBM Ensemble for Ship Trajectory Recognition Using Real AIS Data

by Songtao Hu, Liang Chen, Qianyue Zhang and Wenchao Liu

Electronics 2026, 15(10), 2152; https://doi.org/10.3390/electronics15102152 - 17 May 2026

Viewed by 220

Abstract

The Automatic Identification System (AIS) generates massive volumes of real-world ship trajectory data, providing a critical foundation for maritime ship-type classification. However, existing methods often struggle to simultaneously capture long-range temporal dependencies, maintain computational efficiency, and ensure model interpretability, making accurate multi-class classification challenging in real-world maritime environments. To address these limitations, this study proposes a robust and efficient hybrid framework that integrates a Feature Transformer module for deep temporal feature extraction with a LightGBM model for ensemble classification. The multi-head self-attention within the Feature Transformer captures long-range dependencies in preprocessed AIS sequences to generate compact 64-dimensional trajectory fingerprints. These deep representations are concatenated with 103 carefully designed kinematic, geometric, statistical, frequency-domain, and segment-level features and fed into a LightGBM classifier for final ship-type identification. We evaluate the framework on a real-world AIS dataset of 2196 trajectories collected between 2019 and 2023, covering 14 ship types under a natural long-tail distribution. Across five random seeds, the proposed hybrid model achieves 78.06% ± 1.15% accuracy (95% CI) and 74.09% ± 1.82% Macro-F1 (95% CI), significantly outperforming Transformer-only (65.09% accuracy) and LightGBM-only (66.85%) baselines, with paired statistical tests confirming the improvement (McNemar χ² = 172.07, p < 10⁻³⁹ vs. Transformer; χ² = 92.24, p < 10⁻²¹ vs. LightGBM). The hybrid model offers ultra-fast inference at 0.051 ms per trajectory on GPU at batch size 128 (≈19,500 samples/s), and provides instance-level interpretability via SHapley Additive exPlanations (SHAP) analysis. These properties make the framework practical for near-real-time maritime traffic monitoring and decision support. Full article
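The paired comparison cited in the abstract above uses McNemar's test. The sketch below shows how such a statistic is typically computed from the two discordant counts. The counts are hypothetical, and whether the authors applied the continuity correction is not stated in the abstract, so both variants are exposed through a flag.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    b: samples classified correctly only by model A
    c: samples classified correctly only by model B
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1.0 if continuity else 0.0)
    stat = max(diff, 0.0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts (not taken from the article).
stat, p_value = mcnemar(30, 10, continuity=False)
print(f"chi2 = {stat:.3f}, p = {p_value:.5f}")
```

Only the samples on which the two classifiers disagree enter the statistic, which is why the test suits paired predictions on a shared test set.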

31 pages, 7889 KB

Open Access Article

Physics-Constrained Variational Autoencoders for Density Compensation in High-Rise LiDAR Point Clouds

by Kohei Arai

Automation 2026, 7(3), 76; https://doi.org/10.3390/automation7030076 - 15 May 2026

Viewed by 204

Abstract

High-rise LiDAR scanning produces vertically sparse point clouds where upper-layer defects are hardest to detect due to inverse-square ranging law (1/r²) density gradients, noise contamination, and complex geometries. This paper presents PC-TowerNet, a physics-aware AI pipeline that achieves state-of-the-art reconstruction through sequential modules: (1) 50D geometric feature classification outperforming CloudCompare SOR (100% accuracy vs. 91.3% retention); (2) Physics-Constrained VAE (PC-VAE) recovering 28.7 ± 2.1% upper density vs. 8.3 ± 1.7% standard VAE; (3) multi-modal PointNet++/GNN/Transformer fusion; and (4) Bayesian uncertainty maps (ECE = 0.042 ± 0.008). Synthetic tower evaluation (10 × 5 seeds) demonstrates 48.9% surface smoothness improvement and 38.2% volume error reduction over tuned RANSAC baselines, with clear paths to real-data validation. Full article

(This article belongs to the Topic Application of Smart Technologies in Buildings, 2nd Edition)
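The abstract above quotes an expected calibration error (ECE) for its uncertainty maps. For readers unfamiliar with the metric, here is a minimal sketch of the usual binned ECE for a classifier. The equal-width binning, the bin count, and the sample confidences are assumptions for illustration; the abstract does not say how the authors computed their value.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean gap between average confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical predictions: confidence and whether the prediction was right (1) or wrong (0).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
print(round(expected_calibration_error(confidences, correct), 4))
```

A value near zero means stated confidence tracks observed accuracy.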

25 pages, 439 KB

Open Access Article

Parallel Transport on Spectral Subbundles of the Similarity Group

by Tianyu Wang, Jie Wang, Xinghua Xu, Shaohua Qiu and Changchong Sheng

Mathematics 2026, 14(10), 1701; https://doi.org/10.3390/math14101701 - 15 May 2026

Viewed by 120

Abstract

We construct a connection-theoretic framework for parallel transport of spectral components along parameter families of signals on the similarity group $\tilde{G} = \mathbb{R} \times SO(2)$. Let $\{f_t\}_{t \in I}$ be a signal family that evolves under a $C^1$ group trajectory. The frequency support of the associated scale-rotation transforms produces three Hilbert subbundles over the parameter interval, and the trajectory velocity induces a covariant derivative on each subbundle. The standard spectral viewpoint treats transformation behavior at individual parameter values. Our formulation instead organizes the propagation of spectral components along the entire parameter path and provides closed-form transport operators together with error bounds on each subbundle. We derive three explicit parallel transport formulas. On the equivariant subbundle the transport is an exact isometric translation. On the coupled subbundle, the transport combines log-scale translation with a phase factor $e^{i n_0 \Delta\theta}$. On the invariant subbundle, the transport is approximate, with the quantitative bound $\| \Pi^{\mathrm{inv}} F - F \| \leq \varepsilon \, |\Delta\tau| \, \|F\|$, where $\Pi^{\mathrm{inv}}$ denotes the parallel transport operator on that subbundle. We introduce the notion of non-parallelism rate as a pointwise measure of deviation from parallel evolution, and we prove that cumulative deviation along the path is bounded by the path integral of this quantity. The bound separates into two parts. One part is controlled by trajectory estimation error and reflects geometric mismatch. The other part is controlled by intrinsic appearance variation and reflects non-geometric drift. We also show that regularity transfers from the signal family to the spectral sections, and we establish a discrete transport theorem whose finite-sum error bounds recover the continuous estimates in the small-step limit. The framework provides a quantitative geometric tool for multi-scale feature evolution under continuous scale-rotation transformations. Full article

28 pages, 5404 KB

Open Access Article

A High-Precision Method for Extracting Lateral Deformation in Operational Shield Tunnels Based on LiDAR Point Cloud Analysis

by Sijia Tang and Xiangyang Xu

Sensors 2026, 26(10), 3111; https://doi.org/10.3390/s26103111 - 14 May 2026

Viewed by 252

Abstract

Deformation monitoring is critical for structural health assessment of operational shield tunnels in urban rail transit. LiDAR point clouds in operating tunnels usually contain auxiliary facilities, occlusions, noise, and uneven point density. Conventional section-by-section ellipse fitting often leads to unstable parameter jumps between adjacent sections. This paper presents a high-precision method to extract lateral deformation from tunnel LiDAR point clouds. First, a point-wise attention Transformer network (PWAT) is proposed based on PointNet++ for lining segmentation, using k-NN adaptive sampling, geometric position encoding, and geometry-constrained multi-head self-attention. Second, a continuity-constrained RANSAC (CC-RANSAC) algorithm is developed to improve ellipse parameter stability by adding continuity penalties between neighboring sections. Experiments were carried out on a Shanghai metro shield tunnel. Results show that PWAT achieves 99.53% overall accuracy and 99.06% mIoU in six-class segmentation. CC-RANSAC reduces the mean residual to 2.0 mm and the center jump rate to 4.2%. Compared with total station data, the mean absolute error and root mean square error are 1.35 mm and 1.68 mm. The proposed method can automatically and accurately extract lateral deformation for operational shield tunnels. Full article

(This article belongs to the Special Issue Recent Innovations in Computational Imaging and Sensing)

25 pages, 14530 KB

Open Access Article

Symplectic Geometry Matrix Machine Controlled by the Whale Optimization Algorithm and Its Application in Bearing Fault Diagnosis

by Yonghua Jiang, Zhiqiang He, Zhilin Dong, Jianjie Zhang, Hongkui Jiang, Chao Tang, Jianfeng Sun, Xiaohao Chen and Weidong Jiao

Vibration 2026, 9(2), 34; https://doi.org/10.3390/vibration9020034 - 13 May 2026

Viewed by 181

Abstract

In the field of industrial equipment condition monitoring, accurate rolling bearing fault diagnosis is critical yet challenging due to high-dimensional vibration signals and complex operating conditions. Traditional machine learning methods often struggle with insufficient feature separability and sensitivity to model parameters, leading to fluctuating diagnostic accuracy. To address these challenges, this study applies the whale optimization algorithm (WOA) to optimize the symplectic geometry matrix machine (SGMM), forming the WOA-SGMM diagnostic framework. (1) The symplectic geometry spectral transformation (SGST) converts high-dimensional vibration signals into low-dimensional feature matrices while preserving intrinsic geometric and topological structures, enhancing noise robustness. (2) WOA adaptively searches for the optimal hyperparameters of the proposed SGMM, addressing the limitations of traditional SMM and mitigating the risk of overfitting. (3) Experimental validation on three benchmark datasets demonstrates that WOA-SGMM achieves superior multi-class fault diagnosis accuracy (up to 100%) under varying operating conditions. Compared to traditional methods, the proposed WOA-SGMM demonstrates improved classification accuracy and enhanced robustness against noise interference in the tested experimental scenarios, highlighting its potential for real-world industrial applications. Full article

24 pages, 8702 KB

Open Access Article

UST-YOLO11Pose-TRM: An Attention-Enhanced Keypoint Detection and Transformer Regression Framework for Yak Body Measurement

by Hua Li, Jinghan Cai, Tonghai Liu, Yapeng Xiao, Changran Liu and Can Zhou

Animals 2026, 16(10), 1493; https://doi.org/10.3390/ani16101493 - 13 May 2026

Viewed by 214

Abstract

Yak (Bos grunniens) is a vital livestock resource on the Qinghai–Tibet Plateau, and its body measurement parameters play a crucial role in growth and development assessment, health monitoring, and breeding improvement. To overcome the limitations of traditional manual measurements—such as low efficiency, unstable accuracy, and the tendency to induce animal stress—this study proposes an intelligent yak body measurement prediction method that integrates keypoint detection with regression modeling, termed UST-YOLO11Pose-TRM. Within the YOLO11-Pose framework, three attention mechanisms—UIB, SENetV2, and TripleAttention—are incorporated to construct a lightweight yet high-precision keypoint detection model, UST-YOLO11Pose, thereby enhancing channel feature representation, global contextual modeling, and spatial dependency perception. Meanwhile, a Transformer-based regression model is designed, leveraging multi-head self-attention to characterize global geometric relationships among keypoints and to achieve accurate prediction of key body measurement parameters, including body length, body height, oblique body length, chest girth, and cannon circumference. Experimental results demonstrate that UST-YOLO11Pose achieves an mAP of 0.958, a Precision of 0.967, and a Recall of 0.955 in keypoint detection tasks, significantly outperforming both same-series and cross-series comparative models with a parameter size of only 10.06 MB. In the body measurement regression task, the Transformer-based regression model attains an RMSE of 0.185, an MAE of 0.122, an MAPE of 2.3%, and a coefficient of determination (R²) of 0.962 on the test set, indicating excellent predictive accuracy and robust fitting stability. In summary, UST-YOLO11Pose-TRM enables accurate, efficient, non-contact yak body measurement, showing strong potential for smart pasture development and precision livestock management. Full article

(This article belongs to the Section Cattle)
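The regression results in the abstract above are reported as RMSE, MAE, MAPE, and R². The sketch below shows how these four metrics relate to the same set of prediction errors. The measurement values are invented for illustration (units unspecified) and have no connection to the article's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(true) for e, true in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical body-length measurements and model predictions.
y_true = [100.0, 110.0, 120.0, 130.0]
y_pred = [102.0, 108.0, 121.0, 129.0]
metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

RMSE and MAE share the units of the measurement, while MAPE and R² are scale-free, which is why abstracts often report both kinds together.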

18 pages, 4584 KB

Open Access Article

MaLCA: Point Cloud Registration with Mamba-Enhanced Features and Local Correspondence Augmentation

by Yuchen Huo, Longyun Zhang, Huijuan Guo, Jingyi Gong, Liqun Kuang, Xie Han and Fengguang Xiong

Algorithms 2026, 19(5), 380; https://doi.org/10.3390/a19050380 - 11 May 2026

Viewed by 230

Abstract

High-quality correspondences are critical to the accuracy and robustness of point cloud registration. Existing Transformer-based methods are fundamentally constrained by the quadratic computational complexity of self-attention, resulting in limited scalability. Moreover, conventional outlier removal paradigms operate by pruning initial correspondences, and thus fail catastrophically in low-overlap scenarios where initial inliers are inherently scarce. To address these challenges, we propose MaLCA, a point cloud registration method based on Mamba-enhanced features and local correspondence augmentation. We first adopt KPFCN as the backbone to extract multi-scale geometric features from raw point clouds. A Mamba selective state space model then replaces self-attention for global context modeling with linear complexity, while cross-attention is retained to facilitate inter-point-cloud feature interaction. Rather than following the conventional subtraction-based outlier removal paradigm, we introduce a prior-guided local rematching strategy combined with a fused neighbor matching mechanism that iteratively constructs dense, high-quality correspondences from sparse initial inliers, fundamentally overcoming the bottleneck of inlier scarcity in challenging scenes. Extensive experiments on the 3DMatch/3DLoMatch and 4DMatch/4DLoMatch benchmarks demonstrate that MaLCA achieves competitive registration performance across both rigid and deformable scenarios, with particular advantages in low-overlap cases. Full article

32 pages, 1357 KB

Open Access Article

Solving Geometry Problems: A Text–Formula–Image Multimodal Parsing and Fusion Model

by Pengpeng Jian, Zongxiang Song, Ting Song and Yanli Wang

Symmetry 2026, 18(5), 821; https://doi.org/10.3390/sym18050821 (registering DOI) - 10 May 2026

Viewed by 277

Abstract

Solving geometry problems is a critical challenge in education because it demands the integration of textual semantic descriptions, mathematical formula logic, and spatial graphical information, as well as rigorous geometric theorem application and stepwise logical deduction. These are core capabilities that underpin personalized intelligent tutoring and efficient educational resource allocation. Traditional geometry problem-solving methods often suffer from limited accuracy and weak fusion of text, formula, and image features. Hence, this paper proposes a method of solving geometry problems based on a text–formula–image (TFI) multimodal parsing and fusion model. The TFI parser employs a self-attention multilayer Transformer to enhance the extraction of logical relations among geometric text expressions. It also parses formulas into tree structures to avoid losing formula structural features, using symbolic embedding and tree-structured encoding to preserve hierarchical logical information and producing unified formula representations via a multi-granularity fusion module. The TFI parser further leverages a Feature Pyramid Network (FPN) for the accurate detection of geometric and non-geometric instances, resolves the issues of blurred segmentation for slender geometric elements and the inaccurate localization of small-sized symbols through mask averaging and RoIAlign, and generates high-dimensional image features using DenseNet-121. The TFI multimodal fusion model integrates a contrastive learning mechanism and constructs fused feature representations by stacking self-attention and cross-attention layers. This design narrows the semantic gap between text, formula, and image features, addressing the inadequacy of traditional fusion approaches in deep cross-modal feature alignment.
An attention-augmented Gated Recurrent Unit (GRU) network processes the fused TFI features to produce target operation trees and geometry solutions, ensuring interpretable and precise reasoning performance. The proposed method is evaluated on the PGDP5K and GeoEval datasets, and it achieves an average accuracy of 59.63% in geometry problem solving, which validates its effectiveness. This paradigm offers a viable technical approach for uniformly modeling complex educational tasks, including geometry problem solving and timetable scheduling. Full article

(This article belongs to the Special Issue Symmetry and Asymmetry in Human-Computer Interaction)

20 pages, 6641 KB

Open AccessArticle

Topology-Aware Road Extraction from Remote Sensing Images Using Deep Learning and Graph-Based Connectivity Refinement

by Zixuan Teng, Zezhong Zheng, Xiangyang Sun and Hao Xue

ISPRS Int. J. Geo-Inf. 2026, 15(5), 208; https://doi.org/10.3390/ijgi15050208 - 9 May 2026

Viewed by 410

Abstract

Road networks are fundamental components of transportation infrastructure and play a crucial role in various geospatial applications. Although deep learning-based semantic segmentation models have achieved promising results in extracting roads from high-resolution remote sensing imagery, the resulting networks often suffer from topological fragmentation due to occlusions and shadows. To address this issue, we propose a topology-aware road extraction method that integrates deep learning-based segmentation with a graph-based connectivity refinement strategy. Specifically, a Pyramid Scene Parsing Network (PSPNet) is first employed to generate initial road probability maps. A connectivity-oriented post-processing pipeline then combines a multi-source cost function strategy with a direction-aware Dijkstra search algorithm. By using endpoint tangent vectors as inertial weights, the algorithm reconnects fragmented segments while maintaining geometric smoothness and topological consistency. Furthermore, a dynamic road width restoration strategy is applied to transform refined skeletons into physically consistent road entities. Experiments conducted on two publicly available datasets, CHN6-CUG and DeepGlobe, demonstrate the effectiveness of the proposed method. Quantitative results show that the refinement process improves road connectivity at a small cost in pixel-level accuracy. Specifically, the Conn metric increases by 0.1989 on the CHN6-CUG dataset and 0.3055 on the DeepGlobe dataset, while MIoU remains high with only marginal decreases of 1.07% and 0.45%, respectively. These findings indicate that the method restores structural continuity, supporting reliable road network generation and subsequent integration into Geographic Information System (GIS)-based applications such as urban planning and autonomous navigation.
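
The idea of a direction-aware Dijkstra search can be illustrated with a minimal sketch. It is not the authors' implementation: the cost terms, the `weight` parameter, the small `eps` constant, and all function names are assumptions chosen for illustration. Each step pays a cost that is low where the road probability is high, plus a penalty that grows with the angle between the new step and the previous direction (the endpoint tangent for the first step). The search state is the pair (cell, incoming direction), so the turn penalty stays consistent with Dijkstra's assumptions.

```python
import heapq
import math

# 8-connected neighbourhood as (d_row, d_col) steps.
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def turn_penalty(prev_dir, new_dir, weight):
    """Hypothetical inertial term: 0 when aligned, 2 * weight for a U-turn."""
    (pr, pc), (nr, nc) = prev_dir, new_dir
    cos = (pr * nr + pc * nc) / (math.hypot(pr, pc) * math.hypot(nr, nc))
    return weight * (1.0 - cos)


def direction_aware_dijkstra(prob, start, goal, start_tangent, weight=1.0, eps=1e-3):
    """Return (path, cost) from start to goal over a road probability grid.

    prob          -- 2D list of road probabilities in [0, 1]
    start, goal   -- (row, col) endpoints of the gap to bridge
    start_tangent -- non-zero (d_row, d_col) direction of the road at `start`
    """
    rows, cols = len(prob), len(prob[0])
    start_state = (start, start_tangent)
    dist = {start_state: 0.0}
    parent = {start_state: None}
    heap = [(0.0, start, start_tangent)]
    while heap:
        d, cell, direction = heapq.heappop(heap)
        if d > dist.get((cell, direction), math.inf):
            continue  # stale heap entry
        if cell == goal:
            # Walk the parent links back to the start.
            path, state = [], (cell, direction)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1], d
        for step in DIRS:
            r, c = cell[0] + step[0], cell[1] + step[1]
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            # Cheap where the segmentation is confident; eps favours short paths.
            pixel_cost = math.hypot(*step) * (1.0 - prob[r][c] + eps)
            nd = d + pixel_cost + turn_penalty(direction, step, weight)
            state = ((r, c), step)
            if nd < dist.get(state, math.inf):
                dist[state] = nd
                parent[state] = (cell, direction)
                heapq.heappush(heap, (nd, (r, c), step))
    return None, math.inf


# A road along row 2 with a low-probability gap at column 2.
grid = [[0.1] * 5 for _ in range(5)]
for col in (0, 1, 3, 4):
    grid[2][col] = 0.9
path, cost = direction_aware_dijkstra(grid, (2, 0), (2, 4), start_tangent=(0, 1))
print(path)
```

In this toy grid the straight path across the gap is cheapest, because any detour pays both the turn penalty and the higher cost of off-road pixels.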

(This article belongs to the Topic Digital and Intelligent Technologies and Application in Urban Construction, Operation, Maintenance, and Renewal)

28 pages, 27037 KB

Open AccessArticle

WMC-DFINE: An Improved DFINE Model for Aluminum Profile Surface Defect Detection

by Pengfei He, Yunming Ding, Shuwen Yan, Guoheng Wang and Xia Liu

Sensors 2026, 26(10), 2994; https://doi.org/10.3390/s26102994 - 9 May 2026

Viewed by 518

Abstract

The automated inspection of aluminum profile surface defects, which relies heavily on data acquired by machine vision sensors, is a critical task in industrial quality control. To address intense background texture interference and the difficulty of detecting defects with extreme aspect ratios on aluminum profiles, this research proposes an end-to-end defect detection algorithm named WMC-DFINE (WIFA-MKSS-CSFF-DFINE) based on the DFINE framework. First, a Wavelet-Integrated Frequency Attention (WIFA) module is introduced, which uses a discrete wavelet transform to decouple features into the frequency domain, thereby dynamically suppressing high-frequency background noise and enhancing defect edge responses. Second, a Cross-Scale Feature Fusion (CSFF) module based on dual-channel pooling is designed to ensure the continuity of defect features, thereby resolving the semantic misalignment issue in traditional fusion. Third, a Multi-Kernel Strip Shuffle (MKSS) module is incorporated, using decomposed convolution kernels to capture the geometric features of slender scratches. Finally, a knowledge distillation strategy is employed to transfer structured knowledge from a complex teacher model to a lightweight student model. Experiments on the Tianchi aluminum defect dataset demonstrate that WMC-DFINE achieves an mAP of 82.1%, which surpasses algorithms including YOLOv12, RT-DETR, and the baseline model DFINE. Furthermore, the distilled student model, WMC-DFINE-distill, improves the mAP by 3.2% compared to DFINE, reduces parameter count by 47%, and achieves an inference speed of 59.75 FPS on the experimental equipment. The proposed method balances background suppression against the preservation of defect detail features, offering a practical and efficient scheme for real-time industrial defect inspection.
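
As a rough illustration of what decoupling features into frequency bands with a discrete wavelet transform means, the sketch below applies a single-level 1D Haar transform and reweights the two sub-bands before reconstruction. It is only an assumption-laden toy: the WIFA module described above operates on learned feature maps with attention, whereas here fixed scalar gains (`low_gain`, `high_gain`, both hypothetical names) stand in for the learned weights, and the input must have even length.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """Single-level Haar DWT: returns (low-frequency, high-frequency) bands."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt; exact reconstruction when the bands are unchanged."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def reweight_bands(signal, low_gain=1.0, high_gain=0.5):
    """Scale each sub-band by a fixed gain, then reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    return haar_idwt([low_gain * a for a in approx],
                     [high_gain * d for d in detail])


row = [1.0, 3.0, 2.0, 2.0]
smoothed = reweight_bands(row, high_gain=0.0)   # drops all high-frequency content
restored = reweight_bands(row, high_gain=1.0)   # leaves the signal unchanged
```

Setting `high_gain` below 1 damps fine texture, while a gain above 1 would sharpen edges; an attention mechanism would choose such weights per location and channel rather than globally.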

(This article belongs to the Section Industrial Sensors)
