Search Results (450)

Search Parameters:
Keywords = corresponding point matching

19 pages, 5485 KB  
Article
Reliable Object Pose Alignment in Mixed-Reality Environments Using Background-Referenced 3D Reconstruction
by Gyu-Bin Shin, Bok-Deuk Song, Vladimirov Blagovest Iordanov, Sangjoon Park, Soyeon Lee and Suk-Ho Lee
Sensors 2026, 26(8), 2453; https://doi.org/10.3390/s26082453 - 16 Apr 2026
Viewed by 243
Abstract
Accurate alignment of real-world object poses with their virtual counterparts using sensors, e.g., cameras, is essential for consistent interaction in mixed-reality systems. However, objects can undergo abrupt, untracked movements during periods when a tracking system is inactive, e.g., overnight, causing stored pose records to become inconsistent with the real scene and breaking user interaction in the virtual environment. Off-the-shelf 3D reconstruction networks such as MASt3R (Matching and Stereo 3D Reconstruction) provide metrically scaled 3D point maps and pixel correspondences, but they are trained on static scenes and therefore fail to produce reliable object correspondences when the object has moved. We propose a robust pipeline that combines MASt3R’s metrically scaled 3D outputs with a background-based alignment strategy to recover and apply the true pose change of moved objects. Our method first segments foreground and background and extracts 3D background point sets for a reference day and a current day. An affine transformation between these background point sets is estimated via a standard registration technique and used to express the current-day object 3D coordinates in the reference coordinate frame. Within that unified frame we compute the object pose change and apply the resulting transform to the virtual object, restoring real–virtual consistency. Experiments on real scenes demonstrate that the proposed approach reliably corrects pose misalignments introduced during inactive periods and substantially improves over applying MASt3R alone, thereby enabling restored and consistent user interaction in the virtual environment. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
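The background-referenced alignment above hinges on estimating a transform between two corresponded point sets. As an illustrative sketch only (the paper estimates an affine transform on metrically scaled 3D points; this simplification solves the closed-form least-squares 2D rigid case), the core computation looks like:

```python
import math

def estimate_rigid_2d(src, dst):
    """Closed-form least-squares rigid transform mapping src -> dst.

    src, dst: lists of (x, y) pairs with known one-to-one correspondence.
    Returns (theta, tx, ty) such that dst ~ R(theta) @ src + t.
    """
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    # Accumulate cross-covariance terms of the centered point sets.
    sxx = sxy = syx = syy = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x -= csx; y -= csy; u -= cdx; v -= cdy
        sxx += x * u; sxy += x * v; syx += y * u; syy += y * v
    # Optimal rotation angle, then the translation that matches centroids.
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty
```

The same centroid-then-rotation structure carries over to 3D, where the rotation is typically recovered via an SVD of the cross-covariance matrix (Kabsch/Procrustes).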

19 pages, 5016 KB  
Article
Characterizing Urban Road CO2 Emissions: A Study Based on GPS Data from Heavy-Duty Diesel Trucks
by Yanyan Wang, Li Wang, Jiaqiang Li, Yanlin Chen, Jiguang Wang, Jiachen Xu and Hongping Zhou
Atmosphere 2026, 17(4), 387; https://doi.org/10.3390/atmos17040387 - 10 Apr 2026
Viewed by 316
Abstract
Accurately quantifying carbon dioxide (CO2) emissions from heavy-duty diesel trucks (HDTs) is crucial for developing effective transportation emission reduction strategies. In this study, we adopted a bottom–up approach and, in conjunction with the “International Vehicle Emissions” (IVE) model, constructed a high-resolution 1 × 1 km CO2 emission inventory for the urban area of Kunming, China. Using data from 1.24 million track points collected from 5996 heavy-duty diesel trucks, we implemented a map matching algorithm based on a simplified hidden Markov model (HMM) to efficiently process large-scale GPS data. Furthermore, we improved upon traditional spatial allocation methods by dynamically integrating track point density with static road network density. The results indicate that although higher driving speeds correspond to lower CO2 emission rates, heavy-duty diesel trucks typically operate within an observed speed range of 40–60 km/h, with an average emission factor of approximately 500 g/km. Vehicles compliant with the “National III” emission standards remain the primary source of CO2 emissions in this region. Correlation analysis reveals a significant positive relationship (p < 0.01) between emissions from heavy-duty diesel trucks and both traffic volume and mileage. Notably, daytime vehicle restriction policies led to a temporal redistribution of emissions rather than a net reduction in emissions; this resulted in increased activity levels of heavy-duty diesel trucks at night, leading to a surge in nighttime emissions. In terms of spatial distribution, the “dual-density” allocation method proposed in this study more accurately captured emission hotspots, revealing that CO2 emissions are primarily concentrated in the southeastern part of the city—a distribution pattern largely influenced by the city’s industrial layout. Full article
(This article belongs to the Special Issue Traffic Related Emission (3rd Edition))
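HMM-based map matching of the kind described above picks, for each GPS point, a candidate road so that the whole sequence balances point-to-road distance against route continuity. A toy Viterbi pass over candidate roads might look like this (a sketch only; the paper's simplified HMM and its cost terms are not specified here):

```python
def viterbi_map_match(gps_dists, transition_cost):
    """Match a GPS trace to road segments with a minimal Viterbi pass.

    gps_dists: per GPS point, a dict {road_id: perpendicular distance in m}
               over that point's candidate roads (the emission cost).
    transition_cost(a, b): penalty for jumping from road a to road b.
    Returns the road sequence with the lowest total cost.
    """
    # Initialize with the emission cost of the first point.
    prev = {r: (d, [r]) for r, d in gps_dists[0].items()}
    for dists in gps_dists[1:]:
        cur = {}
        for r, d in dists.items():
            # Pick the cheapest predecessor for each candidate road.
            best = min(prev, key=lambda p: prev[p][0] + transition_cost(p, r))
            cost = prev[best][0] + transition_cost(best, r) + d
            cur[r] = (cost, prev[best][1] + [r])
        prev = cur
    return min(prev.values())[1]
```

Production map matchers additionally prune candidates by a search radius and use routing distance (not a flat penalty) between consecutive candidates.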

26 pages, 4409 KB  
Article
Low-Altitude Target Localization Method Based on Exogenous Radar with Multi-Base Station and 5G SSB Signals
by Yike Xu, Gangyi Tu, Luyan Zhang, Yi Zhou, Meiling Xiong and Yang Li
Sensors 2026, 26(7), 2183; https://doi.org/10.3390/s26072183 - 1 Apr 2026
Viewed by 265
Abstract
In this work, we propose a localization method based on an exogenous radar with multiple base stations and the synchronization signal block (SSB) in 5G downlink signals. We combine identification based on physical cell identities (PCIs) with the extensive cancellation algorithm (ECA) to iteratively reconstruct and cancel the currently strongest SSB signal, thereby obtaining reference signal received power (RSRP) values in descending order of strength. We then design a two-stage localization method. First, we determine the target’s coarse location based on the directional characteristics of different SSB beams. Subsequently, we compare the RSRP values extracted from the actually received signals against those pre-obtained with the target at various reference points. The reference point corresponding to the closest match is selected as the estimated target position. We conducted simulations under various signal-to-noise ratio (SNR) levels, reference point densities, and signal jitter conditions. The simulation results demonstrate that the method outperforms techniques such as Fang’s method for time difference of arrival (Fang-TDOA) and observed time difference of arrival (OTDOA). Full article
(This article belongs to the Section Radar Sensors)
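The second stage described above is essentially fingerprint matching: compare the measured RSRP values against those pre-collected at each reference point and pick the closest. A minimal sketch (the mean-squared-difference metric here is an assumption for illustration, not taken from the paper):

```python
def locate_by_rsrp(measured, fingerprints):
    """Return the reference point whose stored RSRP vector best matches.

    measured: {beam_id: RSRP in dBm} extracted from the received signal.
    fingerprints: {point_id: {beam_id: RSRP}} pre-collected per reference point.
    Match quality is the mean squared difference over shared beams.
    """
    def cost(fp):
        common = measured.keys() & fp.keys()
        if not common:
            return float("inf")  # no overlapping beams: unusable fingerprint
        return sum((measured[b] - fp[b]) ** 2 for b in common) / len(common)
    return min(fingerprints, key=lambda p: cost(fingerprints[p]))
```

Denser reference grids trade acquisition effort for finer localization, which is why the paper sweeps reference point density in its simulations.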

21 pages, 32230 KB  
Article
Structure-Aware Feature Descriptor with Multi-Scale Side Window Filtering for Multi-Modal Image Matching
by Junhong Guo, Lixing Zhao, Quan Liang, Xinwang Du, Yixuan Xu and Xiaoyan Li
Appl. Sci. 2026, 16(6), 3018; https://doi.org/10.3390/app16063018 - 20 Mar 2026
Viewed by 234
Abstract
Traditional image feature matching methods often fail to achieve satisfactory performance on multimodal remote sensing images (MRSIs), mainly due to significant nonlinear radiometric distortion (NRD) and complex geometric deformation caused by different imaging mechanisms. The key to successful MRSI matching lies in preserving high-frequency edge structures that are robust to geometric deformation, while overcoming nonlinear intensity mappings induced by NRD. To address these challenges, this paper proposes a novel high-precision matching framework, termed structure-aware feature descriptor with multi-scale side window filtering (SA-SWF). The proposed framework consists of three stages: (1) an anisotropic morphological scale space is constructed based on multi-scale side window filtering to strictly preserve geometric edges, and feature points are extracted using a multi-scale adaptive structure tensor with sub-pixel refinement to ensure high localization precision; (2) a structure-aware feature descriptor is constructed by integrating gradient reversal invariance and entropy-weighted attention mechanisms, rendering the multi-modal description highly robust against contrast inversion and noise; and (3) a coarse-to-fine robust matching strategy is established to progressively refine correspondences from descriptor-space matching to strict sub-pixel geometric verification, thereby minimizing alignment errors. Experiments on 60 multimodal image pairs from six categories, including infrared-infrared, optical–optical, infrared–optical, depth–optical, map–optical, and SAR–optical datasets, demonstrate that SA-SWF consistently outperforms seven state-of-the-art competitors. Across all six dataset categories, SA-SWF achieves a 100% success rate, the highest average number of correct matches (356.8), and the lowest average root mean square error (1.57 pixels). 
These results confirm the superior robustness, stability, and geometric accuracy of SA-SWF under severe radiometric and geometric distortions. Full article
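Descriptor-space matching, the coarse stage of pipelines like the one above, is commonly implemented as a nearest-neighbour search with Lowe's ratio test: a match is kept only when the best candidate is clearly better than the runner-up. A generic sketch (not SA-SWF itself):

```python
def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    desc_a, desc_b: lists of equal-length feature vectors.
    Returns (i, j) index pairs kept when the best match in desc_b is
    clearly better than the second best (distance ratio below `ratio`).
    """
    def d2(u, v):
        # Squared Euclidean distance; ratio is squared to compare in d2.
        return sum((a - b) ** 2 for a, b in zip(u, v))
    matches = []
    for i, u in enumerate(desc_a):
        dists = sorted((d2(u, v), j) for j, v in enumerate(desc_b))
        if len(dists) > 1 and dists[0][0] < (ratio ** 2) * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

The surviving pairs would then feed geometric verification (e.g., RANSAC on a transform model) in a coarse-to-fine scheme.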

14 pages, 12306 KB  
Article
Quantitative Autofluorescence Imaging of Oral Mucosa and Lesions: A Proof-of-Concept Study
by Keerthi Gurushanth, Sumsum P. Sunny, Shubha Gurudath, Harshita Thakur, Kripa Adlene Edith, Keerthi Krishnakumar, Shikha Jha, Pavitra Chandrashekhar, Satyajit Topajiche, Lynette Linzbuoy, Sanjana Patrick, Ramyashree Rao, Simranjeet Kaur, Umeshgouda Patil, Ananya Nagaraj, Bofan Song, Rongguang Liang, Shubhasini Raghavan, Anupama Shetty, Amritha Suresh, Moni Abraham Kuriakose and Praveen Birur Nagaraj
Diagnostics 2026, 16(6), 857; https://doi.org/10.3390/diagnostics16060857 - 13 Mar 2026
Viewed by 498
Abstract
Background/Objectives: This study aimed to quantitatively assess site-specific mean autofluorescence intensity across normal oral mucosal subsites and to evaluate the effectiveness of Autofluorescence Imaging (AFI) as an adjunct tool for distinguishing benign lesions, OPMDs, and oral cancers by comparing lesion intensity with anatomically matched healthy subsites. Methods: This observational study employed dual-mode imaging, comprising paired White Light Imaging (WLI) and AFI, captured from different oral cavity subsites using a smartphone-based point-of-care device. The Region of Interest (ROI) was annotated on WLI and automatically mapped to the corresponding AFI for both normal mucosa and lesions. WLI and AFI images were separated into their constituent red, green, and blue (RGB) channels, and AFI intensity was quantified via ImageJ. Results: A total of 1380 dual-mode images were acquired from 86 healthy participants. AFI intensities were comparable across most oral subsites, except for the lateral and ventral tongue. The lateral border showed the lowest fluorescence (Green channel-GC: 68.12 ± 28.27; Blue channel-BC: 25.29 ± 7.93), whereas the ventral tongue showed the highest (GC: 98.89 ± 42.22; BC: 37.08 ± 11.04; both p < 0.001). Among 611 lesions, predominantly from the buccal mucosa, AFI intensity declined progressively with increasing disease severity. Homogeneous leukoplakia (n = 149; GC: 38.62 ± 25.05; BC: 21.60 ± 9.50), non-homogeneous leukoplakia (n = 25; GC: 30.42 ± 18.66; BC: 18.25 ± 7.17), and oral cancer (n = 21; GC: 23.39 ± 15.53; BC: 15.82 ± 7.15; all p < 0.001) showed markedly reduced fluorescence, while benign lesions (n = 44; GC: 66.99 ± 30.88; BC: 32.01 ± 13.62) exhibited intermediate intensities, supporting AFI’s discriminative potential.
Conclusions: This phase-1, proof-of-concept study highlights subsite-specific variations in autofluorescence intensity within healthy oral mucosa, providing an essential baseline for objective interpretation of lesion-associated fluorescence changes. AFI has the potential to be used as a non-invasive adjunct for monitoring OPMDs. Further validation in larger and more diverse cohorts is required before clinical implementation. Full article
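Quantifying AFI intensity per RGB channel inside an annotated ROI reduces to averaging pixel values over the region. The study used ImageJ; this stand-alone sketch only illustrates the computation being reported:

```python
def roi_channel_means(image, roi):
    """Mean intensity per RGB channel inside a rectangular ROI.

    image: image[y][x] = (r, g, b); roi: (x0, y0, x1, y1) with exclusive
    upper bounds. Returns (mean_r, mean_g, mean_b); the green and blue
    means correspond to the GC/BC quantities compared across subsites.
    """
    x0, y0, x1, y1 = roi
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))
```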

24 pages, 4915 KB  
Article
Semantic-Guided Matching of Heterogeneous UAV Imagery and Mobile LiDAR Data Using Deep Learning and Graph Neural Networks
by Tee-Ann Teo, Hao Yu and Pei-Cheng Chen
Drones 2026, 10(3), 185; https://doi.org/10.3390/drones10030185 - 8 Mar 2026
Viewed by 388
Abstract
The integration of heterogeneous geospatial data, specifically low-cost unmanned aerial vehicle (UAV) imagery and mobile light detection and ranging (LiDAR) system point clouds, presents a significant challenge due to the substantial radiometric and structural discrepancies between the two modalities. This study proposes a novel air-to-ground semantic feature matching framework to achieve precise geometric registration between these data sources by effectively incorporating semantic-constraint deep learning-based matching. The methodology transformed the cross-sensor alignment challenge into a robust two-dimensional image matching problem. This was achieved by first using YOLOv11 for semantic segmentation of common road markings in both the UAV orthoimage and the converted LiDAR intensity image to generate highly consistent feature references. Subsequently, the SuperPoint detector and a graph neural network matcher, SuperGlue, were applied to these semantic images to establish reliable geomatics information correspondence points. Experimental results confirmed that this semantic-guided strategy consistently outperformed traditional feature-based matching (i.e., scale-invariant feature transform + fast library for approximate nearest neighbors), particularly by converting the noisy LiDAR intensity image into a stabilized semantic representation. The explicit application of semantic constraints further proved effective in eliminating false matches between geometrically similar but semantically distinct objects. The final object-specific analysis demonstrated that features with clear, complex geometric structures (e.g., pedestrian crossings and directional arrows) provide the most robust matching control. In summary, the proposed framework successfully leverages semantic context to overcome cross-sensor heterogeneity, offering an automated and precise solution for the geometric alignment of mobile LiDAR data. Full article
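Learned matchers such as SuperGlue replace hand-tuned heuristics, but the classical baseline for establishing reliable correspondences is the mutual nearest-neighbour cross-check, sketched here for illustration (not the paper's method):

```python
def mutual_nearest_matches(desc_a, desc_b):
    """Cross-check matching: keep (i, j) only when a_i's nearest neighbour
    in desc_b is b_j AND b_j's nearest neighbour in desc_a is a_i."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    nn_ab = [min(range(len(desc_b)), key=lambda j: d2(u, desc_b[j]))
             for u in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: d2(v, desc_a[i]))
             for v in desc_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

A graph-based matcher improves on this by reasoning jointly over all keypoints rather than one pair at a time, which is what lets semantic constraints suppress matches between similar-looking but semantically different objects.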

27 pages, 15861 KB  
Article
Explorable 3D Hyperspectral Models from Multi-Angle Gimballed LWIR Pushbroom Imagery
by Nikolay Golosov, Guido Cervone and Mark Salvador
Remote Sens. 2026, 18(5), 781; https://doi.org/10.3390/rs18050781 - 4 Mar 2026
Viewed by 344
Abstract
Hyperspectral imaging in the long-wave infrared (LWIR) range enables identification of chemical compositions and material properties, but reconstructing 3D models from gimballed pushbroom sensors remains challenging because their unique acquisition geometry is incompatible with conventional photogrammetric software designed for frame cameras. This study presents a workflow for creating explorable 3D models from multi-angle LWIR hyperspectral imagery by co-registering hyperspectral line-scan data with simultaneously acquired RGB frame camera imagery using deep learning-based image matching. The co-registered images are processed in commercial photogrammetric software (Agisoft Metashape), and a texture-to-image mapping algorithm preserves correspondences between 3D model coordinates and original hyperspectral pixels across multiple viewing angles. Quantitative evaluation against reference data demonstrates that co-registration reduces geometric error approaching the accuracy of models built from high-resolution RGB imagery. The resulting models enable the retrieval of 8–50 spectral signatures per surface point, captured from different viewing geometries. This approach facilitates interactive exploration of angular variations in thermal infrared spectra, supporting material identification for non-Lambertian surfaces where single-angle observations may be insufficient for reliable classification. Full article

30 pages, 9900 KB  
Article
Multimodal Weak Texture Remote Sensing Image Matching Based on Normalized Structural Feature Transform
by Qiang Xiong, Xiaojuan Liu, Xuefeng Zhang and Tao Ke
Remote Sens. 2026, 18(5), 775; https://doi.org/10.3390/rs18050775 - 4 Mar 2026
Viewed by 446
Abstract
Significant nonlinear radiation differences and weak texture differences exist between multimodal weak texture remote sensing images (MWTRSIs). When using traditional methods to match MWTRSIs, the low distinguishability of descriptors in weak texture regions results in poor matching performance. A robust matching method is proposed based on normalized structural feature transform (NSFT), which can extract spatial structural features of images while mitigating nonlinear radiation differences between weak texture regions. First, the bilateral filter is used to transform the weak texture remote sensing image into a normalized image, which not only greatly weakens the nonlinear radiation difference but also retains most of the structural information. Then, the UC-KAZE detector is designed to extract many evenly distributed feature points on the normalized image. Subsequently, a multimodal weak texture feature descriptor with rotation invariance is designed based on the self-similarity of the weak texture image. Finally, the initial correspondences are constructed by bilateral matching, and the mismatches are removed by the fast sample consensus (FSC) algorithm. We perform comparison experiments on eight types of MWTRSIs. The results show that the proposed method has good scale and rotation invariance and good resistance to nonlinear radiation differences and weak texture differences. Full article
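The normalization step above relies on the edge-preserving property of the bilateral filter: a spatial weight and an intensity-range weight multiply, so smoothing stops at intensity edges instead of blurring across them. A 1-D sketch (the paper filters 2-D images; the parameters here are arbitrary):

```python
import math

def bilateral_1d(signal, sigma_s=2.0, sigma_r=10.0, radius=4):
    """1-D bilateral filter: smooths while preserving step edges.

    Spatial weight falls with distance (i - j), range weight with the
    intensity difference, so a 0 -> 100 step survives where a plain
    Gaussian blur would smear it.
    """
    out = []
    for i, v in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[j] - v) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```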

22 pages, 17599 KB  
Article
Self-Supervised 3D Cloud Motion Inversion from Ground-Based Binocular All-Sky Images
by Shan Jiang, Chen Zhang, Xu Fu, Lei Lin, Zhikuan Wang, Xingtong Li, Tianying Liu and Jifeng Song
Atmosphere 2026, 17(3), 236; https://doi.org/10.3390/atmos17030236 - 25 Feb 2026
Viewed by 456
Abstract
Addressing the challenge of stable cloud velocity field estimation under complex sky conditions in ground-based cloud imaging, this paper proposes a comprehensive 3D cloud velocity calculation framework. The methodology integrates binocular stereo vision geometry, self-supervised deep feature learning, and graph attention-based matching. First, a self-supervised feature detection and description model tailored to the radiometric characteristics of cloud images is developed. By incorporating a homography adaptation strategy constrained by physical priors, the model acquires robust feature representations for weakly textured and highly deformable cloud masses without requiring labeled datasets. Subsequently, a Transformer-based graph neural network matcher is employed to establish global feature correspondences across both cross-view and cross-temporal dimensions, thereby substantially augmenting matching robustness. On this basis, the framework establishes a rigorous calibration model for fisheye cameras to derive cloud base height (CBH) via binocular geometry. These geometric constraints are then coupled with sequential feature tracking results to construct 3D velocity inversion equations, enabling an end-to-end mapping from 2D pixel coordinates to 3D physical space and providing direct estimation of physical cloud motion velocity in meters per second (m/s). The experimental results show that the proposed method extracts 4.5 times more feature points than the traditional SIFT method. Furthermore, the Pearson correlation coefficient for cloud motion trends in continuous sequences reaches 0.662 relative to baseline models, indicating good relative consistency in motion estimation. The framework achieves high-precision and stable velocity estimation across diverse cloud types, including cirrus, cumulus, stratus, and mixed clouds. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
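The pixel-to-m/s inversion follows binocular geometry: height from stereo disparity, then the tracked pixel shift between frames scaled to metres at that height. A minimal sketch under an idealized rectified pinhole model (the paper uses calibrated fisheye cameras, which this deliberately ignores):

```python
def cloud_motion_mps(baseline_m, focal_px, disparity_px, shift_px, dt_s):
    """Horizontal cloud speed from binocular geometry plus feature tracking.

    Cloud base height from stereo disparity (z = f * B / d); the tracked
    pixel shift between frames is then scaled to metres at that height
    and divided by the frame interval to give speed in m/s.
    """
    z = focal_px * baseline_m / disparity_px   # cloud base height (m)
    return shift_px * z / focal_px / dt_s      # horizontal speed (m/s)
```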

27 pages, 6251 KB  
Article
Drift-Free BIM Alignment for Mixed Reality Visualization Through Image Style Transfer and Feature Matching
by Mohamed Zahlan Abdul Muthalif, Davood Shojaei, Kourosh Khoshelham and Debaditya Acharya
Buildings 2026, 16(4), 852; https://doi.org/10.3390/buildings16040852 - 20 Feb 2026
Viewed by 443
Abstract
Accurate localization is a persistent challenge for Mixed Reality (MR) applications in the construction industry, where reliable alignment between digital building models and physical environments is critical. Commercial MR devices such as the Microsoft HoloLens rely on Visual-Inertial Simultaneous Localization and Mapping (VISLAM) for pose estimation, but accumulated drift over extended trajectories and visually ambiguous indoor spaces often reduce localization accuracy. This paper presents a complementary localization refinement methodology that integrates HoloLens spatial tracking with image style transfer and geometry-based pose estimation for Building Information Modeling (BIM)-aligned MR visualization. Image style transfer is used to reduce appearance discrepancies between real-world images and synthetic BIM renderings, improving feature correspondence for geometric alignment. Pose refinement is then applied using feature matching and Perspective-n-Point (PnP) estimation to mitigate accumulated drift when sufficient visual evidence is available. The method is evaluated using 1408 image pairs captured along an indoor trajectory, demonstrating improved BIM alignment and a reduction of accumulated drift to 1–2 pixels. The proposed approach supports more reliable MR visualization for construction-related tasks such as inspection, coordination, and spatial decision-making. Full article
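PnP-based refinement minimizes reprojection error: the pixel distance between observed features and the projections of their 3-D model counterparts under a candidate pose. A sketch of that error under a simple pinhole model (intrinsics and the rotation-matrix pose parameterization here are illustrative, not the paper's):

```python
import math

def project(pt, R, t, f, cx, cy):
    """Project a 3-D point with a pinhole camera at pose (R, t)."""
    x, y, z = (sum(R[r][c] * pt[c] for c in range(3)) + t[r] for r in range(3))
    return (f * x / z + cx, f * y / z + cy)

def mean_reproj_error(points3d, points2d, R, t, f, cx, cy):
    """Average pixel distance between observed 2-D features and the
    projections of their 3-D counterparts: the quantity PnP minimizes."""
    errs = [math.dist(project(p, R, t, f, cx, cy), q)
            for p, q in zip(points3d, points2d)]
    return sum(errs) / len(errs)
```

In practice one would call a library solver (e.g., OpenCV's solvePnP with RANSAC) rather than optimize this by hand; the sketch only shows the objective.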

20 pages, 4390 KB  
Article
Study on Temperature Response Characteristics of Gas Containing Coal at Different Freezing Temperatures
by Qiang Wu, Zhaofeng Wang, Liguo Wang, Shujun Ma, Yongxin Sun, Shijie Li and Boyu Lin
Fuels 2026, 7(1), 11; https://doi.org/10.3390/fuels7010011 - 19 Feb 2026
Viewed by 306
Abstract
In the process of using the freezing method to uncover coal from stone gates, the thermal evolution profiles of the coal body during the freezing process tend to be complex due to the presence of gas and moisture. To investigate the temperature response of coal containing gas under different freezing temperature conditions, a self-developed low-temperature freezing test system for coal containing water and gas was used to conduct freezing and cooling tests at different freezing temperatures (−5 °C to −30 °C). The temperature changes at various measuring points inside the coal over time were monitored in real time, and the temperature distribution, cooling law, and strain evolution process of the coal in the axial and radial directions were analyzed. The experimental results show that the cooling process of the center point of the coal can be divided into four stages: rapid cooling, extremely slow temperature drop, relatively slow cooling, and stable constant temperature. The time required to reach the stable constant temperature stage is inversely proportional to the freezing temperature, and corresponding prediction formulas have been established based on this. The standardized coal briquettes exhibit a gradient distribution characteristic of gradually increasing temperature from outside to inside in both axial and radial directions, with the radial temperature distribution being well matched by an exponential decay model. The strain of coal is affected by both thermal shrinkage and ice-induced expansion. The occurrence time of frost heave is positively correlated with freezing temperature, while the strain of frost heave is negatively correlated with freezing temperature. The axial frost heave effect is significantly stronger than the radial effect, but the radial frost heave occurs slightly earlier than the axial effect. 
This study reveals the thermal-mechanical coupling response mechanism of gas-containing coal during the low-temperature freezing process, and the research results can provide theoretical support for parameter optimization and engineering application of low-temperature freezing anti-outburst technology. Full article
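An exponential decay model for the radial temperature profile can be fitted by linearizing with a logarithm and running ordinary least squares, as in this sketch (the paper's actual prediction formulas are not reproduced; the model form T(r) = T_amb + A·exp(−k·r) is an assumption for illustration):

```python
import math

def fit_exp_decay(rs, temps, t_ambient):
    """Fit T(r) = T_amb + A * exp(-k * r) by linearizing with a log.

    rs: radial positions; temps: measured temperatures above t_ambient;
    t_ambient: the stable freezing temperature the profile decays toward.
    Returns (A, k) from a least-squares line on log(T - T_amb).
    """
    ys = [math.log(T - t_ambient) for T in temps]
    n = len(rs)
    mx = sum(rs) / n; my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(rs, ys))
             / sum((x - mx) ** 2 for x in rs))
    return math.exp(my - slope * mx), -slope
```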

18 pages, 1073 KB  
Article
HierFinRAG—Hierarchical Multimodal RAG for Financial Document Understanding
by Quang-Vinh Dang, Ngoc-Son-An Nguyen and Thi-Bich-Diem Vo
Informatics 2026, 13(2), 30; https://doi.org/10.3390/informatics13020030 - 10 Feb 2026
Viewed by 2138
Abstract
Financial document understanding remains a critical challenge for Large Language Models, primarily due to the complex interplay between narrative text and structured numerical tables. Existing Retrieval-Augmented Generation (RAG) systems often treat these modalities in isolation, leading to significant failures in tasks requiring joint reasoning. This study introduces HierFinRAG, a novel hierarchical multimodal framework designed to unify tabular and textual data processing. Our approach employs a Table-Text Graph Neural Network (TTGNN) to explicitly model semantic and structural dependencies between table cells and corresponding text, coupled with a Symbolic–Neural Fusion module that routes queries between a neural generator and a symbolic calculator for precise arithmetic operations. We evaluate the system on the FinQA and FinanceBench datasets, comparing performance against strong baselines including Vanilla RAG and GPT-4o with Code Interpreter. Results demonstrate that HierFinRAG achieves an Exact Match score of 82.5% on FinQA, surpassing the best baseline by 6.5 percentage points, while maintaining a 3.5× faster inference latency than agentic approaches. These findings indicate that integrating hierarchical structural awareness with hybrid reasoning significantly enhances the accuracy and interpretability of financial artificial intelligence systems. Full article
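The symbolic–neural routing idea can be caricatured as: send spans that are pure arithmetic to an exact evaluator and everything else to the neural generator. A toy sketch only (the real module routes on learned features, not a regex; the guarded `eval` here is for illustration and the regex restricts it to digits and operators):

```python
import re

def route_query(query, generate):
    """Toy symbolic-neural router: purely arithmetic queries are answered
    exactly by a symbolic evaluator; everything else is delegated to the
    caller-supplied neural generator `generate`.
    """
    if re.fullmatch(r"[\d\s+\-*/().]+", query.strip()):
        # The regex above guarantees only digits and operators reach eval.
        return eval(query, {"__builtins__": {}}, {})
    return generate(query)
```

The payoff of such routing is exactness: a generator may hallucinate arithmetic, while a calculator cannot, which is consistent with the Exact Match gains the abstract reports.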

27 pages, 91954 KB  
Article
A Robust DEM Registration Method via Physically Consistent Image Rendering
by Yunchou Li, Niangang Jiao, Feng Wang and Hongjian You
Appl. Sci. 2026, 16(3), 1238; https://doi.org/10.3390/app16031238 - 26 Jan 2026
Viewed by 422
Abstract
Digital elevation models (DEMs) play a critical role in geospatial analysis and surface modeling. However, due to differences in data collection payload, data processing methodology, and data reference baseline, DEMs acquired from various sources often exhibit systematic spatial offsets. This limitation substantially constrains their accuracy and reliability in multi-source joint analysis and fusion applications. Traditional registration methods such as the Least-Z Difference (LZD) method are sensitive to gross errors, while multimodal registration approaches overlook the importance of elevation information. To address these challenges, this paper proposes a DEM registration method based on physically consistent rendering and multimodal image matching. The approach converts DEMs into image data through irradiance-based models and parallax geometric models. Feature point pairs are extracted using template-based matching techniques and further refined through elevation consistency analysis. Reliable correspondences are selected by jointly considering elevation error distributions and geometric consistency constraints, enabling robust affine transformation estimation and elevation bias correction. The experimental results demonstrate that in typical terrains such as urban areas, glaciers, and plains, the proposed method outperforms classical DEM registration algorithms and state-of-the-art remote sensing image registration algorithms. The results indicate clear advantages in registration accuracy, robustness, and adaptability to diverse terrain conditions, highlighting the potential of the proposed framework as a universal DEM collaborative registration solution. Full article
(This article belongs to the Section Earth Sciences)
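The abstract's final registration step, estimating an affine transformation from selected feature correspondences, can be illustrated with a minimal least-squares sketch. This is not the authors' pipeline (which adds elevation-consistency filtering and bias correction); the function name and toy data below are illustrative assumptions.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst.

    src, dst: (N, 2) arrays of matched feature-point coordinates.
    Returns a 2x3 matrix [A | t] such that dst ~ src @ A.T + t.
    """
    n = src.shape[0]
    # Design matrix for the 6 affine parameters: [x, y, 1] per point.
    X = np.hstack([src, np.ones((n, 1))])             # (N, 3)
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2)
    return params.T                                   # (2, 3)

# Toy example: correspondences related by slight scaling plus a shift.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(50, 2))
true_A = np.array([[1.01, 0.0], [0.0, 0.99]])
true_t = np.array([5.0, -3.0])
dst = src @ true_A.T + true_t

M = estimate_affine(src, dst)
print(np.round(M[:, 2], 2))  # recovered translation, approx. [ 5. -3.]
```

In practice such an estimate would be wrapped in a robust scheme (e.g., RANSAC over the elevation-consistent correspondences) before applying the elevation bias correction.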
16 pages, 1834 KB  
Article
FPC-Net: Revisiting SuperPoint with Descriptor-Free Keypoint Detection via Feature Pyramids and Consistency-Based Implicit Matching
by Ionuț-Orlando Grigore-Atimuț, Claudiu Leoveanu-Condrei and Călin-Adrian Popa
Appl. Sci. 2026, 16(3), 1223; https://doi.org/10.3390/app16031223 - 25 Jan 2026
Viewed by 535
Abstract
The extraction and matching of interest points are fundamental to many geometric computer vision tasks. Traditionally, matching is performed by assigning descriptors to interest points and identifying correspondences based on descriptor similarity. This work introduces a technique whereby interest points are inherently associated during detection, eliminating the need for computing, storing, transmitting, or matching descriptors. Although the matching accuracy is marginally lower than that of conventional approaches, our method completely eliminates the need for descriptors, leading to a drastic reduction in memory usage for localization systems. We assess its effectiveness by comparing it against both classical handcrafted methods and modern learned approaches. Full article
16 pages, 3075 KB  
Article
Liner Wear Evaluation of Jaw Crushers Based on Binocular Vision Combined with FoundationStereo
by Chuyu Wen, Zhihong Jiang, Zhaoyu Fu, Quan Liu and Yifeng Zhang
Appl. Sci. 2026, 16(2), 998; https://doi.org/10.3390/app16020998 - 19 Jan 2026
Viewed by 306
Abstract
To address the bottlenecks of traditional jaw crusher liner wear detection—high safety risks, insufficient precision, and limited full-range analysis—this paper proposes a non-contact, high-precision wear analysis method based on binocular vision and deep learning. At its core is the integration of the state-of-the-art FoundationStereo zero-shot stereo matching algorithm, following scenario-specific adaptations, into the 3D reconstruction of industrial liners for wear analysis. A novel wear quantification methodology and corresponding indicator system are also proposed. After calibrating the ZED2 binocular camera and fine-tuning the algorithm, FoundationStereo achieves an Endpoint Error (EPE) of 0.09, significantly outperforming traditional algorithms. To meet on-site efficiency requirements, a “single-view rapid acquisition + CUDA engineering acceleration” strategy is implemented, reducing point cloud generation latency from 165 ms to 120 ms by rewriting kernel functions and optimizing memory access patterns. Geometric accuracy verification shows a Mean Absolute Error (MAE) ≤ 0.128 mm, fully meeting industrial measurement standards. A complete process of “3D reconstruction–model registration–quantitative analysis” is constructed, utilizing three core indicators (maximum wear depth, average wear depth, and wear area ratio) to characterize liner wear. Statistical results—such as an average maximum wear depth of 55.05 mm—are highly consistent with manual inspection data, providing a safe, efficient, and precise digital solution for the predictive maintenance and intelligent operation and maintenance (O&M) of liners. Full article
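The three wear indicators named in the abstract (maximum wear depth, average wear depth, and wear area ratio) can be sketched as simple statistics over a registered depth-difference map. This is a minimal illustration under assumed conventions (aligned depth maps in mm, positive difference = material loss, a hypothetical `wear_threshold` parameter), not the paper's quantification methodology.

```python
import numpy as np

def wear_indicators(ref_depth, worn_depth, wear_threshold=0.5):
    """Compute three wear indicators from two aligned depth maps.

    ref_depth, worn_depth: (H, W) arrays in mm, registered to the same
    frame; positive differences are taken to indicate material loss.
    wear_threshold: loss (mm) above which a pixel counts as worn.
    """
    wear = worn_depth - ref_depth      # per-pixel wear depth (mm)
    wear = np.clip(wear, 0, None)      # discard negative measurement noise
    worn_mask = wear > wear_threshold
    return {
        "max_wear_depth": float(wear.max()),
        "avg_wear_depth": float(wear[worn_mask].mean()) if worn_mask.any() else 0.0,
        "wear_area_ratio": float(worn_mask.mean()),
    }

# Toy example: a 100x100 liner patch with a uniformly worn band.
ref = np.zeros((100, 100))
worn = np.zeros((100, 100))
worn[40:60, :] = 10.0                  # 10 mm loss over 20% of the area
ind = wear_indicators(ref, worn)
print(ind)  # max 10.0 mm, avg 10.0 mm, area ratio 0.2
```

In the reconstruction-to-analysis pipeline, `ref_depth` would come from the reference liner model and `worn_depth` from the registered stereo point cloud.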