Search Results (1,438)

Search Parameters:
Keywords = input reconstruction

20 pages, 2631 KiB  
Article
Automatic 3D Reconstruction: Mesh Extraction Based on Gaussian Splatting from Romanesque–Mudéjar Churches
by Nelson Montas-Laracuente, Emilio Delgado Martos, Carlos Pesqueira-Calvo, Giovanni Intra Sidola, Ana Maitín, Alberto Nogales and Álvaro José García-Tejedor
Appl. Sci. 2025, 15(15), 8379; https://doi.org/10.3390/app15158379 - 28 Jul 2025
Abstract
This research introduces an automated 3D virtual reconstruction system tailored for architectural heritage (AH) applications, contributing to the ongoing paradigm shift from traditional CAD-based workflows to artificial intelligence-driven methodologies. It reviews recent advancements in machine learning and deep learning—particularly neural radiance fields (NeRFs) and their successor, Gaussian splatting (GS)—as state-of-the-art techniques in the domain. The study advocates for replacing point cloud data in heritage building information modeling (HBIM) workflows with image-based inputs, proposing a novel “photo-to-BIM” pipeline. A proof-of-concept system is presented, capable of processing photographs or video footage of ancient ruins—specifically, Romanesque–Mudéjar churches—to automatically generate 3D mesh reconstructions. The system’s performance is assessed using both objective metrics and subjective evaluations of mesh quality. The results confirm the feasibility and promise of image-based reconstruction as a viable alternative to conventional methods. In particular, GS with Mip-splatting proved superior in noise reduction, enabling efficient mesh extraction via surface-aligned Gaussian splatting. This photo-to-mesh pipeline marks a viable step towards HBIM. Full article
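As a sketch of the core operation behind Gaussian splatting, the snippet below evaluates the contribution of a single 2D Gaussian splat to a pixel. The function name and toy parameters are illustrative only; production GS renderers additionally sort and alpha-blend many thousands of splats on the GPU.

```python
import math

def splat_weight(px, py, mu, cov, alpha):
    """Contribution of one 2D Gaussian splat to pixel (px, py).

    cov is a 2x2 covariance [[a, b], [b, c]];
    weight = alpha * exp(-0.5 * d^T cov^-1 d), d = pixel - centre.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    inv = [[c / det, -b / det], [-b / det, a / det]]  # closed-form 2x2 inverse
    dx, dy = px - mu[0], py - mu[1]
    # Mahalanobis distance d^T cov^-1 d
    m = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return alpha * math.exp(-0.5 * m)

# A pixel at the splat centre receives the full opacity alpha.
w_centre = splat_weight(5.0, 5.0, (5.0, 5.0), [[2.0, 0.0], [0.0, 2.0]], 0.8)
# Weights fall off smoothly with distance from the centre.
w_far = splat_weight(9.0, 5.0, (5.0, 5.0), [[2.0, 0.0], [0.0, 2.0]], 0.8)
```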
14 pages, 1855 KiB  
Article
Response of Tree-Ring Oxygen Isotopes to Climate Variations in the Banarud Area in the West Part of the Alborz Mountains
by Yajun Wang, Shengqian Chen, Haichao Xie, Yanan Su, Shuai Ma and Tingting Xie
Forests 2025, 16(8), 1238; https://doi.org/10.3390/f16081238 - 28 Jul 2025
Abstract
Stable oxygen isotopes in tree rings (δ¹⁸O) serve as important proxies for climate change and offer unique advantages for climate reconstruction in arid and semi-arid regions. We established an annual δ¹⁸O chronology spanning 1964–2023 using Juniperus excelsa tree-ring samples collected from the Alborz Mountains in Iran. We analyzed relationships between δ¹⁸O and key climate variables: precipitation, temperature, Palmer Drought Severity Index (PDSI), vapor pressure (VP), and potential evapotranspiration (PET). Correlation analysis reveals that tree-ring δ¹⁸O is highly sensitive to hydroclimatic variations. Tree-ring cellulose δ¹⁸O shows significant negative correlations with annual total precipitation and spring PDSI, and significant positive correlations with spring temperature (particularly maximum temperature), April VP, and spring PET. The strongest correlation occurs with spring PET. These results indicate that δ¹⁸O responds strongly to the balance between springtime moisture supply (precipitation and soil moisture) and atmospheric evaporative demand (temperature, VP, and PET), reflecting an integrated signal of both regional moisture availability and energy input. The pronounced response of δ¹⁸O to spring evaporative conditions highlights its potential for capturing high-resolution changes in spring climatic conditions. Our δ¹⁸O series remained stable from the 1960s to the 1990s, but showed greater interannual variability after 2000, likely linked to regional warming and climate instability. A comparison with the δ¹⁸O variations from the eastern Alborz Mountains indicates that, despite some differences in magnitude, δ¹⁸O records from the western and eastern Alborz Mountains show broadly similar variability patterns. On a larger climatic scale, δ¹⁸O correlates significantly and positively with the Niño 3.4 index but shows no significant correlation with the Arctic Oscillation (AO) or the North Atlantic Oscillation (NAO). This suggests that ENSO-driven interannual variability in the tropical Pacific plays a key role in regulating regional hydroclimatic processes. This study confirms the strong potential of tree-ring oxygen isotopes from the Alborz Mountains for reconstructing hydroclimatic conditions and high-frequency climate variability. Full article
(This article belongs to the Section Forest Meteorology and Climate Change)
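The correlation analysis above comes down to Pearson coefficients between the isotope chronology and each climate variable. The toy series below are invented for illustration; they merely reproduce the sign (negative) of the precipitation relationship reported in the abstract.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: isotope ratios falling as annual precipitation rises,
# so the coefficient comes out negative.
d18o = [30.1, 29.8, 30.4, 29.5, 30.0, 29.2]
precip = [210, 260, 180, 300, 240, 330]
r = pearson_r(d18o, precip)
```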

17 pages, 6870 KiB  
Article
Edge- and Color–Texture-Aware Bag-of-Local-Features Model for Accurate and Interpretable Skin Lesion Diagnosis
by Dichao Liu and Kenji Suzuki
Diagnostics 2025, 15(15), 1883; https://doi.org/10.3390/diagnostics15151883 - 27 Jul 2025
Abstract
Background/Objectives: Deep models have achieved remarkable progress in the diagnosis of skin lesions but face two significant drawbacks. First, they cannot effectively explain the basis of their predictions. Although attention visualization tools like Grad-CAM can create heatmaps using deep features, these features often have large receptive fields, resulting in poor spatial alignment with the input image. Second, the design of most deep models neglects interpretable traditional visual features inspired by clinical experience, such as color–texture and edge features. This study aims to propose a novel approach integrating deep learning with traditional visual features to handle these limitations. Methods: We introduce the edge- and color–texture-aware bag-of-local-features model (ECT-BoFM), which limits the receptive field of deep features to a small size and incorporates edge and color–texture information from traditional features. A non-rigid reconstruction strategy ensures that traditional features enhance rather than constrain the model’s performance. Results: Experiments on the ISIC 2018 and 2019 datasets demonstrated that ECT-BoFM yields precise heatmaps and achieves high diagnostic performance, outperforming state-of-the-art methods. Furthermore, training models using only a small number of the most predictive patches identified by ECT-BoFM achieved diagnostic performance comparable to that obtained using full images, demonstrating its efficiency in exploring key clues. Conclusions: ECT-BoFM successfully combines deep learning and traditional visual features, addressing the interpretability and diagnostic accuracy challenges of existing methods. ECT-BoFM provides an interpretable and accurate framework for skin lesion diagnosis, advancing the integration of AI in dermatological research and clinical applications. Full article
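The bag-of-local-features idea (restricting each prediction to a small receptive field so evidence maps stay spatially aligned with the input) can be sketched as independent patch scoring followed by averaging. The per-patch scorer below is a stand-in (mean intensity) for the learned local classifier.

```python
def extract_patches(img, k):
    """All k x k patches of a 2D image (list of rows), stride 1."""
    h, w = len(img), len(img[0])
    return [[row[j:j + k] for row in img[i:i + k]]
            for i in range(h - k + 1) for j in range(w - k + 1)]

def patch_score(patch):
    # Stand-in for a learned local classifier: mean patch intensity.
    vals = [v for row in patch for v in row]
    return sum(vals) / len(vals)

def bof_prediction(img, k=3):
    """Average independent patch scores (a bag of local features).

    Each score depends only on one k x k region, so the effective receptive
    field is k x k by construction; per-patch scores double as a heatmap.
    """
    scores = [patch_score(p) for p in extract_patches(img, k)]
    return sum(scores) / len(scores)

img = [[float((i + j) % 4) for j in range(6)] for i in range(6)]
pred = bof_prediction(img, k=3)
```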

24 pages, 10103 KiB  
Article
Design Technique and Efficient Polyphase Implementation for 2D Elliptically Shaped FIR Filters
by Doru Florin Chiper and Radu Matei
Sensors 2025, 25(15), 4644; https://doi.org/10.3390/s25154644 - 26 Jul 2025
Abstract
This paper presents a novel analytical approach for the efficient design of a particular class of 2D FIR filters, having a frequency response with an elliptically shaped support in the frequency plane. The filter design is based on a Gaussian-shaped prototype filter, which is frequently used in signal and image processing. In order to express the Gaussian prototype frequency response as a trigonometric polynomial, we developed it into a Fourier series up to a specified order, given by the imposed approximation precision. We determined analytically a 1D to 2D frequency transformation, which was applied to the factored frequency response of the prototype, yielding directly the factored frequency response of a directional, elliptically shaped 2D filter, with specified selectivity and orientation angle. The designed filters have accurate shapes and negligible distortions. We also designed a 2D uniform filter bank of elliptical filters, which was then applied in decomposing a test image into sub-band images, thus proving its usefulness as an analysis filter bank. Then, the original image was accurately reconstructed from its sub-band images. Very selective directional elliptical filters can be used to efficiently extract straight lines with specified orientations from images, as shown in simulation examples. A computationally efficient implementation at the system level is also discussed, based on a polyphase and block filtering approach. The proposed implementation is illustrated for a smaller size of the filter kernel and input image and is shown to have reduced computational complexity due to its parallel structure, being much more arithmetically efficient compared not only to the direct filtering approach but also to the most recent similar implementations. Full article
(This article belongs to the Section Sensing and Imaging)
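The Fourier-series step of the design, expanding the Gaussian prototype frequency response into a trigonometric polynomial of specified order, can be reproduced numerically. This is a generic 1D sketch, not the authors' analytical derivation; coefficients are computed by trapezoidal quadrature on [-pi, pi].

```python
import math

def gaussian_response(w, sigma=1.0):
    """Gaussian prototype frequency response H(w) = exp(-w^2 / (2 sigma^2))."""
    return math.exp(-w * w / (2.0 * sigma * sigma))

def cosine_series_coeffs(order, sigma=1.0, samples=2000):
    """Fourier cosine coefficients a_n = (1/pi) * integral of H(w) cos(nw) dw."""
    step = 2.0 * math.pi / samples
    coeffs = []
    for n in range(order + 1):
        s = 0.0
        for i in range(samples + 1):  # trapezoidal rule over [-pi, pi]
            w = -math.pi + i * step
            weight = 0.5 if i in (0, samples) else 1.0
            s += weight * gaussian_response(w, sigma) * math.cos(n * w)
        coeffs.append(s * step / math.pi)
    return coeffs

def series_eval(coeffs, w):
    """Evaluate the trigonometric polynomial a0/2 + sum a_n cos(nw)."""
    return coeffs[0] / 2.0 + sum(c * math.cos(n * w)
                                 for n, c in enumerate(coeffs[1:], start=1))

a = cosine_series_coeffs(order=8)
# Worst-case approximation error of the order-8 polynomial on a dense grid.
err = max(abs(series_eval(a, -math.pi + i * 0.01) - gaussian_response(-math.pi + i * 0.01))
          for i in range(int(2 * math.pi / 0.01)))
```

Raising `order` tightens `err`, which is how an "imposed approximation precision" translates into the truncation order.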

14 pages, 6202 KiB  
Article
Masked Channel Modeling Enables Vision Transformers to Learn Better Semantics
by Jiayi Chen, Yanbiao Ma, Wei Dai and Zhihao Li
Entropy 2025, 27(8), 794; https://doi.org/10.3390/e27080794 - 25 Jul 2025
Abstract
Leveraging the ability of Vision Transformers (ViTs) to model contextual information across spatial patches, Masked Image Modeling (MIM) has emerged as a successful pre-training paradigm for visual representation learning by masking parts of the input and reconstructing the original image. However, this characteristic of ViTs has led many existing MIM methods to focus primarily on spatial patch reconstruction, overlooking the importance of semantic continuity in the channel dimension. Therefore, we propose a novel Masked Channel Modeling (MCM) pre-training paradigm, which reconstructs masked channel features using the contextual information from unmasked channels, thereby enhancing the model’s understanding of images from the perspective of channel semantic continuity. Considering that traditional RGB reconstruction targets lack sufficient semantic attributes in the channel dimension, MCM introduces advanced features extracted by the CLIP image encoder as reconstruction targets. This guides the model to better capture semantic continuity across feature channels. Extensive experiments on downstream tasks, including image classification, object detection, and semantic segmentation, demonstrate the effectiveness and superiority of MCM. Our code will be available later. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
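A minimal sketch of the Masked Channel Modeling objective: hide a subset of feature channels and score a reconstruction of them against the target. The "predictor" here is a naive mean over visible channels, and plain features stand in for the CLIP-encoder targets used in the paper.

```python
import random

def mcm_loss(features, target, mask_ratio=0.5, seed=0):
    """Masked Channel Modeling sketch: hide channels, score their reconstruction.

    features/target: C channels, each a flat list of floats. A real model
    predicts masked channels with a network conditioned on the visible ones;
    here an element-wise mean of the visible channels stands in for it.
    """
    rng = random.Random(seed)
    n_ch = len(features)
    masked = set(rng.sample(range(n_ch), int(n_ch * mask_ratio)))
    visible = [features[c] for c in range(n_ch) if c not in masked]
    recon = [sum(ch[i] for ch in visible) / len(visible)
             for i in range(len(features[0]))]
    # MSE is accumulated only over the masked channels, as in
    # masked-prediction objectives; the paper uses CLIP features as `target`.
    total = count = 0
    for c in masked:
        for i, t in enumerate(target[c]):
            total += (recon[i] - t) ** 2
            count += 1
    return total / count

feats = [[float(c + i) for i in range(4)] for c in range(8)]
loss = mcm_loss(feats, feats)  # self-reconstruction of toy features
```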

27 pages, 30210 KiB  
Article
Research on a Rapid Three-Dimensional Compressor Flow Field Prediction Method Integrating U-Net and Physics-Informed Neural Networks
by Chen Wang and Hongbing Ma
Mathematics 2025, 13(15), 2396; https://doi.org/10.3390/math13152396 - 25 Jul 2025
Abstract
This paper presents a neural network model, PINN-AeroFlow-U, for reconstructing full-field aerodynamic quantities around three-dimensional compressor blades, including regions near the wall. This model is based on structured CFD training data and physics-informed loss functions and is proposed for direct 3D compressor flow prediction. It maps flow data from the physical domain to a uniform computational domain and employs a U-Net-based neural network capable of capturing the sharp local transitions induced by fluid acceleration near the blade leading edge, as well as learning flow features associated with internal boundaries (e.g., the wall boundary). The inputs to PINN-AeroFlow-U are the flow-field coordinate data from high-fidelity multi-geometry blade solutions, the 3D blade geometry, and the first-order metric coefficients obtained via mesh transformation. Its outputs include the pressure field, temperature field, and velocity vector field within the blade passage. To enhance physical interpretability, the network’s loss function incorporates both the Euler equations and gradient constraints. PINN-AeroFlow-U achieves prediction errors of 1.063% for the pressure field and 2.02% for the velocity field, demonstrating high accuracy. Full article
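The physics-informed part of such a loss can be illustrated on a toy problem. The sketch below penalizes the finite-difference residual of u' = u in place of the Euler-equation and gradient terms used by PINN-AeroFlow-U; everything here is a stand-in for the real CFD setting.

```python
import math

def physics_informed_loss(u, xs, data, lam=1.0):
    """Toy physics-informed loss: data misfit plus an ODE residual penalty.

    u: candidate solution values at the uniformly spaced points xs.
    The residual term enforces the toy physics u' - u = 0 via central
    finite differences at interior points.
    """
    h = xs[1] - xs[0]
    interior = range(1, len(u) - 1)
    phys = sum(((u[i + 1] - u[i - 1]) / (2 * h) - u[i]) ** 2 for i in interior)
    phys /= len(u) - 2
    data_term = sum((a - b) ** 2 for a, b in zip(u, data)) / len(u)
    return data_term + lam * phys

xs = [i * 0.01 for i in range(101)]
u_exact = [math.exp(x) for x in xs]   # satisfies u' = u exactly
loss_good = physics_informed_loss(u_exact, xs, u_exact)
u_bad = [1.0 + x for x in xs]         # violates u' = u away from x = 0
loss_bad = physics_informed_loss(u_bad, xs, u_bad)
```

Even with a perfect data fit (`u_bad` matches its own "data" exactly), the physics term exposes the constraint violation, which is the point of adding such terms to the loss.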

21 pages, 4388 KiB  
Article
An Omni-Dimensional Dynamic Convolutional Network for Single-Image Super-Resolution Tasks
by Xi Chen, Ziang Wu, Weiping Zhang, Tingting Bi and Chunwei Tian
Mathematics 2025, 13(15), 2388; https://doi.org/10.3390/math13152388 - 25 Jul 2025
Abstract
The goal of single-image super-resolution (SISR) tasks is to generate high-definition images from low-quality inputs, with practical uses spanning healthcare diagnostics, aerial imaging, and surveillance systems. Although CNNs have considerably improved image reconstruction quality, existing methods still face limitations, including inadequate restoration of high-frequency details, high computational complexity, and insufficient adaptability to complex scenes. To address these challenges, we propose an Omni-dimensional Dynamic Convolutional Network (ODConvNet) tailored for SISR tasks. Specifically, ODConvNet comprises four key components: a Feature Extraction Block (FEB) that captures low-level spatial features; an Omni-dimensional Dynamic Convolution Block (DCB), which utilizes a multidimensional attention mechanism to dynamically reweight convolution kernels across spatial, channel, and kernel dimensions, thereby enhancing feature expressiveness and context modeling; a Deep Feature Extraction Block (DFEB) that stacks multiple convolutional layers with residual connections to progressively extract and fuse high-level features; and a Reconstruction Block (RB) that employs subpixel convolution to upscale features and refine the final high-resolution output. This mechanism significantly enhances feature extraction and effectively captures rich contextual information. Additionally, we employ an improved residual network structure combined with a refined Charbonnier loss function to alleviate gradient vanishing and exploding and to enhance the robustness of model training. Extensive experiments conducted on widely used benchmark datasets, including DIV2K, Set5, Set14, B100, and Urban100, demonstrate that, compared with existing deep learning-based SR methods, our ODConvNet method improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), and the visual quality of SR images is also improved. 
Ablation studies further validate the effectiveness and contribution of each component in our network. The proposed ODConvNet offers an effective, flexible, and efficient solution for the SISR task and provides promising directions for future research. Full article
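The Charbonnier loss mentioned above, together with the PSNR metric used for evaluation, is simple to state. This is a generic formulation on flat pixel lists, not code from the paper.

```python
import math

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss: a smooth L1 variant, sqrt(diff^2 + eps^2) per pixel.

    Near-quadratic for tiny errors, near-absolute for large ones, which keeps
    gradients bounded and training stable.
    """
    return sum(math.sqrt((p - t) ** 2 + eps * eps)
               for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

target = [10.0, 200.0, 55.0, 128.0]
pred = [12.0, 198.0, 55.0, 130.0]
loss = charbonnier(pred, target)
quality = psnr(pred, target)
```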

20 pages, 2786 KiB  
Article
Inverse Kinematics-Augmented Sign Language: A Simulation-Based Framework for Scalable Deep Gesture Recognition
by Binghao Wang, Lei Jing and Xiang Li
Algorithms 2025, 18(8), 463; https://doi.org/10.3390/a18080463 - 24 Jul 2025
Abstract
In this work, we introduce IK-AUG, a unified algorithmic framework for kinematics-driven data augmentation tailored to sign language recognition (SLR). Departing from traditional augmentation techniques that operate at the pixel or feature level, our method integrates inverse kinematics (IK) and virtual simulation to synthesize anatomically valid gesture sequences within a structured 3D environment. The proposed system begins with sparse 3D keypoints extracted via a pose estimator and projects them into a virtual coordinate space. A differentiable IK solver based on forward-and-backward constrained optimization is then employed to reconstruct biomechanically plausible joint trajectories. To emulate natural signer variability and enhance data richness, we define a set of parametric perturbation operators spanning spatial displacement, depth modulation, and solver sensitivity control. These operators are embedded into a generative loop that transforms each original gesture sample into a diverse sequence cluster, forming a high-fidelity augmentation corpus. We benchmark our method across five deep sequence models (CNN3D, TCN, Transformer, Informer, and Sparse Transformer) and observe consistent improvements in accuracy and convergence. Notably, Informer achieves 94.1% validation accuracy with IK-AUG-enhanced training, underscoring the framework’s efficacy. These results suggest that algorithmic augmentation via kinematic modeling offers a scalable, annotation-free pathway for improving SLR systems and lays the foundation for future integration with multi-sensor inputs in hybrid recognition pipelines. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
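The "forward-and-backward constrained optimization" the IK solver builds on matches the well-known FABRIK scheme; a minimal 2D version is sketched below. The paper's solver is differentiable and operates on 3D hand skeletons, so this is only the skeleton of the idea.

```python
import math

def _place(anchor, toward, length):
    """Put a joint at distance `length` from `anchor`, on the line to `toward`."""
    d = math.dist(anchor, toward)
    t = length / d
    return (anchor[0] + t * (toward[0] - anchor[0]),
            anchor[1] + t * (toward[1] - anchor[1]))

def fabrik(joints, target, tol=1e-4, max_iter=100):
    """Forward-and-backward reaching IK (FABRIK) for a 2D joint chain."""
    lengths = [math.dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    base = joints[0]
    if math.dist(base, target) > sum(lengths):
        # Unreachable target: stretch the chain straight toward it.
        for i in range(len(lengths)):
            joints[i + 1] = _place(joints[i], target, lengths[i])
        return joints
    for _ in range(max_iter):
        # Backward pass: pin the end effector to the target, walk to the base.
        joints[-1] = target
        for i in range(len(joints) - 2, -1, -1):
            joints[i] = _place(joints[i + 1], joints[i], lengths[i])
        # Forward pass: re-pin the base, walk back out to the end effector.
        joints[0] = base
        for i in range(len(joints) - 1):
            joints[i + 1] = _place(joints[i], joints[i + 1], lengths[i])
        if math.dist(joints[-1], target) < tol:
            break
    return joints

chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
solved = fabrik(chain, (1.5, 1.5))
```

Segment lengths are preserved by construction, which is what makes the resulting poses "biomechanically plausible" in the sense the abstract describes.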

14 pages, 492 KiB  
Article
Learnable Priors Support Reconstruction in Diffuse Optical Tomography
by Alessandra Serianni, Alessandro Benfenati and Paola Causin
Photonics 2025, 12(8), 746; https://doi.org/10.3390/photonics12080746 - 24 Jul 2025
Abstract
Diffuse Optical Tomography (DOT) is a non-invasive medical imaging technique that makes use of Near-Infrared (NIR) light to recover the spatial distribution of optical coefficients in biological tissues for diagnostic purposes. Due to the intense scattering of light within tissues, the reconstruction process inherent to DOT is severely ill-posed. In this paper, we propose to tackle the ill-conditioning by learning a prior over the solution space using an autoencoder-type neural network. Specifically, the decoder part of the autoencoder is used as a generative model. It maps a latent code to estimated physical parameters given in input to the forward model. The latent code is itself the result of an optimization loop which minimizes the discrepancy of the solution computed by the forward model with available observations. The structure and interpretability of the latent space are enhanced by minimizing the rank of its covariance matrix, thereby promoting more effective utilization of its information-carrying capacity. The deep learning-based prior significantly enhances reconstruction capabilities in this challenging domain, demonstrating the potential of integrating advanced neural network techniques into DOT. Full article
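The reconstruction loop (optimize a latent code so that the forward model applied to the decoder output matches the observations) can be sketched with linear stand-ins. The matrices `W` (decoder) and `F` (forward model) below are toy placeholders for the trained generative decoder and the DOT light-propagation model.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Stand-in "decoder": maps a 2-D latent code to 3 physical parameters.
W = [[1.0, 0.5], [0.0, 1.0], [0.5, 0.5]]
# Stand-in linear forward model: physical parameters -> 2 detector readings.
F = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

def predict(z):
    """Forward model applied to the decoded parameters."""
    return matvec(F, matvec(W, z))

def fit_latent(y, steps=500, lr=0.05):
    """Gradient descent on the latent code to match observations y,
    mirroring the paper's decoder-as-prior reconstruction loop."""
    z = [0.0, 0.0]
    for _ in range(steps):
        r = [p - t for p, t in zip(predict(z), y)]  # residual
        # Gradient of 0.5 * ||F W z - y||^2 w.r.t. z is (F W)^T r.
        FW = [[sum(F[i][k] * W[k][j] for k in range(3)) for j in range(2)]
              for i in range(2)]
        grad = [sum(FW[i][j] * r[i] for i in range(2)) for j in range(2)]
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z

y_obs = predict([1.0, -0.5])  # synthetic observations from a known code
z_hat = fit_latent(y_obs)
```

Because the optimization variable is the low-dimensional code rather than the full parameter field, the decoder acts as the prior that tames the ill-posedness.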

17 pages, 4338 KiB  
Article
Lightweight Attention-Based CNN Architecture for CSI Feedback of RIS-Assisted MISO Systems
by Anming Dong, Yupeng Xue, Sufang Li, Wendong Xu and Jiguo Yu
Mathematics 2025, 13(15), 2371; https://doi.org/10.3390/math13152371 - 24 Jul 2025
Abstract
Reconfigurable Intelligent Surface (RIS) has emerged as a promising enabling technology for wireless communications, which significantly enhances system performance through real-time manipulation of electromagnetic wave reflection characteristics. In RIS-assisted communication systems, existing deep learning-based channel state information (CSI) feedback methods often suffer from excessive parameter requirements and high computational complexity. To address this challenge, this paper proposes LwCSI-Net, a lightweight autoencoder network specifically designed for RIS-assisted multiple-input single-output (MISO) systems, aiming to achieve efficient and low-complexity CSI feedback. The core contribution of this work lies in an innovative lightweight feedback architecture that deeply integrates multi-layer convolutional neural networks (CNNs) with attention mechanisms. Specifically, the network employs 1D convolutional operations with unidirectional kernel sliding, which effectively reduces trainable parameters while maintaining robust feature-extraction capabilities. Furthermore, by incorporating an efficient channel attention (ECA) mechanism, the model dynamically allocates weights to different feature channels, thereby enhancing the capture of critical features. This approach not only improves network representational efficiency but also reduces redundant computations, leading to optimized computational complexity. Additionally, the proposed cross-channel residual block (CRBlock) establishes inter-channel information-exchange paths, strengthening feature fusion and ensuring outstanding stability and robustness under high compression ratio (CR) conditions. 
Our experimental results show that for CRs of 16, 32, and 64, LwCSI-Net significantly improves CSI reconstruction performance while maintaining fewer parameters and lower computational complexity, achieving an average complexity reduction of 35.63% compared to state-of-the-art (SOTA) CSI feedback autoencoder architectures. Full article
(This article belongs to the Special Issue Data-Driven Decentralized Learning for Future Communication Networks)
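The parameter saving from 1D convolutions with unidirectional kernel sliding is easy to quantify. The channel widths below are arbitrary examples, not LwCSI-Net's actual layer sizes.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a standard k x k 2D convolution layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def conv1d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a k-tap 1D convolution (one sliding direction)."""
    return c_in * c_out * k + (c_out if bias else 0)

# For the same channel widths and kernel size, the 1D variant needs
# roughly 1/k of the 2D layer's weights.
p2d = conv2d_params(64, 64, 3)
p1d = conv1d_params(64, 64, 3)
savings = 1 - p1d / p2d
```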

25 pages, 2129 KiB  
Article
Zero-Shot 3D Reconstruction of Industrial Assets: A Completion-to-Reconstruction Framework Trained on Synthetic Data
by Yongjie Xu, Haihua Zhu and Barmak Honarvar Shakibaei Asli
Electronics 2025, 14(15), 2949; https://doi.org/10.3390/electronics14152949 - 24 Jul 2025
Abstract
Creating high-fidelity digital twins (DTs) for Industry 4.0 applications is fundamentally reliant on the accurate 3D modeling of physical assets, a task complicated by the inherent imperfections of real-world point cloud data. This paper addresses the challenge of reconstructing accurate, watertight, and topologically sound 3D meshes from sparse, noisy, and incomplete point clouds acquired in complex industrial environments. We introduce a robust two-stage completion-to-reconstruction framework, C2R3D-Net, that systematically tackles this problem. The methodology first employs a pretrained, self-supervised point cloud completion network to infer a dense and structurally coherent geometric representation from degraded inputs. Subsequently, a novel adaptive surface reconstruction network generates the final high-fidelity mesh. This network features a hybrid encoder (FKAConv-LSA-DC), which integrates fixed-kernel and deformable convolutions with local self-attention to robustly capture both coarse geometry and fine details, and a boundary-aware multi-head interpolation decoder, which explicitly models sharp edges and thin structures to preserve geometric fidelity. Comprehensive experiments on the large-scale synthetic ShapeNet benchmark demonstrate state-of-the-art performance across all standard metrics. Crucially, we validate the framework’s strong zero-shot generalization capability by deploying the model—trained exclusively on synthetic data—to reconstruct complex assets from a custom-collected industrial dataset without any additional fine-tuning. The results confirm the method’s suitability as a robust and scalable approach for 3D asset modeling, a critical enabling step for creating high-fidelity DTs in demanding, unseen industrial settings. Full article
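Point cloud completion quality on benchmarks like ShapeNet is commonly scored with the Chamfer distance; a small pure-Python version is below. Whether the paper reports exactly this variant (squared, symmetric) is an assumption, so treat it as a generic metric sketch.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    Average nearest-neighbour squared distance in both directions; O(n*m)
    brute force, fine for small clouds (real pipelines use KD-trees or GPUs).
    """
    def sq(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    a_to_b = sum(min(sq(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in cloud]
d_same = chamfer_distance(cloud, cloud)
d_shift = chamfer_distance(cloud, shifted)
```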

27 pages, 8957 KiB  
Article
DFAN: Single Image Super-Resolution Using Stationary Wavelet-Based Dual Frequency Adaptation Network
by Gyu-Il Kim and Jaesung Lee
Symmetry 2025, 17(8), 1175; https://doi.org/10.3390/sym17081175 - 23 Jul 2025
Abstract
Single image super-resolution is the inverse problem of reconstructing a high-resolution image from its low-resolution counterpart. Although recent Transformer-based architectures leverage global context integration to improve reconstruction quality, they often overlook frequency-specific characteristics, resulting in the loss of high-frequency information. To address this limitation, we propose the Dual Frequency Adaptive Network (DFAN). DFAN first decomposes the input into low- and high-frequency components via Stationary Wavelet Transform. In the low-frequency branch, Swin Transformer layers restore global structures and color consistency. In contrast, the high-frequency branch features a dedicated module that combines Directional Convolution with Residual Dense Blocks, precisely reinforcing edges and textures. A frequency fusion module then adaptively merges these complementary features using depthwise and pointwise convolutions, achieving a balanced reconstruction. During training, we introduce a frequency-aware multi-term loss alongside the standard pixel-wise loss to explicitly encourage high-frequency preservation. Extensive experiments on the Set5, Set14, BSD100, Urban100, and Manga109 benchmarks show that DFAN achieves up to +0.64 dB peak signal-to-noise ratio, +0.01 structural similarity index measure, and −0.01 learned perceptual image patch similarity over the strongest frequency-domain baselines, while also delivering visibly sharper textures and cleaner edges. By unifying spatial and frequency-domain advantages, DFAN effectively mitigates high-frequency degradation and enhances SISR performance. Full article
(This article belongs to the Section Computer)
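The Stationary Wavelet Transform keeps both frequency bands at the input's full resolution, which is what lets DFAN process them in separate branches and fuse them later. A one-level 1D Haar version shows the decompose/reconstruct round trip; the paper works in 2D, so this is a simplified sketch.

```python
def haar_swt(x):
    """One level of an undecimated (stationary) Haar wavelet transform.

    No downsampling, so both bands keep the signal's length; circular
    boundary handling for simplicity.
    """
    n = len(x)
    low = [(x[i] + x[(i + 1) % n]) / 2.0 for i in range(n)]   # averages
    high = [(x[i] - x[(i + 1) % n]) / 2.0 for i in range(n)]  # details
    return low, high

def haar_iswt(low, high):
    """Invert the transform: sample i is exactly low[i] + high[i]."""
    return [l + h for l, h in zip(low, high)]

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 2.0]
low, high = haar_swt(signal)
recon = haar_iswt(low, high)
```

The `high` band concentrates the edges the abstract is concerned with preserving, while `low` carries the smooth structure.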

27 pages, 6578 KiB  
Article
Evaluating Neural Radiance Fields for ADA-Compliant Sidewalk Assessments: A Comparative Study with LiDAR and Manual Methods
by Hang Du, Shuaizhou Wang, Linlin Zhang, Mark Amo-Boateng and Yaw Adu-Gyamfi
Infrastructures 2025, 10(8), 191; https://doi.org/10.3390/infrastructures10080191 - 22 Jul 2025
Abstract
An accurate assessment of sidewalk conditions is critical for ensuring compliance with the Americans with Disabilities Act (ADA), particularly to safeguard mobility for wheelchair users. This paper presents a novel 3D reconstruction framework based on neural radiance fields (NeRFs) that utilizes monocular video input from consumer-grade cameras to generate high-fidelity 3D models of sidewalk environments. The framework enables automatic extraction of ADA-relevant geometric features, including the running slope, the cross slope, and vertical displacements, facilitating an efficient and scalable compliance assessment process. A comparative study is conducted across three surveying methods—manual measurements, LiDAR scanning, and the proposed NeRF-based approach—evaluated on four sidewalks and one curb ramp. Each method was assessed based on accuracy, cost, time, level of automation, and scalability. The NeRF-based approach achieved high agreement with LiDAR-derived ground truth, delivering an F1 score of 96.52%, a precision of 96.74%, and a recall of 96.34% for ADA compliance classification. These results underscore the potential of NeRF to serve as a cost-effective, automated alternative to traditional and LiDAR-based methods, with sufficient precision for widespread deployment in municipal sidewalk audits. Full article
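The compliance check over the extracted features can be sketched as threshold tests. The default limits below (5% running slope, about 2.08% cross slope, 1/4-inch vertical displacement) are commonly cited ADA figures supplied here as assumptions, not values quoted from the paper.

```python
def percent_slope(rise, run):
    """Slope as a percentage of vertical rise over horizontal run."""
    return 100.0 * rise / run

def ada_sidewalk_check(running_slope_pct, cross_slope_pct, vertical_disp_mm,
                       max_running=5.0, max_cross=2.083, max_disp_mm=6.35):
    """Threshold tests over the three extracted geometric features.

    Default limits: 5% running slope (1:20); ~2.08% cross slope (1:48);
    6.35 mm (1/4 in) vertical displacement. These are assumed common
    figures; check the governing standard before relying on them.
    """
    return {
        "running_slope_ok": running_slope_pct <= max_running,
        "cross_slope_ok": cross_slope_pct <= max_cross,
        "vertical_disp_ok": vertical_disp_mm <= max_disp_mm,
    }

# A 4% running slope, 1.5% cross slope, 4 mm lip: all within the limits.
report = ada_sidewalk_check(percent_slope(0.04, 1.0), 1.5, 4.0)
```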

18 pages, 2028 KiB  
Article
Research on Single-Tree Segmentation Method for Forest 3D Reconstruction Point Cloud Based on Attention Mechanism
by Lishuo Huo, Zhao Chen, Lingnan Dai, Dianchang Wang and Xinrong Zhao
Forests 2025, 16(7), 1192; https://doi.org/10.3390/f16071192 - 19 Jul 2025
Abstract
The segmentation of individual trees holds considerable significance in the investigation and management of forest resources. Smartphone-captured imagery, combined with image-based 3D reconstruction techniques to generate the corresponding point cloud data, can serve as a more accessible and potentially cost-efficient alternative to conventional LiDAR data acquisition. In this study, we present a Sparse 3D U-Net framework for single-tree segmentation built on a multi-head attention mechanism. The mechanism projects the input data into multiple subspaces, referred to as "heads", computes attention independently within each subspace, and then aggregates the outputs into a comprehensive representation. Multi-head attention thereby enables the model to capture diverse contextual information, enhancing performance across a wide range of applications. The framework performs efficient, intelligent, end-to-end instance segmentation of forest point cloud data by integrating multi-scale features and global contextual information. An iterative mechanism at the attention layer allows the model to learn more compact feature representations, significantly improving its convergence speed. Dongsheng Bajia Country Park and Jiufeng National Forest Park, situated in Haidian District, Beijing, China, were selected as test sites, and eight representative sample plots within these areas were systematically sampled. Sequential photographs of the forest stands were captured with an iPhone and processed into point cloud data for the respective plots, enabling a comprehensive assessment of the model's single-tree segmentation capability. The generalization performance of the proposed model was further validated on the publicly available TreeLearn dataset. The model's advantages were demonstrated across multiple aspects, including data processing efficiency, training robustness, and single-tree segmentation speed: the proposed method achieved an F1 score of 91.58% on the customized dataset and 97.12% on the TreeLearn dataset. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
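The multi-head attention mechanism described above (project the input into subspace "heads", attend independently in each, then aggregate) can be sketched in NumPy. The weight matrices, sizes, and function names here are illustrative assumptions, not the paper's Sparse 3D U-Net code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (n, d). Project to Q/K/V, split into heads, attend per head, concat, project."""
    n, d = X.shape
    dh = d // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    def split(M):  # (n, d) -> (num_heads, n, dh)
        return M.reshape(n, num_heads, dh).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention, computed independently within each head
    scores = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    heads = scores @ Vh                           # (num_heads, n, dh)
    concat = heads.transpose(1, 0, 2).reshape(n, d)  # aggregate the heads
    return concat @ Wo

rng = np.random.default_rng(0)
n, d, h = 5, 8, 2
X = rng.normal(size=(n, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
Y = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=h)
print(Y.shape)  # → (5, 8)
```

Each head sees a different dh-dimensional projection of the same points, which is what lets the aggregated output capture diverse contextual information.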

22 pages, 5937 KiB  
Article
CSAN: A Channel–Spatial Attention-Based Network for Meteorological Satellite Image Super-Resolution
by Weiliang Liang and Yuan Liu
Remote Sens. 2025, 17(14), 2513; https://doi.org/10.3390/rs17142513 - 19 Jul 2025
Abstract
Meteorological satellites play a critical role in weather forecasting, climate monitoring, water resource management, and more. These satellites carry an array of radiative imaging bands, capturing dozens of spectral images spanning the visible to the infrared. However, the spatial resolution of these bands varies: images at longer wavelengths typically exhibit lower spatial resolutions, which limits the accuracy and reliability of subsequent applications. To alleviate this issue, we propose a channel–spatial attention-based network, named CSAN, designed to super-resolve all low-resolution (LR) bands to the maximal available high-resolution (HR) scale. CSAN consists of an information fusion unit, a feature extraction module, and an image restoration unit. The information fusion unit adaptively fuses LR and HR images, effectively capturing inter-band spectral relationships and spatial details to enhance the input representation. The feature extraction module integrates channel and spatial attention into a residual network, enabling the extraction of informative spectral and spatial features from the fused inputs. Using these deep features, the image restoration unit reconstructs the missing spatial details in the LR images. Extensive experiments demonstrate that the proposed network outperforms other state-of-the-art approaches both quantitatively and visually. Full article
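The channel and spatial attention integrated into the residual network can be sketched as two gating steps on a feature map inside a residual unit. This is a simplified illustration: the pooling-plus-sigmoid gates below stand in for the learned MLP and convolution layers a real network such as CSAN would use, and all names are assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F):
    """F: (C, H, W). Gate each channel using its global average and max pooling."""
    avg = F.mean(axis=(1, 2))
    mx = F.max(axis=(1, 2))
    w = sigmoid(avg + mx)              # (C,); a learned MLP would normally sit here
    return F * w[:, None, None]

def spatial_attention(F):
    """F: (C, H, W). Gate each pixel using cross-channel pooled statistics."""
    avg = F.mean(axis=0)
    mx = F.max(axis=0)
    w = sigmoid(avg + mx)              # (H, W); a conv layer would normally sit here
    return F * w[None, :, :]

def csa_residual_block(F):
    """Residual unit: attention-refined branch added back onto the input."""
    branch = spatial_attention(channel_attention(F))
    return F + branch

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 8, 8))         # toy (channels, height, width) feature map
out = csa_residual_block(F)
print(out.shape)  # → (4, 8, 8)
```

Applying the channel gate before the spatial gate lets the block first reweight spectral bands and then emphasize informative image regions, while the residual connection preserves the original features.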
