Search Results (7,245)

Search Parameters:
Keywords = 3D image modelling

27 pages, 1324 KB  
Review
Artificial Intelligence Architectures in Oral Rehabilitation: A Focused Review of Deep Learning Models for Implant Planning, Prosthodontic Design, and Peri-Implant Diagnosis
by Hossam Dawa, Carlos Aroso, Ana Sofia Vinhas, José Manuel Mendes and Arthur Rodriguez Gonzalez Cortes
Appl. Sci. 2026, 16(8), 3739; https://doi.org/10.3390/app16083739 - 10 Apr 2026
Abstract
Deep learning is increasingly integrated into oral rehabilitation workflows, particularly in implant planning, prosthodontic design automation, and peri-implant diagnosis. However, reported performance is heterogeneous and difficult to compare across tasks, modalities, and validation designs. The goal of this study was to critically analyze deep learning architecture families applied to oral rehabilitation and to provide task-driven selection guidance supported by an evidence table reporting dataset characteristics, validation strategy, and performance metrics. A focused narrative review was conducted using transparent, database-specific search criteria (final n = 10 included studies), emphasizing implant planning (cone–beam computed tomography [CBCT]-based segmentation), prosthodontic design (intraoral scan [IOS]/mesh inputs), and peri-implant diagnosis (periapical/panoramic radiographs). Evidence certainty for each clinical task was assessed using GRADE-informed ratings (High/Moderate/Low/Very Low). Extracted variables included clinical task, imaging modality, dataset size, architecture, validation strategy (internal vs. internal + external), split level, ground truth protocol, and performance metrics. A structured computational and hardware feasibility analysis was conducted for each architecture family to support real-world deployment planning. Encoder–decoder networks (U-Net/nnU-Net) dominate CBCT segmentation for implant planning, while detection architectures (Faster R-CNN, YOLO) support implant localization and peri-implant assessment on radiographs. Generative models (3D GANs, transformer-based point-to-mesh networks) enable crown design from three-dimensional scans. Hybrid CNN–Transformer architectures show promise for multimodal CBCT–IOS fusion, though direct evidence from the included studies remains limited to a single study. External validation remains uncommon yet essential given the risk of domain shift. In conclusion, architecture selection should be anchored to task geometry (2D vs. 3D), artifact burden, and required clinical output type. Reporting standards should prioritize dataset transparency, validation rigor, multi-center external testing, and uncertainty-aware outputs. Full article
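
A minimal, illustrative sketch of the task-driven selection guidance this review describes, mapping the clinical tasks named in the abstract to the architecture families it associates with them. The task keys and tuple fields are assumptions for illustration, not the authors' actual evidence table.

```python
# Illustrative task-to-architecture lookup distilled from the abstract's findings.
# Task names and descriptions are hypothetical labels, not the review's own table.
ARCHITECTURE_GUIDE = {
    "implant_planning_cbct_segmentation": ("U-Net / nnU-Net", "3D encoder-decoder"),
    "peri_implant_radiograph_detection": ("Faster R-CNN / YOLO", "2D detection"),
    "crown_design_from_ios_mesh": ("3D GAN / point-to-mesh transformer", "generative 3D"),
    "multimodal_cbct_ios_fusion": ("hybrid CNN-Transformer", "multimodal fusion"),
}

def suggest_architecture(task: str) -> str:
    """Return the architecture family the review associates with a clinical task."""
    family, kind = ARCHITECTURE_GUIDE[task]
    return f"{task}: {family} ({kind})"

if __name__ == "__main__":
    for task in ARCHITECTURE_GUIDE:
        print(suggest_architecture(task))
```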

18 pages, 4881 KB  
Article
Fractal Dimension Analysis and TOPSIS Method for Comprehensive Evaluation of Slagging Tendency of High-Alkali Coal from Xinjiang
by Jialisen Yimanhazi, Keji Wan, Mingqiang Gao, Qiongqiong He and Zhenyong Miao
Processes 2026, 14(8), 1216; https://doi.org/10.3390/pr14081216 - 10 Apr 2026
Abstract
High-alkali coal can cause slagging and fouling and impact the operational lifespan of the boilers. Traditional single-indicator methods often yield inconsistent results when evaluating the slagging risk of high-alkali coal. In this study, six coal samples were selected and systematically analyzed for their slagging characteristics using scanning electron microscopy (SEM), X-ray fluorescence (XRF), X-ray diffraction (XRD), and ash morphology analysis. Furthermore, a comprehensive evaluation model was constructed by integrating the technique for order preference by similarity to ideal solution (TOPSIS) with the entropy weight method. Additionally, based on images of ash morphology, the fractal dimension (D) was introduced as a quantitative indicator to predict slagging tendency through crack characteristics. The results show that TF, ZD, and KB samples, which are rich in alkaline oxides (CaO, Fe2O3, Na2O, K2O), form low-melting-point eutectic silicates during combustion, resulting in significant melting and agglomeration with wide cracks between aggregates, indicating a strong slagging tendency. Their fractal dimensions (D) range from 1.81 to 1.92. In contrast, HM and WQ samples, dominated by SiO2 and Al2O3, form high-melting-point mullite and quartz, showing loose ash morphology with uniformly distributed cracks and a weak slagging tendency, with D values of 1.68 and 1.75, respectively. A significant negative correlation was observed between D and the E-TOPSIS model (y = 3.54 − 1.72x). Therefore, fractal analysis allows for rapid assessment of slagging risk without the need for complex chemical testing. This study provides valuable insights for predicting the slagging tendency of high-alkali coal during combustion. Full article
(This article belongs to the Section Petroleum and Low-Carbon Energy Process Engineering)
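
A minimal sketch of an entropy-weighted TOPSIS ranking of the kind the abstract combines into its E-TOPSIS model. The decision matrix (rows = coal samples, columns = slagging indicators) and the assumption that all indicators are benefit-type are hypothetical.

```python
import numpy as np

# Hedged sketch of entropy weighting + TOPSIS; sample values are hypothetical.
X = np.array([
    [0.62, 0.81, 0.55],
    [0.48, 0.67, 0.71],
    [0.91, 0.73, 0.60],
    [0.35, 0.40, 0.44],
])

# 1. Vector-normalise each indicator column.
R = X / np.linalg.norm(X, axis=0)

# 2. Entropy weights: more discriminating (lower-entropy) indicators get more weight.
P = X / X.sum(axis=0)
entropy = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(len(X))
weights = (1 - entropy) / (1 - entropy).sum()

# 3. Weighted matrix and ideal / anti-ideal solutions (assuming larger = stronger slagging).
V = R * weights
v_best, v_worst = V.max(axis=0), V.min(axis=0)

# 4. Closeness coefficient: higher value = closer to the ideal (stronger tendency).
d_best = np.linalg.norm(V - v_best, axis=1)
d_worst = np.linalg.norm(V - v_worst, axis=1)
closeness = d_worst / (d_best + d_worst)
print("closeness:", np.round(closeness, 3))
```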

19 pages, 11440 KB  
Article
Cross-Sensor Evaluation of ZY1-02E and ZY1-02D Hyperspectral Satellites for Mapping Soil Organic Matter and Texture in the Black Soil Region
by Kun Shang, He Gu, Hongzhao Tang and Chenchao Xiao
Agronomy 2026, 16(8), 781; https://doi.org/10.3390/agronomy16080781 - 10 Apr 2026
Abstract
Soil health monitoring is critical for the sustainable management of the black soil region, a key resource for global food security. However, traditional field surveys are constrained by high operational costs, limited spatial coverage, and low temporal frequency, making them inadequate for high-resolution and time-sensitive soil monitoring. The recently launched ZY1-02E satellite, equipped with an advanced hyperspectral imager, offers a new potential data source, yet its capability for quantitative soil modelling requires rigorous cross-sensor validation. This study conducts a cross-sensor evaluation of ZY1-02E and its predecessor, ZY1-02D, for mapping soil organic matter (SOM) and soil texture (sand, silt, and clay) in Northeast China. Optimal spectral indices were constructed through exhaustive band combination and correlation screening, and quantitative inversion models were established using a hybrid framework integrating Random Frog feature selection with Gaussian Process Regression (GPR) and Boosting Trees, based on synchronous ground observations. Results demonstrate strong cross-sensor consistency, with spectral indices showing significant linear correlations (R2>0.65) between ZY1-02E and ZY1-02D. Furthermore, the quantitative retrieval models applied to ZY1-02E imagery achieved robust performance, with cross-sensor retrieval consistency exceeding R2=0.60 for all parameters and SOM exhibiting the highest agreement (R2=0.74). These findings confirm the radiometric stability and algorithm transferability of ZY1-02E, demonstrating its capability to generate soil parameter products comparable to ZY1-02D without extensive model recalibration. The validated interoperability of the twin-satellite constellation substantially enhances temporal observation capacity during the narrow bare-soil window, effectively mitigating cloud-induced data gaps in high-latitude agricultural regions. Importantly, the enhanced monitoring framework provides a scalable technical paradigm for high-frequency hyperspectral soil mapping, offering critical spatial decision support for precision fertilization, soil degradation mitigation, and conservation tillage management in the Mollisol belt. Full article
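
A hedged sketch of the index-then-regression idea described in the abstract: construct a simple two-band spectral index and regress SOM with Gaussian Process Regression. Band positions, the synthetic reflectance data, and the ratio-index form are assumptions, not the paper's screened indices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_samples, n_bands = 120, 150
reflectance = rng.uniform(0.05, 0.6, size=(n_samples, n_bands))   # synthetic spectra

# Hypothetical ratio index between two screened bands (indices 40 and 90 are arbitrary).
index = reflectance[:, 40] / reflectance[:, 90]
som = 2.5 * index + rng.normal(0, 0.1, n_samples)                  # synthetic SOM target

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(index.reshape(-1, 1), som)
pred, std = gpr.predict(index.reshape(-1, 1), return_std=True)
print("training R^2:", round(gpr.score(index.reshape(-1, 1), som), 3))
```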

28 pages, 1509 KB  
Article
Quantifying Structural Divergence Between Human and Diffusion-Based Generative Visual Compositions
by Necati Vardar and Çağrı Gümüş
Appl. Sci. 2026, 16(8), 3669; https://doi.org/10.3390/app16083669 - 9 Apr 2026
Abstract
The rapid proliferation of text-to-image generative systems has transformed visual content production, yet the structural characteristics embedded in their compositional outputs remain insufficiently understood. Rather than approaching human–AI differentiation as a purely classification problem, this study investigates whether a controlled set of AI-generated and human-designed posters exhibits measurable structural divergence under thematically matched conditions. A dataset of jazz festival posters was analyzed using interpretable geometric and information-theoretic descriptors, including spatial density (padding ratio), edge density, chromatic dispersion, and entropy-based measures. Instead of relying on deep neural detection architectures, we employed a transparent machine-learning framework to examine intrinsic structural separability within feature space. Results demonstrated highly stable group separation (ROC-AUC = 0.99; 95% CI: 0.978–0.999) under cross-validated evaluation. Distributional analysis further revealed a pronounced divergence in spatial density allocation (Kolmogorov–Smirnov statistic = 0.76, p < 10−28), accompanied by a very large effect size (Cohen’s d = 1.365). While padding ratio emerged as the dominant discriminative factor, additional entropy- and chromatic-based descriptors contributed to group separation even when spatial density was excluded (AUC = 0.903). These findings indicate that AI-generated and human-designed posters can diverge in negative space allocation and chromatic organization under controlled thematic and platform-specific conditions. The study contributes to the explainable analysis of generative visual systems by reframing human–AI differentiation as a structural divergence problem grounded in interpretable image statistics rather than as a model-specific artifact detection task. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
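
A minimal sketch of the interpretable-descriptor comparison the abstract reports: compute a padding (negative-space) ratio per image, then quantify group divergence with a Kolmogorov–Smirnov test and Cohen's d. The background threshold and the synthetic image groups are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def padding_ratio(gray: np.ndarray, thresh: float = 0.9) -> float:
    """Fraction of near-background pixels; gray values assumed in [0, 1]."""
    return float((gray > thresh).mean())

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    pooled = np.sqrt(((a.std(ddof=1) ** 2) + (b.std(ddof=1) ** 2)) / 2)
    return float((a.mean() - b.mean()) / pooled)

# Synthetic stand-ins for human-designed vs. AI-generated poster groups.
rng = np.random.default_rng(1)
human = np.array([padding_ratio(rng.random((256, 256)) ** 0.5) for _ in range(50)])
ai = np.array([padding_ratio(rng.random((256, 256)) ** 2.0) for _ in range(50)])

stat, p = ks_2samp(human, ai)
print(f"KS={stat:.2f}, p={p:.1e}, Cohen's d={cohens_d(human, ai):.2f}")
```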

19 pages, 1466 KB  
Article
D2MNet: Difference-Aware Decoupling and Multi-Prompt Learning for Medical Difference Visual Question Answering
by Lingge Lai, Weihua Ou, Jianping Gou and Zhonghua Liu
J. Imaging 2026, 12(4), 162; https://doi.org/10.3390/jimaging12040162 - 9 Apr 2026
Abstract
Difference visual question answering (Diff-VQA) aims to answer questions by identifying and reasoning about differences between medical images. Existing methods often rely on simple feature subtraction or fusion to model image differences, while overlooking the asymmetric descriptive requirements of changed and unchanged cases and providing limited task-specific guidance to pretrained language decoders. To address these limitations, we propose D2MNet (Difference-aware Decoupling and Multi-prompt Network), a framework for medical Diff-VQA that combines change-aware reasoning with prompt-guided answer generation. Specifically, a Change Analysis Module (CAM) predicts whether a change is present and produces a binary change-aware prompt; a Difference-Aware Module (DAM) uses dual attention to capture fine-grained difference features; and a multi-prompt learning mechanism (MLM) injects question-aware, change-aware, and learnable prompts into the language decoder to improve contextual alignment and response generation. Experiments on the MIMIC-DiffVQA benchmark show that D2MNet achieves a CIDEr score of 2.907 ± 0.040, outperforming the strongest baseline, ReAl (2.409), under the same evaluation setting. These results demonstrate the effectiveness of the proposed design on benchmark medical Diff-VQA and suggest its potential for assisting difference-aware medical answer generation. Full article
(This article belongs to the Section Medical Imaging)
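
A hypothetical PyTorch sketch of the difference-aware idea the abstract outlines: attend over the feature difference between two images and prepend a binary change-aware prompt token for the decoder. Layer sizes and the attention wiring are illustrative assumptions, not the authors' D2MNet configuration.

```python
import torch
import torch.nn as nn

class DifferenceAwareBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.change_head = nn.Linear(dim, 2)        # changed / unchanged classifier
        self.change_prompt = nn.Embedding(2, dim)   # binary change-aware prompt

    def forward(self, feat_main, feat_ref):
        diff = torch.abs(feat_main - feat_ref)                    # (B, N, dim)
        attended, _ = self.attn(diff, feat_main, feat_main)       # difference queries main image
        change_logits = self.change_head(attended.mean(dim=1))    # (B, 2)
        prompt = self.change_prompt(change_logits.argmax(dim=-1)).unsqueeze(1)
        return torch.cat([prompt, attended], dim=1), change_logits

block = DifferenceAwareBlock()
f1, f2 = torch.randn(2, 196, 256), torch.randn(2, 196, 256)
tokens, logits = block(f1, f2)
print(tokens.shape, logits.shape)   # torch.Size([2, 197, 256]) torch.Size([2, 2])
```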

23 pages, 20258 KB  
Article
Mining Scene Classification and Semantic Segmentation Using 3D Convolutional Neural Networks
by André Estevam Costa Oliveira, Matheus Corrêa Domingos, Valdivino Alexandre de Santiago Júnior and Maria Isabel Sobral Escada
Remote Sens. 2026, 18(8), 1112; https://doi.org/10.3390/rs18081112 - 8 Apr 2026
Abstract
High spatio-temporal resolution satellite imagery has become increasingly accessible thanks to advancements in the aerospace industry which, combined with a growing computational power, has enabled the spring of novel techniques regarding recognition in remote sensing (RS) images. However, there is still a lack of studies around 3D convolutions for spatio-temporal data applied to classification problems in RS. Hence, this study investigates the feasibility of 3D convolutional neural networks (3DCNNs) within a spatio-temporal perspective for scene classification and semantic segmentation in RS images, focusing on the identification of mining sites. We firstly developed a dataset covering several parts of Brazil based on MapBiomas products and Planet imagery, then we evaluated the effectiveness of 3DCNNs in capturing temporal information from a sequence of monthly captured images. Moreover, not only for scene classification but also for semantic segmentation, we compared 3D and 2D approaches. As for scene classification, a 3DCNN was better than the corresponding 2D model, while a 2D U-Net was better than a U-Net3D for semantic segmentation. The main explanation for this lies in the fact that a less costly annotation and training time strategy was adopted, but this may have harmed spatio-temporal approaches for semantic segmentation but not for scene classification. However, U-Net3D presented the highest Precision of all models, meaning that it is highly accurate when it predicts a positive. Moreover, 3DCNN (U-Net3D) presented significantly better performance with respect to semantic segmentation compared to other spatio-temporal approaches like ConvLSTM+U-Net and TempCNN. Sensitivity analysis revealed that the near-infrared (NIR) band played a decisive role in distinguishing mining areas, emphasizing its importance in highlighting subtle spectral variations associated with land-cover disturbances. Full article
(This article belongs to the Section Environmental Remote Sensing)
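
A minimal sketch of a 3D CNN for spatio-temporal scene classification, treating a sequence of monthly images as a (channels, time, height, width) volume as the abstract describes. Band count, sequence length, and layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, in_channels: int = 4, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                    # x: (B, C, T, H, W)
        return self.classifier(self.features(x).flatten(1))

model = Tiny3DCNN()
clip = torch.randn(2, 4, 12, 64, 64)         # 12 monthly acquisitions, 4 spectral bands
print(model(clip).shape)                     # torch.Size([2, 2])
```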

17 pages, 4006 KB  
Article
Intervertebral Disc Elastography to Relate Shear Modulus and Relaxometry in Compression and Bending
by Zachary R. Davis, P. Cameron Gossett, Robert L. Wilson, Woong Kim, Yue Mei, Kent D. Butz, Nancy C. Emery, Eric A. Nauman, Stéphane Avril, Corey P. Neu and Deva D. Chan
Bioengineering 2026, 13(4), 437; https://doi.org/10.3390/bioengineering13040437 - 8 Apr 2026
Abstract
Intervertebral disc degeneration is the most recognized cause of low back pain, characterized by the decline in tissue structure and mechanics. Image-based mechanical parameters (e.g., strain, stiffness) may provide an ideal assessment of disc function that is lost with degeneration, but unfortunately, these remain underdeveloped. Moreover, it is unknown whether strain or stiffness of the disc may be predicted by MRI relaxometry (e.g., T1 or T2), an increasingly accepted quantitative measure of disc structure. In this study, we quantified T1 and T2 relaxation times and compared to in-plane strains measured with displacement-encoded MRI within human cadaveric discs under physiological levels of compression and bending. Using a novel inverse approach, we then estimated shear modulus in orthogonal image planes and regionally compared these values to relaxation times and 2D strains. Intratissue strain depended on the loading mode, and shear modulus in the nucleus pulposus was typically an order of magnitude lower than the annulus fibrosus. Relative shear moduli estimated from strain data derived under compression generally did not correspond with those from bending experiments. Only one anatomical region showed a significant correlation between relative shear modulus and relaxometry (T1 vs. µrel, coronal plane under bending). Together, these results suggest that future inverse analyses may be improved by incorporating multiple loading conditions into the same model and that image-based elastography and relaxometry should be viewed as complementary measures of disc structure and function to assess degeneration in future studies. Full article
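
A hedged sketch of the region-wise comparison described in the abstract: correlate a relaxation time (e.g., T1) with relative shear modulus across disc regions. All region names and values below are synthetic placeholders, not measured data.

```python
import numpy as np
from scipy.stats import pearsonr

regions = ["anterior AF", "posterior AF", "lateral AF", "NP"]
t1_ms = np.array([820.0, 790.0, 860.0, 1150.0])   # hypothetical T1 per region
mu_rel = np.array([1.00, 0.85, 1.10, 0.12])       # hypothetical relative shear modulus

r, p = pearsonr(t1_ms, mu_rel)
for name, t1, mu in zip(regions, t1_ms, mu_rel):
    print(f"{name:>12s}: T1={t1:6.0f} ms, mu_rel={mu:.2f}")
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```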

21 pages, 11316 KB  
Article
Multimodal Fusion Prediction of Radiation Pneumonitis via Key Pre-Radiotherapy Imaging Feature Selection Based on Dual-Layer Attention Multiple-Instance Learning
by Hao Wang, Dinghui Wu, Shuguang Han, Jingli Tang and Wenlong Zhang
J. Imaging 2026, 12(4), 158; https://doi.org/10.3390/jimaging12040158 - 8 Apr 2026
Abstract
Radiation pneumonitis (RP), one of the most common and severe complications in locally advanced non-small cell lung cancer (LA-NSCLC) patients following thoracic radiotherapy, presents significant challenges in prediction due to the complexity of clinical risk factors, incomplete multimodal data, and unavailable slice-level annotations in pre-radiotherapy CT images. To address these challenges, we propose a multimodal fusion framework based on Dual-Layer Attention-Based Adaptive Bag Embedding Multiple-Instance Learning (DAAE-MIL) for accurate RP prediction. This study retrospectively collected data from 995 LA-NSCLC patients who received thoracic radiotherapy between November 2018 and April 2025. After screening, Subject datasets (n = 670) were allocated for training (n = 535), and the remaining samples (n = 135) were reserved for an independent test set. The proposed framework first extracts pre-radiotherapy CT image features using a fine-tuned C3D network, followed by the DAAE-MIL module to screen critical instances and generate bag-level representations, thereby enhancing the accuracy of deep feature extraction. Subsequently, clinical data, radiomics features, and CT-derived deep features are integrated to construct a multimodal prediction model. The proposed model demonstrates promising RP prediction performance across multiple evaluation metrics, outperforming both state-of-the-art and unimodal RP prediction approaches. On the test set, it achieves an accuracy (ACC) of 0.93 and an area under the curve (AUC) of 0.97. This study validates that the proposed method effectively addresses the limitations of single-modal prediction and the unknown key features in pre-radiotherapy CT images while providing significant clinical value for RP risk assessment. Full article
(This article belongs to the Section Medical Imaging)
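
A hedged sketch of attention-based multiple-instance pooling: score each slice-level instance, then form a bag-level embedding as the attention-weighted sum. This is a generic attention-MIL layer for illustration, not the authors' exact dual-layer DAAE-MIL module; feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, instances):                              # (B, n_instances, dim)
        weights = torch.softmax(self.score(instances), dim=1)  # (B, n, 1)
        bag = (weights * instances).sum(dim=1)                 # (B, dim) bag embedding
        return bag, weights.squeeze(-1)

pool = AttentionMILPooling()
slices = torch.randn(3, 40, 512)                 # 40 slice-level CT features per patient
bag_embedding, attn = pool(slices)
print(bag_embedding.shape, attn.shape)           # torch.Size([3, 512]) torch.Size([3, 40])
```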

30 pages, 1323 KB  
Article
Circular Polarization-Based Quantum Encoding for Image Transmission over Error-Prone Channels
by Udara Jayasinghe and Anil Fernando
Signals 2026, 7(2), 37; https://doi.org/10.3390/signals7020037 - 8 Apr 2026
Abstract
Quantum image transmission over noisy communication channels remains a challenge due to the fragility of quantum states and their susceptibility to channel impairments. Existing quantum encoding schemes often exhibit limited noise resilience, while advanced approaches introduce computational and implementation complexity. To address these limitations, this paper proposes a circular polarization-based quantum encoding framework for image transmission over error-prone channels. In the proposed approach, source images are compressed and source-encoded using standard image coding formats, including the joint photographic experts group (JPEG) standard and the high-efficiency image file format (HEIF), and converted into classical bitstreams. The resulting bitstreams are protected using channel coding and mapped onto quantum states via circular polarization representations, where left- and right-hand circularly polarized states encode binary information. The encoded quantum states are transmitted over noisy quantum channels to model channel impairments. At the receiver, appropriate quantum decoding and channel decoding operations are applied to recover the classical bitstream, followed by source decoding to reconstruct the image. The performance of the proposed framework is evaluated using image quality metrics, including peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and universal quality index (UQI). Simulation results demonstrate that the proposed circular polarization-based encoding scheme outperforms existing quantum image encoding techniques, achieving channel SNR gains of 4 dB over state-of-the-art Hadamard-based encoding and 3 dB over frequency-domain quantum encoding methods under severe noise conditions. These results indicate that circular polarization-based quantum encoding provides improved noise robustness and reconstruction fidelity for practical quantum image transmission systems. Full article
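
A hedged sketch of the bit-to-polarization mapping the abstract describes: encode 0/1 as left- and right-hand circular polarization states (Jones vectors), pass them through a simple additive-noise channel, and decode by projection. The noise model and its parameters are illustrative assumptions, not the paper's channel model.

```python
import numpy as np

L = np.array([1, 1j]) / np.sqrt(2)      # left-hand circular polarization
R = np.array([1, -1j]) / np.sqrt(2)     # right-hand circular polarization

def encode(bits):
    return np.stack([R if b else L for b in bits])

def noisy_channel(states, sigma=0.4, rng=np.random.default_rng(0)):
    # Add complex Gaussian noise to each Jones vector, then renormalise.
    noise = rng.normal(0, sigma, states.shape) + 1j * rng.normal(0, sigma, states.shape)
    noisy = states + noise
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

def decode(states):
    # Choose the basis state with the larger projection probability.
    p_r = np.abs(states @ R.conj()) ** 2
    p_l = np.abs(states @ L.conj()) ** 2
    return (p_r > p_l).astype(int)

bits = np.random.default_rng(1).integers(0, 2, 1000)
received = decode(noisy_channel(encode(bits)))
print("bit error rate:", np.mean(received != bits))
```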

21 pages, 4058 KB  
Article
Transient Voltage Stability Assessment Method Based on CWT-ResNet
by Chong Shao, Yongsheng Jin, Bolin Zhang, Xin He, Chen Zhou and Haiying Dong
Energies 2026, 19(7), 1804; https://doi.org/10.3390/en19071804 - 7 Apr 2026
Abstract
Accurate and rapid transient voltage stability assessment is crucial for the safe and stable operation of new energy bases in desert and grassland regions. Existing deep learning methods fail to adequately capture the high-dimensional dynamic coupling features of transient voltage signals in large-scale renewable energy bases with UHVDC transmission, and suffer from poor performance under class-imbalanced sample conditions. This paper proposes a transient voltage stability assessment method utilizing continuous wavelet transform (CWT) time–frequency images and a deep residual network (ResNet-50). CWT with the Morlet wavelet basis converts voltage time-series signals into multi-scale time–frequency images to simultaneously capture temporal and frequency-domain transient features. An improved focal loss (FL) function is introduced to dynamically adjust category weights based on actual sample distribution, enhancing model robustness under extreme class imbalance. The proposed method is validated on a modified IEEE 39-bus system incorporating the Qishao UHVDC line and wind/photovoltaic integration in Northwest China, using 1490 simulation samples under diverse fault scenarios. Results demonstrate that the proposed CWT-ResNet achieves 98.88% accuracy, 94.74% precision, 100% recall, and 97.29% F1-score, outperforming SVM, 1D-CNN, and 1D-ResNet baselines. Under 5 dB noise conditions, the method maintains over 90% accuracy, demonstrating strong noise robustness. Full article
(This article belongs to the Special Issue Challenges and Innovations in Stability and Control of Power Systems)
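
A minimal sketch of a class-weighted focal loss of the kind the abstract applies to imbalanced stable/unstable samples. The alpha and gamma values are illustrative; the paper's improved variant adjusts weights from the actual sample distribution.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    """Binary focal loss; logits and targets have shape (batch,)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
    p_t = torch.exp(-ce)                              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([2.0, -1.5, 0.3, -3.0])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(focal_loss(logits, labels))
```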

22 pages, 249676 KB  
Article
AI- and AR-Assisted Reactivation of Chinese Paper Cutting Using Temple Arts and Ancient Paintings
by Naai-Jung Shih and Yan-Ting Chen
Heritage 2026, 9(4), 150; https://doi.org/10.3390/heritage9040150 - 7 Apr 2026
Viewed by 134
Abstract
Traditional Chinese paper cutting represents an important intangible cultural heritage. Can artificial intelligence (AI) reactivate the heritage in a new style? The aim of this study was to use AI to reactivate temple arts and paintings by converting them into the style of traditional Chinese paper cuttings. Thirty sets of old images taken 18 years ago and 10 images of ancient paintings from the National Palace Museum were restyled in Nano Banana (Pro)®. Related design elements included integrated isolated parts, visual depth, details, and solid and void alternation. Three-dimensional stone and wood sculptures were reconstructed using Rodin® or Meshy® and converted into AR models in Sketchfab®. From the generated 2D images and their 3D representations, a reactivated style of Chinese paper cutting was developed that can be interacted with in the AR smartphone platform or RP in the physical world. Approximately 370 images were regenerated, and 167 versions of models were reconstructed. AI should be considered part of culture. Rethinking traditional folk art highlights demand for the cross-reference and cross-reactivation of heterogeneous art forms. This AI model interprets novel 3D structural and visual details and creates a unique 2D and 3D identity for each subject. Full article

20 pages, 3455 KB  
Article
FocusMamba: A Local–Global Mamba Framework Inspired by Visual Observation for Brain Tumor Segmentation
by Qiang Li, Tao Ni, Xueyan Wang and Hengxin Liu
Appl. Sci. 2026, 16(7), 3571; https://doi.org/10.3390/app16073571 - 6 Apr 2026
Viewed by 130
Abstract
Accurate brain tumor segmentation from magnetic resonance imaging (MRI) is crucial for brain tumor diagnosis, clinical treatment decisions, and advancing research. CNNs and Transformers have dominated this area, but CNNs struggle with long-range modeling, whereas Transformers are limited by the high computational costs of self-attention. Recently, Mamba has garnered significant attention due to its remarkable performance in long sequence modeling. However, the original Mamba architecture, designed primarily for 1D sequence modeling, fails to effectively capture the spatial and structural relationships essential for brain tumor segmentation. In this paper, we propose FocusMamba, a Mamba-based model inspired by human visual observation patterns, which jointly enhances local detail modeling and global contextual understanding. FocusMamba consists of three components: (i) a novel hierarchical and tri-directional Mamba unit that elevates attention from the global to the window level, reinforcing local semantic feature extraction, while simultaneously achieving window-level interactions to maintain broader global awareness, (ii) a large kernel convolution unit that captures long-range dependencies within whole-volume features, overcoming the limitations of Mamba’s single-scale context modeling, and (iii) a fusion unit that enhances the overall feature representation by fusing information from different levels. Extensive experiments on the BraTS 2023 and BraTS 2020 datasets demonstrate that FocusMamba achieves superior segmentation performance compared with several advanced methods. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
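
A hedged sketch of a large-kernel convolution unit of the kind the abstract pairs with the Mamba branch to capture long-range context in whole-volume features. The kernel size, channel width, and normalization choice are illustrative assumptions, not the FocusMamba configuration.

```python
import torch
import torch.nn as nn

class LargeKernelUnit(nn.Module):
    def __init__(self, channels: int = 32, kernel_size: int = 7):
        super().__init__()
        self.depthwise = nn.Conv3d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv3d(channels, channels, kernel_size=1)
        self.norm = nn.InstanceNorm3d(channels)
        self.act = nn.GELU()

    def forward(self, x):                         # x: (B, C, D, H, W)
        return x + self.act(self.norm(self.pointwise(self.depthwise(x))))

unit = LargeKernelUnit()
volume = torch.randn(1, 32, 32, 64, 64)           # e.g., a multi-modal MRI feature volume
print(unit(volume).shape)                         # torch.Size([1, 32, 32, 64, 64])
```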

20 pages, 11231 KB  
Article
YOLO-Based Shading Artifact Reduction for CBCT-to-MDCT Translation Using Two-Stage Learning
by Yangheon Lee and Hyun-Cheol Park
Mathematics 2026, 14(7), 1223; https://doi.org/10.3390/math14071223 - 6 Apr 2026
Viewed by 202
Abstract
Cone-beam computed tomography (CBCT) offers advantages of low radiation dose and rapid acquisition but suffers from scatter-induced shading artifacts that limit diagnostic value compared to multi-detector CT (MDCT). While CycleGAN enables unpaired image translation, its uniform loss application struggles with localized artifact removal. We propose a two-stage learning framework with YOLO-based region correction loss. Stage 1 trains a standard CycleGAN to establish stable CBCT-MDCT domain mapping. Stage 2 fine-tunes the model by applying gradient magnitude minimization loss selectively to artifact regions detected by a pretrained YOLO detector, enabling focused correction while preserving anatomical structures. Using 11,000 2D CBCT slices from 17 patients (14 training, 3 testing) and 23,500 2D MDCT slices from 50 patients, our method achieves a 14.0% reduction in artifact score compared to baseline CycleGAN while maintaining high structural similarity (SSIM > 0.96). Independent evaluation using integral nonuniformity (INU) and shading index (SI) confirms consistent improvement across physics-based metrics. The self-regulating mechanism, where YOLO detection confidence naturally decreases as artifacts diminish, provides automatic adjustment without manual intervention. This work demonstrates that combining staged learning with object detection offers an effective solution for localized artifact removal in medical image translation, potentially improving diagnostic accuracy while preserving the low-dose benefits of CBCT. Full article
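
A hedged sketch of the stage-2 idea the abstract describes: penalize image gradient magnitude only inside detector-flagged artifact regions, leaving other anatomy untouched. The box format, mask weighting, and sample boxes are illustrative assumptions, not the paper's exact loss.

```python
import torch

def region_gradient_loss(image, boxes):
    """image: (B, 1, H, W); boxes: one (x1, y1, x2, y2) tuple per batch element."""
    dy = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs()
    dx = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs()
    grad_mag = dy[:, :, :, :-1] + dx[:, :, :-1, :]           # (B, 1, H-1, W-1)

    mask = torch.zeros_like(grad_mag)
    for b, (x1, y1, x2, y2) in enumerate(boxes):
        mask[b, :, y1:y2, x1:x2] = 1.0                       # artifact region only
    return (grad_mag * mask).sum() / mask.sum().clamp(min=1.0)

translated = torch.rand(2, 1, 256, 256)                      # translated CBCT slice (stand-in)
detected = [(40, 60, 120, 140), (80, 30, 200, 90)]           # hypothetical detector boxes
print(region_gradient_loss(translated, detected))
```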

17 pages, 12185 KB  
Article
Adjustable Complexity Transformer Architecture for Image Denoising
by Jan-Ray Liao, Wen Lin and Li-Wen Chang
Signals 2026, 7(2), 33; https://doi.org/10.3390/signals7020033 - 6 Apr 2026
Viewed by 234
Abstract
In recent years, image denoising has seen a shift from traditional non-local self-similarity methods like BM3D to deep-learning based approaches that use learnable convolutions and attention mechanisms. While pixel-level attention is effective at capturing long-range relationships similar to non-local self-similarity based methods, it incurs extremely high computational costs that scale quadratically with image resolution. As an alternative, channel-wise attention is resolution-independent and computationally efficient but may miss crucial spatial details. In this paper, an adjustable attention mechanism is introduced that bridges the gap between pixel and channel attentions. In the proposed model, average pooling and variable-size convolutions are added before attention calculation to adjust spatial resolution and, thus, allow dynamical adjustment of computational complexity. This adjustable attention is applied in a transformer-based U-Net architecture and achieves performance comparable to state-of-the-art methods in both real and Gaussian blind denoising tasks. To be more concrete, the proposed method achieves a Peak Signal-to-Noise Ratio of 39.65 dB and a Structural Similarity Index Measure of 0.913 on the Smartphone Image Denoising Dataset. Therefore, the proposed method demonstrates a balance between efficiency and denoising quality. Full article
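
A hedged sketch of the adjustable-attention idea the abstract outlines: average-pool the feature map by a chosen factor before pixel-level self-attention, so the token count (and hence the quadratic attention cost) can be dialed up or down. Channel width, reduction factor, and the residual upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjustableAttention(nn.Module):
    def __init__(self, channels: int = 48, heads: int = 4, reduction: int = 2):
        super().__init__()
        self.pool = nn.AvgPool2d(reduction) if reduction > 1 else nn.Identity()
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, H, W)
        pooled = self.proj(self.pool(x))               # reduced spatial resolution
        b, c, h, w = pooled.shape
        tokens = pooled.flatten(2).transpose(1, 2)     # (B, h*w, C)
        out, _ = self.attn(tokens, tokens, tokens)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        out = F.interpolate(out, size=x.shape[-2:], mode="nearest")
        return x + out                                 # residual back at full resolution

block = AdjustableAttention(reduction=4)               # coarser tokens, cheaper attention
feat = torch.randn(1, 48, 64, 64)
print(block(feat).shape)                               # torch.Size([1, 48, 64, 64])
```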

29 pages, 7604 KB  
Article
Shading and Geometric Constraint Neural Radiance Field for DSM Reconstruction from Multi-View Satellite Images
by Zhihua Hu, Zhiwen Chen, Yushun Li, Yuxuan Liu, Kao Zhang, Chenguang Zhao and Yongxian Zhang
Remote Sens. 2026, 18(7), 1091; https://doi.org/10.3390/rs18071091 - 5 Apr 2026
Viewed by 143
Abstract
With the continued development of spatial information technologies, Digital Surface Models (DSMs) have become fundamental data products for urban planning, virtual reality, geographic information systems, and digital-earth applications. Neural Radiance Fields (NeRFs) have achieved remarkable success in multi-view 3D reconstruction in computer vision. Still, their application to DSM generation from satellite imagery remains challenging because of differences in imaging geometry, complex surface structure, and varying illumination conditions. To address these issues, this paper proposes a Shading and Geometric Constraint (SGC) method tailored to satellite photogrammetry and designed to integrate with existing NeRF-based frameworks such as Sat-NeRF and EO-NeRF. First, a physical imaging model based on Lambertian reflectance and spherical harmonics is introduced to represent the complex illumination variations in satellite images. Synthetic images generated by this model provide auxiliary supervision that improves robustness to illumination inconsistency. Second, inspired by classical shading-based refinement methods, we introduce a bilateral edge-preserving geometric constraint. Unlike standard smoothness terms, this constraint uses photometric discrepancies to weight geometric smoothing, thereby preserving sharp building boundaries while smoothing flat surfaces. We integrate the method into two state-of-the-art baselines, Sat-NeRF and EO-NeRF. EO-NeRF+SGC achieves up to a 57.93% reduction in elevation MAE relative to EO-NeRF, which is the largest relative MAE reduction reported in this study. The method also recovers finer structural details and sharper edges than recently published NeRF-based DSM reconstruction methods. Full article
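
A hedged sketch of the Lambertian plus spherical-harmonics shading term the abstract uses to synthesize illumination-consistent supervision: per-pixel shading is a dot product between low-order SH lighting coefficients and an SH basis evaluated at the surface normal. The coefficients, normals, and albedo values are synthetic.

```python
import numpy as np

def sh_basis(normals):
    """First-order (4-term) real SH basis evaluated at unit normals of shape (N, 3)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),     # Y_0^0
        0.488603 * y,                   # Y_1^{-1}
        0.488603 * z,                   # Y_1^{0}
        0.488603 * x,                   # Y_1^{1}
    ], axis=1)

def lambertian_shading(albedo, normals, sh_coeffs):
    """Rendered intensity = albedo * clamp(SH lighting evaluated at the normal)."""
    shading = sh_basis(normals) @ sh_coeffs
    return albedo * np.clip(shading, 0.0, None)

rng = np.random.default_rng(0)
normals = rng.normal(size=(5, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 0.9, size=5)
sh_coeffs = np.array([1.8, 0.1, 0.9, 0.2])      # hypothetical sun-dominant lighting
print(lambertian_shading(albedo, normals, sh_coeffs))
```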
