Search Results (140)

Search Parameters:
Keywords = LPIP

21 pages, 4290 KB  
Article
Information Modeling of Asymmetric Aesthetics Using DCGAN: A Data-Driven Approach to the Generation of Marbling Art
by Muhammed Fahri Unlersen and Hatice Unlersen
Information 2026, 17(1), 94; https://doi.org/10.3390/info17010094 - 15 Jan 2026
Viewed by 226
Abstract
Traditional Turkish marbling (Ebru) art is an intangible cultural heritage characterized by highly asymmetric, fluid, and non-reproducible patterns, making its long-term preservation and large-scale dissemination challenging. The craft is highly sensitive to environmental conditions, which makes it enormously difficult to mass-produce while maintaining its original aesthetic qualities. A data-driven generative model is therefore required to create unlimited, high-fidelity digital surrogates that safeguard this UNESCO heritage against physical loss and enable large-scale cultural applications. This study introduces a deep generative modeling framework for the digital reconstruction of traditional Turkish marbling (Ebru) art using a Deep Convolutional Generative Adversarial Network (DCGAN). A dataset of 20,400 image patches, systematically derived from 17 original marbling works, was used to train the proposed model. The framework aims to mathematically capture the asymmetric, fluid, and stochastic nature of Ebru patterns, enabling the reproduction of their aesthetic structure in a digital medium. The generated images were evaluated using multiple quantitative and perceptual metrics, including Fréchet Inception Distance (FID), Kernel Inception Distance (KID), Learned Perceptual Image Patch Similarity (LPIPS), and PRDC-based indicators (Precision, Recall, Density, Coverage). For experimental validation, the proposed DCGAN framework is additionally compared against a Vanilla GAN baseline trained under identical conditions, highlighting the advantages of convolutional architectures for modeling marbling textures. The results show that the DCGAN model achieved a high level of realism and diversity without mode collapse or overfitting, producing images that were perceptually close to authentic marbling works. In addition to the quantitative evaluation, expert qualitative assessment by a traditional Ebru artist confirmed that the model reproduced the organic textures, color dynamics, and compositional asymmetry characteristic of real marbling art. The proposed approach demonstrates the potential of deep generative models for the digital preservation, dissemination, and reinterpretation of intangible cultural heritage recognized by UNESCO. Full article
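
For reference, a minimal DCGAN generator of the kind described above, written in PyTorch; the latent size, channel widths, and 64×64 output resolution are illustrative assumptions rather than the authors' configuration.

```python
# Minimal DCGAN generator sketch (PyTorch). Latent size, channel widths, and the
# 64x64 RGB output are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, base=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # z: (N, z_dim, 1, 1) -> (N, base*8, 4, 4)
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, out_channels, 4, 2, 1, bias=False),
            nn.Tanh(),  # outputs in [-1, 1], matching the usual DCGAN preprocessing
        )

    def forward(self, z):
        return self.net(z)

fake = DCGANGenerator()(torch.randn(4, 100, 1, 1))  # -> (4, 3, 64, 64) marbling-like patches
```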

25 pages, 8224 KB  
Article
QWR-Dec-Net: A Quaternion-Wavelet Retinex Framework for Low-Light Image Enhancement with Applications to Remote Sensing
by Vladimir Frants, Sos Agaian, Karen Panetta and Artyom Grigoryan
Information 2026, 17(1), 89; https://doi.org/10.3390/info17010089 - 14 Jan 2026
Viewed by 125
Abstract
Computer vision and deep learning are essential in diverse fields such as autonomous driving, medical imaging, face recognition, and object detection. However, enhancing low-light remote sensing images remains challenging for both research and real-world applications. Low illumination degrades image quality due to sensor limitations and environmental factors, weakening visual fidelity and reducing performance in vision tasks. Common issues such as insufficient lighting, backlighting, and limited exposure create low contrast, heavy shadows, and poor visibility, particularly at night. We propose QWR-Dec-Net, a quaternion-based Retinex decomposition network tailored for low-light image enhancement. QWR-Dec-Net consists of two key modules: a decomposition module that separates illumination and reflectance, and a denoising module that fuses a quaternion holistic color representation with wavelet multi-frequency information. This structure jointly improves color constancy and noise suppression. Experiments on low-light remote sensing datasets (LSCIDMR and UCMerced) show that QWR-Dec-Net outperforms current methods in PSNR, SSIM, LPIPS, and classification accuracy. The model’s accurate illumination estimation and stable reflectance make it well-suited for remote sensing tasks such as object detection, video surveillance, precision agriculture, and autonomous navigation. Full article
(This article belongs to the Section Artificial Intelligence)
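
As a rough illustration of the Retinex decomposition that QWR-Dec-Net learns, the sketch below performs a classical single-scale split (illumination from Gaussian smoothing, reflectance by division); it is not the paper's quaternion-wavelet network, and the sigma value is an assumption.

```python
# Classical Retinex-style split, shown only to illustrate the decomposition idea:
# illumination ~ smoothed luminance, reflectance ~ image / illumination.
# This is not QWR-Dec-Net; the network learns this split jointly with denoising.
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_split(img, sigma=15.0, eps=1e-6):
    """img: float RGB array in [0, 1], shape (H, W, 3)."""
    luminance = img.mean(axis=2, keepdims=True)                    # crude luminance proxy
    illumination = gaussian_filter(luminance, sigma=(sigma, sigma, 0)) + eps
    reflectance = np.clip(img / illumination, 0.0, 1.0)
    return illumination, reflectance

img = np.random.rand(128, 128, 3).astype(np.float32)              # stand-in low-light patch
L, R = retinex_split(img)
```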

21 pages, 37629 KB  
Article
FacadeGAN: Facade Texture Placement with GANs
by Elif Şanlıalp and Muhammed Abdullah Bulbul
Appl. Sci. 2026, 16(2), 860; https://doi.org/10.3390/app16020860 - 14 Jan 2026
Viewed by 112
Abstract
This study presents a texture-aware image synthesis framework designed to generate material-consistent façades using adversarial learning. The proposed architecture incorporates a mask-guided channel-wise attention mechanism that adaptively merges segmentation information with texture statistics to reconcile structural guidance with textural fidelity. A thorough comparative analysis was performed utilizing three internal variants (Vanilla GAN, Wasserstein GAN (WGAN), and WGAN-GP) against leading baselines, including TextureGAN and Pix2Pix. The assessment utilized a comprehensive multi-metric framework that included SSIM, FID, KID, LPIPS, and DISTS, in conjunction with a VGG-19 based perceptual loss. Experimental results indicate a notable divergence between pixel-wise accuracy and perceptual realism; although established baselines attained elevated PSNR values, the proposed Vanilla GAN and WGAN models exhibited enhanced perceptual fidelity, achieving the lowest LPIPS and DISTS scores. The WGAN-GP model, although theoretically stable, produced smoother but less complex textures due to the regularization enforced by the gradient penalty term. Ablation investigations further validated that the attention mechanism consistently enhanced structural alignment and texture sharpness across all topologies. Thus, the study suggests that Vanilla GAN and WGAN architectures, enhanced by attention-based fusion, offer an optimal balance between realism and structural fidelity for high-frequency texture creation applications. Full article
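
The WGAN-GP variant mentioned above relies on a gradient penalty on the critic; a minimal PyTorch sketch follows, with the toy critic and the usual penalty weight of 10 as illustrative assumptions.

```python
# WGAN-GP gradient penalty sketch (PyTorch): penalize the critic's gradient norm
# on random interpolates between real and fake samples. Critic and weight are toys.
import torch

def gradient_penalty(critic, real, fake):
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=interp, create_graph=True)
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 1))
real, fake = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
gp = gradient_penalty(critic, real, fake)
# critic loss sketch: d_loss = fake_scores.mean() - real_scores.mean() + 10.0 * gp
```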

23 pages, 5097 KB  
Article
A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer
by Shasha Tian, Adisorn Sirikham, Jessada Konpang and Chuyang Wang
J. Imaging 2026, 12(1), 44; https://doi.org/10.3390/jimaging12010044 - 14 Jan 2026
Viewed by 132
Abstract
Underwater optical images are the primary carriers of underwater scene information, playing a crucial role in marine resource exploration, underwater environmental monitoring, and engineering inspection. However, wavelength-dependent absorption and scattering severely deteriorate underwater images, leading to reduced contrast, chromatic distortions, and loss of structural details. To address these issues, we propose a U-shaped underwater image enhancement framework that integrates Swin-Transformer blocks with lightweight attention and residual modules. A Dual-Window Multi-Head Self-Attention (DWMSA) in the bottleneck models long-range context while preserving fine local structure. A Global-Aware Attention Map (GAMP) adaptively re-weights channels and spatial locations to focus on severely degraded regions. A Feature-Augmentation Residual Network (FARN) stabilizes deep training and emphasizes texture and color fidelity. Trained with a combination of Charbonnier, perceptual, and edge losses, our method achieves state-of-the-art results in PSNR and SSIM, the lowest LPIPS, and improvements in UIQM and UCIQE on the UFO-120 and EUVP datasets, with average metrics of PSNR 29.5 dB, SSIM 0.94, LPIPS 0.17, UIQM 3.62, and UCIQE 0.59. Qualitative results show reduced color cast, restored contrast, and sharper details. Code, weights, and evaluation scripts will be released to support reproducibility. Full article
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
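
A hedged sketch of two of the loss terms named above (Charbonnier and a Sobel-based edge term) in PyTorch; the epsilon, the luminance proxy, and the 0.1 weight are assumptions, and the perceptual term is omitted.

```python
# Sketch of Charbonnier and Sobel-edge loss terms (PyTorch). Epsilon, the
# luminance proxy, and the 0.1 edge weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def charbonnier(pred, target, eps=1e-3):
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def sobel_edges(x):
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    k = torch.stack([kx, kx.t()]).unsqueeze(1)        # (2, 1, 3, 3): x and y Sobel kernels
    gray = x.mean(dim=1, keepdim=True)                # luminance proxy
    g = F.conv2d(gray, k, padding=1)                  # (N, 2, H, W) gradients
    return torch.sqrt((g ** 2).sum(dim=1) + 1e-12)

def edge_loss(pred, target):
    return F.l1_loss(sobel_edges(pred), sobel_edges(target))

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
total = charbonnier(pred, target) + 0.1 * edge_loss(pred, target)  # weights are assumptions
```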

23 pages, 3855 KB  
Article
Visual-to-Tactile Cross-Modal Generation Using a Class-Conditional GAN with Multi-Scale Discriminator and Hybrid Loss
by Nikolay Neshov, Krasimir Tonchev, Agata Manolova, Radostina Petkova and Ivaylo Bozhilov
Sensors 2026, 26(2), 426; https://doi.org/10.3390/s26020426 - 9 Jan 2026
Viewed by 176
Abstract
Understanding surface textures through visual cues is crucial for applications in haptic rendering and virtual reality. However, accurately translating visual information into tactile feedback remains a challenging problem. To address this challenge, this paper presents a class-conditional Generative Adversarial Network (cGAN) for cross-modal translation from texture images to vibrotactile spectrograms, using samples from the LMT-108 dataset. The generator is adapted from pix2pix and enhanced with Conditional Batch Normalization (CBN) at the bottleneck to incorporate texture class semantics. A dedicated label predictor, based on a DenseNet-201 and trained separately prior to cGAN training, provides the conditioning label. The discriminator is derived from pix2pixHD and uses a multi-scale architecture with three discriminators, each comprising three downsampling layers. A grid search over multi-scale discriminator configurations shows that this setup yields optimal perceptual similarity measured by Learned Perceptual Image Patch Similarity (LPIPS). The generator is trained using a hybrid loss that combines adversarial, L1, and feature matching losses derived from intermediate discriminator features, while the discriminators are trained using standard adversarial loss. Quantitative evaluation with LPIPS and Fréchet Inception Distance (FID) confirms superior similarity to real spectrograms. GradCAM visualizations highlight the benefit of class conditioning. The proposed model outperforms pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions. The generated spectrograms can be converted into vibrotactile signals using the Griffin–Lim algorithm, enabling applications in haptic feedback and virtual material simulation. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
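
A minimal Conditional Batch Normalization (CBN) layer of the kind placed at the generator bottleneck, sketched in PyTorch; the feature width and the use of 108 classes (taken from the LMT-108 dataset name) are illustrative assumptions.

```python
# Conditional Batch Normalization sketch (PyTorch): a class embedding predicts
# per-channel scale and shift that modulate normalized features. Sizes are toys.
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.embed = nn.Embedding(num_classes, num_features * 2)
        self.embed.weight.data[:, :num_features].fill_(1.0)   # gamma starts at 1
        self.embed.weight.data[:, num_features:].zero_()      # beta starts at 0

    def forward(self, x, labels):
        gamma, beta = self.embed(labels).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * self.bn(x) + beta

cbn = ConditionalBatchNorm2d(num_features=256, num_classes=108)  # 108 assumed texture classes
y = cbn(torch.randn(4, 256, 16, 16), torch.randint(0, 108, (4,)))
```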

18 pages, 7859 KB  
Article
Preserving Formative Tendencies in AI Image Generation: Toward Architectural AI Typologies Through Iterative Blending
by Dong-Ho Lee and Sung-Hak Ko
Buildings 2026, 16(1), 183; https://doi.org/10.3390/buildings16010183 - 1 Jan 2026
Viewed by 187
Abstract
This study explores an alternative design methodology for architectural image generation using generative AI, addressing the challenge of how AI-generated imagery can preserve formative tendencies while enabling creative variation and user agency. Departing from conventional prompt-based approaches, the process utilizes only a minimal initial image set and proceeds by reintroducing solely the synthesized outcomes during the blending and iterative synthesis stages. The central research question asks whether AI can sustain and transform architectural tendencies through iterative synthesis despite limited input data, and how such tendencies might accumulate into consistent typological patterns. The research examines how formative tendencies are preserved and transformed, based on four aesthetic elements: layer, scale, density, and assembly. These four elements reflect diverse architectural ideas in spatial, proportional, volumetric, and tectonic characteristics commonly observed in architectural representations. Observing how these tendencies evolve across iterations allows the study to evaluate how AI negotiates between structural preservation and creative deviation, revealing the generative patterns underlying emerging AI typologies. The study employs SSIM, LPIPS, and CLIP similarity metrics as supplementary indicators to contextualize these tendencies. The results demonstrate that iterative blending enables the deconstruction and recomposition of archetypal formal languages, generating new visual variations while preserving identifiable structural and semantic tendencies. These outputs do not converge into generalized imagery but instead retain identifiable tendencies. Furthermore, the study positions user selection and intervention as a crucial mechanism for mediating between accidental transformation and intentional direction, proposing AI not as a passive generator but as a dialogical tool. Finally, the study conceptualizes such consistent formal languages as “AI Typologies” and presents the potential for a systematic design methodology founded upon them as a complementary alternative to prompt-based workflows. Full article
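
A sketch of how the three supplementary metrics named above (SSIM, LPIPS, and CLIP similarity) can be computed with common Python packages; the LPIPS backbone ('alex') and CLIP checkpoint (openai/clip-vit-base-patch32) are assumptions, not necessarily the study's choices.

```python
# Computing SSIM, LPIPS, and CLIP image-feature cosine similarity between two
# images. Backbone and checkpoint choices are illustrative assumptions.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity as ssim
from transformers import CLIPModel, CLIPProcessor

def to_lpips_tensor(img_u8):                 # HxWx3 uint8 -> 1x3xHxW float in [-1, 1]
    t = torch.from_numpy(img_u8).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0) * 2 - 1

img_a = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)   # stand-ins for two generated images
img_b = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)

ssim_val = ssim(img_a, img_b, channel_axis=2, data_range=255)  # structural similarity

lpips_model = lpips.LPIPS(net='alex')                          # perceptual distance (lower = closer)
lpips_val = lpips_model(to_lpips_tensor(img_a), to_lpips_tensor(img_b)).item()

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
feats = clip.get_image_features(**proc(images=[img_a, img_b], return_tensors="pt"))
clip_sim = torch.cosine_similarity(feats[0:1], feats[1:2]).item()  # semantic similarity

print(ssim_val, lpips_val, clip_sim)
```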

19 pages, 5557 KB  
Article
BDNet: A Real-Time Biomedical Image Denoising Network with Gradient Information Enhancement Loss
by Lemin Shi, Xin Feng, Ping Gong, Dianxin Song, Hao Zhang, Langxi Liu, Yuqiang Zhang and Mingye Li
Biosensors 2026, 16(1), 26; https://doi.org/10.3390/bios16010026 - 1 Jan 2026
Viewed by 323
Abstract
Biomedical imaging plays a critical role in medical diagnostics and research, yet image noise remains a significant challenge that hinders accurate analysis. To address this issue, we propose BDNet, a real-time biomedical image denoising network optimized for enhancing gradient and high-frequency information while effectively suppressing noise. The network adopts a lightweight U-Net-inspired encoder–decoder architecture, incorporating a Convolutional Block Attention Module at the bottleneck to refine spatial and channel-wise feature extraction. A novel gradient-based loss function, combining a Sobel operator-derived gradient loss with L1, L2, and SSIM losses, ensures faithful preservation of fine structural details. Extensive experiments on the Fluorescence Microscopy Denoising (FMD) dataset demonstrate that BDNet achieves state-of-the-art performance across multiple metrics, including PSNR, RMSE, SSIM, and LPIPS, outperforming both convolutional and Transformer-based models in accuracy and efficiency. With its superior denoising capability and real-time inference speed, BDNet provides an effective and practical solution for improving biomedical image quality, particularly in fluorescence microscopy applications. Full article
(This article belongs to the Section Biosensors and Healthcare)
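
A compact Convolutional Block Attention Module (CBAM) sketch in PyTorch, matching the bottleneck attention described above; the reduction ratio and 7×7 spatial kernel are the module's common defaults, assumed here.

```python
# CBAM sketch (PyTorch): channel attention from pooled descriptors, then spatial
# attention from channel-wise statistics. Defaults are assumptions, not BDNet's.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # channel attention: shared MLP over average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention: conv over channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

y = CBAM(64)(torch.randn(2, 64, 32, 32))
```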

23 pages, 5039 KB  
Article
A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution
by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng
Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025
Viewed by 226
Abstract
In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 dB of the SimVP-v2 baseline. Concurrently, our improved A3DSimVP model reduces the loss metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, our LPIPS metric is substantially lowered to 0.22. These gains confirm that A3DSimVP significantly enhances both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
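
The core architectural change, swapping 2D depthwise separable convolutions for 3D ones, can be illustrated with the minimal PyTorch sketch below; channel counts and kernel sizes are placeholders.

```python
# Depthwise-separable convolution in 2D vs. 3D (PyTorch). The 3D form treats time
# as an explicit axis; sizes here are generic placeholders, not A3DSimVP's config.
import torch
import torch.nn as nn

class DepthwiseSeparable2d(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel, padding=kernel // 2,
                                   groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                       # x: (N, C, H, W)
        return self.pointwise(self.depthwise(x))

class DepthwiseSeparable3d(nn.Module):
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv3d(channels, channels, kernel, padding=kernel // 2,
                                   groups=channels)
        self.pointwise = nn.Conv3d(channels, channels, 1)

    def forward(self, x):                       # x: (N, C, T, H, W) -- time is explicit
        return self.pointwise(self.depthwise(x))

frames = torch.randn(2, 32, 8, 64, 64)          # batch, channels, 8 frames, HxW
out = DepthwiseSeparable3d(32)(frames)          # same shape; temporal kernel spans 3 frames
```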

23 pages, 12620 KB  
Article
The Color Image Watermarking Algorithm Based on Quantum Discrete Wavelet Transform and Chaotic Mapping
by Yikang Yuan, Wenbo Zhao, Zhongyan Li and Wanquan Liu
Symmetry 2026, 18(1), 33; https://doi.org/10.3390/sym18010033 - 24 Dec 2025
Viewed by 307
Abstract
Quantum watermarking is a technique that embeds specific information into a quantum carrier for the purpose of digital copyright protection. In this paper, we propose a novel color image watermarking algorithm that integrates quantum discrete wavelet transform with Sinusoidal–Tent mapping and baker mapping. Initially, chaotic sequences are generated using Sinusoidal–Tent mapping to determine the channels suitable for watermark embedding. Subsequently, a one-level quantum Haar wavelet transform is applied to the selected channel to decompose the image. The watermark image is then scrambled via discrete baker mapping, and the scrambled watermark is embedded into the high-high (HH) subband. The invisibility of the watermark is evaluated by calculating the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS), with comparisons made against the color histogram. The robustness of the proposed algorithm is assessed through the calculation of Normalized Cross-Correlation (NCC). In the simulation results, PSNR is close to 63 dB, SSIM is close to 1, LPIPS is close to 0.001, and NCC is close to 0.97. This indicates that the proposed watermarking algorithm exhibits excellent visual quality and a robust capability to withstand various attacks. Additionally, through an ablation study, the contribution of each technique to overall performance was systematically evaluated. Full article
(This article belongs to the Section Computer)
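
A classical (non-quantum) analogue of the embedding step described above, using a one-level Haar DWT and additive insertion into the HH subband, plus an NCC check; the embedding strength, and the omission of the chaotic channel selection and scrambling, are simplifications for illustration.

```python
# Classical analogue of HH-subband watermark embedding with a Haar DWT, plus NCC.
# Alpha is an assumption; the paper's quantum circuits and chaotic maps are omitted.
import numpy as np
import pywt

def embed_hh(channel, watermark, alpha=0.05):
    LL, (LH, HL, HH) = pywt.dwt2(channel, 'haar')
    wm = np.resize(watermark, HH.shape)                 # crude fit to the subband size
    return pywt.idwt2((LL, (LH, HL, HH + alpha * wm)), 'haar')

def ncc(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

host = np.random.rand(256, 256)                         # stand-in image channel
mark = (np.random.rand(128, 128) > 0.5).astype(float)   # stand-in binary watermark
marked = embed_hh(host, mark)
print(ncc(host, marked))                                # near 1.0 => low perceptual impact
```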

29 pages, 6232 KB  
Article
Research on Multi-Temporal Infrared Image Generation Based on Improved CLE Diffusion
by Hua Gong, Wenfei Gao, Fang Liu and Yuanjing Ma
Computers 2025, 14(12), 548; https://doi.org/10.3390/computers14120548 - 11 Dec 2025
Viewed by 309
Abstract
To address the problems of dynamic brightness imbalance in image sequences and blurred object edges in multi-temporal infrared image generation, we propose an improved multi-temporal infrared image generation model based on CLE Diffusion. First, the model adopts CLE Diffusion to capture the dynamic evolution patterns of image sequences. By modeling brightness variation through the noise evolution of the diffusion process, it enables controllable generation across multiple time points. Second, we design a periodic time encoding strategy and a feature linear modulator and build a temporal control module. Through channel-level modulation, this module jointly models temporal information and brightness features to improve the model’s temporal representation capability. Finally, to tackle structural distortion and edge blurring in infrared images, we design a multi-scale edge pyramid strategy and build a structure consistency module based on attention mechanisms. This module jointly computes multi-scale edge and structural features to enforce edge enhancement and structural consistency. Extensive experiments on both public visible-light and self-constructed infrared multi-temporal datasets demonstrate our model’s state-of-the-art (SOTA) performance. It generates high-quality images across all time points, achieving superior performance on the PSNR, SSIM, and LPIPS metrics. The generated images have clear edges and structural consistency. Full article
(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (2nd Edition))
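
A sketch of a periodic time encoding feeding a feature linear modulator (FiLM-style channel modulation), in PyTorch; the 24-hour period, frequency-band count, and layer sizes are assumptions rather than the paper's temporal control module.

```python
# Periodic time encoding + per-channel linear modulation sketch (PyTorch).
# Period, band count, and widths are illustrative assumptions.
import math
import torch
import torch.nn as nn

class TemporalModulator(nn.Module):
    def __init__(self, channels, period=24.0, freq_bands=4):
        super().__init__()
        self.period, self.freq_bands = period, freq_bands
        self.mlp = nn.Sequential(nn.Linear(2 * freq_bands, 64), nn.SiLU(),
                                 nn.Linear(64, 2 * channels))

    def forward(self, feat, t_hours):                    # feat: (N,C,H,W), t_hours: (N,)
        phase = 2 * math.pi * t_hours[:, None] / self.period
        k = torch.arange(1, self.freq_bands + 1, device=feat.device)[None, :]
        enc = torch.cat([torch.sin(k * phase), torch.cos(k * phase)], dim=1)
        scale, shift = self.mlp(enc).chunk(2, dim=1)     # per-channel modulation params
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]

mod = TemporalModulator(channels=64)
out = mod(torch.randn(2, 64, 32, 32), torch.tensor([6.0, 18.0]))  # dawn vs. dusk frames
```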

22 pages, 2302 KB  
Article
MAF-GAN: A Multi-Attention Fusion Generative Adversarial Network for Remote Sensing Image Super-Resolution
by Zhaohe Wang, Hai Tan, Zhongwu Wang, Jinlong Ci and Haoran Zhai
Remote Sens. 2025, 17(24), 3959; https://doi.org/10.3390/rs17243959 - 7 Dec 2025
Viewed by 444
Abstract
Existing Generative Adversarial Networks (GANs) frequently yield remote sensing images with blurred fine details, distorted textures, and compromised spatial structures when applied to super-resolution (SR) tasks. To address these limitations, this study proposes a Multi-Attention Fusion Generative Adversarial Network (MAF-GAN). The generator of MAF-GAN is built on a U-Net backbone that incorporates Oriented Convolutions (OrientedConv) to enhance the extraction of directional features and textures, while a novel co-calibration mechanism incorporating channel, spatial, gating, and spectral attention is embedded in the encoding path and skip connections, supplemented by an adaptive weighting strategy to enable effective multi-scale feature fusion. A composite loss function is further designed to integrate adversarial loss, perceptual loss, hybrid pixel loss, total variation loss, and feature consistency loss for optimizing model performance. Extensive experiments on the GF7-SR4×-MSD dataset demonstrate that MAF-GAN achieves state-of-the-art performance, delivering a Peak Signal-to-Noise Ratio (PSNR) of 27.14 dB, Structural Similarity Index (SSIM) of 0.7206, Learned Perceptual Image Patch Similarity (LPIPS) of 0.1017, and Spectral Angle Mapper (SAM) of 1.0871, which significantly outperforms mainstream models including SRGAN, ESRGAN, SwinIR, HAT, and ESatSR, and exceeds traditional interpolation methods (e.g., Bicubic) by a substantial margin. Notably, MAF-GAN maintains an excellent balance between reconstruction quality and inference efficiency, further reinforcing its advantages over competing methods. Ablation studies additionally validate the individual contribution of each proposed component to the model’s overall performance. The method generates super-resolution remote sensing images with more natural visual perception, clearer spatial structures, and superior spectral fidelity, offering a reliable technical solution for high-precision remote sensing applications. Full article
(This article belongs to the Section Environmental Remote Sensing)
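
A hedged sketch of the composite-loss structure listed above (adversarial, perceptual, hybrid pixel, total-variation, and feature-consistency terms); the weights and the stand-in perceptual/feature terms are placeholders, not the paper's tuned configuration.

```python
# Composite generator loss sketch: weighted sum of adversarial, perceptual, hybrid
# pixel, total-variation, and feature-consistency terms. Weights are placeholders.
import torch
import torch.nn.functional as F

def tv_loss(x):
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw

def composite_loss(sr, hr, d_fake_logits, feat_sr, feat_hr,
                   w_adv=1e-3, w_perc=1.0, w_pix=1.0, w_tv=1e-4, w_feat=0.1):
    adv = F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))
    perc = F.l1_loss(feat_sr, feat_hr)                         # stands in for a VGG perceptual term
    pix = 0.5 * F.l1_loss(sr, hr) + 0.5 * F.mse_loss(sr, hr)   # hybrid pixel loss
    feat = F.mse_loss(feat_sr, feat_hr)                        # feature-consistency placeholder
    return w_adv * adv + w_perc * perc + w_pix * pix + w_tv * tv_loss(sr) + w_feat * feat

sr, hr = torch.rand(2, 3, 128, 128), torch.rand(2, 3, 128, 128)
loss = composite_loss(sr, hr, torch.randn(2, 1), torch.rand(2, 256), torch.rand(2, 256))
```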

23 pages, 10651 KB  
Article
Noise-Aware Hybrid Compression of Deep Models with Zero-Shot Denoising and Failure Prediction
by Lizhe Zhang, Quan Zhou, Ruihua Liu, Lang Huyan, Juanni Liu and Yi Zhang
Appl. Sci. 2025, 15(24), 12882; https://doi.org/10.3390/app152412882 - 5 Dec 2025
Viewed by 485
Abstract
Deep learning-based image compression achieves remarkable average rate-distortion performance but is prone to failure on noisy, high-frequency, or high-entropy inputs. This work systematically investigates these failure cases and proposes a noise-aware hybrid compression framework to address them. A High-Frequency Vulnerability Index (HFVI) is proposed, integrating frequency energy, encoder Jacobian sensitivity, and texture entropy into a unified measure of degradation susceptibility. Guided by HFVI, the system incorporates a selective zero-shot denoising module (P2PA) and a lightweight hybrid codec selector that determines, for each image, whether P2PA is necessary and selects the more reliable codec (a learning-based model or JPEG2000), without retraining any compression backbones. Experiments on a 200,000-image cross-domain benchmark, incorporating general datasets, synthetic noise (eight levels), and real-noise datasets, demonstrate that the proposed pipeline improves PSNR by up to 1.28 dB, raises SSIM by 0.02, reduces LPIPS by roughly 0.05, and decreases the failure-case rate by 6.7% over the best baseline (Joint-IC). Additional intensity-profile and cross-validation analyses further validate the robustness and deployment readiness of the method, showing that the hybrid selector provides a practical path toward reliable, noise-adaptive deep image compression. Full article
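
As a rough stand-in for the HFVI idea, the sketch below scores an image by its high-frequency spectral energy share and histogram entropy; the paper's encoder Jacobian sensitivity term and learned weighting are omitted, and the radius and mixing weights are assumptions.

```python
# Toy vulnerability score: high-frequency spectral energy ratio + histogram entropy.
# This is an illustrative stand-in, not the paper's HFVI; weights are assumptions.
import numpy as np

def hf_energy_ratio(gray, radius_frac=0.1):
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= (radius_frac * min(h, w)) ** 2
    energy = np.abs(f) ** 2
    return float(energy[~low].sum() / (energy.sum() + 1e-12))

def texture_entropy(gray, bins=256):
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / (hist.sum() + 1e-12)
    return float(-(p[p > 0] * np.log2(p[p > 0])).sum())

def vulnerability_score(gray, w_freq=0.6, w_ent=0.4):
    return w_freq * hf_energy_ratio(gray) + w_ent * texture_entropy(gray) / 8.0

gray = np.random.rand(256, 256)          # stand-in grayscale image in [0, 1]
print(vulnerability_score(gray))         # higher score -> route through denoising / fallback codec
```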

23 pages, 9200 KB  
Article
GC-HG Gaussian Splatting Single-View 3D Reconstruction Method Based on Depth Prior and Pseudo-Triplane
by Hua Gong, Peide Wang, Yuanjing Ma and Yong Zhang
Algorithms 2025, 18(12), 761; https://doi.org/10.3390/a18120761 - 30 Nov 2025
Viewed by 1110
Abstract
3D Gaussian Splatting (3DGS) is a multi-view 3D reconstruction method that relies solely on image loss for supervision, lacking explicit constraints on the geometric consistency of the rendering model. It uses a multi-view scene-by-scene training paradigm, which limits generalization to unknown scenes in the case of single-view limited input. To address these issues, this paper proposes Geometric Consistency-High Generalization (GC-HG), a single-view 3DGS reconstruction framework integrating a depth prior and a pseudo-triplane. First, we utilize the VGGT 3D geometry pre-trained model to derive a depth prior, back-projecting it into a point cloud to construct a dual-modal input alongside the image. Second, we introduce a pseudo-triplane mechanism with a learnable Z-plane token for feature decoupling and pseudo-triplane feature fusion, thereby enhancing geometry perception and consistency. Finally, we integrate a parent–child hierarchical Gaussian renderer into the feed-forward 3DGS framework, combining depth and 3D offsets to model depth and geometry information, while mapping parent and child Gaussians into a linear structure through an MLP. Evaluations on the RealEstate10K dataset validate our approach, demonstrating improvements in geometric modeling and generalization for single-view reconstruction. Our method improves Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) metrics, demonstrating its advantages in geometric consistency modeling and cross-scene generalization. Full article
(This article belongs to the Special Issue Artificial Intelligence in Modeling and Simulation (2nd Edition))
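
The depth-prior-to-point-cloud step can be illustrated with a standard pinhole back-projection; the intrinsics and the random depth map below are placeholders.

```python
# Back-projecting a depth map into a camera-frame point cloud with pinhole
# intrinsics. Intrinsics and the random depth map are illustrative placeholders.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))      # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)   # (H*W, 3) 3D points

depth = np.random.uniform(1.0, 5.0, size=(240, 320))    # stand-in for a predicted depth prior
points = backproject(depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0)
```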

32 pages, 5853 KB  
Article
A Large-Scale 3D Gaussian Reconstruction Method for Optimized Adaptive Density Control in Training Resource Scheduling
by Ke Yan, Hui Wang, Zhuxin Li, Yuting Wang, Shuo Li and Hongmei Yang
Remote Sens. 2025, 17(23), 3868; https://doi.org/10.3390/rs17233868 - 28 Nov 2025
Viewed by 1783
Abstract
In response to the challenges of low computational efficiency, insufficient detail restoration, and dependence on multiple GPUs in 3D Gaussian Splatting for large-scale UAV scene reconstruction, this study introduces an improved 3D Gaussian Splatting framework. It primarily targets two aspects: optimization of the partitioning strategy and enhancement of adaptive density control. Specifically, an adaptive partitioning strategy guided by scene complexity is designed to ensure more balanced computational workloads across spatial blocks. To preserve scene integrity, auxiliary point clouds are integrated during partition optimization. Furthermore, a pixel weight-scaling mechanism is employed to regulate the average gradient in adaptive density control, thereby mitigating excessive densification of Gaussians. This design accelerates the training process while maintaining high-fidelity rendering quality. Additionally, a task-scheduling algorithm based on frequency-domain analysis is incorporated to further improve computational resource utilization. Extensive experiments on multiple large-scale UAV datasets demonstrate that the proposed framework can be trained efficiently on a single RTX 3090 GPU, achieving more than a 50% reduction in average optimization time while maintaining PSNR, SSIM and LPIPS values that are comparable to or better than representative 3DGS-based methods; on the MatrixCity-S dataset (>6000 images), it attains the highest PSNR among 3DGS-based approaches and completes training on a single 24 GB GPU in less than 60% of the training time of DOGS. Nevertheless, the current framework still requires several hours of optimization for city-scale scenes and has so far only been evaluated on static UAV imagery with a fixed camera model, which may limit its applicability to dynamic scenes or heterogeneous sensor configurations. Full article
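
A toy version of complexity-guided partitioning: recursively split the scene along its longest axis at the median so that blocks hold similar point counts; the real strategy also balances further factors and integrates auxiliary point clouds, and the 50,000-point cap is an assumption.

```python
# Toy balanced partitioning of a point cloud: split along the longest axis at the
# median until each block is below a point-count cap (a crude complexity proxy).
import numpy as np

def split_blocks(points, max_points=50_000):
    if len(points) <= max_points:
        return [points]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))   # longest spatial extent
    median = np.median(points[:, axis])
    left = points[points[:, axis] <= median]
    right = points[points[:, axis] > median]
    return split_blocks(left, max_points) + split_blocks(right, max_points)

cloud = np.random.randn(200_000, 3) * [50.0, 50.0, 5.0]        # flat, city-like spread
blocks = split_blocks(cloud)
print(len(blocks), [len(b) for b in blocks])                    # roughly balanced block sizes
```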

30 pages, 28451 KB  
Article
Boosting Diffusion Networks with Deep External Context-Aware Encoders for Low-Light Image Enhancement
by Pengliang Tang, Yu Wang and Aidong Men
Sensors 2025, 25(23), 7232; https://doi.org/10.3390/s25237232 - 27 Nov 2025
Viewed by 586
Abstract
Low-light image enhancement (LLIE) requires modeling spatially extensive and interdependent degradations across large pixel regions, while directly equipping diffusion-based LLIE with heavy global modules inside the iterative denoising backbone leads to prohibitive computational overhead. To enhance long-range context modeling without inflating the per-step cost of diffusion, we propose ECA-Diff, a diffusion framework augmented with a deep External Context-Aware Encoder (ECAE). A latent-space context network built with hybrid Transformer–Convolution blocks extracts holistic cues from the input, generates multi-scale context features once, and injects them into the diffusion backbone as lightweight conditional guidance across all sampling steps. In addition, a CIELAB-space Luminance-Adaptive Chromaticity Loss regularizes conditional diffusion training and mitigates the cool color cast frequently observed in low-luminance regions. Experiments on paired and unpaired benchmarks show that ECA-Diff consistently outperforms recent state-of-the-art LLIE methods in both full-reference (PSNR/SSIM/LPIPS) and no-reference (NIQE/BRISQUE) metrics, with the external context path introducing only modest overhead relative to the baseline diffusion backbone. These results indicate that decoupling global context estimation from the iterative denoising process is an effective way to boost diffusion-based LLIE and provides a general compute-once conditioning paradigm for low-level image restoration. Full article
(This article belongs to the Section Sensing and Imaging)
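
A sketch of a CIELAB-space chromaticity loss with luminance-adaptive weighting, using kornia for the color conversion; the weighting curve and constants are assumptions rather than the paper's exact formulation.

```python
# CIELAB chromaticity loss sketch: the a/b error is weighted more heavily where the
# reference luminance is low, countering cool casts in dark regions. The weighting
# curve and constants are assumptions, not the paper's exact loss.
import torch
import kornia

def lab_chroma_loss(pred_rgb, target_rgb, gamma=2.0):
    pred_lab = kornia.color.rgb_to_lab(pred_rgb)       # L in [0, 100], a/b roughly [-128, 127]
    target_lab = kornia.color.rgb_to_lab(target_rgb)
    chroma_err = (pred_lab[:, 1:] - target_lab[:, 1:]).abs().mean(dim=1)   # (N, H, W)
    weight = (1.0 - target_lab[:, 0] / 100.0) ** gamma + 1.0               # heavier when dark
    return (weight * chroma_err).mean()

pred = torch.rand(2, 3, 64, 64)      # stand-in enhanced output in [0, 1]
target = torch.rand(2, 3, 64, 64)    # stand-in reference
loss = lab_chroma_loss(pred, target)
```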
