Search Results (143)

Search Parameters:
Keywords = LPIP

41 pages, 9383 KB  
Article
Deep Learning Style Transfer for Enhanced Smoke Plume Visibility: A Standardized False Color Composite (SFCC) in GEMS Satellite Imagery
by Yemin Jeong, Seung Hee Kim, Menas Kafatos, Jeong-Ah Yu, Kyoung-Hee Sung, Sang-Min Kim, Seung-Yeon Kim, Goo Kim, Jae-Jin Kim and Yangwon Lee
Remote Sens. 2026, 18(3), 483; https://doi.org/10.3390/rs18030483 - 2 Feb 2026
Viewed by 190
Abstract
Wildfire smoke visualization using geostationary satellite imagery is essential for real-time monitoring and atmospheric analysis; however, inconsistencies in color tone across Geostationary Environment Monitoring Spectrometer (GEMS) images hinder reliable interpretation and model training. This study proposes a Standardized False Color Composite (SFCC) framework based on deep learning style transfer to enhance the visual consistency and interpretability of wildfire smoke scenes. Four tone-standardization methods were compared: the statistical Empirical Cumulative Distribution Function (ECDF) correction and three neural approaches—ReHistoGAN, StyTr2, and Style Injection Diffusion Model (SI-DM). Each model was evaluated visually and quantitatively using six metrics (SSIM, LPIPS, FID, histogram similarity, ArtFID, and LSCI) and validated on three major wildfire events in Korea (2022–2025). Among the tested models, SI-DM achieved the most balanced performance, preserving structural features while ensuring consistent color-tone alignment (ArtFID = 1.620; LSCI mean = 0.894). Qualitative assessments further confirmed that SI-DM effectively delineated smoke boundaries and maintained natural background tones under complex atmospheric conditions. Additional analysis using GEMS UVAI, VISAI, and CHOCHO demonstrated that the styled composites partially reflect the optical and chemical characteristics distinguishing wildfire smoke from dust aerosols. The proposed SFCC framework establishes a foundation for visually standardized satellite smoke imagery and provides potential for future aerosol-type classification and automated detection applications. Full article
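
The abstract above evaluates tone standardization with perceptual and structural metrics such as LPIPS and SSIM. The sketch below shows a generic version of that kind of check, not the paper's pipeline: an ECDF-style histogram matching step (the statistical baseline mentioned) followed by LPIPS and SSIM scoring. The image file names are hypothetical, and scikit-image plus the lpips package are assumed to be installed.

```python
# Illustrative sketch only: ECDF-style tone standardization via histogram
# matching, scored with SSIM (structure preservation) and LPIPS (perceptual
# change). File names are hypothetical; the GEMS-specific processing is omitted.
import numpy as np
import torch
import lpips
from skimage import io, exposure
from skimage.metrics import structural_similarity

src = io.imread("gems_scene.png")[..., :3] / 255.0      # scene to standardize
ref = io.imread("gems_reference.png")[..., :3] / 255.0  # reference color tone

# Match each channel's cumulative distribution to the reference image.
styled = exposure.match_histograms(src, ref, channel_axis=-1)

# SSIM against the original scene.
ssim = structural_similarity(src, styled, channel_axis=-1, data_range=1.0)

# LPIPS expects NCHW tensors scaled to [-1, 1].
def to_tensor(img):
    t = torch.from_numpy(img.transpose(2, 0, 1)).float().unsqueeze(0)
    return t * 2.0 - 1.0

loss_fn = lpips.LPIPS(net="alex")
with torch.no_grad():
    d = loss_fn(to_tensor(src), to_tensor(styled)).item()

print(f"SSIM={ssim:.3f}  LPIPS={d:.3f}")
```
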
21 pages, 5838 KB  
Article
SRCT: Structure-Preserving Method for Sub-Meter Remote Sensing Image Super-Resolution
by Tianxiong Gao, Shuyan Zhang, Wutao Yao, Erping Shang, Jin Yang, Yong Ma and Yan Ma
Sensors 2026, 26(2), 733; https://doi.org/10.3390/s26020733 - 22 Jan 2026
Viewed by 98
Abstract
To address the scarcity of sub-meter remote sensing samples and structural inconsistencies such as edge blur and contour distortion in super-resolution reconstruction, this paper proposes SRCT, a super-resolution method tailored for sub-meter remote sensing imagery. The method consists of two parts: external structure guidance and internal structure optimization. External structure guidance is jointly realized by the structure encoder (SE) and structure guidance module (SGM): the SE extracts key structural features from high-resolution images, and the SGM injects these structural features into the super-resolution network layer by layer, achieving structural transfer from external priors to the reconstruction network. Internal structure optimization is handled by the backbone network SGCT, which introduces a dual-branch residual dense group (DBRDG): one branch uses window-based multi-head self-attention to model global geometric structures, and the other branch uses lightweight convolutions to model local texture features, enabling the network to adaptively balance structure and texture reconstruction internally. Experimental results show that SRCT clearly outperforms existing methods on structure-related metrics, with DISTS reduced by 8.7% and LPIPS reduced by 7.2%, and significantly improves reconstruction quality in structure-sensitive regions such as building contours and road continuity, providing a new technical route for sub-meter remote sensing image super-resolution reconstruction. Full article
(This article belongs to the Section Remote Sensors)
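
The dual-branch idea described above (a global self-attention branch alongside a lightweight convolution branch) can be sketched as below. This is not the published SRCT code: the window partitioning, residual dense wiring, and structure-guidance modules are omitted, and the class name is illustrative.

```python
# Minimal sketch of a dual-branch block: global self-attention + lightweight
# (depthwise) convolution, fused by addition. Window partitioning and the
# residual dense topology of the paper are omitted for brevity.
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.local = nn.Sequential(                      # lightweight conv branch
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.GELU(),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2)) # (B, H*W, C)
        global_out, _ = self.attn(tokens, tokens, tokens)
        global_out = global_out.transpose(1, 2).reshape(b, c, h, w)
        return x + global_out + self.local(x)            # fuse structure + texture

x = torch.randn(1, 64, 32, 32)
print(DualBranchBlock(64)(x).shape)                      # torch.Size([1, 64, 32, 32])
```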

23 pages, 7327 KB  
Article
Knit-Pix2Pix: An Enhanced Pix2Pix Network for Weft-Knitted Fabric Texture Generation
by Xin Ru, Yingjie Huang, Laihu Peng and Yongchao Hou
Sensors 2026, 26(2), 682; https://doi.org/10.3390/s26020682 - 20 Jan 2026
Viewed by 181
Abstract
Texture mapping of weft-knitted fabrics plays a crucial role in virtual try-on and digital textile design due to its computational efficiency and real-time performance. However, traditional texture mapping techniques typically adapt pre-generated textures to deformed surfaces through geometric transformations. These methods overlook the complex variations in yarn length, thickness, and loop morphology during stretching, often resulting in visual distortions. To overcome these limitations, we propose Knit-Pix2Pix, a dedicated framework for generating realistic weft-knitted fabric textures directly from knitted unit mesh maps. These maps provide grid-based representations where each cell corresponds to a physical loop region, capturing its deformation state. Knit-Pix2Pix is an integrated architecture that combines a multi-scale feature extraction module, a grid-guided attention mechanism, and a multi-scale discriminator. Together, these components address the multi-scale and deformation-aware requirements of this task. To validate our approach, we constructed a dataset of over 2000 pairs of fabric stretching images and corresponding knitted unit mesh maps, with further testing using spring-mass fabric simulation. Experiments show that, compared with traditional texture mapping methods, SSIM increased by 21.8%, PSNR by 20.9%, and LPIPS decreased by 24.3%. This integrated approach provides a practical solution for meeting the requirements of digital textile design. Full article
(This article belongs to the Section Intelligent Sensors)
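
The abstract mentions a multi-scale discriminator; below is a minimal PatchGAN-style multi-scale discriminator sketch in the pix2pixHD spirit, not the Knit-Pix2Pix implementation. The generator, grid-guided attention, and training losses are not shown, and the layer sizes are illustrative.

```python
# Minimal multi-scale PatchGAN-style discriminator: the same small
# discriminator is applied at several average-pooled scales.
import torch
import torch.nn as nn

def patch_discriminator(in_ch=3, base=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),   # patch-level logits
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList([patch_discriminator() for _ in range(num_scales)])
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        outputs = []
        for d in self.discs:
            outputs.append(d(x))
            x = self.down(x)          # next discriminator sees a coarser view
        return outputs

fake = torch.randn(1, 3, 256, 256)
for o in MultiScaleDiscriminator()(fake):
    print(o.shape)
```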

21 pages, 4290 KB  
Article
Information Modeling of Asymmetric Aesthetics Using DCGAN: A Data-Driven Approach to the Generation of Marbling Art
by Muhammed Fahri Unlersen and Hatice Unlersen
Information 2026, 17(1), 94; https://doi.org/10.3390/info17010094 - 15 Jan 2026
Viewed by 404
Abstract
Traditional Turkish marbling (Ebru) art is an intangible cultural heritage characterized by highly asymmetric, fluid, and non-reproducible patterns, making its long-term preservation and large-scale dissemination challenging. It is highly sensitive to environmental conditions, making it enormously difficult to mass produce while maintaining its original aesthetic qualities. A data-driven generative model is therefore required to create unlimited, high-fidelity digital surrogates that safeguard this UNESCO heritage against physical loss and enable large-scale cultural applications. This study introduces a deep generative modeling framework for the digital reconstruction of traditional Turkish marbling (Ebru) art using a Deep Convolutional Generative Adversarial Network (DCGAN). A dataset of 20,400 image patches, systematically derived from 17 original marbling works, was used to train the proposed model. The framework aims to mathematically capture the asymmetric, fluid, and stochastic nature of Ebru patterns, enabling the reproduction of their aesthetic structure in a digital medium. The generated images were evaluated using multiple quantitative and perceptual metrics, including Fréchet Inception Distance (FID), Kernel Inception Distance (KID), Learned Perceptual Image Patch Similarity (LPIPS), and PRDC-based indicators (Precision, Recall, Density, Coverage). For experimental validation, the proposed DCGAN framework is additionally compared against a Vanilla GAN baseline trained under identical conditions, highlighting the advantages of convolutional architectures for modeling marbling textures. The results show that the DCGAN model achieved a high level of realism and diversity without mode collapse or overfitting, producing images that were perceptually close to authentic marbling works. In addition to the quantitative evaluation, expert qualitative assessment by a traditional Ebru artist confirmed that the model reproduced the organic textures, color dynamics, and compositional asymmetrical characteristic of real marbling art. The proposed approach demonstrates the potential of deep generative models for the digital preservation, dissemination, and reinterpretation of intangible cultural heritage recognized by UNESCO. Full article
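
For readers unfamiliar with the DCGAN architecture used above, the sketch below is a textbook DCGAN generator for 64x64 RGB patches. It is not the authors' configuration: latent size, feature widths, and output resolution are illustrative defaults.

```python
# Textbook DCGAN generator sketch for 64x64 RGB patches; the paper's exact
# architecture, latent size, and training schedule are not reproduced here.
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),     # 4x4
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),  # 8x8
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),      # 32x32
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, 3, 4, 2, 1, bias=False),             # 64x64
            nn.Tanh(),
        )

    def forward(self, z):                     # z: (B, z_dim, 1, 1)
        return self.net(z)

z = torch.randn(8, 100, 1, 1)
print(DCGANGenerator()(z).shape)              # torch.Size([8, 3, 64, 64])
```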

25 pages, 8224 KB  
Article
QWR-Dec-Net: A Quaternion-Wavelet Retinex Framework for Low-Light Image Enhancement with Applications to Remote Sensing
by Vladimir Frants, Sos Agaian, Karen Panetta and Artyom Grigoryan
Information 2026, 17(1), 89; https://doi.org/10.3390/info17010089 - 14 Jan 2026
Viewed by 276
Abstract
Computer vision and deep learning are essential in diverse fields such as autonomous driving, medical imaging, face recognition, and object detection. However, enhancing low-light remote sensing images remains challenging for both research and real-world applications. Low illumination degrades image quality due to sensor limitations and environmental factors, weakening visual fidelity and reducing performance in vision tasks. Common issues such as insufficient lighting, backlighting, and limited exposure create low contrast, heavy shadows, and poor visibility, particularly at night. We propose QWR-Dec-Net, a quaternion-based Retinex decomposition network tailored for low-light image enhancement. QWR-Dec-Net consists of two key modules: a decomposition module that separates illumination and reflectance, and a denoising module that fuses a quaternion holistic color representation with wavelet multi-frequency information. This structure jointly improves color constancy and noise suppression. Experiments on low-light remote sensing datasets (LSCIDMR and UCMerced) show that QWR-Dec-Net outperforms current methods in PSNR, SSIM, LPIPS, and classification accuracy. The model’s accurate illumination estimation and stable reflectance make it well-suited for remote sensing tasks such as object detection, video surveillance, precision agriculture, and autonomous navigation. Full article
(This article belongs to the Section Artificial Intelligence)
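
As a rough, classical analogue of the two ingredients named above (Retinex decomposition and wavelet multi-frequency information), the sketch below splits an image into illumination and reflectance with a Gaussian-smoothed luminance and then decomposes the illumination with a one-level Haar wavelet transform. The quaternion color representation and the learned network are not shown; the file path and the sigma value are assumptions.

```python
# Classical analogue only: Retinex-style illumination/reflectance split plus a
# Haar wavelet multi-frequency split of the illumination. Not QWR-Dec-Net.
import numpy as np
import pywt
from scipy.ndimage import gaussian_filter
from skimage import io

img = io.imread("lowlight_scene.png")[..., :3] / 255.0   # hypothetical path
eps = 1e-4

# Retinex-style decomposition: smooth luminance as illumination L, R = I / L.
lum = img.max(axis=-1)                                    # simple luminance proxy
illumination = gaussian_filter(lum, sigma=15)
reflectance = img / (illumination[..., None] + eps)

# Haar wavelet split of the illumination into low/high-frequency subbands.
cA, (cH, cV, cD) = pywt.dwt2(illumination, "haar")
print("LL", cA.shape, "LH", cH.shape, "HL", cV.shape, "HH", cD.shape)
```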

21 pages, 37629 KB  
Article
FacadeGAN: Facade Texture Placement with GANs
by Elif Şanlıalp and Muhammed Abdullah Bulbul
Appl. Sci. 2026, 16(2), 860; https://doi.org/10.3390/app16020860 - 14 Jan 2026
Viewed by 192
Abstract
This study presents a texture-aware image synthesis framework designed to generate material-consistent façades using adversarial learning. The proposed architecture incorporates a mask-guided channel-wise attention mechanism that adaptively merges segmentation information with texture statistics to reconcile structural guiding with textural fidelity. A thorough comparative analysis was performed utilizing three internal variants—Vanilla GAN, Wasserstein GAN (WGAN), and WGAN-GP—against leading baselines, including TextureGAN and Pix2Pix. The assessment utilized a comprehensive multi-metric framework that included SSIM, FID, KID, LPIPS, and DISTS, in conjunction with a VGG-19 based perceptual loss. Experimental results indicate a notable divergence between pixel-wise accuracy and perceptual realism; although established baselines attained elevated PSNR values, the suggested Vanilla GAN and WGAN models exhibited enhanced perceptual fidelity, achieving the lowest LPIPS and DISTS scores. The WGAN-GP model, although theoretically stable, produced smoother but less complex textures due to the regularization enforced by the gradient penalty term. Ablation investigations further validated that the attention mechanism consistently enhanced structural alignment and texture sharpness across all topologies. Thus, the study suggests that Vanilla GAN and WGAN architectures, enhanced by attention-based fusion, offer an optimal balance between realism and structural fidelity for high-frequency texture creation applications. Full article
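
The mask-guided channel-wise attention described above can be illustrated with a simple gate: channel statistics are pooled only inside the segmentation mask and turned into sigmoid weights. This is a generic sketch of the mechanism, not the FacadeGAN implementation; the reduction ratio and shapes are assumptions.

```python
# Sketch of a mask-guided channel attention gate: masked average pooling of the
# features, a small MLP, then per-channel sigmoid gates.
import torch
import torch.nn as nn

class MaskGuidedChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, feat, mask):           # feat: (B,C,H,W), mask: (B,1,H,W) in {0,1}
        area = mask.sum(dim=(2, 3)).clamp(min=1.0)            # (B, 1)
        pooled = (feat * mask).sum(dim=(2, 3)) / area          # masked average pooling
        gates = self.mlp(pooled).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return feat * gates

feat = torch.randn(2, 64, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.5).float()
print(MaskGuidedChannelAttention(64)(feat, mask).shape)
```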

23 pages, 5097 KB  
Article
A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer
by Shasha Tian, Adisorn Sirikham, Jessada Konpang and Chuyang Wang
J. Imaging 2026, 12(1), 44; https://doi.org/10.3390/jimaging12010044 - 14 Jan 2026
Viewed by 281
Abstract
Underwater optical images are the primary carriers of underwater scene information, playing a crucial role in marine resource exploration, underwater environmental monitoring, and engineering inspection. However, wavelength-dependent absorption and scattering severely deteriorate underwater images, leading to reduced contrast, chromatic distortions, and loss of structural details. To address these issues, we propose a U-shaped underwater image enhancement framework that integrates Swin-Transformer blocks with lightweight attention and residual modules. A Dual-Window Multi-Head Self-Attention (DWMSA) in the bottleneck models long-range context while preserving fine local structure. A Global-Aware Attention Map (GAMP) adaptively re-weights channels and spatial locations to focus on severely degraded regions. A Feature-Augmentation Residual Network (FARN) stabilizes deep training and emphasizes texture and color fidelity. Trained with a combination of Charbonnier, perceptual, and edge losses, our method achieves state-of-the-art results in PSNR and SSIM, the lowest LPIPS, and improvements in UIQM and UCIQE on the UFO-120 and EUVP datasets, with average metrics of PSNR 29.5 dB, SSIM 0.94, LPIPS 0.17, UIQM 3.62, and UCIQE 0.59. Qualitative results show reduced color cast, restored contrast, and sharper details. Code, weights, and evaluation scripts will be released to support reproducibility. Full article
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
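
The training objective above combines Charbonnier, perceptual, and edge losses. The sketch below shows minimal Charbonnier and edge-difference terms; the VGG-based perceptual term and the authors' loss weights are not reproduced, and the 0.1 weight is only a placeholder.

```python
# Minimal Charbonnier + edge-difference losses of the kind named above.
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def edge_loss(pred, target):
    # Compare horizontal/vertical intensity differences (a simple edge proxy).
    def grads(x):
        dx = x[..., :, 1:] - x[..., :, :-1]
        dy = x[..., 1:, :] - x[..., :-1, :]
        return dx, dy
    pdx, pdy = grads(pred)
    tdx, tdy = grads(target)
    return (pdx - tdx).abs().mean() + (pdy - tdy).abs().mean()

pred = torch.rand(1, 3, 64, 64)
target = torch.rand(1, 3, 64, 64)
total = charbonnier_loss(pred, target) + 0.1 * edge_loss(pred, target)  # toy weight
print(total.item())
```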

23 pages, 3855 KB  
Article
Visual-to-Tactile Cross-Modal Generation Using a Class-Conditional GAN with Multi-Scale Discriminator and Hybrid Loss
by Nikolay Neshov, Krasimir Tonchev, Agata Manolova, Radostina Petkova and Ivaylo Bozhilov
Sensors 2026, 26(2), 426; https://doi.org/10.3390/s26020426 - 9 Jan 2026
Viewed by 346
Abstract
Understanding surface textures through visual cues is crucial for applications in haptic rendering and virtual reality. However, accurately translating visual information into tactile feedback remains a challenging problem. To address this challenge, this paper presents a class-conditional Generative Adversarial Network (cGAN) for cross-modal translation from texture images to vibrotactile spectrograms, using samples from the LMT-108 dataset. The generator is adapted from pix2pix and enhanced with Conditional Batch Normalization (CBN) at the bottleneck to incorporate texture class semantics. A dedicated label predictor, based on a DenseNet-201 and trained separately prior to cGAN training, provides the conditioning label. The discriminator is derived from pix2pixHD and uses a multi-scale architecture with three discriminators, each comprising three downsampling layers. A grid search over multi-scale discriminator configurations shows that this setup yields optimal perceptual similarity measured by Learned Perceptual Image Patch Similarity (LPIPS). The generator is trained using a hybrid loss that combines adversarial, L1, and feature matching losses derived from intermediate discriminator features, while the discriminators are trained using standard adversarial loss. Quantitative evaluation with LPIPS and Fréchet Inception Distance (FID) confirms superior similarity to real spectrograms. GradCAM visualizations highlight the benefit of class conditioning. The proposed model outperforms pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions. The generated spectrograms can be converted into vibrotactile signals using the Griffin–Lim algorithm, enabling applications in haptic feedback and virtual material simulation. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
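
The final step above, converting a generated magnitude spectrogram into a vibrotactile signal with the Griffin–Lim algorithm, can be sketched with librosa. The spectrogram here is a random stand-in for the generator output, and the STFT parameters are placeholders.

```python
# Converting a magnitude spectrogram to a 1-D signal with Griffin-Lim.
# STFT parameters and the spectrogram source are placeholders; librosa assumed.
import numpy as np
import librosa

# Pretend this came from the generator: a non-negative magnitude spectrogram
# with shape (1 + n_fft // 2, frames).
n_fft, hop = 512, 128
fake_spectrogram = np.abs(np.random.randn(1 + n_fft // 2, 200)).astype(np.float32)

signal = librosa.griffinlim(
    fake_spectrogram, n_iter=64, hop_length=hop, win_length=n_fft
)
print(signal.shape, signal.dtype)
```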

18 pages, 7859 KB  
Article
Preserving Formative Tendencies in AI Image Generation: Toward Architectural AI Typologies Through Iterative Blending
by Dong-Ho Lee and Sung-Hak Ko
Buildings 2026, 16(1), 183; https://doi.org/10.3390/buildings16010183 - 1 Jan 2026
Viewed by 311
Abstract
This study explores an alternative design methodology for architectural image generation using generative AI, addressing the challenge of how AI-generated imagery can preserve formative tendencies while enabling creative variation and user agency. Departing from conventional prompt-based approaches, the process utilizes only a minimal initial image set and proceeds by reintroducing solely the synthesized outcomes during the blending and iterative synthesis stages. The central research question asks whether AI can sustain and transform architectural tendencies through iterative synthesis despite limited input data, and how such tendencies might accumulate into consistent typological patterns. The research examines how formative tendencies are preserved and transformed, based on four aesthetic elements: layer, scale, density, and assembly. These four elements reflect diverse architectural ideas in spatial, proportional, volumetric, and tectonic characteristics commonly observed in architectural representations. Observing how these tendencies evolve across iterations allows the study to evaluate how AI negotiates between structural preservation and creative deviation, revealing the generative patterns underlying emerging AI typologies. The study employs SSIM, LPIPS, and CLIP similarity metrics as supplementary indicators to contextualize these tendencies. The results demonstrate that iterative blending enables the deconstruction and recomposition of archetypal formal languages, generating new visual variations while preserving identifiable structural and semantic tendencies. These outputs do not converge into generalized imagery but instead retain identifiable tendencies. Furthermore, the study positions user selection and intervention as a crucial mechanism for mediating between accidental transformation and intentional direction, proposing AI not as a passive generator but as a dialogical tool. Finally, the study conceptualizes such consistent formal languages as “AI Typologies” and presents the potential for a systematic design methodology founded upon them as a complementary alternative to prompt-based workflows. Full article
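
As a concrete illustration of the supplementary indicators listed above, the sketch below computes a CLIP image-to-image cosine similarity between two generated iterations. The checkpoint name and file paths are examples, the transformers and Pillow packages are assumed, and this is not the authors' evaluation code.

```python
# CLIP image-to-image cosine similarity between two generated iterations.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("iteration_03.png"), Image.open("iteration_04.png")]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    feats = model.get_image_features(**inputs)        # (2, projection_dim)
feats = feats / feats.norm(dim=-1, keepdim=True)
similarity = (feats[0] @ feats[1]).item()
print(f"CLIP similarity: {similarity:.3f}")
```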

19 pages, 5557 KB  
Article
BDNet: A Real-Time Biomedical Image Denoising Network with Gradient Information Enhancement Loss
by Lemin Shi, Xin Feng, Ping Gong, Dianxin Song, Hao Zhang, Langxi Liu, Yuqiang Zhang and Mingye Li
Biosensors 2026, 16(1), 26; https://doi.org/10.3390/bios16010026 - 1 Jan 2026
Viewed by 490
Abstract
Biomedical imaging plays a critical role in medical diagnostics and research, yet image noise remains a significant challenge that hinders accurate analysis. To address this issue, we propose BDNet, a real-time biomedical image denoising network optimized for enhancing gradient and high-frequency information while effectively suppressing noise. The network adopts a lightweight U-Net-inspired encoder–decoder architecture, incorporating a Convolutional Block Attention Module at the bottleneck to refine spatial and channel-wise feature extraction. A novel gradient-based loss function—combining Sobel operator-derived gradient loss with L1, L2, and LSSIM losses—ensures faithful preservation of fine structural details. Extensive experiments on the Fluorescence Microscopy Denoising (FMD) dataset demonstrate that BDNet achieves state-of-the-art performance across multiple metrics, including PSNR, RMSE, SSIM, and LPIPS, outperforming both convolutional and Transformer-based models in accuracy and efficiency. With its superior denoising capability and real-time inference speed, BDNet provides an effective and practical solution for improving biomedical image quality, particularly in fluorescence microscopy applications. Full article
(This article belongs to the Section Biosensors and Healthcare)
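
The gradient-based loss described above can be sketched as a Sobel-filtered difference combined with L1 and L2 terms. The SSIM term and the authors' exact weighting are omitted, and the 0.5 weight is only illustrative.

```python
# Sobel-based gradient loss combined with L1/L2 terms, in the spirit of the
# loss described above; the SSIM term and the published weights are omitted.
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel_grad(x):                            # x: (B, 1, H, W) grayscale
    gx = F.conv2d(x, SOBEL_X, padding=1)
    gy = F.conv2d(x, SOBEL_Y, padding=1)
    return gx, gy

def gradient_loss(pred, target):
    pgx, pgy = sobel_grad(pred)
    tgx, tgy = sobel_grad(target)
    return F.l1_loss(pgx, tgx) + F.l1_loss(pgy, tgy)

def denoise_loss(pred, target, w_grad=0.5):
    return (F.l1_loss(pred, target) + F.mse_loss(pred, target)
            + w_grad * gradient_loss(pred, target))

pred = torch.rand(2, 1, 64, 64)
clean = torch.rand(2, 1, 64, 64)
print(denoise_loss(pred, clean).item())
```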

23 pages, 5039 KB  
Article
A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution
by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng
Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025
Viewed by 315
Abstract
In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 of the SimVP-v2 baseline. Concurrently, our improved A3DSimVP model reduces the loss metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, our LPIPS metric is substantially lowered to 0.22. These data tangibly confirm that A3DSimVP significantly enhances both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
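
The architectural change described above replaces 2D depthwise separable convolutions with 3D counterparts. Below is a generic depthwise-separable 3D convolution block; the full GSTA-3D gating and the audio-visual fusion are not shown, and the tensor sizes are illustrative.

```python
# Depthwise-separable 3D convolution: a per-channel 3D filter followed by a
# 1x1x1 pointwise mixing convolution, operating on (B, C, T, H, W) clips.
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.depthwise = nn.Conv3d(channels, channels, kernel_size,
                                   padding=pad, groups=channels)   # per-channel filter
        self.pointwise = nn.Conv3d(channels, channels, 1)          # channel mixing

    def forward(self, x):                     # x: (B, C, T, H, W)
        return self.pointwise(self.depthwise(x))

clip = torch.randn(1, 32, 8, 64, 64)          # batch, channels, frames, height, width
print(DepthwiseSeparableConv3d(32)(clip).shape)
```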

23 pages, 12620 KB  
Article
The Color Image Watermarking Algorithm Based on Quantum Discrete Wavelet Transform and Chaotic Mapping
by Yikang Yuan, Wenbo Zhao, Zhongyan Li and Wanquan Liu
Symmetry 2026, 18(1), 33; https://doi.org/10.3390/sym18010033 - 24 Dec 2025
Viewed by 426
Abstract
Quantum watermarking is a technique that embeds specific information into a quantum carrier for the purpose of digital copyright protection. In this paper, we propose a novel color image watermarking algorithm that integrates quantum discrete wavelet transform with Sinusoidal–Tent mapping and baker mapping. Initially, chaotic sequences are generated using Sinusoidal–Tent mapping to determine the channels suitable for watermark embedding. Subsequently, a one-level quantum Haar wavelet transform is applied to the selected channel to decompose the image. The watermarked image is then scrambled via discrete baker mapping, and the scrambled image is embedded into the High-High subbands. The invisibility of the watermark is evaluated by calculating the peak signal-to-noise ratio, Structural similarity index measure, and Learned Perceptual Image Patch Similarity, with comparisons made against the color histogram. The robustness of the proposed algorithm is assessed through the calculation of Normalized Cross-Correlation. In the simulation results, PSNR is close to 63, SSIM is close to 1, LPIPS is close to 0.001, and NCC is close to 0.97. This indicates that the proposed watermarking algorithm exhibits excellent visual quality and a robust capability to withstand various attacks. Additionally, through ablation study, the contribution of each technique to overall performance was systematically evaluated. Full article
(This article belongs to the Section Computer)
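
A classical (non-quantum) analogue of the embedding step above is sketched below: a one-level Haar DWT of one channel, additive embedding of a permuted binary watermark into the HH (diagonal detail) subband, and a PSNR check on the reconstruction. The embedding strength and the permutation are simplified stand-ins for the paper's chaotic scrambling.

```python
# Classical analogue of DWT-domain watermark embedding into the HH subband.
import numpy as np
import pywt
from skimage.metrics import peak_signal_noise_ratio

rng = np.random.default_rng(0)
host = rng.random((256, 256))                       # stand-in for one image channel
watermark = rng.integers(0, 2, size=(128, 128))     # binary watermark

cA, (cH, cV, cD) = pywt.dwt2(host, "haar")          # cD is the HH subband (128x128)

perm = rng.permutation(watermark.size)              # simple scrambling stand-in
scrambled = watermark.flatten()[perm].reshape(watermark.shape)

alpha = 0.02                                        # embedding strength (toy value)
cD_marked = cD + alpha * (2 * scrambled - 1)        # map bits {0,1} -> {-1,+1}

marked = pywt.idwt2((cA, (cH, cV, cD_marked)), "haar")
print("PSNR:", peak_signal_noise_ratio(host, marked, data_range=1.0))
```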

29 pages, 6232 KB  
Article
Research on Multi-Temporal Infrared Image Generation Based on Improved CLE Diffusion
by Hua Gong, Wenfei Gao, Fang Liu and Yuanjing Ma
Computers 2025, 14(12), 548; https://doi.org/10.3390/computers14120548 - 11 Dec 2025
Viewed by 349
Abstract
To address the problems of dynamic brightness imbalance in image sequences and blurred object edges in multi-temporal infrared image generation, we propose an improved multi-temporal infrared image generation model based on CLE Diffusion. First, the model adopts CLE Diffusion to capture the dynamic evolution patterns of image sequences. By modeling brightness variation through the noise evolution of the diffusion process, it enables controllable generation across multiple time points. Second, we design a periodic time encoding strategy and a feature linear modulator and build a temporal control module. Through channel-level modulation, this module jointly models temporal information and brightness features to improve the model’s temporal representation capability. Finally, to tackle structural distortion and edge blurring in infrared images, we design a multi-scale edge pyramid strategy and build a structure consistency module based on attention mechanisms. This module jointly computes multi-scale edge and structural features to enforce edge enhancement and structural consistency. Extensive experiments on both public visible-light and self-constructed infrared multi-temporal datasets demonstrate our model’s state-of-the-art (SOTA) performance. It generates high-quality images across all time points, achieving superior performance on the PSNR, SSIM, and LPIPS metrics. The generated images have clear edges and structural consistency. Full article
(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (2nd Edition))
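
A generic version of the temporal control idea above is a periodic (sinusoidal) time-of-day encoding fed to a feature-wise linear modulator that scales and shifts channels. The sketch below uses illustrative dimensions and a 24-hour period; it is not the paper's module.

```python
# Periodic time encoding + feature-wise linear modulation (per-channel scale
# and shift). Dimensions and the 24-hour period are illustrative assumptions.
import math
import torch
import torch.nn as nn

def periodic_time_encoding(hour, dim=16, period=24.0):
    # Sin/cos pairs at harmonics of the daily period.
    freqs = torch.arange(1, dim // 2 + 1, dtype=torch.float32)
    angle = 2 * math.pi * hour / period * freqs
    return torch.cat([torch.sin(angle), torch.cos(angle)], dim=-1)   # (dim,)

class FeatureLinearModulator(nn.Module):
    def __init__(self, time_dim, channels):
        super().__init__()
        self.to_scale_shift = nn.Linear(time_dim, 2 * channels)

    def forward(self, feat, t_emb):           # feat: (B,C,H,W), t_emb: (B,time_dim)
        scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=-1)
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]

t_emb = periodic_time_encoding(torch.tensor(14.5)).unsqueeze(0)      # 14:30
feat = torch.randn(1, 32, 16, 16)
print(FeatureLinearModulator(16, 32)(feat, t_emb).shape)
```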

22 pages, 2302 KB  
Article
MAF-GAN: A Multi-Attention Fusion Generative Adversarial Network for Remote Sensing Image Super-Resolution
by Zhaohe Wang, Hai Tan, Zhongwu Wang, Jinlong Ci and Haoran Zhai
Remote Sens. 2025, 17(24), 3959; https://doi.org/10.3390/rs17243959 - 7 Dec 2025
Viewed by 497
Abstract
Existing Generative Adversarial Networks (GANs) frequently yield remote sensing images with blurred fine details, distorted textures, and compromised spatial structures when applied to super-resolution (SR) tasks. To address these limitations, this study proposes a Multi-Attention Fusion Generative Adversarial Network (MAF-GAN). The generator of MAF-GAN is built on a U-Net backbone that incorporates Oriented Convolutions (OrientedConv) to enhance the extraction of directional features and textures, while a novel co-calibration mechanism combining channel, spatial, gating, and spectral attention is embedded in the encoding path and skip connections, supplemented by an adaptive weighting strategy for effective multi-scale feature fusion. A composite loss function is further designed that integrates adversarial, perceptual, hybrid pixel, total variation, and feature consistency losses to optimize model performance. Extensive experiments on the GF7-SR4×-MSD dataset demonstrate that MAF-GAN achieves state-of-the-art performance, delivering a Peak Signal-to-Noise Ratio (PSNR) of 27.14 dB, Structural Similarity Index (SSIM) of 0.7206, Learned Perceptual Image Patch Similarity (LPIPS) of 0.1017, and Spectral Angle Mapper (SAM) of 1.0871, significantly outperforming mainstream models including SRGAN, ESRGAN, SwinIR, HAT, and ESatSR and exceeding traditional interpolation methods (e.g., Bicubic) by a substantial margin. Notably, MAF-GAN maintains an excellent balance between reconstruction quality and inference efficiency, further reinforcing its advantages over competing methods. Ablation studies additionally validate the individual contribution of each proposed component to the model's overall performance. The method generates super-resolution remote sensing images with more natural visual perception, clearer spatial structures, and superior spectral fidelity, offering a reliable technical solution for high-precision remote sensing applications. Full article
(This article belongs to the Section Environmental Remote Sensing)
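
The composite loss above mixes several terms with fixed weights. The sketch below shows a total-variation term and a weighted composite-loss skeleton in that spirit; the weights and the adversarial, perceptual, and feature-consistency terms are placeholders, not the published values.

```python
# Total-variation term plus a weighted composite loss skeleton; the weights and
# the adversarial/perceptual/feature-consistency terms are placeholders.
import torch
import torch.nn.functional as F

def total_variation(x):                       # x: (B, C, H, W)
    tv_h = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    tv_w = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return tv_h + tv_w

def composite_loss(sr, hr, adv_term=0.0, perc_term=0.0, feat_term=0.0,
                   w=(1.0, 1e-3, 0.1, 1e-4, 0.1)):      # toy weights
    pixel = F.l1_loss(sr, hr) + 0.5 * F.mse_loss(sr, hr)  # "hybrid" pixel loss
    w_pix, w_adv, w_perc, w_tv, w_feat = w
    return (w_pix * pixel + w_adv * adv_term + w_perc * perc_term
            + w_tv * total_variation(sr) + w_feat * feat_term)

sr = torch.rand(1, 3, 128, 128, requires_grad=True)
hr = torch.rand(1, 3, 128, 128)
print(composite_loss(sr, hr).item())
```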

23 pages, 10651 KB  
Article
Noise-Aware Hybrid Compression of Deep Models with Zero-Shot Denoising and Failure Prediction
by Lizhe Zhang, Quan Zhou, Ruihua Liu, Lang Huyan, Juanni Liu and Yi Zhang
Appl. Sci. 2025, 15(24), 12882; https://doi.org/10.3390/app152412882 - 5 Dec 2025
Viewed by 535
Abstract
Deep learning-based image compression achieves remarkable average rate-distortion performance but is prone to failure on noisy, high-frequency, or high-entropy inputs. This work systematically investigates these failure cases and proposes a noise-aware hybrid compression framework to address them. A High-Frequency Vulnerability Index (HFVI) is proposed, integrating frequency energy, encoder Jacobian sensitivity, and texture entropy into a unified measure of degradation susceptibility. Guided by HFVI, the system incorporates a selective zero-shot denoising module (P2PA) and a lightweight hybrid codec selector that determines, for each image, whether P2PA is necessary and which codec (a learning-based model or JPEG2000) is more reliable, without retraining any compression backbones. Experiments on a 200,000-image cross-domain benchmark incorporating general datasets, synthetic noise (eight levels), and real-noise datasets demonstrate that the proposed pipeline improves PSNR by up to 1.28 dB, raises SSIM by 0.02, reduces LPIPS by roughly 0.05, and decreases the failure-case rate by 6.7% over the best baseline (Joint-IC). Additional intensity-profile and cross-validation analyses further validate the robustness and deployment readiness of the method, showing that the hybrid selector provides a practical path toward reliable, noise-adaptive deep image compression. Full article
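
As a loose illustration of the routing idea above, the sketch below computes a crude high-frequency energy ratio (only one ingredient of the HFVI concept) and uses a threshold to pick a codec path. The Jacobian-sensitivity and entropy terms, the P2PA denoiser, and the actual codecs are not implemented; the cutoff and threshold values are arbitrary.

```python
# Crude frequency-energy proxy plus a threshold-based codec-routing stub.
import numpy as np

def high_freq_ratio(gray, cutoff=0.25):
    """Fraction of spectral energy above a normalized radial frequency cutoff."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    return spec[radius > cutoff].sum() / spec.sum()

def route(gray, threshold=0.15):
    """Toy selector: send high-frequency images to the classical codec path."""
    return "jpeg2000" if high_freq_ratio(gray) > threshold else "learned_codec"

rng = np.random.default_rng(0)
smooth = rng.random((128, 128)).cumsum(0).cumsum(1)   # smooth synthetic image
noisy = rng.random((128, 128))                        # high-entropy synthetic image
print(route(smooth / smooth.max()), route(noisy))
```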