Search Results (122)

Search Parameters:
Keywords = perceptual distortions

15 pages, 792 KiB  
Article
Koffka Ring Perception in Digital Environments with Brightness Modulation
by Mile Matijević, Željko Bosančić and Martina Hajdek
Appl. Sci. 2025, 15(15), 8501; https://doi.org/10.3390/app15158501 - 31 Jul 2025
Viewed by 124
Abstract
Various parameters and observation conditions contribute to the emergence of color. This phenomenon poses a challenge in modern visual communication systems, which are continuously being enhanced through new insights gained from research into specific psychophysical effects. One such effect is the psychophysical phenomenon of simultaneous contrast. Nearly 90 years ago, Kurt Koffka described one of the earliest illusions related to simultaneous contrast. This study examined the perception of gray tone variations in the Koffka ring against different background color combinations (red, blue, green) displayed on a computer screen. The intensity of the effect was measured using the lightness difference ΔL00 across light-, medium-, and dark-gray tones. The results were analyzed using descriptive statistics, and statistically significant differences were determined using the Friedman ANOVA and post hoc Wilcoxon tests. The strongest visual effect was observed for the dark-gray tones of the Koffka ring on blue/green and red/green backgrounds, indicating that perceptual organization and spatial parameters influence the illusion’s magnitude. The findings suggest important implications for digital media design, where understanding these effects can help avoid unintended color tone distortions caused by simultaneous contrast.
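The ΔL00 measure used above is the lightness term of the CIEDE2000 color-difference formula. A minimal sketch of that term for a pair of CIELAB lightness values follows; the function name and its standalone use are illustrative assumptions, not the authors' code:

```python
import math

def delta_L00(L1: float, L2: float, kL: float = 1.0) -> float:
    """Lightness term of the CIEDE2000 colour-difference formula.

    L1, L2 are CIELAB L* values (0-100); kL is the parametric
    lightness weighting factor (1.0 under reference conditions).
    """
    dLp = L2 - L1                      # lightness difference
    Lbar = (L1 + L2) / 2.0             # mean lightness
    # S_L compensates for reduced sensitivity away from mid grey (L* = 50)
    SL = 1.0 + (0.015 * (Lbar - 50.0) ** 2) / math.sqrt(20.0 + (Lbar - 50.0) ** 2)
    return dLp / (kL * SL)

# e.g. a medium-grey ring patch perceived slightly lighter on one surround
print(delta_L00(50.0, 53.2))
```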

21 pages, 2267 KiB  
Article
Dual-Branch Network for Blind Quality Assessment of Stereoscopic Omnidirectional Images: A Spherical and Perceptual Feature Integration Approach
by Zhe Wang, Yi Liu and Yang Song
Electronics 2025, 14(15), 3035; https://doi.org/10.3390/electronics14153035 - 30 Jul 2025
Viewed by 163
Abstract
Stereoscopic omnidirectional images (SOIs) have gained significant attention for their immersive viewing experience by providing binocular depth with panoramic scenes. However, evaluating their visual quality remains challenging due to their unique spherical geometry, binocular disparity, and viewing conditions. To address these challenges, this paper proposes a dual-branch deep learning framework that integrates spherical structural features and perceptual binocular cues to assess the quality of SOIs without reference. Specifically, the global branch leverages spherical convolutions to capture wide-range spatial distortions, while the local branch utilizes a binocular difference module based on the discrete wavelet transform to extract depth-aware perceptual information. A feature complementarity module is introduced to fuse global and local representations for final quality prediction. Experimental evaluations on two public SOIQA datasets—NBU-SOID and SOLID—demonstrate that the proposed method achieves state-of-the-art performance, with PLCC/SROCC values of 0.926/0.918 and 0.918/0.891, respectively. These results validate the effectiveness and robustness of our approach in stereoscopic omnidirectional image quality assessment tasks.
(This article belongs to the Special Issue AI in Signal and Image Processing)
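PLCC and SROCC, the correlation metrics reported above, are the standard way to validate quality predictors against subjective scores. A minimal sketch of how such values are typically computed with SciPy (the arrays below are placeholders, not the paper's data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Placeholder values: predicted quality scores vs. subjective MOS labels
predicted = np.array([0.71, 0.43, 0.88, 0.52, 0.95, 0.30])
mos       = np.array([3.6,  2.4,  4.5,  2.9,  4.8,  1.7])

plcc, _ = pearsonr(predicted, mos)    # linearity of the prediction
srocc, _ = spearmanr(predicted, mos)  # monotonicity (rank order)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```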

25 pages, 2093 KiB  
Article
Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems
by Samuel Yaw Mensah, Tao Zhang, Nahid Al Mahmud and Yanzhang Geng
Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025
Viewed by 798
Abstract
Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems.
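The evaluation metrics named above have common open-source implementations; a rough sketch using the `pesq` and `pystoi` PyPI packages (the package choice and placeholder signals are assumptions, not the authors' pipeline):

```python
import numpy as np
from pesq import pesq     # pip install pesq
from pystoi import stoi   # pip install pystoi

def snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Signal-to-noise ratio of `noisy` relative to the clean reference."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

fs = 16000                              # sample rate in Hz
clean = np.random.randn(fs * 3)         # placeholder 3 s signals
enhanced = clean + 0.05 * np.random.randn(fs * 3)

print("PESQ:", pesq(fs, clean, enhanced, "wb"))         # wideband PESQ
print("STOI:", stoi(clean, enhanced, fs, extended=False))
print("SNR :", snr_db(clean, enhanced), "dB")
```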

35 pages, 8283 KiB  
Article
PIABC: Point Spread Function Interpolative Aberration Correction
by Chanhyeong Cho, Chanyoung Kim and Sanghoon Sull
Sensors 2025, 25(12), 3773; https://doi.org/10.3390/s25123773 - 17 Jun 2025
Viewed by 453
Abstract
Image quality in high-resolution digital single-lens reflex (DSLR) systems is degraded by Complementary Metal-Oxide-Semiconductor (CMOS) sensor noise and optical imperfections. Sensor noise becomes pronounced under high-ISO (International Organization for Standardization) settings, while optical aberrations such as blur and chromatic fringing distort the signal. Optical and sensor-level noise are distinct and hard to separate, but prior studies suggest that improving optical fidelity can suppress or mask sensor noise. Building on this understanding, we introduce a framework that utilizes densely interpolated Point Spread Functions (PSFs) to recover high-fidelity images. The process begins by simulating Gaussian-based PSFs as pixel-wise chromatic and spatial distortions derived from real degraded images. These PSFs are then encoded into a latent space to enhance their features and used to generate refined PSFs via similarity-weighted interpolation at each target position. The interpolated PSFs are applied through Wiener filtering, followed by residual correction, to restore images with improved structural fidelity and perceptual quality. We compare our method—based on pixel-wise physical correction with densely interpolated PSFs in pre-processing—with post-processing networks, including deformable convolutional neural networks (CNNs) that enhance image quality without modeling degradation. Evaluations on DIV2K and RealSR-V3 confirm that our strategy not only enhances structural restoration but also more effectively suppresses sensor-induced artifacts, demonstrating the benefit of explicit physical priors for perceptual fidelity.
(This article belongs to the Special Issue Sensors for Pattern Recognition and Computer Vision)
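Wiener filtering with a known PSF, the restoration step described above, has a standard frequency-domain form. A minimal grayscale sketch under assumed parameters (the Gaussian PSF and constant noise-to-signal ratio are illustrative, not the paper's calibrated values):

```python
import numpy as np

def pad_psf(psf: np.ndarray, shape) -> np.ndarray:
    """Zero-pad the PSF to `shape` and move its centre to index (0, 0)."""
    padded = np.zeros(shape)
    padded[:psf.shape[0], :psf.shape[1]] = psf
    return np.roll(padded, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))

def wiener_deconvolve(image: np.ndarray, psf: np.ndarray, nsr: float = 0.01) -> np.ndarray:
    """Frequency-domain Wiener deconvolution with a constant noise-to-signal ratio."""
    H = np.fft.fft2(pad_psf(psf, image.shape))
    G = np.fft.fft2(image)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)   # Wiener transfer function
    return np.real(np.fft.ifft2(W * G))

# Illustrative isotropic Gaussian PSF (sigma = 1.5 px) on a 15x15 support
y, x = np.mgrid[-7:8, -7:8]
psf = np.exp(-(x**2 + y**2) / (2 * 1.5**2))
psf /= psf.sum()

degraded = np.random.rand(64, 64)             # placeholder degraded image
restored = wiener_deconvolve(degraded, psf)
```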

22 pages, 23449 KiB  
Article
Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network
by Junhui Li and Xingsong Hou
Remote Sens. 2025, 17(12), 2074; https://doi.org/10.3390/rs17122074 - 17 Jun 2025
Viewed by 476
Abstract
Despite the impressive performance of existing image compression algorithms, they struggle to balance perceptual quality and high image fidelity. To address this issue, we propose a novel invertible neural network-based remote sensing image compression (INN-RSIC) method. Our approach captures the compression distortion from an existing image compression algorithm and encodes it as Gaussian-distributed latent variables using an INN, ensuring that the distortion in the decoded image remains independent of the ground truth. By using the inverse mapping of the INN, we input the decoded image with randomly resampled Gaussian variables, generating enhanced images with improved perceptual quality. We incorporate channel expansion, Haar transformation, and invertible blocks into the INN to accurately represent compression distortion. Additionally, a quantization module (QM) is introduced to mitigate format conversion impact, enhancing generalization and perceptual quality. Extensive experiments show that INN-RSIC achieves superior perceptual quality and fidelity compared to existing algorithms. As a lightweight plug-and-play (PnP) method, the proposed INN-based enhancer can be easily integrated into existing high-fidelity compression algorithms, enabling flexible and simultaneous decoding of images with enhanced perceptual quality.
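Invertible blocks of the kind mentioned above are commonly built from coupling layers, which are exactly invertible by construction. A generic additive-coupling sketch in PyTorch — a textbook construction, not the INN-RSIC architecture itself:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Invertible additive coupling: splits channels, perturbs one half."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.net(x1)], dim=1)   # x1 passes through

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.net(y1)], dim=1)   # exact inversion

block = AdditiveCoupling(8)
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-6)
```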

18 pages, 15092 KiB  
Article
Ultra-Low Bitrate Predictive Portrait Video Compression with Diffusion Models
by Xinyi Chen, Weimin Lei, Wei Zhang, Yanwen Wang and Mingxin Liu
Symmetry 2025, 17(6), 913; https://doi.org/10.3390/sym17060913 - 10 Jun 2025
Viewed by 726
Abstract
Deep neural video compression codecs have shown great promise in recent years. However, considerable challenges remain for ultra-low bitrate video coding. Inspired by recent attempts to apply diffusion models to image and video compression, we leverage diffusion models for ultra-low bitrate portrait video compression. In this paper, we propose a predictive portrait video compression method that exploits the temporal prediction capabilities of diffusion models. Specifically, we develop a temporal diffusion predictor based on a conditional latent diffusion model, with the predicted results serving as decoded frames. We symmetrically integrate a temporal diffusion predictor at the encoding and decoding sides. When the perceptual quality of the predicted results at the encoding end falls below a predefined threshold, a new frame sequence is employed for prediction, while the predictor at the decoding side directly generates predicted frames as reconstructions based on the evaluation results. This symmetry ensures that the prediction frames generated at the decoding end are consistent with those at the encoding end. We also design an adaptive coding strategy that incorporates frame quality assessment and adaptive keyframe control. To ensure consistent quality of subsequent predicted frames and achieve high perceptual reconstruction, this strategy dynamically evaluates the visual quality of the predicted results during encoding, retains the predicted frames that meet the quality threshold, and adaptively adjusts the length of the keyframe sequence based on motion complexity. The experimental results demonstrate that, compared with traditional video codecs and other popular methods, the proposed scheme provides superior compression performance at ultra-low bitrates while remaining competitive in visual quality, achieving more than 24% bitrate savings over VVC in terms of perceptual distortion.
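The adaptive coding strategy described above amounts to a quality-gated prediction loop. A schematic sketch of that control flow, where `predict_next`, `perceptual_quality`, and the threshold are hypothetical stand-ins rather than the paper's API:

```python
from typing import Callable, List

def encode_sequence(frames: List, predict_next: Callable, perceptual_quality: Callable,
                    threshold: float = 0.8) -> List[int]:
    """Return indices of frames that must be coded as keyframes.

    Each frame is predicted from the running context; when the predicted
    frame's perceptual quality drops below `threshold`, prediction is
    restarted from a freshly coded keyframe.
    """
    keyframes = [0]                      # first frame is always a keyframe
    context = [frames[0]]
    for i in range(1, len(frames)):
        predicted = predict_next(context)
        if perceptual_quality(predicted, frames[i]) >= threshold:
            context.append(predicted)   # keep the prediction, no new keyframe
        else:
            keyframes.append(i)          # quality gate failed: code a keyframe
            context = [frames[i]]
    return keyframes
```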

21 pages, 10091 KiB  
Article
Scalable Hyperspectral Enhancement via Patch-Wise Sparse Residual Learning: Insights from Super-Resolved EnMAP Data
by Parth Naik, Rupsa Chakraborty, Sam Thiele and Richard Gloaguen
Remote Sens. 2025, 17(11), 1878; https://doi.org/10.3390/rs17111878 - 28 May 2025
Viewed by 727
Abstract
A majority of hyperspectral super-resolution methods aim to enhance the spatial resolution of hyperspectral imaging data (HSI) by integrating high-resolution multispectral imaging data (MSI), leveraging rich spectral information for various geospatial applications. Key challenges include spectral distortions from high-frequency spatial data, high computational complexity, and limited training data, particularly for new-generation sensors with unique noise patterns. In this contribution, we propose a novel parallel patch-wise sparse residual learning (P2SR) algorithm for resolution enhancement based on fusion of HSI and MSI. The proposed method uses multi-decomposition techniques (i.e., independent component analysis, non-negative matrix factorization, and 3D wavelet transforms) to extract spatial and spectral features to form a sparse dictionary. The spectral and spatial characteristics of the scene encoded in the dictionary enable reconstruction through a first-order optimization algorithm to ensure an efficient sparse representation. The final spatially enhanced HSI is reconstructed by combining the learned features from low-resolution HSI and applying an MSI-regulated guided filter to enhance spatial fidelity while minimizing artifacts. P2SR is deployable on a high-performance computing (HPC) system with parallel processing, ensuring scalability and computational efficiency for large HSI datasets. Extensive evaluations on three diverse study sites demonstrate that P2SR consistently outperforms traditional and state-of-the-art (SOA) methods in both quantitative metrics and qualitative spatial assessments. Specifically, P2SR achieved the best average PSNR (25.2100) and SAM (12.4542) scores, indicating superior spatio-spectral reconstruction contributing to sharper spatial features, reduced mixed pixels, and enhanced geological features. P2SR also achieved the best average ERGAS (8.9295) and Q2n (0.5156), which suggests better overall fidelity across all bands and perceptual accuracy with the least spectral distortions. Importantly, we show that P2SR preserves critical spectral signatures, such as Fe²⁺ absorption, and improves the detection of fine-scale environmental and geological structures. P2SR’s ability to maintain spectral fidelity while enhancing spatial detail makes it a powerful tool for high-precision remote sensing applications, including mineral mapping, land-use analysis, and environmental monitoring.
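SAM, one of the quality scores reported above, measures the angle between reference and reconstructed spectra at each pixel. A compact NumPy sketch of the standard definition (array shapes are illustrative):

```python
import numpy as np

def sam_degrees(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Mean Spectral Angle Mapper (degrees) between two (H, W, B) cubes."""
    ref = reference.reshape(-1, reference.shape[-1])
    est = estimate.reshape(-1, estimate.shape[-1])
    dot = np.sum(ref * est, axis=1)
    denom = np.linalg.norm(ref, axis=1) * np.linalg.norm(est, axis=1) + 1e-12
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))   # per-pixel angles
    return float(np.degrees(angles).mean())

hsi_ref = np.random.rand(32, 32, 100)          # placeholder hyperspectral cubes
hsi_est = hsi_ref + 0.01 * np.random.rand(32, 32, 100)
print(sam_degrees(hsi_ref, hsi_est))
```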

38 pages, 8101 KiB  
Article
Multi-Scale Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec: Towards Enhanced Visual Quality and Overall Coding Performance
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2025, 13(11), 1782; https://doi.org/10.3390/math13111782 - 27 May 2025
Viewed by 773
Abstract
This paper presents MS-MTSA, a multi-scale multi-type self-attention network designed to enhance AV1-compressed video through targeted post-filtering. The objective is to address two persistent artifact issues observed in our previous MTSA model: visible seams at patch boundaries and grid-like distortions from upsampling. To this end, MS-MTSA introduces two key architectural enhancements. First, multi-scale block-wise self-attention applies sequential attention over 16 × 16 and 12 × 12 blocks to better capture local context and improve spatial continuity. Second, refined patch-wise self-attention includes a lightweight convolutional refinement layer after upsampling to suppress structured artifacts in flat regions. These targeted modifications significantly improve both perceptual and quantitative quality. The proposed network achieves BD-rate reductions of 12.44% for Y, 21.70% for Cb, and 19.90% for Cr compared to the AV1 anchor. Visual evaluations confirm improved texture fidelity and reduced seam artifacts, demonstrating the effectiveness of combining multi-scale attention and structural refinement for artifact suppression in compressed video.
(This article belongs to the Special Issue Image Processing and Machine Learning with Applications)
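Block-wise self-attention of the kind described partitions a feature map into non-overlapping blocks and attends within each. A generic PyTorch sketch, assuming an illustrative channel count and block size (this is not the MS-MTSA code):

```python
import torch
import torch.nn as nn

def blockwise_attention(x: torch.Tensor, attn: nn.MultiheadAttention, block: int) -> torch.Tensor:
    """Self-attention within non-overlapping block x block windows of (B, C, H, W)."""
    B, C, H, W = x.shape
    assert H % block == 0 and W % block == 0, "pad the feature map to a block multiple"
    # (B, C, H/b, b, W/b, b) -> (B * nblocks, b*b, C): one token sequence per block
    x = x.view(B, C, H // block, block, W // block, block)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, block * block, C)
    out, _ = attn(x, x, x)
    out = out.view(B, H // block, W // block, block, block, C)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
feat = torch.randn(1, 64, 48, 48)                  # 48 is divisible by both 16 and 12
out = blockwise_attention(feat, attn, block=16)    # then e.g. block=12 in sequence
```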

24 pages, 6314 KiB  
Article
CDFAN: Cross-Domain Fusion Attention Network for Pansharpening
by Jinting Ding, Honghui Xu and Shengjun Zhou
Entropy 2025, 27(6), 567; https://doi.org/10.3390/e27060567 - 27 May 2025
Viewed by 484
Abstract
Pansharpening provides a computational solution to the resolution limitations of imaging hardware by enhancing the spatial quality of low-resolution multispectral (LRMS) images using high-resolution panchromatic (PAN) guidance. From an information-theoretic perspective, the task involves maximizing the mutual information between PAN and LRMS inputs while minimizing spectral distortion and redundancy in the fused output. However, traditional spatial-domain methods often fail to preserve high-frequency texture details, leading to entropy degradation in the resulting images. On the other hand, frequency-based approaches struggle to effectively integrate spatial and spectral cues, often neglecting the underlying information content distributions across domains. To address these shortcomings, we introduce a novel architecture, termed the Cross-Domain Fusion Attention Network (CDFAN), specifically designed for the pansharpening task. CDFAN is composed of two core modules: the Multi-Domain Interactive Attention (MDIA) module and the Spatial Multi-Scale Enhancement (SMCE) module. The MDIA module utilizes the discrete wavelet transform (DWT) to decompose the PAN image into frequency sub-bands, which are then employed to construct attention mechanisms across both wavelet and spatial domains. Specifically, wavelet-domain features are used to formulate query vectors, while key features are derived from the spatial domain, allowing attention weights to be computed over multi-domain representations. This design facilitates more effective fusion of spectral and spatial cues, contributing to superior reconstruction of high-resolution multispectral (HRMS) images. Complementing this, the SMCE module integrates multi-scale convolutional pathways to reinforce spatial detail extraction at varying receptive fields. Additionally, an Expert Feature Compensator is introduced to adaptively balance contributions from different scales, thereby optimizing the trade-off between local detail preservation and global contextual understanding. Comprehensive experiments conducted on standard benchmark datasets demonstrate that CDFAN achieves notable improvements over existing state-of-the-art pansharpening methods, delivering enhanced spectral–spatial fidelity and producing images with higher perceptual quality.
(This article belongs to the Section Signal and Data Analysis)
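The wavelet decomposition feeding the attention module can be illustrated with PyWavelets: a single-level 2D DWT splits the PAN image into an approximation and three detail sub-bands. The Haar wavelet choice and the final stacking step here are assumptions for illustration:

```python
import numpy as np
import pywt  # pip install PyWavelets

pan = np.random.rand(256, 256)      # placeholder panchromatic image

# Single-level 2D DWT: cA is the low-frequency approximation;
# (cH, cV, cD) hold horizontal/vertical/diagonal high-frequency detail
cA, (cH, cV, cD) = pywt.dwt2(pan, "haar")
print(cA.shape)                     # (128, 128)

# Detail sub-bands like these would parameterize wavelet-domain queries,
# while keys come from spatial-domain features, per the cross-domain design.
queries = np.stack([cH, cV, cD], axis=0)
```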

20 pages, 5649 KiB  
Article
Edge-Deployed Band-Split Rotary Position Encoding Transformer for Ultra-Low-Signal-to-Noise-Ratio Unmanned Aerial Vehicle Speech Enhancement
by Feifan Liu, Muying Li, Luming Guo, Hao Guo, Jie Cao, Wei Zhao and Jun Wang
Drones 2025, 9(6), 386; https://doi.org/10.3390/drones9060386 - 22 May 2025
Cited by 1 | Viewed by 828
Abstract
Addressing the significant challenge of speech enhancement in ultra-low-Signal-to-Noise-Ratio (SNR) scenarios for Unmanned Aerial Vehicle (UAV) voice communication, particularly under edge deployment constraints, this study proposes the Edge-Deployed Band-Split Rotary Position Encoding Transformer (Edge-BS-RoFormer), a novel, lightweight band-split rotary position encoding transformer. While existing deep learning methods face limitations in dynamic UAV noise suppression under such constraints, including insufficient harmonic modeling and high computational complexity, the proposed Edge-BS-RoFormer distinctively synergizes a band-split strategy for fine-grained spectral processing, a dual-dimension Rotary Position Encoding (RoPE) mechanism for superior joint time–frequency modeling, and FlashAttention to optimize computational efficiency, pivotal for its lightweight nature and robust ultra-low-SNR performance. Experiments on our self-constructed DroneNoise-LibriMix (DN-LM) dataset demonstrate Edge-BS-RoFormer’s superiority. Under a −15 dB SNR, it achieves Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) improvements of 2.2 dB over Deep Complex U-Net (DCUNet), 25.0 dB over the Dual-Path Transformer Network (DPTNet), and 2.3 dB over HTDemucs. Correspondingly, the Perceptual Evaluation of Speech Quality (PESQ) is enhanced by 0.11, 0.18, and 0.15, respectively. Crucially, its efficacy for edge deployment is substantiated by a minimal model storage of 8.534 MB, 11.617 GFLOPs (an 89.6% reduction vs. DCUNet), a runtime memory footprint of under 500 MB, a Real-Time Factor (RTF) of 0.325 (latency: 330.830 ms), and a power consumption of 6.536 W on an NVIDIA Jetson AGX Xavier, fulfilling real-time processing demands. This study delivers a validated lightweight solution, exemplified by its minimal computational overhead and real-time edge inference capability, for effective speech enhancement in complex UAV acoustic scenarios, including dynamic noise conditions. Furthermore, the open-sourced dataset and model contribute to advancing research and establishing standardized evaluation frameworks in this domain.
(This article belongs to the Section Drone Communications)
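SI-SDR, the headline improvement metric above, is scale-invariant because it projects the estimate onto the reference before measuring residual error. A short NumPy sketch of the standard definition:

```python
import numpy as np

def si_sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Scale-Invariant Signal-to-Distortion Ratio in dB (zero-mean signals)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))

clean = np.random.randn(16000)                 # placeholder 1 s signals
enhanced = clean + 0.1 * np.random.randn(16000)
print(si_sdr_db(clean, enhanced))
```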

21 pages, 2555 KiB  
Article
Semantic-Aware Low-Light Image Enhancement by Learning from Multiple Color Spaces
by Bo Jiang, Xuefei Wang, Naidi Yang, Yuhan Liu, Xi Chen and Qiwen Wu
Appl. Sci. 2025, 15(10), 5556; https://doi.org/10.3390/app15105556 - 15 May 2025
Viewed by 621
Abstract
Extreme low-light image enhancement presents persistent challenges due to compounded degradations involving underexposure, sensor noise, and structural detail loss. Traditional low-light image enhancement methods predominantly employ global adjustment strategies that disregard semantic context, often resulting in incomplete detail recovery or color distortion. To address these limitations, we propose a semantic-aware knowledge-guided framework (SKF) that systematically integrates semantic priors for improved illumination recovery. Our framework introduces three key modules: a Semantic Feature Enhancement Module for integrating hierarchical semantic features, a Semantic-Guided Color Histogram Loss to enforce color consistency, and a Semantic-Guided Adversarial Loss to enhance perceptual realism. Furthermore, we improve the semantic-guided color histogram loss by leveraging multi-color space constraints. Inspired by human visual perception mechanisms, our enhanced loss function calculates color discrepancies across three color spaces—RGB, LAB, and LCH—through three components: loss_rgb, loss_lab, and loss_lch. These components collaboratively optimize image contrast and saturation, thereby simultaneously enhancing contrast preservation and chromatic naturalness.
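Comparing color statistics across RGB, LAB, and LCH, as the loss above does, first requires the color-space conversions. A sketch using scikit-image, with LCH derived from LAB; the per-channel histogram L1 distance is a simplified stand-in for the paper's loss, not its actual formulation:

```python
import numpy as np
from skimage import color  # pip install scikit-image

def lab_to_lch(lab: np.ndarray) -> np.ndarray:
    """Convert CIELAB to cylindrical LCH (lightness, chroma, hue in degrees)."""
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    C = np.hypot(a, b)
    H = np.degrees(np.arctan2(b, a)) % 360.0
    return np.stack([L, C, H], axis=-1)

def histogram_l1(x: np.ndarray, y: np.ndarray, bins: int = 64) -> float:
    """Simplified per-channel histogram L1 distance between two images."""
    total = 0.0
    for c in range(x.shape[-1]):
        lo = min(x[..., c].min(), y[..., c].min())
        hi = max(x[..., c].max(), y[..., c].max())
        hx, _ = np.histogram(x[..., c], bins=bins, range=(lo, hi), density=True)
        hy, _ = np.histogram(y[..., c], bins=bins, range=(lo, hi), density=True)
        total += np.abs(hx - hy).sum()
    return total

rgb_pred = np.random.rand(64, 64, 3)    # placeholder enhanced / reference pair
rgb_ref = np.random.rand(64, 64, 3)
lab_pred, lab_ref = color.rgb2lab(rgb_pred), color.rgb2lab(rgb_ref)
loss = (histogram_l1(rgb_pred, rgb_ref)                          # loss_rgb
        + histogram_l1(lab_pred, lab_ref)                        # loss_lab
        + histogram_l1(lab_to_lch(lab_pred), lab_to_lch(lab_ref)))  # loss_lch
```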

30 pages, 1467 KiB  
Article
Rate–Distortion–Perception Trade-Off in Information Theory, Generative Models, and Intelligent Communications
by Xueyan Niu, Bo Bai, Nian Guo, Weixi Zhang and Wei Han
Entropy 2025, 27(4), 373; https://doi.org/10.3390/e27040373 - 31 Mar 2025
Cited by 1 | Viewed by 2039
Abstract
Traditional rate–distortion (RD) theory examines the trade-off between the average length of the compressed representation of a source and the additive distortions of its reconstruction. The rate–distortion–perception (RDP) framework, which integrates the perceptual dimension into the RD paradigm, has garnered significant attention due to recent advancements in machine learning, where perceptual fidelity is assessed by the divergence between input and reconstruction distributions. In communication systems where downstream tasks involve generative modeling, high perceptual fidelity is essential, despite distortion constraints. However, while zero distortion implies perfect realism, the converse is not true, highlighting an imbalance in the significance of distortion and perceptual constraints. This article clarifies that incorporating perceptual constraints does not decrease the necessary rate; instead, under certain conditions, additional rate is required, even with the aid of common and private randomness, which are key elements in generative models. Consequently, we project an increase in expected traffic in intelligent communication networks with the consideration of perceptual quality. Nevertheless, a modest increase in rate can enable generative models to significantly enhance the perceptual quality of reconstructions. By exploring the synergies between generative modeling and communication through the lens of information-theoretic results, this article demonstrates the benefits of intelligent communication systems and advocates for the application of the RDP framework in advancing compression and semantic communication research.
(This article belongs to the Special Issue Semantic Information Theory)
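In the information-theoretic formulation this article builds on (following Blau and Michaeli's rate–distortion–perception function), the trade-off constrains both an additive distortion and a divergence between the source and reconstruction distributions:

```latex
% Rate-distortion-perception function: the minimum rate at which the
% expected distortion stays below D and the divergence between the
% source law and the reconstruction law stays below P.
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D,
\qquad
d\!\left(p_X,\, p_{\hat{X}}\right) \le P .
```

Setting P = ∞ recovers the classical RD function, which is why adding the perceptual constraint can never decrease, and may increase, the required rate — the imbalance the abstract highlights.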

17 pages, 19409 KiB  
Article
Wavelet-Based Topological Loss for Low-Light Image Denoising
by Alexandra Malyugina, Nantheera Anantrasirichai and David Bull
Sensors 2025, 25(7), 2047; https://doi.org/10.3390/s25072047 - 25 Mar 2025
Viewed by 679
Abstract
Despite significant advances in image denoising, most algorithms rely on supervised learning, with their performance largely dependent on the quality and diversity of training data. It is widely assumed that digital image distortions are caused by spatially invariant Additive White Gaussian Noise (AWGN). However, the analysis of real-world data suggests that this assumption is invalid. Therefore, this paper tackles image corruption by real noise, providing a framework to capture and utilise the underlying structural information of an image along with the spatial information conventionally used for deep learning tasks. We propose a novel denoising loss function that incorporates topological invariants and is informed by textural information extracted from the image wavelet domain. The effectiveness of this proposed method was evaluated by training state-of-the-art denoising models on the BVI-Lowlight dataset, which features a wide range of real noise distortions. Adding a topological term to common loss functions leads to a significant increase in the LPIPS (Learned Perceptual Image Patch Similarity) metric, with the improvement reaching up to 25%. The results indicate that the proposed loss function enables neural networks to learn noise characteristics better. We demonstrate that they can consequently extract the topological features of noise-free images, resulting in enhanced contrast and preserved textural information.
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
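LPIPS, the metric in which the reported gains are measured, has a reference implementation on PyPI. A minimal usage sketch (the tensor contents are placeholders, and the package choice is an assumption about tooling, not the authors' evaluation script):

```python
import torch
import lpips  # pip install lpips

# AlexNet-backed LPIPS, as in the metric's reference implementation
loss_fn = lpips.LPIPS(net="alex")

# Inputs are NCHW RGB tensors scaled to [-1, 1]
denoised = torch.rand(1, 3, 256, 256) * 2 - 1
reference = torch.rand(1, 3, 256, 256) * 2 - 1

with torch.no_grad():
    distance = loss_fn(denoised, reference)
print(distance.item())   # lower means perceptually closer
```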

17 pages, 2335 KiB  
Article
Attention-Based Color Difference Perception for Photographic Images
by Hua Qiang, Xuande Zhang and Jinliang Hou
Appl. Sci. 2025, 15(5), 2704; https://doi.org/10.3390/app15052704 - 3 Mar 2025
Viewed by 937
Abstract
Traditional color difference (CD) measurement methods cannot adapt to the large sizes and complex content of photographic images. Existing deep learning-based CD measurement algorithms only focus on local features and cannot accurately simulate the human perception of CD. The objective of this paper is to propose a high-precision image CD measurement model that simulates the perceptual process of the human visual system and to apply it to the CD perception of smartphone photography images. Based on this, a CD measurement network called CD-Attention is proposed, which integrates CNN and Vision Transformer features. First, a CNN and the ViT are used separately to extract local features and global semantic features from the reference image and the distorted image. Secondly, deformable convolution is used for attention guidance, utilizing the global semantic features of the ViT to direct the CNN to focus on salient regions of the image, enhancing the transformation modeling capability of CNN features. Thirdly, through the feature fusion module, the CNN features that have been guided by attention are fused with the global semantic features of the ViT. Finally, a dual-branch network for high-frequency and low-frequency predictions is used for score estimation, and the final score is obtained through a weighted sum. Validated on the large-scale SPCD dataset, the CD-Attention model has achieved state-of-the-art performance, outperforming 30 existing CD measurement methods and demonstrating useful generalization ability. It has been demonstrated that CD-Attention can achieve CD measurement for large-sized and content-complex smartphone photography images. At the same time, the effectiveness of CD-Attention’s feature extraction and attention guidance is verified by ablation experiments.
(This article belongs to the Special Issue Advances in Image Enhancement and Restoration Technology)
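The final scoring stage described above is a weighted sum over two prediction branches. A schematic PyTorch sketch of such a dual-branch head; the layer sizes and the learned softmax weighting are assumptions for illustration, not the CD-Attention implementation:

```python
import torch
import torch.nn as nn

class DualBranchScoreHead(nn.Module):
    """Scores high- and low-frequency features separately, then fuses by weighted sum."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.high = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.low = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.weight = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        scores = torch.cat([self.high(f_high), self.low(f_low)], dim=-1)  # (B, 2)
        w = self.weight(torch.cat([f_high, f_low], dim=-1))               # (B, 2)
        return (w * scores).sum(dim=-1)                                   # weighted sum

head = DualBranchScoreHead()
cd = head(torch.randn(4, 128), torch.randn(4, 128))   # predicted colour differences
```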

38 pages, 13077 KiB  
Article
Accentuation as a Mechanism of Visual Illusions: Insights from Adaptive Resonance Theory (ART)
by Baingio Pinna, Jurģis Šķilters and Daniele Porcheddu
Information 2025, 16(3), 172; https://doi.org/10.3390/info16030172 - 25 Feb 2025
Cited by 1 | Viewed by 1146
Abstract
This study introduces and examines the principle of accentuation as a novel mechanism in perceptual organization, analyzing its effects through the framework of Grossberg’s Adaptive Resonance Theory (ART). We demonstrate that localized accentuators, manifesting as minimal dissimilarities or discontinuities, can significantly modulate global perceptions, inducing illusions of geometric distortion, orientation shifts, and apparent motion. Through a series of phenomenological experiments, we establish that accentuation can supersede classical Gestalt principles, influencing figure-ground segregation, shape perception, and lexical processing. Our findings suggest that accentuation functions as an autonomous organizing principle, leveraging salience-driven attentional capture to generate perceptual effects. We then apply the ART model to elucidate these phenomena, focusing on its core constructs of complementary computing, boundary–surface interactions, and resonant states. Specifically, we show how accentuation-induced asymmetries in boundary signals within the boundary contour system (BCS) can propagate through laminar cortical circuits, biasing figure-ground assignments and shape representations. The interaction between these biased signals and top–down expectations, as modeled by ART’s resonance mechanisms, provides a neurally plausible account for the observed illusions. This integration of accentuation effects with ART offers novel insights into the neural substrates of visual perception and presents a unifying theoretical framework for a diverse array of perceptual phenomena, bridging low-level feature processing with high-level cognitive representations.