Search Results (320)

Search Parameters:
Keywords = perceptual similarity

17 pages, 2230 KiB  
Article
Enhancing Diffusion-Based Music Generation Performance with LoRA
by Seonpyo Kim, Geonhui Kim, Shoki Yagishita, Daewoon Han, Jeonghyeon Im and Yunsick Sung
Appl. Sci. 2025, 15(15), 8646; https://doi.org/10.3390/app15158646 - 5 Aug 2025
Abstract
Recent advancements in generative artificial intelligence have significantly progressed the field of text-to-music generation, enabling users to create music from natural language descriptions. Despite the success of various models, such as MusicLM, MusicGen, and AudioLDM, the current approaches struggle to capture fine-grained genre-specific characteristics, precisely control musical attributes, and handle underrepresented cultural data. This paper introduces a novel, lightweight fine-tuning method for the AudioLDM framework using low-rank adaptation (LoRA). By updating only selected attention and projection layers, the proposed method enables efficient adaptation to musical genres with limited data and computational cost. The proposed method enhances controllability over key musical parameters such as rhythm, emotion, and timbre. At the same time, it maintains the overall quality of music generation. This paper represents the first application of LoRA in AudioLDM, offering a scalable solution for fine-grained, genre-aware music generation and customization. The experimental results demonstrate that the proposed method improves the semantic alignment and statistical similarity compared with the baseline. The contrastive language–audio pretraining score increased by 0.0498, indicating enhanced text-music consistency. The kernel audio distance score decreased by 0.8349, reflecting improved similarity to real music distributions. The mean opinion score ranged from 3.5 to 3.8, confirming the perceptual quality of the generated music. Full article
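The abstract above describes fine-tuning AudioLDM by updating only low-rank adapters on selected layers. The sketch below shows the core LoRA arithmetic on a single projection weight; all dimensions, names, and values are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal numpy sketch of a LoRA update on one attention projection weight.
# Dimensions, rank, and scaling are illustrative assumptions.
d, r, alpha = 64, 4, 8.0
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

# Effective weight during fine-tuning: only A and B receive gradients,
# so the update trains 2 * r * d parameters instead of d * d.
W_eff = W + (alpha / r) * (B @ A)
```

Because B starts at zero, `W_eff` equals the pretrained `W` at the beginning of fine-tuning, so adaptation departs smoothly from the base model's behavior.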

16 pages, 506 KiB  
Article
Exploring the Link Between Sound Quality Perception, Music Perception, Music Engagement, and Quality of Life in Cochlear Implant Recipients
by Ayşenur Karaman Demirel, Ahmet Alperen Akbulut, Ayşe Ayça Çiprut and Nilüfer Bal
Audiol. Res. 2025, 15(4), 94; https://doi.org/10.3390/audiolres15040094 - 2 Aug 2025
Abstract
Background/Objectives: This study investigated the association between cochlear implant (CI) users’ assessed perception of musical sound quality and their subjective music perception and music-related quality of life (QoL). The aim was to provide a comprehensive evaluation by integrating a relatively objective Turkish Multiple Stimulus with Hidden Reference and Anchor (TR-MUSHRA) test and a subjective music questionnaire. Methods: Thirty CI users and thirty normal-hearing (NH) adults were assessed. Perception of sound quality was measured using the TR-MUSHRA test. Subjective assessments were conducted with the Music-Related Quality of Life Questionnaire (MuRQoL). Results: TR-MUSHRA results showed that while NH participants rated all filtered stimuli as perceptually different from the original, CI users provided similar ratings for stimuli with adjacent high-pass filter settings, indicating less differentiation in perceived sound quality. On the MuRQoL, groups differed on the Frequency subscale but not the Importance subscale. Critically, no significant correlation was found between the TR-MUSHRA scores and the MuRQoL subscale scores in either group. Conclusions: The findings demonstrate that TR-MUSHRA is an effective tool for assessing perceived sound quality relatively objectively, but there is no relationship between perceiving sound quality differences and measures of self-reported musical engagement and its importance. Subjective music experience may represent different domains beyond the perception of sound quality. Therefore, successful auditory rehabilitation requires personalized strategies that consider the multifaceted nature of music perception beyond simple perceptual judgments. Full article

28 pages, 3794 KiB  
Article
A Robust System for Super-Resolution Imaging in Remote Sensing via Attention-Based Residual Learning
by Rogelio Reyes-Reyes, Yeredith G. Mora-Martinez, Beatriz P. Garcia-Salgado, Volodymyr Ponomaryov, Jose A. Almaraz-Damian, Clara Cruz-Ramos and Sergiy Sadovnychiy
Mathematics 2025, 13(15), 2400; https://doi.org/10.3390/math13152400 - 25 Jul 2025
Abstract
Deep learning-based super-resolution (SR) frameworks are widely used in remote sensing applications. However, existing SR models still face limitations, particularly in recovering contours, fine features, and textures, as well as in effectively integrating channel information. To address these challenges, this study introduces a novel residual model named OARN (Optimized Attention Residual Network) specifically designed to enhance the visual quality of low-resolution images. The network operates on the Y channel of the YCbCr color space and integrates LKA (Large Kernel Attention) and OCM (Optimized Convolutional Module) blocks. These components can restore large-scale spatial relationships and refine textures and contours, improving feature reconstruction without significantly increasing computational complexity. The performance of OARN was evaluated using satellite images from WorldView-2, GaoFen-2, and Microsoft Virtual Earth. Evaluation was conducted using objective quality metrics, such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Edge Preservation Index (EPI), and Learned Perceptual Image Patch Similarity (LPIPS), demonstrating superior results compared to state-of-the-art methods in both objective measurements and subjective visual perception. Moreover, OARN achieves this performance while maintaining computational efficiency, offering a balanced trade-off between processing time and reconstruction quality. Full article
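PSNR, the first of the objective metrics listed above, has a short closed form. A minimal reference implementation for 8-bit images (not the authors' evaluation code):

```python
import numpy as np

# Peak Signal-to-Noise Ratio for 8-bit images: 10 * log10(peak^2 / MSE).
def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110  # a single pixel off by 10 gray levels
print(round(psnr(a, b), 2))  # 46.19
```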

25 pages, 6911 KiB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) a spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) a dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Experiments on two benchmark datasets (Places2 and CelebA) demonstrate that our framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content. Full article
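The spectral-normalized discriminator mentioned above rescales each weight matrix by its largest singular value, usually estimated by power iteration. A toy numpy sketch of that estimate (illustrative, not the authors' training code):

```python
import numpy as np

# Power-iteration estimate of a matrix's largest singular value; dividing
# the weight by it keeps the discriminator layer roughly 1-Lipschitz.
def spectral_normalize(W: np.ndarray, n_iter: int = 50) -> np.ndarray:
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # dominant singular value
    return W / sigma

W = np.diag([3.0, 1.0])
W_sn = spectral_normalize(W)
print(round(np.linalg.norm(W_sn, 2), 6))  # 1.0
```

In practice the normalization is re-applied every training step with a single cached power-iteration step, rather than 50 iterations from scratch.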

27 pages, 8957 KiB  
Article
DFAN: Single Image Super-Resolution Using Stationary Wavelet-Based Dual Frequency Adaptation Network
by Gyu-Il Kim and Jaesung Lee
Symmetry 2025, 17(8), 1175; https://doi.org/10.3390/sym17081175 - 23 Jul 2025
Abstract
Single image super-resolution is the inverse problem of reconstructing a high-resolution image from its low-resolution counterpart. Although recent Transformer-based architectures leverage global context integration to improve reconstruction quality, they often overlook frequency-specific characteristics, resulting in the loss of high-frequency information. To address this limitation, we propose the Dual Frequency Adaptive Network (DFAN). DFAN first decomposes the input into low- and high-frequency components via Stationary Wavelet Transform. In the low-frequency branch, Swin Transformer layers restore global structures and color consistency. In contrast, the high-frequency branch features a dedicated module that combines Directional Convolution with Residual Dense Blocks, precisely reinforcing edges and textures. A frequency fusion module then adaptively merges these complementary features using depthwise and pointwise convolutions, achieving a balanced reconstruction. During training, we introduce a frequency-aware multi-term loss alongside the standard pixel-wise loss to explicitly encourage high-frequency preservation. Extensive experiments on the Set5, Set14, BSD100, Urban100, and Manga109 benchmarks show that DFAN achieves up to +0.64 dB peak signal-to-noise ratio, +0.01 structural similarity index measure, and −0.01 learned perceptual image patch similarity over the strongest frequency-domain baselines, while also delivering visibly sharper textures and cleaner edges. By unifying spatial and frequency-domain advantages, DFAN effectively mitigates high-frequency degradation and enhances SISR performance. Full article
(This article belongs to the Section Computer)
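The stationary (undecimated) wavelet split that feeds DFAN's two branches can be illustrated with a single Haar analysis level; unlike the decimated transform, both bands keep the input length. A toy 1-D sketch, not the paper's implementation:

```python
import numpy as np

# One stationary Haar analysis level: split a signal into low- and
# high-frequency bands of the same length (no downsampling).
def swt_haar_level(x: np.ndarray):
    shifted = np.roll(x, -1)       # circular boundary handling
    low = (x + shifted) / 2.0      # approximation (low-frequency) band
    high = (x - shifted) / 2.0     # detail (high-frequency) band
    return low, high

x = np.array([1.0, 2.0, 4.0, 8.0])
low, high = swt_haar_level(x)
print(low + high)  # the two bands sum back to the input exactly
```

The redundancy (same length per band) is what lets the two branches be fused elementwise afterwards without upsampling.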

26 pages, 7178 KiB  
Article
Super-Resolution Reconstruction of Formation MicroScanner Images Based on the SRGAN Algorithm
by Changqiang Ma, Xinghua Qi, Liangyu Chen, Yonggui Li, Jianwei Fu and Zejun Liu
Processes 2025, 13(7), 2284; https://doi.org/10.3390/pr13072284 - 17 Jul 2025
Abstract
Formation MicroScanner Image (FMI) technology is a key method for identifying fractured reservoirs and optimizing oil and gas exploration, but its inherently limited resolution severely constrains the fine characterization of geological features. This study innovatively applies a Super-Resolution Generative Adversarial Network (SRGAN) to the super-resolution reconstruction of FMI logging images to address this bottleneck problem. By collecting FMI logging images of glutenite from a well in Xinjiang, a training set containing 24,275 images was constructed, and preprocessing strategies such as grayscale conversion and binarization were employed to optimize input features. Leveraging SRGAN’s generator-discriminator adversarial mechanism and perceptual loss function, high-quality mapping from low-resolution FMI logging images to high-resolution images was achieved. This study yields significant results: in RGB image reconstruction, SRGAN achieved a Peak Signal-to-Noise Ratio (PSNR) of 41.39 dB, surpassing the optimal traditional method (bicubic interpolation) by 61.6%; its Structural Similarity Index (SSIM) reached 0.992, representing a 34.1% improvement; in grayscale image processing, SRGAN effectively eliminated edge blurring, with the PSNR (40.15 dB) and SSIM (0.990) exceeding the suboptimal method (bilinear interpolation) by 36.6% and 9.9%, respectively. These results fully confirm that SRGAN can significantly restore edge contours and structural details in FMI logging images, with performance far exceeding traditional interpolation methods. This study not only systematically verifies, for the first time, SRGAN’s exceptional capability in enhancing FMI resolution, but also provides a high-precision data foundation for reservoir parameter inversion and geological modeling, holding significant application value for advancing the intelligent exploration of complex hydrocarbon reservoirs. Full article

25 pages, 906 KiB  
Article
Query-Efficient Two-Phase Reinforcement Learning Framework for Black-Box Adversarial Attacks
by Zerou Ma and Tao Feng
Symmetry 2025, 17(7), 1093; https://doi.org/10.3390/sym17071093 - 8 Jul 2025
Abstract
Generating adversarial examples under black-box settings poses significant challenges due to the inaccessibility of internal model information. This complexity is further exacerbated when attempting to achieve a balance between the attack success rate and perceptual quality. In this paper, we propose QTRL, a query-efficient two-phase reinforcement learning framework for generating high-quality black-box adversarial examples. Unlike existing approaches that treat adversarial generation as a single-step optimization problem, QTRL introduces a progressive two-phase learning strategy. The initial phase focuses on training the agent to develop effective adversarial strategies, while the second phase refines the perturbations to improve visual quality without sacrificing attack performance. To compensate for the unavailability of gradient information inherent in black-box settings, QTRL designs distinct reward functions for the two phases: the first prioritizes attack success, whereas the second incorporates perceptual similarity metrics to guide refinement. Furthermore, a hard sample mining mechanism is introduced to revisit previously failed attacks, significantly enhancing the robustness and generalization capabilities of the learned policy. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that QTRL achieves attack success rates comparable to those of state-of-the-art methods while substantially reducing query overhead, offering a practical and extensible solution for adversarial research in black-box scenarios. Full article
(This article belongs to the Section Computer)
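The two-phase reward split described above (attack success first, then perceptual refinement) can be caricatured in a few lines. The function name, weighting scheme, and values below are hypothetical illustrations, not taken from the paper:

```python
# Hypothetical sketch of a two-phase reward in the spirit of QTRL:
# phase 1 rewards attack success only; phase 2 also rewards perceptual
# similarity to the clean image. `lam` is an assumed trade-off weight.
def reward(success: bool, perceptual_sim: float, phase: int, lam: float = 0.5) -> float:
    base = 1.0 if success else -1.0
    if phase == 1:
        return base                        # strategy learning: success only
    return base + lam * perceptual_sim     # refinement: success + visual quality

print(reward(True, 0.9, phase=1))            # 1.0
print(round(reward(True, 0.9, phase=2), 2))  # 1.45
```

The point of the split is that the agent never has to trade off success against quality before it has learned to succeed at all.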

14 pages, 1112 KiB  
Article
Individual Noise-Tolerance Profiles and Neural Signal-to-Noise Ratio: Insights into Predicting Speech-in-Noise Performance and Noise-Reduction Outcomes
by Subong Kim, Susan Arzac, Natalie Dokic, Jenn Donnelly, Nicole Genser, Kristen Nortwich and Alexis Rooney
Audiol. Res. 2025, 15(4), 78; https://doi.org/10.3390/audiolres15040078 - 2 Jul 2025
Abstract
Background/Objectives: Individuals with similar hearing sensitivity exhibit varying levels of tolerance to background noise, a trait tied to unique individual characteristics that affect their responsiveness to noise reduction (NR) processing in hearing aids. The present study aimed to capture such individual characteristics by employing electrophysiological measures and subjective noise-tolerance profiles, and both were analyzed in relation to speech-in-noise performance and NR outcomes. Methods: From a sample of 42 participants with normal hearing, the neural signal-to-noise ratio (SNR)—a cortical index comparing the amplitude ratio between auditory evoked responses to target speech onset versus noise onset—was calculated, and individual noise-tolerance profiles were also derived using k-means cluster analysis to classify participants into distinct subgroups. Results: The neural SNR showed significant correlations with speech-in-noise performance and NR outcomes with varying strength. In contrast, noise-tolerance subgroups did not show meaningful group-level differences in either speech-in-noise or NR outcomes. The neural SNR and noise-tolerance profiles were found to be statistically independent. Conclusions: While the neural SNR reliably predicted perceptual performance in background noise and NR outcomes, our noise-tolerance profiles lacked sufficient sensitivity. Still, subjective ratings of individual noise tolerance are clinically accessible, and thus, integrating both physiological and subjective measures in the same cohort is a valuable strategy. Full article
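The "neural SNR" above is an amplitude ratio between two evoked responses. One plausible way to express such a ratio on a decibel scale is sketched below; the abstract does not specify the units, and the amplitudes are illustrative values, not study data:

```python
import math

# Amplitude ratio of evoked responses (speech onset vs. noise onset),
# expressed in dB. Units and values are assumptions for illustration.
def neural_snr_db(speech_amp: float, noise_amp: float) -> float:
    return 20.0 * math.log10(speech_amp / noise_amp)

print(round(neural_snr_db(4.0, 2.0), 2))  # a 2:1 amplitude ratio gives 6.02
```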

16 pages, 2376 KiB  
Article
Nested U-Net-Based GAN Model for Super-Resolution of Stained Light Microscopy Images
by Seong-Hyeon Kang and Ji-Youn Kim
Photonics 2025, 12(7), 665; https://doi.org/10.3390/photonics12070665 - 1 Jul 2025
Abstract
The purpose of this study was to propose a deep learning-based model for the super-resolution reconstruction of stained light microscopy images. To achieve this, perceptual loss was applied to the generator to reflect multichannel signal intensity, distribution, and structural similarity. A nested U-Net architecture was employed to address the representational limitations of the conventional U-Net. For quantitative evaluation, the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and correlation coefficient (CC) were calculated. In addition, intensity profile analysis was performed to assess the model’s ability to restore the boundary signals more precisely. The experimental results demonstrated that the proposed model outperformed both single U-Net and U-Net-based generative adversarial network (GAN) models in signal and structural restoration. Consequently, the PSNR, SSIM, and CC values demonstrated relative improvements of approximately 1.017, 1.023, and 1.010 times, respectively, compared to the input images. In particular, the intensity profile analysis confirmed the effectiveness of the nested U-Net-based generator in restoring cellular boundaries and structures in the stained microscopy images. In conclusion, the proposed model effectively enhanced the resolution of stained light microscopy images acquired in a multichannel format. Full article
(This article belongs to the Special Issue Recent Advances in Biomedical Optics and Biophotonics)

30 pages, 30354 KiB  
Article
Typological Transcoding Through LoRA and Diffusion Models: A Methodological Framework for Stylistic Emulation of Eclectic Facades in Krakow
by Zequn Chen, Nan Zhang, Chaoran Xu, Zhiyu Xu, Songjiang Han and Lishan Jiang
Buildings 2025, 15(13), 2292; https://doi.org/10.3390/buildings15132292 - 29 Jun 2025
Abstract
The stylistic emulation of historical building facades presents significant challenges for artificial intelligence (AI), particularly for complex and data-scarce styles like Krakow’s Eclecticism. This study aims to develop a methodological framework for a “typological transcoding” of style that moves beyond mere visual mimicry, which is crucial for heritage preservation and urban renewal. The proposed methodology integrates architectural typology with Low-Rank Adaptation (LoRA) for fine-tuning a Stable Diffusion (SD) model. This process involves a typology-guided preparation of a curated dataset (150 images) and precise control of training parameters. The resulting typologically guided LoRA-tuned model demonstrates significant performance improvements over baseline models. Quantitative analysis shows a 24.6% improvement in Fréchet Inception Distance (FID) and a 7.0% improvement in Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, qualitative evaluations by 68 experts confirm superior realism and stylistic accuracy. The findings indicate that this synergy enables data-efficient, typology-grounded stylistic emulation, highlighting AI’s potential as a creative partner for nuanced reinterpretation. However, achieving deeper semantic understanding and robust 3D inference remains an ongoing challenge. Full article

12 pages, 2513 KiB  
Article
Optoelectronic Memristor Based on ZnO/Cu2O for Artificial Synapses and Visual System
by Chen Meng, Hongxin Liu, Tong Li, Jin Luo and Sijie Zhang
Electronics 2025, 14(12), 2490; https://doi.org/10.3390/electronics14122490 - 19 Jun 2025
Abstract
The development of artificial intelligence has resulted in significant challenges to conventional von Neumann architectures, including the separation of storage and computation, and power consumption bottlenecks. The new generation of brain-like devices is accelerating its evolution in the direction of high-density integration and integrated sensing, storage, and computing. The structural and information transmission similarity between memristors and biological synapses signifies their unique potential in sensing and memory. Therefore, memristors have become potential candidates for neural devices. In this paper, we have designed an optoelectronic memristor based on a ZnO/Cu2O structure to achieve synaptic behavior through the modulation of electrical signals, demonstrating the recognition of a dataset by a neural network. Furthermore, the optical synaptic functions, such as short-term/long-term potentiation and learn-forget-relearn behavior, and advanced synaptic behavior of optoelectronic modulation, are successfully simulated. The mechanism of light-induced conductance enhancement is explained by the barrier change at the interface. This work explores a new pathway for constructing next-generation optoelectronic synaptic devices, which lays the foundation for future brain-like visual chips and intelligent perceptual devices. Full article

26 pages, 4992 KiB  
Article
NDVI and Beyond: Vegetation Indices as Features for Crop Recognition and Segmentation in Hyperspectral Data
by Andreea Nițu, Corneliu Florea, Mihai Ivanovici and Andrei Racoviteanu
Sensors 2025, 25(12), 3817; https://doi.org/10.3390/s25123817 - 18 Jun 2025
Abstract
Vegetation indices have long been central to vegetation monitoring through remote sensing. The most popular one is the Normalized Difference Vegetation Index (NDVI), yet many vegetation indices (VIs) exist. In this paper, we investigate their distinctiveness and discriminative power in the context of applications for agriculture based on hyperspectral data. More precisely, this paper merges two complementary perspectives: an unsupervised analysis with PRISMA satellite imagery to explore whether these indices are truly distinct in practice and a supervised classification over UAV hyperspectral data. We assess their discriminative power, statistical correlations, and perceptual similarities. Our findings suggest that while many VIs have a certain correlation with the NDVI, meaningful differences emerge depending on landscape and application context, thus supporting their effectiveness as discriminative features usable in remote crop segmentation and recognition applications. Full article
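The NDVI that the study uses as its reference index is a simple normalized difference of near-infrared and red reflectance. A minimal sketch with illustrative reflectance values:

```python
import numpy as np

# NDVI = (NIR - Red) / (NIR + Red); values near 1 indicate dense
# vegetation, near 0 bare soil or sparse cover. Inputs are illustrative.
def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    return (nir - red) / (nir + red + 1e-12)  # epsilon avoids division by zero

nir = np.array([0.60, 0.50])   # dense vegetation, sparse vegetation
red = np.array([0.10, 0.40])
print(np.round(ndvi(nir, red), 2))  # [0.71 0.11]
```

Most of the other VIs examined in the paper are likewise band-ratio formulas, which is why their mutual correlation with the NDVI is a natural question.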

35 pages, 8283 KiB  
Article
PIABC: Point Spread Function Interpolative Aberration Correction
by Chanhyeong Cho, Chanyoung Kim and Sanghoon Sull
Sensors 2025, 25(12), 3773; https://doi.org/10.3390/s25123773 - 17 Jun 2025
Abstract
Image quality in high-resolution digital single-lens reflex (DSLR) systems is degraded by Complementary Metal-Oxide-Semiconductor (CMOS) sensor noise and optical imperfections. Sensor noise becomes pronounced under high-ISO (International Organization for Standardization) settings, while optical aberrations such as blur and chromatic fringing distort the signal. Optical and sensor-level noise are distinct and hard to separate, but prior studies suggest that improving optical fidelity can suppress or mask sensor noise. Building on this understanding, we introduce a framework that utilizes densely interpolated Point Spread Functions (PSFs) to recover high-fidelity images. The process begins by simulating Gaussian-based PSFs as pixel-wise chromatic and spatial distortions derived from real degraded images. These PSFs are then encoded into a latent space to enhance their features and used to generate refined PSFs via similarity-weighted interpolation at each target position. The interpolated PSFs are applied through Wiener filtering, followed by residual correction, to restore images with improved structural fidelity and perceptual quality. We compare our method—based on pixel-wise, physical correction, and densely interpolated PSF at pre-processing—with post-processing networks, including deformable convolutional neural networks (CNNs) that enhance image quality without modeling degradation. Evaluations on DIV2K and RealSR-V3 confirm that our strategy not only enhances structural restoration but also more effectively suppresses sensor-induced artifacts, demonstrating the benefit of explicit physical priors for perceptual fidelity. Full article
(This article belongs to the Special Issue Sensors for Pattern Recognition and Computer Vision)
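The Wiener-filtering step named above deconvolves the image with a known PSF in the frequency domain. A 1-D toy sketch with an assumed constant noise-to-signal ratio; the real pipeline applies interpolated PSFs per pixel in 2-D:

```python
import numpy as np

# Frequency-domain Wiener deconvolution: G = conj(H) / (|H|^2 + nsr),
# where H is the PSF's transfer function and `nsr` an assumed
# noise-to-signal ratio.
def wiener_deconv(blurred: np.ndarray, psf: np.ndarray, nsr: float = 1e-3) -> np.ndarray:
    H = np.fft.fft(psf, n=len(blurred))
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft(np.fft.fft(blurred) * G))

scene = np.zeros(32)
scene[8] = 1.0                            # point source at index 8
psf = np.array([0.25, 0.5, 0.25])         # simple symmetric blur kernel
blurred = np.real(np.fft.ifft(np.fft.fft(scene) * np.fft.fft(psf, 32)))
restored = wiener_deconv(blurred, psf)
print(int(np.argmax(restored)))  # 8: the point source is re-localized
```

The `nsr` term regularizes frequencies where the PSF response is near zero, which is what keeps the inverse filter from amplifying noise.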

19 pages, 23096 KiB  
Article
GAN-Based Super-Resolution in Linear R-SAM Imaging for Enhanced Non-Destructive Semiconductor Measurement
by Thi Thu Ha Vu, Tan Hung Vo, Trong Nhan Nguyen, Jaeyeop Choi, Le Hai Tran, Vu Hoang Minh Doan, Van Bang Nguyen, Wonjo Lee, Sudip Mondal and Junghwan Oh
Appl. Sci. 2025, 15(12), 6780; https://doi.org/10.3390/app15126780 - 17 Jun 2025
Abstract
The precise identification and non-destructive measurement of structural features and defects in semiconductor wafers are essential for ensuring process integrity and sustaining high yield in advanced manufacturing environments. Unlike conventional measurement techniques, scanning acoustic microscopy (SAM) is an advanced method that provides detailed visualizations of both surface and internal wafer structures. However, in practical industrial applications, the scanning time and image quality of SAM significantly impact its overall performance and utility. Prolonged scanning durations can lead to production bottlenecks, while suboptimal image quality can compromise the accuracy of defect detection. To address these challenges, this study proposes LinearTGAN, an improved generative adversarial network (GAN)-based model specifically designed to improve the resolution of linear acoustic wafer images acquired by the breakthrough rotary scanning acoustic microscopy (R-SAM) system. Empirical evaluations demonstrate that the proposed model significantly outperforms conventional GAN-based approaches, achieving a Peak Signal-to-Noise Ratio (PSNR) of 29.479 dB, a Structural Similarity Index Measure (SSIM) of 0.874, a Learned Perceptual Image Patch Similarity (LPIPS) of 0.095, and a Fréchet Inception Distance (FID) of 0.445. To assess the measurement aspect of LinearTGAN, a lightweight defect segmentation module was integrated and tested on annotated wafer datasets. The super-resolved images produced by LinearTGAN significantly enhanced segmentation accuracy and improved the sensitivity of microcrack detection. Furthermore, the deployment of LinearTGAN within the R-SAM system yielded a 92% improvement in scanning performance for 12-inch wafers while simultaneously enhancing image fidelity. The integration of super-resolution techniques into R-SAM significantly advances the precision, robustness, and efficiency of non-destructive measurements, highlighting their potential to have a transformative impact in semiconductor metrology and quality assurance. Full article

23 pages, 3946 KiB  
Article
The Impact of Color Blindness on Player Engagement and Emotional Experiences: A Multimodal Study in a Game-Based Environment
by Merve Tillem and Ahmet Gün
Multimodal Technol. Interact. 2025, 9(6), 62; https://doi.org/10.3390/mti9060062 - 13 Jun 2025
Abstract
Color blindness can create challenges in recognizing visual cues, potentially affecting players’ performance, emotional involvement, and overall gaming experience. This study examines the impact of color blindness on player engagement and emotional experiences in digital games. The research aims to analyze how color-blind individuals engage with and emotionally respond to games, offering insights into more inclusive and accessible game design. An experiment-based study was conducted using a between-group design with a total of 13 participants, including 5 color-blind and 8 non-color-blind participants (aged 18–30). The sample was carefully selected to ensure participants had similar levels of digital gaming experience and familiarity with digital games, reducing potential biases related to skill or prior exposure. A custom-designed game, “Color Quest,” was developed to assess engagement and emotional responses. Emotional responses were measured through Emotion AI analysis, video recordings, and self-reported feedback forms. Participants were also asked to rate their engagement and emotional experience on a 1 to 5 scale, with additional qualitative feedback collected for deeper insights. The results indicate that color-blind players generally reported lower engagement levels compared to non-color-blind players. Although quantitative data did not reveal a direct correlation between color blindness and visual experience, self-reported feedback suggests that color-related design choices negatively impact emotional involvement and player immersion. Furthermore, in the survey responses from participants, color-blind individuals rated their experiences lower compared to individuals with normal vision. Participants emphasized that certain visual elements created difficulties in gameplay, and alternative sensory cues, such as audio feedback, helped mitigate these challenges. This study presents an experimental evaluation of color blindness in gaming, emphasizing how sensory adaptation strategies can support player engagement and emotional experience. This study contributes to game accessibility research by highlighting the importance of perceptual diversity and inclusive sensory design in enhancing player engagement for color-blind individuals. Full article
