Search Results (79)

Search Parameters:
Keywords = Fréchet inception distance (FID)

21 pages, 1622 KiB  
Article
Enhancing Wearable Fall Detection System via Synthetic Data
by Minakshi Debnath, Sana Alamgeer, Md Shahriar Kabir and Anne H. Ngu
Sensors 2025, 25(15), 4639; https://doi.org/10.3390/s25154639 - 26 Jul 2025
Viewed by 377
Abstract
Deep learning models rely heavily on extensive training data, but obtaining sufficient real-world data remains a major challenge in clinical fields. To address this, we explore methods for generating realistic synthetic multivariate fall data to supplement limited real-world samples collected from three fall-related datasets: SmartFallMM, UniMib, and K-Fall. We apply three conventional time-series augmentation techniques, a Diffusion-based generative AI method, and a novel approach that extracts fall segments from public video footage of older adults. A key innovation of our work is the exploration of two distinct approaches: video-based pose estimation to extract fall segments from public footage, and Diffusion models to generate synthetic fall signals. Both methods independently enable the creation of highly realistic and diverse synthetic data tailored to specific sensor placements. To our knowledge, these approaches and especially their application in fall detection represent rarely explored directions in this research area. To assess the quality of the synthetic data, we use quantitative metrics, including the Fréchet Inception Distance (FID), Discriminative Score, Predictive Score, Jensen–Shannon Divergence (JSD), and Kolmogorov–Smirnov (KS) test, and visually inspect temporal patterns for structural realism. We observe that Diffusion-based synthesis produces the most realistic and distributionally aligned fall data. To further evaluate the impact of synthetic data, we train a long short-term memory (LSTM) model offline and test it in real time using the SmartFall App. Incorporating Diffusion-based synthetic data improves the offline F1-score by 7–10% and boosts real-time fall detection performance by 24%, confirming its value in enhancing model robustness and applicability in real-world settings. Full article
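Every result on this page reports a Fréchet Inception Distance, so a brief reference may help. FID fits a Gaussian to the feature embeddings of real and of generated samples (classically Inception-v3 pool features for images; time-series work such as the paper above substitutes a domain-appropriate encoder) and measures the Fréchet distance between the two Gaussians. A minimal sketch, assuming the embeddings are already extracted:

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 * sqrtm(C_r @ C_f))

    Both inputs are (n_samples, feature_dim) arrays of embeddings from a
    fixed feature extractor (Inception-v3 pool features in the classical
    image setting).
    """
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```

Lower is better; a score of 0 means the two feature distributions share the same mean and covariance.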

24 pages, 7849 KiB  
Article
Face Desensitization for Autonomous Driving Based on Identity De-Identification of Generative Adversarial Networks
by Haojie Ji, Liangliang Tian, Jingyan Wang, Yuchi Yao and Jiangyue Wang
Electronics 2025, 14(14), 2843; https://doi.org/10.3390/electronics14142843 - 15 Jul 2025
Viewed by 278
Abstract
Automotive intelligent agents increasingly collect facial data for applications such as driver behavior monitoring and identity verification. This large-scale collection of facial data poses serious risks of sensitive information leakage in autonomous driving. Facial information is explicitly required to be anonymized, but most desensitized facial data has poor usability, which greatly limits its application in autonomous driving. This paper proposes an anonymization method for automotive sensitive information that generates high-quality facial images while preserving data usability under privacy protection. After comparing K-Same methods and Generative Adversarial Networks (GANs), this paper introduces a hierarchical self-attention mechanism into StyleGAN3 to enhance the feature perception of face images. Synchronous regularization of the sample data is applied to optimize the loss function of the StyleGAN3 discriminator, improving the convergence stability of the model. The experimental results demonstrate that the proposed facial desensitization model reduces the Fréchet inception distance (FID) and structural similarity index measure (SSIM) by 95.8% and 24.3%, respectively. The image quality and privacy desensitization of the facial data generated by the StyleGAN3 model are fully verified in this work. This research provides an efficient and robust facial privacy protection solution for autonomous driving, helping to strengthen the security of automotive data. Full article
(This article belongs to the Special Issue Development and Advances in Autonomous Driving Technology)

23 pages, 3645 KiB  
Article
Color-Guided Mixture-of-Experts Conditional GAN for Realistic Biomedical Image Synthesis in Data-Scarce Diagnostics
by Patrycja Kwiek, Filip Ciepiela and Małgorzata Jakubowska
Electronics 2025, 14(14), 2773; https://doi.org/10.3390/electronics14142773 - 10 Jul 2025
Viewed by 270
Abstract
Background: Limited availability of high-quality labeled biomedical image datasets presents a significant challenge for training deep learning models in medical diagnostics. This study proposes a novel image generation framework combining conditional generative adversarial networks (cGANs) with a Mixture-of-Experts (MoE) architecture and color histogram-aware loss functions to enhance synthetic blood cell image quality. Methods: RGB microscopic images from the BloodMNIST dataset (eight blood cell types, resolution 3 × 128 × 128) underwent preprocessing with k-means clustering to extract the dominant colors and UMAP for visualizing class similarity. Spearman correlation-based distance matrices were used to evaluate the discriminative power of each RGB channel. A MoE–cGAN architecture was developed with residual blocks and LeakyReLU activations. Expert generators were conditioned on cell type, and the generator’s loss was augmented with a Wasserstein distance-based term comparing red and green channel histograms, which were found most relevant for class separation. Results: The red and green channels contributed most to class discrimination; the blue channel had minimal impact. The proposed model achieved 0.97 classification accuracy on generated images (ResNet50), with 0.96 precision, 0.97 recall, and a 0.96 F1-score. The best Fréchet Inception Distance (FID) was 52.1. Misclassifications occurred mainly among visually similar cell types. Conclusions: Integrating histogram alignment into the MoE–cGAN training significantly improves the realism and class-specific variability of synthetic images, supporting robust model development under data scarcity in hematological imaging. Full article
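The abstract does not spell out the histogram-alignment term, so the following is only an illustrative sketch of one common construction: for one-dimensional samples of equal size, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values, which is differentiable and can be added directly to a generator loss. The function names and the equal-batch-size assumption are mine, not the paper's:

```python
import torch

def channel_wasserstein(real: torch.Tensor, fake: torch.Tensor, channel: int) -> torch.Tensor:
    """Empirical 1-D Wasserstein-1 distance between the pixel-intensity
    distributions of one color channel (illustrative, not the paper's code).

    real, fake: image batches shaped (N, 3, H, W) with matching sizes.
    """
    r = real[:, channel].flatten()
    f = fake[:, channel].flatten()
    # For equal-size 1-D samples, W1 = mean |sorted(r) - sorted(f)|.
    return (torch.sort(r).values - torch.sort(f).values).abs().mean()

def histogram_term(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # Red (0) and green (1) channels were found most discriminative.
    return channel_wasserstein(real, fake, 0) + channel_wasserstein(real, fake, 1)
```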

30 pages, 30354 KiB  
Article
Typological Transcoding Through LoRA and Diffusion Models: A Methodological Framework for Stylistic Emulation of Eclectic Facades in Krakow
by Zequn Chen, Nan Zhang, Chaoran Xu, Zhiyu Xu, Songjiang Han and Lishan Jiang
Buildings 2025, 15(13), 2292; https://doi.org/10.3390/buildings15132292 - 29 Jun 2025
Viewed by 408
Abstract
The stylistic emulation of historical building facades presents significant challenges for artificial intelligence (AI), particularly for complex and data-scarce styles like Krakow’s Eclecticism. This study aims to develop a methodological framework for a “typological transcoding” of style that moves beyond mere visual mimicry, which is crucial for heritage preservation and urban renewal. The proposed methodology integrates architectural typology with Low-Rank Adaptation (LoRA) for fine-tuning a Stable Diffusion (SD) model. This process involves a typology-guided preparation of a curated dataset (150 images) and precise control of training parameters. The resulting typologically guided LoRA-tuned model demonstrates significant performance improvements over baseline models. Quantitative analysis shows a 24.6% improvement in Fréchet Inception Distance (FID) and a 7.0% improvement in Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, qualitative evaluations by 68 experts confirm superior realism and stylistic accuracy. The findings indicate that this synergy enables data-efficient, typology-grounded stylistic emulation, highlighting AI’s potential as a creative partner for nuanced reinterpretation. However, achieving deeper semantic understanding and robust 3D inference remains an ongoing challenge. Full article
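LoRA, used here and in the senior-housing facade paper below, freezes the pretrained weight matrix W and learns a low-rank update dW = (alpha/r) * B @ A, so only the small factors A and B are trained. A minimal sketch of the idea for a single linear layer (illustrative, not the authors' fine-tuning code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # dW starts at 0
        self.scale = alpha / rank            # so dW = (alpha/r) * B @ A

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because only A and B are updated, a style adapter can be trained from a small curated dataset (150 images here) without touching the base Stable Diffusion weights.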

14 pages, 1438 KiB  
Article
CDBA-GAN: A Conditional Dual-Branch Attention Generative Adversarial Network for Robust Sonar Image Generation
by Wanzeng Kong, Han Yang, Mingyang Jia and Zhe Chen
Appl. Sci. 2025, 15(13), 7212; https://doi.org/10.3390/app15137212 - 26 Jun 2025
Viewed by 315
Abstract
The acquisition of real-world sonar data necessitates substantial investments of manpower, material resources, and financial capital, rendering it challenging to obtain sufficient authentic samples for sonar-related research tasks. Consequently, sonar image simulation technology has become increasingly vital in the field of sonar data analysis. Traditional sonar simulation methods predominantly focus on low-level physical modeling, which often suffers from limited image controllability and diminished fidelity in multi-category and multi-background scenarios. To address these limitations, this paper proposes a Conditional Dual-Branch Attention Generative Adversarial Network (CDBA-GAN). The framework comprises three key innovations: The conditional information fusion module, dual-branch attention feature fusion mechanism, and cross-layer feature reuse. By integrating encoded conditional information with the original input data of the generative adversarial network, the fusion module enables precise control over the generation of sonar images under specific conditions. A hierarchical attention mechanism is implemented, sequentially performing channel-level and pixel-level attention operations. This establishes distinct weight matrices at both granularities, thereby enhancing the correlation between corresponding elements. The dual-branch attention features are fused via a skip-connection architecture, facilitating efficient feature reuse across network layers. The experimental results demonstrate that the proposed CDBA-GAN generates condition-specific sonar images with a significantly lower Fréchet inception distance (FID) compared to existing methods. Notably, the framework exhibits robust imaging performance under noisy interference and outperforms state-of-the-art models (e.g., DCGAN, WGAN, SAGAN) in fidelity across four categorical conditions, as quantified by FID metrics. Full article
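The abstract describes the hierarchical attention only at a high level, so the sketch below shows one common way to realize sequential channel-level then pixel-level reweighting with a skip connection; the layer shapes and reduction factor are assumptions, not the CDBA-GAN architecture:

```python
import torch
import torch.nn as nn

class ChannelThenPixelAttention(nn.Module):
    """Sequential channel- and pixel-level attention (illustrative sketch)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(       # one weight per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.pixel_gate = nn.Sequential(         # one weight per pixel
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gated = x * self.channel_gate(x)         # channel-level attention
        out = gated * self.pixel_gate(gated)     # then pixel-level attention
        return out + x                           # skip connection reuses features
```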

22 pages, 4478 KiB  
Article
Welding Image Data Augmentation Method Based on LRGAN Model
by Ying Wang, Zhe Dai, Qiang Zhang and Zihao Han
Appl. Sci. 2025, 15(12), 6923; https://doi.org/10.3390/app15126923 - 19 Jun 2025
Viewed by 373
Abstract
This study focuses on the data bottleneck issue in the training of deep learning models during the intelligent welding control process and proposes an improved model called LRGAN (loss reconstruction generative adversarial networks). First, a five-layer spectral normalization neural network was designed as the discriminator of the model. By incorporating the least squares loss function, the gradients of the model parameters were constrained within a reasonable range, which not only accelerated the convergence process but also effectively limited drastic changes in model parameters, alleviating the vanishing gradient problem. Next, a nine-layer residual structure was introduced in the generator to optimize the training of deep networks, preventing the mode collapse issue caused by the increase in the number of layers. The final experimental results show that the proposed LRGAN model outperforms other generative models in terms of evaluation metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and Fréchet inception distance (FID). It provides an effective solution to the small sample problem in the intelligent welding control process. Full article
(This article belongs to the Section Robotics and Automation)
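Two of LRGAN's ingredients, spectral normalization on the discriminator and a least-squares adversarial loss, are standard components and can be sketched briefly. The channel widths below are placeholders rather than the paper's configuration:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch: int, out_ch: int) -> nn.Module:
    # Spectral normalization bounds each layer's Lipschitz constant,
    # which keeps discriminator gradients in a reasonable range.
    return spectral_norm(nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1))

discriminator = nn.Sequential(                  # five spectral-normalized layers
    sn_conv(3, 64), nn.LeakyReLU(0.2),
    sn_conv(64, 128), nn.LeakyReLU(0.2),
    sn_conv(128, 256), nn.LeakyReLU(0.2),
    sn_conv(256, 512), nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(512, 1, 4)),
)

def lsgan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Least-squares GAN discriminator loss: real targets 1, fake targets 0.
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())
```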

19 pages, 23096 KiB  
Article
GAN-Based Super-Resolution in Linear R-SAM Imaging for Enhanced Non-Destructive Semiconductor Measurement
by Thi Thu Ha Vu, Tan Hung Vo, Trong Nhan Nguyen, Jaeyeop Choi, Le Hai Tran, Vu Hoang Minh Doan, Van Bang Nguyen, Wonjo Lee, Sudip Mondal and Junghwan Oh
Appl. Sci. 2025, 15(12), 6780; https://doi.org/10.3390/app15126780 - 17 Jun 2025
Viewed by 515
Abstract
The precise identification and non-destructive measurement of structural features and defects in semiconductor wafers are essential for ensuring process integrity and sustaining high yield in advanced manufacturing environments. Unlike conventional measurement techniques, scanning acoustic microscopy (SAM) is an advanced method that provides detailed visualizations of both surface and internal wafer structures. However, in practical industrial applications, the scanning time and image quality of SAM significantly impact its overall performance and utility. Prolonged scanning durations can lead to production bottlenecks, while suboptimal image quality can compromise the accuracy of defect detection. To address these challenges, this study proposes LinearTGAN, an improved generative adversarial network (GAN)-based model specifically designed to improve the resolution of linear acoustic wafer images acquired by the breakthrough rotary scanning acoustic microscopy (R-SAM) system. Empirical evaluations demonstrate that the proposed model significantly outperforms conventional GAN-based approaches, achieving a Peak Signal-to-Noise Ratio (PSNR) of 29.479 dB, a Structural Similarity Index Measure (SSIM) of 0.874, a Learned Perceptual Image Patch Similarity (LPIPS) of 0.095, and a Fréchet Inception Distance (FID) of 0.445. To assess the measurement aspect of LinearTGAN, a lightweight defect segmentation module was integrated and tested on annotated wafer datasets. The super-resolved images produced by LinearTGAN significantly enhanced segmentation accuracy and improved the sensitivity of microcrack detection. Furthermore, the deployment of LinearTGAN within the R-SAM system yielded a 92% improvement in scanning performance for 12-inch wafers while simultaneously enhancing image fidelity. The integration of super-resolution techniques into R-SAM significantly advances the precision, robustness, and efficiency of non-destructive measurements, highlighting their potential to have a transformative impact in semiconductor metrology and quality assurance. Full article
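Of the four metrics quoted above, PSNR is the simplest to restate: 10 * log10(MAX^2 / MSE), in decibels, where MAX is the maximum possible pixel value. A minimal version for reference:

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-shape images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```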

27 pages, 9000 KiB  
Article
AI-Driven Biophilic Façade Design for Senior Multi-Family Housing Using LoRA and Stable Diffusion
by Ji-Yeon Kim and Sung-Jun Park
Buildings 2025, 15(9), 1546; https://doi.org/10.3390/buildings15091546 - 3 May 2025
Cited by 2 | Viewed by 936
Abstract
South Korea is rapidly transitioning into an aging society, resulting in a growing demand for senior multi-family housing. Nevertheless, current façade designs remain limited in diversity and fail to adequately address the visual needs and preferences of the elderly population. This study presents a biophilic façade design approach for senior housing, utilizing Stable Diffusion (SD) fine-tuned with low-rank adaptation (LoRA) to support the implementation of differentiated biophilic design (BD) strategies. Prompts were derived from an analysis of Korean and worldwide cases, reflecting the perceptual and cognitive characteristics of older adults. A dataset focusing on key BD attributes—specifically color and shapes/forms—was constructed and used to train the LoRA model. To enhance accuracy and contextual relevance in image generation, ControlNet was applied. The validity of the dataset was evaluated through expert assessments using Likert-scale analysis, while model reliability was examined using loss function trends and Fréchet Inception Distance (FID) scores. Our findings indicate that the proposed approach enables more precise and scalable applications of biophilic design in senior housing façades. This approach highlights the potential of AI-assisted design workflows in promoting age-inclusive and biophilic urban environments. Full article
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

11 pages, 3520 KiB  
Article
Enhancing Atmospheric Turbulence Phase Screen Generation with an Improved Diffusion Model and U-Net Noise Generation Network
by Hangning Kou, Min Wan and Jingliang Gu
Photonics 2025, 12(4), 381; https://doi.org/10.3390/photonics12040381 - 15 Apr 2025
Viewed by 708
Abstract
Simulating atmospheric turbulence phase screens is essential for optical system research and turbulence compensation. Traditional methods, such as multi-harmonic power spectrum inversion and Zernike polynomial fitting, often suffer from sampling errors and limited diversity. To overcome these challenges, this paper proposes an improved denoising diffusion probabilistic model (DDPM) for generating high-fidelity atmospheric turbulence phase screens. The model effectively captures the statistical distribution of turbulence phase screens using small training datasets. A refined loss function incorporating the structure function enhances accuracy. Additionally, a self-attention module strengthens the model’s ability to learn phase screen features. The experimental results demonstrate that the proposed approach significantly reduces the Fréchet Inception Distance (FID) from 154.45 to 59.80, with the mean loss stabilizing around 0.1 after 50,000 iterations. The generated phase screens exhibit high precision and diversity, providing an efficient and adaptable solution for atmospheric turbulence simulation. Full article
(This article belongs to the Section Data-Science Based Techniques in Photonics)
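The DDPM objective underlying this approach is the standard noise-prediction loss: corrupt a clean phase screen x0 to x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps and train the network to regress eps. The paper's structure-function refinement is not specified in the abstract, so the sketch below shows only the baseline term; the model signature is assumed:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Standard DDPM noise-prediction loss (the paper adds a
    structure-function term on top, not shown here).

    x0: clean phase screens (N, 1, H, W); alpha_bar: (T,) cumulative
    products of the noise schedule; model(x_t, t) is assumed to
    predict the injected noise.
    """
    n, T = x0.shape[0], alpha_bar.shape[0]
    t = torch.randint(0, T, (n,), device=x0.device)          # random timestep
    a = alpha_bar[t].view(n, 1, 1, 1)
    eps = torch.randn_like(x0)                               # target noise
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps               # forward diffusion
    return F.mse_loss(model(x_t, t), eps)                    # regress the noise
```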

24 pages, 7057 KiB  
Article
Construction and Enhancement of a Rural Road Instance Segmentation Dataset Based on an Improved StyleGAN2-ADA
by Zhixin Yao, Renna Xi, Taihong Zhang, Yunjie Zhao, Yongqiang Tian and Wenjing Hou
Sensors 2025, 25(8), 2477; https://doi.org/10.3390/s25082477 - 15 Apr 2025
Viewed by 443
Abstract
With the advancement of agricultural automation, the demand for road recognition and understanding in agricultural machinery autonomous driving systems has significantly increased. To address the scarcity of instance segmentation data for rural roads and rural unstructured scenes, particularly the lack of support for high-resolution and fine-grained classification, a 20-class instance segmentation dataset was constructed, comprising 10,062 independently annotated instances. An improved StyleGAN2-ADA data augmentation method was proposed to generate higher-quality image data. This method incorporates a decoupled mapping network (DMN) to reduce the coupling degree of latent codes in W-space and integrates the advantages of convolutional networks and transformers by designing a convolutional coupling transfer block (CCTB). The core cross-shaped window self-attention mechanism in the CCTB enhances the network's ability to capture complex contextual information and spatial layouts. Ablation experiments comparing the improved and original StyleGAN2-ADA networks demonstrate significant improvements, with the inception score (IS) increasing from 42.38 to 77.31 and the Fréchet inception distance (FID) decreasing from 25.09 to 12.42, indicating a notable enhancement in data generation quality and authenticity. To verify the effect of data augmentation on model performance, Mask R-CNN, SOLOv2, YOLOv8n, and OneFormer were tested on both the original and the augmented datasets; the performance differences further confirm the effectiveness of the improved modules. Full article
(This article belongs to the Section Sensing and Imaging)

21 pages, 6983 KiB  
Article
OP-Gen: A High-Quality Remote Sensing Image Generation Algorithm Guided by OSM Images and Textual Prompts
by Huolin Xiong, Zekun Li, Qunbo Lv, Baoyu Zhu, Yu Zhang, Chaoyang Yu and Zheng Tan
Remote Sens. 2025, 17(7), 1226; https://doi.org/10.3390/rs17071226 - 30 Mar 2025
Viewed by 825
Abstract
The application of diffusion models in the field of remote sensing image generation has significantly improved the performance of generation algorithms. However, existing methods still exhibit certain limitations, such as the inability to generate images with rich texture details and minimal geometric distortions in a controllable manner. To address these shortcomings, this work introduces an innovative remote sensing image generation algorithm, OP-Gen, which is guided by textual descriptions and OpenStreetMap (OSM) images. OP-Gen incorporates two information extraction branches: ControlNet and OSM-prompt (OP). The ControlNet branch extracts structural and spatial information from OSM images and injects this information into the diffusion model, providing guidance for the overall structural framework of the generated images. In the OP branch, we design an OP-Controller module, which extracts detailed semantic information from textual prompts based on the structural information of the OSM image. This information is subsequently injected into the diffusion model, enriching the generated images with fine-grained details, aligning the generated details with the structural framework, and thus significantly enhancing the realism of the output. The proposed OP-Gen algorithm achieves state-of-the-art performance in both qualitative and quantitative evaluations. The qualitative results demonstrate that OP-Gen outperforms existing methods in terms of structural coherence and texture detail richness. Quantitatively, the algorithm achieves a Fréchet inception distance (FID) of 45.01, a structural similarity index measure (SSIM) of 0.1904, and a Contrastive Language-Image Pretraining (CLIP) score of 0.3071, all of which represent the best performance among the current algorithms of the same type. Full article

31 pages, 383 KiB  
Review
Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis
by Abrar Alotaibi and Moataz Ahmed
Appl. Sci. 2025, 15(7), 3623; https://doi.org/10.3390/app15073623 - 26 Mar 2025
Viewed by 1617
Abstract
Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in manual design. This paper provides a comprehensive review of NAS methods applied to GANs, categorizing and comparing various approaches based on criteria such as search strategies, evaluation metrics, and performance outcomes. The review highlights the benefits of NAS in improving GAN performance, stability, and efficiency, while also identifying limitations and areas for future research. Key findings include the superiority of evolutionary algorithms and gradient-based methods in certain contexts, the importance of robust evaluation metrics beyond traditional scores like Inception Score (IS) and Fréchet Inception Distance (FID), and the need for diverse datasets in assessing GAN performance. By presenting a structured comparison of existing NAS-GAN techniques, this paper aims to guide researchers in developing more effective NAS methods and advancing the field of GANs. Full article

23 pages, 3871 KiB  
Article
Direct Distillation: A Novel Approach for Efficient Diffusion Model Inference
by Zilai Li and Rongkai Zhang
J. Imaging 2025, 11(2), 66; https://doi.org/10.3390/jimaging11020066 - 19 Feb 2025
Viewed by 1305
Abstract
Diffusion models are among the most common techniques used for image generation, having achieved state-of-the-art performance by implementing auto-regressive algorithms. However, multi-step inference processes are typically slow and require extensive computational resources. To address this issue, we propose the use of an information bottleneck to reschedule inference using a new sampling strategy, which employs a lightweight distilled neural network to map intermediate stages to the final output. This approach reduces the number of iterations and FLOPs required for inference while ensuring the diversity of generated images. A series of validation experiments were conducted involving the COCO dataset as well as the LAION dataset and two proposed distillation models, with 57.5 million and 13.5 million parameters, respectively. Results showed that these models were able to bypass 40–50% of the inference steps originally required by a stable U-Net diffusion model, which included 859 million parameters. In the original sampling process, each inference step required 67,749 million multiply–accumulate operations (MACs), while our two distilled models only required 3954 million MACs and 3922 million MACs per inference step. In addition, our distillation algorithm produced a Fréchet inception distance (FID) of 16.75 in eight steps, markedly lower than the progressive distillation, adversarial distillation, and DDIM solver algorithms, which produced FID values of 21.0, 30.0, 22.3, and 24.0, respectively. Notably, this process did not require parameters from the original diffusion model to establish a new distillation model prior to training. Information theory was used to further analyze primary bottlenecks in the FID results of existing distillation algorithms, demonstrating that both GANs and typical distillation failed to achieve generative diversity while implicitly studying incorrect posterior probability distributions. We also use information theory to analyze the latest distillation models, including LCM-SDXL, SDXL-Turbo, SDXL-Lightning, DMD, and MSD, revealing the root cause of the diversity problem they face, and compare these models with our algorithm on FID and CLIP Score. Full article
(This article belongs to the Section AI in Imaging)

11 pages, 2064 KiB  
Article
Optical Coherence Tomography Image Enhancement and Layer Detection Using Cycle-GAN
by Ye Eun Kim, Eun Ji Lee, Jung Suk Yoon, Jiyoon Kwak and Hyunjoong Kim
Diagnostics 2025, 15(3), 277; https://doi.org/10.3390/diagnostics15030277 - 24 Jan 2025
Cited by 1 | Viewed by 995
Abstract
Background/Objectives: Variations in image clarity across different OCT devices, along with the inconsistent delineation of retinal nerve fiber layer (RNFL) boundaries, pose a challenge to achieving consistent diagnoses for glaucoma. Recently, deep learning methods such as GANs for image transformation have been gaining attention. This paper introduces deep learning methods to transform low-clarity images from one OCT device into high-clarity images from another, concurrently estimating the RNFL segmentation lines in the enhanced images. Methods: We applied two deep learning methods, pix2pix and cycle-GAN, and provided a comparison of their performance by evaluating the similarity between the generated and actual images, as well as comparing the generated RNFL boundary delineation with the actual boundaries. Results: The image conversion performance was compared based on two criteria: Fréchet Inception Distance (FID) and curve dissimilarity. In the comparison of FID values, the cycle-GAN method showed significantly lower values than the pix2pix method (p-value < 0.001). In terms of curve similarity, the cycle-GAN method also demonstrated higher similarity to the actual curves compared to both manually annotated curves and the pix2pix method (p-value < 0.001). Conclusions: We demonstrated that the cycle-GAN method produces more consistent and precise outcomes in the converted images compared to the pix2pix method. The resulting segmented lines showed a high degree of similarity to those manually annotated by clinical experts in high-clarity images, surpassing the boundary accuracy observed in the original low-clarity scans. Full article
(This article belongs to the Special Issue Latest Advances in Ophthalmic Imaging)
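Unlike pix2pix, cycle-GAN requires no paired scans: two generators G: X -> Y and F: Y -> X are trained so that translating an image to the other domain and back recovers the original. The core cycle-consistency term, sketched below with the adversarial terms omitted (illustrative, not the authors' code):

```python
import torch

def cycle_consistency_loss(G, F, x: torch.Tensor, y: torch.Tensor,
                           lam: float = 10.0) -> torch.Tensor:
    """CycleGAN reconstruction term: translating there and back should
    recover the input (sketch; adversarial losses are trained separately)."""
    loss_x = (F(G(x)) - x).abs().mean()   # x -> y-domain -> back to x
    loss_y = (G(F(y)) - y).abs().mean()   # y -> x-domain -> back to y
    return lam * (loss_x + loss_y)
```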

19 pages, 10633 KiB  
Article
RSVQ-Diffusion Model for Text-to-Remote-Sensing Image Generation
by Xin Gao, Yao Fu, Xiaonan Jiang, Fanlu Wu, Yu Zhang, Tianjiao Fu, Chao Li and Junyan Pei
Appl. Sci. 2025, 15(3), 1121; https://doi.org/10.3390/app15031121 - 23 Jan 2025
Cited by 1 | Viewed by 1858
Abstract
Text-guided remote sensing image generation shows great potential in many practical applications, but, as with generative adversarial networks in remote sensing tasks, the generated images still face challenges such as low realism and unclear details. Moreover, the inherent spatial complexity of remote sensing images and the limited scale of publicly available datasets make it particularly challenging to generate high-quality remote sensing images from text descriptions. To address these challenges, this paper proposes the RSVQ-Diffusion model for remote sensing image generation, achieving high-quality text-to-remote-sensing image generation applicable to target detection, simulation, and other fields. Specifically, this paper designs a spatial position encoding mechanism to integrate the spatial information of remote sensing images during model training. Additionally, the Transformer module is improved by incorporating a short-sequence local perception mechanism into the diffusion image decoder, addressing issues of unclear details and regional distortions in generated remote sensing images. Compared with the VQ-Diffusion model, our proposed model achieves significant improvements in the Fréchet Inception Distance (FID), the Inception Score (IS), and the text–image alignment (Contrastive Language-Image Pre-training, CLIP) scores: the FID decreased from 96.68 to 90.36, the CLIP score increased from 26.92 to 27.22, and the IS increased from 7.11 to 7.24. Full article
