MDPI - Publisher of Open Access Journals

22 pages, 4799 KB

Open Access Article

Forest Disturbance Classification Under Imbalanced and Small-Sample Conditions Based on Collaborative Semi-Supervised Learning and Sample Generation

by Yudan Liu, Yuxin Zhao, Yan Yan, Yan Shao, Xinqi Qu and Ling Wu

Remote Sens. 2026, 18(10), 1579; https://doi.org/10.3390/rs18101579 - 14 May 2026

Viewed by 224

Abstract

Accurate and timely information on forest disturbance drivers is important for sustainable forest management, global carbon cycle accounting, and climate change response. However, forest disturbance classification is difficult due to two major challenges: limited labeled samples and a highly imbalanced disturbance class distribution. In this article, a new framework for multi-type forest disturbance classification based on collaborative semi-supervised learning and sample generation is proposed. First, forest disturbance is detected using long-term remote sensing time series data and disturbance detection algorithms. Spatiotemporal, spectral, and terrain features of the different disturbance types are then extracted. On this basis, a collaborative classification strategy is developed to address imbalanced and small-sample conditions. Based on a small number of labeled samples, Support Vector Machine (SVM) and Random Forest (RF) are used to build dual base classifiers. A confident learning (CL) framework is applied to select high-confidence pseudo-labeled samples from unlabeled data. Then, a latent diffusion model (LDM) is introduced to generate high-fidelity pseudo-samples, which increases the sample size and balances the class distribution. Based on the augmented dataset, the dual classifiers are iteratively optimized using a co-training strategy, which improves model generalization under complex conditions. The results show that the proposed framework can generate high-quality pseudo-samples and effectively reduce class imbalance. The overall accuracy (OA) of the proposed framework reaches 93.2%, which is 5.7% and 4.4% higher than the single-classifier baselines, respectively. After introducing the LDM-based balancing mechanism, performance is further improved by 1.8% compared with the pure semi-supervised framework. This study provides an efficient and reliable solution for large-scale forest ecosystem monitoring.
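
The pseudo-label selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' confident learning procedure: it assumes a simpler rule in which an unlabeled sample is kept only when both classifiers predict the same class and both are at least `threshold` confident. The function name, the threshold value, and the probability tables are hypothetical.

```python
def select_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Return (index, label) pairs for samples both classifiers agree on.

    probs_a / probs_b: per-sample lists of class probabilities from two
    classifiers (for example an SVM and a random forest). This is an
    assumed agreement-plus-threshold rule, not confident learning itself.
    """
    selected = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        label_a = max(range(len(pa)), key=pa.__getitem__)  # argmax of classifier A
        label_b = max(range(len(pb)), key=pb.__getitem__)  # argmax of classifier B
        # Keep the sample only if the labels match and the weaker
        # of the two confidences still clears the threshold.
        if label_a == label_b and min(pa[label_a], pb[label_b]) >= threshold:
            selected.append((i, label_a))
    return selected


# Hypothetical class probabilities for three unlabeled samples.
probs_svm = [[0.95, 0.03, 0.02], [0.50, 0.45, 0.05], [0.05, 0.92, 0.03]]
probs_rf = [[0.91, 0.05, 0.04], [0.20, 0.70, 0.10], [0.10, 0.60, 0.30]]

pseudo = select_pseudo_labels(probs_svm, probs_rf)
print(pseudo)
```

In a co-training loop, the selected pairs would be appended to the labeled set before both classifiers are refit; a stricter threshold trades fewer pseudo-labels for lower label noise.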

(This article belongs to the Topic Quantifying Forest Structure, Biomass, and Dynamics Using Inventory and Remote Sensing Data)

27 pages, 6893 KB

Open Access Article

LoRA-Based Deep Learning for High-Fidelity Satellite Image Super-Resolution in Big Data Remote Sensing

by Noha Rashad Mahmoud, Hussam Elbehiery, Basheer Abdel Fattah Youssef and Hanaa Bayomi Ali Mobarz

Computers 2026, 15(5), 313; https://doi.org/10.3390/computers15050313 - 14 May 2026

Viewed by 273

Abstract

High-resolution satellite imagery is pivotal for accurate analysis in remote sensing applications, including land-use monitoring, urban planning, and environmental assessment. However, obtaining such data is often costly and limited. Consequently, super-resolution techniques, such as deep learning models and fine-tuning strategies like LoRA, offer a promising alternative, especially given the diversity and large scale of satellite datasets. While deep learning-based super-resolution models have recently shown great promise, their effectiveness, efficiency, and scalability across heterogeneous satellite scenes are not well studied. This work studies the performance of representative deep learning super-resolution frameworks, including the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), Swin Transformer for Image Restoration (SwinIR), and latent diffusion models (LDM), under unified experimental conditions using the WorldStrat dataset. The main goal is to establish whether parameter-efficient adaptation strategies can boost reconstruction quality while reducing computational and training costs. Toward this goal, we investigate hybrid sequential pipelines, ensemble averaging, and Low-Rank Adaptation (LoRA)-based fine-tuning. The experiments indicate that the multi-model pipelines achieve only marginal performance gains while incurring substantial increases in computational complexity. LoRA-based fine-tuning, by contrast, improves reconstruction accuracy and quality across all model families, despite using only a small percentage of trainable parameters. LoRA-based models demonstrate superiority over multi-model methods in both efficiency and performance. The presented results confirm that LoRA is an effective and accessible technique for high-fidelity satellite-based super-resolution image synthesis. The manuscript identifies LoRA as one of the enabling technologies advancing the state of the art in deep learning-based super-resolution for large-scale satellite-based image synthesis.
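
To make the "small percentage of trainable parameters" concrete, the sketch below counts parameters for a single weight matrix under the standard LoRA formulation, in which a frozen weight of shape d_out x d_in is adapted by two low-rank factors of shapes d_out x r and r x d_in. The layer size and rank are illustrative assumptions; the abstract does not report them.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Compare full fine-tuning with LoRA for one weight matrix.

    Full fine-tuning trains all d_out * d_in weights. LoRA freezes them
    and trains B (d_out x rank) and A (rank x d_in) instead.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora, lora / full


# Hypothetical layer: a 1024 x 1024 projection adapted with rank 8.
full, lora, fraction = lora_trainable_params(1024, 1024, 8)
print(f"full: {full}, LoRA: {lora}, fraction: {fraction:.4f}")
```

The fraction shrinks as the layer grows or the rank falls, which is why the trainable share can stay small even for large backbones.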

(This article belongs to the Special Issue Machine Learning: Techniques, Industry Applications, Code Sharing, and Future Trends)

23 pages, 3743 KB

Open Access Article

CT-to-PET Synthesis in the Head–Neck and Thoracic Region via Conditional 3D Latent Diffusion Modeling

by Mohammed A. Mahdi, Mohammed Al-Shalabi, Reda Elbarougy, Ehab T. Alnfrawy, Muhammad Usman Hadi and Rao Faizan Ali

Bioengineering 2026, 13(5), 534; https://doi.org/10.3390/bioengineering13050534 - 3 May 2026

Viewed by 1782

Abstract

Background: Positron emission tomography (PET) provides physiologic information central to oncologic staging and treatment assessment, but its availability is limited by cost, radiation exposure, and scanner access. Synthesizing PET from computed tomography (CT) is attractive but challenging, as tracer uptake is only partially constrained by anatomy, making the mapping inherently one-to-many. Methods: We propose a conditional 3D latent diffusion framework (3D-LDM) for CT-to-PET synthesis in the head–neck and thoracic region. The pipeline localizes anatomy by segmenting lungs in CT and restricting the volume to reduce irrelevant variability. PET volumes are encoded into a compact latent space using a KL-regularized 3D autoencoder, and a conditional 3D diffusion U-Net learns to generate PET latents conditioned on CT via a denoising diffusion process. The model was trained and evaluated on 900 paired PET/CT studies. Performance was assessed in SUV space using MAE, PSNR, and SSIM, and compared against transformer-, CNN-, and GAN-based baselines. Results: On the held-out test cohort, 3D-LDM achieved the best overall quantitative fidelity (MAE = 303.05 ± 22.16 SUV units, PSNR = 32.64 ± 1.79, SSIM = 0.86 ± 0.03), outperforming all baselines with statistically significant differences (p < 0.001). At the lesion level, the model achieved a precision of 0.76 (95% CI: 0.71, 0.81) and recall of 0.76 (95% CI: 0.72, 0.80), detecting an average of 3.19 lesions per scan with a false-positive rate of 0.72/scan. Lesion-wise NMSE was 11.37%, significantly outperforming GAN and transformer baselines. Conclusions: 3D-LDM enables efficient, high-fidelity PET synthesis in the head–neck and thoracic regions, substantially improving lesion-level accuracy over state-of-the-art baselines. While it is not a replacement for diagnostic PET, these results support the model’s potential as a clinical decision support tool.

(This article belongs to the Special Issue Machine Learning Applications in Cancer Diagnosis and Prognosis)

12 pages, 863 KB

Open Access Article

High-Fidelity Synthesis of Temporomandibular Joint Cone-Beam Computed Tomography Images via Latent Diffusion Models

by Qinlanhui Zhang, Yunhao Zheng and Jun Wang

J. Clin. Med. 2026, 15(9), 3344; https://doi.org/10.3390/jcm15093344 - 28 Apr 2026

Viewed by 277

Abstract

Background: The development of robust artificial intelligence (AI) models for diagnosing Temporomandibular Disorders (TMDs) is severely constrained by data scarcity and patient privacy regulations. Cone-beam computed tomography (CBCT), the gold standard for assessing osseous changes in the temporomandibular joint (TMJ), inherently contains sensitive biometric facial features, making de-identification difficult without losing critical anatomical information. This study aims to develop and evaluate TMJCTGenerator, a specialized latent diffusion model (LDM) framework designed to synthesize high-fidelity, diverse, and anonymous TMJ CBCT images. We hypothesize that this LDM approach can achieve superior anatomical fidelity and diversity compared to traditional generative adversarial network (GAN)- and variational autoencoder (VAE)-based methods, specifically in capturing fine osseous details within sagittal and coronal views of the mandibular condyle. Methods: A training dataset comprising 348 anonymized CBCT volumes was obtained in this retrospective comparative study to extract high-resolution sagittal and coronal regions of interest of the mandibular condyle. An independent test set of 39 anonymized CBCT volumes was further included. We developed a class-conditional LDM that integrates a pre-trained VAE for perceptual compression with a conditional U-Net for iterative denoising in the latent space. Performance was evaluated via qualitative anatomical fidelity assessment, Fréchet Inception Distance (FID), and a blinded Visual Turing test conducted by experienced clinicians to determine the distinguishability of synthetic images from real data. Results: Qualitative analysis revealed that TMJCTGenerator produced images with superior sharpness and anatomical consistency compared to baseline models, successfully reconstructing fine bone structures essential for diagnosing degenerative joint disease. TMJCTGenerator achieved lower FID scores than both VAE and GAN baselines. In the Visual Turing test, clinicians were unable to reliably distinguish the generated images from real scans, and non-inferiority analysis confirmed that the synthetic data were statistically non-inferior to real data. Furthermore, TMJCTGenerator demonstrated the capability to generate diverse pathological conditions, ranging from normal anatomy to severe osteoarthritic changes. Conclusions: The proposed LDM framework effectively addresses the data scarcity and privacy bottlenecks in TMJ AI research by generating realistic, fully anonymous medical imaging data. TMJCTGenerator outperforms traditional generative methods in both visual fidelity and diversity, offering a viable solution for training downstream diagnostic algorithms. The source code and pre-trained models of TMJCTGenerator have been made open-source.

(This article belongs to the Section Dentistry, Oral Surgery and Oral Medicine)

22 pages, 12911 KB

Open Access Article

Distribution-Preserving Latent Image Steganography via Conditional Optimal Transport and Theoretical Target Synthesis

by Kamil Woźniak, Marek R. Ogiela and Lidia Ogiela

Electronics 2026, 15(6), 1321; https://doi.org/10.3390/electronics15061321 - 22 Mar 2026

Viewed by 484

Abstract

We propose Distribution-Preserving Latent Steganography via Conditional Optimal Transport (DPL-COT), a coverless image steganography framework for latent diffusion models. Unlike classical cover-modifying schemes, DPL-COT embeds a bitstream directly into the initialization noise latent $z_{T} \sim \mathcal{N}(0, I)$ without model retraining. Our primary objective is high recoverability and a low bit error rate (BER) under deterministic inversion, which is inherently imperfect due to numerical discretization and VAE nonlinearity. To maximize decoding stability, we restrict embedding to the natural tails of the latent prior by selecting the largest-magnitude coordinates, thereby increasing the sign decision margin against inversion drift. To preserve distributional stealth, per-bit target values are analytically derived from truncated Gaussians matching the marginal distribution of the selected coordinates. Conditional 1D optimal transport is applied independently for each bit class, mapping every coordinate to its target value while preserving rank order. We generate 5000 stego images using a pretrained diffusion model and demonstrate a favorable capacity–reliability trade-off (e.g., 4916 bits/image with 0.473% mean BER) and strong robustness to JPEG compression (sub-1% mean BER at $Q = 60$). Compared with LDStega, a recent LDM-based scheme reporting 99.28% clean-channel accuracy, DPL-COT achieves 99.53% at a comparable operating point and sustains above-99% accuracy under all tested JPEG quality factors. Latent-space tests further confirm negligible cover–stego distribution shift (mean $\mathrm{KS}_{2} < 0.003$, mean $W_{1} < 0.003$), a property not formally addressed by prior methods.
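
The idea of carrying bits in the signs of the largest-magnitude latent coordinates can be shown with a deliberately simplified toy. It is not DPL-COT: it omits the truncated-Gaussian targets, the conditional optimal transport step, and the diffusion model, and it assumes a noiseless channel in which the receiver sees the stego latent exactly. All function names are hypothetical.

```python
import random


def top_k_indices(latent, k):
    """Indices of the k largest-magnitude coordinates, in index order."""
    ranked = sorted(range(len(latent)), key=lambda i: abs(latent[i]), reverse=True)
    return sorted(ranked[:k])


def embed_bits(latent, bits):
    """Write each bit into the sign of one tail coordinate (1 -> +, 0 -> -)."""
    stego = list(latent)
    for idx, bit in zip(top_k_indices(latent, len(bits)), bits):
        magnitude = abs(latent[idx])  # magnitudes are left untouched
        stego[idx] = magnitude if bit == 1 else -magnitude
    return stego


def extract_bits(latent, k):
    """Read the bits back from the signs of the same k coordinates."""
    return [1 if latent[i] > 0 else 0 for i in top_k_indices(latent, k)]


random.seed(0)
cover = [random.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for z_T
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_bits(cover, message)
recovered = extract_bits(stego, len(message))
print(recovered == message)
```

Because only signs change, the receiver can re-select the same coordinates by magnitude. Under inversion drift the selected set itself could shift near the selection boundary, which this toy does not handle.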

(This article belongs to the Special Issue Future Trends and Challenges of Ubiquitous Computing and Smart Systems, 2nd Edition)

21 pages, 10714 KB

Open Access Article

LoRA-Fine-Tuned Latent Diffusion for High-Fidelity Digitization of Classic Mongolian Patterns

by Jiatong Liu and Yue Huang

Appl. Sci. 2026, 16(1), 11; https://doi.org/10.3390/app16010011 - 19 Dec 2025

Cited by 2 | Viewed by 2633

Abstract

Mongolian patterns represent an important component of Mongolian cultural heritage, characterized by their dual structure of geometric symmetry and dynamic ornamental motifs. However, existing artificial intelligence-based generative methods struggle to preserve both low-frequency structural regularity and high-frequency decorative detail under limited data conditions. This study proposes a parameter-efficient digitization framework based on latent diffusion models (LDMs) fine-tuned with low-rank adaptation (LoRA) to achieve high-fidelity reconstruction of classic Mongolian patterns. A curated few-shot dataset and a low-rank constraint enable effective learning from only eight representative samples, while a dual-prompt mechanism and MSE-driven optimization improve geometric stability and semantic consistency. Integrated within a transparent ComfyUI workflow, the method supports controllable generation and reproducible experimentation. Experimental evaluations demonstrate that the proposed LoRA-LDM model achieves superior structural accuracy, reduced visual distortion, and enhanced motif preservation compared with baseline models. The results confirm the method’s applicability for digital preservation, reconstruction, and derivative design of structured cultural heritage motifs.

18 pages, 2235 KB

Open Access Article

3D Latent Diffusion Model for MR-Only Radiotherapy: Accurate and Consistent Synthetic CT Generation

by Mohammed A. Mahdi, Mohammed Al-Shalabi, Ehab T. Alnfrawy, Reda Elbarougy, Muhammad Usman Hadi and Rao Faizan Ali

Diagnostics 2025, 15(23), 3010; https://doi.org/10.3390/diagnostics15233010 - 26 Nov 2025

Cited by 2 | Viewed by 1474

Abstract

Background: The clinical imperative to reduce patient ionizing radiation exposure during diagnosis and treatment planning necessitates robust, high-fidelity synthetic imaging solutions. Current cross-modal synthesis techniques, primarily based on GANs and deterministic CNNs, exhibit instability and critical errors in modeling high-contrast tissues, thereby hindering their reliability for safety-critical applications such as radiotherapy. Objectives: Our primary objective was to develop a stable, high-accuracy framework for 3D Magnetic Resonance Imaging (MRI) to Computed Tomography (CT) synthesis capable of generating clinically equivalent synthetic CTs (sCTs) across multiple anatomical sites. Methods: We introduce a novel 3D Latent Diffusion Model (3DLDM) that operates in a compressed latent space, mitigating the computational burden of 3D diffusion while leveraging the stability of the denoising objective. Results: Across the Head & Neck, Thorax, and Abdomen, the 3DLDM achieved a Mean Absolute Error (MAE) of 56.44 Hounsfield Units (HU). This result corresponds to a significant reduction in overall error of 3.63 HU compared to the strongest adversarial baseline, CycleGAN (MAE = 60.07 HU, p < 0.05), 10.76 HU compared to NNUNet (MAE = 67.20 HU, p < 0.01), and 20.79 HU compared to the transformer-based SwinUNeTr (MAE = 77.23 HU, p < 0.0001). The model also achieved the highest structural similarity (SSIM = 0.885 ± 0.031), significantly exceeding SwinUNeTr (p < 0.0001), NNUNet (p < 0.01), and Pix2Pix (p < 0.0001). Likewise, the 3DLDM achieved the highest peak signal-to-noise ratio (PSNR = 29.73 ± 1.60 dB), with statistically significant gains over CycleGAN (p < 0.01), NNUNet (p < 0.001), and SwinUNeTr (p < 0.0001). Conclusions: This work validates a scalable, accurate approach for volumetric synthesis, positioning the 3DLDM to enable MR-only radiotherapy planning and accelerate radiation-free multi-modal imaging in the clinic.
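
The error reductions against each baseline can be recomputed from the MAE values quoted in this abstract. The short check below reports both the absolute difference in HU and the relative change in percent; it uses only the published MAE figures, and the function name is illustrative.

```python
def mae_reduction(baseline_mae, model_mae):
    """Return (absolute reduction in HU, relative reduction in percent)."""
    absolute = baseline_mae - model_mae
    relative = 100.0 * absolute / baseline_mae
    return round(absolute, 2), round(relative, 2)


MODEL_MAE = 56.44  # 3DLDM, in Hounsfield Units
BASELINES = {"CycleGAN": 60.07, "NNUNet": 67.20, "SwinUNeTr": 77.23}

results = {name: mae_reduction(mae, MODEL_MAE) for name, mae in BASELINES.items()}
for name, (absolute, relative) in results.items():
    print(f"{name}: {absolute} HU absolute, {relative}% relative")
```

The figures 3.63, 10.76, and 20.79 match the absolute differences in HU. The relative reductions come out larger, because each difference is divided by a baseline MAE below 100 HU.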

(This article belongs to the Special Issue Medical Image Analysis and Machine Learning)

19 pages, 4399 KB

Open Access Article

Privacy-Preserving Synthetic Mammograms: A Generative Model Approach to Privacy-Preserving Breast Imaging Datasets

by Damir Shodiev, Egor Ushakov, Arsenii Litvinov and Yury Markin

Informatics 2025, 12(4), 112; https://doi.org/10.3390/informatics12040112 - 18 Oct 2025

Viewed by 2118

Abstract

Background: Significant progress has been made in the field of machine learning, enabling the development of methods for automatic interpretation of medical images that provide high-quality diagnostics. However, most of these methods require access to confidential data, making them difficult to apply under strict privacy requirements. Existing privacy-preserving approaches, such as federated learning and dataset distillation, have limitations related to data access, visual interpretability, etc. Methods: This study explores the use of generative models to create synthetic medical data that preserves the statistical properties of the original data while ensuring privacy. The research is carried out on the VinDr-Mammo dataset of digital mammography images. A conditional generative method using Latent Diffusion Models (LDMs) is proposed with conditioning on diagnostic labels and lesion information. Diagnostic utility and privacy robustness are assessed via cancer classification tasks and re-identification tasks using Siamese neural networks and membership inference. Results: The generated synthetic data achieved a Fréchet Inception Distance (FID) of 5.8, preserving diagnostic features. A model trained solely on synthetic data achieved comparable performance to one trained on real data (ROC-AUC: 0.77 vs. 0.82). Visual assessments showed that synthetic images are indistinguishable from real ones. Privacy evaluations demonstrated a low re-identification risk (e.g., mAP@R = 0.0051 on the test set), confirming the effectiveness of the privacy-preserving approach. Conclusions: The study demonstrates that privacy-preserving generative models can produce synthetic medical images of sufficient quality for diagnostic tasks while significantly reducing the risk of patient re-identification. This approach enables secure data sharing and model training in privacy-sensitive domains such as medical imaging.

(This article belongs to the Special Issue Health Data Management in the Age of AI)

25 pages, 5513 KB

Open Access Article

Ptycho-LDM: A Hybrid Framework for Efficient Phase Retrieval of EUV Photomasks Using Conditional Latent Diffusion Models

by Suman Saha, Paolo Ansuinelli, Luis Barba, Iacopo Mochi and Benjamín Béjar Haro

Photonics 2025, 12(9), 900; https://doi.org/10.3390/photonics12090900 - 8 Sep 2025

Viewed by 1722

Abstract

Extreme ultraviolet (EUV) photomask inspection is a critical step in semiconductor manufacturing, requiring high-resolution, high-throughput solutions to detect nanometer-scale defects. Traditional actinic imaging systems relying on complex optics have a high cost of ownership and require frequent upgrades. An alternative is lensless imaging techniques based on ptychography, which offer high-fidelity reconstruction but suffer from slow throughput and high data demands. In particular, the ptychographic standard solver, the iterative Difference Map (DifMap) algorithm, requires many measurements and iterations to converge. We propose Ptycho-LDM, a hybrid framework integrating DifMap with a conditional Latent Diffusion Model for rapid and accurate phase retrieval. Ptycho-LDM alleviates high data acquisition demand by leveraging data-driven priors while offering improved computational efficiency. Our method performs coarse object retrieval using a resource-constrained reconstruction from DifMap and refines the result using a learned prior over photomask patterns. This prior enables high-fidelity reconstructions even in measurement-limited regimes where DifMap alone fails to converge. Experiments on actinic patterned mask inspection (APMI) show that Ptycho-LDM recovers fine structure and defect details with far fewer probe positions, surpassing the DifMap in accuracy and speed. Furthermore, evaluations on both noisy synthetic data and real APMI measurements confirm the robustness and effectiveness of Ptycho-LDM across practical scenarios. By combining generative modeling with physics-based constraints, Ptycho-LDM offers a promising scalable, high-throughput solution for next-generation photomask inspection.

(This article belongs to the Special Issue Computational Imaging for Semiconductor Devices Metrology Applications)

26 pages, 423 KB

Open Access Article

Enhancing Privacy-Preserving Network Trace Synthesis Through Latent Diffusion Models

by Jin-Xi Yu, Yi-Han Xu, Min Hua, Gang Yu and Wen Zhou

Information 2025, 16(8), 686; https://doi.org/10.3390/info16080686 - 12 Aug 2025

Cited by 2 | Viewed by 2047

Abstract

A network trace is a comprehensive record of data packets traversing a computer network, serving as a critical resource for analyzing network behavior. However, in practice, the limited availability of high-quality network traces, coupled with the presence of sensitive information such as IP addresses and MAC addresses, poses significant challenges to advancing network trace analysis. To address these issues, this paper focuses on network trace synthesis in two practical scenarios: (1) data expansion, where users create synthetic traces internally to diversify and enhance the utility of existing network traces; (2) data release, where synthesized network traces are shared externally. Inspired by the powerful generative capabilities of latent diffusion models (LDMs), this paper introduces NetSynDM, which leverages an LDM to address the challenges of network trace synthesis in data expansion scenarios. To address the challenges in the data release scenario, we integrate differential privacy (DP) mechanisms into NetSynDM, introducing DPNetSynDM, which leverages DP Stochastic Gradient Descent (DP-SGD) to update NetSynDM, incorporating privacy-preserving noise throughout the training process. Experiments on five widely used network trace datasets show that our methods outperform prior works. On average, NetSynDM improves fidelity by 166.1% over the baselines. DPNetSynDM strikes an improved balance between privacy and fidelity, surpassing the fidelity score of the previous state-of-the-art network trace synthesis method by 18.4% on UGR16 while reducing privacy risk scores by approximately 9.79%.
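
The DP-SGD update mentioned in this abstract follows a well-known recipe: clip each per-sample gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, and average. The sketch below shows one such step on plain Python lists. It is a generic illustration rather than the DPNetSynDM training code, and the parameter names and values are assumptions.

```python
import math
import random


def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per-sample gradients, add noise, average."""
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if its L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            summed[j] += grad[j] * scale
    sigma = noise_multiplier * clip_norm  # noise std tied to the clipping norm
    batch = len(per_sample_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.6, 0.8]]  # hypothetical per-sample gradients

# With noise_multiplier = 0 the step reduces to SGD on clipped gradients.
no_noise = dp_sgd_step([0.0, 0.0], grads, lr=0.5, clip_norm=1.0,
                       noise_multiplier=0.0, rng=rng)
private = dp_sgd_step([0.0, 0.0], grads, lr=0.5, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
print(no_noise, private)
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the privacy and fidelity trade-off the abstract refers to.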

19 pages, 4410 KB

Open Access Review

Latent Diffusion Models for Image Watermarking: A Review of Recent Trends and Future Directions

by Hongjun Hur, Minjae Kang, Sanghyeok Seo and Jong-Uk Hou

Electronics 2025, 14(1), 25; https://doi.org/10.3390/electronics14010025 - 25 Dec 2024

Cited by 2 | Viewed by 10768

Abstract

Recent advancements in deep learning-based generative models have simplified image generation, increasing the need for improved source tracing and copyright protection, especially with the efficient, high-quality output of latent diffusion models (LDMs) raising concerns about unauthorized use. This paper provides a comprehensive review of watermarking techniques applied to latent diffusion models, focusing on recent trends and the potential utility of these approaches. Watermarking using latent diffusion models offers the potential to overcome these limitations by embedding watermarks in the latent space during the image generation process. This represents a new paradigm of watermarking that leverages a degree of freedom unavailable in traditional watermarking techniques and underscores the need to explore the potential advancements in watermark technology. LDM-based watermarking allows for the natural internalization of watermarks within the content generation process, enabling robust watermarking without compromising image quality. We categorize the methods based on embedding strategies and analyze their effectiveness in achieving key functionalities: source tracing, copyright protection, and AI-generated content identification. The review highlights the strengths and limitations of current techniques and discusses future directions for enhancing the robustness and applicability of watermarking in the evolving landscape of generative AI.

(This article belongs to the Section Artificial Intelligence)

19 pages, 3870 KB

Open Access Article

Enhancing Amyloid PET Quantification: MRI-Guided Super-Resolution Using Latent Diffusion Models

by Jay Shah, Yiming Che, Javad Sohankar, Ji Luo, Baoxin Li, Yi Su and Teresa Wu, for the Alzheimer’s Disease Neuroimaging Initiative

Life 2024, 14(12), 1580; https://doi.org/10.3390/life14121580 - 1 Dec 2024

Cited by 9 | Viewed by 4102

Abstract

Amyloid PET imaging plays a crucial role in the diagnosis and research of Alzheimer’s disease (AD), allowing non-invasive detection of amyloid-β plaques in the brain. However, the low spatial resolution of PET scans limits the accurate quantification of amyloid deposition due to partial volume effects (PVE). In this study, we propose a novel approach to addressing PVE using a latent diffusion model for resolution recovery (LDM-RR) of PET imaging. We leverage a synthetic data generation pipeline to create high-resolution PET digital phantoms for model training. The proposed LDM-RR model incorporates a weighted combination of L₁, L₂, and MS-SSIM losses at both noise and image scales to enhance MRI-guided reconstruction. We evaluated the model’s performance in improving statistical power for detecting longitudinal changes and enhancing agreement between amyloid PET measurements from different tracers. The results demonstrate that the LDM-RR approach significantly improves PET quantification accuracy, reduces inter-tracer variability, and enhances the detection of subtle changes in amyloid deposition over time. We show that deep learning has the potential to improve PET quantification in AD, effectively contributing to the early detection and monitoring of disease progression.

(This article belongs to the Special Issue Alzheimer’s Disease: Recent Developments in Pathogenesis, Diagnosis, and Therapy)

14 pages, 9421 KB

Open Access Article

Cross-Attention and Seamless Replacement of Latent Prompts for High-Definition Image-Driven Video Editing

by Liangbing Zhao, Zicheng Zhang, Xuecheng Nie, Luoqi Liu and Si Liu

Electronics 2024, 13(1), 7; https://doi.org/10.3390/electronics13010007 - 19 Dec 2023

Cited by 1 | Viewed by 4703

Abstract

Recently, text-driven video editing has received increasing attention due to the surprising success of text-to-image models in improving video quality. However, video editing based on text prompts faces major challenges in achieving precise and controllable editing. Herein, we propose Latent prompt Image-driven Video Editing (LIVE), which provides precise and controllable video editing. The key innovation of LIVE is to use the latent codes of reference images as latent prompts to rapidly enrich visual details. This latent prompt mechanism gives LIVE two capabilities. The first is comprehensive interaction between video frames and latent prompts in the spatial and temporal dimensions, achieved by revisiting and enhancing cross-attention. The second is the efficient representation of continuous input videos and images within the diffusion space, achieved by fine-tuning components such as latent prompts, textual embeddings, and LDM parameters. As a result, LIVE can efficiently generate a variety of visually consistent edited videos by seamlessly replacing the objects in each frame with user-specified targets. High-definition experimental results on real-world videos confirm the effectiveness of LIVE and demonstrate its potential for image-driven video editing applications.

20 pages, 2974 KB

Open AccessArticle

Multi-Layer Preprocessing and U-Net with Residual Attention Block for Retinal Blood Vessel Segmentation

by Ahmed Alsayat, Mahmoud Elmezain, Saad Alanazi, Meshrif Alruily, Ayman Mohamed Mostafa and Wael Said

Diagnostics 2023, 13(21), 3364; https://doi.org/10.3390/diagnostics13213364 - 1 Nov 2023

Cited by 11 | Viewed by 4138

Abstract

Retinal blood vessel segmentation is a valuable tool for clinicians to diagnose conditions such as atherosclerosis, glaucoma, and age-related macular degeneration. This paper presents a new framework for segmenting blood vessels in retinal images. The framework has two stages: a multi-layer preprocessing stage and a subsequent segmentation stage employing a U-Net with a multi-residual attention block. The multi-layer preprocessing stage has three steps. The first step is noise reduction, employing a U-shaped convolutional neural network with matrix factorization (CNN with MF) and a detailed U-shaped U-Net (D_U-Net) to minimize image noise, and then selecting the most suitable image based on the PSNR and SSIM values. The second step is dynamic data imputation, which uses multiple models to fill in missing data. The third step is data augmentation, which uses a latent diffusion model (LDM) to expand the training dataset. The second stage of the framework is segmentation, where U-Nets with a multi-residual attention block segment the retinal images after they have been preprocessed and denoised. The experiments show that the framework is effective at segmenting retinal blood vessels. It achieved a Dice score of 95.32, an accuracy of 93.56, a precision of 95.68, and a recall of 95.45. It also removed noise effectively using CNN with MF and D_U-Net, according to the PSNR and SSIM values at noise levels of 0.1, 0.25, 0.5, and 0.75. The LDM achieved an inception score of 13.6 and an FID of 46.2 in the augmentation step.

(This article belongs to the Special Issue Medical Data Processing and Analysis—2nd Edition)

Search Results (14)
