Search Results (10)

Search Parameters:
Keywords = hierarchical VAE

24 pages, 4892 KB  
Article
Diffusion Model-Based Augmentation Using Asymmetric Attention Mechanisms for Cardiac MRI Images
by Mertcan Özdemir and Osman Eroğul
Diagnostics 2025, 15(16), 1985; https://doi.org/10.3390/diagnostics15161985 - 8 Aug 2025
Viewed by 697
Abstract
Background: The limited availability of cardiac MRI data significantly constrains deep learning applications in cardiovascular imaging, necessitating innovative approaches to address data scarcity while preserving critical cardiac anatomical features. Methods: We developed a specialized denoising diffusion probabilistic model incorporating an attention-enhanced UNet architecture with strategically placed attention blocks across five hierarchical levels. The model was trained and evaluated on the OCMR dataset and compared against state-of-the-art generative approaches including StyleGAN2-ADA, WGAN-GP, and VAE baselines. Results: Our approach achieved superior image quality with a Fréchet Inception Distance of 77.78, significantly outperforming StyleGAN2-ADA (117.70), WGAN-GP (227.98), and VAE (325.26). Structural similarity metrics demonstrated excellent performance (SSIM: 0.720 ± 0.143; MS-SSIM: 0.925 ± 0.069). Clinical validation by cardiac radiologists yielded discrimination accuracy of only 60.0%, indicating near-realistic image quality that is challenging for experts to distinguish from real images. Comprehensive anatomical analysis revealed that 13 of 20 cardiac metrics showed no significant differences between real and synthetic images, with particularly strong preservation of left ventricular features. Discussion: The generated synthetic images demonstrate high anatomical fidelity with expert-level quality, as evidenced by the difficulty radiologists experienced in distinguishing synthetic from real images. The strong preservation of cardiac anatomical features, particularly left ventricular characteristics, indicates the model’s potential for medical image analysis applications. Conclusions: This work establishes diffusion models as a robust solution for cardiac MRI data augmentation, successfully generating anatomically accurate synthetic images that enhance downstream clinical applications while maintaining diagnostic fidelity.
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
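
For orientation, the training loop of a denoising diffusion probabilistic model like the one described above reduces to noising clean images and regressing the added noise. The sketch below is a minimal, illustrative version: the tiny convolutional net merely stands in for the paper's attention-enhanced UNet, and all hyperparameters are assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative product of (1 - beta)

def q_sample(x0, t, noise):
    """Forward diffusion: noise a clean image x0 to timestep t."""
    ab = alphas_bar[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# Stand-in denoiser; the paper's model is an attention-enhanced UNet and
# also conditions on the timestep t, which this toy net omits.
eps_model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(eps_model.parameters(), lr=2e-4)

x0 = torch.randn(8, 1, 64, 64)                     # stand-in cardiac MR batch
t = torch.randint(0, T, (8,))
noise = torch.randn_like(x0)
loss = nn.functional.mse_loss(eps_model(q_sample(x0, t, noise)), noise)
opt.zero_grad(); loss.backward(); opt.step()
```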

18 pages, 901 KB  
Article
A Hierarchical Latent Modulation Approach for Controlled Text Generation
by Jincheng Zou, Guorong Chen, Jian Wang, Bao Zhang, Hong Hu and Cong Liu
Mathematics 2025, 13(5), 713; https://doi.org/10.3390/math13050713 - 22 Feb 2025
Viewed by 1330
Abstract
Generative models based on Variational Autoencoders (VAEs) represent an important area of research in Controllable Text Generation (CTG). However, existing approaches often do not fully exploit the potential of latent variables, leading to limitations in both the diversity and thematic consistency of the generated text. To overcome these challenges, this paper introduces a new framework based on Hierarchical Latent Modulation (HLM). The framework incorporates a hierarchical latent space modulation module for the generation and embedding of conditional modulation parameters. By using low-rank tensor factorization (LMF), the approach combines multi-layer latent variables and generates modulation parameters based on conditional labels, enabling precise control over the features during text generation. Additionally, layer-by-layer normalization and random dropout mechanisms are employed to address issues such as the under-utilization of conditional information and the collapse of generative patterns. We performed experiments on five baseline models based on VAEs for conditional generation, and the results demonstrate the effectiveness of the proposed framework.
(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)
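
The core mechanism above — deriving feature-modulation parameters from condition labels through a low-rank factorization and applying them to a layer's latent — can be sketched in a few lines. The module name, sizes, and scale/shift form below are illustrative assumptions, not the paper's HLM implementation.

```python
import torch
import torch.nn as nn

class LowRankModulation(nn.Module):
    def __init__(self, num_labels, hidden, rank=8):
        super().__init__()
        self.emb = nn.Embedding(num_labels, rank)  # condition label -> low-rank code
        self.to_scale = nn.Linear(rank, hidden)    # rank-r factors instead of a
        self.to_shift = nn.Linear(rank, hidden)    # full (labels x hidden) mapping

    def forward(self, h, label):
        code = self.emb(label)                     # (B, rank)
        scale = 1.0 + self.to_scale(code)          # (B, hidden)
        shift = self.to_shift(code)
        return scale * h + shift                   # label-modulated latent

mod = LowRankModulation(num_labels=5, hidden=256)
h = torch.randn(4, 256)                            # one layer's latent variables
labels = torch.tensor([0, 2, 2, 4])
h_cond = mod(h, labels)                            # features now carry the condition
```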

17 pages, 5104 KB  
Article
Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting
by Cheng Li, Dan Xu and Kuai Chen
Electronics 2024, 13(10), 1852; https://doi.org/10.3390/electronics13101852 - 9 May 2024
Cited by 1 | Viewed by 3150
Abstract
Image inpainting infers the missing areas of a corrupted image from the information in the undamaged part. With rapidly developing deep-learning technology, many existing image inpainting methods can generate plausible results from damaged images. However, they still suffer from over-smoothed textures or textural distortion in cases of complex textural details or large damaged areas. To restore textures at a fine-grained level, we propose an image inpainting method based on a hierarchical VQ-VAE with a vector credibility mechanism. It first trains the hierarchical VQ-VAE with ground truth images to update two codebooks and to obtain two corresponding vector collections containing information on the ground truth images. The two vector collections are fed to a decoder to generate the corresponding high-fidelity outputs. An encoder is then trained on the corresponding damaged images; it generates vector collections that approximate the ground truth with the help of the prior knowledge provided by the codebooks. After that, the two vector collections pass through the decoder from the hierarchical VQ-VAE to produce the inpainted results. In addition, we apply a vector credibility mechanism to push the vector collections derived from damaged images closer to those derived from ground truth images. To further improve the inpainting result, we apply a refinement network, which uses residual blocks with different dilation rates to acquire both global information and local textural details. Extensive experiments conducted on several datasets demonstrate that our method outperforms state-of-the-art ones.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Image and Video Processing)
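
For readers unfamiliar with VQ-VAEs, the quantization step this abstract builds on snaps each encoder vector to its nearest codebook entry with a straight-through gradient; a hierarchical VQ-VAE, as here, runs two such codebooks at different resolutions. A minimal sketch with assumed sizes:

```python
import torch
import torch.nn.functional as F

codebook = torch.randn(512, 64, requires_grad=True)  # K=512 entries of dimension 64

def quantize(z_e):
    """z_e: (N, 64) encoder outputs -> nearest codebook vectors."""
    d = torch.cdist(z_e, codebook)                 # (N, K) pairwise distances
    idx = d.argmin(dim=1)                          # index of nearest entry
    z_q = codebook[idx]
    commit = F.mse_loss(z_e, z_q.detach())         # commitment loss term
    z_q = z_e + (z_q - z_e).detach()               # straight-through estimator
    return z_q, idx, commit

z_e = torch.randn(100, 64)                         # stand-in encoder outputs
z_q, idx, commit_loss = quantize(z_e)
```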

22 pages, 2364 KB  
Article
The Effects of Selected Extraction Methods and Natural Deep Eutectic Solvents on the Recovery of Active Principles from Aralia elata var. mandshurica (Rupr. & Maxim.) J. Wen: A Non-Targeted Metabolomics Approach
by Alyona Kaleta, Nadezhda Frolova, Anastasia Orlova, Alena Soboleva, Natalia Osmolovskaya, Elena Flisyuk, Olga Pozharitskaya, Andrej Frolov and Alexander Shikov
Pharmaceuticals 2024, 17(3), 355; https://doi.org/10.3390/ph17030355 - 9 Mar 2024
Cited by 19 | Viewed by 2378
Abstract
The methods and solvents employed in routine extraction protocols essentially determine the composition of the resulting extracts, i.e., the relative abundances of individual biologically active metabolites and the quality and stability of the isolates. Natural deep eutectic solvents (NADESs) represent a new class of environmentally friendly solvents, recognized as promising alternatives to conventional organic extractants. However, their relative efficiencies when applied in different extraction workflows are still poorly characterized. Therefore, here, we compare the potential of three extraction methods for recovering biologically active natural products from Aralia elata var. mandshurica with selected NADESs using a non-targeted metabolomics approach. The non-targeted metabolite profiling relied on reversed-phase ultra-high-performance liquid chromatography–high-resolution mass spectrometry (RP-UHPLC-HR-MS). The roots of A. elata were extracted by maceration, ultrasound-assisted extraction (UAE), and vibrocavitation-assisted extraction (VAE). Principal component analysis (PCA) revealed a clear separation of the extracts obtained with the three extraction methods employed with NADES1 (choline chloride/malic acid) and NADES2 (sorbitol/malic acid/water). Based on the results of hierarchical clustering analysis of the normalized relative abundances of individual metabolites and further statistical evaluation with the t-test, NADES1 showed superior extraction efficiency for all the protocols addressed. Therefore, this NADES was selected to compare the efficiencies of the three extraction methods in more detail. PCA followed by the t-test yielded only 3 metabolites that were more efficiently extracted by maceration, whereas 46 compounds were more abundant in the extracts obtained by VAE. When VAE and UAE were compared, 108 metabolites appeared more abundant in the extracts obtained by VAE, whereas only 1 metabolite was more efficiently recovered by UAE. These facts clearly indicate the advantage of the VAE method over maceration and UAE. Seven of the twenty-seven metabolites tentatively identified by tandem mass spectrometry (MS/MS) were found in the roots of A. elata for the first time. Additional studies are necessary to understand the applicability of VAE to the extraction of other plant materials.
(This article belongs to the Section Natural Products)
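
The statistical workflow in this abstract (note that VAE here means vibrocavitation-assisted extraction, not a variational autoencoder) is PCA to confirm that extracts separate by method, then per-metabolite t-tests to count compounds favored by each method. A hedged sketch, with random numbers standing in for normalized metabolite abundances:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
vae = rng.lognormal(size=(5, 200))                 # 5 replicates x 200 metabolites (VAE)
uae = rng.lognormal(size=(5, 200))                 # same for UAE

# PCA on the pooled abundance matrix to inspect separation by method
scores = PCA(n_components=2).fit_transform(np.vstack([vae, uae]))

# One t-test per metabolite; a real workflow would add multiple-testing correction
t, p = ttest_ind(vae, uae, axis=0)
more_in_vae = int(((p < 0.05) & (t > 0)).sum())
print(f"PCA score matrix: {scores.shape}; metabolites favoring VAE: {more_in_vae}")
```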

11 pages, 2769 KB  
Communication
Fast Jukebox: Accelerating Music Generation with Knowledge Distillation
by Michel Pezzat-Morales, Hector Perez-Meana and Toru Nakashika
Appl. Sci. 2023, 13(9), 5630; https://doi.org/10.3390/app13095630 - 3 May 2023
Cited by 2 | Viewed by 3049
Abstract
The Jukebox model can generate high-diversity music within a single system, which is achieved by using a hierarchical VQ-VAE architecture to compress audio into a discrete space at different compression levels. Even though the results are impressive, the inference stage is tremendously slow. To address this issue, we propose Fast Jukebox, which uses different knowledge distillation strategies to reduce the number of parameters of the prior model for the compressed space. Since Jukebox has shown highly diverse audio generation capabilities, we used a simple compilation of songs for experimental purposes. Evaluation results based on emotional valence show that the proposed approach retains a tendency towards actively pleasant output while reducing inference time for all VQ-VAE levels without compromising quality.
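
Knowledge distillation of a prior over discrete codes, as used here to shrink the slow autoregressive stage, typically minimizes a KL divergence between the teacher's and student's token distributions. The sketch below assumes this standard soft-label formulation; the tiny models, temperature, and sizes are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D_t, D_s, tau = 2048, 512, 128, 2.0             # code vocab, dims, temperature
teacher = nn.Sequential(nn.Embedding(V, D_t), nn.Linear(D_t, V))  # large prior
student = nn.Sequential(nn.Embedding(V, D_s), nn.Linear(D_s, V))  # compact prior

codes = torch.randint(0, V, (4, 128))              # batch of VQ-VAE code sequences
with torch.no_grad():
    t_logits = teacher(codes)                      # teacher predictions (frozen)
s_logits = student(codes)

# KL between temperature-softened teacher and student distributions
kd_loss = F.kl_div(
    F.log_softmax(s_logits / tau, dim=-1),
    F.softmax(t_logits / tau, dim=-1),
    reduction="batchmean",
) * tau * tau
kd_loss.backward()                                 # updates only the student
```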

12 pages, 3795 KB  
Article
HierTTS: Expressive End-to-End Text-to-Waveform Using a Multi-Scale Hierarchical Variational Auto-Encoder
by Zengqiang Shang, Peiyang Shi, Pengyuan Zhang, Li Wang and Guangying Zhao
Appl. Sci. 2023, 13(2), 868; https://doi.org/10.3390/app13020868 - 8 Jan 2023
Cited by 6 | Viewed by 3369
Abstract
End-to-end text-to-speech (TTS) models that directly generate waveforms from text are gaining popularity. However, existing end-to-end models are still not natural enough in their prosodic expressiveness. Additionally, previous studies on improving the expressiveness of TTS have mainly focused on acoustic models, and there is a lack of research on enhancing expressiveness within an end-to-end framework. Therefore, we propose HierTTS, a highly expressive end-to-end text-to-waveform generation model. It deeply couples the hierarchical properties of speech with hierarchical variational auto-encoders and models multi-scale latent variables at the frame, phone, subword, word, and sentence levels. The hierarchical encoder encodes the speech signal from fine-grained features into coarse-grained latent variables, while the hierarchical decoder generates fine-grained features conditioned on the coarse-grained latent variables. We propose a staged KL-weighted annealing strategy to prevent hierarchical posterior collapse. Furthermore, we employ a hierarchical text encoder to extract linguistic information at different levels and apply it to both the encoder and the decoder. Experiments show that our model comes closer to natural speech in prosodic expressiveness and has better generative diversity.
(This article belongs to the Special Issue Audio, Speech and Language Processing)
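
A staged KL-weighted annealing schedule of the kind the abstract credits with preventing hierarchical posterior collapse might ramp each level's KL weight from 0 to 1, with coarser levels starting later so fine-grained latents are learned first. The stage lengths and offsets below are assumptions for illustration, not the paper's schedule.

```python
def kl_weight(step, level, warmup=10_000, stagger=5_000):
    """Linear 0-to-1 ramp for the KL term of one hierarchy level."""
    start = level * stagger                        # frame=0, phone=1, ... sentence=4
    return min(max((step - start) / warmup, 0.0), 1.0)

for level, name in enumerate(["frame", "phone", "subword", "word", "sentence"]):
    print(name, [round(kl_weight(s, level), 2) for s in (0, 5_000, 15_000, 30_000)])
```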

15 pages, 16165 KB  
Article
Deep Multi-Task Learning for an Autoencoder-Regularized Semantic Segmentation of Fundus Retina Images
by Ge Jin, Xu Chen and Long Ying
Mathematics 2022, 10(24), 4798; https://doi.org/10.3390/math10244798 - 16 Dec 2022
Cited by 3 | Viewed by 2342
Abstract
Automated segmentation of retinal blood vessels is necessary for the diagnosis, monitoring, and treatment planning of disease. Although current U-shaped models have achieved outstanding performance, several challenges remain due to the nature of this problem and the design of mainstream models. (1) There is no effective framework for obtaining and incorporating features with different spatial and semantic information at multiple levels. (2) Fundus retina images paired with high-quality blood vessel segmentations are relatively rare. (3) The information in edge regions, which are the most difficult parts to segment, has not received adequate attention. In this work, we propose a novel encoder–decoder architecture based on the multi-task learning paradigm to tackle these challenges. The shared image encoder is regularized by conducting a reconstruction task in the VQ-VAE (Vector Quantized Variational AutoEncoder) branch to improve its generalization ability. Meanwhile, hierarchical representations are generated and integrated to complement the input image. An edge attention module is designed to make the model capture edge-focused feature representations via deep supervision, focusing on the target edge regions that are most difficult to recognize. Extensive evaluations on three publicly accessible datasets demonstrate that the proposed model outperforms current state-of-the-art methods.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition with Applications)
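
The multi-task objective outlined above combines a vessel-segmentation loss, a reconstruction term from the VQ-VAE branch regularizing the shared encoder, and a deeply supervised edge loss. A hedged sketch of that composition; the loss weights and toy tensors standing in for the three heads' outputs are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

# Toy tensors standing in for the outputs of the three heads
seg_logits = torch.randn(2, 1, 128, 128, requires_grad=True)   # vessel head
recon = torch.randn(2, 3, 128, 128, requires_grad=True)        # VQ-VAE branch
edge_logits = torch.randn(2, 1, 128, 128, requires_grad=True)  # edge head

image = torch.rand(2, 3, 128, 128)
vessel_mask = torch.randint(0, 2, (2, 1, 128, 128)).float()
edge_mask = torch.randint(0, 2, (2, 1, 128, 128)).float()

loss = (F.binary_cross_entropy_with_logits(seg_logits, vessel_mask)
        + 0.5 * F.mse_loss(recon, image)           # autoencoder regularization
        + 0.3 * F.binary_cross_entropy_with_logits(edge_logits, edge_mask))
loss.backward()
```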

18 pages, 666 KB  
Article
A Transformer-Based Hierarchical Variational AutoEncoder Combined Hidden Markov Model for Long Text Generation
by Kun Zhao, Hongwei Ding, Kai Ye and Xiaohui Cui
Entropy 2021, 23(10), 1277; https://doi.org/10.3390/e23101277 - 29 Sep 2021
Cited by 11 | Viewed by 5728
Abstract
The Variational AutoEncoder (VAE) has made significant progress in text generation, but work to date has focused on short texts (typically single sentences). Long texts consist of multiple sentences, and there are particular relationships between those sentences, especially between the latent variables that control their generation. These relationships help in generating coherent, logically connected long texts, yet very few studies have examined them. We propose a method that combines a Transformer-based hierarchical Variational AutoEncoder with a Hidden Markov Model (HT-HVAE) to learn multiple hierarchical latent variables and their relationships, thereby improving long text generation. We use a hierarchical Transformer encoder to encode long texts in order to capture their hierarchical structure. HT-HVAE’s generation network uses an HMM to learn the relationships between latent variables. We also propose a method for calculating perplexity under this multiple hierarchical latent variable structure. Experimental results show that our model is more effective on datasets with strong logical structure, alleviates the notorious posterior collapse problem, and generates more coherent, logically connected long text.
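
The generative structure described above can be pictured as an HMM whose hidden states select the distribution of each sentence-level latent variable. A minimal sampling sketch; all sizes and the unit-variance Gaussians are assumed for illustration, not the paper's parameterization.

```python
import torch

K, D, n_sent = 4, 32, 5                            # HMM states, latent dim, sentences
trans = torch.softmax(torch.randn(K, K), dim=1)    # state transition matrix
means = torch.randn(K, D)                          # one Gaussian mean per state

state = torch.multinomial(torch.full((K,), 1 / K), 1).item()  # initial state
latents = []
for _ in range(n_sent):
    z = means[state] + torch.randn(D)              # sentence latent ~ N(mu_k, I)
    latents.append(z)                              # would condition the text decoder
    state = torch.multinomial(trans[state], 1).item()  # step the hidden chain
print(torch.stack(latents).shape)                  # (5, 32): one latent per sentence
```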

17 pages, 4401 KB  
Article
Self-Supervised Variational Auto-Encoders
by Ioannis Gatopoulos and Jakub M. Tomczak
Entropy 2021, 23(6), 747; https://doi.org/10.3390/e23060747 - 14 Jun 2021
Cited by 11 | Viewed by 4600
Abstract
Density estimation, compression, and data generation are crucial tasks in artificial intelligence. Variational Auto-Encoders (VAEs) constitute a single framework to achieve these goals. Here, we present a novel class of generative models, called self-supervised Variational Auto-Encoders (selfVAE), which utilize deterministic and discrete transformations of data. This class of models allows both conditional and unconditional sampling while simplifying the objective function. First, we use a single self-supervised transformation as a latent variable, where the transformation is either downscaling or edge detection. Next, we consider a hierarchical architecture, i.e., multiple transformations, and we show its benefits compared to the VAE. The flexibility of selfVAE in data reconstruction finds a particularly interesting use case in data compression tasks, where we can trade off memory for better data quality and vice versa. We present the performance of our approach on three benchmark image datasets (CIFAR-10, Imagenette64, and CelebA).
(This article belongs to the Special Issue Probabilistic Methods for Deep Learning)
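
The central trick above is to treat a deterministic transformation of the data, such as downscaling, as an intermediate variable: the model first accounts for the coarse image, then for the detail on top of it. The sketch below shows only the conditional refinement stage with a toy stand-in decoder; in the actual model both p(y) and p(x | y) are VAEs, and the pooling choice here is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(8, 3, 32, 32)                       # CIFAR-10-sized batch
y = F.avg_pool2d(x, 2)                             # deterministic transform d(x)

# Stand-in for the conditional stage p(x | y): upsample the coarse image
# and let a (toy) decoder add back the fine detail.
refiner = nn.Conv2d(3, 3, 3, padding=1)
x_hat = refiner(F.interpolate(y, scale_factor=2, mode="nearest"))
loss = F.mse_loss(x_hat, x)                        # reconstruct detail given coarse
loss.backward()
```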

17 pages, 8810 KB  
Article
A Visual and VAE Based Hierarchical Indoor Localization Method
by Jie Jiang, Yin Zou, Lidong Chen and Yujie Fang
Sensors 2021, 21(10), 3406; https://doi.org/10.3390/s21103406 - 13 May 2021
Cited by 4 | Viewed by 3407
Abstract
Precise localization and pose estimation in indoor environments are commonly required in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such problems can be solved via visual-based localization using a pre-built 3D model. The increase in search space associated with large scenes can be overcome by first retrieving candidate images and subsequently estimating the pose. However, most current deep learning-based image retrieval methods require labeled data, which increases annotation costs and complicates data acquisition. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised variational autoencoder (VAE) with a visual Structure-from-Motion (SfM) approach to extract global and local features. During localization, global features are used for image retrieval at the level of the scene map to obtain candidate images, and local features are subsequently used to estimate the pose from 2D-3D matches between the query and candidate images. Only RGB images are used as input to the proposed localization system, which is both convenient and challenging. Experimental results show that the proposed method can localize images within 0.16 m and 4° on the 7-Scenes datasets, and achieves 32.8% within 5 m and 20° on the Baidu dataset. Furthermore, the proposed method achieves higher precision than other advanced methods.
(This article belongs to the Section Intelligent Sensors)
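
The retrieval stage described above amounts to embedding every image with the unsupervised VAE encoder and ranking candidates by similarity to the query before the finer 2D-3D pose step. A sketch, with random vectors standing in for the VAE latent embeddings and the dimensionality assumed:

```python
import torch
import torch.nn.functional as F

db = F.normalize(torch.randn(1000, 128), dim=1)    # latents of mapped scene images
query = F.normalize(torch.randn(1, 128), dim=1)    # latent of the query image

sims = query @ db.T                                # cosine similarity (unit vectors)
topk = sims.topk(5, dim=1).indices                 # candidates for the 2D-3D pose step
print(topk)
```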
