Search Results (59)

Search Parameters:
Keywords = CelebA

34 pages, 28662 KB  
Article
Template-Driven Multimodal Face Pseudonymization for Privacy-Preserving Big Data Analytics
by Yeong Su Lee, Hendrik Bothe and Michaela Geierhos
Algorithms 2026, 19(3), 176; https://doi.org/10.3390/a19030176 - 26 Feb 2026
Viewed by 322
Abstract
Profile images from social networks are a valuable source of data for AI analytics, but they contain biometric identifiers that pose serious privacy risks. The current face anonymization techniques often destroy semantic information, and generative de-identification methods are vulnerable to re-identification attacks. In this paper, we propose a template-driven multimodal face pseudonymization framework that allows for the privacy-preserving analysis of facial image data while retaining analytically relevant attributes. Our approach uses a FaceNet-based CelebA attribute classifier to extract fine-grained facial attributes and a DeepFace model to extract high-level demographic attributes. Rather than relying on stochastic large language models, we introduce deterministic template-based attribute-to-text conversion to ensure consistency and reproducibility and prevent unintended attribute hallucination. The resulting textual description serves as the sole conditioning input for Janus-Pro, a multimodal text-to-image generation model that synthesizes realistic yet non-identifiable face images. We evaluate our method on the CelebA dataset under a strong adversarial threat model, employing state-of-the-art face recognition systems to assess re-identification and linkability attacks. Our results demonstrate a substantial reduction in identity leakage while preserving semantic attributes. Full article
(This article belongs to the Special Issue Blockchain and Big Data Analytics: AI-Driven Data Science)
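The deterministic attribute-to-text step described above can be sketched in a few lines. The attribute names and template wording below are hypothetical illustrations, not the templates used in the paper:

```python
# Hedged sketch of deterministic template-based attribute-to-text conversion.
# Attribute keys and phrasing are invented for illustration.
def attributes_to_text(attrs):
    """Map extracted facial attributes to a fixed textual description."""
    parts = [f"a photo of a {attrs['age_group']} {attrs['gender']}"]
    if attrs.get("smiling"):
        parts.append("smiling")
    if attrs.get("eyeglasses"):
        parts.append("wearing eyeglasses")
    parts.append(f"with {attrs['hair_color']} hair")
    return ", ".join(parts)

desc = attributes_to_text({
    "age_group": "young", "gender": "woman",
    "smiling": True, "eyeglasses": False, "hair_color": "black",
})
print(desc)  # -> a photo of a young woman, smiling, with black hair
```

Because the mapping contains no sampling step, identical attribute vectors always yield identical descriptions, which is what makes the conditioning input reproducible.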

16 pages, 3272 KB  
Article
Enhancing Fairness Without Demographic Labels via Identifying and Mitigating Potential Biases
by Pilhyeon Lee and Sungho Park
Symmetry 2026, 18(2), 344; https://doi.org/10.3390/sym18020344 - 12 Feb 2026
Viewed by 348
Abstract
Asymmetries in data distributions and performance across subgroups can induce systematic unfairness in real-world systems. A variety of previous studies have significantly ameliorated the fairness of deep learning models; however, most of them require additional labels for sensitive attributes (e.g., ethnicity and gender). Since sensitive attributes often correspond to personal information, collecting such labels can be restricted and may raise privacy concerns. Although recent work has sought to address these issues by training a model without sensitive attribute labels, it has limitations: it assumes specific characteristics of the sensitive attributes and is validated only in simplistic, constrained environments. Therefore, we propose an Unsupervised Fairness-aware Framework (UFF) that trains a fair classification model without pre-defining the characteristics of the sensitive attributes. It includes branches that capture various types of biases and eliminates them through adversarial training. In various facial attribute classification scenarios on benchmark datasets (CelebA and UTK Face), the proposed method significantly enhances fairness without assuming specific characteristics of sensitive attributes. Moreover, we introduce g-FAT, a new metric that measures the generalized trade-off between classification accuracy and fairness. For example, on CelebA, our method reduces EO from 11.8 to 7.6 for malignant bias and from 15.6 to 9.6 for benign bias, while improving g-FAT from 80.7 to 84.9 and from 79.0 to 85.2, respectively. In terms of g-FAT, our method achieves the highest trade-off performance among the compared methods on the benchmarks. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Computer Vision and Artificial Intelligence)
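The EO numbers above are equalized-odds-style gaps. As a hedged sketch (the paper does not spell out its exact protocol here), one common formulation takes the larger of the true-positive-rate and false-positive-rate differences between two groups; the group predictions below are toy values:

```python
# Hedged sketch of an equalized-odds (EO) gap between two demographic groups.
# This is one common definition, not necessarily the paper's exact metric.
def rate(preds, labels, target):
    """Fraction of examples with label == target that were predicted positive."""
    hits = [p for p, y in zip(preds, labels) if y == target]
    return sum(p == 1 for p in hits) / len(hits) if hits else 0.0

def eo_gap(preds_a, labels_a, preds_b, labels_b):
    """Max difference in TPR and FPR across the two groups."""
    tpr_gap = abs(rate(preds_a, labels_a, 1) - rate(preds_b, labels_b, 1))
    fpr_gap = abs(rate(preds_a, labels_a, 0) - rate(preds_b, labels_b, 0))
    return max(tpr_gap, fpr_gap)

# Toy predictions for two groups: TPRs are 2/3 vs 1/2, FPRs both 0.
print(eo_gap([1, 1, 0, 0], [1, 1, 1, 0],
             [1, 0, 0, 0], [1, 1, 0, 0]))
```

A perfectly fair classifier under this definition has a gap of 0; adversarial debiasing methods like the one above aim to drive it down without hurting accuracy.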

18 pages, 52908 KB  
Article
M2UNet: A Segmentation-Guided GAN with Attention-Enhanced U2-Net for Face Unmasking
by Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem and Hyun-Soo Kang
Mathematics 2026, 14(3), 477; https://doi.org/10.3390/math14030477 - 29 Jan 2026
Cited by 1 | Viewed by 478
Abstract
Face unmasking is a critical task in image restoration, as masks conceal essential facial features like the mouth, nose, and chin. Current inpainting methods often struggle with structural fidelity when handling large-area occlusions, leading to blurred or inconsistent results. To address this gap, we propose the Masked-to-Unmasked Network (M2UNet), a segmentation-guided generative framework. M2UNet leverages a segmentation-derived mask prior to accurately localize occluded regions and employs a multi-scale, attention-enhanced generator to restore fine-grained facial textures. The framework focuses on producing visually and semantically plausible reconstructions that preserve the structural logic of the face. Evaluated on a synthetic masked-face dataset derived from CelebA, M2UNet achieves state-of-the-art performance with a PSNR of 31.3375 dB and an SSIM of 0.9576. These results significantly outperform recent inpainting methods while maintaining high computational efficiency. Full article
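PSNR, the fidelity metric reported above, is simple to compute directly. A minimal sketch for images with pixel values in [0, 1], flattened to lists (the toy pixels are illustrative):

```python
import math

# Hedged sketch: peak signal-to-noise ratio, the metric the paper reports
# (31.34 dB on its masked-face test set). Pixels here are toy values.
def psnr(img_a, img_b, peak=1.0):
    """PSNR in dB between two equal-length flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

print(round(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]), 2))  # -> 21.76
```

Higher is better; identical images give infinite PSNR, and each halving of the RMS error adds about 6 dB.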

18 pages, 10421 KB  
Article
A Deep Learning Framework with Multi-Scale Texture Enhancement and Heatmap Fusion for Face Super Resolution
by Bing Xu, Lei Wang, Yanxia Wu, Xiaoming Liu and Lu Gan
AI 2026, 7(1), 20; https://doi.org/10.3390/ai7010020 - 9 Jan 2026
Viewed by 841
Abstract
Face super-resolution (FSR) has made great progress thanks to deep learning and facial priors. However, many existing methods do not fully exploit landmark heatmaps and lack effective multi-scale texture modeling, which often leads to texture loss and artifacts under large upscaling factors. To address these problems, we propose a Multi-Scale Residual Stacking Network (MRSNet), which integrates multi-scale texture enhancement with multi-stage heatmap fusion. The MRSNet is built upon Residual Attention-Guided Units (RAGUs) and incorporates a Face Detail Enhancer (FDE), which applies edge, texture, and region branches to achieve differentiated enhancement across facial components. Furthermore, we design a Multi-Scale Texture Enhancement Module (MTEM) that employs progressive average pooling to construct hierarchical receptive fields and employs heatmap-guided attention for adaptive texture refinement. In addition, we introduce a multi-stage heatmap fusion strategy that injects landmark priors into multiple phases of the network, including feature extraction, texture enhancement, and detail reconstruction, enabling deep sharing and progressive integration of prior knowledge. Extensive experiments on CelebA and Helen demonstrate that the proposed method achieves superior detail recovery and generates perceptually realistic high-resolution face images. Both quantitative and qualitative evaluations confirm that our approach outperforms state-of-the-art methods. Full article

15 pages, 3387 KB  
Article
Automatic Apparent Nasal Index from Single Facial Photographs Using a Lightweight Deep Learning Pipeline: A Pilot Study
by Babak Saravi, Lara Schorn, Julian Lommen, Max Wilkat, Andreas Vollmer, Hamza Eren Güzel, Michael Vollmer, Felix Schrader, Christoph K. Sproll, Norbert R. Kübler and Daman D. Singh
Medicina 2025, 61(11), 1922; https://doi.org/10.3390/medicina61111922 - 27 Oct 2025
Viewed by 1404
Abstract
Background and Objectives: Quantifying nasal proportions is central to facial plastic and reconstructive surgery, yet manual measurements are time-consuming and variable. We sought to develop a simple, reproducible deep learning pipeline that localizes the nose in a single frontal photograph and automatically computes the two-dimensional, photograph-derived apparent nasal index (aNI = width/height × 100), enabling classification into five standard anthropometric categories. Materials and Methods: From CelebA we curated 29,998 high-quality near-frontal images (training 20,998; validation 5999; test 3001). Nose masks were manually annotated with the VGG Image Annotator and rasterized to binary masks. Ground-truth aNI was computed from the mask's axis-aligned bounding box. A lightweight one-class YOLOv8n detector was trained to localize the nose; predicted aNI was computed from the detected bounding box. Performance was assessed on the held-out test set using detection coverage and mAP, agreement metrics between detector- and mask-based aNI (MAE, RMSE, R²; Bland–Altman), and five-class classification metrics (accuracy, macro-F1). Results: The detector returned at least one accepted nose box in 3000/3001 test images (99.97% coverage). Agreement with ground truth was strong: MAE 3.04 nasal index units (95% CI 2.95–3.14), RMSE 4.05, and R² 0.819. Bland–Altman analysis showed a small negative bias (−0.40, 95% CI −0.54 to −0.26) with limits of agreement −8.30 to 7.50 (95% CIs −8.54 to −8.05 and 7.25 to 7.74). After excluding out-of-range cases (<40.0), five-class classification on n = 2976 images achieved macro-F1 0.705 (95% CI 0.608–0.772) and 80.7% accuracy; errors were predominantly adjacent-class swaps, consistent with the small aNI error. Additional analyses confirmed strong ordinal agreement (weighted κ = 0.71 linear, 0.78 quadratic; Spearman ρ = 0.76) and near-perfect adjacent-class accuracy (0.999); performance remained stable when thresholds were shifted ±2 NI units and across sex and age subgroups. Conclusions: A compact detector can deliver near-universal nose localization and accurate automatic estimation of the nasal index from a single photograph, enabling reliable five-class categorization without manual measurements. The approach is fast, reproducible, and promising as a calibrated decision-support adjunct for surgical planning, outcomes tracking, and large-scale morphometric research. Full article
(This article belongs to the Special Issue Recent Advances in Plastic and Reconstructive Surgery)
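The aNI itself is just the bounding-box aspect ratio scaled by 100. A hedged sketch, using the conventional anthropometric cut-offs for the five categories (the study's exact bins and handling of the <40.0 exclusion may differ):

```python
# Hedged sketch: apparent nasal index from a nose bounding box, plus the
# conventional five anthropometric categories. Thresholds are the standard
# textbook cut-offs, which may not match the study's exact bins.
def apparent_nasal_index(box):
    """box = (x_min, y_min, x_max, y_max) in pixels; aNI = width/height * 100."""
    width, height = box[2] - box[0], box[3] - box[1]
    return width / height * 100

def classify_ni(ni):
    for upper, name in [(54.9, "hyperleptorrhine"), (69.9, "leptorrhine"),
                        (84.9, "mesorrhine"), (99.9, "platyrrhine")]:
        if ni <= upper:
            return name
    return "hyperplatyrrhine"

ni = apparent_nasal_index((100, 80, 160, 170))  # 60 px wide, 90 px tall
print(round(ni, 1), classify_ni(ni))  # -> 66.7 leptorrhine
```

This also shows why most classification errors are adjacent-class swaps: a small error in the box dimensions only moves aNI slightly, so it can only cross into a neighboring bin.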

25 pages, 3263 KB  
Article
Combining MTCNN and Enhanced FaceNet with Adaptive Feature Fusion for Robust Face Recognition
by Sasan Karamizadeh, Saman Shojae Chaeikar and Hamidreza Salarian
Technologies 2025, 13(10), 450; https://doi.org/10.3390/technologies13100450 - 3 Oct 2025
Cited by 2 | Viewed by 2580
Abstract
Face recognition systems typically face real-world challenges such as facial pose, illumination, occlusion, and ageing that significantly impact recognition accuracy. In this paper, a robust face recognition system that uses Multi-task Cascaded Convolutional Networks (MTCNN) for face detection and alignment together with an enhanced FaceNet for facial embedding extraction is presented. The enhanced FaceNet uses attention mechanisms to achieve more discriminative facial embeddings, especially in challenging scenarios. In addition, an Adaptive Feature Fusion module combines identity-specific embeddings with contextual information such as pose, lighting, and the presence of masks, enhancing robustness and accuracy. Training takes place on the CelebA dataset, and testing is conducted independently on LFW and IJB-C to enable subject-disjoint evaluation. CelebA has over 200,000 faces of 10,177 individuals, LFW consists of 13,000+ faces of 5749 individuals in unconstrained conditions, and IJB-C has 31,000 faces and 117,000 video frames with extreme pose and occlusion variations. The system introduced here achieves 99.6% on CelebA, 94.2% on LFW, and 91.5% on IJB-C, outperforming baselines such as plain MTCNN-FaceNet and AFF-Net, as well as state-of-the-art models such as ArcFace, CosFace, and AdaCos. These findings demonstrate that the proposed framework generalizes effectively between datasets and is resilient in real-world scenarios. Full article
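Downstream of a detector and a FaceNet-style encoder, verification typically reduces to an embedding-distance test. A hedged sketch with toy vectors (the 0.7 threshold and the embeddings are illustrative, not values from the paper):

```python
import math

# Hedged sketch: embedding-based face verification as typically done after an
# MTCNN-style detector and FaceNet-style encoder. Toy embeddings, toy threshold.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_identity(emb_a, emb_b, threshold=0.7):
    """Declare a match when the two embeddings are sufficiently aligned."""
    return cosine_similarity(emb_a, emb_b) >= threshold

print(same_identity([0.9, 0.1, 0.4], [0.8, 0.2, 0.5]))  # similar vectors match
```

The attention and fusion modules described above change how the embeddings are produced, not this final comparison step.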

25 pages, 6911 KB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Cited by 2 | Viewed by 4843
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Our experiments on the two benchmark datasets (Places2 and CelebA) have demonstrated that our framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content. Full article

31 pages, 70417 KB  
Article
Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders
by Yubo Wang and Gaofeng Zhang
Electronics 2025, 14(11), 2185; https://doi.org/10.3390/electronics14112185 - 28 May 2025
Viewed by 2587
Abstract
Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Conditional Variational Autoencoder (CVAE), leveraging CLIP embeddings for semantic guidance and enabling explicit attribute control, thereby reducing computational load and data dependency. Key to our approach is a specialized mapping network that bridges CLIP text–image modalities for improved fidelity and Rényi divergence for latent space regularization to foster diversity, as evidenced by richer latent representations. Experiments on CelebA demonstrate competitive generation (FID: 40.53, 42 M params, 21 FPS) with enhanced diversity. Crucially, our model also shows effective generalization to the more complex MS COCO dataset and maintains a favorable balance between visual quality and efficiency (8 FPS at 256 × 256 resolution with 54 M params). Ablation studies and component validations (detailed in appendices) confirm the efficacy of our contributions. This work offers a practical, efficient T2I solution that balances generative performance with resource constraints across different datasets and is suitable for specialized, data-limited domains. Full article
(This article belongs to the Special Issue Big Model Techniques for Image Processing)

27 pages, 11612 KB  
Article
FACDIM: A Face Image Super-Resolution Method That Integrates Conditional Diffusion Models with Prior Attributes
by Jianhua Ren, Yuze Guo and Qiangkui Leng
Electronics 2025, 14(10), 2070; https://doi.org/10.3390/electronics14102070 - 20 May 2025
Viewed by 1715
Abstract
Facial image super-resolution seeks to reconstruct high-quality details from low-resolution inputs, yet traditional methods, such as interpolation, convolutional neural networks (CNNs), and generative adversarial networks (GANs), often fall short, suffering from insufficient realism, loss of high-frequency details, and training instability. Furthermore, many existing models inadequately incorporate facial structural attributes and semantic information, leading to semantically inconsistent generated images. To overcome these limitations, this study introduces an attribute-prior conditional diffusion implicit model that enhances the controllability of super-resolution generation and improves detail restoration capabilities. Methodologically, the framework consists of four components: a pre-super-resolution module, a facial attribute extraction module, a global feature encoder, and an enhanced conditional diffusion implicit model. Specifically, low-resolution images are subjected to preliminary super-resolution and attribute extraction, followed by adaptive group normalization to integrate feature vectors. Additionally, residual convolutional blocks are incorporated into the diffusion model to utilize attribute priors, complemented by self-attention mechanisms and skip connections to optimize feature transmission. Experiments conducted on the CelebA and FFHQ datasets demonstrate that the proposed model achieves an increase of 2.16 dB in PSNR and 0.08 in SSIM under an 8× magnification factor compared to SR3, with the generated images displaying more realistic textures. Moreover, manual adjustment of attribute vectors allows for directional control over generation outcomes (e.g., modifying facial features or lighting conditions), ensuring alignment with anthropometric characteristics. This research provides a flexible and robust solution for high-fidelity face super-resolution, offering significant advantages in detail preservation and user controllability. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)

16 pages, 2542 KB  
Article
The Eyes: A Source of Information for Detecting Deepfakes
by Elisabeth Tchaptchet, Elie Fute Tagne, Jaime Acosta, Danda B. Rawat and Charles Kamhoua
Information 2025, 16(5), 371; https://doi.org/10.3390/info16050371 - 30 Apr 2025
Cited by 3 | Viewed by 3692
Abstract
Currently, the phenomenon of deepfakes is becoming increasingly significant, as deep learning tools based on generative adversarial networks (GANs) enable the creation of extremely realistic images capable of deceiving anyone. These images are used as profile pictures on social media with the intent to sow discord and perpetrate scams on a global scale. In this study, we demonstrate that these images can be identified through various imperfections present in the synthesized eyes, such as the irregular shape of the pupil and the difference between the corneal reflections of the two eyes. These defects result from the absence of physical and physiological constraints in most GAN models. We develop a two-level architecture capable of detecting these fake images. This approach begins with an automatic segmentation method for the pupils to verify their shape, as real pupils naturally have a regular, typically round shape. Next, for all images where the pupils are not regular, the entire image is analyzed to verify the reflections. This step involves passing the facial image through an architecture that extracts and compares the specular reflections of the corneas of the two eyes, assuming that the eyes of real people observing a light source should reflect the same thing. Our experiments with a large dataset of real images from the Flickr-Faces-HQ and CelebA datasets, as well as fake images from StyleGAN2 and ProGAN, show the effectiveness of our method. Our experimental results on the Flickr-Faces-HQ (FFHQ) dataset and images generated by StyleGAN2 demonstrated that our algorithm achieved a remarkable detection accuracy of 0.968 and a sensitivity of 0.911. Additionally, the method had a specificity of 0.907 and a precision of 0.90 for this same dataset. Our experimental results on the CelebA dataset and images generated by ProGAN likewise demonstrated that our algorithm achieved a detection accuracy of 0.870 and a sensitivity of 0.901. Moreover, the method had a specificity of 0.807 and a precision of 0.88 for this same dataset. Our approach maintains good stability of physiological properties during deep learning, making it as robust as some single-class deepfake detection methods. The results of the tests on the selected datasets demonstrate higher accuracy compared to other methods. Full article
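The pupil-shape cue above is often quantified with circularity, 4πA/P², which equals 1 for a perfect circle and drops for irregular shapes. A hedged sketch (the paper's segmentation pipeline and decision rule are more involved, and the 0.9 threshold is illustrative):

```python
import math

# Hedged sketch: flagging irregular pupils via circularity = 4*pi*A / P^2.
# A real pipeline would first segment the pupil to obtain area and perimeter;
# the 0.9 threshold here is an illustrative choice, not the paper's.
def circularity(area, perimeter):
    """1.0 for a perfect circle, strictly lower for any other shape."""
    return 4 * math.pi * area / perimeter ** 2

def pupil_is_regular(area, perimeter, threshold=0.9):
    return circularity(area, perimeter) >= threshold

r = 10.0  # a perfectly round pupil of radius 10 px
print(pupil_is_regular(math.pi * r * r, 2 * math.pi * r))  # circle -> True
print(pupil_is_regular(100.0, 40.0))  # square-ish region (pi/4) -> False
```

A square region of side s, for instance, gives 4πs²/(4s)² = π/4 ≈ 0.785, comfortably below the circle's 1.0, which is why GAN-generated non-round pupils stand out under this measure.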

20 pages, 15418 KB  
Article
An Explainable Deep Semantic Coding for Binary-Classification- Oriented Communication
by Shuhui Wang, Zuxing Li, Xin Huang and Qi Jiang
Appl. Sci. 2025, 15(9), 4608; https://doi.org/10.3390/app15094608 - 22 Apr 2025
Cited by 1 | Viewed by 1428
Abstract
Semantic communication is emerging as a promising communication paradigm, where semantic coding plays an essential role by explicitly extracting task-critical information. Prior efforts toward semantic coding often rely on learning-based feature extraction methods but tend to overlook data compression and lack a rigorous theoretical foundation. To address these limitations, this paper proposes a novel explainable deep semantic coding framework, considering a binary mixed source and a classification task at the receiver. From an information-theoretic perspective, we formulate a semantic coding problem that jointly optimizes data compression rate and classification accuracy subject to distortion constraints. To solve this problem, we leverage deep learning techniques and variational approximation methods to develop practical deep semantic coding schemes. Experiments on the CelebA dataset and the CIFAR-10 dataset demonstrate that the proposed schemes effectively balance data compression and binary classification accuracy, which aligns with the theoretical formulation. Full article
(This article belongs to the Topic Innovation, Communication and Engineering)

16 pages, 5701 KB  
Article
Generating Human-Interpretable Rules from Convolutional Neural Networks
by Russel Pears and Ashwini Kumar Sharma
Information 2025, 16(3), 230; https://doi.org/10.3390/info16030230 - 16 Mar 2025
Viewed by 1646
Abstract
Advancements in the field of artificial intelligence have been rapid in recent years and have revolutionized various industries. Various deep neural network architectures capable of handling both text and images have been proposed, covering code generation from natural language as well as machine translation and text summarization. For example, convolutional neural networks (CNNs) perform image classification at a level equivalent to that of humans on many image datasets. These state-of-the-art networks have reached unprecedented levels of success by using complex architectures with billions of parameters, numerous kernel configurations, weight initialization, and regularization methods. Unfortunately, to reach this level of success, CNN models have become essentially black box in nature, with little or no human-interpretable information on the decision-making process. This lack of transparency in decision making gave rise to concerns among some sectors of the user community, such as healthcare, finance, justice, and defense. This challenge motivated our research, in which we successfully produced human-interpretable influential features from CNNs for image classification and captured the interactions between these features in a concise decision tree that makes the classification decisions. The proposed methodology makes use of a pretrained VGG-16 with fine-tuning to extract feature maps produced by learnt filters. On the CelebA image benchmark dataset, we successfully produced human-interpretable rules that captured the main facial landmarks responsible for distinguishing men from women with 89.6% accuracy, while on the more challenging Cats vs. Dogs dataset, the decision tree achieved 87.6% accuracy. Full article
(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence, 2nd Edition)

21 pages, 5152 KB  
Article
GAGAN: Enhancing Image Generation Through Hybrid Optimization of Genetic Algorithms and Deep Convolutional Generative Adversarial Networks
by Despoina Konstantopoulou, Paraskevi Zacharia, Michail Papoutsidakis, Helen C. Leligou and Charalampos Patrikakis
Algorithms 2024, 17(12), 584; https://doi.org/10.3390/a17120584 - 19 Dec 2024
Cited by 6 | Viewed by 3778
Abstract
Generative Adversarial Networks (GANs) are highly effective for generating realistic images, yet their training can be unstable due to challenges such as mode collapse and oscillatory convergence. In this paper, we propose a novel hybrid optimization method that integrates Genetic Algorithms (GAs) to improve the training process of Deep Convolutional GANs (DCGANs). Specifically, GAs are used to evolve the discriminator’s weights, complementing the gradient-based learning typically employed in GANs. The proposed GAGAN model is trained on the CelebA dataset, using 2000 images, to generate 128 × 128 images, with the generator learning to produce realistic faces from random latent vectors. The discriminator, which classifies images as real or fake, is optimized not only through standard backpropagation, but also through a GA framework that evolves its weights via crossover, mutation, and selection processes. This hybrid method aims to enhance convergence stability and boost image quality by balancing local search from gradient-based methods with the global search capabilities of GAs. Experiments show that the proposed approach reduces generator loss and improves image fidelity, demonstrating that evolutionary algorithms can effectively complement deep learning techniques. This work opens new avenues for optimizing GAN training and enhancing performance in generative models. Full article
(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)
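The GA side of the hybrid loop can be sketched with flat weight vectors. The fitness function below is a stand-in for the discriminator's evaluation, and the operator rates are illustrative, not the paper's settings:

```python
import random

# Hedged sketch: the GA operators (selection, crossover, mutation) applied to
# flat weight vectors, in the spirit of evolving discriminator weights.
# toy_fitness stands in for an actual discriminator evaluation.
def crossover(parent_a, parent_b):
    """Single-point crossover of two equal-length weight vectors."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(weights, rate=0.1, scale=0.05):
    """Perturb each weight with small Gaussian noise at the given rate."""
    return [w + random.gauss(0, scale) if random.random() < rate else w
            for w in weights]

def evolve(population, fitness, keep=2):
    """Keep the fittest individuals and refill with mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:keep]
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(len(population) - keep)]
    return parents + children

def toy_fitness(w):
    return -sum(x * x for x in w)  # stand-in for discriminator performance

pop = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
next_gen = evolve(pop, toy_fitness)
print(len(next_gen), len(next_gen[0]))  # population size and weight dim preserved
```

In the hybrid scheme, a step like `evolve` would alternate with ordinary backpropagation updates, so the GA's global search complements gradient descent's local search.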

15 pages, 458 KB  
Article
Facial Anti-Spoofing Using “Clue Maps”
by Liang Yu Gong, Xue Jun Li and Peter Han Joo Chong
Sensors 2024, 24(23), 7635; https://doi.org/10.3390/s24237635 - 29 Nov 2024
Viewed by 2513
Abstract
Spoofing attacks (or presentation attacks) are easy to mount against facial recognition systems, making online financial systems vulnerable. It is therefore urgent to develop an anti-spoofing solution with superior generalization ability, given the high demand for spoofing attack detection. Although multi-modality methods, such as combining depth images with RGB images, and feature fusion methods currently perform well on certain datasets, the cost of obtaining depth information and physiological signals, especially biological signals, is relatively high. This paper proposes a representation learning method built on an Auto-Encoder structure based on Swin Transformer and ResNet, and applies cross-entropy loss, semi-hard triplet loss, and smooth L1 pixel-wise loss to supervise model training. The architecture contains three parts: an Encoder, a Decoder, and an auxiliary classifier. The Encoder effectively extracts features with patch correlations, and the Decoder generates universal “Clue Maps” for further contrastive learning. Finally, the auxiliary classifier assists the model in making the decision, which is treated as a preliminary result. In addition, extensive experiments evaluated Attack Presentation Classification Error Rate (APCER), Bona Fide Presentation Classification Error Rate (BPCER), and Average Classification Error Rate (ACER) on popular spoofing databases (CelebA, OULU, and CASIA-MFSD) against several existing anti-spoofing models; our approach outperforms them, reaching 1.2% and 1.6% ACER in intra-dataset experiments. Moreover, the inter-dataset evaluation with CASIA-MFSD as the training set and Replay-Attack as the testing set reaches a new state-of-the-art performance with a 23.8% Half Total Error Rate (HTER). Full article
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)
12 pages, 7654 KB  
Article
Memorizing Swin-Transformer Denoising Network for Diffusion Model
by Jindou Chen and Yiqing Shen
Electronics 2024, 13(20), 4050; https://doi.org/10.3390/electronics13204050 - 15 Oct 2024
Cited by 2 | Viewed by 3974
Abstract
Diffusion models have garnered significant attention in the field of image generation. However, existing denoising architectures such as U-Net are limited in capturing global context, while Vision Transformers (ViTs) may struggle with local receptive fields. To address these challenges, we propose a [...] Read more.
Diffusion models have garnered significant attention in the field of image generation. However, existing denoising architectures such as U-Net are limited in capturing global context, while Vision Transformers (ViTs) may struggle with local receptive fields. To address these challenges, we propose a novel Swin-Transformer-based denoising network architecture that leverages the strengths of both U-Net and ViT. Moreover, our approach integrates a k-Nearest-Neighbor (kNN)-based memorizing attention module into the Swin Transformer, enabling it to effectively harness crucial contextual information from feature maps and enhance its representational capacity. Finally, we introduce an innovative hierarchical time-stream embedding scheme that optimizes the incorporation of temporal cues during the denoising process. This scheme surpasses basic approaches such as simple addition or concatenation of fixed time embeddings, facilitating a more effective fusion of temporal information. Extensive experiments on four benchmark datasets demonstrate the superior performance of our proposed model over U-Net and ViT as denoising networks. Our model outperforms the baselines on the CRC-VAL-HE-7K and CelebA datasets, achieving improved FID scores of 14.39 and 4.96, respectively, and even surpasses DiT and UViT under our experimental setting. The Memorizing Swin-Transformer architecture, coupled with the hierarchical time-stream embedding, sets a new state of the art in denoising diffusion models for image generation. Full article
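The kNN-based memorizing attention described above restricts each query to attend only over its k most similar keys rather than the full key set. A minimal NumPy sketch of this retrieval-restricted attention follows; the function name and the loop-based top-k selection are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def knn_attention(q, k, v, knn=4):
    # q: (n_queries, d); k, v: (n_keys, d).
    # Each query attends only over its `knn` highest-scoring keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])          # scaled dot-product scores
    out = np.zeros_like(q, dtype=float)
    for i in range(q.shape[0]):
        idx = np.argsort(scores[i])[-knn:]           # indices of the k nearest keys
        w = np.exp(scores[i, idx] - scores[i, idx].max())
        w /= w.sum()                                 # softmax over retrieved keys only
        out[i] = w @ v[idx]
    return out
```

With `knn` equal to the number of keys this reduces to standard softmax attention; smaller values sparsify the attention pattern, which is the memory-retrieval behavior the module exploits to focus on the most relevant contextual features.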