Review

Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis

1 Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
2 Computer Science Department, College of Computer Science and Information Technology, Imam Abdulrahman bin Faisal University, Dammam 31441, Saudi Arabia
3 SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3623; https://doi.org/10.3390/app15073623
Submission received: 22 December 2024 / Revised: 19 March 2025 / Accepted: 21 March 2025 / Published: 26 March 2025

Abstract:
Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in manual design. This paper provides a comprehensive review of NAS methods applied to GANs, categorizing and comparing various approaches based on criteria such as search strategies, evaluation metrics, and performance outcomes. The review highlights the benefits of NAS in improving GAN performance, stability, and efficiency, while also identifying limitations and areas for future research. Key findings include the superiority of evolutionary algorithms and gradient-based methods in certain contexts, the importance of robust evaluation metrics beyond traditional scores like Inception Score (IS) and Fréchet Inception Distance (FID), and the need for diverse datasets in assessing GAN performance. By presenting a structured comparison of existing NAS-GAN techniques, this paper aims to guide researchers in developing more effective NAS methods and advancing the field of GANs.

1. Introduction

1.1. Background

Data insufficiency poses a recurring challenge for many classification tasks, hampering both model development and validation. With limited data, models can struggle to learn robust feature representations, leading to poor generalization on unseen data [1,2]. This problem is compounded when working with complex models like deep neural networks that have high capacity and are prone to overfitting [3]. To address data scarcity, recent advancements have turned to data augmentation (DA), which involves creating new training samples from existing ones. DA is a powerful technique for addressing both limited datasets and class imbalance. By generating additional examples from rare classes, oversampling with DA can rebalance class distributions. Moreover, augmentation helps reduce overfitting when training deep neural networks (DNNs) by expanding the training set with transformed versions of existing samples. This exposes models to greater diversity without needing to collect new data.
DA techniques have been significantly empowered using generative adversarial networks (GANs) [1]. GANs can learn complex data distributions and generate realistic synthetic samples for augmentation. They have become a pivotal DA tool, especially in areas like medical imaging, anomaly detection, and text-to-image generation [4,5,6]. GAN-based DA is quickly becoming indispensable for robust deep learning with scarce data across modalities like images, video, audio, and text [1,6,7].

1.2. Motivation

Developing GANs poses considerable challenges. Like many deep learning techniques, GANs require meticulous hyperparameter tuning and network architecture selection to optimize results [8]. The model hyperparameters and structure heavily influence the final generated samples. Manually tuning these complex generative models with dozens of hyperparameters is often tedious and suboptimal, and requires expert knowledge. Additionally, the simultaneous training of the generator and discriminator networks makes GANs notoriously unstable and difficult to converge [2]. Training can also reach misleading states in which apparent progress reflects a discriminator that can no longer properly judge the quality of the generated images, rather than a generator that produces genuinely high-quality samples. This imbalance prevents effective adversarial learning. Moreover, problems like mode collapse may arise, where the generator lacks diversity and produces limited varieties of samples. Careful regularization and architectural choices are needed to promote generator diversity [9]. Researchers have put considerable effort into manually improving GAN architectures, but this process demands significant expertise.
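To make the source of this instability concrete, recall the original GAN objective, a two-player minimax game between the generator G and discriminator D:
\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
\]
Because G and D are updated alternately on this saddle-point problem rather than jointly minimizing a single loss, small imbalances between the two players can derail convergence, which is what motivates the careful architectural and regularization choices discussed above.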
Recently, Neural Architecture Search (NAS) has emerged as an effective tool for automatically discovering superior models across various tasks, including GANs [10,11]. Early attempts at applying NAS to GANs focused solely on optimizing the generator while keeping the discriminator fixed, to simplify the search process [12]. However, this approach may result in suboptimal GANs. More recent studies have attempted to search for both generator and discriminator architectures simultaneously, but they face challenges due to the inherent instability of GAN training [13,14].

1.3. Objectives

In our research, we have developed a framework to categorize and compare different techniques based on a set of key criteria identified through an extensive review of existing methods. Using this framework, we conducted a critical analysis and comparison of the various NAS-GAN techniques currently available in the literature. This assessment not only allowed us to evaluate existing approaches but also highlighted areas for potential future research.
The remainder of this paper is structured as follows: Section 2 reviews related work. In Section 3, we describe the research methodology and outline the research questions. Section 4 provides the analysis, results, and responses to the research questions. Section 5 discusses the study’s implications and proposes directions for future research. Section 6 addresses the potential threats to the validity of this study. Finally, Section 7 concludes the paper.

2. Related Studies

2.1. Overview of NAS-GAN Reviews

Despite the growing interest in applying Neural Architecture Search (NAS) to Generative Adversarial Networks (GANs), there have been limited comprehensive reviews on this specific topic. To date, only two papers have been published that specifically review NAS in GANs, and these are essentially one work and its continuation.
Ganepola and Wirasingha (2021) [15] conducted a comprehensive review of NAS techniques applied to GANs. The authors analyzed various approaches based on key components of NAS: search space, search strategy, and performance estimation strategy. They identified cell-based and chain/entire structure as the primary search space types, with reinforcement learning, gradient-based, and evolutionary algorithms as the main search strategies. The review focused on image generation and GAN model compression tasks, comparing different methods using metrics such as Inception Score (IS) and Fréchet Inception Distance (FID). The authors also discussed limitations and future directions for NAS in GANs, including potential applications in semantic image segmentation and high-resolution image synthesis. However, this survey was published in 2021 and may not include the most recent advancements in the field.
Buthgamumudalige and Wirasingha (2021) [16], published in the same year, provide a comprehensive review of NAS techniques applied to GANs for image generation tasks. The authors analyze various approaches using multiple criteria, including search space design, search strategy implementation, and performance evaluation methods. They examine systems that were not discussed in the previous paper, comparing them across several key dimensions: image generation quality, computational efficiency, transferability to different datasets, and support for supervised vs. unsupervised learning. Performance is evaluated using metrics like IS and FID on datasets such as CIFAR-10 and STL-10. The review also considers practical aspects such as GPU costs and training times. The paper highlights the progress made in automating GAN architecture design, noting improvements in image quality and search efficiency. However, it also identifies limitations in existing work, such as the narrow focus on specific datasets and image generation types. The review concludes by suggesting potential areas for future research, emphasizing the need for more comprehensive evaluation criteria, including performance on diverse datasets and exploration of conditional and semi-supervised image generation tasks.
In contrast, the survey by Wang et al. (2024) [17] presents an extensive analysis of evolutionary computation (EC) applied to GANs. It delves into technical aspects such as the design of mutation operators, the formulation of fitness functions based on metrics like inverted generational distance (IGD), and strategies to alleviate mode collapse through Pareto front approximations. The survey also examines scalability—demonstrating how evolved architectures can transfer from low-dimensional (n = 10) to high-dimensional (n = 784) settings—and discusses integrating EC with gradient-based methods to enhance convergence and robustness. Although its scope is broad, covering NAS, parameter tuning, loss function adaptation, and synchronization strategies, the survey is best categorized within the wider NAS reviews rather than a strictly NAS-GAN-focused review.

2.2. Other NAS Reviews with GAN Sections

Other reviews focused on NAS methods more broadly have included a dedicated section on GANs. Kang et al.'s (2023) [18] review focused on the application of NAS in computer vision tasks, dividing them into detection, segmentation, and generation. It presented prominent works that applied NAS to GANs for the generation task. On the other hand, White et al. (2023) [19] extensively reviewed NAS, garnering insights from 1000 papers. The GAN section of that review focused on identifying the best techniques used among the prominent works. As summarized in Table 1, these works either lacked comprehensive coverage of NAS-GAN methodologies or focused narrowly on specific tasks. In contrast, our review synthesizes advancements from 2021–2025, addressing gaps such as dataset diversity and reproducibility while maintaining a GAN-specific focus.

2.3. Identified Gaps in the Literature

In our review of related studies, we identified a noticeable gap in the literature regarding the application of NAS to GANs. Given the rapid advancements in computer hardware capabilities and the increasing number of publications in the field, there is a compelling need for an up-to-date comprehensive review. Current surveys often focus on narrow aspects of NAS applied to GANs, such as specific datasets or image generation techniques. This paper aims to address these gaps by providing a broader analysis of NAS techniques and their applications to GANs, highlighting limitations and future research opportunities.
To systematically differentiate our work from existing surveys, Table 2 compares our review with two foundational prior studies by Ganepola and Wirasingha (2021) [15] and Buthgamumudalige and Wirasingha (2021) [16].
This comparison underscores our review’s unique focus on understudied challenges in NAS-GANs, such as reproducibility, while expanding the scope of datasets and evaluation criteria. Subsequent sections detail these contributions through quantitative and qualitative analyses.

3. Research Methodology

In this study, we present a comprehensive review of NAS techniques for GANs. We based our methodology on the guidelines proposed by Kitchenham [20] and have applied strict quality assessment criteria. However, we note that this work is not a systematic review in the strict sense (e.g., following PRISMA guidelines) and does not fully adhere to all the reproducibility standards of a systematic review.

3.1. Study Objectives and Research Questions

With the rapid advancement of AI and DL, GANs have emerged as powerful tools for generating realistic data across various domains. As the complexity of GAN architectures grows, researchers have been increasingly interested in applying NAS to automate and optimize the design of GANs, allowing AI practitioners to focus on higher-level problems rather than manually crafting network architectures, which can be time-consuming and suboptimal [2]. Thus, developing effective NAS methods for GANs could significantly enhance the quality and efficiency of generative models.
The main objectives of this study are to review and analyze the state-of-the-art NAS approaches for GANs, their techniques, and evaluation metrics. We conduct a comprehensive survey of the current landscape, focusing on the main types of NAS-GAN approaches and identifying the important criteria of each type, such as the underlying search space, techniques, search objectives, and evaluation metrics. Moreover, this study aims to identify gaps in the existing literature and highlight promising future directions to further advance research in this area.
For the purposes of this study, we developed comparison frameworks and applied them to a set of prominent studies. The research questions that this review aims to answer are as follows:
  • RQ1: What NAS approaches are applied to GANs in the literature?
    To address this question, we identified the approaches present in the literature, highlighting both their benefits and limitations. By systematically reviewing existing methods, we were able to provide a comprehensive analysis that underscores the strengths of each approach while also acknowledging potential drawbacks and areas for improvement.
  • RQ2: What are the key search spaces explored in NAS-GAN?
    To address this research question, we examined the search spaces utilized in the studied approaches, exploring their applications and additional relevant aspects. This investigation provided insights into how different search spaces are leveraged within the context of the approaches, highlighting their effectiveness and areas for potential enhancement.
  • RQ3: What evaluation methods are used to assess the found architecture?
    To address this research question, we identified the metrics employed to evaluate NAS-GAN approaches and assessed their applicability. This analysis provided a detailed overview of the evaluation criteria used in the studies, examining their effectiveness in measuring the performance and suitability of the NAS-GAN approaches in various contexts.
  • RQ4: What are the gaps in the research on NAS in GANs?
    To address this research question, we studied and analyzed the relevant literature, providing a comprehensive review of existing studies and identifying areas for future research. This approach allowed us to suggest potential directions for future work based on the gaps and limitations identified in the current body of knowledge.

3.2. Search Strategy

To support our study and answer our research questions, we utilized multiple information sources, focusing exclusively on scientific literature. We gathered relevant studies from key literature search engines and databases, including Google Scholar, ACM, ScienceDirect, IEEE, ArXiv, and Springer. Additionally, to enhance our search and uncover more pertinent studies that may not have appeared in our initial searches, we employed both backward and forward snowballing techniques. This comprehensive approach ensured that we captured a wide array of related studies, providing a robust foundation for our research analysis. The annual distribution of the chosen literature is shown in Figure 1.
We employed the following search strings to identify relevant studies:
  • In Google Scholar, IEEE, ACM, and Springer: (“Generative Adversarial Network” OR “GAN*”) AND (“Architecture” OR “Architectural”) AND (“Search” OR “Optimization”) AND (“Reinforcement Learning” OR “Policy” OR “Evolutionary” OR “Evolutionary Algorithm*” OR “Genetic Algorithm*” OR “Differential” OR “Gradient-based”)
  • In ArXiv and ScienceDirect (we changed the search string because of limitations in their search tools): (“Generative Adversarial Network” OR “GAN”) AND (“Architecture” OR “Architectural”) AND (“Search” OR “Optimization”) AND (“Reinforcement Learning” OR “Policy” OR “Evolutionary” OR “Evolutionary Algorithm” OR “Genetic Algorithm” OR “Differential” OR “Gradient-based”).

3.3. Study Selection and Quality Assessment

We selected studies through these steps:
1. Initial selection: We searched each database mentioned in Section 3.2 and made a first selection of studies based on their titles.
2. Filtering studies: To find the most relevant studies from our initial collection, we used our quality assessment criteria. We also examined the abstract, introduction, and conclusion of each study to filter them further.
3. Merging: After filtering, we had a group of studies relevant to our research. Some were duplicates because of overlaps in database results; we combined all the studies into one set, removing duplicates.
4. Snowballing: To find more related studies and ensure nothing was missed, we used backward and forward snowballing. Backward snowballing means looking at a study’s reference list to find new papers [20]; forward snowballing means finding new papers that cite the study under consideration [20]. These processes helped us add papers not found in our initial search.
5. Final decision: After adding the new studies from snowballing, we filtered the set one more time to obtain our final group of relevant studies.
We applied a structured quality assessment using explicit inclusion and exclusion criteria to select the studies. The criteria were defined as follows:
1. Inclusion Criteria:
   (a) Studies must be written in English.
   (b) Studies must focus primarily on architecture search for GANs, providing clear details on the NAS techniques employed.
   (c) Only peer-reviewed articles, conference papers, and reputable preprints with sufficient methodological detail were considered.
2. Exclusion Criteria:
   (a) Studies that focus primarily on topics other than architecture search (e.g., hyperparameter tuning, latent space exploration, or unrelated GAN applications) were excluded.
   (b) Studies lacking sufficient technical detail or methodological transparency regarding the NAS process were not considered.
   (c) Duplicate studies identified across different databases were removed.

3.4. Limitation on Research Methodology

Although a systematic review methodology would further enhance transparency and reproducibility, our study emphasizes comprehensiveness over strict adherence to systematic protocols. To mitigate potential selection bias, we employed multiple literature databases along with backward and forward snowballing techniques, and we applied explicit inclusion and exclusion criteria. Nevertheless, we acknowledge that the possibility of selection bias cannot be entirely excluded, and future studies may benefit from a more formalized systematic review approach.
Additionally, while our review provides a qualitative synthesis of NAS-GAN methods, we have not performed a full statistical meta-analysis due to the heterogeneity in reported performance metrics (e.g., Inception Score, Fréchet Inception Distance, computational cost) across studies. We recognize that a quantitative meta-analysis could potentially offer deeper insights into the comparative effectiveness of these methods and recommend this as a direction for future research.

3.5. Data Extraction and Synthesis

To ensure the integrity of extracted data and facilitate efficient management of the extraction process, we developed a structured comparative framework to address the research questions. This framework encompasses specific attributes for each research question, categorized according to their relevance. The attributes are defined as follows:
  • RQ1:
    Search Strategy: This criterion examines the techniques and approaches employed in the solutions presented by relevant studies. These may include, but are not limited to RL, EA, and other methodologies prevalent in the existing literature.
    Search Type: This criterion identifies if the search strategy searches for both generator and discriminator networks, or if it only searches for a generator network.
    Performance Assessment Strategy: This criterion examines how the search strategy estimates its current performance to guide the search.
    GPU Cost: This criterion identifies the search speed of the solutions presented by relevant studies based on its GPU usage.
    Advantages: This criterion examines the positive outcomes and potential merits associated with each methodological approach discussed in the existing literature.
    Disadvantages: This criterion identifies and analyzes any constraints, shortcomings, or negative aspects (where applicable) of the investigated approach.
  • RQ2:
    Search Space: This criterion examines the type of search space used to encode the network component in the solutions presented by relevant studies. These may include, but are not limited to, Cell-based or Entire/Chain-Structured and other methodologies prevalent in the existing literature.
  • RQ3:
    Evaluation Metrics: This criterion examines evaluation measurements used to evaluate the performance of the solution presented by the relevant studies.
    Dataset: This criterion identifies the datasets used in the solution presented by relevant studies.
    Supported Generation Type: This criterion identifies the types of generation tasks that are supported by the solution approaches. These may include unconditional image generation, conditional image generation, or both.
The findings derived from the literature review were critically examined in Section 4, aligning with the established research questions and their corresponding comparative criteria.

4. Results and Discussion

Our analysis of the collected research yielded several important discoveries and observations. In Section 4.1, we present these findings and provide answers to our research questions based on our comprehensive review of the literature.

4.1. RQ1: What NAS Approaches Are Applied to GANs in the Literature?

Our literature review uncovered a variety of strategies for implementing NAS-GAN approaches. We grouped the studies based on their core algorithms and common traits. This resulted in three main categories: Evolutionary Algorithms, Reinforcement Learning, and Gradient-based Algorithms. As shown in Figure 2, Evolutionary Algorithms dominate the literature with 11 works, while both Reinforcement Learning and Gradient-Based Approaches each account for 5 works. We explore each category in depth, detailing the specific approaches and discussing their respective strengths and limitations.

4.1.1. Evolutionary Algorithms Approaches

EA is a family of metaheuristic optimization algorithms, inspired by collective behaviors observed in nature [21]. These algorithms can navigate complex high-dimensional search spaces to find near-optimal solutions, making them well-suited for automatically finding answers to the many interdependent architectural decisions in deep neural networks.
EAs have been increasingly applied to the field of GAN architecture search, with several notable approaches emerging in recent years. Wang et al. (2019) [22] proposed E-GAN, which evolves a population of generators using different adversarial objectives as mutation operations. Specifically, they employed three different objectives: the minimax, heuristic, and least-squares objectives. This method adapts to the discriminator and overcomes limitations of individual adversarial objectives, requiring 1.25 GPU Days for searching. Yet only the top-performing generator persists, which might limit diversity. Al-Dujaili et al. (2018) [23] introduced Lipizzaner, a coevolutionary framework that evolves the internal parameters (weights) of fixed generator and discriminator architectures. Building on this, Toutouh et al. (2019) [24] developed Mustangs, a hybrid approach combining elements of E-GAN and Lipizzaner, incorporating multiple loss functions as mutation operators and using a spatial grid coevolution scheme.
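As an illustration of how adversarial objectives can serve as mutation operators in E-GAN-style methods, the following minimal Python sketch evolves one generation; `train_generator` and `fitness` are hypothetical placeholders for adversarial training and for E-GAN's quality-plus-diversity fitness, not the paper's actual implementation:

```python
import copy

# Minimal sketch of an E-GAN-style evolutionary step.
MUTATIONS = ("minimax", "heuristic", "least_squares")  # losses act as mutations

def train_generator(gen, disc, loss, steps):
    """Placeholder: a few gradient steps on `gen` under the given loss."""

def fitness(gen, disc):
    """Placeholder: E-GAN combines a quality score with a diversity score."""
    return 0.0

def evolve_one_generation(parents, disc, survivors=1):
    offspring = []
    for parent in parents:
        for loss in MUTATIONS:
            child = copy.deepcopy(parent)          # copy the parent generator
            train_generator(child, disc, loss, steps=50)
            offspring.append(child)
    offspring.sort(key=lambda g: fitness(g, disc), reverse=True)
    return offspring[:survivors]                   # only top performers persist
```

Keeping `survivors=1` mirrors E-GAN's selection of a single best generator per generation, which is precisely the design choice noted above as a potential limit on diversity.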
Complementing these approaches, Garciarena et al. (2018) [25] introduced EvoGAN, a neuro-evolutionary framework that evolves GAN architectures by encoding both the network topology (e.g., number of hidden layers, activation functions, weight initialization) and key training parameters such as the loss function and synchronization frequency between the generator and discriminator. Their method leverages a genetic algorithm with specialized mutation operators to navigate a flexible search space, employing Pareto set approximation and the inverted generational distance (IGD) as a benchmark for evaluating mode collapse and diversity. Notably, EvoGAN scales to 784 variables, though retaining only the top-performing architecture per generation might limit diversity. The framework was evaluated on a set of bi-objective continuous test problems for Pareto set approximation; for example, on function F1, IGD was reduced by up to two orders of magnitude in both low- (n = 10) and high-dimensional (n = 784) settings, with additional tests demonstrating transferability across functions.
Similarly, Lu et al. (2018) [26] propose a GA-assisted Bi-GAN framework that autonomously refines deep neural network parameters by combining discrete GA-based decisions with continuous refinement via a bi-generative adversarial network. This approach optimizes parameters—including the number of neurons, filters, layers, and even the inclusion of dropout or pooling layers—thereby enabling the network to self-configure its architecture and hyper-parameters during training. By bridging exploration and exploitation, this framework overcomes the limitations of fixed discrete candidate sets typically associated with GA-only approaches. Bi-GAN was tested on the voxelized ModelNet40 dataset [27] (3D CAD models of 40 object classes). Their experiments showed that the proposed method achieved an accuracy of 85.20%, outperforming the baseline 3D Shapenet (84.17%) and two GA-only approaches (small-set GA: 82.94%; large-set GA: 36.41%).
In other studies, multi-objective functions and multi-stage searches have been utilized. Du et al. (2020) [28] applied the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) to the multi-objective optimization of DCGAN structures (NSGA-II DCGAN), using (1 − TPR), the complement of the True Positive Rate, and the False Positive Rate (FPR) as objectives. Implementing a two-stage search, Lin et al. (2022) [29] proposed EAS-GAN, an evolutionary architectural search method that searches for the generator using a multi-objective function. Generator architectures evolve using three objective functions as mutation operators (minimax, least-squares, and hinge loss), followed by traditional adversarial training of the discriminator weights. The search process takes 1 GPU Day. Costa et al. (2019) [13] developed COEGAN, a neuroevolutionary and coevolutionary approach that evolves both generator and discriminator architectures. The two populations use different objectives: the discriminator is scored by the adversarial loss, while the generator is scored by the FID.
Ying et al. (2022) [30] introduced EAGAN, a two-stage evolutionary NAS that decouples the search for generator and discriminator networks. It uses multi-objective Pareto optimization, considering model size, IS, and FID as objectives, and efficiently finds the optimal GAN in 1.2 GPU Days. Finally, Xue et al. (2024) [31] introduced EWSGAN, which employs a two-step process: first training a supernet generator using weight sharing and single-path sampling, then utilizing NSGA-II to search for optimal subnets. EWSGAN focuses solely on searching for generator architectures while using a fixed discriminator. The search process is highly efficient, completing in just 1 GPU Day, and uses a multi-objective function that simultaneously optimizes IS and FID. The approach offers several advantages, including an efficient search due to weight sharing and low-fidelity evaluation, and improved stability through fair single-path sampling and a commonality-based discarding strategy. However, the discarding strategy may require further investigation, and scaling to higher-resolution datasets could pose challenges.
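To illustrate the NSGA-II-style selection that EWSGAN and related methods build on, the sketch below implements the core Pareto-dominance test for (IS, FID) pairs, where IS is maximized and FID minimized. The scores are made-up examples; a full NSGA-II additionally computes non-domination ranks and crowding distances:

```python
# Pareto dominance over (IS, FID) pairs: IS higher is better, FID lower is better.
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on at least one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Return the non-dominated subset of (IS, FID) scored candidates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Example: three hypothetical subnets scored as (IS, FID).
subnets = [(8.9, 11.5), (8.7, 10.9), (8.5, 12.3)]
print(pareto_front(subnets))  # [(8.9, 11.5), (8.7, 10.9)]; (8.5, 12.3) is dominated
```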
Recent work has explored conducting hyperparameter optimization (HPO) and NAS jointly. Kobayashi and Nagao (2020) [32] proposed searching the architecture and hyperparameters of GANs using multi-objective evolutionary algorithms. They used Cartesian Genetic Programming (CGP) with NSGA-II to evolve generator and discriminator architectures simultaneously, optimizing various hyperparameters. The multi-objective fitness function maximized IS and minimized FID. This is considered one of the first works to implement HPO and NAS jointly for GANs. However, a limitation of the work is the restriction on network size, which limits the method's scalability. Also, although training time was not reported, the authors stated that the search efficiency needs to be improved.
Common benefits across these approaches include improved generative performance, increased architectural efficiency, and enhanced adaptability to different datasets. The use of multi-objective optimization in several approaches allowed for balancing multiple performance criteria simultaneously, leading to more robust solutions. Nevertheless, the performance of EA approaches tends to rely heavily on the objective function. Another limitation concerns search space formulation: EA approaches have mainly been applied to discrete search spaces.

4.1.2. Reinforcement Learning Approaches

RL is a paradigm of machine learning inspired by behavioral psychology, where an agent learns to make decisions by interacting with an environment [33]. This approach enables systems to learn optimal policies in complex, dynamic settings by maximizing cumulative rewards, making it well-suited for the sequential decision-making problem of constructing neural network architectures. RL approaches can be categorized as on-policy and off-policy approaches.
  • On-Policy Search
Gong et al. (2019) [12] introduced AutoGAN, a novel method that uses a Recurrent Neural Network (RNN) controller to guide the search process for generator architectures. AutoGAN employs a multi-level search strategy with beam search, using SoftMax predictions for sampling architecture variations. While it focuses solely on optimizing the generator, with the discriminator growing in a pre-defined manner, AutoGAN demonstrates the potential of using IS as a reward signal in RL for GAN design, completing its search in just 2 GPU Days. However, this approach is limited by its lack of discriminator architecture search and potential scalability issues for higher-resolution images. Wang et al. (2019) [34] developed AGAN, a reinforcement learning framework that simultaneously searches both generator and discriminator architectures. AGAN employs a two-layer LSTM controller RNN and uses policy gradient with REINFORCE, incorporating an entropy bonus for exploration. This approach stands out by allowing for arbitrary cell topologies and demonstrating adaptability to different image sizes and complexities. AGAN uses a shaped reward function based on IS, potentially offering more nuanced guidance for architecture optimization. However, the simultaneous search of both generator and discriminator architectures leads to substantial computational requirements, taking 1200 GPU Days, and necessitates careful tuning of the reward function shaping.
Zhou et al. (2020) [35] proposed Multi-Net NAS (MN-NAS), a novel approach that leverages reinforcement learning to design class-aware generators for conditional GANs (cGANs). MN-NAS employs an MDP with a moving average mechanism to sample and evaluate candidate architectures. The key innovation lies in its ability to search for distinct generator architectures for each class within a single search procedure, addressing the challenge of combinatorial explosion as the number of classes increases. The search space includes regular convolutions and class-modulated convolutions (CMconv), which allow for the sharing of training data across different architectures, mitigating the issue of insufficient data per class. MN-NAS also introduces mixed-architecture optimization, enabling efficient parallelization of the search and re-training processes. The method demonstrates competitive performance on CIFAR10 and CIFAR100. However, the approach is limited by its focus on generator architecture search, with the discriminator following a predefined design, and the potential scalability issues when applied to datasets with a very large number of classes.
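The on-policy controllers above share a common skeleton: an RNN emits one architectural decision per step and is updated with REINFORCE using a GAN metric as the reward. The following minimal PyTorch sketch illustrates this pattern; the vocabulary size, the moving-average baseline, and `evaluate_architecture` are illustrative assumptions rather than any specific paper's implementation:

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """RNN controller that samples one architectural decision per step."""
    def __init__(self, n_choices=5, hidden=64, steps=8):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(n_choices, hidden)
        self.rnn = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, n_choices)

    def sample(self):
        h = torch.zeros(1, self.rnn.hidden_size)
        c = torch.zeros(1, self.rnn.hidden_size)
        tok = torch.zeros(1, dtype=torch.long)
        log_probs, actions = [], []
        for _ in range(self.steps):                # one decision per block
            h, c = self.rnn(self.embed(tok), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            actions.append(tok.item())
        return actions, torch.stack(log_probs).sum()

def evaluate_architecture(arch):
    """Placeholder: train the candidate GAN briefly and return e.g. its IS."""
    return float(len(set(arch)))

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=3e-4)
baseline = 0.0
for _ in range(10):                                # truncated search loop
    arch, log_prob = controller.sample()
    reward = evaluate_architecture(arch)
    baseline = 0.9 * baseline + 0.1 * reward       # moving-average baseline
    loss = -(reward - baseline) * log_prob         # REINFORCE with baseline
    opt.zero_grad(); loss.backward(); opt.step()
```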
  • Off-Policy Search
Tian et al. (2020) [36] proposed E2GAN, an off-policy reinforcement learning framework that reformulates the search problem as a Markov Decision Process (MDP). E2GAN utilizes a soft actor-critic algorithm and a progressive state representation, significantly improving sample efficiency. The framework implements exploration and exploitation periods, focusing on cell architectures. E2GAN’s key innovation lies in its ability to discover competitive GAN architectures in just 7 GPU Hours, using a combined reward function of IS and FID. While E2GAN offers rapid architecture discovery, a limitation of this approach is that it only searches for the generator.
Li et al. (2022) [37] introduced T2IGAN, the first work to apply NAS principles to designing GANs that integrate transformer modules into the text-to-image synthesis framework. In T2IGAN, the architecture search is formulated as an MDP, and an RL-based search strategy is adopted to efficiently navigate a cell-based search space that encompasses both convolutional operations and lightweight transformer components. The search process jointly optimizes both the cell structure and the associated operation weights using a composite reward function based on metrics such as IS and FID. Ultimately, the final generator is constructed by stacking the best-performing cells discovered during the search, yielding a competitive architecture for text-to-image synthesis. By leveraging off-policy RL, T2IGAN is able to significantly reduce the computational burden, achieving competitive performance in a fraction of the search time compared to earlier approaches. While the method primarily focuses on optimizing the generator architecture (with the discriminator following a pre-defined design), it effectively demonstrates the feasibility and advantages of combining transformer-based representations with adversarial learning. Overall, T2IGAN represents a meaningful step toward automated, efficient GAN design and underscores the potential of off-policy RL techniques in advancing generative model architecture search.
These studies collectively showcase the evolution of reinforcement learning applications in GAN architecture search. From AutoGAN’s focus on generator optimization, through AGAN’s comprehensive search of both GAN components, to off-policy frameworks like E2GAN and T2IGAN that emphasize efficiency improvements, each approach contributes unique strengths to the field. However, a notable challenge of these RL-based methods is the considerable variation in training time, which can complicate scalability and practical deployment.

4.1.3. Gradient-Based Approaches

Gradient-based algorithms are a fundamental class of optimization methods [38]. These techniques leverage the gradient of an objective function to iteratively update model parameters, efficiently navigating high-dimensional parameter spaces to minimize loss or maximize performance. Gradient-based algorithms are not typically used directly for searching neural network architectures. However, there are related approaches that use gradients to help with architecture search.
Gradient-based NAS methods have been successfully applied to optimize GAN architectures. AdversarialNAS, proposed by Gao et al. (2020) [39], uses a differentiable approach to simultaneously search for both generator and discriminator architectures. It employs an adversarial search mechanism and can discover high-performing architectures in just 1 GPU Day. The method boasts a large search space and demonstrates good transferability and scalability.
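The differentiable relaxation underlying AdversarialNAS-style methods can be summarized in a few lines: each edge computes a softmax-weighted mixture of candidate operations, so the architecture parameters receive gradients alongside the network weights. The sketch below, with an illustrative operation set, shows the idea:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One searchable edge: a softmax-weighted mixture of candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # 3x3 convolution
            nn.Conv2d(channels, channels, 5, padding=2),   # 5x5 convolution
            nn.Identity(),                                 # skip connection
        ])
        # One continuous architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After search, each edge is discretized to its highest-weighted operation.
edge = MixedOp(channels=64)
y = edge(torch.randn(1, 64, 8, 8))
best_op = edge.ops[int(edge.alpha.argmax())]
```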
Doveh and Giryes (2021) [40] introduced DEGAS, a gradient-based method focusing solely on generator architecture search. DEGAS reformulates the problem as a differentiable optimization task and utilizes the Global Latent Optimization technique to avoid adversarial training instabilities during search. It can find competitive generator architectures in 1.16 GPU Days, significantly faster than most previous reinforcement learning-based methods. A limitation of DEGAS is that it only searches for Generator architectures. Searching for Discriminator architectures may have an impact on results.
GAN Compression, introduced by Li et al. (2020) [41], presents a gradient-based framework for compressing the generator component of conditional GANs. Rather than relying on reinforcement learning or evolutionary methods, this approach decouples network training from architecture search by first training a “once-for-all” generator via standard gradient descent with weight sharing. In this phase, the network is designed to support a vast array of sub-networks through flexible channel configurations. Once trained, the method efficiently evaluates these sub-networks using differentiable losses—including intermediate feature distillation—to select the most efficient architecture that satisfies a specified computational budget. This gradient-based sub-network selection allows GAN Compression to dramatically reduce both the inference time and model size; empirical results show reductions in MACs by up to 21× on models such as CycleGAN, pix2pix, and GauGAN while preserving high visual fidelity. A key strength of GAN Compression is its model-agnostic nature and its ability to stabilize GAN training via knowledge transfer, thereby enabling interactive conditional GAN applications on resource-constrained devices.
Tian et al. (2021) [14] proposed alphaGAN, a fully differentiable architecture search framework that formulates the problem as a bi-level minimax optimization. A key innovation is the use of the duality gap as a differentiable evaluation metric. alphaGAN can efficiently discover high-performing architectures for both conventional GANs and StyleGAN2 in just 3 GPU Hours, demonstrating strong transferability and scalability.
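In the standard saddle-point sense, the duality gap of a candidate pair (G, D) with value function V is
\[
\mathrm{DG}(G, D) = \max_{D'} V(G, D') - \min_{G'} V(G', D),
\]
which is non-negative and vanishes exactly at a Nash equilibrium, making it a natural differentiable progress signal for the search.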
Xue et al. (2024) [42] presented DAMGAN (differentiable architecture search with attention mechanisms for GANs), an innovative evolution in gradient-based NAS approaches. Unlike traditional methods that rely on fixed architectural parameters, DAMGAN leverages a dual-attention strategy to guide the search process, thereby enhancing training stability and efficiency. In this approach, a generator supernet is constructed with two distinct attention mechanisms: up-attention (UA) and down-attention (DA). UA is used to select the most salient feature maps before candidate operations are applied, effectively reducing computational overhead and ensuring that only the most informative features contribute to the network's evolution. In parallel, DA evaluates the outputs of multiple candidate operations to assign importance weights, replacing the conventional reliance on architectural parameters. This refined mechanism allows DAMGAN to efficiently determine the optimal candidate operations and construct a high-performing subnet at remarkably low computational cost. Empirical results demonstrate that DAMGAN achieves competitive performance on benchmark datasets such as CIFAR-10 while completing the search in only 0.09 GPU days. The method's scalability and transferability are further validated by its successful application to larger datasets like STL-10 and CelebA.

4.2. RQ2: What Are the Key Search Spaces Explored in NAS-GAN?

The search space defines which neural architectures can be represented and potentially discovered by a NAS method. The design of the search space is crucial as it incorporates prior knowledge about architectures that are likely to perform well on a given task, while also influencing the difficulty of the optimization problem. Among the search spaces in the NAS-GAN literature, the cell-based structure is the most used type, and the entire/chain-structure search space has also been employed. As illustrated in Figure 3, cell-based approaches account for the majority of the works, followed by chain-based, hybrid, and custom approaches.

4.2.1. Entire/Chain-Structure Space

Entire/chain-structure neural networks represent one of the simplest search spaces. In this space, an architecture can be described as a sequence of multiple layers, where the output of a layer serves as input for the next layer.
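A chain-structured candidate can therefore be encoded as nothing more than an ordered list of layer specifications, as in the following minimal sketch (the field values are illustrative, not those of any particular paper):

```python
import random
from dataclasses import dataclass

@dataclass
class LayerSpec:
    kind: str        # "linear" | "conv" | "transpose_conv"
    channels: int    # output channels (or features for linear layers)
    kernel_size: int
    activation: str  # "relu" | "leaky_relu" | "tanh" ...

def sample_chain(max_depth=6):
    """Sample a chain-structured genome: layer i feeds directly into layer i+1."""
    depth = random.randint(2, max_depth)
    return [LayerSpec(kind=random.choice(["linear", "conv", "transpose_conv"]),
                      channels=random.choice([32, 64, 128, 256]),
                      kernel_size=random.choice([3, 5]),
                      activation=random.choice(["relu", "leaky_relu", "tanh"]))
            for _ in range(depth)]

genome = sample_chain()
```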
NSGA-II DCGAN [28] presents an entire/chain-structure search space. This search space encompasses various architectural elements for both the generator and discriminator networks. For the generator, the search space includes the number of convolutional layers, the number of filters (kernels) in each layer, kernel sizes, padding settings, and activation functions. The discriminator's search space mirrors that of the generator, allowing for a symmetric exploration of network architectures. The framework employs a flexible approach, treating the number of layers as a significant parameter, thus allowing for variable-depth networks. This flexibility extends to other hyperparameters of the network structure, creating a multidimensional search space. The structural parameters are encoded as solution vectors, forming a continuous search domain.
COEGAN [13] employs a chain-structured search space to evolve both the Generator and Discriminator networks in a GAN. This search space incorporates three types of layers: linear, convolution, and transpose convolution, each of which can have an activation function randomly selected from options like ReLU, LeakyReLU, ELU, Sigmoid, and Tanh. The convolution and transpose convolution layers vary in the number of output channels, while linear layers vary in the number of output features. The architecture evolves through mutations that can add, remove, or modify layers, with the number of layers capped at six in the experiments, although this parameter is adjustable. The search space supports dynamic calculation of parameters such as stride and kernel size for convolutional layers, based on the required output size of each layer. Designed to incrementally increase network complexity over generations, the search space is structured to safeguard novel architectures through a speciation mechanism.
DEGAS [40] uses an entire/chain-structure search space, focusing on searching neural architectures for the Generator network efficiently. It searches for the whole network architecture globally, divided into three parts: a fixed first part with linear operation and reshape, a searchable middle part, and a fixed last part with batch normalization, ReLU, convolution and Tanh. The searchable part includes normal operations that maintain input size and up-sample operations that increase it. Normal operations encompass various combinations of batch norm, ReLU, convolutions, pooling, skip connections, and dilated convolutions. Up-sample operations include deconvolutions and nearest neighbor upsampling with convolutions. Connections between feature maps use Mixed Operations with these defined operations.
The entire/chain-structure approach offers more freedom in architecture design but may lead to a larger search space, potentially increasing computational costs. DEGAS’s approach of having fixed parts could help reduce the search space while still allowing for significant architectural exploration.

4.2.2. Cell-Based Space

Among the search spaces in the NAS-GAN literature, the cell-based structure is the most used type. Instead of searching for entire architectures, this method focuses on finding smaller architectural building blocks called cells or motifs. The final architecture is then created by stacking these cells in a predefined manner. This significantly reduces the size of the search space, as cells usually consist of fewer layers than complete architectures, which can lead to substantial speed-ups in the search process. When using a cell-based search space, a new design choice arises: how to choose the macro-architecture, i.e., how many cells to use and how to connect them to build the actual model.
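Concretely, a cell can be encoded as a small directed acyclic graph whose edges carry chosen operations, with the macro-architecture determined by how many copies of the cell are stacked. The following minimal sketch, with an illustrative operation set, shows such an encoding:

```python
import random

OPS = ["conv_3x3", "conv_5x5", "dilated_conv", "skip_connect", "zero"]

def random_cell(n_nodes=4):
    """Encode a cell as {(src, dst): op} over a DAG with n_nodes nodes."""
    return {(i, j): random.choice(OPS)
            for j in range(1, n_nodes) for i in range(j)}

def stack_cells(cell, n_cells=3):
    """Macro-architecture choice: how many copies of the cell to stack."""
    return [dict(cell) for _ in range(n_cells)]

generator_arch = stack_cells(random_cell())   # three stacked copies of one cell
```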
Lipizzaner [23] employs a cell-based search space, distributing Generators and Discriminators across a 2D toroidal grid. Each cell contains one or more GANs, interacting within defined neighborhoods (typically a center cell and its four adjacent cells). The search space encompasses neural network parameters, hyperparameters, and mixture weights. This structure allows for local adaptation and diversity maintenance, supporting various neural architectures (e.g., multi-layer perceptrons or deconvolutional GANs) within cells. Mustangs [24] adopts the search space of Lipizzaner.
AutoGAN’s [12] search space is designed to identify optimal Generator network architectures. It encompasses five key elements: skip connections, convolutional block varieties, normalization block options, upsampling techniques, and an in-cell shortcut indicator. The convolutional block category includes both pre- and post-activation variants, while the normalization block offers batch, instance, and no normalization choices. Upsampling options consist of bi-linear, nearest-neighbor, and stride-two deconvolution methods. To facilitate a direct comparison of search strategy efficacy and efficiency, E2GAN [36] and EWSGAN [31] adopt a search space identical to AutoGAN’s.
AGAN’s [34] search space incorporates architectural design principles for both the Generator and Discriminator. The search methodology employs reinforcement learning, utilizing a controller comprised of a two-layer LSTM network. This controller navigates the search space to identify high-performing architectures within the predefined parameters. For upsampling, the system employs transposed convolution and nearest-neighbor interpolation techniques. The downsampling modules are constructed using two distinct atomic operations: one applies convolution before stride-2 average pooling, while the other reverses this order, performing stride-2 average pooling followed by convolution. AlphaGAN’s [14] search space is composed of two main categories: normal operations and upsampling operations. Normal operations primarily consist of convolutional blocks. For upsampling, the framework employs three distinct methods: deconvolution, nearest-neighbor interpolation, and bi-linear interpolation.
AdversarialNAS [39] introduced an extensive search space (on the order of 10^38 candidate architectures) for GANs, resulting in a continuous search domain. This approach employs probability distributions, with architecture representation defined by a set of continuous variables. The framework's search space is cell-based. For the Generator, it includes a variety of normal operations and upsampling techniques, such as bilinear interpolation, nearest-neighbor interpolation, and transposed convolution. The Discriminator's structure incorporates both normal operations and downsampling methods, including max pooling, average pooling, and convolutions. This expansive and flexible search space allows for a comprehensive exploration of potential architectures, enhancing the adaptability and optimization of the GANs developed through this framework. EAGAN [30] implements the same search space to allow fair comparisons.
EAS-GAN’s [29] search space focuses on the Generator’s architecture using a cell-based approach. The search employs an evolutionary algorithm, treating the generator as an evolving supernet composed of multiple Directed Acyclic Graphs (DAGs). Each cell is represented by a DAG with N nodes, where edges between nodes are candidate operations including various convolutions (1 × 1, 3 × 3, 5 × 5), dilated convolutions, skip-connections, and zero operations. Upsampling options include transposed convolution 3 × 3, nearest neighbor, and bilinear interpolation. The evolutionary process optimizes both cell structure and operation weights simultaneously. Architectures are evaluated against a Discriminator serving as the evolutionary environment. The final generator is constructed by stacking the best-performing discovered cells.
T2IGAN’s [37] search space is designed to optimize the generator architecture via a cell-based approach. The generator is modeled as a supernet comprised of multiple cells, each represented by a DAG with a fixed number of nodes (typically 4). In each cell, edges correspond to candidate operations drawn from two primary categories: conventional convolutional operations and transformer modules. The convolutional candidates include operations with kernel sizes of 3 × 3 and 5 × 5, while the transformer candidates incorporate multi-head self-attention (configured with 4 or 8 heads) followed by a position-wise feed-forward network with an expansion factor of 4 and an internal dimension of 128. Additionally, a zero operation is provided, allowing the cell to bypass an edge when beneficial. Upsampling choices within the cell-based framework include nearest neighbor and bilinear interpolation techniques to progressively scale feature maps from an initial low resolution (e.g., 4 × 4) to higher resolutions.
MN-NAS [35] proposes a search space tailored for designing class-aware generators in conditional GANs (cGANs). The search space follows a cell-based structure, where the generator is composed of multiple cells, each containing a fixed number of nodes (e.g., 4 nodes). Each cell is designed to maintain the spatial resolution and channel dimensions of the input data, ensuring consistency throughout the network. Within each cell, edges between nodes represent candidate operations, which are selected from a set of operators that include regular convolutions (RConv) and class-modulated convolutions (CMConv). The RConv operator performs standard convolutional operations, while the CMConv operator introduces class-specific information by modulating the convolutional weights using a class-conditional vector. This modulation is achieved through a combination of affine transformations and normalization steps, allowing the network to share convolutional weights across different classes while still incorporating class-specific adjustments. The search space also includes a zero operation, which allows the network to skip certain edges when necessary, providing flexibility in architecture design. The overall architecture is constructed by stacking multiple cells, with each cell contributing to the progressive upsampling of the input latent vector to generate high-resolution images.
GAN Compression’s [41] search space differs from the typical cell-based approach. Instead of designing entirely new cells or motifs, GAN Compression leverages the structure of a pre-trained teacher generator and focuses on automatically reducing its redundancy by searching over channel configurations. In this framework, a “once-for-all” network is first trained via standard gradient descent with weight sharing; this super-network supports a wide range of sub-networks, each corresponding to a unique assignment of channel widths across the generator’s layers. Each convolutional layer is allowed to choose its number of channels from a discrete set (typically multiples of 8), which reflects a trade-off between computational efficiency and hardware parallelism. The overall search space is defined as the combinatorial product of the candidate channel numbers for all layers, yielding a large—but highly structured—domain. This formulation enables fine-grained architectural optimization: the method can automatically determine which layers are more amenable to aggressive channel reduction without significantly degrading performance.
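The following minimal sketch illustrates this kind of channel-configuration space; the layer count, candidate widths, and the MAC proxy are illustrative assumptions, not the actual GAN Compression settings:

```python
import random

CANDIDATE_WIDTHS = [16, 24, 32, 48, 64]   # per-layer widths, multiples of 8
N_LAYERS = 4

def macs(widths):
    """Toy proxy for multiply-accumulate cost, summed over adjacent layers."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def sample_subnet(budget=6000):
    """Rejection-sample a channel assignment that fits the compute budget."""
    while True:
        widths = [random.choice(CANDIDATE_WIDTHS) for _ in range(N_LAYERS)]
        if macs(widths) <= budget:
            return widths

# The space is the combinatorial product of per-layer choices:
space_size = len(CANDIDATE_WIDTHS) ** N_LAYERS   # 5**4 = 625 configurations
```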
In a similar vein, DAMGAN’s [42] search space is crafted to facilitate the efficient discovery of high-performance generator architectures through a differentiable approach enhanced by attention mechanisms. The search space is built upon a generator supernet organized into a series of cells, each containing multiple interconnected nodes. For each pair of nodes, the candidate operations are divided into two distinct groups. In connections emanating from the input node, the search space includes upsampling operations such as nearest-neighbor sampling, bilinear interpolation sampling, and transposed convolution. For all other inter-node connections, the search space comprises convolutional operations, including standard convolutions with kernel sizes of 1 × 1, 3 × 3, and 5 × 5, as well as depthwise separable convolutions with kernel sizes of 3 × 3, 5 × 5, and 7 × 7. Uniquely, DAMGAN integrates two attention mechanisms—up-attention (UA) and down-attention (DA)—between each pair of nodes. UA selectively filters and forwards the most salient feature maps to the candidate operations, while DA evaluates and assigns importance weights to the outputs of these operations. This dual-attention strategy not only refines the selection process by effectively mapping the significance of each operation but also broadens the search space to encompass the dynamic interplay between feature selection and operation efficacy, ultimately leading to more robust and computationally efficient architecture discovery.
The cell-based approach seems to be favored in NAS-GAN research, likely due to its ability to reduce the search space size while still allowing for complex architectures. This approach can lead to more efficient searches and potentially better scalability.
In addition to the above-discussed search spaces, hybrid approaches have also been explored. EvoGAN [25] defines a comprehensive search space that spans both architectural and training parameters. In this framework, the entire network specification—including the generator and discriminator topologies—is encoded as lists that capture discrete decisions (e.g., number of hidden layers, activation functions, weight initialization methods) as well as continuous parameters such as loss functions and update (synchronization) frequencies. Specialized mutation operators (e.g., layer_change, activ_change, latent_change) enable the GA to explore this vast and flexible space, which scales up to 784 variables and permits the discovery of transferable architectures.
Similarly, the Bi-GAN framework [26] partitions the search space into discrete and continuous components. The discrete component governs fixed architectural decisions—such as the number of layers, the choice of activation functions, and binary options like dropout, batch normalization, and pooling—while the continuous component, refined via the Bi-GAN, optimizes parameter values such as the number of neurons in fully-connected layers and the number of filters in convolutional layers. This hybrid design enables a more nuanced exploration of the architectural design space by balancing rigid candidate sets with flexible parameter tuning.
One key limitation is the lack of clarity in reporting search space sizes. Many papers fail to provide detailed implementation information, making it difficult to calculate or infer the exact dimensions of the search space. This ambiguity hinders direct comparisons between different approaches and impedes our understanding of the relative efficiency of various methods. Moving forward, researchers should strive for more transparent and comprehensive reporting of search space sizes and implementation details.
There is also an apparent imbalance in the focus of current research, with many approaches prioritizing the optimization of Generator architectures over Discriminators. While the Generator plays a crucial role in GAN performance, the Discriminator is equally important. The diversity of search strategies employed in the field suggests that the optimal approach may vary depending on the specific problem or dataset.
As search spaces continue to grow in size and complexity, computational efficiency becomes increasingly critical. Innovative approaches that balance the trade-off between search space size and flexibility, such as DEGAS [40] combining fixed and searchable architectural components, could prove valuable. Lastly, the transferability of discovered architectures across tasks and datasets warrants further investigation; although cell-based approaches might inherently offer better transferability, this aspect is not explicitly addressed in much of the current literature. Table 3 presents a summary of the reviewed literature.

4.3. RQ3: What Evaluation Methods Are Used to Assess the Found Architecture?

Architecture evaluation in NAS-GAN methods involves various datasets and metrics. Here we discuss evaluation metrics and present results from various NAS-GAN works as reported in the literature. Then, we examine the datasets used for search, training, and testing, including those used to evaluate transferability, and highlight the current benchmark for comparing NAS-GAN methods. Finally, we outline the generation tasks trained on these datasets.

4.3.1. Evaluation Metrics

  • Inception Score (IS): IS [43] attempts to measure image realism and diversity using a pre-trained InceptionV3 network [44], calculating scores based on predicted class probability distributions. However, IS has critical flaws: it lacks direct comparison to training data, can miss mode collapse, and introduces ImageNet biases. Its non-intuitive nature hinders meaningful interpretation of score differences. Despite these limitations, IS remains widely used, highlighting the need for more robust evaluation metrics.
    Our review of representative works (e.g., [12,14,22]) shows that IS scores vary only slightly across methods, with NAS approaches demonstrating consistently high-quality images. Notably, the state-of-the-art IS on this benchmark comes from EWSGAN, an evolutionary method. While this highlights the potential of evolutionary approaches, it also underscores the success of NAS methods in consistently achieving high IS scores. Given IS's vulnerability to adversarial examples and other shortcomings, it should be used cautiously and always supplemented with additional evaluation metrics for a comprehensive assessment of generative models.
  • Fréchet Inception Distance (FID): FID [45] improves on IS by comparing generated and real image statistics using InceptionV3 features, calculating the Fréchet distance between feature distributions. While considered more robust than IS and better correlated with human judgment, FID’s limitations include assuming Gaussian distributions, relying on a potentially biased pre-trained network, and possibly overlooking certain aspects of image quality or diversity. Despite these drawbacks, FID remains widely used for evaluating generative models.
    Analysis of reported scores from major NAS-GAN approaches [12,14,29] reveals significant variation across methods, with NAS approaches consistently outperforming manually designed baselines. AdversarialNAS, EAGAN, and EWSGAN demonstrate impressive scores across datasets, suggesting NAS's effectiveness in generating high-quality images. These results indicate that automated searches often find better architectures than human designers. Recent work using evolutionary algorithms has achieved state-of-the-art results, though the longevity of this trend remains uncertain. These findings warrant further investigation into the factors contributing to NAS's success and into potential limitations of current evaluation metrics. (A small numerical sketch of how IS and FID are computed follows this list.)
  • Computational Efficiency: NAS-GAN methods must improve performance while managing computational resources. Search time, measured in GPU days, is a critical quantity in NAS because the search must explore the architecture space to identify promising GAN structures [46]. To address this computational burden, researchers have developed several strategies. Weight sharing, used in [14,30,31], reduces the number of trained parameters and the overall computational load.
    Adaptive mechanisms and progressive growing dynamically adjust network complexity, as seen in AGAN [34]. Evolutionary algorithms iteratively improve GAN architectures, as adopted by several works (e.g., [30,36,40]). Multi-objective optimization balances performance and computational cost, implemented in [28,32]. Ensemble methods use multiple discriminators for improved diversity and efficiency, as in E-GAN. Coevolutionary algorithms evolve Generators and Discriminators simultaneously, as employed by Lipizzaner and COEGAN. Lastly, multi-stage training gradually increases model complexity, an approach adopted by Mustangs.
  • Model Size: Model size is a critical yet complex factor in NAS for GANs, requiring a delicate balance between performance and efficiency. While NAS algorithms often incorporate size constraints, this approach has significant limitations. The focus on smaller models, though advantageous for memory usage and speed, can lead to oversimplification of trade-offs, bias towards suboptimal architectures, and difficulty in accurately assessing the impact on performance. Moreover, it risks overlooking larger innovative architectures with unique benefits [47].
    Reviewed methods address model size differently. Some, like EAGAN and AutoGAN, explicitly use parameter count as an optimization criterion or reportable metric. Others, such as AdversarialNAS and NSGA-II DCGAN, dynamically adjust size during training or use multi-objective optimization. Methods like E-GAN and Lipizzaner focus on pruning and efficiency. However, not all approaches prioritize or report size metrics, with some like AGAN and E2GAN emphasizing performance over explicit size considerations.
    A more holistic approach to architecture optimization is necessary, considering factors beyond just size, such as interpretability, robustness, and adaptability. This comprehensive view could yield more balanced and effective GAN architectures, avoiding the pitfalls of overly simplistic size-based optimizations while still maintaining efficiency.
  • Mode Collapse Resistance: Mode collapse is a common failure mode in GANs where the generator produces a limited variety of outputs, failing to capture the full diversity of the target distribution. In the context of NAS, architectures should be evaluated on their resistance to mode collapse [48]. This can be assessed through diversity metrics applied to generated samples, or by analyzing the distribution of generated outputs in feature space. A good NAS solution for GANs should prioritize architectures that maintain output diversity while still producing high-quality samples.
  • Convergence Stability: GAN training instability, characterized by mode collapse and oscillating losses, remains a significant challenge. Convergence stability in NAS for GANs is therefore crucial, measured by consistent performance across multiple initializations [49]. In the reviewed literature, multiple solutions address this issue: adaptive training techniques in AGAN; balancing Generator and Discriminator performance in NSGA-II DCGAN; multi-agent systems in Mustangs; and co-evolutionary methods such as COEGAN and Lipizzaner that continuously monitor stability. More advanced techniques, such as the alpha-divergence minimization of alphaGAN and the Wasserstein distance optimization of EWSGAN, have also been explored. Despite these diverse approaches, the field lacks a comprehensive comparison of their effectiveness, and the trade-offs between stability, performance, and computational cost remain unclear. Moreover, existing evaluation metrics may not fully capture the nuances of convergence stability, suggesting a need for more robust assessment methods.
  • Sample Quality: While automated metrics like IS and FID are valuable, they do not always align with human perception. Therefore, subjective evaluation of generated images by human raters remains an important aspect of GAN assessment [50]. This typically involves showing raters a mix of real and generated images and asking them to judge qualities such as realism, coherence, and aesthetic appeal. For NAS in GANs, human evaluation can serve as a final validation step for top-performing architectures. However, human evaluation is time-consuming and subject to biases, so it is usually used in conjunction with automated metrics rather than as the sole evaluation criterion. All of the reviewed works included sample quality checks in their experiments.
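As referenced in the IS and FID discussion above, the sketch below computes both metrics from pre-extracted InceptionV3 outputs (class probabilities for IS, pooled features for FID). The random arrays merely stand in for real network outputs, and the 64-dimensional features are a downsized assumption (standard FID uses 2048-dimensional Inception features).

```python
# Hedged numerical sketch of IS and FID; inputs are stand-ins for real
# InceptionV3 outputs.
import numpy as np
from scipy.linalg import sqrtm

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[KL(p(y|x) || p(y))]); probs has shape (N, n_classes)."""
    marginal = probs.mean(axis=0, keepdims=True)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

def fid(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to the two feature sets."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2 * covmean))

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=500)   # stand-in class probabilities
real = rng.normal(size=(500, 64))              # stand-in Inception features
fake = rng.normal(loc=0.1, size=(500, 64))
print(f"IS  = {inception_score(probs):.3f}")
print(f"FID = {fid(real, fake):.3f}")
```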

4.3.2. Datasets

In NAS-GAN, datasets play a crucial role in both training and evaluating the generated architectures. During the search, the dataset is used to train candidate GAN architectures so that the NAS algorithm can assess their performance. The search typically proceeds by iteratively sampling architectures, training each on a subset of the data, and scoring it with proxy metrics, as sketched below. Once a promising architecture is found, it is trained on the full dataset to produce the final GAN model. The test set then evaluates the generalization capability of the trained model, ensuring it can generate high-quality images beyond what it has seen during training.
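The loop below sketches this search-then-retrain workflow in miniature. Everything in it is a toy stand-in: the two-variable search space, the random proxy score (which substitutes for briefly training a candidate on a data subset and measuring FID), and the budget are illustrative assumptions only.

```python
# Toy search loop illustrating how a dataset is reused across NAS phases;
# the proxy score replaces "train briefly on a data subset, then score".
import random

def sample_architecture(space):
    """Stand-in sampler: picks one option per decision variable."""
    return {k: random.choice(v) for k, v in space.items()}

def proxy_score(arch):
    """Stand-in for a short training run followed by an FID measurement."""
    return random.uniform(10, 50) - 2.0 * (arch["kernel"] == 3)

search_space = {"kernel": [1, 3, 5], "upsample": ["nearest", "bilinear"]}
best_arch, best = None, float("inf")
for _ in range(20):                 # search budget (GPU days in practice)
    arch = sample_architecture(search_space)
    score = proxy_score(arch)       # lower FID is better
    if score < best:
        best_arch, best = arch, score
print(best_arch, round(best, 2))
# The winning architecture is then retrained on the full dataset.
```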
The MNIST [51] dataset comprises 70,000 grayscale images of handwritten digits (0–9), each 28 × 28 pixels in size. It is divided into 60,000 training images and 10,000 test images, with each digit class represented by roughly 6000 training and 1000 test images, ensuring a balanced distribution. Despite its widespread use in machine learning for image recognition, MNIST has significant shortcomings: its simplistic format, lacking color, texture, and contextual complexity, fails to capture the challenges of real-world image processing, so success on this dataset does not necessarily indicate an algorithm's effectiveness on more sophisticated problems. Nevertheless, it remains a common benchmark for NAS-GAN approaches. Previous works have used diverse evaluation metrics: NSGA-II DCGAN reported TPR = 0.9802 and FPR = 0.0042, while COEGAN achieved an IS of 1.7 ± 0.6. As shown in Table 4, Mustangs achieved a superior FID score (42.24) compared to E-GAN (466.1) and the Lipizzaner variants (Lip-BCE: 48.96, Lip-MSE: 371.6, Lip-HEU: 52.53), demonstrating enhanced stability in low-resolution image generation.
The CelebA [52] dataset is a comprehensive and widely used benchmark in computer vision and deep learning, particularly for tasks related to face recognition and attribute analysis. It comprises 202,599 high-quality color images of celebrity faces, featuring 10,177 unique identities. The dataset is notable for its large scale and rich annotations, making it invaluable for a wide range of facial analysis tasks. Each image in the CelebA dataset is annotated with 40 binary attributes, covering various facial features and characteristics such as hair color, facial expression, and the presence of accessories like glasses. Additionally, the dataset provides 5 landmark locations for each face, enhancing its utility for tasks involving facial geometry and alignment. The dataset’s strength lies in its diversity and complexity. It encompasses a wide range of pose variations, background clutter, and demographic diversity, closely mimicking real-world scenarios.
For generative modeling tasks, Table 4 shows that Mustangs outperformed the Lipizzaner variants on FID (36.15 vs. Lip-BCE: 36.25 and Lip-HEU: 37.87), despite CelebA's challenging high-resolution nature. This suggests that Mustangs better captures facial attribute diversity while maintaining generation quality.
LSUN [53] is a large-scale computer vision dataset for scene classification, featuring 10 diverse indoor and outdoor categories. Its massive training set contains 120,000 to 3,000,000 images per class, complemented by smaller validation and test sets for comprehensive model evaluation. While LSUN offers valuable resources, it also presents challenges: its size may strain computational resources, limiting accessibility; the significant imbalance between categories could introduce bias in trained models; and the 10 categories, while diverse, may not fully represent real-world complexity. Wang et al. reported training and testing E-GAN on this dataset, but the study lacks quantitative evaluation, presenting only generated images without metric scores.
The CIFAR-10 [54] dataset is a popular benchmark, evaluated extensively in recent works (e.g., [12,14,22]). It consists of 60,000 color images (32 × 32 pixels) across 10 classes, split into 50,000 training images and 10,000 test images. The training set is divided into 5 batches of 10,000 images each; these batches are randomized but collectively contain 5000 images per class. The single test batch has 1000 images from each class, ensuring balanced evaluation. This structure makes CIFAR-10 well suited for developing and testing image classification algorithms, especially for small-scale, multi-category tasks.
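As a practical note, candidate GANs are typically trained on CIFAR-10 through a standard data pipeline; the torchvision sketch below is one minimal, assumed setup. The normalization to [-1, 1] matches the Tanh output range common in GAN generators, and the batch size is an arbitrary choice.

```python
# Minimal, assumed CIFAR-10 loading pipeline for GAN training.
import torch
from torchvision import datasets, transforms

# Map pixel values to [-1, 1], the usual range for Tanh-output generators.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)   # torch.Size([64, 3, 32, 32])
```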
In the context of NAS-GAN, CIFAR-10 serves as a valuable benchmark. When applied to CIFAR-10, NAS-GAN aims to generate high-quality, diverse images matching the dataset’s 10 classes. The challenge lies in capturing intricate details within each category, given the small image size and diverse object types. CIFAR-10’s manageable size and well-defined classes make it suitable for refining NAS-GAN techniques before tackling more complex image generation tasks. The results of the reviewed literature on this dataset are presented in Table 5, with EWSGAN currently achieving state-of-the-art performance.
The STL-10 dataset is a more advanced benchmark, examined in several studies (e.g., [12,14,29]), designed to improve upon CIFAR-10. It consists of color images at a higher resolution of 96 × 96 pixels, spread across 10 classes, and is divided into three main components: a training set of 5000 labeled images (500 per class), a test set of 8000 images (800 per class), and a large pool of 100,000 unlabeled images drawn from a broader distribution than the labeled sets. All images are derived from labeled examples on ImageNet. The increased resolution makes STL-10 a more challenging benchmark for scalable unsupervised learning methods than CIFAR-10.
For NAS-GAN, STL-10 offers a more demanding benchmark than CIFAR-10. The higher-resolution images and large unlabeled set allow exploration of more complex GAN architectures. The challenge is to generate high-quality, diverse images that capture the details of the 10 labeled classes while potentially leveraging the broader distribution of the unlabeled set. The STL-10 results of the reviewed literature are likewise presented in Table 5, with EWSGAN again achieving state-of-the-art performance.

4.3.3. Supported Generation Type

NAS-GANs support two primary types of generation tasks: supervised and unsupervised. In unsupervised generation, the GAN creates images without any specific input conditions, using random noise as input to generate diverse images from a general class, such as faces, objects, or scenes. This type is akin to unconditional generation in standard GANs [55]. On the other hand, supervised generation involves guiding the GAN’s image creation process with additional input conditions or labels, similar to conditional GANs (cGANs). Here, the generator is conditioned on specific information, such as class labels, images, text, or other modalities, allowing for controlled and targeted image synthesis based on predefined criteria. This structured approach enables the generation of images that adhere to specific requirements, enhancing the utility of GANs in applications requiring precise image outputs [56].
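To illustrate the distinction, here is a minimal PyTorch sketch of a cGAN-style conditional generator that concatenates a label embedding with the noise vector; an unconditional generator would consume the noise alone. Layer widths, dimensions, and names are illustrative assumptions.

```python
# Minimal conditional (cGAN-style) generator sketch; sizes are assumptions.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=64, n_classes=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)  # label -> dense vector
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        # Conditioning: concatenate noise with the label embedding.
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

gen = ConditionalGenerator()
z = torch.randn(4, 64)
labels = torch.tensor([0, 3, 7, 9])   # target classes to synthesize
imgs = gen(z, labels)                 # an unconditional GAN would use z only
print(imgs.shape)                     # torch.Size([4, 784])
```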
Among the reviewed NAS-GAN methods, the support for conditional (supervised) image generation is explicitly found in AGAN, DEGAS, and NSGA-II with CGP on the STL-10 dataset, MN-NAS and T2IGAN on CIFAR-10/CIFAR-100, and GAN Compression on image-to-image translation tasks (Horse↔Zebra, Edges→Shoes, and Cityscapes). AGAN employs a supervised setup to refine GAN architectures, combining the GAN loss with an auxiliary supervised loss to enhance image fidelity. DEGAS integrates domain-specific knowledge into the NAS process, optimizing for specific supervised image generation tasks. NSGA-II with CGP uses multi-objective evolutionary optimization for CNN representations in NAS, improving supervised tasks by balancing complexity and performance. The results of the reviewed literature on STL-10 supervised generation are presented in Table 6. The scores indicate that NSGA-II with CGP performed better than AGAN and DEGAS on the STL-10 dataset.
MN-NAS is evaluated on CIFAR-10 and CIFAR-100, achieving an IS of approximately 9.10 ± 0.11 and an FID of 13.00 on CIFAR-10, and an IS of about 8.50 ± 0.10 with an FID of 16.50 on CIFAR-100. Similarly, T2IGAN is reported on CIFAR-10 and CIFAR-100, where it achieves an IS of 9.02 ± 0.08 and an FID of 10.90 on CIFAR-10, and an IS of 8.65 ± 0.07 with an FID of 14.10 on CIFAR-100. GAN Compression, which targets efficient architectures for conditional GANs, is evaluated on several image-to-image translation tasks: for example, it achieves an FID of 14.2 and an IS of 8.95 on the Horse↔Zebra task, an FID of 18.7 and an IS of 9.10 on the Edges→Shoes task, and an FID of 25.3 and an IS of 8.80 on the Cityscapes dataset.

4.4. RQ4: What Are the Gaps in the Research on NAS in GANs?

After examining and analyzing the identified NAS-GAN techniques, we have addressed this question with the results discussed in Section 4 and further elaborated on in the study’s implications outlined in Section 5. Table 7 provides an overview of the best-performing NAS-GAN methods across key datasets, detailing performance metrics such as IS and FID. This table highlights the diversity in evaluation settings—with methods tested on datasets ranging from MNIST and CelebA to STL-10 and CIFAR-100—and showcases different generation types, including unsupervised, supervised, and conditional approaches.
The variation in performance across these datasets suggests that the efficacy of current NAS-GAN approaches is highly sensitive to dataset characteristics and the chosen evaluation metrics. For example, while some methods yield exceptional FID scores on simpler datasets like MNIST or CelebA, their performance does not uniformly translate to more complex datasets. This observation points to a broader gap in the literature: a unified NAS strategy that can reliably generalize across different data distributions and generation scenarios remains elusive. The key gaps and areas for future research include:
  • Current NAS systems for GANs primarily focus on automated architecture generation for both Generator and Discriminator networks or exclusively for Generator networks. However, an approach that concentrates on searching for superior Discriminator networks has yet to be introduced. This represents a significant opportunity for future research to explore dedicated Discriminator architecture search methods, which could lead to enhanced overall GAN performance.
  • Another significant gap is the reliance on CIFAR-10 and STL-10 datasets for evaluating GAN architecture performance. While these datasets allow for direct comparisons, they limit the generalizability of the findings. Very few systems have used datasets like CelebA and LSUN for validation. Therefore, future research should aim to conduct performance evaluations on a larger variety of image datasets, including CelebA, LSUN, and COCO, to provide a more comprehensive assessment of GAN performance.
  • Most existing NAS-GAN systems have been developed for unconditional image generation tasks. To broaden the scope and impact of NAS for GANs, future work should also focus on other types of image generation, such as conditional image generation and image-to-image translation tasks. This expansion would not only enhance the versatility of GANs but also potentially uncover new applications and benefits of NAS in different image generation contexts.
  • There is also a need for more robust evaluation metrics beyond just the Inception Score and FID. While these metrics are commonly used, they may not capture all aspects of GAN performance. Future research should develop and incorporate new metrics that can provide a more holistic evaluation of GAN quality, including aspects like diversity, fidelity, and realism.
  • Another area that requires more exploration is the interpretability and explainability of the generated GAN architectures. Understanding why certain architectures perform better than others can provide valuable insights and guide future design improvements. Research efforts should aim to develop techniques that enhance the interpretability of NAS-generated GAN architectures.
  • A further aspect that warrants deeper exploration is the integration of NAS with emerging AI paradigms, such as self-supervised learning, which could significantly enhance the robustness and generalizability of GANs.
  • Another domain that calls for more research is the environmental and computational cost associated with NAS processes, motivating the development of more energy-efficient and sustainable AI models.
  • There is also a need to investigate the impact of NAS on GANs in other domains beyond image generation, such as natural language processing or time-series data, which could open new avenues for GAN applications.
  • Additionally, the application of NAS techniques to transformer architectures within GANs remains unexplored. Given the success of transformers in various machine learning tasks, exploring NAS for transformer-based GAN architectures could lead to significant advancements. This presents a promising research direction that could leverage the strengths of transformers for improved GAN performance.

4.5. Practical Applications of NAS in GANs

NAS applied to GANs has opened exciting avenues for automatically discovering efficient, high-performing architectures. These NAS-GAN methods reduce the need for hand-crafted designs and allow for tailored models across various application domains. In the following, we discuss several practical applications along with published works that illustrate their real-world impact.

4.5.1. Medical Imaging and Synthetic Data Generation

Medical imaging faces challenges such as limited annotated data and privacy concerns. NAS-GAN methods have been employed to automatically design generators that produce high-fidelity synthetic images (e.g., MRI, CT, X-ray) preserving important diagnostic features. Such synthetic images facilitate training robust models and augment scarce datasets. For example, Ahmed [57] reviewed the role of GANs in radiology and demonstrated their potential for generating realistic synthetic images in clinical applications. In addition, the work by Gao et al. on AdversarialNAS [39] highlights how NAS can optimize GAN architectures for image synthesis, an approach applicable to medical domains.

4.5.2. Data Augmentation for Limited Datasets

In many domains, the availability of large datasets is a significant bottleneck. NAS-GAN approaches can produce diverse, high-quality synthetic images that mirror the statistical properties of the original data, thereby augmenting training datasets. For instance, a study on data augmentation using GANs demonstrated that augmenting CT scan datasets can improve diagnostic performance [58]. Such methods are particularly beneficial in rare disease scenarios where only a few examples are available [17].

4.5.3. Anomaly Detection

Anomaly detection in medical imaging (or industrial settings) requires the precise modeling of a “normal” data distribution so that deviations can be identified. NAS-GAN models have been tailored to capture subtle differences between normal and abnormal samples. For example, Ounasser et al. presented a comprehensive study on GAN-based anomaly detection in medical imaging, demonstrating improved sensitivity and specificity in fracture detection [17]. Additional works have refined these methods by integrating active learning strategies to mitigate overfitting in the discriminator [5].

4.5.4. Creative Content Generation

Beyond clinical applications, GANs have transformed the creative industries by generating novel, aesthetically pleasing images. NAS methods help discover generator architectures that yield diverse visual styles and high realism. A prominent example is the work by Karras et al. on a style-based generator architecture (StyleGAN) [59], which has been widely adopted in artistic content generation. Similarly, the progressive growing approach for GANs [60] further enhances the capacity of models to produce high-resolution, creative images.
NAS in GANs offers a versatile toolkit, enabling advances in fields from medical imaging to creative art. By automating the architecture design process, NAS-GAN methods improve performance, reduce design time, and tailor models to specific needs. Continued research and published works in these areas pave the way for future innovations.

4.6. Ethical and Environmental Considerations

The rapid advancement and increasing computational demands of NAS methods applied to GANs highlight critical ethical and environmental concerns that must be thoroughly addressed. A major issue arises from the substantial computational resources required by NAS-GAN methods, leading to high energy consumption and significant carbon footprints. This environmental impact becomes particularly pronounced during extensive search processes, where numerous architecture evaluations are performed over prolonged periods. Such practices contribute to greenhouse gas emissions, exacerbating climate change concerns and contradicting sustainability objectives in AI research.
To mitigate these environmental effects, there is a compelling need for developing and adopting energy-efficient NAS algorithms. Promising directions include incorporating techniques such as early stopping criteria, knowledge transfer across search spaces, and resource-aware optimization strategies. Additionally, research into NAS algorithms that leverage lower-energy hardware or renewable energy-powered infrastructure could considerably reduce ecological impacts.
Beyond environmental sustainability, NAS-GAN methods raise ethical considerations tied to their deployment in sensitive or critical applications. Automated generation and deployment of GAN architectures, without rigorous oversight, risk amplifying biases or producing unintended discriminatory outcomes. For instance, GAN-generated data may inadvertently encode biases present in training datasets, reinforcing societal inequalities when used in domains like facial recognition, healthcare diagnostics, or autonomous decision-making systems.
Ensuring fairness and transparency thus becomes paramount. Researchers and practitioners must commit to careful dataset curation, bias detection, and mitigation strategies in algorithmic design. Moreover, involving diverse stakeholder perspectives in developing standards and guidelines can help preempt ethical pitfalls and ensure responsible application of GANs.
In summary, a conscientious approach to NAS-GAN development encompasses not only technical advancements but also deliberate ethical foresight and environmental responsibility. The community must strive towards standards that prioritize sustainability, fairness, and social accountability, ensuring that advancements in GAN architectures benefit society without exacerbating existing ethical or environmental issues.

5. Implications of the Study

The findings and insights presented in Section 4 highlight ongoing gaps and unresolved issues in the current literature. This section explores these gaps and proposes directions for future research to advance the field of NAS for GANs.

5.1. Key Findings

  • Current NAS systems for GANs primarily focus on automated architecture generation for both Generator and Discriminator networks or exclusively for Generator networks. However, an approach that concentrates on searching for superior Discriminator networks has yet to be introduced. Dedicated Discriminator architecture search methods could lead to enhanced overall GAN performance.
  • The reliance on CIFAR-10 and STL-10 datasets for evaluating GAN architectures limits the generalizability of findings. While these datasets enable direct comparisons, they are insufficient for evaluating performance across diverse applications. Broader dataset evaluations, including CelebA, LSUN, and COCO, are necessary.
  • Most existing NAS-GAN systems are limited to unconditional image generation tasks. Expanding research to other types of image generation, such as conditional image generation and image-to-image translation, would broaden the scope and impact of NAS in GAN applications.

5.2. Opportunities for Improving and Expanding Applications of NAS-GANs

  • Current evaluation metrics, such as IS and FID, may not capture all aspects of GAN performance. Developing new metrics to assess diversity, fidelity, and realism can provide a more holistic evaluation framework.
  • There is a pressing need to improve the interpretability and explainability of generated GAN architectures. Understanding why specific architectures perform better can guide future designs and enable practical applications.
  • Integrating NAS with AI paradigms, such as self-supervised learning, could enhance the robustness and generalizability of GANs.
  • Addressing the environmental and computational costs of NAS processes is crucial. Developing energy-efficient and sustainable AI models will make NAS-GAN methods more accessible and scalable.
  • Beyond image generation, NAS-GAN techniques have potential in other domains, such as natural language processing and time-series data. Researching these applications could unlock new opportunities for GANs.
  • The application of NAS techniques to transformer-based GAN architectures remains unexplored. Given the success of transformers in various machine learning tasks, incorporating NAS into transformer-based GANs could significantly advance the field.

6. Threats to Validity

As with any survey-based research, this study encounters validity challenges that warrant careful consideration. First, while we meticulously sourced studies from prominent databases (Google Scholar, IEEE, ACM, Springer, ScienceDirect, arXiv) and employed forward/backward snowballing to ensure broad coverage, the rapid evolution of NAS-GAN research poses risks of omitting cutting-edge works published after our search cutoff (early 2024) or in non-English venues. This limitation could skew trends or overlook novel methodologies, such as transformer-based architectures or energy-efficient NAS strategies. To mitigate this, we prioritized arXiv preprints and iterative snowballing, though findings remain reflective of literature up to the search period.
Second, the systematic categorization of approaches inherently involves subjectivity. Classifying methods (e.g., distinguishing evolutionary algorithms from reinforcement learning) or search spaces (e.g., cell-based vs. chain-structured) relies on interpretative judgments, particularly when implementation details are ambiguously reported. Misclassification could distort comparative insights, such as overestimating search efficiency or misrepresenting architectural prevalence. To address this, we adhered to a predefined framework (Section 3.5) and cross-validated categorizations through dual-author reviews, resolving discrepancies via consensus.
Third, our comparison framework emphasizes quantifiable criteria (e.g., GPU days, search space size) but excludes practical factors like code availability or hardware dependencies. This narrow focus may undervalue deployment feasibility; for instance, methods requiring specialized TPU clusters might achieve superior metrics but remain inaccessible to most researchers. We explicitly acknowledged this limitation in Section 4.3.1 and Section 5.2, advocating for transparency in future work.
Fourth, heavy reliance on CIFAR-10 and STL-10 for benchmarking introduces dataset bias. These datasets lack the complexity and diversity of real-world applications (e.g., medical imaging or high-resolution video generation), limiting the generalizability of conclusions. For example, architectures optimized for low-resolution images may underperform in domain-specific tasks. We highlighted this gap in Section 5.1 and urged evaluations on broader datasets like CelebA and LSUN.
Finally, dependence on IS and FID as primary metrics risks “metric hacking”, as these scores may not align with human perception or detect subtle mode collapse. This could lead to inflated claims of superiority for architectures tailored to optimize IS/FID rather than practical performance. To mitigate this, we emphasized integrating supplementary metrics (e.g., precision/recall, human evaluation) in Section 5.2.
By explicitly addressing these threats—spanning literature coverage, categorization subjectivity, framework scope, dataset bias, and metric reliability—we aim to clarify the study’s limitations while reinforcing the validity of its contributions. Future work should prioritize reproducibility, broader benchmarks, and holistic evaluation frameworks.

7. Conclusions

In this study, we systematically examined various NAS-GAN methodologies prevalent in the current literature. By conducting an extensive survey of existing techniques, we identified a set of critical attributes that form the basis of our assessment framework. This framework was then employed to address our research questions, enabling a thorough analysis and comparison of the approaches within the literature.
Our comparative analysis revealed significant gaps in the field, highlighting areas that require further investigation. The study highlights the need for ongoing research to address these gaps, advancing both the theoretical and practical aspects of NAS-GANs. The insights gained from this research emphasize the importance of continued exploration and development to bridge existing shortcomings and enhance the efficacy of NAS-GAN technologies.
Key findings include:
  • The superiority of evolutionary algorithms and gradient-based methods in certain contexts for NAS-GAN.
  • The importance of robust evaluation metrics beyond traditional scores like IS and FID.
  • The need for diverse datasets in assessing GAN performance, beyond the commonly used CIFAR-10 and STL-10.
  • The potential for exploring dedicated Discriminator architecture search methods.
  • The opportunity to expand NAS-GAN research into conditional image generation and other domains beyond image generation.
Future research directions should focus on addressing these gaps, developing more comprehensive evaluation metrics, and exploring the application of NAS-GAN in diverse domains. By doing so, researchers can continue to push the boundaries of what is possible with generative adversarial networks, potentially leading to significant advancements in the field of artificial intelligence and machine learning.

Funding

This research received no external funding.

Acknowledgments

The authors would like to acknowledge the support received from the Saudi Data and AI Authority (SDAIA) and King Fahd University of Petroleum and Minerals (KFUPM).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  2. Souza, L.A.; Passos, L.A.; Mendel, R.; Ebigbo, A.; Probst, A.; Messmann, H.; Palm, C.; Papa, J.P. Fine-tuning Generative Adversarial Networks using Metaheuristics. In Proceedings of the Bildverarbeitung für die Medizin 2021, Regensburg, Germany, 7–9 March 2021; Springer: Wiesbaden, Germany, 2021; pp. 205–210. [Google Scholar]
  3. Abd Elaziz, M.; Dahou, A.; Abualigah, L.; Yu, L.; Alshinwan, M.; Khasawneh, A.M.; Lu, S. Advanced metaheuristic optimization techniques in applications of deep neural networks: A review. Neural Comput. Appl. 2021, 33, 14079–14099. [Google Scholar] [CrossRef]
  4. Apostolopoulos, I.; Papathanasiou, N.; Apostolopoulos, D.; Panayiotakis, G. Applications of Generative Adversarial Networks (GANs) in Positron Emission Tomography (PET) imaging: A review. Eur. J. Nucl. Med. Mol. Imaging 2022, 49, 3717–3739. [Google Scholar] [CrossRef] [PubMed]
  5. Xia, X.; Pan, X.; Li, N.; He, X.; Ma, L.; Zhang, X.; Ding, N. GAN-based anomaly detection: A review. Neurocomputing 2022, 493, 497–535. [Google Scholar] [CrossRef]
  6. Kocasari, U.; Dirik, A.; Tiftikci, M.; Yanardag, P. StyleMC: Multi-channel based fast text-guided image generation and manipulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2022; pp. 895–904. [Google Scholar]
  7. Mosolova, A.V.; Fomin, V.V.; Bondarenko, I.Y. Text augmentation for neural networks. In Proceedings of the CEUR Workshop Proceedings, Moscow, Russia, 5–7 July 2018; Volume 2268, pp. 104–109. [Google Scholar]
  8. Talbi, E.G. Optimization of deep neural networks: A survey and unified taxonomy. arXiv 2020, arXiv:2006.05597. [Google Scholar]
  9. Thanh-Tung, H.; Tran, T. Catastrophic forgetting and mode collapse in GANs. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–10. [Google Scholar]
  10. Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1–21. [Google Scholar]
  11. He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622. [Google Scholar]
  12. Gong, X.; Chang, S.; Jiang, Y.; Wang, Z. AutoGAN: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3224–3234. [Google Scholar]
  13. Costa, V.; Lourenço, N.; Correia, J.; Machado, P. COEGAN: Evaluating the coevolution effect in generative adversarial networks. In Proceedings of the Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 374–382. [Google Scholar]
  14. Tian, Y.; Shen, L.; Su, G.; Li, Z.; Liu, W. AlphaGAN: Fully differentiable architecture search for generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6752–6766. [Google Scholar] [CrossRef]
  15. Ganepola, V.V.V.; Wirasingha, T. Automating generative adversarial networks using neural architecture search: A review. In Proceedings of the 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 5–7 March 2021; pp. 577–582. [Google Scholar]
  16. Buthgamumudalige, V.U.; Wirasingha, T. Neural Architecture Search for Generative Adversarial Networks: A Review. In Proceedings of the 2021 10th International Conference on Information and Automation for Sustainability (ICIAfS), Negambo, Sri Lanka, 11–13 August 2021; pp. 246–251. [Google Scholar]
  17. Wang, Y.; Zhang, Q.; Wang, G.G.; Cheng, H. The application of evolutionary computation in generative adversarial networks (GANs): A systematic literature survey. Artif. Intell. Rev. 2024, 57, 182. [Google Scholar] [CrossRef]
  18. Kang, J.-S.; Kang, J.; Kim, J.-J.; Jeon, K.-W.; Chung, H.-J.; Park, B.-H. Neural Architecture Search Survey: A Computer Vision Perspective. Sensors 2023, 23, 1713. [Google Scholar] [CrossRef]
  19. White, C.; Safari, M.; Sukthanker, R.; Ru, B.; Elsken, T.; Zela, A.; Dey, D.; Hutter, F. Neural architecture search: Insights from 1000 papers. arXiv 2023, arXiv:2301.08727. [Google Scholar]
  20. Kitchenham, B. Procedures for Performing Systematic Reviews; Technical report; Keele University: Newcastle, UK, 2004. [Google Scholar]
  21. Talbi, E.G. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 74. [Google Scholar]
  22. Wang, C.; Xu, C.; Yao, X.; Tao, D. Evolutionary generative adversarial networks. IEEE Trans. Evol. Comput. 2019, 23, 921–934. [Google Scholar] [CrossRef]
  23. Al-Dujaili, A.; Schmiedlechner, T.; Hemberg, E.; O’Reilly, U.M. Towards distributed coevolutionary GANs. arXiv 2018, arXiv:1807.08194. [Google Scholar]
  24. Toutouh, J.; Hemberg, E.; O’Reilly, U.M. Spatial Evolutionary Generative Adversarial Networks. In Proceedings of the Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 472–480. [Google Scholar]
  25. Garciarena, U.; Santana, R.; Mendiburu, A. Evolved GANs for generating pareto set approximations. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’18, Kyoto, Japan, 15–19 July 2018; pp. 434–441. [Google Scholar] [CrossRef]
  26. Lu, Y.; Kakillioglu, B.; Velipasalar, S. Autonomously and simultaneously refining deep neural network parameters by a bi-generative adversarial network aided genetic algorithm. arXiv 2018, arXiv:1809.10244. [Google Scholar]
  27. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar] [CrossRef]
  28. Du, L.; Cui, Z.; Wang, L.; Ma, J. Structure tuning method on deep convolutional generative adversarial network with nondominated sorting genetic algorithm II. Concurr. Comput. Pract. Exp. 2020, 32, e5688. [Google Scholar] [CrossRef]
  29. Lin, Q.; Fang, Z.; Chen, Y.; Tan, K.C.; Li, Y. Evolutionary Architectural Search for Generative Adversarial Networks. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 783–794. [Google Scholar] [CrossRef]
  30. Ying, G.; He, X.; Gao, B.; Han, B.; Chu, X. EAGAN: Efficient two-stage evolutionary architecture search for GANs. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 37–53. [Google Scholar]
  31. Xue, Y.; Tong, W.; Neri, F.; Chen, P.; Luo, T.; Zhen, L.; Wang, X. Evolutionary Architecture Search for Generative Adversarial Networks Based on Weight Sharing. IEEE Trans. Evol. Comput. 2024, 28, 653–667. [Google Scholar] [CrossRef]
  32. Kobayashi, M.; Nagao, T. A Multi-Objective Architecture Search for Generative Adversarial Networks. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, Cancún, Mexico, 8–12 July 2020; pp. 133–134. [Google Scholar]
  33. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  34. Wang, H.; Huan, J. AGAN: Towards automated design of generative adversarial networks. arXiv 2019, arXiv:1906.11080. [Google Scholar]
  35. Zhou, P.; Xie, L.; Ni, B.; Tian, Q. Searching Towards Class-Aware Generators for Conditional Generative Adversarial Networks. IEEE Signal Process. Lett. 2022, 29, 1669–1673. [Google Scholar] [CrossRef]
  36. Tian, Y.; Wang, Q.; Huang, Z.; Li, W.; Dai, D.; Yang, M.; Wang, J.; Fink, O. Off-policy reinforcement learning for efficient and effective gan architecture search. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 175–192. [Google Scholar]
  37. Li, W.; Wen, S.; Shi, K.; Yang, Y.; Huang, T. Neural Architecture Search With a Lightweight Transformer for Text-to-Image Synthesis. IEEE Trans. Netw. Sci. Eng. 2022, 9, 1567–1576. [Google Scholar] [CrossRef]
  38. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  39. Gao, C.; Chen, Y.; Liu, S.; Tan, Z.; Yan, S. AdversarialNAS: Adversarial Neural Architecture Search for GANs. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 5679–5688. [Google Scholar]
  40. Doveh, S.; Giryes, R. DEGAS: Differentiable efficient generator search. Neural Comput. Appl. 2021, 33, 17173–17184. [Google Scholar] [CrossRef]
  41. Li, M.; Lin, J.; Ding, Y.; Liu, Z.; Zhu, J.Y.; Han, S. GAN Compression: Efficient Architectures for Interactive Conditional GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
  42. Xue, Y.; Chen, K.; Neri, F. Differentiable Architecture Search with Attention Mechanisms for Generative Adversarial Networks. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 3141–3151. [Google Scholar] [CrossRef]
  43. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
  44. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
  45. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  46. Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4780–4789. [Google Scholar]
  47. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  48. Srivastava, A.; Valkov, L.; Russell, C.; Gutmann, M.U.; Sutton, C. Veegan: Reducing mode collapse in gans using implicit variational learning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  49. Mescheder, L.; Geiger, A.; Nowozin, S. Which training methods for GANs do actually converge? In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 3481–3490. [Google Scholar]
  50. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  51. Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
  52. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [Google Scholar]
  53. Yu, F.; Seff, A.; Zhang, Y.; Song, S.; Funkhouser, T.; Xiao, J. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv 2015, arXiv:1506.03365. [Google Scholar]
  54. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  55. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
  56. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  57. Ahmed, H.S. Uncover This Tech Term: Generative Adversarial Networks. Korean J. Radiol. 2024, 25, 493–498. [Google Scholar] [CrossRef]
  58. Tanaka, F.H.K.D.S.; Aranha, C. Data augmentation using GANs. arXiv 2019, arXiv:1904.09135. [Google Scholar]
  59. Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4217–4228. [Google Scholar] [CrossRef]
  60. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2018, arXiv:1710.10196. [Google Scholar]
Figure 1. Literature distribution.
Figure 2. Distribution of NAS-GAN Methods by Search Strategy.
Figure 3. Distribution of Search Space Types in NAS-GAN Literature.
Table 1. Comparison of NAS-GAN Review Papers.

| Review Paper | Year | Scope | Limitations | This Review's Improvements |
|---|---|---|---|---|
| Ganepola & Wirasingha [15] | 2021 | Image generation, GAN compression; analysis of RL, EA, and gradient-based strategies | Focuses on pre-2021 works, with limited detail on mutation/operator design | Extends coverage to 2021–2025 with additional technical insights |
| Buthgamumudalige & Wirasingha [16] | 2021 | Transferability, supervised learning in NAS-GAN; evaluates IS and FID on CIFAR-10, STL-10 | Limited dataset diversity and minimal discussion of evolutionary operators | Expands evaluation to diverse datasets (CelebA, LSUN) and provides deeper operator-level analysis |
| Kang et al. [18] | 2023 | NAS in computer vision tasks | Superficial NAS-GAN coverage | Provides in-depth NAS-GAN analysis |
| White et al. [19] | 2023 | Broad NAS survey (1000+ papers) | Minimal focus on GANs | Emphasizes GAN-specific techniques |
| Wang et al. [17] | 2024 | EC in GANs: architecture search, parameter tuning, loss function adaptation, and synchronization strategies | Broad scope reduces NAS-specific depth; less focus on discrete architecture search | Goes beyond traditional EC-based methods, offering a holistic technical perspective and comprehensive analysis of diverse approaches |
| This Review | 2025 | Comprehensive NAS-GAN analysis | N/A | Synthesizes 2021–2025 works, addressing reproducibility, dataset diversity, and technical gaps |
Table 2. Comparative Analysis of NAS-GAN Reviews.

| Aspect | Ganepola & Wirasingha (2021) [15] | Buthgamumudalige & Wirasingha (2021) [16] | This Review |
|---|---|---|---|
| Search Strategies | EA, RL, Gradient-based | RL, Gradient-based | EA, RL, Gradient-based |
| Datasets | CIFAR-10, STL-10 | CIFAR-10, STL-10 | CIFAR-10, STL-10, CelebA, LSUN, MNIST |
| Evaluation Metrics | IS, FID, GPU days, Search space | IS, FID | IS, FID, Search space, Computational cost |
| Limitations Addressed | Focus on early NAS-GANs | Limited evaluation criteria | Metric reliability, Reproducibility, Discriminator NAS gaps |
| Novel Contributions | Initial NAS-GAN taxonomy | Multi-criteria analysis | Analysis of recent NAS-GAN works, dataset diversity |
Table 3. Comparison of different GAN architecture search methods.

| Method | Search Strategy | Searched Network | Architecture Modification Technique | Optimization Objective | GPU Type | GPU Days | Search Space Type | Search Space Size |
|---|---|---|---|---|---|---|---|---|
| AutoGAN | RL | G Only | RNN Controller | IS | 2080Ti | 2 | Cell | 10^5 |
| AGAN | RL | G and D | RNN Controller | IS | Titan-X | 1200 | Cell | 10^5 |
| MN-NAS | RL | G Only | MDP | IS | 1080Ti | - | Cell | 10^27 |
| E2GAN | RL | G Only | MDP | IS & FID | 2080Ti | 0.3 | Cell | - |
| T2IGAN | RL | G Only | MDP | IS & FID | V100 | 0.42 | Cell | 10^25 |
| AdversarialNAS | Gradient | G and D | - | GAN objective | 2 × 2080Ti | 1 | Cell | 10^38 |
| DEGAS | Gradient | G Only | - | Reconstruction loss | Titan-X | 4 | Chain | 10^8 |
| GAN Compression | Gradient | G Only | - | GAN objective + reconstruction loss | 2080Ti | - | Cell | 10^9 |
| alphaGAN | Gradient | G Only | - | Duality gap loss & GAN objective | Tesla P40 | 0.15 | Cell | 10^11 |
| DAMGAN | Gradient | G Only | Dual-attention mechanisms | GAN objective | 3090 | 0.09 | Cell | - |
| EvoGAN | EA | G and D | Mutation | Custom | NA | - | Hybrid | - |
| Bi-GAN | EA | G Only | Mutation & continuous refinement | Accuracy | NA | - | Hybrid | - |
| NSGA-II DCGAN | EA | G Only | Crossover & mutation | Custom | - | - | Chain | - |
| E-GAN | EA | G and D | Mutation | Custom | 1080Ti | 1.25 | - | - |
| Lipizzaner | EA | G and D | Mutation | GAN objective | - | - | Cell | - |
| Mustangs | EA | G and D | Mutation | GAN objective | - | - | Cell | - |
| COEGAN | EA | G and D | Mutation | FID & discriminator loss | - | - | Chain | - |
| EAGAN | EA | G and D | Crossover & mutation | IS - FID | NA | 1.2 | Cell | 10^38 |
| EAS-GAN | EA | G Only | Mutation | Custom | 3090 | 1 | Cell | - |
| EWSGAN | EA | G Only | Crossover & mutation | IS - FID | 2080Ti | 1 | Cell | 10^15 |
| NSGA-II with CGP | EA | G and D | Crossover & mutation | IS & FID | - | - | CGP | - |

The optimization objective "IS & FID" combines both IS and FID into a composite objective, whereas "IS - FID" denotes that they are used independently. The search space size for T2IGAN is estimated from a per-cell space of 5^6 ≈ 1.6 × 10^4 configurations stacked over 6 cells, i.e., (5^6)^6 ≈ 10^25.
Table 4. Comparative FID Scores on MNIST and CelebA Datasets.

| Method | MNIST (FID ↓ *) | CelebA (FID ↓ *) |
|---|---|---|
| COEGAN [13] | 43.0 ± 4.0 | - |
| E-GAN [22] | 466.1 | - |
| Lip-BCE [23] | 48.96 | 36.25 |
| Lip-MSE [23] | 371.6 | 158.7 |
| Lip-HEU [23] | 52.53 | 37.87 |
| Mustangs [24] | **42.24** | **36.15** |

* The down arrow (↓) indicates that lower values are better. Bold values indicate the best performance across all methods.
Table 5. Summary of results on the CIFAR-10 and STL-10 datasets.

| Method | Search Strategy | CIFAR-10 IS ↑ * | CIFAR-10 FID ↓ ** | STL-10 IS ↑ * | STL-10 FID ↓ ** |
|---|---|---|---|---|---|
| AutoGAN [12] | RL | 8.55 ± 0.10 | 12.42 | 9.23 ± 0.08 | 31.01 |
| AGAN [34] | RL | 8.29 ± 0.09 | 30.50 | 9.23 ± 0.08 | 52.72 |
| E2GAN [36] | RL | 8.51 ± 0.13 | 11.26 | 9.51 ± 0.09 | 25.35 |
| AdversarialNAS [39] | Gradient | 8.74 ± 0.07 | 10.87 | 9.63 ± 0.19 | 26.98 |
| DEGAS [40] | Gradient | 8.37 ± 0.08 | 12.01 | 9.71 ± 0.11 | 28.76 |
| alphaGAN [14] | Gradient | 8.98 ± 0.09 | 10.35 | 10.12 ± 0.13 | 22.43 |
| DAMGAN [42] | Gradient | 8.99 ± 0.08 | 10.27 | 10.35 ± 0.14 | 22.18 |
| E-GAN [22] | EA | 6.9 ± 0.09 | - | - | - |
| EAGAN [30] | EA | 8.81 ± 0.10 | 9.91 | 10.44 ± 0.08 | 22.18 |
| EAS-GAN [29] | EA | 7.45 ± 0.08 | 33.2 | - | 38.84 |
| EWSGAN [31] | EA | **8.99 ± 0.11** | **9.09** | **10.51 ± 0.13** | **21.89** |
| NSGA-II with CGP [32] | EA | 8.89 ± 0.01 | 16.6 | 10.3 ± 0.01 | 26.3 |

* The up arrow (↑) indicates that higher IS values are better. ** The down arrow (↓) indicates that lower FID values are better. Bold values indicate the best performing method across all evaluated approaches.
Table 6. Summary of supervised generation results on the STL-10 dataset.

| Method | IS ↑ * | FID ↓ ** |
|---|---|---|
| AGAN | 8.82 ± 0.09 | 23.8 |
| DEGAS | 8.85 ± 0.07 | 9.83 |
| NSGA-II with CGP | 9.22 ± 0.05 | 7.24 ± 0.08 |

* The up arrow (↑) indicates that higher IS values are better. ** The down arrow (↓) indicates that lower FID values are better.
Table 7. Best-performing NAS-GAN Methods on Key Datasets.

| Method | Dataset | Generation Type | IS ↑ * | FID ↓ ** |
|---|---|---|---|---|
| Mustangs | MNIST | Unsupervised | - | 42.24 |
| Mustangs | CelebA | Unsupervised | - | 36.15 |
| EWSGAN | CIFAR-10 | Unsupervised | 8.99 | 9.09 |
| EWSGAN | STL-10 | Unsupervised | 10.51 | 21.89 |
| NSGA-II with CGP | STL-10 | Supervised | 9.22 | 7.24 |
| T2IGAN | CIFAR-100 | Conditional | 8.65 | 14.10 |
| GAN Compression | Horse ↔ Zebra | Conditional | 8.95 | 14.20 |

* The up arrow (↑) indicates that higher IS values are better. ** The down arrow (↓) indicates that lower FID values are better.