Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs)

Herold, Maximilian; Ahmeti, Engjëll; Kolakaleti, Naga Sai Teja; Odabasi, Cagatay; Koller, Jan; Döpper, Frank

doi:10.3390/app15179833

Open AccessArticle

Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs)

by

Maximilian Herold

^1,*

,

Engjëll Ahmeti

¹

,

Naga Sai Teja Kolakaleti

¹,

Cagatay Odabasi

²

,

Jan Koller

¹

and

Frank Döpper

^1,3

¹

Fraunhofer Institute for Manufacturing Engineering and Automation IPA, Universitaetsstrasse 9, 95447 Bayreuth, Germany

²

Fraunhofer Institute for Manufacturing Engineering and Automation IPA, Nobelstrasse 12, 70569 Stuttgart, Germany

³

Chair Manufacturing and Remanufacturing Technology, University of Bayreuth, Universitaetsstrasse 30, 95447 Bayreuth, Germany

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9833; https://doi.org/10.3390/app15179833

Submission received: 24 July 2025 / Revised: 2 September 2025 / Accepted: 3 September 2025 / Published: 8 September 2025

(This article belongs to the Special Issue Emerging Applications of Artificial Intelligence in Mechanical, Control, Geological, Sensing, and Intelligent Detection Engineering: Integration and Innovations)

Download

Browse Figures

Versions Notes

Abstract

Remanufacturing, a key element of the circular economy, enables products and parts to have new life cycles through a systematic process. Initially, used products (cores) are visually inspected and categorized according to their manufacturer and variant before being disassembled and cleaned. Subsequently, parts are manually classified as directly reusable, reusable after reconditioning, or recyclable. As demand for remanufactured parts increases, automated classification becomes crucial. However, current Deep Learning (DL) methods, constrained by the scarcity of unique parts, often suffer from insufficient datasets, leading to overfitting. This research explores the effectiveness of Data-Efficient Generative Adversarial Network (DE-GAN) optimization approaches like FastGAN, APA, and InsGen in enhancing dataset diversity. These methods were evaluated against the State-of-the-Art (SOTA) Deep Convolutional Generative Adversarial Network (DCGAN) using metrics such as the Inception Score (IS), Fréchet Inception Distance (FID), and the classification accuracy of ResNet18 models trained with partially synthetic data. FastGAN achieved the lowest FID values among all models and led to a statistically significant improvement in ResNet18 classification accuracy. At a [1:1] real-to-synthetic ratio, the mean accuracy increased from 72% ± 4% (real-data-only) to 87% ± 3% (p < 0.001), and reached 94% ± 3% after hyperparameter optimization. In contrast, synthetic data generated by the SOTA DCGAN did not yield statistically significant improvements.

Keywords:

DE-GAN; remanufacturing; generative adversarial network; computer vision; deep learning; neural networks; CNN; circular economy; data augmentation

1. Introduction

Remanufacturing is an End-of-Life strategy that is a key element of the circular economy in industrial countries [1]. The remanufacturing process enables the reconditioning and reuse of parts from used products at the end of their useful life, providing them with completely new life cycles. Thereby, compared to the production of new parts, up to 90% of energy and 55% of material can be saved [2].

According to Steinhilper, the remanufacturing process contains five sequential steps: Product disassembly, cleaning, inspection and sorting, reconditioning and/or replenishment, and reassembly [3]. In the inspection and sorting step, the single parts of the disassembled product are examined and sorted according to their reusability in the succeeding process. Depending on its type and condition, the part could be sorted as “directly reusable”, “reusable after reconditioning”, or “not reusable” [3]. Usually, low-value parts that are not directly reusable and wear parts will be discarded from the process and fed to a recycling process.

In general, the decision on the reusability of a part is carried out manually based on a worker’s expert knowledge [4]. However, given the increasing number of used products and product variants, an automated decision-making process is required [5,6]. Recent automation approaches have increasingly turned to Deep Learning (DL)-based vision systems [4,7,8]. Nevertheless, training such supervised approaches requires large, diverse datasets that capture all available part variants across various conditions.

Creating these datasets is challenging, particularly due to the need for extensive physical resources. Specifically, access to large quantities of returned parts across all relevant conditions and classes is needed. In some cases, this may involve binary classification (e.g., reusable vs. recyclable); however, other remanufacturing contexts may require more granular distinctions, such as those concerning parts that are directly reusable, reusable after reconditioning, or recyclable. Regardless of the classification scheme, collecting sufficient examples, particularly of defective or rarely occurring part types, is both costly and time-intensive for frequently processed parts, and it is often infeasible for uncommon remanufacturing parts [9].

Uncommon or exotic remanufacturing parts emerge due to several distinct circumstances. One key factor is core complexity, which increases throughout the product life cycle. A single automotive part may exist in numerous revision levels due to continuous technical and design modifications, resulting in a broad range of part numbers that must be distinguished and managed effectively [10]. In addition, core management introduces further uncertainty. It is not guaranteed that all parts produced will return after the usage phase. For example, a core may never enter the reverse supply chain (e.g., when it is discarded by the user or retained for other purposes), or it may become untraceable or unavailable within the reverse logistics process (e.g., due to issues like labeling errors or insufficient tracking systems) [10]. Even when cores do return, they do so irregularly and unpredictably, influenced by varying usage intensities, failure rates, and consumer behavior, which together disrupt the predictability of supply [11]. Furthermore, parts from car models that are no longer in production, such as vintage vehicles, become increasingly rare as the number of remaining vehicles diminishes. Similarly, parts from newly introduced models may also be scarce during the early stages of their product life cycle, as production scale-up delays the availability of returned cores at remanufacturing facilities. These challenges collectively result in a high degree of variability and scarcity in available parts. As a consequence of this scarcity, training datasets used in DL applications for used part classification are often imbalanced, with certain classes (e.g., defective parts) being significantly underrepresented, ultimately degrading the performance of DL classifiers [12]. In our work, we address the problem of data scarcity in remanufacturing with the help of Data-Efficient Generative Adversarial Networks (DE-GANs).

2. Related Works

This section provides an overview of DE-GANs and their role in addressing the challenges associated with limited and imbalanced datasets. The following paragraphs present key developments in Generative Adversarial Network (GAN) architectures, data augmentation approaches, regularization methods, and lightweight model designs.

GAN-based data augmentation has shown promising results in overcoming the restrictions of limited and imbalanced real-world datasets [13,14,15]. GANs learn a probabilistic model of a given distribution of data and then use this model to generate new samples [16]. The original GAN consists of two neural networks trained in an adversarial manner: a generator network and a discriminator network [16]. The generator network generates synthetic images, and the discriminator network evaluates the quality of the generated images by trying to distinguish between real and synthesized images. Recently, several GAN architectures have been developed and used for the augmentation of high-quality image data. The areas of application are numerous and include medical technology [17,18], production [19], and agriculture [20], among others. Despite their success, GANs suffer from unstable, sensitive training, mode collapse, mode mixture, and a dependence on large training datasets [21]. When training data are scarce, DE-GANs offer a solution to stabilize the training process and deliver realistic images by distribution optimization. Li et al. provided a comprehensive explanation of DE-GANs and highlighted that the decline in performance stems from the discrepancy between data distributions, causing a degradation of the DE-GAN [22]. Yang et al. expanded upon the foundational work by Li et al. [22] on DE-GANs [23]. They broadened the analysis to include conditional generative models and one-shot image synthesis tasks, which were not addressed by Li et al. Furthermore, Yang et al. updated the discourse by integrating the recent advancements and inversion techniques which have emerged since Li et al.’s publication. Their survey also provided a detailed comparison of design concepts and method characteristics across different approaches, emphasizing practical applications and identifying technical limitations for future research. Understanding the limitations of DE-GANs helps to categorize various existing methods, based on distribution optimization, into data selection [24,25,26], knowledge sharing [27,28,29,30,31,32,33,34,35,36,37,38], and DE-GAN optimization approaches [22]. Data selection approaches focus on minimizing the data gap between the distribution of the limited training dataset and the distribution of the entire target dataset by applying sampling strategies [22]. Knowledge sharing and DE-GAN optimization approaches target minimizing the existing empirical gap between the generated distribution after the training process and the empirical distribution of the training samples by utilizing additional prior knowledge or optimized frameworks [22].

DE-GAN optimization approaches challenge the risks of overfitting the discriminator and the imbalance between the latent and data spaces due to limited training data by utilizing data augmentation, regularization, and simple, efficient model architectures [22].

Recent data augmentation approaches focus on mitigating the problem of overfitting in discriminators by augmenting both real and synthetic samples using spatial and visual transformations [39,40,41,42,43]. Spatial transformations, such as zooming, cutout, and translation, manipulate the image itself, while visual transformations, such as brightness and coloring, change how the image appears [22]. It is hypothesized that spatial transformations can improve the performance of DE-GANs more than visual transformations [22]. The most common approaches are Adaptive Discriminator Augmentation (ADA), Adaptive Pseudo Augmentation (APA), and Differentiable Augmentation (DA) [42]. ADA [39] modulates the intensity of data augmentation, whereas APA [40] selects generated images to augment real data when the discriminator is overfitting during training. Adaptive Feature Interpolation (AFI), presented by Dai et al. [44], introduces an implicit data augmentation technique that enhances stable training and high-quality sample synthesis without label information. AFI employs the discriminator as a metric embedding of the real data manifold to execute unsupervised, feature-level sample interpolation. Jeong et al. proposed ContraD [45], a Contrastive Discriminator framework that integrates contrastive learning into the training process of GAN discriminators. Rather than directly optimizing the discriminator for the GAN loss, ContraD derives a contrastive representation from augmentations and (real or generated) samples. This representation is then used by a two-layer network to function as the actual discriminator. To make the discriminator less sensitive towards the invariance caused by data augmentation, Hou et al. developed an augmentation-aware discriminator (AugSelf-GAN), which predicts the parameters of an augmentation applied to the input image [46]. Huang et al.’s MaskedGAN [47] presents a robust image generation approach that excels with limited training data by randomly masking out portions of image information to facilitate effective GAN training. This method employs two masking strategies: shifted spatial masking, which applies random shifts in the spatial dimensions, and balanced spectral masking, which masks certain spectral bands with self-adaptive probabilities.

Regularization approaches aim to enhance generalization in DE-GANs by incorporating priors or additional supervision tasks [48]. For instance, Tseng et al. proposed an anchor-based regularization term to mitigate discriminator overfitting, forcing the discriminator to mix predictions of real and synthetic images rather than differentiating them [49]. Similarly, Fang et al. introduced the Discriminator Gradient Gap-Regularized GAN (DigGAN) [50]. DigGAN augments existing GANs by narrowing the gap between the norm of the gradient of a discriminator’s prediction for real images and generated samples, thus aiding in avoiding detrimental attractors within the GAN loss landscape. Expanding on the idea of regularized representation spaces, Yang, M. et al. proposed ProtoGAN [51], a method that incorporates a metric-learning-based prototype mechanism that aligns the prototypes and features of real and generated images in the embedding space to enhance the fidelity of the generated images. The authors also included a variance term in the objective function to promote synthesis diversity. To encourage robustness across varying styles, Kim et al. [52] suggested Feature Statistics Mixing Regularization (FSMR), where the features of an original image and a reference image are mixed in the feature space to produce images in new styles. Likewise, Pengwei [53] proposed Latent Feature Maximization (LFM) to improve diversity in the latent space by enforcing orthogonality between latent vectors, measured via dot–product constraints. In a different direction, Yang, C. et al. proposed InsGen, a contrastive-learning-based framework that extends the discriminator’s role beyond binary real/fake classification [54]. It introduces instance discrimination, compelling the discriminator to distinguish between individual image instances. This additional supervision promotes the learning of fine-grained, instance-aware features and implicitly encourages a more structured and diverse latent representation. Complementing this perspective, Kong et al. [55] addressed the imbalance between latent and data spaces by introducing a regularization strategy based on latent space interpolation. By sampling interpolation coefficients and generating anchor points between latent vectors, their method encourages smooth transitions in the generative mapping. While no explicit constraint is applied directly on the latent space, this strategy implicitly regularizes its structure by enforcing semantic consistency across interpolated samples.

The model architecture significantly influences the performance and stability of GANs [56]. Style-based [57,58] and progressively growing [59] architectures have gained much attention in research, but they are prone to overfitting in limited data settings due to their large number of parameters [23]. To address this limitation, lightweight DE-GAN architectures have been proposed. For instance, Liu et al. [60] introduced FastGAN, which employs a skip-layer channel-wise excitation module and a self-supervised discriminator trained as a feature encoder. This architecture demonstrates superior performance and faster convergence compared to StyleGAN2 under constrained data settings. In a complementary direction, Li et al. [61] proposed the Memory Concept Attention (MoCA) module, which enhances the generator with a memory-based attention mechanism guided by prototypes obtained through momentum online clustering. Beyond architectural simplification, some works explore alternative representations of the input data. Yang et al. [62] presented FreGAN, a frequency-aware framework that incorporates both a frequency-aware discriminator and a high-frequency alignment module. These components explicitly model and inject frequency domain representations into the generation process and help stabilize training. Also addressing discriminator overfitting, Cui et al. [63] introduced Generative Co-training (GenCo), a framework that employs multiple complementary discriminators to provide diverse supervision. This approach enables the generator to learn more robust and generalized representations. In contrast to Cui’s strategy of increasing discriminator diversity, Yang et al. [64] proposed DynamicD, which dynamically adjusts the discriminator’s capacity throughout training. By progressively increasing or decreasing model capacity depending on data availability, DynamicD improves generalization while maintaining training efficiency. Building on the goal of improving parameter efficiency under limited data, some works move beyond fixed architectures by discovering or leveraging sparse yet effective subnetworks. Chen et al. [65] introduced a two-stage framework based on the lottery ticket hypothesis, which identifies highly sparse and independently trainable subnetworks—referred to as GAN lottery tickets—from larger models. These subnetworks, once retrained on the same dataset, can match or even outperform their dense counterparts while reducing computational overhead. Similarly, Shi et al. [66] proposed AutoInfoGAN, which combines neural architecture search (NAS) with contrastive representation learning. By using reinforcement learning and unsupervised contrastive loss, AutoInfoGAN autonomously discovers compact and performant architecture configurations without manual tuning. In a similar vein, Saxena et al. [67] introduced Re-GAN, which dynamically prunes and regrows network connections during training. This continual architectural reconfiguration regularizes the GAN and avoids the costly train–prune–retrain cycles associated with conventional lottery ticket approaches.

3. Research Gap

Recent advances in DE-GANs have demonstrated promising results in generating high-quality images under limited data conditions. Despite this progress, their applicability to remanufacturing, particularly for the purpose of generating synthetic training data to support the classification of used industrial parts, remains largely unexplored.

Approaches for generating synthetic training data in remanufacturing often rely on 3D model rendering [68,69,70,71] or manual augmentation (e.g., pasting defects into images of non-defective parts) [72], both of which are resource-intensive and impractical for many real-world reverse supply chains, where 3D models of parts are rarely available. To date, the only generative model that has been explored within the broader remanufacturing context is the standard Deep Convolutional Generative Adversarial Network (DCGAN), as investigated by Gao et al. [12], who used solely 2D images as the input to generate data for classifying used gears with different defect types. However, DCGAN was not developed for low-data regimes, making it prone to overfitting and unstable training. To compensate for the limited training data, Gao et al. applied Random Oversampling (ROS), expanding the dataset from 50 to 200 images by repeatedly duplicating samples across categories. This strategy, while increasing sample count, introduces the risk of overfitting and compromises model generalizability [73]. Moreover, their evaluation relied largely on synthetically generated test data, which may not provide a reliable indication of real-world performance. Their study also focused on a single part type across five defect conditions, with images sharing similar properties such as color, size, and point-of-view [12].

In a broader evaluation context, Li et al. conducted a comprehensive review of DE-GAN approaches, which had been evaluated on datasets like FFHQ-5K, FFHQ-1K, and CIFAR-10 [22]. While their work provides valuable insights into the performance of DE-GANs under low-data regimes, it does not extend to domain-specific contexts such as remanufacturing. The unique challenges posed by industrial parts, including variability caused by wear, the presence of heterogeneous materials, and inconsistent visual conditions, are not addressed in prior work.

In contrast, our dataset includes 36 parts varying in material (plastic and some contain metal bearings), size, shape, and color. Image acquisition was intentionally performed under varied perspectives, lighting conditions, and distances between the object and the lens to simulate the heterogeneity encountered in real remanufacturing environments. Contrary to the approach of Gao et al., we did not use ROS and Traditional Data Enhancement [12]. To the best of our knowledge, no prior study has systematically investigated the application of DE-GANs for generating synthetic images of used parts with real-world defects and evaluating their usefulness for downstream classification tasks in remanufacturing. Our study addresses this gap and makes the following contributions:

Application of DE-GANs in remanufacturing: We investigate the effectiveness of optimized DE-GANs against the State-of-the-Art (SOTA) DCGAN in generating realistic images of defective and non-defective parts under limited data conditions. To our knowledge, this represents the first evaluation of DE-GANs in this specific domain.
Use of a diverse, real-world dataset: Our dataset includes parts with different material properties, geometries and wear patterns. The images were captured under varying conditions, including changes in lighting, object orientation, and distance, to reflect the complexity of real industrial settings.
Avoidance of oversampling and synthetic evaluation bias: Unlike Gao et al., we do not use ROS or rely on synthetic test sets. Instead, we assess model performance using a test set comprising solely real images, allowing for a more reliable evaluation of classification performance.
Analysis of synthetic data utility for downstream tasks: We assess the applicability of DE-GAN-generated images for training classifiers that predict the reusability of used parts and investigate the impact of different real-to-synthetic data ratios on classification accuracy.

Our work focuses on whether optimized DE-GANs are more effective in generating accurate synthetic images of defective and non-defective parts with limited training data compared to the SOTA DCGAN approach. In this context, we limited our study to the evaluation of optimized DE-GAN frameworks, whereas knowledge sharing and data selection approaches are outside the scope of this paper.

4. Materials and Methods

To assess whether synthetic images improve reusability classification, we followed the three-step approach shown in Figure 1: First, we created a real image dataset, which was then split in a stratified manner into a real training dataset (“Train R”), a real validation dataset (“Valid”), and a test dataset (“Test”). Second, we trained one baseline GAN (SOTA DCGAN) and the three most promising DE-GANs to synthesize additional training data. The synthetic training data (“Train S_n”) were first visually inspected and then evaluated on quality metrics using the real test dataset (“Test”) as reference. Next, we formed mixed training datasets (“Train M_n”) at predefined ratios of real and synthetic data. For every generative model and ratio, we trained ResNet18 classifiers on the mixed training set (“Train M_n”), while monitoring convergence on the real validation dataset (“Valid”). After training, we evaluated all classifiers on the held-out real test dataset (“Test”) and used these test results to identify the best-performing combination of generative model and real-to-synthetic ratio. Then, we optimized the hyperparameters of the ResNet18 classifiers by using the mixed training dataset (“Train M_n”) of the best-performing combination (Real-to-synthetic ratio and GAN-model) for training and the real validation dataset (“Valid”) for monitoring. Finally, the optimized classifiers’ performance was assessed and reported using quality metrics on the real test dataset (“Test”).

4.1. Real Dataset Creation

This study utilizes a dataset of gears previously used in electric bicycle motors, sourced from the completed “AddRE-Mo” project (https://www.addre-mo.de/, accessed on 1 September 2025). As a part of our current research project, ‘Versatile-KI’, we are employing this dataset to continue testing methods. The gears in the dataset vary in size, color, wear patterns, and material properties, consisting of different types of plastics. Some gears include metal bearings, while others do not feature any bearings at all. To better simulate real-world conditions, we captured images from various perspectives, lighting conditions, and distances between the camera and the object. Although all images were taken against the same background, we adjusted the camera settings based on the gear colors, which range from bright yellow to almost black. These adjustments were necessary to prevent over-exposure and under-exposure, causing the background appearance to vary across images. The darker the gear, the brighter the background appears and vice versa. This methodological approach was chosen to evaluate the practical suitability of DE-GAN optimization techniques.

A remanufacturing expert conducted the data labeling process, based on the gears’ wear and tear, which varied depending on their usage in the electric bicycle motors and their inherent durability. Gears with partially or completely sheared gear teeth, deformed gear teeth, or broken gears were classified as recyclable, while gears without signs of wear and tear were classified as reusable, as shown in Figure 2. While some defects, such as completely sheared gear teeth, were easily identifiable, others, like deformed gear teeth, were not as easily detectable.

We utilized a Nikon D5300 camera (Nikon Corporation, Tokyo, Japan) equipped with an AF-S Nikkor 18–105 mm lens (Nikon Corporation, Tokyo, Japan), complemented by two Aputure Amaran Tri-8C lights with EZ-Box II (Aputure, Los Angeles, CA, USA), to capture the images (Figure 3). Choosing this consumer-grade hardware was strategic: It allows for core sorting activities to be conducted off-site, such as in repair shops, thus minimizing transportation costs. This setup offers substantial advantages as it is cost-effective, widely available, and user-friendly, even for those with limited technical expertise. For our imaging process, we captured images of 18 gears classified as reusable and another 18 classified as recyclable.

Each gear was rotated to capture six different images from varying angles. Afterward, each gear was flipped and another six images were taken, ensuring comprehensive coverage of the gear’s condition. Since the native resolution of the Nikon D5300 is 6000 × 4000 pixels, preprocessing was necessary to prepare the images for input into the GAN models. Each image was center-cropped to a square resolution of 4000 × 4000 pixels, resized to 128 × 128 pixels, transformed into a tensor, and normalized. As outlined in Table 1, the resulting dataset was equally divided into training, testing, and validation subsets for each gear condition, diverging from conventional data splits such as 80:10:10. This customized distribution was intentionally selected to ensure robustness and reliability of the results, given the limited dataset size.

4.2. Synthetic Image Evaluation

Synthetic images were generated using optimized DE-GAN approaches to enhance the real training datasets. To identify the most effective optimized DE-GAN approaches that had already been tested on small datasets, we conducted a literature review. Unfortunately, not all developed approaches have been tested on the same small datasets, complicating comparisons within an optimized DE-GAN category. Therefore, we decided to focus our review only on DE-GAN approaches evaluated on widely used small-scale image datasets, specifically Flickr-Faces-HQ (FFHQ; FFHQ-5K and FFHQ-1K) and the Canadian Institute for Advanced Research 10-class dataset (CIFAR-10). Any optimized DE-GAN approach that had not been tested on at least one of the reference datasets was excluded from our analysis.

For each DE-GAN optimization category (model architecture, data augmentation, regularization), we selected the best optimized DE-GAN implementation based on the Fréchet Inception Distance (FID) scores obtained from the literature (see Table 2).

The three top-performing optimized DE-GAN approaches were FastGAN (https://github.com/odegeasslbc/FastGAN-pytorch (accessed on 1 September 2025)), StyleGAN2 + ADA + APA (hereafter referred to as APA; https://github.com/EndlessSora/DeceiveD (accessed on 1 September 2025)), and StyleGAN2 + ADA + InsGen (hereafter referred to as InsGen; https://github.com/genforce/insgen (accessed on 1 September 2025)). We implemented and tested these optimized DE-GAN approaches alongside the SOTA approach by Gao et al. [12] to compare their benefits. Our implementation differed from the original DCGAN implementation of Radford et al. [74] by including an additional layer in both the discriminator and generator to achieve an output image size of 128 × 128 pixels.

We utilized the PyTorch (https://pytorch.org/ (accessed on 1 September 2025)) framework for training, evaluating, and inferencing the GAN-models on an Nvidia Tesla T4 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 16 GB of memory. The performance of the synthetic image generators was assessed using FID and Inception Score (IS). The FID metric measures the distance between two distributions in a common feature space by comparing their mean vectors (μ_y, μ_ŷ) and covariance matrices (Cov_y,y, Cov_ŷ,ŷ) [75]. To obtain these statistical properties, we generated embeddings for each image in the dataset using the feature-extraction layers of an ImageNet pre-trained Inception-v3 network. These embeddings are high-dimensional representations that encode the visual features of each image. By aggregating all image embeddings from a dataset, we form the dataset’s distribution.

To calculate the distance between distributions (i.e., between the real and each synthetic dataset’s distributions), we used the Fréchet Distance (see Equation (1)), also known as the Wasserstein-2 distance. This involves calculating the squared magnitude of the difference between the mean vectors and adding the trace of the covariance matrices [75], where the trace (Tr) is the sum of the diagonal elements of each matrix. A smaller FID value indicates a smaller distance between the distributions [76], suggesting that the synthetic images more closely resemble the real images.

F I D = {∥ μ_{y} - μ_{\hat{y}} ∥}^{2} + T r ({C o v}_{y, y} + {C o v}_{\hat{y}, \hat{y}} - 2 \sqrt{{C o v}_{y, y} \times {C o v}_{\hat{y}, \hat{y}}})

(1)

The IS is calculated using an ImageNet pre-trained Inception-v3 network and takes into account both the quality (measured by the cross-entropy H(y|x) of p(y|x)) and diversity (measured by the cross-entropy H(y) of p(y)) of the images, with higher IS values indicating better generator (G) models [77]. In Equation (2), x denotes the generated image, y refers to the predicted label assigned by the Inception model, p(y|x) is the probability distribution of the generated image belonging to different classes and p(y) is the marginal class distribution [12]:

I S (G) = e x p \{\frac{1}{N} \sum_{i = 1}^{N} p (y| x) \{l o g [p (y| x)] - l o g [p (y)]\}\} = e x p \{H (y) - H (y| x)\}

(2)

To relate these metrics to training progress in a model-agnostic manner, we monitored class-wise FID over training and normalized progress to thousands of images shown to the discriminator (kimg). For StyleGAN2-based models (APA, InsGen), the snapshot identifier equals the same kimg count and for DCGAN and FastGAN we converted epochs and iterations to the equivalent kimg. We followed a milestone-based schedule realized as independent runs. Each model was freshly initialized and trained to a prescribed kimg milestone and then evaluated. We advanced to the next higher milestone only while the class-wise FID improved. Once the class-wise FID failed to improve across two successive milestones, we considered training converged, and we selected the final model at the milestone where both classes reached their respective minima. In our experiments, the class-wise minima coincided for each model, so no trade-off between classes was required. No single run was interrupted by an online rule, and no snapshot ensembling was applied.

As a qualitative corroboration of convergence, we reviewed representative samples at each milestone, and a remanufacturing expert judged whether perceived realism increased or decreased across milestones. As a complementary metric, we computed the IS only at the FID-selected milestone for each model to summarize sample quality and diversity at the chosen checkpoint, and we did not use IS to guide training.

4.3. Reusability Classification

While the FID score and IS are established metrics for assessing the quality and diversity of synthetic images, they do not directly quantify the utility of such images for improving classifier performance in remanufacturing tasks. These metrics are not designed to evaluate the contribution of individual images or an entire synthetic dataset to classifier performance [78]. For instance, a synthetic dataset may achieve an exceptionally low (i.e., very good) FID value by closely matching the distribution of the real dataset, without necessarily improving classifier training. A practical example of this is duplicating a real dataset to form a synthetic one: While the FID between these two datasets would be extremely low due to identical distributions, using this duplicated data for training could potentially degrade classifier performance, similar to the effects of Random Oversampling.

To address this limitation, we enriched the existing real training dataset with images generated by optimized DE-GAN approaches (see Figure 1). We then trained ResNet18 [79] classifiers using synthetic data from three optimized DE-GANs and the SOTA DCGAN at various real-to-synthetic ratios (see Table 3). Building on the approach of Zhou et al. [80], we examined the effects of [2:1], [1:1], and [1:2] real-to-synthetic ratios. Additionally, we chose to compare each partially synthetic training dataset with a dataset consisting solely of real images. In total, we considered 13 experimental configurations, comprising each GAN variant at each real-to-synthetic ratio as well as the real data’s baseline. For each configuration, we trained five independent ResNet18 classifiers, resulting in 65 trained models overall. We reported classification results as mean ± standard deviation (SD) across these five runs and evaluated the statistical significance of observed accuracy differences using Welch’s two-sample, two-tailed t-test (assuming unequal variances), with a significance threshold of p < 0.05. All comparisons were made relative to the real-data-only baseline.

After evaluating all trained ResNet18 classifiers on the real test dataset, we identified the generative model and the real-to-synthetic ratio that yielded the highest mean test accuracy with statistically significant improvement. We then performed a full factorial grid search to optimize the ResNet18 hyperparameters using synthetic data from this best-performing GAN at its optimal real-to-synthetic ratio. The grid comprised learning rates {5 × 10⁻⁵, 1 × 10⁻⁴, 3 × 10⁻⁴}, batch sizes {8, 16, 32}, and epochs {5, 10}, yielding 18 configurations (cases) in total. For each configuration, we trained five independent classifiers, resulting in a total of 90 trained classifiers. The training process was monitored on the validation dataset, while for the final testing of all classifiers, the fixed test dataset containing only real images was used. The reported accuracy, F1-score, recall, and precision were stated as mean ± SD across the five independent classifiers for each hyperparameter combination (case). We chose grid search rather than random or Bayesian optimization because the search space was low-dimensional and bounded, computational cost was modest, and interpretability was essential [81]. The selected ranges follow common practice in small-data image classification [82].

5. Results

This section presents the findings of our research. The evaluation process begins with an analysis of synthetic images using established evaluation metrics (Section 5.1), followed by assessing the impact of incorporating synthetic images on the classification performance (Section 5.2).

5.1. Evaluation of Synthetic Images on Evaluation Metrics

We utilized optimized DE-GANs and the SOTA DCGAN to generate additional synthetic images of gears, creating 144 synthetic training images per class with each GAN. The resolution of the synthesized images varied depending on the generative model (see Table 4).

While DCGAN, APA, and InsGen produced images at 128 × 128 pixels, FastGAN generated images at 512 × 512 pixels. To ensure consistency and comparability across the different models, we standardized all images to a resolution of 128 × 128 pixels post-generation. Although Gao et al. employed a resolution of 224 × 224 pixels in their study [12], we opted for the lower resolution of 128 × 128 pixels to mitigate the risk of instabilities known to occur during the training of DCGAN. These instabilities are exacerbated when additional layers are required to reach higher output resolutions, as the original DCGAN implementation was designed to output images of 64 × 64 pixels [74].

Despite this standardization, the quality of the generated images varied across different models. For example, synthetic images generated by the DCGAN approach in a recyclable condition (Table 4: D5–D8) appeared blurred compared to real images of the same class (Table 4: R5–R8). Despite this blurriness, signs of wear and tear were visible in three of the four synthetic images (Table 4: D5, D6, D8). In contrast, the synthetic images generated by the DCGAN in reusable condition (Table 4: D1–D4) were of such low quality that the gears were not easily discernible, even after 400 epochs of training. Furthermore, most of the generated images shared similar characteristics in terms of colors and shapes, which may indicate the occurrence of mode collapse. This issue will be explored in greater depth in the discussion presented in Section 6.1.

In contrast to DCGAN, the optimized DE-GAN approaches (Table 4: A1–A8, I1–I8, F1–F8) generated higher-quality gear images for both conditions. Both APA and FastGAN could even generate gears that had never existed in the original real training dataset. For instance, both models created white gears (Table 4: F3, A3) in reusable conditions, even though no white gear existed in the original real training dataset. FastGAN even created an image of a dark-grey gear in a reusable condition where the bearing is missing (Table 4: F1), although such a gear never existed in the original training dataset.

We could not clearly determine whether the optimized DE-GANs generated fine-grade wear patterns based on the optical characterization of the generated images. This is due to the low resolution of the generated images, which made it difficult to see minor signs of wear and tear. Only the FastGAN-generated image in Table 4, F8, showed clear signs of wear and tear. Additionally, we noticed that most of the generated images showed signs of distortion in varying degrees. While the images of FastGAN (Table 4, F6, F7) and APA (Table 4, A2, A6, A8) were only slightly affected by distortion, the images of InsGen were heavily distorted, with the generated bearing off-center (Table 4, I8) or the generated bearing and gear appearing squeezed (Table 4, I6).

To evaluate the quality and diversity of generated images further, we utilized FID and IS metrics (see Table 5). Ideally, a high IS reflects a generator’s ability to produce diverse and distinct images, while a low FID is desirable, as it signifies a closer statistical similarity between the distributions of generated images and real images.

At the FID-selected milestones, APA achieved the highest IS among all approaches, outperforming all optimized DE-GAN approaches as well as the DCGAN model. Notably, the IS values varied not only between models but also between classes. Unlike DCGAN, all DE-GANs yielded slightly higher IS values for the reusable class than for the recyclable class. IS was computed only at the FID-selected milestone (Section 4.2) and is reported as a complementary summary of sample quality and diversity.

The FID scores showed remarkable variation among different GAN models. DCGAN performed the poorest, with FID scores of 358.50 for the recyclable class and 435.08 for the reusable class. In contrast, the best-performing model, FastGAN, achieved FID scores of 63.04 for the recyclable class and 59.07 for the reusable class. Notably, the FID scores for the InsGen model exhibited a difference between the two classes, with scores of 158.53 for the recyclable class and 252.64 for the reusable class. In general, most models (DCGAN, APA, InsGen) achieved better FID scores for the recyclable class than for the reusable class, indicating that these models were more effective at generating high-quality images of gears with wear and tear (recyclable condition) compared to those without (reusable condition). However, only the FastGAN model attained a superior FID score for the reusable class (59.07) compared to the recyclable class (63.04). Despite slightly lower performance for the recyclable class, the overall results of the FastGAN model were still better than those of the second-best model, APA (recyclable: 126.20, reusable: 194.03).

For context, the full class-wise FID trajectories over kimg are provided in the Appendix A, Figure A1, and representative samples at the evaluated milestones are compiled per model in the Appendix A, Table A1, Table A2, Table A3 and Table A4.

5.2. Using Synthetic Images for Reusability Classification

The mean test accuracies of the ResNet18 classifiers, trained on real and partial synthetic training datasets, monitored on the validation dataset, and tested on the real test dataset, are presented in Table 6. To ensure the validity of the results, five runs were conducted for each real-to-synthetic ratio and model, using the same training data. The classifiers trained solely on real data attained a mean test accuracy of 72% (SD 4%).

Notably, incorporating synthetic data from DCGAN did not result in a statistically significant improvement for any of the examined real-to-synthetic ratios. Specifically, for real-to-synthetic ratios of [2:1] (p = 0.667), [1:1] (p = 0.178), and [1:2] (p = 0.074), no statistically significant differences were observed compared to the real-data-only baseline (see Table 6). In fact, the mean classification accuracies tended to decrease as the proportion of synthetic data increased.

In contrast, the use of APA-generated synthetic data led to consistently higher mean accuracies across all real-to-synthetic ratios tested. All observed improvements were statistically significant compared to the real-only baseline: 81% (SD 6%, p = 0.030) for a [2:1] ratio, 80% (SD 4%, p = 0.019) for a [1:1] ratio, and 84% (SD 6%, p = 0.009) for a [1:2] ratio of real-to-synthetic data. The best result for APA was observed at a [1:2] ratio. For InsGen, only the [1:2] real-to-synthetic ratio resulted in a statistically significant improvement compared to the baseline (81% ± 4%, p = 0.010). Although higher mean accuracies were observed for the [2:1] and [1:1] ratios (73% and 72%, respectively), these differences were not found to be statistically significant (p = 0.622 and p = 0.996). FastGAN yielded the greatest statistically validated improvements. At a real-to-synthetic ratio of [2:1], the mean accuracy was 85% (SD 5%, p = 0.003), and at a ratio of [1:1], the mean accuracy reached 87% (SD 3%, p < 0.001), both compared to 72% (SD 4%) for real-data-only training. For the [1:2] ratio, the accuracy dropped to 72% (SD 3%, p = 0.916), which was not statistically different from the baseline.

Across all generative models, the real-to-synthetic ratio at which mean accuracy peaked differed by approach. For FastGAN, the highest mean accuracy was observed at a [1:1] ratio of real-to-synthetic data, while performance declined at a [1:2] ratio. APA achieved its best result at [1:2] ratio and showed statistically significant improvements over the real-only baseline across all tested ratios. InsGen only achieved a statistically significant improvement at the [1:2] ratio, and DCGAN did not provide statistically significant improvements at any ratio.

These outcomes are consistent with generative quality metrics. In general, lower FID scores (see Table 5) align with higher mean classification accuracies and the presence of statistically significant improvements. Specifically, DCGAN produced the highest FID scores (358.5 for recyclable and 435.08 for reusable) and did not yield significant gains in classification accuracy (71% ± 5%, p = 0.667). InsGen exhibited the second-highest FID scores (158.53 and 252.64) and reached 81% ± 4% (p = 0.010) accuracy only at the [1:2] real-to-synthetic ratio. APA achieved the second-best FID scores (126.2 and 194.03) and a maximum mean classification accuracy of 84% ± 6% (p = 0.009). FastGAN had the best FID scores (63.04 and 59.07) and the highest mean classification accuracy (87% ± 3%, p < 0.001) at the [1:1] ratio. However, in our setting, the IS metric, as used by Gao et al. [12], did not deliver meaningful results in this study (Table 5).

To further improve the best-performing ResNet18 classifier, we conducted hyperparameter optimization by varying the number of training epochs, learning rate, and batch size (see Table 7).

The best accuracy values and F1-scores were achieved in cases 17 (94% ± 3% and 94% ± 2%) and 18 (94% ± 3% and 94% ± 4%), both with 10 epochs and a learning rate of 0.00005, but with batch sizes of 32 and 16, respectively. Specifically, a batch size of 16 (case 18) yielded a precision of 98% ± 2% and a recall of 91% ± 7%, whereas a batch size of 32 (case 17) resulted in a precision of 93% ± 7% and a recall of 95% ± 4%.

In our use case, ensuring that the classified gears are truly in reusable condition is critical. Hence, having a higher precision value is crucial for minimizing the chance of false positives, i.e., identifying a used part as reusable when it is not. A false positive in this scenario could lead to significant problems, such as the failure of a remanufactured product. Consequently, we selected the classifier from case 18, which was trained with a batch size of 16 and achieved a precision of 98% ± 2% on our real test dataset, as our best model.

6. Discussion

This section provides a structured interpretation of the experimental results. We first analyze how architectural differences across GAN models influence image quality (Section 6.1), followed by an evaluation of Inception Scores in the context of domain-specific challenges (Section 6.2). We then examine the impact of real-to-synthetic data ratios on classifier performance (Section 6.3) and discuss the phenomenon of mode invention (Section 6.4). Finally, we outline the practical implications for remanufacturing (Section 6.5) and address key limitations and future research directions (Section 6.6).

6.1. Architectural Influences on Image Quality

The DCGAN images displayed noticeable blurriness and low image quality, particularly in the reusable class, as detailed in Table 4. This degradation was largely due to mode collapse, a substantial challenge in Generative Adversarial Networks, where the generator produces a limited variety of outputs [83]. Specifically, in the SOTA DCGAN, the generator mapped diverse input values to a restricted set of outputs within the data space. Consequently, this led to a marked reduction in the diversity of generated images, manifesting as repetitive features such as colors, textures, and object types. This phenomenon likely contributed to the lower (i.e., better) FID values observed in the recyclable class compared to the reusable class. Furthermore, when compared to other DE-GAN models, DCGAN demonstrated inferior IS and FID values (see Table 5). We hypothesize that this discrepancy may be attributed to discriminator overfitting, likely induced by the additional layer and trainable parameters required to generate images of 128 × 128 pixels. This overfitting could lead to a lack of stability and convergence in the DCGAN approach, impacting its overall performance.

The similar FID values observed between APA and InsGen (see Table 5), both utilizing the StyleGAN2 + ADA baseline model, suggest that their shared baseline model substantially contributes to their comparable performance. StyleGAN2 introduced several architectural enhancements over its predecessor, StyleGAN1, to address observed limitations like phase artifacts [58]. For example, the generator was changed from a Progressive Growing GAN (ProGAN)-based architecture to a more advanced configuration inspired by ResNets’ residual connections and the Multi-Scale Gradients GAN (MSG-GAN). The MSG-GAN architecture facilitates skip connections between the generator and discriminator at matching resolutions, allowing gradients to flow at multiple scales [84]. This architectural refinement ensures substantial overlap between the real and synthetic distributions, enabling the discriminator to provide gradients with valuable information to the generator [84]. Correspondingly, experiments conducted by the authors of MSG-GAN [84] demonstrated a substantial improvement in FID score over the basic DCGAN architecture, aligning with our research findings and indicating the efficiency of the MSG-GAN approach.

Although APA and InsGen exhibit highly similar FID values, subtle differences were observed, especially between the recyclable and reusable classes (see Table 5). These differences could potentially be attributed to their distinct augmentation terms. Both DE-GAN models utilize the StyleGAN2 architecture with the additional augmentation term “ADA”, which enhances discriminator stability by increasing invariance to semantic-preserving distortions. We hypothesize that, by using non-leaking augmentations like cutout, contrast, brightness, and saturation, the discriminator could detect the class-determining features more easily. Thus, the discriminator can pass more valuable information to the generator, which it can use to generate realistic-looking wear and tear, especially in the recyclable class. In contrast to InsGen, APA incorporates an additional “Adaptive Pseudo Augmentation” term, which augments the real data distribution with generated images adaptively so that the discriminator will not get over-confident (i.e., overfits) during training [36]. Consequently, APA can converge further towards the real data distribution, resulting in a decrease in the FID between the real and synthetic distributions.

In addition to the improved FID values, the APA augmentation term also mitigates the effects of image distortions and misplaced features. In our research, images generated by InsGen exhibited substantial distortions compared to those generated by APA (see Table 4). Similar observations were reported by the authors of APA, who found that additional augmentation or regularization terms, such as ADA or APA, led to improved visual quality compared to a plain StyleGAN2 model [40]. Notably, models incorporating the APA term achieved comparable or even superior visual quality compared to those with LC-regularization or ADA terms. These findings align with our research and corroborate the existing literature, as displayed in Table 2.

FastGAN outperformed other models in terms of FID values, as indicated in Table 5. This superior performance can be attributed to several key architectural features of FastGAN. Firstly, FastGAN relies on skip layer excitation modules, akin to the skip connections found in residual blocks of ResNets, enhancing model stability, particularly when generating high-resolution images. Additionally, FastGAN employs a self-supervised discriminator, which provides robust regularization, further enhancing stability during training. Moreover, the architectural differences between FastGAN and StyleGAN2 contribute to its superior performance. FastGAN is smaller and has fewer trainable parameters compared to StyleGAN2, with approximately 44 million parameters for a resolution of 1024 × 1024 compared to approximately 109 million parameters for StyleGAN2 at the same resolution [60]. This efficiency translates to quicker convergence during training, enabling FastGAN to achieve superior results in data-scarce settings [60]. Interestingly, FastGAN exhibited better FID values in the reusable class compared to the recyclable class, unlike StyleGAN2-based variants such as APA and InsGen (see Table 5). Unlike these variants, FastGAN does not incorporate an augmentation term [60], whereas APA utilizes both APA and ADA [40], and InsGen employs ADA alone (refer to Table 2) [54]. This difference leads us to hypothesize that augmentation terms play a crucial role in synthesizing heavy wear and tear appearances characteristic of the recyclable class. The absence of such terms in FastGAN may explain its comparatively lower performance in generating images representative of this class.

6.2. Inception Score Metrics

In evaluating the IS metrics, the APA model exhibited the highest IS values. However, the differences in IS values among the other generative models were minimal, as depicted in Table 5. Notably, all GAN models in our experiments demonstrated relatively low and similar IS values, considering that the IS ranges from 1 to 1000. It is important to note that the IS values achieved in our study were comparable to those reported in the SOTA DCGAN approach by Gao et al. [12], which recorded an IS range of 1.951 to 1.982. The maximum IS value corresponds to the number of classes on which the underlying Inception-v3 network is trained (1000) [85]. Given that electric bicycle gears are not a class in the pre-trained version of the network, low IS values for our dataset were anticipated. Moreover, objects with wear and tear characteristics, such as those in the recyclable class, are likely underrepresented in the classes of the pre-trained Inception-v3 network. This imbalance could explain the higher IS values observed in the reusable class across all DE-GAN models compared to the recyclable class. In contrast, the DCGAN model exhibited better IS values for the recyclable class than for the reusable class. This discrepancy may be attributed to mode collapse in the reusable class, leading to unexpected IS values.

6.3. Real-to-Synthetic Ratio

The analysis of the real-to-synthetic ratios provided important insights into the performance of the classifiers trained with synthetic data generated by FastGAN, APA, InsGen, and SOTA DCGAN. As shown in Table 6, the highest mean classification accuracy was achieved when training the ResNet18 classifier on an equal share of FastGAN generated and real data, representing a statistically significant improvement over the real-data-only baseline (p < 0.001). However, increasing the proportion of synthetic data beyond this ratio did not lead to additional statistically significant gains; in some cases, it caused the mean accuracy to decline.

We hypothesize that this superior performance at a [1:1] ratio may be related to FastGAN’s ability to generate data with low FID values (see Table 5), indicating a close resemblance between the synthetic and real distributions. Sampling synthetic data within the existing data clusters increased the diversity of the training dataset, which led to improved classification accuracy [17]. However, the capacity of generative models to create diverse new data points diminishes beyond a certain threshold, leading to synthesized samples that closely resemble existing data points, which could potentially result in overfitting [86].

The correlation between the FID values of synthetic datasets (see Table 5) and the required amount of synthetic data to achieve specific classification accuracies (see Table 6) indicates that lower FID values are associated with a smaller proportion of synthetic data needed to reach comparable accuracy levels. This observation could explain why the FastGAN-trained classifier peaked at a [1:1] real-to-synthetic ratio, whereas the APA- and InsGen-trained classifiers peaked at a [1:2] ratio (see Table 6). In contrast, classifiers trained with data generated by DCGAN did not surpass the performance of classifiers trained solely on real images. This discrepancy is likely due to the substantially different distribution of generated data compared to real data, characterized by a high FID value indicative of noisy and blurred data. Additionally, mode collapse in the reusable condition resulted in a lack of diversity in the synthetic training dataset, further hindering the performance of DCGAN-trained classifiers.

Consistent with our findings, Zhou et al. conducted a similar investigation on real-to-synthetic ratios using various GAN models, including FastGAN and an auxiliary classifier GAN (ACGAN), to generate additional synthetic images for training tomato leaf classifiers [80]. In their research, regardless of the chosen classifier, optimal results were achieved with an equal amount of real and synthetic data at a [1:1] ratio. Beyond this ratio, the accuracy of trained classifiers decreased [80].

6.4. Mode Invention

The process of generating non-existent objects by APA and FastGAN unveiled an intriguing capability of these models to create novel entities that were not present in the original dataset. This phenomenon, known as “mode invention”, occurs when models not only extrapolate new variations within the data space but also over-represent outliers, creating images that depict rare or extreme cases not typically observed in the real training data [87]. Specifically, both APA and FastGAN have successfully generated gears that were absent from the initial dataset, exemplifying this unique capability. Mode invention can be particularly advantageous when models are encouraged to explore and expand the diversity of generated outputs beyond the confines of the training data. However, it may also result in the generation of atypical, outlier characteristics.

This remarkable ability can be attributed to the effectiveness of both models in disentangling style and content, achieved through similar techniques. While StyleGAN2 relies on a revised Ada-In architecture, where modifications are made solely to the standard deviation and not the mean of the feature maps [58], FastGAN utilizes skip-layer excitation (SLE) modules with channel-wise multiplication. According to the authors of the FastGAN paper [60], SLE modules closely resemble the Ada-In architecture. As such, we hypothesize that the striking similarity observed in the generation results of APA and FastGAN could be attributed to their highly similar style modulation architectures.

6.5. Practical Implications of the Findings

The findings of this study have substantial practical implications for the remanufacturing industry, particularly in sectors characterized by both high product volumes and broad product variation, such as the automotive sector. In the European car remanufacturing industry, it is estimated that more than 27 million cores are inspected every year [88]. This classification process, traditionally performed manually, is prone to errors due to product-specific inspection criteria and the subjective assessments of inspectors. The variability and errors associated with human judgment in determining whether a part is suitable for remanufacturing can lead to considerable costs, with error rates typically ranging between 5 and 30% [89].

As increasing quantities of cores are expected in the future and demographic change is leading to a shortage of skilled workers, there is an urgent need to automate the classification process. With our work, we have demonstrated a promising solution by using generative models such as FastGAN to utilize synthetic images to improve classification accuracy. Using FastGAN with a [1:1] ratio of real-to-synthetic data allows an increase in accuracy to 87% ± 3% compared to using only real images (p < 0.001), which only produced an accuracy of 72% ± 4%. By further optimizing the hyperparameters, it is even possible to achieve accuracies of up to 94% ± 3% (see Table 7). This improvement can translate into reduced error costs and the improved overall efficiency of the inspection and sorting processes.

From an implementation perspective, the approach requires only a modest extension of the current practices. A real image dataset can be created using existing vision systems, such as a shop floor camera station. Subsequently, a DE-GAN model (e.g., FastGAN) must be implemented and trained using either cloud infrastructure or in-house computational resources (e.g., GPU cluster). Once trained, the DE-GAN can automatically generate synthetic images, which can be integrated into the training pipeline of the classification model. The final classification model can then be validated and deployed on an existing camera station with minimal adjustments. While some initial setup and supervision are required, the ongoing generation and integration of synthetic data can be largely automated.

In our setting, the additional implementation effort is moderate, particularly when considered alongside the substantial gains in classification accuracy. Furthermore, automating the classification process optimizes resource utilization and minimizes waste, making it more efficient and scalable. This transition from manual to automated classification can improve consistency in decision making regarding the reusability of parts, regardless of the volume or complexity of used parts. The application of generative models and Deep Learning in remanufacturing thus holds the promise of transforming the industry by enhancing accuracy, reducing costs, and improving overall efficiency in part classification.

6.6. Limitations and Generalizability

Our study has effectively demonstrated the capabilities of DE-GANs in generating synthetic training data, specifically focusing on a dataset comprising gears. The results provide a robust foundation for future research. However, to broaden the applicability of this approach and deepen our understanding, it would be beneficial to expand our investigations to include objects with more complex geometries or reflective surfaces. The experimental design was specifically tailored to meet the operational challenges faced by remanufacturing companies, which require robust classification capabilities in dynamic environments. We introduced diverse camera angles to simulate real-world scenarios, resulting in slight image distortions in the initial dataset. These distortions, replicated to varying degrees by the DE-GAN models—with minimal impact in APA and FastGAN and more substantially in InsGen—offer crucial insights into the models’ adaptability. Future studies could explore a more controlled dataset, with images strictly from top, bottom, and side views to minimize camera angle variations and distortions, offering a different perspective on the models’ performance in less complex visual conditions.

A key methodological choice in this study was the use of a [1:1:1] data split for training, validation, and testing (Section 4.1 and Table 1). This decision was motivated by the need to ensure the robustness and reliability of our findings for scientific validation, particularly given the limited size of the available dataset. Nevertheless, alternative data splitting strategies are common in both academic research and industrial practice. In industrial deployment scenarios, maximizing the size of the training set, potentially at the expense of smaller validation and test sets, is often preferable in aiming to achieve the highest possible model performance. Thus, the [1:1:1] data split employed in this work should be regarded as a methodological choice aiming to improve scientific rigor under data scarcity, rather than a general recommendation for all applications. Future studies could systematically investigate the impact of alternative split ratios on model performance and generalizability.

Additionally, in our current experimental setup, images were used in their entirety without prior segmentation of the gears. As a result, the background was inherently included in both the generative and classification processes. Although the background is not expected to carry semantically relevant information for determining part reusability or recyclability, its presence may inadvertently introduce noise or confounding features, potentially influencing model performance. Future investigations could examine the impact of isolating the gears through image segmentation techniques prior to training. Comparing results obtained from segmented versus non-segmented image datasets may help to quantify the influence of background information on both synthetic image quality and downstream classification accuracy.

In terms of model exploration, the usage of the ResNet18 classifier provided valuable initial insights. Expanding the scope to include various depths (e.g., ResNet34, ResNet50, ResNet101, ResNet152) and innovative models (e.g., ResNeXt, EfficientNet, Vision Transformer) could further substantiate and potentially expand these findings. Additionally, examining other generative models, such as Diffusion Models, may uncover new possibilities for enhancing synthetic data generation.

Regarding hyperparameters, the choice of an image resolution of 128 × 128 pixels was dictated by stability issues observed with the SOTA DCGAN. Experimenting with higher resolutions could provide deeper insights into the impact of these factors on model performance and output quality.

Finally, our evaluation predominantly utilized FID and IS metrics. However, we did not perform statistical validation for these metrics. Statistical significance testing in this study was limited to differences in classification accuracy between models trained with synthetic data from different GAN variants and the real-data-only baseline. For other aspects, such as the results from hyperparameter optimization, we reported mean accuracies but did not formally test for statistical significance between different hyperparameter settings. Thus, while we report statistically significant improvements in accuracy for certain model comparisons, other evaluation results (including FID, IS, qualitative assessments, and hyperparameter optimization) are based solely on descriptive statistics. To augment the depth and breadth of our assessments concerning image fidelity, diversity, and generalizability, incorporating a broader range of metrics would be advantageous. Specifically, sample-level metrics like those described in ‘How Faithful is your Synthetic Data?’ by Alaa et al. [87] could remarkably enhance the comprehensiveness of future assessments.

7. Conclusions and Outlook

Our study aimed to investigate the applicability of generative models in classifying the reusability of parts for remanufacturing. We followed a three-step approach, starting with creating a real dataset consisting of used gears from bicycle motors in two conditions: reusable and recyclable. We then evaluated synthetic images generated by three optimized DE-GANs and the SOTA DCGAN approach using the FID and IS metrics. Finally, we enriched the real training dataset by generating images using DE-GANs and the DCGAN in three different real-to-synthetic ratios. Afterward, we evaluated the trained ResNet18 classifiers, identified the best model, and executed a hyperparameter optimization.

Our findings show that augmenting the real training dataset with synthetic images generated by DE-GANs can lead to substantial, statistically validated improvements in classification accuracy, depending on both the choice of generative model and the real-to-synthetic ratio. Among the DE-GANs, FastGAN was the top-performing model. The use of FastGAN-generated synthetic data for ResNet18 classifier training significantly increased the mean classification accuracy from 72% ± 4% (real-data-only baseline) to 85% ± 5% at a [2:1] real-to-synthetic ratio (p = 0.003) and 87% ± 3% (p < 0.001) at a [1:1] ratio. Hyperparameter optimization further improved the mean classification accuracy to 94% ± 3%, although we did not statistically assess differences between hyperparameter settings. APA-generated synthetic data also resulted in statistically significant accuracy improvements compared to the baseline, with accuracies of 81% ± 6% at a [2:1] real-to-synthetic ratio (p = 0.030), 80% ± 4% (p = 0.019) at a [1:1] ratio, and 84% ± 6% at a [1:2] ratio (p = 0.009). InsGen-generated data yielded a statistically significant improvement only at a [1:2] ratio, achieving 81% ± 4% accuracy (p = 0.010), while other ratios did not show significant effects. In contrast, including DCGAN-generated synthetic data did not yield statistically significant changes in accuracy, with values of 71% ± 5% at a [2:1] ratio (p = 0.667), 66% ± 8% at a [1:1] ratio (p = 0.178), and 61% ± 10% at a [1:2] ratio (p = 0.074) compared to the real-data-only baseline.

Interestingly, although the calculated IS of the different GAN models were similar, the FID metric showed meaningful differences. These differences were highlighted by evaluating the test accuracy of the different trained classifiers. In contrast to IS, which was used by Gao et al. [12], the FID score seems to generate more meaningful results to clarify whether current generative models can synthesize realistic images of used parts and their appearance, including wear and defects.

Overall, our study shows that using FastGAN-generated training data significantly improved the classification accuracy of ResNet18 classifiers for identifying reusable gears. Further research could explore the potential of other DE-GAN approaches, including knowledge sharing or data selection techniques, in addition to the optimized DE-GAN models used in this study. Moreover, alternative generative models like Diffusion Models, variational autoencoders, or mixed/stacked architectures could be investigated to determine their effectiveness in enhancing classification accuracy and generating realistic images to classify the reusability of parts for remanufacturing.

Author Contributions

Conceptualization, M.H. and E.A.; methodology, M.H. and E.A.; software, N.S.T.K., E.A. and M.H.; validation, M.H. and C.O.; formal analysis, E.A. and N.S.T.K.; investigation, M.H. and C.O.; resources, J.K.; data curation, N.S.T.K. and E.A.; writing—original draft preparation, M.H.; writing—review and editing, M.H., C.O., J.K. and F.D.; visualization, M.H. and N.S.T.K.; supervision, J.K. and F.D.; project administration, M.H.; funding acquisition, M.H. and J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This paper evolved from the research project VersAtile-KI and was funded by the Federal Ministry of Education and Research (BMBF), Germany, Grant No. 01IS23020C.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data are not publicly available due to confidentiality.

Acknowledgments

During the preparation of this study, the author(s) used GPT-4o for the purposes of textual review and enhancement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

Acc	accuracy
ACGAN	auxiliary classifier GAN
ADA	Adaptive Discriminator Augmentation
AFI	Adaptive Feature Interpolation
APA	Adaptive Pseudo Augmentation
CIFAR-10	Canadian Institute for Advanced Research 10-class dataset
CNN	Convolutional Neural Network
Cov_y,y, Cov_ŷ,ŷ	covariance matrices
DA	Differentiable Augmentation
DCGAN	Deep Convolutional Generative Adversarial Network
DE-GAN	Data-Efficient Generative Adversarial Network
DigGAN	Discriminator Gradient Gap-Regularized GAN
DL	Deep Learning
Exp	Exponential Function
FFHQ	Flickr-Faces-HQ
FID	Fréchet Inception Distance
FSMR	Feature Statistics Mixing Regularization
GAN	Generative Adversarial Network
GenCo	Generative Co-training
GPU	Graphics Processing Unit
H(y)	Unconditional Entropy
H(y\|x)	Conditional Entropy
IS	Inception Score
kimg	thousands of images (shown to the discriminator)
LFM	Latent Feature Maximization
log	Logarithmic Function
LR	learning rate
MoCA	Memory Concept Attention
MSG-GAN	Multi-Scale Gradients GAN
μ_y, μ_ŷ	mean vectors
N	number of samples
NAS	neural architecture search
p(y)	marginal class distribution
p(y\|x)	conditional class distribution
ProGAN	Progressive Growing GAN
ROS	Random Oversampling
SD	standard deviation
SLE	skip-layer excitation
SOTA	State-of-the-Art
Tr	trace
x	generated image
y	predicted label
#	number

Appendix A

The Appendix provides class-wise FID trajectories over training and representative generated samples at the evaluated milestones. Training progress is reported in kimg, which denotes thousands of images shown to the discriminator. The evaluation protocol and convergence criteria are described in Section 4.2.

Figure A1. Class-wise Fréchet Inception Distance (FID) evolution over training for all generative models. Panels are arranged from top-left to bottom-right as follows: DCGAN, APA, InsGen, and FastGAN. Each panel shows FID versus kimg for the recyclable and reusable classes, and stars mark the lowest FID per class. For DCGAN and FastGAN, epochs and iterations are mapped to kimg as described in Section 4.2.

Table A1. Representative DCGAN samples at selected kimg milestones with corresponding class-wise FID reported in Figure A1. For each kimg, two generated thumbnails are shown per class.

kimg	Recyclable			Reusable
kimg	Sample A	Sample B	FID	Sample A	Sample B	FID
0			758.41			842.73
3.6			689.72			790.35
7.2			703.66			775.18
10.8			642.31			742.99
14.4			660.08			748.73
18			607.22			713.54
21.6			590.47			695.47
25.2			523.18			646.81
28.8			358.46			435.08
32.4			391.27			478.92
36			410.63			547.34

Table A2. Representative APA samples at selected kimg milestones with corresponding class-wise FID reported in Figure A1. For each kimg, two generated thumbnails are shown per class.

kimg	Recyclable			Reusable
kimg	Sample A	Sample B	FID	Sample A	Sample B	FID
0			408.07			408.70
100			329.28			360.91
200			272.90			341.43
300			231.74			303.17
400			148.18			275.50
500			132.94			205.93
600			126.10			193.97
700			126.16			194.03
800			141.88			214.69

Table A3. Representative InsGen samples at selected kimg milestones with corresponding class-wise FID reported in Figure A1. For each kimg, two generated thumbnails are shown per class.

kimg	Recyclable			Reusable
kimg	Sample A	Sample B	FID	Sample A	Sample B	FID
0			406.68			400.67
100			352.98			377.22
200			270.48			327.42
300			236.15			296.82
400			161.91			253.67
500			158.53			252.64
600			166.54			259.71
700			176.99			274.13

Table A4. Representative FastGAN samples at selected kimg milestones with corresponding class-wise FID reported in Figure A1. For each kimg, two generated thumbnails are shown per class.

kimg	Recyclable			Reusable
kimg	Sample A	Sample B	FID	Sample A	Sample B	FID
0			496.91			537.00
20			220.42			321.67
40			98.59			181.25
60			84.71			121.59
80			70.90			82.31
100			69.84			73.90
120			68.67			67.33
140			65.49			61.22
160			63.04			59.07
180			66.93			62.78
200			71.46			68.93

References

E.M. Foundation. Towards a Circular Economy: Business Rationale for an Accelerated Transition 2015; Ellen MacArthur Foundation: Isle of Wight, UK, 2015. [Google Scholar]
Köhler, D.C.F. Regenerative Supply Chains: Regenerative Wertschöpfungsketten; Shaker-Verlag: Aachen, Germany, 2011; ISBN 9783844006162. [Google Scholar]
Steinhilper, R. Produktrecycling: Vielfachnutzen Durch Mehrfachnutzung; Fraunhofer-IRB-Verlag: Stuttgart, Germany, 1999; ISBN 3816753078. [Google Scholar]
Nwankpa, C.; Eze, S.; Ijomah, W.; Gachagan, A.; Marshall, S. Achieving remanufacturing inspection using deep learning. J. Remanuf. 2021, 11, 89–105. [Google Scholar] [CrossRef]
Subramoniam, R.; Huisingh, D.; Chinnam, R.B. Remanufacturing for the automotive aftermarket-strategic factors: Literature review and future research needs. J. Clean. Prod. 2009, 17, 1163–1174. [Google Scholar] [CrossRef]
Zlamparet, G.I.; Ijomah, W.; Miao, Y.; Awasthi, A.K.; Zeng, X.; Li, J. Remanufacturing strategies: A solution for WEEE problem. J. Clean. Prod. 2017, 149, 126–136. [Google Scholar] [CrossRef]
Saiz, F.A.; Alfaro, G.; Barandiaran, I. An Inspection and Classification System for Automotive Component Remanufacturing Industry Based on Ensemble Learning. Information 2021, 12, 489. [Google Scholar] [CrossRef]
Nwankpa, C.; Eze, S.; Ijomah, W.; Gachagan, A.; Marshal, S. Deep Learning Based Vision Inspection System for Remanufacturing Application. In Proceedings of the International Conference on Remanufacturing, Amsterdam, The Netherlands, 23–25 June 2019. [Google Scholar]
Sundin, E.; Dunbäck, O. Reverse logistics challenges in remanufacturing of automotive mechatronic devices. J. Remanuf. 2013, 3, 2. [Google Scholar] [CrossRef]
Casper, R. Automotive Remanufacturing in a Changing Market: Challenges and Opportunities in a Market with a Growing Share of Electric Cars, 1st ed.; Linköping University Electronic Press: Linköping, Sweden, 2021. [Google Scholar]
Lundmark, P.; Sundin, E.; Björkman, M. Industrial Challenges Within the Remanufacturing System. 2009. Available online: https://www.researchgate.net/publication/229049090_Industrial_challenges_within_the_remanufacturing_system (accessed on 1 September 2025).
Gao, H.; Zhang, Y.; Lv, W.; Yin, J.; Qasim, T.; Wang, D. A Deep Convolutional Generative Adversarial Networks-Based Method for Defect Detection in Small Sample Industrial Parts Images. Appl. Sci. 2022, 12, 6569. [Google Scholar] [CrossRef]
Ljubić, H.; Martinović, G.; Volarić, T. Augmenting data with generative adversarial networks: An overview. IDA 2022, 26, 361–378. [Google Scholar] [CrossRef]
Tanaka, F.H.K.d.S.; Aranha, C. Data Augmentation Using GANs. 2019. Available online: http://arxiv.org/pdf/1904.09135v1 (accessed on 1 September 2025).
Antoniou, A.; Storkey, A.; Edwards, H. Data Augmentation Generative Adversarial Networks. 2017. Available online: http://arxiv.org/pdf/1711.04340v3 (accessed on 1 September 2025).
Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. 2014. Available online: http://arxiv.org/pdf/1406.2661v1 (accessed on 1 September 2025).
Bowles, C.; Chen, L.; Guerrero, R.; Bentley, P.; Gunn, R.; Hammers, A.; Dickie, D.A.; Hernández, M.V.; Wardlaw, J.; Rueckert, D. GAN Augmentation: Augmenting Training Data Using Generative Adversarial Networks. 2018. Available online: http://arxiv.org/pdf/1810.10863v1 (accessed on 1 September 2025).
Maack, L.; Holstein, L.; Schlaefer, A. GANs for generation of synthetic ultrasound images from small datasets. Curr. Dir. Biomed. Eng. 2022, 8, 17–20. [Google Scholar] [CrossRef]
Saiz, F.A.; Alfaro, G.; Barandiaran, I.; Graña, M. Generative Adversarial Networks to Improve the Robustness of Visual Defect Segmentation by Semantic Networks in Manufacturing Components. Appl. Sci. 2021, 11, 6368. [Google Scholar] [CrossRef]
Olaniyi, E.; Chen, D.; Lu, Y.; Huang, Y. Generative Adversarial Networks for Image Augmentation in Agriculture: A Systematic Review. 2022. Available online: http://arxiv.org/pdf/2204.04707v2 (accessed on 1 September 2025).
Zhou, J. Research on Generative Adversarial Networks and Their Applications in Image Generation. In Proceedings of the 2022 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Dalian, China, 20–21 August 2022; IEEE: New York, NY, USA, 2022; pp. 1144–1147, ISBN 978-1-6654-8090-1. [Google Scholar]
Li, Z.; Xia, B.; Zhang, J.; Wang, C.; Li, B. A Comprehensive Survey on Data-Efficient GANs in Image Generation. 2022. Available online: http://arxiv.org/pdf/2204.08329v2 (accessed on 1 September 2025).
Yang, M.; Wang, Z. Image Synthesis Under Limited Data: A Survey and Taxonomy. 2023. Available online: http://arxiv.org/pdf/2307.16879 (accessed on 1 September 2025).
Katharopoulos, A.; Fleuret, F. Not All Samples Are Created Equal: Deep Learning with Importance Sampling. 2018. Available online: http://arxiv.org/pdf/1803.00942v3 (accessed on 1 September 2025).
Zhang, Z.; Chen, X.; Chen, T.; Wang, Z. Efficient Lottery Ticket Finding: Less Data is More. 2021. Available online: http://arxiv.org/pdf/2106.03225v1 (accessed on 1 September 2025).
DeVries, T.; Drozdzal, M.; Taylor, G.W. Instance Selection for GANs. 2020. Available online: http://arxiv.org/pdf/2007.15255v2 (accessed on 1 September 2025).
Achicanoy, H.; Chaves, D.; Trujillo, M. StyleGANs and Transfer Learning for Generating Synthetic Images in Industrial Applications. Symmetry 2021, 13, 1497. [Google Scholar] [CrossRef]
Grigoryev, T.; Voynov, A.; Babenko, A. When, Why, and Which Pretrained GANs Are Useful? 2022. Available online: http://arxiv.org/pdf/2202.08937v2 (accessed on 1 September 2025).
Kumari, N.; Zhang, R.; Shechtman, E.; Zhu, J.-Y. Ensembling Off-the-Shelf Models for GAN Training. 2021. Available online: http://arxiv.org/pdf/2112.09130v3 (accessed on 1 September 2025).
Mangla, P.; Kumari, N.; Singh, M.; Krishnamurthy, B.; Balasubramanian, V.N. Data InStance Prior (DISP) in Generative Adversarial Networks. 2020. Available online: http://arxiv.org/pdf/2012.04256v2 (accessed on 1 September 2025).
Noguchi, A.; Harada, T. Image Generation From Small Datasets via Batch Statistics Adaptation. 2019. Available online: http://arxiv.org/pdf/1904.01774v4 (accessed on 1 September 2025).
Ojha, U.; Li, Y.; Lu, J.; Efros, A.A.; Lee, Y.J.; Shechtman, E.; Zhang, R. Few-shot Image Generation via Cross-Domain Correspondence. 2021. Available online: http://arxiv.org/pdf/2104.06820v1 (accessed on 1 September 2025).
Robb, E.; Chu, W.-S.; Kumar, A.; Huang, J.-B. Few-Shot Adaptation of Generative Adversarial Networks. 2020. Available online: http://arxiv.org/pdf/2010.11943v1 (accessed on 1 September 2025).
Toneva, M.; Sordoni, A.; Des Combes, R.T.; Trischler, A.; Bengio, Y.; Gordon, G.J. An Empirical Study of Example Forgetting During Deep Neural Network Learning. 2018. Available online: http://arxiv.org/pdf/1812.05159v3. (accessed on 1 September 2025).
Wang, Y.; Gonzalez-Garcia, A.; Berga, D.; Herranz, L.; Khan, F.S.; van de Weijer, J. MineGAN: Effective Knowledge Transfer from GANs to Target Domains with Few Images. 2019. Available online: http://arxiv.org/pdf/1912.05270v3 (accessed on 1 September 2025).
Wang, Y.; Gonzalez-Garcia, A.; Wu, C.; Herranz, L.; Khan, F.S.; Jui, S.; van de Weijer, J. MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains. 2021. Available online: http://arxiv.org/pdf/2104.13742v1 (accessed on 1 September 2025).
Yang, C.; Shen, Y.; Zhang, Z.; Xu, Y.; Zhu, J.; Wu, Z.; Zhou, B. One-Shot Generative Domain Adaptation. 2021. Available online: http://arxiv.org/pdf/2111.09876v1 (accessed on 1 September 2025).
Zhao, M.; Cong, Y.; Carin, L. On Leveraging Pretrained GANs for Generation with Limited Data. 2020. Available online: http://arxiv.org/pdf/2002.11810v3 (accessed on 1 September 2025).
Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training Generative Adversarial Networks with Limited Data. 2020. Available online: http://arxiv.org/pdf/2006.06676v2 (accessed on 1 September 2025).
Jiang, L.; Dai, B.; Wu, W.; Loy, C.C. Deceive D Adaptive Pseudo Augmentation for GAN Training with Limited Data. 2019. Available online: https://arxiv.org/pdf/2111.06849 (accessed on 1 September 2025).
Tran, N.-T.; Tran, V.-H.; Nguyen, N.-B.; Nguyen, T.-K.; Cheung, N.-M. On Data Augmentation for GAN Training. IEEE Trans. Image Process. 2021, 30, 1882–1897. [Google Scholar] [CrossRef]
Zhao, S.; Liu, Z.; Lin, J.; Zhu, J.-Y.; Han, S. Differentiable Augmentation for Data-Efficient GAN Training. 2020. Available online: http://arxiv.org/pdf/2006.10738v4 (accessed on 1 September 2025).
Zhao, Z.; Zhang, Z.; Chen, T.; Singh, S.; Zhang, H. Image Augmentations for GAN Training. 2020. Available online: http://arxiv.org/pdf/2006.02595v1 (accessed on 1 September 2025).
Dai, M.; Hang, H.; Guo, X. Adaptive Feature Interpolation for Low-Shot Image Generation. 2021. Available online: http://arxiv.org/pdf/2112.02450v3 (accessed on 1 September 2025).
Jeong, J.; Shin, J. Training GANs with Stronger Augmentations via Contrastive Discriminator. 2021. Available online: http://arxiv.org/pdf/2103.09742 (accessed on 1 September 2025).
Hou, L.; Cao, Q.; Shen, H.; Pan, S.; Li, X.; Cheng, X. Augmentation-Aware Self-Supervision for Data-Efficient GAN Training. 2022. Available online: http://arxiv.org/pdf/2205.15677v1 (accessed on 1 September 2025).
Huang, J.; Cui, K.; Guan, D.; Xiao, A.; Zhan, F.; Lu, S.; Liao, S.; Xing, E. Masked Generative Adversarial Networks are Data-Efficient Generation Learners. Adv. Neural Inf. Process. Syst. 2022, 35, 2154–2167. [Google Scholar]
Li, Z.; Usman, M.; Tao, R.; Xia, P.; Wang, C.; Chen, H.; Li, B. A Systematic Survey of Regularization and Normalization in GANs. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
Tseng, H.-Y.; Jiang, L.; Liu, C.; Yang, M.-H.; Yang, W. Regularizing Generative Adversarial Networks Under Limited Data. 2021. Available online: http://arxiv.org/pdf/2104.03310v1 (accessed on 1 September 2025).
Fang, T.; Sun, R.; Schwing, A. DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data. 2022. Available online: http://arxiv.org/pdf/2211.14694 (accessed on 1 September 2025).
Yang, M.; Wang, Z.; Chi, Z.; Du, W. ProtoGAN: Towards high diversity and fidelity image synthesis under limited data. Inf. Sci. 2023, 632, 698–714. [Google Scholar] [CrossRef]
Kim, J.; Choi, Y.; Uh, Y. Feature Statistics Mixing Regularization for Generative Adversarial Networks. 2021. Available online: http://arxiv.org/pdf/2112.04120v2 (accessed on 1 September 2025).
Pengwei, W. Latent Space is Feature Space: Regularization Term for GANs Training on Limited Dataset. 2022. Available online: https://arxiv.org/pdf/2210.16251 (accessed on 1 September 2025).
Yang, C.; Shen, Y.; Xu, Y.; Zhou, B. Data-Efficient Instance Generation from Instance Discrimination. 2021. Available online: https://arxiv.org/pdf/2106.04566 (accessed on 1 September 2025).
Kong, C.; Kim, J.; Han, D.; Kwak, N. Few-Shot Image Generation with Mixup-Based Distance Learning. 2021. Available online: http://arxiv.org/pdf/2111.11672v2 (accessed on 1 September 2025).
Chu, C.; Minami, K.; Fukumizu, K. Smoothness and Stability in GANs. arXiv 2020, arXiv:2002.04185. [Google Scholar] [CrossRef]
Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. 2018. Available online: http://arxiv.org/pdf/1812.04948v3 (accessed on 1 September 2025).
Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. 2019. Available online: http://arxiv.org/pdf/1912.04958v2 (accessed on 1 September 2025).
Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. 2017. Available online: http://arxiv.org/pdf/1710.10196v3 (accessed on 1 September 2025).
Liu, B.; Zhu, Y.; Song, K.; Elgammal, A. Towards Faster and Stabilized GAN Training for High-Fidelity Few-Shot Image Synthesis. 2021. Available online: http://arxiv.org/pdf/2101.04775v1 (accessed on 1 September 2025).
Li, T.; Li, Z.; Luo, A.; Rockwell, H.; Farimani, A.; Lee, T.S. Prototype Memory and Attention Mechanisms for Few Shot Image Generation. In Proceedings of the International Conference on Learning Representations 2022, Online, 25–29 April 2022. [Google Scholar]
Yang, M.; Wang, Z.; Chi, Z.; Zhang, Y. FreGAN: Exploiting Frequency Components for Training GANs Under Limited Data. 2022. Available online: https://arxiv.org/pdf/2210.05461 (accessed on 1 September 2025).
Cui, K.; Huang, J.; Luo, Z.; Zhang, G.; Zhan, F.; Lu, S. GenCo: Generative Co-Training for Generative Adversarial Networks with Limited Data. 2021. Available online: http://arxiv.org/pdf/2110.01254v2 (accessed on 1 September 2025).
Yang, C.; Shen, Y.; Xu, Y.; Zhao, D.; Dai, B.; Zhou, B. Improving GANs with A Dynamic Discriminator. 2022. Available online: https://arxiv.org/pdf/2209.09897 (accessed on 1 September 2025).
Chen, T.; Cheng, Y.; Gan, Z.; Liu, J.; Wang, Z. Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective. 2021. Available online: https://arxiv.org/pdf/2103.00397 (accessed on 1 September 2025).
Shi, J.; Liu, W.; Zhou, G.; Zhou, Y. AutoInfo GAN: Toward a better image synthesis GAN framework for high-fidelity few-shot datasets via NAS and contrastive learning. Knowl.-Based Syst. 2023, 276, 110757. [Google Scholar] [CrossRef]
Saxena, D.; Cao, J.; Xu, J.; Kulshrestha, T. Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR) 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 16230–16240. [Google Scholar] [CrossRef]
Koch, P.; Schlüter, M.; Krüger, J. With synthetic data towards part recognition generalized beyond the training instances. AIP Conf. Proc. 2024, 2989, 020007. [Google Scholar] [CrossRef]
Kaiser, J.-P.; Enslin, C.-L.; Barczak, E.T.; Stamer, F.; Heizmann, M.; Lanza, G. A Framework for Anomaly Classification and Segmentation in Remanufacturing using Autoencoders and Simulated Data. Procedia CIRP 2024, 126, 1005–1010. [Google Scholar] [CrossRef]
Wu, C.; Zhou, K.; Kaiser, J.-P.; Mitschke, N.; Klein, J.-F.; Pfrommer, J.; Beyerer, J.; Lanza, G.; Heizmann, M.; Furmans, K. MotorFactory: A Blender Add-on for Large Dataset Generation of Small Electric Motors. 2023. Available online: https://arxiv.org/pdf/2301.05028 (accessed on 1 September 2025).
Börold, A.; Teucke, M.; Rust, J.; Freitag, M. Recognition of car parts in automotive supply chains by combining synthetically generated training data with classical and deep learning based image processing. Procedia CIRP 2020, 93, 377–382. [Google Scholar] [CrossRef]
Mohammadzadeh, M.; Kremer, G.E.O.; Olafsson, S.; Kremer, P.A. AI-Driven Crack Detection for Remanufacturing Cylinder Heads Using Deep Learning and Engineering-Informed Data Augmentation. Automation 2024, 5, 578–596. [Google Scholar] [CrossRef]
Lee, D.; Kim, K. An efficient method to determine sample size in oversampling based on classification complexity for imbalanced data. Expert Syst. Appl. 2021, 184, 115442. [Google Scholar] [CrossRef]
Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 2015. Available online: https://arxiv.org/pdf/1511.06434 (accessed on 1 September 2025).
Dowson, D.; Landau, B.V. The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 1982, 12, 450–455. [Google Scholar] [CrossRef]
Shirazi, M.; Schmitz, M.; Janssen, S.; Thies, A.; Safronov, G.; Rizk, A.; Mayr, P.; Engelhardt, P. Verifying the Applicability of Synthetic Image Generation for Object Detection in Industrial Quality Inspection. In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA) 2021, Pasadena, CA, USA, 13–16 December 2021; pp. 1365–1372. [Google Scholar] [CrossRef]
Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. 2016. Available online: https://arxiv.org/pdf/1606.03498 (accessed on 1 September 2025).
Maturana, C.N.V.; Orozco, A.L.S.; Villalba, L.J.G. Exploration of Metrics and Datasets to Assess the Fidelity of Images Generated by Generative Adversarial Networks. Appl. Sci. 2023, 13, 10637. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Zhou, H.; Fang, Z.; Wang, Y.; Tong, M. Image Generation of Tomato Leaf Disease Identification Based on Small-ACGAN. Comput. Mater. Contin. 2023, 76, 175–194. [Google Scholar] [CrossRef]
Feurer, M.; Hutter, F. Hyperparameter Optimization. In Automated Machine Learning: Methods, Systems, Challenges; Hutter, F., Kotthoff, L., Vanschoren, J., Eds.; Springer: Cham, Switzerland, 2019; pp. 3–33. ISBN 978-3-030-05317-8. [Google Scholar]
Masters, D.; Luschi, C. Revisiting Small Batch Training for Deep Neural Networks. 2018. Available online: http://arxiv.org/pdf/1804.07612 (accessed on 1 September 2025).
Kossale, Y.; Airaj, M.; Darouichi, A. Mode Collapse in Generative Adversarial Networks: An Overview. In Proceedings of the International Conference on Optimization and Applications (ICOA), Genoa, Italy, 6–7 October 2022; pp. 1–6. [Google Scholar] [CrossRef]
Karnewar, A.; Wang, O. MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks. 2019. Available online: http://arxiv.org/pdf/1903.06048v4 (accessed on 1 September 2025).
Barratt, S.; Sharma, R. A Note on the Inception Score. 2018. Available online: https://arxiv.org/pdf/1801.01973 (accessed on 1 September 2025).
Jordon, J.; Szpruch, L.; Houssiau, F.; Bottarelli, M.; Cherubin, G.; Maple, C.; Cohen, S.N.; Weller, A. Synthetic Data—What, Why and How? 2022. Available online: http://arxiv.org/pdf/2205.03257v1 (accessed on 1 September 2025).
Alaa, A.M.; van Breugel, B.; Saveliev, E.; van der Schaar, M. How Faithful is Your Synthetic Data? Sample-Level Metrics for Evaluating and Auditing Generative Models. 2021. Available online: http://arxiv.org/pdf/2102.08921 (accessed on 1 September 2025).
Parker, D.; Riley, K.; Robinson, S.; Symington, H.; Teweson, J.; Jannson, K.; Ramkumar, S.; Peck, D. Remanufacturing Market Study. Available online: https://www.remanufacturing.eu/assets/pdfs/remanufacturing-market-study.pdf (accessed on 28 May 2025).
Dutschke, W. Fertigungsmesstechnik, 4th ed.; Vieweg+Teubner Verlag: Wiesbaden, Germany, 2002; ISBN 9783322968104. [Google Scholar]

Figure 1. Flowchart of the experimental workflow from dataset preparation to classifier evaluation Abbreviations: Train = Training, Valid = Validation, Test = Testing, R = Real, S = Synthetic, M = Mixed.

Figure 2. Sample images of gears from electric bicycle motors after a life cycle.

Figure 3. Flowchart of the real dataset creation process: Each gear was photographed under rotation six times from the front and six times from the back.

Table 1. Overview of the initial real dataset.

Condition	Reusable Condition			Recyclable Condition
Usage	Training	Testing	Validation	Training	Testing	Validation
Images in Total	72	72	72	72	72	72
Images per Part	4	4	4	4	4	4

Table 2. Fréchet Inception Distance (FID) score list of optimized DE-GAN approaches from literature that have been tested on FFHQ-5k, FFHQ-1K, and CIFAR-10 datasets. The best results are highlighted in bold.

Optimized DE-GAN Approaches			FID (Lower is Better)
Optimized DE-GAN Approaches			FFHQ-5K (256 × 256)	FFHQ-1K	CIFAR-10 (10%)
Model Architecture	[58]	StyleGAN2	49.68	100.16	36.02
	[63]	GenCo	27.96	65.31	28.08
	[67]	Re-GAN	19.13	36.30	-
	[60]	FastGAN	-	24.45	-
+ Data Augmentation	[42]	StyleGAN2 + DA	10.45	25.66	14.50
	[45]	StyleGAN2 + ContraD	-	-	9.80
	[39]	StyleGAN2 + ADA	10.96	21.29	6.72
	[46]	StyleGAN2 + AugSelf	9.15	20.39	-
	[40]	StyleGAN2 + ADA + APA	8.38	18.89	-
+ Regularization	[50]	StyleGAN2 + DigGAN	-	-	29.75
	[49]	StyleGAN2+ADA+R_LC	-	21.70	6.56
	[54]	StyleGAN2+ADA+InsGen	-	19.58	-

Table 3. Data split for experiments with different real-to-synthetic ratios.

Training Dataset			Testing Dataset
Ratio	# of Real Images	# of Synthetic Images	# of Real Images
[2:1]	144	72	144
[1:1]	144	144	144
[1:2]	144	288	144
[1:0]	144	-	144

Table 4. Comparison of a sample of real and synthetically generated images of used gears. All images were resized to 128 × 128 pixels after the generation process to ensure comparability. The possible mode collapse of the SOTA DCGAN in reusable condition will be discussed in Section 6.1.

	Reusable Condition				Recyclable Condition
Real	(R1)	(R2)	(R3)	(R4)	(R5)	(R6)	(R7)	(R8)
DCGAN	(D1)	(D2)	(D3)	(D4)	(D5)	(D6)	(D7)	(D8)
APA	(A1)	(A2)	(A3)	(A4)	(A5)	(A6)	(A7)	(A8)
InsGen	(I1)	(I2)	(I3)	(I4)	(I5)	(I6)	(I7)	(I8)
FastGAN	(F1)	(F2)	(F3)	(F4)	(F5)	(F6)	(F7)	(F8)

Table 5. Evaluation of FID and Inception Score (IS) metrics for the SOTA GAN (DCGAN) and DE-GANs (APA, InsGen, FastGAN). The best results are highlighted in bold.

	DCGAN		APA		InsGen		FastGAN
	Recyclable	Reusable	Recyclable	Reusable	Recyclable	Reusable	Recyclable	Reusable
FID ↓	358.50	435.08	126.20	194.03	158.53	252.64	63.04	59.07
IS ↑	1.55	1.03	2.16	2.24	1.85	2.22	1.71	1.78

Table 6. Accuracy results (Acc) and standard deviation (SD) of ResNet18 classifiers in % trained with different real-to-synthetic ratios. Test results are based on a fixed test dataset consisting solely of real data and are stated as the means of five separately trained classifiers. p-values were calculated using Welch’s two-sample, two-tailed t-test versus the real-only baseline (72% ± 4%); p < 0.05 was considered statistically significant. The best result is highlighted in bold.

Only Real Images	Real-to-Synthetic Ratios	Synthetic Images Generated by
Only Real Images	Real-to-Synthetic Ratios	DCGAN		APA		InsGen		FastGAN
Acc ± SD	[R:S]	Acc ± SD	p	Acc ± SD	p	Acc ± SD	p	Acc ± SD	p
72 ± 4	[2:1]	71 ± 5	0.667	81 ± 6	0.030	73 ± 4	0.622	85 ± 5	0.003
	[1:1]	66 ± 8	0.178	80 ± 4	0.019	72 ± 5	0.996	87 ± 3	<0.001
	[1:2]	61 ± 10	0.074	84 ± 6	0.009	81 ± 4	0.010	72 ± 3	0.916

Table 7. Hyperparameter optimization of ResNet18 classifier. The classifiers are trained on an equal share of FastGAN-generated synthetic images and real images (real-to-synthetic ratio of [1:1]). Test results and SD in % are based on a fixed test set consisting solely of real data and are stated as means of five separately trained classifiers. The best model (case 18) is highlighted in bold. The cases are ordered by accuracy in ascending order. Abbreviations: LR, learning rate.

Cases	Hyperparameters			Metrics
Cases	Epochs	LR	Batch Size	Acc ± SD	F1-Score ± SD	Recall ± SD	Precision ± SD
1	5	0.0003	32	63 ± 12	68 ± 12	86 ± 29	66 ± 20
2	5	0.0003	8	66 ± 13	68 ± 20	83 ± 35	70 ± 20
3	10	0.0003	8	75 ± 11	72 ± 24	83 ± 36	77 ± 15
4	5	0.0003	16	77 ± 14	78 ± 11	84 ± 20	80 ± 19
5	5	0.0001	16	82 ± 11	76 ± 19	69 ± 30	95 ± 7
6	10	0.0003	32	82 ± 16	77 ± 29	81 ± 37	87 ± 14
7	10	0.0001	16	84 ± 19	87 ± 12	89 ± 9	89 ± 21
8	5	0.0001	8	86 ± 12	82 ± 17	74 ± 25	98 ± 2
9	10	0.0003	16	87 ± 4	88 ± 3	94 ± 8	84 ± 10
10	10	0.0001	32	90 ± 12	87 ± 18	83 ± 25	97 ± 2
11	10	0.00005	8	90 ± 5	90 ± 4	90 ± 9	91 ± 11
12	5	0.00005	16	91 ± 5	90 ± 7	88 ± 14	93 ± 6
13	5	0.0001	32	91 ± 3	91 ± 3	91 ± 9	93 ± 9
14	10	0.0001	8	92 ± 2	92 ± 3	90 ± 10	94 ± 6
15	5	0.00005	8	93 ± 1	92 ± 2	89 ± 6	96 ± 4
16	5	0.00005	32	93 ± 3	93 ± 3	91 ± 8	95 ± 5
17	10	0.00005	32	94 ± 3	94 ± 2	95 ± 4	93 ± 7
18	10	0.00005	16	94 ± 3	94 ± 4	91 ± 7	98 ± 2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Herold, M.; Ahmeti, E.; Kolakaleti, N.S.T.; Odabasi, C.; Koller, J.; Döpper, F. Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs). Appl. Sci. 2025, 15, 9833. https://doi.org/10.3390/app15179833

AMA Style

Herold M, Ahmeti E, Kolakaleti NST, Odabasi C, Koller J, Döpper F. Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs). Applied Sciences. 2025; 15(17):9833. https://doi.org/10.3390/app15179833

Chicago/Turabian Style

Herold, Maximilian, Engjëll Ahmeti, Naga Sai Teja Kolakaleti, Cagatay Odabasi, Jan Koller, and Frank Döpper. 2025. "Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs)" Applied Sciences 15, no. 17: 9833. https://doi.org/10.3390/app15179833

APA Style

Herold, M., Ahmeti, E., Kolakaleti, N. S. T., Odabasi, C., Koller, J., & Döpper, F. (2025). Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs). Applied Sciences, 15(17), 9833. https://doi.org/10.3390/app15179833

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Alleviate Data Scarcity in Remanufacturing: Classifying the Reusability of Parts with Data-Efficient Generative Adversarial Networks (DE-GANs)

Abstract

1. Introduction

2. Related Works

3. Research Gap

4. Materials and Methods

4.1. Real Dataset Creation

4.2. Synthetic Image Evaluation

4.3. Reusability Classification

5. Results

5.1. Evaluation of Synthetic Images on Evaluation Metrics

5.2. Using Synthetic Images for Reusability Classification

6. Discussion

6.1. Architectural Influences on Image Quality

6.2. Inception Score Metrics

6.3. Real-to-Synthetic Ratio

6.4. Mode Invention

6.5. Practical Implications of the Findings

6.6. Limitations and Generalizability

7. Conclusions and Outlook

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI