Article

A Data-Driven Approach for Generating Synthetic Load Profiles with GANs

by Tsvetelina Kaneva 1,*, Irena Valova 1, Katerina Gabrovska-Evstatieva 2 and Boris Evstatiev 3,*

1 Department of Computer Systems and Technologies, University of Ruse “Angel Kanchev”, 7004 Ruse, Bulgaria
2 Department of Computer Science, University of Ruse “Angel Kanchev”, 7004 Ruse, Bulgaria
3 Department of Automatics and Electronics, University of Ruse “Angel Kanchev”, 7004 Ruse, Bulgaria
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7835; https://doi.org/10.3390/app15147835
Submission received: 17 June 2025 / Revised: 9 July 2025 / Accepted: 11 July 2025 / Published: 13 July 2025
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)

Abstract

The generation of realistic electrical load profiles is essential for advancing smart grid analytics, demand forecasting, and privacy-preserving data sharing. Traditional approaches often rely on large, high-resolution datasets and complex recurrent neural architectures, which can be unstable or ineffective when training data are limited. This paper proposes a data-driven framework based on a lightweight 1D Convolutional Wasserstein GAN with Gradient Penalty (Conv1D-WGAN-GP) for generating high-fidelity synthetic 24 h load profiles. The model is specifically designed to operate on small- to medium-sized datasets, where recurrent models often fail due to overfitting or training instability. The approach leverages the ability of Conv1D layers to capture localized temporal patterns while remaining compact and stable during training. We benchmark the proposed model against vanilla GAN, WGAN-GP, and Conv1D-GAN across four datasets with varying consumption patterns and sizes, including industrial, agricultural, and residential domains. Quantitative evaluations using statistical divergence measures, Real-vs-Synthetic Distinguishability Score, and visual similarity confirm that Conv1D-WGAN-GP consistently outperforms baselines, particularly in low-data scenarios. This demonstrates its robustness, generalization capability, and suitability for privacy-sensitive energy modeling applications where access to large datasets is constrained.

1. Introduction

The variation in a building’s, industry’s, or sector’s electrical energy consumption over time is referred to as the “electrical load profile” [1] or as the “load shape”, which is the curve that represents the load as a function of time [2]. Regardless of the term used, it is known that various factors influence the load profile. For example, the consumption of domestic consumers depends on the time of day, whether it is a weekend or a workday, the meteorological conditions, and other factors. Similarly, for agricultural consumers, the load profile depends on factors such as the agro-technological process schedule and the meteorological conditions [3].
Collecting data on building electrical load profiles, as well as on different sectors and scenarios, can be an expensive and time-consuming task [1]. This is further complicated when factoring in the privacy concerns mentioned by multiple authors [4,5]. One solution to address privacy issues is the generation of synthetic data that do not directly correspond to the real patterns but maintain their statistical and/or probabilistic characteristics [6,7].
The main source of real load profile data is smart meters, which record the electrical energy consumption and send the information to a server for further processing and storage [8]. Nowadays, using such meters is becoming increasingly popular, as this provides easy access to consumption datasets and is the backbone of digital energy systems.
Studies on the load profile can be categorized into two main types: descriptive, in which the load profile is described by its characteristics, and generative, in which the aim is to generate the load profile as realistically as possible [1]. Two approaches exist for generating building load profiles: the white-box approach and the black-box approach. The white-box approach relies on information about the physics of the building, predicted occupants’ behavior and demands [9], appliance schedules [10], etc. Based on these data, energy models are created to simulate the energy consumption. On the contrary, the logic of the decision-making process behind a black-box approach remains hidden from the user and is not easily accessible or interpretable [11].
According to [12], the generated synthetic data should have the following three properties:
  • Fidelity—the synthetic data should have the same or similar data distribution as the original data;
  • Flexibility—the models, generating the data, should allow for the generation of data of a particular class;
  • Privacy—the generation of the data should be anonymized while keeping its integrity and distribution.
Furthermore, according to [3], the synthetic data should keep not only the statistical but also the probabilistic characteristics of the original dataset.
Different methods exist for generating black-box load profiles, with two of them being the most widely used. The first one is based on Markov chain theory and relies on a random generator applied to one or more transition matrices [13,14]. For example, in [15], a simple Markov-chain mixture-distribution model was proposed for very short-term (half an hour) forecasting, which can also be used for synthetic data generation when combined with a random number generator. A slightly more complex approach was presented in [16]. The proposed model includes preliminary hierarchical clustering of the data, followed by the generation of transition matrices and random walks using a random generator. In [17], a 24 × 24 transition matrix was used to model domestic load profiles. In this study, only the first record was chosen randomly, and the next samples were chosen based on the highest probability of occurrence. The authors stated that certain statistical properties of the synthetic data were satisfactory; however, the temporal properties did not compare well with the original dataset. In [18], first-order Markov chains were used for generating synthetic solar energy data with application in smart grids. In this study, the data were divided into time segments, and each segment was divided into time slots. In the follow-up case studies, the time of the day was divided into five segments in the summer and four segments in the winter.
A more advanced model based on Markov chain theory is presented in [3]. The training data are analyzed on monthly and hourly bases, and the model is implemented using 24 h and 24 h change transition matrices. It showed quite good performance at generating different types of load profiles, such as domestic, agricultural, and industrial, and was able to maintain both the statistical and probabilistic properties of the original datasets. The authors concluded that the number of states of the transition matrices is an important parameter for generating realistic data and recommended using 20+ states for optimal results. However, the results might be unsatisfactory when a limited amount of training data are available.
The second widely used method for generating load profiles is through neural networks (NNs). For example, to generate synthetic load profiles that mimic real-world electricity consumption patterns, [19] used publicly available weather data and load data from neighboring or similar regions, combined with artificial neural networks (ANNs). The challenge of limited training data was addressed by utilizing data from neighboring regions and Bayesian regularization to improve the generalization.
Among NNs, generative adversarial networks (GANs) [20,21] and variational auto-encoders (VAEs) [7,22] have proven themselves as optimal solutions. Presented for the first time by Goodfellow et al. (2014) [23], the GAN method is characterized by its ability to detect the underlying distributions of the data, thus creating a more realistic dataset. The method is also considered flexible, as it allows for the application of data generation across various domain fields and tasks. The GAN method is preferred among researchers for synthetic data generation when there is not enough data or more diversity among real samples is required [24].
In [1], a data-driven approach for electrical load profile data generation was proposed based on a GAN and data from 156 office buildings with a temporal granularity of 1 h, resulting in 56,957 daily load profiles. The basic GAN algorithm was used with the additional step of clustering. Nineteen clusters were used, each one accounting for 2–9% of the total data, and GAN models were trained for them. The obtained results showed that the GAN captured not only the general trend but also the random variations in the actual loads. Similarly, in [4], an adaptation of the original GAN algorithm was used, called DoppelGANGer, which was optimized for time-series data with high fidelity. A smaller dataset of 129,600 samples was used to gain a better understanding of the model’s characteristics and working mechanisms. The training data were further divided into batches of 90 samples per iteration. However, the quality of the generated data was not compared to that of the original data. A new variation of the GAN method, ERGAN, was proposed in [25]. The dataset was split into clusters with the K-means method, and a separate GAN model was trained on every cluster. The authors claimed that this technique allows the framework to capture more precisely the unique characteristics and variabilities of the data. The study is focused on generating synthetic data for residential buildings, and the results showed that the proposed methodology outperformed the selected benchmark models (WGAN, ACGAN, and C-RNN-GAN) in terms of compliance with the original dataset.
A significantly different approach was used in [24], where the GAN’s ability to produce high-resolution images is investigated. The authors propose an advanced GAN-ACGAN model, which relies on transforming the load profile into load matrices resembling images. Every row consists of the daily load profiles for one week, and every element in the matrix is related to its adjacent elements like pixels in an image. Convolutional layers are used on those pixel-like matrices in the GAN model.
In [26], a model named PATE-GAN was proposed, which adds formal privacy guarantees by integrating the PATE framework (private aggregation of teacher ensembles) to the GAN method. Differential privacy (DP) ensures that the output of an algorithm does not reveal much about any individual sample in the training data, i.e., the authors use PATE to ensure the discriminator part of the GAN respects DP. The real dataset is partitioned into multiple disjoint subsets, and the discriminator is trained on each subset. After that, the predictions are used to create noisy labels, which are used to train a discriminator, thus creating a differentiable model. The authors tested their model with various datasets across multiple domains, such as fraud detection and medical tasks. The datasets used in their study include low and high dimensions, balanced and imbalanced classes, etc.
A novel GAN-based model (CTGAN) specifically designed to generate realistic synthetic tabular data, which is challenging due to mixed data types, non-Gaussian and multimodal continuous features, and imbalanced categorical variables, was introduced in [27]. It addresses these challenges using a mode-specific normalization technique for continuous variables and a conditional generator with training-by-sampling to ensure balanced learning across categories. It outperforms existing GAN-based methods and even Bayesian network models on a comprehensive benchmark of real and simulated datasets. The authors also introduced SDGym, an open-source benchmarking framework for evaluating tabular data synthesizers, demonstrating that CTGAN achieves superior machine learning efficacy and data fidelity in most cases. A combination of seven simulated and eight real-world tabular datasets was used to evaluate synthetic data generation. Simulated datasets are based on Gaussian mixture models and Bayesian networks to allow for controlled testing of model fidelity.
In [28], the WCGAN-GP model was used to generate tabular datasets with mixed data types. The WCGAN-GP relies on the Wasserstein distance and the gradient penalty (WGAN-GP). It reduces the occurrence of failure modes that are encountered when using GANs. The WCGAN-GP is very similar to the WGAN-GP, with the only difference being that the discriminator (here called the critic) and the generator are both conditioned on extra class-label information. The critic predicts values that are large for real and small for fake samples instead of classifying them as real or fake [28,29].
While prior studies have successfully applied GAN architectures for synthetic electrical load profile generation, the majority focus on big datasets or apply recurrent models (e.g., RNNs and LSTMs) to capture temporal dependencies. These approaches, although effective for modeling sequential patterns, often require extensive data, are computationally expensive, and exhibit instability when trained on small or noisy datasets. Furthermore, RNN-based GANs are inherently complex to train, suffer from vanishing gradients, and offer limited parallelization capabilities, which restrict their practical deployment in real-time or resource-constrained environments.
At the same time, the potential of convolutional architectures, particularly 1D convolutional GANs (Conv1D-GANs), remains underexplored in the energy domain. While convolutional layers have proven highly effective in capturing local structure and neighborhood relations in spatial data, their application to load profile generation has not been fully explored. Especially lacking are studies that demonstrate whether Conv1D-GANs can achieve high fidelity, statistical realism, and generalizability when trained on small-scale datasets, such as single-household profiles or industrial profiles with limited samples. In this study, we define small-scale and medium-scale datasets as those containing fewer than 500 and between 500 and 10,000 records, respectively. These ranges reflect realistic constraints in domains where data privacy, limited smart meter deployment, or short observation periods restrict data availability.
Moreover, despite the presence of privacy-preserving GAN variants (e.g., PATE-GAN and CTGAN), few works have emphasized architectures that are both compact enough to work well with small data and capable of maintaining key statistical and behavioral properties like daily cycles, peaks, and variance. Therefore, there is a need for developing a lightweight, convolutional GAN model that operates reliably across various dataset domains and balances training stability, temporal fidelity, and data privacy.
This study aims to address the abovementioned gap by proposing and benchmarking a Conv1D-WGAN-GP architecture specifically tailored for 24 h load profile generation. It combines the stability of Wasserstein loss with gradient penalty and the pattern-detection capabilities of convolutional layers, offering a novel solution for robust synthetic data generation in smart grid applications where training data are limited, sensitive, or heterogeneous. Moreover, the proposed model is optimized for operating with small-scale datasets, which is a common limitation in many practical situations.

2. Materials and Methods

2.1. GAN Explanation and Definition

The structure of a GAN model is based on the idea of two neural networks—a generator that creates the data and a discriminator that has to decide if the data are real or fake. The role of the generator is to generate such data that the discriminator is fooled. Since the first introduction of the GAN methodology, various implementations and variations have been developed. A notable characteristic of the GAN architecture is that the generator and the discriminator are trained separately, and more importantly, alternately.
Figure 1 shows the simplified structure of a GAN model, in which the interaction between the generator and the discriminator is visible. The generator generates data with random samples from the latent space, which is forwarded to the discriminator that classifies it as real (1) or fake (0). Since the generator’s objective is to produce synthetic electrical load profiles that are virtually indistinguishable from real consumption patterns, and the discriminator’s task is to differentiate between authentic and generated daily profiles, the loss function becomes a central mechanism in guiding the training process.
For time-series data like electrical load profiles, where subtle variations in hourly consumption, peak timing, and temporal coherence matter, the loss function must be designed to encourage both networks to improve in a coordinated manner. It should drive the discriminator to become increasingly sensitive to fine-grained temporal differences between real and synthetic profiles, such as ramp-up rates, load spikes, or evening baseloads, which often characterize residential or industrial behavior. At the same time, it should guide the generator to produce profiles that not only match the statistical distribution of the real data (e.g., daily peak loads and standard deviations) but also preserve realistic load dynamics over time. Through this adversarial feedback loop, the loss function ensures that after each epoch the discriminator (whose loss is denoted d_loss) becomes better at flagging unrealistic patterns, while the generator (whose loss is denoted g_loss) learns to correct and refine its output, gradually converging toward high-fidelity, temporally consistent load profiles that reflect real-world energy usage.
For the training of the generator, a latent space, $z \in \mathbb{R}^{d_z}$, is defined, and for the training of the discriminator, half of the samples are taken from the real data and half from the generated data.

2.2. Requirements Toward the GAN-Based Synthetic Data for Electrical Load Profiles

The effectiveness of GAN-generated synthetic data in the domain of electrical load profiling relies on how well it can emulate the statistical, temporal, and behavioral nuances of real-world energy consumption. Unlike image generation, a domain in which GANs excel, electrical load data are deeply tied to daily operational rhythms, seasonal changes, and infrastructure characteristics. Therefore, ensuring that synthetic data mirror these realities is a requirement. To achieve this, the following requirements should be met:
(1)
Synthetic load profiles must preserve the overall shape and behavioral dynamics of real electricity consumption. This includes accurately capturing daily cycles, such as morning and evening peaks that occur in most residential buildings, as well as base load behavior during off-peak hours. For industrial facilities, the profiles may follow production-related patterns. If such patterns are not captured, the synthetic data will lack ecological validity and fail to serve its intended purpose.
(2)
The generated data must respect the statistical properties of the original dataset. This may include preserving the mean, variance, skewness, kurtosis, and other domain-specific features, such as load factor, energy-to-peak ratio, or hour-to-hour correlation coefficients. These properties determine how well synthetic data can be used to train or test load forecasting models or simulate grid scenarios. If statistical alignment is not maintained, predictive models trained on such data may produce biased or unstable results, especially in applications like smart grid control or demand response planning.
(3)
It is critical to maintain the temporal structure of the data. Electrical load profiles are a type of time series, and the value at a given hour (e.g., 3 PM) is not independent of the preceding hours (e.g., 2 PM or 1 PM). Thus, synthetic data must preserve short-term dependencies (e.g., peaks) and longer-term patterns (e.g., weekday/weekend).
(4)
GANs should not generate a narrow subset of repetitive or overly similar load profiles. In reality, energy consumption varies widely across users due to differences in building type, occupancy patterns, weather sensitivity, appliance usage, and behavior. A household with young children will have a very different load profile than a single-person apartment or an industrial facility. Behavioral variety must be represented in the synthetic data to ensure its usefulness across multiple analysis contexts. Moreover, the model should also be capable of capturing and reproducing rare but important patterns, such as those resulting from heatwaves, holiday periods, or abnormal grid events. Ignoring such anomalies would make the synthetic data unrealistic.
(5)
Electricity consumption profiles can be highly sensitive and revealing. They can expose personal routines, such as when occupants wake up, leave for work, return home, or go on vacation. A well-trained GAN must generate synthetic profiles that are statistically similar but not identical to any profile in the training set. This helps mitigate the risk of data leakage or re-identification, especially in residential datasets. Techniques such as differential privacy [30], regularization, and architectural constraints can help ensure that the generator learns patterns without memorizing individual examples.

2.3. Problem Definition

As was already mentioned, this study aims to develop a GAN model capable of learning the statistical and temporal patterns of hourly electrical load profiles from various datasets comprising both residential and industrial buildings. Each sample in the dataset represents a 24-dimensional vector, corresponding to a single day’s worth of hourly active power (kW) measurements at a one-hour temporal granularity. The trained GAN should generate synthetic load profiles that have the same dimensionality and distributional characteristics as the original data. The output must preserve realistic daily variation, distinguish building types implicitly, and maintain temporal coherence within each day.
The synthetic dataset must adhere to the following requirements:
  • Statistically similar to the real dataset in terms of distribution and variance;
  • Temporally consistent, reflecting plausible daily behavior (e.g., peak hours);
  • Unidentifiable, i.e., not replicating individual real profiles;
  • Equal in shape, with the same number of samples and dimensions as the input dataset, to facilitate downstream modeling without reconfiguration.
Furthermore, the proposed methodology should be applicable in the case where a limited amount of data are available.
Let the original training dataset with a temporal resolution of 1 h (i.e., $x_i^t$ is the load at hour $t \in \{1, 2, \ldots, 24\}$) be the following:

$$\chi = \{ x_1, x_2, \ldots, x_N \}, \quad x_i \in \mathbb{R}^{24},$$

where χ is the dataset of real electrical load profiles, N is the number of daily profiles (i.e., the number of rows), and $x_i$ is a 24-dimensional vector representing the hourly power load for day i.
The GAN is trained by solving the following min–max optimization problem:
$$\min_G \max_D V(D, G) = \mathbb{E}_x [\log D(x)] + \mathbb{E}_z [\log (1 - D(G(z)))],$$

where $D(x)$ is the output of the discriminator D when given a real sample x; $\mathbb{E}_x$ is the expectation operator applied to all real instances and represents the average value of the discriminator’s output over real samples; $G(z)$ is the output of the generator G when given a random input z (usually noise); $D(G(z))$ is the output of the discriminator when given the output of the generator (the discriminator’s classification of whether the generated sample is real or fake); and $\mathbb{E}_z$ is the expectation operator applied to all random inputs to the generator (i.e., the average value of the discriminator’s output over the generated samples $G(z)$).
The trained generator, G, is then used to create a synthetic dataset, as follows:
$$\hat{\chi} = \{ G(z_1), G(z_2), \ldots, G(z_N) \}, \quad z_i \sim p_z(z),$$

where several important notes should be made: $\hat{\chi} \in \mathbb{R}^{N \times 24}$ matches the shape of χ, the statistical distance $D_{div}(\chi, \hat{\chi})$ is minimized, and the temporal and behavioral properties of the real data are preserved in the generated profiles.

2.4. Evaluation Criteria

Unlike image GANs, where human judgment can spot errors, distortions, or visual flaws, time-series evaluation is more nuanced and subject to different evaluation criteria. Therefore, several categories of evaluation metrics will be used.

2.4.1. Statistical Methods

The first line of evaluation typically involves statistical comparisons between the real and synthetic datasets. These methods aim to assess whether the synthetic data are drawn from a similar probability distribution to the original data. One commonly used metric is the Kullback–Leibler (KL) divergence [31], which measures the divergence of one probability distribution from another. However, since KL is asymmetric and sensitive to zero probabilities [32], many researchers prefer the Jensen–Shannon divergence (JSD) [12,33,34], a symmetric and more stable variant that quantifies similarity between real and generated distributions, as follows:
$$JSD(P \parallel Q) = \frac{1}{2} D_{KL}(P \parallel R) + \frac{1}{2} D_{KL}(Q \parallel R),$$

where $R = \frac{P + Q}{2}$. The JSD is non-negative and equals 0 when the two distributions, the real and the synthetic, are identical.
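As an illustration, the JSD between histogram-binned real and synthetic load values can be computed with a few lines of Python; the binning scheme and bin count below are assumptions, since the text does not state how the distributions were discretized.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes the KL divergence

def jsd(real_vals, synth_vals, bins=50):
    """Jensen-Shannon divergence between two samples of hourly load values."""
    lo = min(real_vals.min(), synth_vals.min())
    hi = max(real_vals.max(), synth_vals.max())
    p, _ = np.histogram(real_vals, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth_vals, bins=bins, range=(lo, hi))
    # Convert counts to probabilities; add a tiny constant so KL is defined.
    p = p / p.sum() + 1e-12
    q = q / q.sum() + 1e-12
    r = 0.5 * (p + q)
    return 0.5 * entropy(p, r) + 0.5 * entropy(q, r)
```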
Another popular metric is the Wasserstein distance, particularly relevant in the context of Wasserstein GANs, as it evaluates the “cost” of transforming the synthetic distribution into the real one, as follows:
$$W(u, v) = \inf_{\pi \in \Gamma(u, v)} \int_{\mathbb{R} \times \mathbb{R}} |x - y| \, d\pi(x, y),$$

where Γ(u, v) is the set of (probability) distributions on $\mathbb{R} \times \mathbb{R}$ whose marginals are u and v on the first and second factors, respectively. For a given value of x, u(x) is the probability of u at position x, and likewise for v(x).
Additionally, first- and second-order statistics, such as the mean and standard deviation (STD), are used to compare load magnitudes and variability among datasets. A simple distance-based metric like the root mean square error (RMSE) can also provide insight into the point-wise deviation between real and synthetic time series when they are aligned across corresponding time steps:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2}.$$

Together, these metrics provide a baseline indication of whether the GAN has learned to mimic the distributional structure of the original data.
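A minimal sketch of these statistical comparisons with NumPy and SciPy is shown below; pooling all hourly values into a single empirical distribution for the Wasserstein distance is an assumption about the evaluation protocol.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def statistical_report(real, synth):
    """real, synth: arrays of shape (n_days, 24) with normalized loads."""
    return {
        # Empirical 1D Wasserstein distance over the pooled hourly values.
        "w_distance": wasserstein_distance(real.ravel(), synth.ravel()),
        # Point-wise RMSE between aligned real and synthetic profiles.
        "rmse": float(np.sqrt(np.mean((real - synth) ** 2))),
        # First- and second-order statistics (mean diff. and STD diff.).
        "mean_diff": float(abs(real.mean() - synth.mean())),
        "std_diff": float(abs(real.std() - synth.std())),
    }
```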

2.4.2. Downstream Tasks

Statistical similarity is necessary but not always sufficient. A more functional evaluation approach involves testing how well synthetic data perform in various machine learning tasks. This can include training a classifier or regressor on synthetic data and testing its performance on real data. If models trained on synthetic data generalize well to real data, it indicates that the synthetic samples carry informative patterns, not just noise or memorized examples. In this study, a logistic regression classifier is trained to distinguish real from synthetic samples. Consistent accuracy across real and synthetic sets implies that the GAN has preserved task-relevant features, making the data valuable for simulation or decision-support applications.
Here, classifier accuracy refers to the performance of the logistic regression model trained to distinguish real from synthetic samples. Unlike traditional classification contexts where higher accuracy is desirable, here lower accuracy is preferred, as it indicates that the synthetic data are highly realistic and difficult to distinguish from real samples. Therefore, classifier accuracy serves as a proxy for realism: values close to 0.5 (random guessing) reflect high-fidelity synthetic data, while values above 0.65 suggest clear differences between the two distributions. To reduce possible confusion, this metric is alternatively referred to as the Real-vs-Synthetic Distinguishability Score, which should be minimized.
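One plausible implementation of this score with scikit-learn is sketched below; the 70/30 hold-out split and solver defaults are assumptions not specified in the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def distinguishability_score(real, synth):
    """Hold-out accuracy of a real-vs-synthetic classifier (~0.5 is ideal)."""
    X = np.vstack([real, synth])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```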

2.4.3. Visual Analysis

Despite the need for numerical evaluation means, visual analysis remains an important supplementary tool for inspecting GAN outputs, especially for identifying obvious differences. Researchers often plot real and synthetic time series side-by-side to detect anomalies like flat lines, unnatural spikes, or lack of variability, which are common issues when the GAN fails to learn sufficient diversity or overfits specific training samples. Visualization also helps assess whether the GAN captures typical behavioral patterns, such as daily cycles or seasonal shifts.
Additionally, dimensionality reduction techniques like t-distributed stochastic neighbor embedding (t-SNE) are used to project high-dimensional time-series data into two dimensions. By plotting the t-SNE embeddings of both real and synthetic data, one can visually inspect whether the clusters overlap, indicating similarity in distribution and structure. Another useful visual diagnostic is the loss curve over training epochs, for both the generator and discriminator. Stable convergence, absence of mode collapse, and oscillation patterns in the losses provide qualitative insights into training dynamics and model stability.
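The t-SNE overlap check can be reproduced with a short scikit-learn/Matplotlib sketch such as the following; the perplexity value and plot styling are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_overlap_plot(real, synth, perplexity=30):
    """Project real and synthetic 24 h profiles into 2D; overlapping
    point clouds indicate similar distribution and structure."""
    X = np.vstack([real, synth])  # perplexity must be < number of samples
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    n = len(real)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=8, label="real")
    plt.scatter(emb[n:, 0], emb[n:, 1], s=8, label="synthetic")
    plt.legend()
    plt.show()
```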
The combined usage of those evaluation methods forms the evaluation framework of the proposed methodology. Based on the model requirements, the specifics of the problem (namely, the generation of synthetic load profile data), and the literature review, acceptable values for each evaluation criterion are defined and summarized in Table 1.

2.5. Benchmark Models

For evaluating the performance of the proposed model, several benchmark models are selected. Vanilla GANs represent the original formulation introduced by Goodfellow et al. in 2014 [23]. They serve as a foundational benchmark for evaluating generative models in numerous studies. Despite their simplicity, vanilla GANs can effectively capture the global distribution of real data and are a valuable baseline for comparison. However, they are notoriously unstable during training, especially when applied to complex or high-variance datasets such as electrical load profiles. These types of profiles exhibit strong temporal dependencies and varied behavioral patterns, which vanilla GANs often struggle to replicate accurately. Common issues include mode collapse, where the generator produces repetitive or limited variations in outputs, and non-informative loss curves, where discriminator and generator losses do not reflect true learning progress.
Still, vanilla GANs remain important for benchmarking. They allow for assessing the baseline performance of adversarial training and provide a reference point against which improvements from more advanced models, such as WGAN-GP or recurrent GANs, can be measured. In the context of electrical load profile generation, they can be used to test whether the network can approximate daily usage patterns under perfect-like conditions and whether minimal architectures can capture essential characteristics like peak load times and average consumption levels.
Table 2 shows the architecture of the vanilla GAN used as a benchmark model. The networks’ structures adhere to the recommendations outlined in [21].
The Wasserstein GAN with gradient penalty (WGAN-GP) [28,37,38] improves upon the limitations of vanilla GANs by addressing core training instabilities and offering a more meaningful loss function—the Wasserstein loss function [29], believed to help with mode collapse [39], as follows:
$$L_D = \mathbb{E}_{\tilde{x} \sim P_g} [D(\tilde{x})] - \mathbb{E}_{x \sim P_r} [D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}} \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],$$

where $D(x)$ is the critic output on a real sample, $D(\tilde{x})$ is the critic output on a fake sample generated by the generator, $P_r$ is the distribution of real data, $P_g$ is the distribution of generated data, $P_{\hat{x}}$ is the distribution of interpolated samples between real and fake, and λ is the gradient penalty coefficient.
Instead of relying on Jensen–Shannon divergence (as vanilla GANs do), WGAN-GP minimizes the Wasserstein distance between the real and generated data distributions [29]. This distance metric provides smoother gradients and better reflects the similarity between two distributions, which helps guide the generator more effectively during training.
Another key innovation in the WGAN-GP is the introduction of a gradient penalty applied to the discriminator (called the “critic” in this formulation), as follows:

$$GP = \mathbb{E}_{\hat{x} \sim P_{\hat{x}}} \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],$$

where $\hat{x} = \alpha x + (1 - \alpha) \tilde{x}$ is a random interpolation between the real and fake samples, and $\nabla_{\hat{x}} D(\hat{x})$ is the gradient of the critic’s output with respect to the interpolated input.
This penalty replaces the less effective method of weight clipping used in the original WGAN [40] and leads to much more stable training, especially for complex data types. The critic no longer needs to classify samples as real or fake with hard probabilities but, instead, learns to assign real-valued scores indicating how close a sample is to the true data distribution. For electrical load profiles, this architecture is especially useful, as it can better model subtle distributional differences in energy consumption.
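In TensorFlow, the gradient penalty term from the equations above can be sketched as follows; this is an illustrative implementation, not the authors’ published code.

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: E[(||grad_x D(x_hat)||_2 - 1)^2] on interpolates.

    real, fake: tensors of shape (batch, 24).
    """
    alpha = tf.random.uniform([tf.shape(real)[0], 1], 0.0, 1.0)
    x_hat = alpha * real + (1.0 - alpha) * fake   # random interpolation
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        scores = critic(x_hat, training=True)
    grads = tape.gradient(scores, x_hat)          # gradient w.r.t. input
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1) + 1e-12)
    return tf.reduce_mean(tf.square(norms - 1.0))
```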
The advantages of WGAN-GP are especially evident in the domain of synthetic data generation. Compared to vanilla GANs, the WGAN-GP provides a more stable training process, reduces the likelihood of mode collapse, and facilitates the generation of diverse and high-fidelity samples. These characteristics make WGAN-GP suitable for generating synthetic data. Its robustness across different architectures, including convolutional and recurrent networks, and its lower sensitivity to hyperparameters further contribute to its appeal for practical applications; it is therefore used in this study. As a benchmark, the WGAN-GP offers a robust and reliable alternative to vanilla GANs. It sets a high-performance baseline for fidelity and training stability, and is widely used in research as a go-to model when evaluating new GAN variations. The benchmark architecture is described in Table 3.
While vanilla GANs and WGAN-GP offer strong baselines for generative modeling, they were originally designed for static data, such as images or tabular datasets. Electrical load profiles, however, are inherently sequential and time-dependent, meaning that each value in a 24 h profile is closely related to the hours before and after it. To better capture these local temporal dependencies, it is valuable to include convolutional GANs (Conv1D-GANs) as part of the benchmark models for synthetic time-series generation.
Conv1D-GANs incorporate one-dimensional convolutional layers into the generator and discriminator. This architecture enables the model to learn local patterns across short time windows, such as hourly consumption spikes or dips. By applying temporal convolution, the generator can effectively model translation-invariant patterns, such as the morning ramp-up or evening decline in energy usage. Unlike feed-forward GANs, which treat the 24-dimensional vector as an unordered feature set, Conv1D-GANs preserve the sequential order while capturing local correlations through sliding filters, making them well-suited for time-series data such as electrical load profiles. The benchmark used in this study follows the architectural principles described in recent convolutional GAN literature and is summarized in Table 4.

2.6. Model Definition

This study further develops the approach of the authors of [24], where load profiles are represented as pixel-like vectors and convolutional layers are used in the GAN network. Representing load profiles as pixel-like vectors is a promising way to bridge time-series analysis with image processing techniques, where GANs excel. A 24 h load profile can naturally be modeled as a vector with 24 continuous values, where each value represents the power consumption for a specific hour of the day. These values typically exhibit temporal correlations—the load at hour t is often related to some degree to the load at hours t − 1 and t + 1; e.g., $\mathrm{Corr}(x_t, x_{t-1})$ may be greater or smaller than $\mathrm{Corr}(x_t, x_{t+1})$, as shown in Figure 2. By interpreting this vector as a one-dimensional image with 24 “pixels”, each corresponding to an hourly measurement, the capabilities of CNNs can be used. While this relation does not hold for every load value, just as it does not for every pixel in an image, the convolutional layers of a neural network remain a preferable option for handling such problems.
This approach treats the temporal load vector as a spatial signal, allowing for convolutional filters to slide across the 1D vector and capture local temporal patterns, such as peaks during midday or valleys at night. In image processing, CNNs are used for edge detection, and, in this case, they could be utilized to learn consumption patterns. Moreover, using convolutional layers enforces translational invariance and weight sharing, reducing the number of trainable parameters—an advantage especially useful when training on smaller datasets, such as load profiles from single households or limited time frames.
The architecture employed in this study is based on a WGAN-GP, tailored for one-dimensional data and CNNs. To the best of our knowledge, there is no similar usage of these combined methodologies for synthetic load profile generation, and their examination could prove useful for this particular problem.
The generator and discriminator (i.e., critic) models are implemented using convolutional and upsampling layers to enable effective learning from limited data and to capture temporal dependencies in the generated sequences. Both models are compact, making them suitable for training on small datasets without overfitting, while still being expressive enough to model complex temporal patterns.
In this architecture, both the generator and discriminator are designed to handle one-dimensional time-series data representing 24 h daily load profiles. Each sample is a fixed-length vector of 24 values, where each value corresponds to the load at a given hour. Although the data are temporal, the model treats them similarly to how an image is processed in convolutional neural networks—each “pixel” (hourly load value) is contextually related to its neighbors. This perspective justifies the use of Conv1D layers, which are particularly effective in capturing local dependencies and patterns along sequential data.

2.6.1. Generator

The generator maps a latent noise vector (z) to a realistic daily load profile (x). It begins with a dense layer to project the low-dimensional latent input into a higher-dimensional representation, enabling the model to learn complex, high-level patterns early in the generation process. This dense output is reshaped into a low-resolution temporal structure, forming a 1D array with a small number of time steps and multiple features.
The use of upsampling (UpSampling1D) followed by a convolutional layer allows the model to increase the temporal resolution while applying learned filters to refine the sequence at each step. Each convolutional layer, with its small kernel size, processes each time step together with its immediate neighbors, mimicking the relationship between adjacent hours in a load profile. The LeakyReLU activation promotes better gradient flow, especially important in GANs, by allowing for a small gradient when the unit is not active.
The final layer of the generator is again a convolutional one with one output channel and a sigmoid activation, producing a 24-element vector with values between 0 and 1, consistent with normalized power consumption values. The reshape step at the end flattens the sequence for direct comparison with real data samples. The generator’s layers visualization is shown in Figure 3.
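A Keras sketch consistent with this description is given below; the filter counts, kernel sizes, and intermediate sequence lengths are assumptions, since the exact values in Figure 3 are not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 16  # latent dimension used in the experiments (Section 3.2)

def build_generator():
    """Latent vector -> 24 h load profile with values in [0, 1]."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(6 * 64),                 # project to a higher dimension
        layers.LeakyReLU(0.2),
        layers.Reshape((6, 64)),              # low-resolution temporal map
        layers.UpSampling1D(2),               # 6 -> 12 time steps
        layers.Conv1D(32, 3, padding="same"),
        layers.LeakyReLU(0.2),
        layers.UpSampling1D(2),               # 12 -> 24 time steps
        layers.Conv1D(16, 3, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv1D(1, 3, padding="same", activation="sigmoid"),
        layers.Reshape((24,)),                # flatten to a 24-element vector
    ])
```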

2.6.2. Discriminator

The discriminator is structured as a feature extractor and regressor that scores input load profiles on their realism. It starts by reshaping the 24-dimensional input vector into a 1D sequence suitable for convolution. The use of convolutional layers with increasing filter counts ensures that the discriminator captures increasingly abstract and broader temporal patterns in the profile while reducing dimensionality. This structure is effective in identifying inconsistencies in the generated sequences, especially since real-world load profiles often contain distinct morning, daytime, and evening patterns.
The use of small convolution kernels again emphasizes local temporal relationships, important for capturing typical load profile characteristics, such as peak demand periods or low-activity hours. Dropout layers after each convolutional block add regularization, which is important when training with small datasets, preventing overfitting to specific daily shapes. After flattening, a single dense layer with no activation produces the Wasserstein score, which guides the generator via the WGAN-GP loss formulation. The structure of the discriminator is summarized in Figure 4.
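A matching critic sketch follows; the filter counts, strides, and dropout rate are again assumptions consistent with the description (Figure 4 is not reproduced here).

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_critic():
    """24 h profile -> real-valued Wasserstein score."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(24,)),
        layers.Reshape((24, 1)),              # treat the profile as a 1D signal
        layers.Conv1D(16, 3, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.25),                 # regularization for small datasets
        layers.Conv1D(32, 3, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(1),                      # no activation: Wasserstein score
    ])
```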
This architectural approach, combining upsampling and convolution in the generator with progressive convolution and compression in the discriminator, is well-suited for capturing the nuanced structures of daily load profiles. It allows the model to synthesize realistic time-series data with locally coherent patterns and supports robust discrimination even with limited training data.
All models developed and evaluated in this study are implemented using TensorFlow, an open-source deep learning framework widely adopted for building and training neural networks. The modular architecture of TensorFlow allows for seamless construction of both the generator and discriminator networks using Conv1D, Dense, BatchNormalization, Dropout, and UpSampling1D layers. Custom training loops are employed to implement the WGAN-GP loss with gradient penalty, ensuring stable training. Additionally, TensorFlow’s automatic differentiation engine is used to calculate the gradients required for both generator updates and gradient penalty terms. The proposed methodology is summarized in Algorithm 1.
Algorithm 1. Conv1D-WGAN-GP
1: Input: a dataset of load profiles
2: Output: a synthetic dataset of load profiles
3: Initialize critic variables: number of critic updates
4: Build the Generator: G
5: Build the Discriminator (Critic): D
6: Define the Gradient Penalty function: GP
7: Define the training step for G
8: Define the training step for D
9: Train Conv1D-WGAN-GP:
10: for each epoch do:
11:   for each critic update do:
12:     Sample a random noise vector z ~ N(0, 1)
13:     Sample a batch of real samples x from the training data
14:     Generate fake samples G(z)
15:     Compute the critic loss: d_loss = D(G(z)) − D(x) + λ·GP(x, G(z))
16:     Update critic D using the gradients of d_loss
17:   end for
18:   Sample a random noise vector z ~ N(0, 1)
19:   Generate fake samples G(z)
20:   Compute the generator loss: g_loss = −D(G(z))
21:   Update generator G using the gradients of g_loss
22: end for
23: Generate Synthetic Data:
24: Sample noise vectors z ~ N(0, 1)
25: Generate synthetic samples: G(z)
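Translated into TensorFlow, the inner training loop (steps 10–22) might look like the sketch below, reusing gradient_penalty, build_generator, build_critic, and LATENT_DIM from the earlier sketches; the Adam optimizer settings are assumptions commonly paired with WGAN-GP, not values reported in the paper.

```python
import tensorflow as tf

N_CRITIC, LAMBDA_GP = 10, 2.0  # values selected in Section 2.6.3
g_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5, beta_2=0.9)  # assumed
d_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5, beta_2=0.9)  # assumed

def train_step(real, generator, critic):
    """One pass of Algorithm 1, steps 11-21, for a batch of real profiles."""
    batch = tf.shape(real)[0]
    for _ in range(N_CRITIC):                        # steps 11-17
        z = tf.random.normal([batch, LATENT_DIM])
        with tf.GradientTape() as tape:
            fake = generator(z, training=True)
            d_loss = (tf.reduce_mean(critic(fake, training=True))
                      - tf.reduce_mean(critic(real, training=True))
                      + LAMBDA_GP * gradient_penalty(critic, real, fake))
        d_grads = tape.gradient(d_loss, critic.trainable_variables)
        d_opt.apply_gradients(zip(d_grads, critic.trainable_variables))
    z = tf.random.normal([batch, LATENT_DIM])        # steps 18-21
    with tf.GradientTape() as tape:
        g_loss = -tf.reduce_mean(critic(generator(z, training=True),
                                        training=True))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```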

2.6.3. Hyperparameters

When training a model, regardless of its purpose, the tuning of the hyperparameters is an important step. In this study, hyperparameters such as latent dimension, batch size, epochs, number of critics, and gradient penalty coefficient (λ) are optimized.
When the model is expected to work with small datasets, careful tuning of key hyperparameters is essential to achieve stable and meaningful synthetic data generation. While the data structure resembles a time series, in this context it is more appropriate to treat each 24 h vector as a structured spatial sequence, similar to a 1D image where adjacent hourly values are tightly correlated. This comes with specific requirements about the values of the hyperparameters. An initial grid search is conducted across the following ranges, informed by prior GAN literature and domain-specific knowledge for time-series data:
  • Latent Dimension: 8, 16, 32;
  • Batch Size: 32, 64, 128;
  • Epochs: Fixed at 1000 for comparability;
  • Number of Critics: 3, 5, 10;
  • Gradient Penalty Coefficient λ: 1, 2, 5, 10.
One of the first decisions is choosing the dimensionality of the latent space. Since the dataset is small and the data structure compact (i.e., a 24-value vector), a large latent space can introduce unnecessary complexity and overfit the generator. A latent vector size in the range of 10 to 30 dimensions is generally appropriate and applied. It provides enough capacity to encode meaningful variation in daily load patterns while avoiding overparameterization. Using a lower-dimensional latent space also helps the generator converge faster and learn more stable mappings, especially when paired with Conv1D layers that capture localized features.
The batch size should remain relatively small to prevent overfitting and to ensure gradient diversity during training. Values like 8, 16, or 32 are recommended, considering the expected datasets. Smaller batch sizes also make better use of the limited data by introducing stochasticity in learning, which is helpful for generalization. Additionally, smaller batches make the gradient penalty estimation more stable.
The number of critic updates per generator update in WGAN-GP is another important parameter. With limited data, setting the number of critic updates to 3 or 5 often strikes the right balance. Nevertheless, experimental studies with the benchmark datasets showed that 10 critic updates yield better results.
Finally, the gradient penalty coefficient (λ) typically defaults to 10 in the WGAN-GP literature. A high λ can overly constrain the discriminator, especially when the data are sparse, leading to vanishing gradients and poor generator performance. The preliminary experiments with the datasets and the benchmark models showed that the optimal value of λ is 2.
The final selected values (Table 5) reflect the combination that yielded the most stable training and most realistic outputs across all datasets, balancing low statistical divergence and high realism.
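For reference, the final hyperparameter values stated in the text can be gathered into a single configuration object; this snippet simply restates those values rather than reporting new tuning results.

```python
# Final hyperparameters as reported in Sections 2.6.3 and 3.2 (Table 5).
HPARAMS = {
    "latent_dim": 16,   # compact latent space for 24-value profiles
    "batch_size": 128,  # used in the reported experiments
    "epochs": 1000,     # fixed for comparability across models
    "n_critic": 10,     # critic updates per generator update
    "lambda_gp": 2,     # gradient penalty coefficient
}
```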

3. Results

3.1. Experimental Datasets

To generate synthetic load profiles with GANs, the dataset used for training the models should represent realistic and diverse electricity consumption patterns over time. A good dataset includes both temporal and contextual features to help the GAN learn patterns accurately. A granularity of 15 min, 30 min, or 1 h is common for load profiles. Ideally, at least one year of data should be available to capture season-related changes in the profiles’ characteristics.
Residential load profiles are rarely available in public domains due to numerous factors, including privacy concerns, data ownership restrictions, and challenges in data collection. Unlike industrial or aggregated grid-level data, which are often anonymized and released for planning purposes, residential consumption data can reveal highly sensitive personal information. Utilities and data providers are therefore reluctant to share such data. Additionally, residential datasets are typically collected through smart meters, which may not be uniformly deployed or accessible across regions, limiting the volume and variety of data available. Even when such data exist, they are often locked behind proprietary platforms or research agreements, making synthetic data generation a crucial alternative for researchers and model developers who need realistic, privacy-safe alternatives for training and evaluation.

3.1.1. Publicly Available Residential Dataset

The ERGAN dataset, introduced as part of the ensemble recurrent GAN (ERGAN) framework [25], consists of high-resolution synthetic residential electricity load profiles designed to closely replicate real-world consumption behavior [41]. The dataset is developed from the original Pecan Street Database and consists of hourly electrical measurements of 417 households from 2017 [42]. In the original work of the authors, the dataset is processed using an ensemble of Bi-LSTM-based GANs, each trained on a distinct cluster of load patterns, identified using K-means clustering and the Davies–Bouldin score for optimal segmentation. Each cluster captures a specific category of household consumption behavior, such as high daytime usage, evening peaks, or flat weekend patterns, enabling more granular control over the diversity and realism of the generated data. For benchmarking purposes, one of the available clusters is used. This allows for focused evaluation on a well-defined and behaviorally consistent subset of residential profiles, making it suitable for assessing the performance of generative models under specific consumption conditions. The data include hourly power measurements and maintain realistic temporal dependencies, statistical properties, and intra-day variability, making them a strong proxy for real residential load data in both qualitative and quantitative evaluations. The dataset used in this study has approximately 110,000 samples, which classifies it as a case study on a large dataset.
The heatmap of the ERGAN dataset is shown in Figure 5. One distinct pattern can be observed in the load profiles: an increase in electrical consumption around the same period every day.

3.1.2. Industrial Dataset

The second benchmark dataset comes from a meat processing plant and consists of detailed electrical load profiles captured at an hourly resolution, reflecting the operational energy consumption of an industrial facility. This type of dataset is characterized by distinct load signatures, including high base loads during production hours, sharp demand fluctuations corresponding to equipment cycles (e.g., refrigeration, cutting, packaging), and relatively stable overnight or non-operational periods. The data provide valuable insights into industrial energy behavior, where load profiles are driven less by occupant activity and more by production schedules, machinery usage, and environmental controls. As a benchmark dataset, it offers a contrasting profile to residential data, enabling the evaluation of generative models on structured, high-load, and less behaviorally diverse time series. Its inclusion supports testing model robustness across different consumption domains, particularly in capturing steady-state loads, operational peaks, and equipment-induced load variability. The dataset covers 2 years and contains approximately 700 samples, which classifies it as a medium-scale dataset. Figure 6 shows a heatmap of the Industrial dataset in the first week of March 2023. Notable features include the clear pattern of starting operations in the morning, high load on the weekdays, the highest load in the first hours of the workday, etc.

3.1.3. Household

The dataset named Household in this study has around 270 samples, representing the load profiles from a single household for one year (with some gaps). While its nature is similar to the ERGAN dataset, this one could give insight into the performance and effectiveness of the model in a real-life, small-scale scenario. Based on the number of samples in this dataset, it can be classified as a small dataset. Figure 7 shows the data from the Household dataset for the second week of November 2023. As expected, this dataset does not show any clear patterns at first glance. While the data shown in the figure are just for one week, and it is possible that there are visible patterns on a larger scale, this dataset is suitable for a case study in this methodology, as the samples are treated as 1D vectors with no relations to the other samples in the dataset.

3.1.4. Pig Farm

The dataset called Pig Farm was taken from an agricultural building and includes hourly energy consumption records for one full year. In this case, the load profile depends on the agro-technological processes, such as the operating schedules of the farm’s different systems: the ventilation system, feeding system, heating system, lighting system, etc. Considering that the breeding of animals is always related to maintaining certain microclimate conditions, the energy usage also strongly depends on the meteorological conditions. This dataset contains 363 records, i.e., it can be classified as a small-scale dataset. The Pig Farm dataset’s heatmap, shown in Figure 8, shows higher load profiles in the first three days of the week but few other obvious patterns. The data shown in the heatmap are from the third week of January 2023.
Using diverse datasets for synthetic load profile generation presents several challenges. First, there are significant differences in scale and behavior between residential and industrial consumption patterns, making it difficult to train a unified model without introducing bias or mode collapse. The ERGAN residential dataset, although rich in diversity, may contain inconsistencies such as varying sampling intervals. The industrial and agricultural datasets, while temporally consistent, represent a very different load profile with more regular patterns and higher magnitude, which may dominate training if not carefully balanced. The smaller the dataset, the lower the generalization of the model and the higher the risk of overfitting. While the Household dataset has the same foundation as the ERGAN dataset, it is not clustered like ERGAN and is more than 300 times smaller, as it comes from just one household. Table 6 summarizes the datasets’ characteristics.

3.1.5. Data Preprocessing

To ensure consistency across datasets and support model training, several preprocessing steps are applied, as follows:
  • Normalization—all datasets are min–max normalized independently to scale hourly consumption values into the [0, 1] range. This normalization is performed per dataset to avoid leakage of information across domains and to ensure that the generator outputs remain within valid ranges for each specific case;
  • Missing Value Handling—for the Household dataset, which contained intermittent missing values due to sensor outages or gaps in measurement, linear interpolation was applied within each 24 h profile to fill missing hourly entries (when gaps were ≤2 h). Days with more than two missing hours were excluded from the dataset;
  • Profile Validation—only days with complete 24 h load profiles (after preprocessing) are retained. This ensures uniform input dimensionality for the GAN training.
These preprocessing steps are performed for the comparability and integrity of the datasets, while also enabling fair benchmarking across GAN architectures. All preprocessing is implemented in Python (v 3.10) using the NumPy and Pandas libraries.
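A minimal pandas/NumPy sketch of these preprocessing steps is shown below; the function name and the exact interpolation calls are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def preprocess(hourly: pd.Series) -> np.ndarray:
    """Turn an hourly load series (DatetimeIndex) into normalized
    24 h profiles, mirroring the preprocessing steps above."""
    # Fill gaps of at most 2 consecutive hours by linear interpolation.
    hourly = hourly.interpolate(method="linear", limit=2, limit_area="inside")
    # Profile validation: keep only complete 24 h days.
    profiles = [day.to_numpy()
                for _, day in hourly.groupby(hourly.index.date)
                if len(day) == 24 and not day.isna().any()]
    X = np.asarray(profiles)
    # Per-dataset min-max normalization into the [0, 1] range.
    return (X - X.min()) / (X.max() - X.min())
```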

3.2. Evaluation and Comparison

Four different GAN architectures—Vanilla-GAN, WGAN-GP, Conv1D-GAN, and Conv1D-WGAN-GP—across four datasets of varying sizes have been trained and evaluated in this study. Each model is trained for 1000 epochs with a batch size of 128 and a latent dimension of 16. Quantitative assessments were performed using a comprehensive set of metrics: average Wasserstein distance, mean and standard deviation differences, classifier accuracy (from a logistic regression model trained to distinguish real from synthetic samples), Jensen–Shannon divergence, regression metrics (RMSE and MAE), histograms, and t-SNE plots. The training results for every measurable metric are presented in Table 7.

3.2.1. Statistical Methods for Evaluation of the Synthetic Data

The Wasserstein distance (W-Distance in Table 7) results across all datasets demonstrate that the Conv1D-WGAN-GP consistently achieves the lowest scores, indicating superior alignment between real and synthetic data distributions, as shown in Figure 9. For example, it records values as low as 0.0198 (ERGAN), 0.0239 (Industrial), 0.0329 (Household), and 0.0209 (Pig Farm)—all within the excellent threshold of <0.15 (as defined in Table 1). These results highlight the effectiveness of combining convolutional structures with gradient penalty regularization in capturing complex temporal dependencies and improving the stability of GAN training. In contrast, standard Conv1D-GAN and WGAN-GP models exhibit higher W-Distance values (often >0.05), suggesting less accurate distributional matching. While those results are still within the 0.15 threshold, the results obtained by the Conv1D-WGAN-GP are preferable. Low Wasserstein distance values are particularly important, as they reflect how well the synthetic profiles mimic the hourly variations seen in the real data.
It is worth noting that the Conv1D-GAN model records a significantly higher Wasserstein distance (0.1162) on the Pig Farm dataset compared to all other models. This deviation can be attributed to several factors. First, the Pig Farm dataset is relatively small (only 363 samples), making it highly prone to overfitting or mode collapse in unconstrained GANs like the Conv1D-GAN, which lack gradient penalty regularization. Second, Conv1D-GAN relies solely on adversarial loss with no explicit regularization or divergence control, which may cause the generator to fail in capturing the subtle and irregular temporal patterns present in the Pig Farm data. These limitations reinforce the need for regularization techniques, such as the gradient penalty used in Conv1D-WGAN-GP, to ensure stable training and improved performance, particularly on small and irregular datasets.
The mean and STD difference metrics offer crucial insight into how well the synthetic data preserve the central tendency and variability of the original dataset (Figure 10).
Across all datasets, the Conv1D-WGAN-GP model performs strongly, often achieving values well within the excellent thresholds (<0.05 for the mean and <0.1 for the std, as per Table 1). For instance, in the ERGAN dataset, it achieves a mean diff. of 0.0048 and a STD diff. of 0.0153, indicating near-perfect reproduction of average and variability. Similarly, low values are observed for Industrial (0.0124, 0.0084) and Pig Farm (0.0096, 0.0081), reinforcing the model’s ability to preserve both the load level and its temporal spread, regardless of the dataset size.
In contrast, models like Conv1D-GAN show elevated mean and standard deviation differences, with mean values reaching 0.1142 (Pig Farm) and STD differences as high as 0.0510, indicating a tendency to inject both bias and excessive variability, which may reduce realism. Meanwhile, Vanilla-GAN shows inconsistent behavior, occasionally maintaining a low mean but with suboptimal variability (e.g., Household STD of 0.0343—just at the tolerable edge). While the Conv1D-WGAN-GP model generally achieves the lowest Mean and STD differences across datasets, it is worth noting that in the Pig Farm dataset, the Vanilla-GAN model achieves a slightly lower Mean diff. (0.0092 vs. 0.0096 for the Conv1D-WGAN-GP) and a reasonably close STD diff. (0.0161 vs. 0.0081). This indicates that Vanilla-GAN is capable of capturing central tendencies effectively, at least in simpler statistical terms. However, when viewed alongside other metrics, such as the higher Real-vs-Synthetic Distinguishability Score (0.5517 for Vanilla-GAN vs. 0.5034 for Conv1D-WGAN-GP) and less overlap in t-SNE visualizations, the Conv1D-WGAN-GP still demonstrates superior realism and generalization. Nonetheless, this highlights that no single metric should be used in isolation, and even simpler GANs like Vanilla-GAN can perform well in specific scenarios. Overall, the Conv1D-WGAN-GP stands out by consistently preserving both the average load and its natural fluctuation across diverse settings, which is essential for reliable downstream tasks such as load forecasting and anomaly detection.
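These two statistics can be computed directly; the sketch below reflects one plausible reading of Table 1, taking absolute differences between the overall mean and standard deviation of the normalized profiles.

```python
import numpy as np

def mean_std_diff(real: np.ndarray, synth: np.ndarray) -> tuple[float, float]:
    """Absolute differences between the overall mean and standard deviation
    of real and synthetic profiles (one plausible interpretation)."""
    return (
        float(abs(real.mean() - synth.mean())),
        float(abs(real.std() - synth.std())),
    )
```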
The JSD metric (Figure 11) offers insight into the overall similarity between the real and synthetic data distributions, with lower values indicating higher fidelity. Across the datasets, Conv1D-WGAN-GP consistently achieves the lowest JSD scores, such as 0.0062 (Industrial) and 0.0078 (Pig Farm), placing it firmly within the excellent range (<0.05). This suggests that the model effectively preserves the global statistical structure of the original data. In contrast, models like Conv1D-GAN often exhibit much higher JSD values, for example, 0.0565 (Pig Farm), indicating greater divergence and less realistic generation.
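A histogram-based estimate of the JSD can be obtained as sketched below; the 50-bin discretization is an assumption, and note that SciPy's `jensenshannon` returns the distance (the square root of the divergence).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(real: np.ndarray, synth: np.ndarray, bins: int = 50) -> float:
    """Histogram-based Jensen-Shannon divergence between the pooled value
    distributions of real and synthetic profiles (base-2, so in [0, 1])."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(real.ravel(), bins=edges)
    q, _ = np.histogram(synth.ravel(), bins=edges)
    # jensenshannon returns the JS distance; square it to get the divergence
    return float(jensenshannon(p, q, base=2) ** 2)
```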

3.2.2. Downstream Tasks for Evaluation of the Synthetic Data

The Real-vs-Synthetic Distinguishability Score is a critical metric for evaluating the realism of synthetic data: if a simple model like logistic regression struggles to distinguish between real and synthetic samples (accuracy ≤ 0.6, as per Table 1), it suggests high-quality, indistinguishable outputs. In this context, Conv1D-WGAN-GP consistently achieves scores at or below the ideal threshold of 0.6 across all datasets (Figure 12)—0.5354 (ERGAN), 0.5567 (Industrial), 0.5315 (Household), and 0.5034 (Pig Farm)—indicating that the generated profiles are highly realistic and difficult to separate from real data.
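A minimal sketch of this score, assuming a held-out split and scikit-learn's logistic regression (the split ratio and seed are our assumptions), is as follows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def distinguishability_score(real: np.ndarray, synth: np.ndarray) -> float:
    """Accuracy of a logistic regression separating real (1) from synthetic (0)
    24 h profiles; values near 0.5 indicate indistinguishable data."""
    X = np.vstack([real, synth])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return float(clf.score(X_te, y_te))
```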
In contrast, other models such as WGAN-GP and Conv1D-GAN often exceed the tolerable upper bound of 0.65, with scores as high as 0.869 (Pig Farm) and 0.8559 (Household). These high values indicate that these models produce synthetic data with identifiable differences from the real data, undermining their usefulness in applications requiring high fidelity.
Figure 13 illustrates that the basic Conv1D-GAN exhibits the greatest variability in the scores across the four datasets, suggesting that its ability to generate realistic data is highly sensitive to dataset-specific factors such as size and underlying patterns. For instance, while it reaches a high accuracy of ~0.87 on the Pig Farm dataset, indicating poor realism, it drops significantly to around 0.61 on Household, reflecting inconsistent performance.

3.2.3. Visual Methods for Evaluation of the Synthetic Data

From the t-SNE plots of the ERGAN dataset (Figure 14), which visualize the distribution of real and synthetic data in a 2D space, the Conv1D-GAN shows a noticeable gap between real and synthetic clusters: the real data spread broadly and form a complex, scattered cloud, while the synthetic data remain concentrated in a dense central blob. This suggests that the Conv1D-GAN struggles to reproduce the diversity of the original data distribution. Similarly, the Vanilla-GAN exhibits a distinct clustering of synthetic data in the center of the plot, with real data again forming a more expansive cloud around it. This indicates mode collapse or a lack of coverage of the full real data distribution, despite partial overlap.
In contrast, both the Conv1D-WGAN-GP and WGAN-GP models achieve a better overlap between synthetic and real distributions. The Conv1D-WGAN-GP t-SNE plot reveals tighter integration of orange points within the blue cloud, implying improved fidelity and better coverage of the real data space. The WGAN-GP model demonstrates the most even spread, with the synthetic points largely embedded within the real distribution, suggesting that it has learned not only the central structure but also the outer variations in the real dataset.
The t-SNE plots for the Industrial dataset are shown in Figure 15. The Conv1D-GAN model displays partial overlap between real and synthetic clusters, but several synthetic regions remain separated, suggesting the model fails to fully capture the distribution’s complexity. Similarly, the Vanilla-GAN shows even more distinct segregation, with synthetic points clustering apart from real ones in multiple regions. This reinforces that the Vanilla-GAN struggles to generalize and exhibits mode collapse, generating synthetic samples from only a subset of the real data space.
The Conv1D-WGAN-GP and WGAN-GP models again show significantly improved results. In the Conv1D-WGAN-GP plot, the synthetic points are more uniformly scattered among the real clusters, with less evidence of isolated synthetic groupings. This indicates the model has better learned the local structure of the data. The WGAN-GP model performs the best: synthetic data points are well integrated with real ones across all clusters, and there is no major separation among domains. This level of overlap points to a high-fidelity generative process that successfully mimics both the global and local distributional features of the original data.
The t-SNE plots for the Household dataset (Figure 16) demonstrate clear structural patterns and allow a visually informative comparison of model performance. The Conv1D-GAN plot shows moderate mixing between real and synthetic data points, although some regions still display separation, particularly with clusters where synthetic points form localized groups that do not fully overlap with the real distribution. This suggests that the model learns the general structure but misses fine-grained variability. The Vanilla-GAN, by contrast, exhibits noticeable misalignment: clusters of synthetic data tend to shift away from real clusters, indicating limited generalization and possible mode collapse, especially where synthetic points dominate certain regions without a corresponding real data presence.
The Conv1D-WGAN-GP shows a strong improvement, with synthetic and real samples overlapping extensively across the t-SNE space. While some minor cluster shifts still exist, the visual density and pattern alignment are well-preserved, signaling the model has learned the key distributional characteristics more faithfully. In the WGAN-GP model, the real and synthetic points are thoroughly intermingled throughout the space. There is no evident structural separation, and the clusters contain a balanced mix of both domains. This indicates that the WGAN-GP architecture effectively captures both global and local structure in the household dataset, making it the most robust among the four models for generating realistic synthetic data.
While Conv1D-WGAN-GP consistently demonstrates strong visual overlap between real and synthetic samples in t-SNE projections, it is important to note that WGAN-GP performs equally well or better on some datasets, such as Industrial and Household. In these cases, the synthetic data generated by WGAN-GP is thoroughly interspersed with real data, indicating high distributional fidelity, even without the convolutional structure of the proposed model.
The Pig Farm dataset illustrates the varying capability of GAN models in capturing complex, overlapping data patterns, as evident in Figure 17.
The Conv1D-GAN model shows partial mixing of synthetic and real points, but distinct clusters of real and synthetic data remain, especially on opposite ends of the space. This separation indicates that the model has learned some general structure but fails to cover the full range of real data diversity. In the Vanilla-GAN plot, the separation is more pronounced, with large synthetic clusters diverging from real ones. The lack of integration among distributions confirms that Vanilla-GAN underperforms in modeling this dataset.
The Conv1D-WGAN-GP plot demonstrates much better overlap, with real and synthetic data points interspersed more uniformly across the space. Though minor shifts still appear in some regions, the general structure of the real distribution is reasonably well replicated. Finally, in the WGAN-GP model, the synthetic and real data distributions are thoroughly mixed across all visible clusters, suggesting the model captures both local and global properties of the original data.
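The t-SNE projections in Figures 14–17 can be reproduced along the lines of the following sketch; the perplexity and random seed are assumptions rather than the exact settings used for the figures.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def tsne_overlap_plot(real: np.ndarray, synth: np.ndarray, title: str) -> None:
    """Embed real and synthetic profiles jointly in 2D; strong overlap of the
    two point clouds suggests high distributional fidelity."""
    emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(
        np.vstack([real, synth])
    )
    n = len(real)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=5, alpha=0.5, label="real")
    plt.scatter(emb[n:, 0], emb[n:, 1], s=5, alpha=0.5, label="synthetic")
    plt.title(title)
    plt.legend()
    plt.show()
```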
Analyzing histograms per hour is critically important because it allows for a visual and quantitative comparison of how well the synthetic data replicate the real data’s distribution at each hour. Histogram plots make it easy to spot specific hours where the synthetic data fail to capture the correct distributional shape, such as missing peaks, incorrect skewness, or differences in spread and variability.
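Such per-hour comparisons can be produced with a few lines of Matplotlib, as sketched below; the bin count is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

def hourly_histogram(real: np.ndarray, synth: np.ndarray, hour: int) -> None:
    """Overlay real and synthetic value distributions for one hour of the day,
    in the spirit of Figures 18-21."""
    edges = np.linspace(0.0, 1.0, 41)
    plt.hist(real[:, hour], bins=edges, alpha=0.5, label="real")
    plt.hist(synth[:, hour], bins=edges, alpha=0.5, label="synthetic")
    plt.xlabel("Normalized load")
    plt.ylabel("Count")
    plt.title(f"Hour {hour}")
    plt.legend()
    plt.show()
```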
Figure 18 shows a comparative visualization of the real and synthetic data from the ERGAN dataset for the 20th hour. This hour is chosen because of the interesting distribution of the real data, which exhibits a noteworthy spike near the maximum value of 1.0. The Conv1D-GAN model captures the general shape of the real distribution reasonably well. It reflects the broad peak of real values around the mid-range (0.3 to 0.6) but fails to replicate the sharp spike in real data near the maximum value (1.0). This underrepresentation of high-load values suggests that while the model learns the central trend, it struggles to generate extreme load cases.
In contrast, the Vanilla-GAN produces a smoother, bell-shaped synthetic distribution centered around 0.5. Although it aligns reasonably well with the real data in the central range, it noticeably fails to capture the asymmetry of the real distribution and completely misses the accumulation of values near 1.0. As a result, the synthetic data appear too regular and idealized compared to the more varied and skewed real distribution. This indicates that the Vanilla-GAN lacks the capacity to model the complex distribution tails effectively.
The Conv1D-WGAN-GP model delivers a moderately improved approximation with a broader spread and less central bias compared to Vanilla-GAN and Conv1D-GAN. However, it still fails to reproduce the sharp spike near the upper end of the real distribution, similar to the other models. While this reflects improved balance across mid-range values, the advantage is less visually pronounced than in other cases.
Lastly, the WGAN-GP model struggles the most. It significantly overestimates the frequency of low-load values and fails to model both the mid-range and high-end values with any fidelity. The synthetic distribution is skewed heavily toward the lower range, creating a pronounced mismatch with the real data. This indicates a weakness in the model’s ability to learn from the full range of real data, particularly when it comes to representing more extreme values.
Based on the histogram visualization of the Industrial dataset (Figure 19), an interesting case that should be further examined is the 6th hour.
Therefore, the 6th hour is chosen for the histograms in Figure 19. The Conv1D-GAN model shows a reasonable match in the central range of the distribution (roughly between 0.15 and 0.6). However, the synthetic data overrepresents the low-load region (around 0.1 to 0.3), while it underrepresents values in the higher range (above 0.6). The real data display a noticeable peak near 0.65 and scattered values beyond that, which the synthetic data fail to replicate accurately. Overall, the model captures the distribution's general shape but misses details, particularly in the higher-load tail.
The Vanilla-GAN model continues to produce a smooth, bell-like synthetic distribution centered in the 0.2–0.5 range. While the match is acceptable in the middle of the range, the model again fails to capture the peaks in the higher-load portion of the real distribution. It also appears to generate an excessive number of mid-range values while not representing the lower and higher extremes well. This results in a distribution that is less realistic and biased toward average load levels.
The Conv1D-WGAN-GP model performs slightly better in representing the overall variability of the real data. It captures values across the full range and appears more aligned with the real distribution in both shape and spread. However, there are still mismatches, particularly around 0.6, where the real data have stronger peaks. Despite some misalignment in frequency, this model seems to offer a more balanced approximation with a diverse synthetic output.
The WGAN-GP model, similar to its Conv1D-enhanced counterpart, succeeds in spreading synthetic values across the full domain. It replicates the general bimodal nature of the real data reasonably well, although the heights of the peaks differ. It still underrepresents some of the more prominent clusters in the real data, especially beyond 0.6. Nevertheless, among all models, it appears to generate the most evenly distributed synthetic data, though not necessarily the most accurate in terms of matching specific distribution features.
In general, when discussing households and their consumption, the time around the 18th hour is considered important, as load peaks are expected then. This hour is therefore represented in Figure 20; its real distribution presents a challenge due to its steep skew and dominant low values.
The Conv1D-GAN shows a partial match with the real data. It captures the overall decreasing trend of the distribution, with a prominent spike near zero. However, the real data display a much sharper and higher peak at the lowest load values, which the synthetic distribution underestimates. Additionally, the synthetic values are more evenly spread across the 0.1–0.5 range, resulting in a smoother curve that does not fully reflect the abrupt drop-off seen in the real distribution. This indicates moderate success in capturing the general structure but a lack of precision in replicating the intensity of the low-load spike.
The Vanilla-GAN performs similarly, with slightly better alignment near the zero-load region. It reproduces the initial peak more closely but still falls short of matching its magnitude. The synthetic data again display a smoother and more gradual decline, missing the sharp dips and sparse outliers in the real data. While it succeeds in following the overall skewed pattern, the generated distribution appears overly regular and lacks the noise and asymmetry observed in the real data.
The Conv1D-WGAN-GP model performs better than the previous two in capturing the core structure of the distribution. It matches the sharp rise at the beginning and more accurately represents the frequencies of mid-range values. However, some fluctuations remain mismatched, and the model generates slightly more synthetic values in the range above 0.5 than the real distribution suggests. Still, among the first three, this model achieves the most balanced synthesis, showing a credible representation of the sharp left-skew and long tail.
The WGAN-GP model delivers the closest overall approximation. It replicates the sharp spike at near-zero load with the highest fidelity among the models and produces a realistic decline into the mid-range values. Its synthetic distribution follows the real distribution’s fluctuations more faithfully, without over-smoothing or underestimating low-load dominance. The tail also appears better matched, with a proper drop in counts beyond 0.6.
The Pig Farm dataset is of particular interest, as it is typical of the datasets targeted by this methodology: a small dataset containing the load profile of one complex for one year. This is also an industrial dataset, but as evident from Figure 21, there are no clear patterns in the load profiles. Therefore, the 9th hour is chosen for the histograms based on the asymmetric distribution of the real data at that hour. The Conv1D-GAN captures the central part of the real distribution fairly well, particularly in the range between 0.2 and 0.5. The synthetic values align moderately with the real ones, but the model underrepresents higher load values above 0.6 and also overrepresents some lower values between 0.1 and 0.2. While the model reproduces the general shape of the distribution, it misses the variability and irregularity seen in the real data, especially toward the tail ends.
The Vanilla-GAN performs less convincingly. It generates a concentration of synthetic values around 0.2 to 0.3 and again near 1.0, overestimating low-to-mid values and producing an unrealistic tail spike. The real data show a more gradual drop-off after the peak, but the synthetic distribution here appears biased and noisy, introducing artificial structure that is not present in the true distribution. The mismatch is especially visible at both extremes, suggesting that the Vanilla-GAN overfits local patterns without generalizing well.
The Conv1D-WGAN-GP model offers a significantly better match to the real distribution. The synthetic values track both the main peak near 0.3 and the longer tail toward 0.6 quite well. The histogram shapes are aligned, with some divergence in the finer local fluctuations. This model can replicate the skew, the spread, and even parts of the multimodal structure with better fidelity than the previous two. However, minor mismatches remain, particularly in the underrepresented regions between 0.6 and 0.8.
The WGAN-GP model also performs quite well with a realistic synthetic distribution that closely mirrors the shape and spread of the real data. It captures the dominant modes, spreads values appropriately across the 0.2 to 0.6 range, and avoids producing outliers at the extremes. Compared to the other models, WGAN-GP exhibits a good balance between smoothness and fidelity to real variability.

3.2.4. Overall Model Performance of Conv1D-WGAN-GP

The results across all datasets consistently highlight the superior performance of the Conv1D-WGAN-GP model, confirming its robustness and generalizability in synthetic time-series generation, particularly for 24 h electrical load profiles. Its design, merging the Wasserstein loss with gradient penalty (WGAN-GP) and convolutional sequence modeling (Conv1D), enables the effective capture of local temporal dependencies while maintaining training stability, a challenge commonly faced in traditional GAN training.
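For orientation, a Keras sketch of such a convolutional generator is given below. It mirrors the Conv1D-GAN generator layout from Table 4 with the latent dimension from Table 5; the exact Conv1D-WGAN-GP configuration may differ in detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 16  # per Table 5

def build_generator() -> tf.keras.Model:
    """Sketch of a Conv1D generator producing 24 h profiles in [0, 1]."""
    return tf.keras.Sequential([
        layers.Input(shape=(LATENT_DIM,)),
        layers.Dense(6 * 64),
        layers.LeakyReLU(),
        layers.Reshape((6, 64)),
        layers.UpSampling1D(size=2),                       # -> (12, 64)
        layers.Conv1D(64, kernel_size=3, padding="same"),
        layers.LeakyReLU(),
        layers.UpSampling1D(size=2),                       # -> (24, 64)
        layers.Conv1D(32, kernel_size=3, padding="same"),
        layers.LeakyReLU(),
        layers.Conv1D(1, kernel_size=3, padding="same", activation="sigmoid"),
        layers.Reshape((24,)),                             # flat 24 h profile
    ])
```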
Across all four datasets, Conv1D-WGAN-GP consistently achieved the lowest average Wasserstein distance and classifier accuracy. These results are particularly noteworthy because classifier accuracy serves as a proxy for how distinguishable the generated data are from the real samples. The fact that Conv1D-WGAN-GP maintained this indistinguishability even as the dataset size scaled from a few hundred to over 100,000 samples demonstrates its strong capacity to model the underlying data distribution without overfitting or mode collapse.
Moreover, the model achieved the smallest mean and standard deviation differences in nearly every setting. These low-level statistics suggest that the generated distributions closely match the empirical characteristics of the real data. This was further corroborated by consistently low JS divergence values, indicating tight alignment between real and synthetic distributions.
This scalability suggests that the model architecture is not overly sensitive to the volume of training data, making it suitable for real-world industrial applications, such as privacy-preserving smart meter data simulation or renewable energy forecasting augmentation.
The stability of Conv1D-WGAN-GP training is also worth noting. The use of gradient penalty rather than weight clipping (as in classical WGANs) avoids convergence issues and contributes to more stable discriminator updates.
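A minimal sketch of the gradient penalty term of Gulrajani et al. [29] is shown below, with λ = 2 as in Table 5; the critic is assumed to take flat 24-dimensional profiles, and TensorFlow is used purely for illustration.

```python
import tensorflow as tf

LAMBDA_GP = 2.0  # gradient penalty weight, per Table 5

def gradient_penalty(critic: tf.keras.Model,
                     real: tf.Tensor, fake: tf.Tensor) -> tf.Tensor:
    """Penalize deviations of the critic's gradient norm from 1 on random
    interpolates between real and generated samples (shape: (batch, 24))."""
    eps = tf.random.uniform([tf.shape(real)[0], 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp, training=True)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-12)
    return LAMBDA_GP * tf.reduce_mean(tf.square(norms - 1.0))

# Critic loss sketch: mean(critic(fake)) - mean(critic(real)) + gradient_penalty(...)
```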
Despite the shown strengths, the Conv1D-WGAN-GP model still relies on relatively simple upsampling techniques and linear activation layers within the generator. Future work could explore integrating attention mechanisms, dilated convolutions, or temporal fusion transformers to further enhance its ability to model multi-scale temporal patterns. Additionally, the model currently assumes a fixed-length (24 h) output; adapting the architecture for variable-length or multivariate time-series generation is a promising direction.

4. Discussion

4.1. Dataset Size Sensitivity

An important strength of the proposed method is its ability to perform well on small datasets. Since the size of the training dataset is often limited for various reasons, e.g., privacy concerns, data collection costs, regulations, or time constraints, a method whose generated data resemble the real data, however small and limited they are, provides an important and valuable basis for further studies and applications. While traditional GANs tend to overfit or collapse under limited data, the Conv1D-WGAN-GP model maintained low divergence scores and classifier accuracy even with fewer than 400 samples (often representing data for just one year). Its compact architecture and reduced parameter complexity make it particularly well-suited for cases where real load data are scarce or noisy.

4.2. Role of GANs in Load Profile Generation

GANs offer a powerful framework for modeling the complex and varied nature of energy consumption patterns. Compared to rule-based or statistical models, GANs can learn non-linear, multimodal distributions directly from historical data, preserving key behaviors. However, conventional GAN architectures often suffer from instability, mode collapse, or unrealistic outputs. As this work demonstrates, GANs are not only viable but also effective for generating synthetic energy data across multiple domains, enabling privacy-preserving data sharing, benchmarking of forecasting models, and scenario testing in grid simulations.

4.3. Proposed Conv1D-WGAN-GP Model

Across all datasets, Conv1D-WGAN-GP demonstrated the best balance between data fidelity and indistinguishability. Its ability to maintain low divergence scores and classifier accuracy across datasets of different sizes illustrates strong generalization and robustness. While a traditional architecture like Vanilla-GAN sometimes performs reasonably on RMSE, its Real-vs-Synthetic Distinguishability Score is often high, and t-SNE plots show poor overlap, indicating that it generates less realistic and more easily distinguishable data. WGAN-GP tends to overfit or overshoot on some datasets, as is especially evident in its higher classifier accuracy and visual artifacts in distribution coverage. Conv1D-GAN, despite the benefits of convolution, frequently exhibits the highest errors and visual mismatches, particularly with the small Pig Farm dataset. These results validate the proposed architecture's suitability for practical applications in smart grid analytics, especially when real data are limited or privacy-constrained. They also highlight the efficacy of combining Wasserstein loss with gradient penalty and convolutional architectures for both small-scale and large-scale load profile generation.

4.4. Limitations

Despite its strong performance, the proposed approach has several limitations. First, it assumes a fixed 24 h profile length, limiting its applicability to datasets with variable time horizons or irregular sampling. Second, the model currently treats each profile independently and does not explicitly learn longer-term dependencies across days or weeks (e.g., seasonal patterns and weekly routines). Furthermore, although convolutional layers effectively capture short-term structures, their receptive field is limited compared to transformer-based or attention-enhanced models. Lastly, the training process remains sensitive to hyperparameters such as batch size, critic iterations, and gradient penalty, which require empirical tuning to achieve optimal results.

5. Conclusions

This study addresses the critical problem of generating realistic synthetic electrical load profiles in scenarios where access to large, high-resolution datasets is constrained due to privacy concerns or data availability. Traditional methods often struggle with limited data, exhibiting training instability, overfitting, and computational inefficiency. These limitations hinder the adoption of data-driven modeling techniques in domains such as smart grid analytics, load forecasting, and energy simulation, particularly when dealing with small-scale or privacy-sensitive datasets.
To overcome these challenges, a lightweight convolutional GAN architecture—Conv1D-WGAN-GP, which combines the temporal modeling power of 1D convolutional layers with the training stability of the Wasserstein loss and gradient penalty—is proposed. The model is tailored to generate 24 h synthetic load profiles that maintain the statistical and temporal integrity of real-world energy consumption data.
Through a comprehensive evaluation on four diverse datasets, Conv1D-WGAN-GP consistently outperformed baseline models (Vanilla-GAN, WGAN-GP, and Conv1D-GAN) across multiple quantitative and qualitative metrics. It achieved the lowest Wasserstein distances, the closest alignment in mean and standard deviation, and the lowest Real-vs-Synthetic Distinguishability Score, indicating high realism and indistinguishability from actual data. Visual inspections using t-SNE plots and per-hour histograms further confirmed that the model effectively captured both typical patterns and subtle behavioral variations in load profiles.
In conclusion, Conv1D-WGAN-GP represents a promising and practical solution to the problem of synthetic load profile generation under data scarcity and privacy constraints. It enables the creation of high-fidelity, diverse, and temporally coherent synthetic data.
Future work will explore the integration of conditional inputs such as day type, weather conditions, or seasonality to allow for more controllable and context-aware data generation. Additionally, differential privacy mechanisms to further enhance data anonymity and security will be incorporated. Another promising direction is adapting the framework for multivariate time series and finer temporal resolutions (e.g., 15 min intervals).

Author Contributions

Conceptualization, T.K., I.V. and B.E.; methodology, T.K.; software, T.K.; validation, T.K., I.V., K.G.-E. and B.E.; formal analysis, T.K. and I.V.; investigation, T.K.; resources, B.E. and K.G.-E.; data curation, T.K. and I.V.; writing—original draft preparation, T.K., B.E. and I.V.; writing—review and editing, B.E. and T.K.; visualization, T.K.; supervision, B.E.; project administration, B.E.; funding acquisition, B.E. All authors have read and agreed to the published version of the manuscript.

Funding

This study is financed by the European Union—NextGenerationEU through the National Recovery and Resilience Plan of the Republic of Bulgaria, project no. BG-RRP-2.013-0001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are published under the CC BY 4.0 license and can be found at https://doi.org/10.6084/m9.figshare.29332817 (accessed on 10 July 2025) (the Household and Industrial Datasets), https://doi.org/10.6084/m9.figshare.28785422 (accessed on 13 April 2025) (the Pig Farm Dataset), and https://github.com/AdamLiang42/ERGAN-Dataset (accessed on 13 April 2025) (the ERGAN Dataset).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Hong, T. Generating realistic building electrical load profiles through the Generative Adversarial Network (GAN). Energy Build. 2020, 224, 110299. [Google Scholar] [CrossRef]
  2. Price, P. Methods for Analyzing Electric Load Shape and Its Variability; Lawrence Berkeley National Laboratory: Berkeley, CA, USA, 2010. Available online: https://www.osti.gov/biblio/985909 (accessed on 15 May 2025).
  3. Valova, I.; Gabrovska-Evstatieva, K.G.; Kaneva, T.; Evstatiev, B.I. Generation of Realistic Synthetic Load Profile Based on the Markov Chains Theory: Methodology and Case Studies. Algorithms 2025, 18, 287. [Google Scholar] [CrossRef]
  4. Conselvan, F.; Mascherbauer, P.; Harringer, D. Neural network to generate synthetic building electrical load profiles. In Proceedings of the 13. Internationale Energiewirtschaftstagung an der TU Wien (IEWT), Vienna, Austria, 15–17 February 2023. [Google Scholar]
  5. Hu, J.; Vasilakos, A.V. Energy big data analytics and security: Challenges and opportunities. IEEE Trans. Smart Grid 2016, 7, 2423–2436. [Google Scholar] [CrossRef]
  6. Triastcyn, A.; Faltings, B. Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees. Algorithms 2022, 15, 232. [Google Scholar] [CrossRef]
  7. Wang, C.; Tindemans, S.H.; Palensky, P. Generating contextual load profiles using a conditional variational autoencoder. In Proceedings of the 2022 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Novi Sad, Serbia, 10–12 October 2022; IEEE: New York, NY, USA, 2022; pp. 1–6. [Google Scholar] [CrossRef]
  8. Molina-Markham, A.; Shenoy, P.; Fu, K.; Cecchet, E.; Irwin, D. Private memoirs of a smart meter. In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, Zurich, Switzerland, 2 November 2010; pp. 61–66. [Google Scholar] [CrossRef]
  9. Wang, W.; Hong, T.; Li, N.; Wang, R.Q.; Chen, J. Linking energy-cyber-physical systems with occupancy prediction and interpretation through WiFi probe-based ensemble classification. Appl. Energy 2019, 236, 55–69. [Google Scholar] [CrossRef]
  10. Wang, Z.; Hong, T.; Piette, M.A. Data fusion in predicting internal heat gains for office buildings through a deep learning approach. Appl. Energy 2019, 240, 386–398. [Google Scholar] [CrossRef]
  11. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  12. Lin, Z.; Jain, A.; Wang, C.; Fanti, G.; Sekar, V. Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions. In Proceedings of the ACM Internet Measurement Conference (IMC ’20), Virtual Event, 27 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 464–483. [Google Scholar] [CrossRef]
  13. Labeeuw, W.; Deconinck, G. Residential Electrical Load Model Based on Mixture Model Clustering and Markov Models. IEEE Trans. Ind. Inform. 2013, 9, 1561–1569. [Google Scholar] [CrossRef]
  14. Blanco, L.; Zabala, A.; Schiricke, B.; Hoffschmidt, B. Generation of heat and electricity load profiles with high temporal resolution for Urban Energy Units using open geodata. Sustain. Cities Soc. 2024, 117, 105967. [Google Scholar] [CrossRef]
  15. Munkhammar, J.; van der Meer, D.; Widén, J. Very short term load forecasting of residential electricity consumption using the Markov-chain mixture distribution (MCM) model. Appl. Energy 2021, 282, 116180. [Google Scholar] [CrossRef]
  16. Dalla Maria, E.; Secchi, M.; Macii, D. A Flexible Top-Down Data-Driven Stochastic Model for Synthetic Load Profiles Generation. Energies 2022, 15, 269. [Google Scholar] [CrossRef]
  17. McLoughlin, F.; Duffy, A.; Conlon, M. The generation of domestic electricity load profiles through Markov chain modelling. In Proceedings of the 3rd International Scientific Conference on Energy and Climate Change Conference, Athens, Greece, 7–8 October 2010; pp. 18–27. Available online: https://arrow.tudublin.ie/dubencon2/9/ (accessed on 12 April 2025).
  18. Tushar, W.; Huang, S.; Yuen, C.; Zhang, J.A.; Smith, D.B. Synthetic generation of solar states for smart grid: A multiple segment Markov chain approach. In Proceedings of the IEEE PES Innovative Smart Grid Technologies, Europe, Istanbul, Turkey, 12–15 October 2014; IEEE: New York, NY, USA, 2014; pp. 1–6. [Google Scholar] [CrossRef]
  19. Pillai, G.G.; Putrus, G.A.; Pearsall, N.M. Generation of synthetic benchmark electrical load profiles using publicly available load and weather data. Int. J. Electr. Power Energy Syst. 2014, 61, 1–10. [Google Scholar] [CrossRef]
  20. Tamayo-Urgilés, D.; Sanchez-Gordon, S.; Valdivieso Caraguay, Á.L.; Hernández-Álvarez, M. GAN-Based Generation of Synthetic Data for Vehicle Driving Events. Appl. Sci. 2024, 14, 9269. [Google Scholar] [CrossRef]
  21. Bhattarai, B.; Baek, S.; Bodur, R.; Kim, T.K. Sampling strategies for gan synthetic data. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  22. Pan, Z.; Wang, J.; Liao, W.; Chen, H.; Yuan, D.; Zhu, W.; Fang, X.; Zhu, Z. Data-Driven EV Load Profiles Generation Using a Variational Auto-Encoder. Energies 2019, 12, 849. [Google Scholar] [CrossRef]
  23. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  24. Gu, Y.; Chen, Q.; Liu, K.; Xie, L.; Kang, C. GAN-based model for residential load generation considering typical consumption patterns. In Proceedings of the 2019 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 18–21 February 2019; pp. 1–5. [Google Scholar] [CrossRef]
  25. Liang, X.; Wang, Z.; Wang, H. Synthetic Data Generation for Residential Load Patterns via Recurrent GAN and Ensemble Method. IEEE Trans. Instrum. Meas. 2024, 73, 2535412. [Google Scholar] [CrossRef]
  26. Jordon, J.; Yoon, J.; Van Der Schaar, M. PATE-GAN: Generating synthetic data with differential privacy guarantees. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 21 December 2018. [Google Scholar]
  27. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  28. Walia, M.S.; Tierney, B.; McKeever, S. Synthesising tabular datasets using Wasserstein Conditional GANS with Gradient Penalty (WCGAN-GP). In Proceedings of the AICS 2020: 28th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 7–8 December 2020. [Google Scholar]
  29. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of Wasserstein GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 5769–5779. [Google Scholar]
  30. Ha, T.; Dang, T.K.; Dang, T.T.; Truong, T.A.; Nguyen, M.T. Differential privacy in deep learning: An overview. In Proceedings of the 2019 International Conference on Advanced Computing and Applications (ACOMP), Nha Trang, Vietnam, 26–28 November 2019; pp. 97–102. [Google Scholar] [CrossRef]
  31. Impraimakis, M. A Kullback–Leibler divergence method for input–system–state identification. J. Sound Vib. 2024, 569, 117965. [Google Scholar] [CrossRef]
  32. Johnson, D.H.; Sinanovic, S. Symmetrizing the Kullback-Leibler distance. IEEE Trans. Inf. Theory 2001, 1, 1–10. [Google Scholar]
  33. Menéndez, M.L.; Pardo, J.A.; Pardo, L.; Pardo, M.C. The Jensen-Shannon divergence. J. Frankl. Inst. 1997, 334, 307–318. [Google Scholar] [CrossRef]
  34. Sinn, M.; Rawat, A. Non-parametric estimation of Jensen-Shannon divergence in generative adversarial network training. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Playa Blanca, Lanzarote, Canary Islands, Spain, 9–11 April 2018; pp. 642–651. [Google Scholar]
  35. Yoon, J.; Jarrett, D.; van der Schaar, M. Time-series generative adversarial networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 494, pp. 5508–5518. [Google Scholar]
  36. Aloni, O.; Perelman, G.; Fishbain, B. Synthetic random environmental time series generation with similarity control, preserving original signal’s statistical characteristics. Environ. Model. Softw. 2025, 185, 106283. [Google Scholar] [CrossRef]
  37. Milne, T.; Nachman, A.I. Wasserstein GANs with gradient penalty compute congested transport. Conf. Learn. Theory 2022, 178, 103–129. [Google Scholar]
  38. Lee, G.-C.; Li, J.-H.; Li, Z.-Y. A Wasserstein generative adversarial network-gradient penalty-based model with imbalanced data enhancement for network intrusion detection. Appl. Sci. 2023, 13, 8132. [Google Scholar] [CrossRef]
  39. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
  40. Weng, L. From GAN to WGAN. arXiv 2019, arXiv:1904.08994. [Google Scholar]
  41. ERGAN-Dataset. Available online: https://github.com/AdamLiang42/ERGAN-Dataset (accessed on 1 May 2025).
  42. Pecan Street Database. Available online: https://www.pecanstreet.org/work/energy/ (accessed on 1 May 2025).
Figure 1. Simplified structure of a GAN model.
Figure 2. Correlations among the neighboring loads.
Figure 3. Summary of the generator’s structure.
Figure 4. Summary of the discriminator structure.
Figure 5. Heatmap of the ERGAN dataset. The columns correspond to the hour of the day (0 to 23) and the rows to the energy consumption before any data preprocessing.
Figure 6. Heatmap of the Industrial dataset. The columns correspond to the hour of the day (0 to 23) and the rows to the energy consumption before any data preprocessing.
Figure 7. Heatmap of the Household dataset. The columns correspond to the hour of the day (0 to 23) and the rows to the energy consumption before any data preprocessing.
Figure 8. Heatmap of the Pig Farm dataset. The columns correspond to the hour of the day (0 to 23) and the rows to the energy consumption before any data preprocessing.
Figure 9. Comparison between the Wasserstein distance metrics of the investigated algorithms using the ERGAN (a), Industrial (b), Household (c), and Pig Farm (d) datasets.
Figure 10. Comparison between the mean and STD metrics of the investigated algorithms using the ERGAN (a), Industrial (b), Household (c), and Pig Farm (d) datasets.
Figure 11. Comparison between the JSD metrics of the investigated algorithms and datasets.
Figure 12. Comparison between the Real-vs-Synthetic Distinguishability Score from the logistic regression and the investigated algorithms using the ERGAN (a), Industrial (b), Household (c), and Pig Farm (d) datasets.
Figure 13. Comparison between the Real-vs-Synthetic Distinguishability Scores of the investigated algorithms and datasets.
Figure 14. t-SNE analysis of the ERGAN dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 15. t-SNE analysis of the Industrial dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 16. t-SNE analysis of the Household dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 17. t-SNE analysis of the Pig Farm dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 18. Histograms of the 20th hour for the ERGAN dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 19. Histograms of the 6th hour for the Industrial dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 20. Histograms of the 18th hour for the Household dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Figure 21. Histograms of the 9th hour for the Pig Farm dataset using Vanilla-GAN (a), WGAN-GP (b), Conv1D-GAN (c), and Conv1D-WGAN-GP (d).
Table 1. A summary of the evaluation metrics adopted in this study.

| Metric | Threshold Values * | The Metric’s Importance in This Methodology |
|---|---|---|
| Average Wasserstein Distance [35] | <0.15 (Excellent), <0.25 (Good) | Measures distribution alignment per hour. Lower is better; high values indicate poor matching of load shapes. |
| Mean Difference [36] | <0.05 (Excellent), <0.10 (Acceptable) | Ensures the synthetic data have the same overall average load. |
| STD Difference [36] | <0.1 (Excellent), <0.2 (Acceptable) | Captures variability; essential for representing diversity in users’ behavior. |
| Classifier Accuracy (logistic regression) | ≤0.6 (Ideal), ≤0.65 (Tolerable) | If a simple model cannot easily distinguish real from synthetic, it indicates high realism. |
| JS Divergence (overall) | <0.05 (Excellent), <0.1 (Good) | A symmetric version of KL. Easier to interpret; 0 = identical, 1 = total difference. |
| RMSE (per profile and overall) | <0.1 (Excellent), <0.2 (Acceptable) | Penalizes large point-wise errors. It should be small if the curves are aligned. |
* The threshold values in Table 1 are established based on a combination of existing literature and empirical validation. Specifically, the thresholds for JS divergence, Wasserstein distance, and classifier accuracy draw on precedent from prior studies evaluating GAN-generated time-series data [12,33,35]. However, due to the absence of universal standards for synthetic electrical load profile generation, these values were further refined using the authors’ benchmark experiments. Acceptable and excellent ranges were defined by evaluating the performance of multiple GAN architectures across datasets of varying sizes and comparing synthetic versus real profile fidelity through statistical and downstream task metrics. As such, the thresholds serve as practically motivated heuristics, rather than rigid universal standards, and provide a consistent evaluation framework across experiments in this study.
Table 2. Architecture of the vanilla GAN used in the benchmark process.

| Model | Characteristic | Value |
|---|---|---|
| Generator | 1st layer | Densely connected: 128 units; activation—LeakyReLU |
| | 2nd layer | Batch normalization |
| | 3rd layer | Densely connected: 256 units; activation—LeakyReLU |
| | 4th layer | Batch normalization |
| | Output | Densely connected: 24 units; activation—sigmoid |
| | Loss function | Binary cross-entropy |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
| Discriminator | 1st layer | Densely connected: 256 units; activation—LeakyReLU |
| | 2nd layer | Dropout: dropout rate = 0.3 |
| | 3rd layer | Densely connected: 128 units; activation—LeakyReLU |
| | 4th layer | Dropout: dropout rate = 0.3 |
| | Output | Densely connected: 1 unit; activation—sigmoid |
| | Loss function | Binary cross-entropy |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
Table 3. Architecture of the WGAN-GP model used for benchmarking.

| Model | Characteristic | Value |
|---|---|---|
| Generator | 1st layer | Densely connected: 128 units; activation—LeakyReLU |
| | 2nd layer | Batch normalization |
| | 3rd layer | Densely connected: 256 units; activation—LeakyReLU |
| | 4th layer | Batch normalization |
| | 5th layer | Densely connected: 128 units; activation—LeakyReLU |
| | Output | Densely connected: 24 units; activation—sigmoid |
| | Loss function | Wasserstein loss |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
| Discriminator | 1st layer | Densely connected: 128 units; activation—LeakyReLU |
| | 2nd layer | Dropout: dropout rate = 0.3 |
| | 3rd layer | Densely connected: 64 units; activation—LeakyReLU |
| | 4th layer | Dropout: dropout rate = 0.3 |
| | Output | Densely connected: 1 unit; activation—none |
| | Loss function | Wasserstein loss |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
Table 4. Architecture of the Conv1D-GAN model used for benchmarking.

| Model | Characteristic | Values |
|---|---|---|
| Generator | 1st layer | Densely connected: 384 units; activation—LeakyReLU |
| | 2nd layer | Reshape: (6, 64) |
| | 3rd layer | UpSampling1D: size = 2 → (12, 64) |
| | 4th layer | Convolutional: 64 filters, kernel size = 3, padding = ‘same’; activation—LeakyReLU |
| | 5th layer | UpSampling1D: size = 2 → (24, 64) |
| | 6th layer | Convolutional: 32 filters, kernel size = 3, padding = ‘same’; activation—LeakyReLU |
| | Output | Convolutional: 1 filter, kernel size = 3, activation = sigmoid, padding = ‘same’; followed by reshape to (24,) |
| | Loss function | Binary cross-entropy |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
| Discriminator | 1st layer | Input: shape (24,) |
| | 2nd layer | Reshape: (24, 1) |
| | 3rd layer | Convolutional: 64 filters, kernel size = 3, strides = 2, padding = ‘same’; activation—LeakyReLU |
| | 4th layer | Dropout: rate = 0.3 |
| | 5th layer | Convolutional: 128 filters, kernel size = 3, strides = 2, padding = ‘same’; activation—LeakyReLU |
| | 6th layer | Dropout: rate = 0.3 |
| | 7th layer | Flatten |
| | Output | Densely connected: 1 unit; activation—sigmoid |
| | Loss function | Binary cross-entropy |
| | Optimizer | Adam (adaptive moment estimation): learning rate = 0.0002 |
Table 5. Hyperparameters of the proposed Conv1D-WGAN-GP.

| Hyperparameter | Value |
|---|---|
| Latent Dimension | 16 |
| Batch Size | 128 |
| Epochs | 1000 |
| Number of Critics | 10 |
| Lambda | 2 |
Table 6. Summary of the used datasets.

| Dataset | Number of Samples | Duration |
|---|---|---|
| ERGAN-dataset | 110,274 | N/A ¹ |
| Industrial | 728 | 2 years |
| Household | 278 | 1 year |
| Pig Farm | 363 | 1 year |

¹ Not Applicable.
Table 7. Results from training.

| Dataset | Model | W-Distance | Mean Diff. | STD Diff. ¹ | Classifier Acc. ² | JS Div. ³ | RMSE ⁴ |
|---|---|---|---|---|---|---|---|
| ERGAN | Vanilla-GAN | 0.0263 | 0.0129 | 0.026 | 0.5972 | 0.0023 | 0.2669 |
| | WGAN-GP | 0.0369 | 0.0257 | 0.0356 | 0.8372 | 0.0091 | 0.2709 |
| | Conv1D-GAN | 0.0605 | 0.0484 | 0.0603 | 0.8419 | 0.0193 | 0.2633 |
| | Conv1D-WGAN-GP | 0.0198 | 0.0048 | 0.0153 | 0.5354 | 0.0176 | 0.2647 |
| Industrial | Vanilla-GAN | 0.0599 | 0.0148 | 0.0263 | 0.5498 | 0.0329 | 0.2864 |
| | WGAN-GP | 0.0475 | 0.0365 | 0.0167 | 0.8557 | 0.0087 | 0.2893 |
| | Conv1D-GAN | 0.065 | 0.0568 | 0.0294 | 0.7973 | 0.017 | 0.2763 |
| | Conv1D-WGAN-GP | 0.0239 | 0.0124 | 0.0084 | 0.5567 | 0.0062 | 0.2716 |
| Household | Vanilla-GAN | 0.0458 | 0.0332 | 0.0343 | 0.5586 | 0.0309 | 0.3109 |
| | WGAN-GP | 0.0507 | 0.0288 | 0.0293 | 0.8559 | 0.0348 | 0.3317 |
| | Conv1D-GAN | 0.0515 | 0.0276 | 0.0501 | 0.6126 | 0.0494 | 0.2871 |
| | Conv1D-WGAN-GP | 0.0329 | 0.0139 | 0.0078 | 0.5315 | 0.0341 | 0.3219 |
| Pig Farm | Vanilla-GAN | 0.0326 | 0.0092 | 0.0161 | 0.5517 | 0.0162 | 0.2833 |
| | WGAN-GP | 0.0551 | 0.0415 | 0.0403 | 0.8483 | 0.0138 | 0.2549 |
| | Conv1D-GAN | 0.1162 | 0.1142 | 0.051 | 0.869 | 0.0565 | 0.277 |
| | Conv1D-WGAN-GP | 0.0209 | 0.0096 | 0.0081 | 0.5034 | 0.0078 | 0.2801 |

¹ Standard deviation difference; ² Classifier accuracy; ³ JS Divergence; ⁴ Root mean square error.