1. Introduction
With the increasing installed capacity of renewable energy sources such as wind power and photovoltaic generation, power grids continue to expand in scale and become increasingly complex [1,2]. Consequently, the secure and stable operation of power systems is facing significant challenges [3,4,5]. Due to the high penetration of renewable energy and multiple HVDC infeed configurations, a portion of synchronous generators has been replaced, leading to a substantial reduction in reactive power support capability and disturbance tolerance of the power grid [6,7,8]. When severe disturbances occur in AC/DC hybrid power systems, transient voltage instability or even voltage collapse may be triggered [9,10], posing serious threats to the secure and stable operation of power systems.
Traditional transient voltage stability assessment methods mainly include time-domain simulation approaches [11,12] and direct methods [13]. Time-domain simulation approaches consider detailed and complex system models and can provide accurate stability analysis results. However, they suffer from high computational burden and long computation time [14]. Direct methods perform quantitative transient stability analysis by constructing transient energy functions, but in complex power systems, it remains challenging to develop energy functions that can accurately characterize transient system dynamics [15,16]. With the rapid deployment of power system measurement devices, data-driven methods integrated with artificial intelligence technologies provide a new pathway for fast and accurate stability assessment by establishing mapping relationships between electrical variables and system stability states, without requiring extensive analysis of complex operating mechanisms [17,18].
Among traditional shallow machine learning methods, models such as support vector machines [19] and decision trees [20] have been applied to voltage stability level prediction due to their strong nonlinear mapping capabilities, revealing the relationships between key response electrical quantities and voltage stability states. However, shallow machine learning methods typically rely on manual feature extraction, and their prediction accuracy may be significantly affected when the data exhibit strong nonlinear dynamic coupling characteristics [21]. Considering the unique advantages of deep learning in automatic extraction of complex dynamic features, representative deep learning models such as convolutional neural networks (CNNs) [22,23] and long short-term memory networks (LSTMs) [24,25] have been applied to transient voltage stability assessment. By mining correlations among response data, these methods improve prediction performance. Nevertheless, with the continuous increase in the types and quantities of power system equipment, the dimensionality of features characterizing system dynamic processes has also grown. As a result, classical deep learning models face challenges regarding feature extraction effectiveness, making it difficult to guarantee prediction accuracy.
In addition, in practical applications, transient voltage instability samples obtained through time-domain simulations are scarce and difficult to acquire, resulting in significant sample imbalance in the datasets constructed for training data-driven models. This imbalance severely affects the accuracy of transient voltage stability assessment. Dataset augmentation using generative adversarial networks (GANs) can effectively alleviate the sample imbalance problem, particularly by enriching samples near the critical hyperplane between transient voltage stability and instability, thereby improving the accuracy of transient voltage stability discrimination. Current applications of GANs in power system data quality enhancement mainly focus on data augmentation and data generation [25], and several studies have applied GANs to transient voltage analysis. However, because sample generation in unsupervised GANs cannot be controlled, conditional GANs (CGANs) have been proposed to achieve feature distribution learning and targeted sample generation for transient voltage data [26,27,28,29,30]. Nevertheless, the augmented samples generated by existing CGAN-based methods cannot reliably ensure compliance with practical operational constraints of power systems. A comparison with existing GAN-based methods in power-system stability analysis is presented in Table 1.
In this article, an intelligent enhanced transient voltage stability assessment method for modern power systems is proposed, which incorporates sample balancing based on an improved CGAN. The main contributions are as follows.
- (1) An improved CGAN based on an enhanced feature-distance metric is proposed. Based on the standard adversarial loss with gradient penalty, an auxiliary feature-alignment term is introduced to preserve physically meaningful stability-related features while matching the data distribution.
- (2) An intelligent sample enhancement method for transient voltage stability is proposed based on the improved CGAN. The initial dataset is effectively complemented to ensure the predictive performance of intelligent models under extreme operating conditions.
- (3) A transient voltage stability assessment framework integrating a CNN and a transformer is proposed, enabling effective extraction of low-dimensional features and achieving accurate evaluation of transient voltage stability states.
The structure of this paper is outlined as follows. Section 2 presents the proposed sample balancing strategy based on the improved CGAN model. Section 3 develops an intelligent transient voltage stability assessment model that combines a CNN with a transformer architecture. Section 4 evaluates the effectiveness of the proposed approach through simulations on a representative power grid system. Finally, Section 5 concludes the paper and highlights potential avenues for future investigation.
2. Sample Balancing Method Based on an Improved CGAN
In power system voltage stability assessment, sample datasets are typically generated through offline simulations and used to train intelligent assessment models. However, during the simulation process, the occurrence probability of voltage instability states is much lower than that of stable states, resulting in a severe class imbalance in the dataset. If such an imbalanced dataset is directly used for model training, the assessment model tends to classify samples as stable, which significantly weakens its capability to identify instability scenarios.
To alleviate the sample imbalance problem, this paper introduces an improved CGAN to synthesize physically interpretable instability samples in the sample space. By increasing the quantity and diversity of instability samples, the proposed method enhances the generalization ability and discrimination performance of the assessment model.
2.1. Basic Principle of the Traditional GAN
GANs constitute a class of unsupervised learning models motivated by zero-sum game theory. GANs are composed of two competing components, namely a generator and a discriminator, which are trained simultaneously in an adversarial manner to enhance the realism of synthesized data. In this learning paradigm, the generator is optimized to produce samples that closely resemble the true data distribution, whereas the discriminator is trained to accurately differentiate between real observations and generated counterparts.
Through continuous adversarial interaction, the generator gradually enhances its sample generation capability and learns the underlying distribution characteristics of the original data. The loss function of the adversarial training process can be defined as follows:
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
where $\mathbb{E}$ denotes the expectation operator; $x$ represents samples drawn from the real data space; $z$ is a random noise vector sampled from a predefined prior distribution; the corresponding probability distributions of real data and latent noise are denoted by $p_{\mathrm{data}}(x)$ and $p_{z}(z)$; $G$ transforms the latent variable $z$ into synthetic samples $G(z)$. The discriminator $D$ outputs the probability that a given input originates from the real data distribution. Accordingly, $D(x)$ indicates the likelihood that a real sample is correctly identified, whereas $D(G(z))$ reflects the probability that a generated sample is misclassified as real. During adversarial training, the discriminator is optimized to enhance its ability to distinguish real data from generated samples by maximizing correct classification accuracy, while the generator is trained in opposition to increase the likelihood that synthesized samples deceive the discriminator.
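As a small numerical illustration of the value function above, the following sketch evaluates an empirical estimate of V(D, G) from discriminator outputs. The function name `gan_value` and the toy probabilities are hypothetical and serve only to show how the two expectation terms behave.

```python
import math

def gan_value(d_real, d_fake):
    """Empirical estimate of V(D, G) from discriminator outputs.

    d_real: values D(x) for real samples, each strictly between 0 and 1
    d_fake: values D(G(z)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot separate real from generated data outputs 0.5.
v_confused = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

# A confident and correct discriminator drives the value toward its maximum of 0.
v_sharp = gan_value([0.99, 0.98, 0.97], [0.01, 0.02, 0.03])
```

The discriminator seeks to raise this value, while the generator seeks to lower it by pushing D(G(z)) upward.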
2.2. Improved CGAN
In this paper, an improved CGAN is employed to generate specified samples by conditioning on transient voltage sample types, thereby addressing the limitation of conventional GAN models in generating samples of designated categories. By introducing additional conditional information into the inputs of both the generator and the discriminator, CGAN enables controlled sample generation. Compared with the traditional GAN model, the CGAN generator takes the condition y together with the noise vector z as its input, while the discriminator distinguishes between real samples with their corresponding conditions and generated samples with their associated conditions. The basic architecture of the CGAN is illustrated in Figure 1.
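The objective function of the CGAN is defined as follows:
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z \mid y) \mid y)\right)\right] \tag{2}$$
where $y$ denotes the transient voltage sample type.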
Assume that the power system measurement system has collected a total of $N$ groups of transient voltage samples, denoted as $\{x_i\}_{i=1}^{N}$, where $x_i$ represents the feature vector of the $i$-th transient voltage sample. In this study, the transient voltage sample type labels and the corresponding transient voltage time-series samples are respectively used as the conditional input and the real sample input of the CGAN for model training. When the theoretical Nash equilibrium is reached, the CGAN generator is able to learn the mapping relationship between the transient voltage type labels and the transient voltage time-series data, thereby enabling the generation of specified types of transient voltage samples. Moreover, the conventional CGAN adopts the Jensen-Shannon (JS) divergence as the optimization objective, which often leads to issues such as gradient vanishing and mode collapse during adversarial training. To address these problems, the Wasserstein distance is introduced in this paper to replace the JS divergence, and a gradient penalty term is incorporated to further enhance training stability. The definition of the Wasserstein distance is given in (3).
$$W(p_{\mathrm{data}}, p_{g}) = \sup_{\|f\|_{L} \le 1} \left( \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[f(x)\right] - \mathbb{E}_{\tilde{x} \sim p_{g}}\left[f(\tilde{x})\right] \right) \tag{3}$$
where $f$ denotes the mapping function of the discriminator, $p_{g}$ represents the probability distribution of the generated samples, $\|f\|_{L}$ denotes the Lipschitz norm of $f$, and $\|f\|_{L} \le 1$ indicates that the function $f$ satisfies the 1-Lipschitz continuity constraint.
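To give some intuition for the Wasserstein distance, the sketch below computes it for two one-dimensional empirical samples of equal size, where it reduces to the mean absolute difference between sorted values. This closed form is an illustration only: in the high-dimensional setting of this paper the distance is approximated by the discriminator, and the function name and toy voltage values are hypothetical.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # Optimal transport in one dimension pairs the sorted values in order.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Hypothetical per-unit voltage magnitudes.
real = [0.62, 0.70, 0.75, 0.81]
shifted = [v + 0.1 for v in real]   # same shape, moved by 0.1 p.u.

distance_to_shifted = wasserstein_1d(real, shifted)
distance_to_self = wasserstein_1d(real, real)
```

Unlike the JS divergence, this distance keeps growing smoothly as two non-overlapping distributions move apart, which is the property that provides useful gradients during adversarial training.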
Furthermore, the feature encoder F is introduced to impose an additional constraint in the feature space. Specifically, a feature-space distance regularization term is incorporated into the objective function, which is defined as follows:
where F denotes the stability-related feature extractor, which is implemented as a multilayer perceptron. The parameters of F are optimized jointly with the generator and discriminator during training.
Accordingly, the final optimization objective of the generator can be formulated as follows:
where α denotes the weighting coefficient associated with the feature-distance regularization term.
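As an illustration, one form consistent with the description above (a Wasserstein-type generator loss plus a feature-space distance weighted by α) could be written as follows. The squared Euclidean norm, the matching of mean features, and the symbol $L_{\mathrm{feat}}$ are assumptions made for this sketch and should not be read as the exact expressions used in this paper.

```latex
% Illustrative forms only; the norm, the mean-feature matching, and the
% symbol L_feat are assumptions, not expressions taken from the source.
\begin{align}
L_{\mathrm{feat}} &= \left\| \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[F(x)\big]
  - \mathbb{E}_{z \sim p_{z}}\big[F\big(G(z \mid y)\big)\big] \right\|_2^2 \\
L_{G} &= -\,\mathbb{E}_{z \sim p_{z}}\big[D\big(G(z \mid y) \mid y\big)\big]
  + \alpha\, L_{\mathrm{feat}}
\end{align}
```

Under this reading, a larger α places more emphasis on reproducing stability-related features, while a smaller α lets the adversarial term dominate.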
3. Intelligent Enhanced Analysis Model for Voltage Stability
3.1. Initial Sample Set Construction
3.1.1. Input Feature Selection
System voltage stability is strongly influenced by operating conditions such as overall load level, the share of induction motor loads, and the penetration of renewable energy resources. Under severe disturbances that trigger transient voltage instability in localized areas, the post-fault dynamic responses of critical system components provide valuable insight into the system’s voltage stability margin. To identify representative response features that effectively capture voltage stability characteristics, both dominant influencing factors and the practicality of real-time data acquisition are taken into account.
Accordingly, a set of key electrical variables, including voltage magnitude, active power, and reactive power, measured immediately after fault clearing are extracted from various system components, such as buses, conventional and renewable generators, loads, and AC transmission lines. These quantities collectively describe the system’s instantaneous post-disturbance state and serve as the input features for model construction. To ensure that operating conditions and disturbance information are adequately reflected, only the sampled values at the first post-fault time instant are employed in forming the dataset. This design facilitates reliable and timely voltage stability assessment while remaining compatible with online monitoring requirements.
3.1.2. Output Label Determination
In accordance with the voltage stability assessment requirements outlined in the relevant grid operation guidelines, transient voltage stability is evaluated based on post-disturbance voltage recovery performance. Specifically, following a severe contingency, the voltages at load buses are required to restore to no less than 0.8 p.u. within 10 s. In systems with high renewable penetration, renewable generation units are additionally expected to maintain grid connection and avoid repeated activation of low-voltage ride-through operation during the recovery process.
Based on the above criterion, the voltage stability of each sample is evaluated. Samples that satisfy the criterion are labeled as 1 (stable), and those that violate the criterion are labeled as 0 (unstable). Together with the selected key electrical quantities, these labels are used to construct the voltage stability sample dataset, which provides the training and testing sets for the proposed voltage stability assessment model.
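The labeling rule can be sketched as follows. This is a minimal illustration under one assumed reading of the criterion, namely that every load-bus voltage must be at or above 0.8 p.u. from 10 s onward; the additional requirement on renewable units (no repeated low-voltage ride-through activation) is not modeled, and the function name and example traces are hypothetical.

```python
def label_sample(times, bus_voltages, threshold=0.8, deadline=10.0):
    """Return 1 (stable) or 0 (unstable) for one simulated scenario.

    times        : time stamps in seconds after the disturbance
    bus_voltages : one voltage trace (p.u.) per load bus, aligned with times
    """
    for trace in bus_voltages:
        # Voltage values at and after the recovery deadline.
        late = [v for t, v in zip(times, trace) if t >= deadline]
        if not late:
            raise ValueError("trace ends before the recovery deadline")
        # Assumed reading: after the deadline the voltage must not be
        # below the threshold at any recorded instant.
        if min(late) < threshold:
            return 0
    return 1

times = [0.0, 5.0, 10.0, 12.0]
recovered = [0.30, 0.70, 0.85, 0.95]   # hypothetical recovering bus
collapsed = [0.30, 0.50, 0.60, 0.55]   # hypothetical collapsing bus
```

A scenario is labeled stable only if all monitored load buses recover, so a single collapsing bus is enough to produce label 0.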
3.2. Intelligent Enhanced Analysis Model
By integrating the local receptive-field advantage of the CNN with the global attention mechanism of the transformer, a CNN–transformer hybrid network is developed for intelligent voltage stability assessment, as illustrated in Figure 2. The proposed model effectively extracts low-dimensional features that characterize power-system voltage stability and enables accurate evaluation of the system voltage stability state.
3.2.1. Input Layer
The purpose of the input layer is to receive multidimensional electrical features from the power system. The electrical quantities listed in
Table 1 are selected as the input features of the proposed model.
3.2.2. CNN Layer
The CNN is mainly employed to extract local features from the input electrical feature vectors and capture local patterns related to voltage stability. In the proposed model, the CNN encoder consists of one-dimensional convolution layers (1D Convolution), ReLU activation functions, and max pooling layers. The CNN layer includes the following components:
Convolution operation: Convolution kernels slide over the input feature vectors to learn local features, such as local voltage fluctuations and power variations. This operation helps capture short-term dynamic patterns in the power system.
ReLU activation function: Nonlinear activation is introduced to enable the model to learn more complex and abstract feature representations, thereby enhancing the expressive capability of the network.
Max pooling: Pooling is applied to the convolutional outputs to reduce feature dimensionality and compress information while retaining the most representative features, which improves computational efficiency.
Through these operations, the CNN layer extracts local spatial features from the electrical inputs, such as voltage and power fluctuation characteristics, and provides informative feature sequences for the subsequent Transformer encoder.
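The three operations above can be sketched in plain Python for a single kernel. This is a minimal illustration rather than the trained encoder: the input values and the hand-picked kernel are hypothetical, whereas in the proposed model the kernels are learned.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation of a feature vector with one kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

features = [1.0, 0.9, 0.4, 0.5, 0.95, 1.0]   # hypothetical per-unit inputs
drop_kernel = [1.0, 0.0, -1.0]               # hypothetical kernel that responds to local drops

out = max_pool1d(relu(conv1d(features, drop_kernel)))
```

Here the kernel responds positively where the value falls across a three-element window, the activation discards negative responses, and pooling keeps the strongest response in each pair.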
3.2.3. Transformer Layer
Although the CNN effectively captures local structural features, its ability to perceive relationships among distant nodes is limited. The transformer explicitly models such global dependencies through the self-attention mechanism. The basic structure of the transformer is illustrated in
Figure 3.
Positional encoding: To embed ordering information into the input sequence, positional encodings are injected at the embedding stage. This design compensates for the permutation-invariant nature of self-attention and allows the encoder to exploit relative positional patterns among input features.
Multi-head self-attention: Multi-head self-attention constitutes the key operation of the transformer encoder. By executing h attention heads in parallel, the model learns multiple parameterized representation subspaces, thereby enabling concurrent emphasis on diverse feature dependencies. The attention output is obtained as a weighted aggregation over all feature tokens. Unlike convolutional or recurrent structures, self-attention directly models interactions between any pair of positions through dot-product similarity, independent of spatial separation, which improves global feature coupling and representation capacity.
Feed-forward network: The position-wise feed-forward network comprises two fully connected transformations with a nonlinear activation in the hidden layer. This module refines the attended representations and enhances modeling expressiveness through nonlinear feature mapping.
Layer normalization and residual connections: Residual connections and layer normalization are applied after both the multi-head attention and feed-forward blocks. Layer normalization standardizes activations within each layer to improve optimization stability and facilitate gradient-based learning, while residual pathways promote information preservation and efficient gradient propagation, alleviating degradation as network depth increases [32,33].
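The core of the self-attention operation described above can be sketched for a single head. As a simplification, the queries, keys, and values are taken to be the input tokens themselves; the learned projection matrices, the h parallel heads, and the positional encoding are omitted, and all names and values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Dot-product similarity between this token and every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted aggregation over all tokens, independent of their distance.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical feature tokens
attended = self_attention(tokens)
```

Each output row is a convex combination of all input tokens, which is how information from distant positions enters every feature representation.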
3.2.4. Global Representation Layer
The transformer encoder outputs a sequence of features that represent local bus or line states while incorporating global contextual information. To perform the final classification, the sequential representations must be transformed into a single vector that characterizes the overall operating state of the power system. In this paper, global average pooling is employed to obtain a stable and compact global representation.
3.2.5. Output Layer
The output layer consists of a set of fully connected layers, which map the high-dimensional global representation to a probabilistic prediction of voltage stability. A typical structure includes one or more nonlinear layers, dropout regularization to prevent overfitting, and a Sigmoid activation function to produce the final stability probability.
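The last two stages can be sketched as follows: global average pooling collapses the encoder output sequence into one vector, and a sigmoid maps a linear score to a stability probability. A single linear layer stands in for the fully connected stack, dropout is omitted, and the weights and inputs are hypothetical.

```python
import math

def global_average_pool(sequence):
    """Average a sequence of equal-length feature vectors into one vector."""
    n = len(sequence)
    return [sum(tok[j] for tok in sequence) / n for j in range(len(sequence[0]))]

def stability_probability(pooled, weights, bias):
    """Linear layer followed by a sigmoid, giving a value between 0 and 1."""
    logit = sum(w * v for w, v in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

encoder_output = [[1.0, 2.0], [3.0, 4.0]]          # hypothetical token features
pooled = global_average_pool(encoder_output)       # [2.0, 3.0]
prob = stability_probability(pooled, [0.5, -0.2], 0.1)
```

With labels defined as 1 for stable and 0 for unstable, a probability above a chosen threshold such as 0.5 would be read as a stable prediction; the threshold itself is an assumption here.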
3.3. Evaluation Metrics of the Prediction Model
In transient voltage stability assessment of power systems, misclassifying an unstable scenario as a stable one may lead to more severe cascading effects and large-scale system accidents than the opposite case. Therefore, in addition to evaluating the overall prediction accuracy, this paper places greater emphasis on the model’s capability to identify unstable scenarios. Accordingly, a confusion matrix for voltage stability assessment is defined, as shown in
Table 2.
Based on the above confusion matrix, four evaluation metrics are defined: accuracy A, misclassification rate P, missed detection rate R, and the F1-score.
In the above metrics, A denotes the overall classification accuracy of the proposed model. P quantifies the proportion of stable operating conditions that are incorrectly identified as unstable, while R measures the likelihood of unstable cases being erroneously classified as stable. The F1-score serves as an integrated performance indicator, reflecting the model’s effectiveness in recognizing unstable operating states by jointly considering classification precision and recall.
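A minimal sketch of how these four metrics could be computed from confusion-matrix counts is given below. The formulas follow the verbal definitions above; treating the unstable class as the positive class for the F1-score is an assumption, and the function name and counts are hypothetical.

```python
def evaluate(n_ss, n_su, n_us, n_uu):
    """Compute A, P, R, and F1 from confusion-matrix counts.

    n_xy is the number of samples whose true class is x and predicted class
    is y, with s = stable and u = unstable.
    """
    total = n_ss + n_su + n_us + n_uu
    accuracy = (n_ss + n_uu) / total
    # P: stable cases incorrectly identified as unstable.
    misclassification = n_su / (n_ss + n_su)
    # R: unstable cases erroneously classified as stable.
    missed_detection = n_us / (n_us + n_uu)
    # F1 with the unstable class taken as positive (assumption).
    precision = n_uu / (n_uu + n_su)
    recall = n_uu / (n_uu + n_us)
    f1 = 2 * precision * recall / (precision + recall)
    return {"A": accuracy, "P": misclassification, "R": missed_detection, "F1": f1}

metrics = evaluate(n_ss=90, n_su=10, n_us=5, n_uu=45)   # hypothetical counts
```

With these counts, A is 0.9 and both P and R are 0.1, which shows that the two error rates are normalized by different class totals and can diverge strongly on an imbalanced dataset.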
3.4. Overall Evaluation Procedure
The overall transient voltage stability assessment procedure of the power system considering data augmentation is illustrated in
Figure 4, which consists of three main stages: scenario generation, model training, and online application.
During the scenario generation stage, basic scenarios are obtained based on historical operating conditions or time-domain simulation data, taking into account typical operating modes and fault types. Subsequently, input features and the corresponding voltage stability states of each scenario are determined through data processing, forming an initial voltage stability sample set. Since unstable voltage scenarios usually account for a relatively small proportion of the initial dataset, the proposed improved CGAN is employed to directionally generate voltage-unstable scenarios, thereby augmenting the initial sample set and improving the sample distribution characteristics.
In the model training stage, the model structure and parameters are determined by minimizing the loss function, and performance metrics are used to evaluate the overall prediction effectiveness of the model. During the online application stage, when the actual power system is subjected to disturbances, key electrical quantities after fault clearance are acquired and input into the well-trained offline prediction model to output the transient voltage stability assessment results of the power grid.
4. Case Study
In this section, the voltage collapse engineering test system released by the China Electric Power Research Institute (CEPRI-VC) is adopted as the simulation system for analysis. The system structure is shown in
Figure 5. The test system represents a DC receiving-end power system integrating wind farms and photovoltaic power stations. When a three-phase permanent N-2 fault occurs on the AC transmission corridor, the system may experience voltage instability.
The test system is based on a 500 kV network structure and consists of 100 buses with a base capacity of 100 MW. The installed capacities of renewable energy units and conventional generation units are 2400 MW and 6300 MW, respectively. In addition, the system includes one HVDC transmission line with a rated receiving power of 800 MW.
4.1. Sample Set Construction
To fully capture the variations in system response characteristics under different operating scenarios, factors such as operating modes and fault conditions are comprehensively considered to generate a transient voltage stability sample dataset through time-domain simulations. The input feature dimension of each sample is 251.
Under different load levels, multiple basic operating conditions are generated by adjusting the output proportions of conventional and renewable energy units, the transmitted power of the DC system, and the proportion of induction motor loads. The variation ranges of the relevant variables are listed in
Table 3. Based on these operating conditions, various fault scenarios are simulated. Specifically, three-phase permanent N-2 faults are imposed on four 500 kV transmission lines with double circuits, with fault locations set at 2%, 50%, and 98% of the AC line length. The fault clearing times are set to 0.1, 0.15, 0.2, 0.25, and 0.3 s after fault occurrence. In total, 6480 samples are generated, which are divided into training and testing datasets at a ratio of 8:2.
Each sample consists of 20 electrical features, and the output label is binary (0/1), where 0 denotes an unstable operating condition and 1 denotes a stable condition. Among all samples, only 648 belong to the unstable class, while the remaining 5832 samples are stable, leading to a highly imbalanced dataset where stable samples dominate.
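The sample counts stated above can be cross-checked with a short calculation. The number of operating conditions is not stated explicitly; the value derived below assumes that every operating condition is simulated under every fault scenario.

```python
# Fault scenarios: 4 double-circuit lines x 3 fault locations x 5 clearing times.
lines, locations, clearing_times = 4, 3, 5
fault_scenarios = lines * locations * clearing_times            # 60

total_samples = 6480
# Implied number of operating conditions (assumes a full factorial design).
operating_conditions, remainder = divmod(total_samples, fault_scenarios)

unstable = 648
stable = total_samples - unstable                               # 5832
unstable_share = unstable / total_samples                       # 0.1
imbalance_ratio = stable // unstable                            # 9 stable per unstable

# 8:2 split into training and testing sets.
train_size = total_samples * 8 // 10                            # 5184
test_size = total_samples - train_size                          # 1296
```

The 9:1 ratio between stable and unstable samples is the imbalance that the augmentation in the next subsection is intended to reduce.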
4.2. Generation of Additional Training Samples
The transient voltage stability samples described in
Section 4.1 exhibit class imbalance, in which stable samples significantly outnumber unstable samples. Through the adversarial training process between the generator and discriminator in the CGAN model, the generator is able to learn the mapping relationship between the original time-series sample data and their corresponding sample labels after reaching a Nash equilibrium. Consequently, transient voltage samples of specified classes can be generated to balance the transient voltage stability dataset, thereby improving the prediction accuracy of the transient voltage early warning model. The network architectures of the proposed improved CGAN model are presented in Table 4 and Table 5.
To compare the sample generation performance of the proposed improved CGAN and the conventional GAN, the following experiments are conducted.
- (1) Conventional GAN: Only the 648 real unstable samples are used to train an unconditional GAN. After training, 2000 unstable samples are generated by sampling from the noise space.
- (2) Proposed CGAN: All 6480 samples are used for training, where the stable/unstable labels are encoded as conditional vectors and fed into both the generator and discriminator. After training, the condition is fixed as “unstable,” and 2000 unstable samples are generated under the same noise dimension and generation scale as the conventional GAN.
To visually illustrate the differences in unstable scenario generation between the conventional GAN and the proposed CGAN, the following four types of samples are combined and projected onto a two-dimensional space using the t-distributed stochastic neighbor embedding (t-SNE) algorithm:
- (1) A subset of real stable samples (600 samples randomly selected from 5832);
- (2) All 648 real unstable samples;
- (3) 2000 unstable samples generated by the conventional GAN (600 randomly selected for visualization);
- (4) 2000 unstable samples generated by the proposed CGAN (600 randomly selected for visualization).
The visualization results of the conventional GAN and the proposed CGAN are shown in
Figure 6 and
Figure 7, respectively. It can be observed that the unstable samples generated by the conventional GAN are relatively scattered in the embedding space. Some samples deviate significantly from the real unstable cluster and even spread toward the stable sample cluster, indicating that the conventional GAN fails to accurately capture the distribution of unstable scenarios. In contrast, the unstable samples generated by the proposed CGAN are mainly distributed around the real unstable sample cluster, with overall shape and density more consistent with the real unstable region. The overlap with the stable sample cluster is significantly reduced, demonstrating that the conditional constraints effectively guide the generator to learn a more accurate distribution of unstable scenarios. Therefore, compared with the conventional GAN, the proposed CGAN-based unstable sample augmentation method better preserves the statistical characteristics and discriminative boundaries of unstable operating conditions in the feature space, making it more suitable for training subsequent transient voltage stability assessment models.
To more convincingly demonstrate that the generated unstable samples are physically meaningful, the distribution similarity achieved by a traditional GAN and by the proposed enhanced generation method is quantified using two complementary statistical metrics: maximum mean discrepancy (MMD) [34] and the Wasserstein distance [27]. For both metrics, smaller values indicate higher similarity. The results in
Table 6 show that the proposed method significantly reduces both MMD and Wasserstein distance compared with the traditional GAN, indicating substantially improved alignment between generated and real unstable data distributions. The consistent reduction across both global and geometric metrics demonstrates that the proposed approach not only improves statistical fidelity in high-dimensional space but also enhances structural consistency in feature geometry. These quantitative findings complement the t-SNE visualizations and provide stronger statistical evidence that the generated unstable samples are distribution-consistent and physically meaningful rather than artifacts of adversarial training.
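For reference, the MMD between a real and a generated sample set can be estimated as sketched below. The Gaussian kernel, its bandwidth `sigma`, the biased estimator, and the toy two-dimensional points are assumptions for illustration; the kernel settings behind the reported values are not specified here.

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(a, b):
        return sum(rbf(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]     # hypothetical real unstable samples
close = [[0.05, 0.05], [0.1, 0.1], [0.0, 0.05]] # hypothetical well-matched generated set
far = [[2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]      # hypothetical poorly matched generated set

mmd_close = mmd2(real, close)
mmd_far = mmd2(real, far)
```

A generated set that lies close to the real samples yields a smaller value than one that lies far away, which is the sense in which smaller values indicate higher similarity.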
Furthermore, the contribution of the proposed CGAN-based augmentation is demonstrated by comparing the CNN–transformer trained on the original imbalanced dataset with the same model trained on the augmented dataset. The model performance is presented in
Table 7. The comparison clearly demonstrates that the performance improvement primarily originates from the proposed CGAN-based augmentation rather than the CNN–transformer architecture itself. When trained on the original imbalanced dataset, the model exhibits noticeably higher misclassification and missed detection rates, particularly for unstable operating conditions. After incorporating CGAN-generated unstable samples, overall accuracy increases significantly, while both the misclassification rate (P) and missed detection rate (R) decrease sharply, indicating a substantial enhancement in minority-class recognition capability. The considerable improvement in F1-score further confirms that the classifier achieves a more balanced and reliable decision boundary. These results verify that the CGAN-based augmentation effectively alleviates class imbalance and plays a decisive role in improving instability identification performance.
4.3. Prediction Performance of the Intelligent Data Augmentation Model
The CNN–transformer model is trained using the augmented samples. The variations in prediction accuracy and loss function with respect to the number of training iterations are illustrated in
Figure 8. It can be observed that, as the number of iterations increases, the loss function gradually decreases, while the prediction accuracy continuously improves. Eventually, a well-trained CNN–transformer model is obtained. Specifically, the intelligent model consists of 3 convolutional layers with kernel sizes of 3 × 3 and increasing filter counts (32, 64, 128), followed by 6 transformer layers with 8 attention heads and an attention dimension of 512. The model uses the Adam optimizer with a learning rate of 0.0001, batch size of 64, and is trained for 50 epochs with early stopping based on validation loss.
Moreover, t-SNE is adopted as an effective nonlinear dimensionality reduction technique for the visualization of high-dimensional feature representations. The method maps pairwise similarities among samples from the original feature space to a low-dimensional manifold by constructing corresponding probability distributions and optimizing their divergence via gradient-based learning. This procedure preserves local neighborhood relationships of the original data to the greatest extent during dimensionality reduction. Accordingly, t-SNE is applied to project the feature outputs of different layers in the proposed CNN–transformer model into a two-dimensional space during the training process. The resulting visualization outcomes are illustrated in
Figure 9.
It can be observed that, as the depth of the prediction network increases, samples from different classes gradually exhibit clearer clustering characteristics in the feature space. This phenomenon indicates that the CNN–transformer module is capable of extracting more discriminative and representative features. Moreover, the learned feature embeddings contribute to improving the generalization capability of the model, enabling effective prediction under complex and diverse operating conditions, and thereby ensuring accurate assessment of power grid voltage stability.
Furthermore, the classification prediction results of the CNN–transformer model are compared with those of decision tree (DT), CNN, and LSTM neural network models. The mean and standard deviation of the evaluation metrics are presented in Table 8.
Based on the evaluation results of the voltage stability assessment, it can be observed that the proposed model achieves an overall prediction accuracy of 99.34%, which is higher than that of the other models, and its prediction results are closer to the ground truth. In addition, the proposed model exhibits a strong capability in identifying unstable samples, with both the misclassification rate and the missed detection rate being lower than those of the other networks. In particular, the missed detection rate of unstable samples is significantly reduced.
In summary, the CNN–transformer model demonstrates superior overall performance and is able to accurately determine the system stability level under complex operating conditions, thereby providing reliable support and guidance for subsequent secure and stable control of the power system.
4.4. Anti-Interference Performance Analysis of the Intelligent Data Augmentation Model
Noise is inevitable in the system measurement process. To evaluate the performance of the proposed method under noisy conditions, Gaussian white noise with signal-to-noise ratio (SNR) of 20 dB, 25 dB, and 30 dB is added to the original noise-free dataset. Multiple training and testing experiments are conducted on each data-driven model, and the average results of 20 Monte Carlo simulations are adopted for comparison, as shown in
Table 9.
As seen from Table 9, under noise conditions with SNRs of 20 dB, 25 dB, and 30 dB, the proposed model consistently outperforms the other models in terms of classification performance, demonstrating strong robustness to measurement noise.
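The noise injection used in this test can be sketched as follows. This is a minimal illustration that assumes the noise power is set from the mean power of each signal; whether the noise is scaled per feature or over the whole dataset is not specified here, and the function names are hypothetical.

```python
import math
import random

def add_white_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian white noise at a target SNR given in dB."""
    if rng is None:
        rng = random.Random()
    signal_power = sum(v * v for v in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy signal relative to its clean version."""
    signal_power = sum(v * v for v in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)               # fixed seed for repeatability
clean = [1.0] * 20000                # hypothetical flat 1.0 p.u. signal
noisy_20db = add_white_noise(clean, 20, rng)
```

At 20 dB the noise standard deviation is 10% of the signal RMS value, and at 30 dB it is about 3.2%, so the 20 dB case is the most demanding of the three settings.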