1. Introduction
With the large-scale integration of renewable energy, the traditional power grid is gradually evolving into a new power system characterized by a more complex structure, larger scale, and more stochastic operation. The stochastic and fluctuating nature of renewable energy output poses heightened challenges to the stability, security, economic efficiency, and low-carbon operation of power systems. Meanwhile, the interactive randomness of generation and load, along with the integration of diverse energy storage technologies, has led to strong system coupling, multi-timescale dynamics, and high operational variability. Under these conditions, traditional power system models are increasingly inadequate to meet the analytical and computational needs of new power systems. To address this, this paper proposes a cloud-edge-terminal collaborative modeling framework based on AI large models.
After the integration of renewable energy, power systems exhibit much stronger uncertainty and nonlinearity, and the scale of grid nodes, computational burden, and overall complexity increase sharply. Modeling approaches that rely on priors, forecasts, or rule-based formulations provide only limited accuracy [1,2], and traditional methods based on physical models and linear assumptions are becoming increasingly inadequate. References [3,4] develop modeling and analysis methods for power system planning and simulation under the random characteristics of renewable generation and loads, using multi-source heterogeneous data after the integration of renewables. The authors of ref. [5] investigate a multi-agent collaborative optimization planning model for power systems with a high share of renewable energy, while ref. [6] considers modeling of multi-scale fault-diagnosis features under a unified time series framework. Refs. [3,7] analyze typical modeling issues in new power system operation and control, such as the stochastic behavior of distributed renewables and the modeling of power-electronic interfaces.
In modeling typical problems of new power systems, AI techniques have been widely applied to power flow calculation, fault diagnosis, and voltage/frequency stability analysis. Owing to their advantages in handling complex topologies, graph neural networks, combined with deep neural networks (DNNs), have been used for dynamic equivalent modeling of microgrids with high penetration of renewable energy for frequency stability studies, achieving significant speed-ups while maintaining high accuracy [8]. Reference [9] studies RMS modeling and control of a grid-forming E-STATCOM in isolated power systems, enabling prediction and assisted assessment of system stable operation. A symmetry-preserving dynamic equivalent modeling method for large power systems based on transfer learning is proposed in [10], allowing fast identification of dynamic operating conditions. Refs. [11,12] investigate XGBoost ensemble learning models and modular modeling schemes, improving both accuracy and convenience when analyzing complex systems in a multi-model framework.
From a system-architecture perspective, AI is driving a shift in power system modeling from traditional approaches toward a hybrid “data-driven + physics-guided” paradigm. Considering the randomness of renewable and load data in power system modeling, refs. [13,14] study data-driven modeling methods that use time-series segmentation. In line with the computational characteristics of power system models, the authors of [15] develop a carbon-emission AI modeling framework for new power systems with large-scale renewable integration, highlighting the joint features of high-performance computing and AI in power system modeling. Ref. [16] investigates knowledge distillation in neural networks, where compression techniques are used to distill the knowledge of an ensemble into a single model, and a new type of ensemble is introduced to distinguish fine-grained classes that are difficult for full models to separate. Ref. [17] proposes a sample-efficient OPF learning method based on annealing knowledge distillation, which integrates decoupled tasks and improves accuracy in small-data regimes. Ref. [18] studies the selection of eigenvalues for parallel analysis of large-scale power system models, and ref. [19] develops a GRU–attention-based ultra-short-term load forecasting model for large power systems, emphasizing feature extraction from load data. By extracting attention-based features that capture the variation in renewable generation, ref. [20] proposes a Seq2Seq Transformer–based optimization method for regulating resource capacity allocation in power grids with high penetration of renewable energy.
However, conventional approaches typically construct independent models for typical problems such as power flow calculation, transient stability, and short-circuit analysis, lacking global consistency and the ability for coordinated evolution. In essence, the power system is a high-dimensional, nonlinear, strongly coupled system, in which a disturbance at any node may trigger dynamic changes in the global state. This global cascading effect is reflected not only in the coupling relationships between major nodes such as generation and load, but also in the interactions among multidimensional variables such as voltage, frequency, and power flow. Therefore, given this global nature and strong coupling, it is necessary to establish a system model with a global perspective and dynamic response capability; however, such a model is extremely large-scale and cannot meet the fast-response requirements of power system operation.
To address how to construct a model structure that both captures the globally coupled characteristics of the power system and enables rapid response, this paper proposes a large-model system based on a cloud-edge-terminal collaborative architecture. The cloud focuses on global object modeling and learning of evolutionary trends, the edge focuses on modeling typical problems, and the terminals focus on lightweight inference models for fast local response. Altogether, this forms an intelligent modeling framework that provides global coverage while responding to local conditions, thereby meeting the multi-level requirements of new power system operation.
2. Organizational Structure of Large Models
In new power systems, factors such as a high penetration of renewable energy, demand-side interaction, and deep coupling among generation, load, and storage jointly drive system analysis toward higher complexity and autonomous intelligence. The power system is a highly coupled whole, and comprehensive, accurate computation requires taking all objects into account and constructing an integrated model with global perception capability. However, such a global model is extremely large in scale, stochastic and dynamic, and highly complex. Typical power system problems such as power flow and stability need to be modeled on the basis of global awareness and are also closely related to variations at local terminals. Accurate analysis of the local grid likewise needs to be built upon the global model, but this makes the computational workload too heavy to satisfy the fast-response requirements of local services. As shown in Figure 1, a cloud-edge-terminal collaborative architecture is an effective way to tackle this complexity challenge. Its essence lies in functionally layering the processes of modeling, training, inference, and optimization of large power system models and deploying them separately on the cloud, edge, and terminal, which then collaborate to complete the tasks of power system modeling, analysis, control, and optimization [20].
At the cloud level in Figure 1, unified object models are constructed for generation, network, load, and storage. Based on large-scale historical operation data of the grid, a family of large power system models is trained for global application and shared across different regions and tasks. The cloud not only provides model parameters, structures, and global feature information, but also performs model calibration and synchronized updates, thereby offering the edge and terminal layers the necessary modeling basis and data support.
The edge model in Figure 1 focuses on modeling and analysis of typical power system problems, such as optimal power flow, stability assessment, and reactive power optimization. By extracting relevant object models and topology information from the cloud, the edge layer couples system states, control objectives, and constraint conditions to form problem-oriented online multi-scale computational models, enabling real-time modeling of typical problems under globally aware online monitoring.
The terminal model in Figure 1 is deployed in local distribution networks, microgrids, and park-level grids to enable high-frequency, low-latency intelligent control responses. Due to resource constraints at the terminal level, models are distilled and pruned by the edge layer to generate lightweight inference models. These retain essential feature parameters and serve tasks such as local grid operation optimization and autonomous edge control.
Collaboration among the three layers is achieved through standardized model and data interfaces and conversion protocols, ultimately building a large power system model framework that integrates global perception, problem-oriented analysis, and local autonomy, thereby comprehensively enhancing the intelligent analysis and regulation capabilities of the power system.
3. Cloud Model
As the central hub of the large-model framework, the cloud model not only serves as the “central brain” for global cognition of the power system, but also acts as the “knowledge source” that links multi-region and cross-layer tasks of the grid. The behaviors of various entities—generation, network, load, and storage—are highly nonlinear, strongly spatiotemporally correlated, and subject to numerous dynamic constraints. Through cross-period learning and regionally integrated modeling, the cloud model extracts key operating features from a global perspective and abstracts a unified modeling framework for different types of devices (such as wind power, photovoltaics, conventional generating units, the grid, energy storage, and flexible loads).
At the same time, based on grid topology and dynamic coupling relationships, graph neural network (GNN) structures are constructed among different types of entities to achieve the fusion of physical correlations and information flows. The cloud layer thus aggregates a comprehensive system that includes standardized object modeling, graph-based representation of grid structures, unified model input–output interfaces, task-driven collaborative training, local model distillation, and model update mechanisms.
3.1. Object Models
Typical power system objects include various power sources, grid components, loads, and energy storage systems. Taking photovoltaic (PV) systems as an example, the cloud model integrates physical mechanisms with AI-based modeling. Parameters such as conversion efficiency, temperature derating coefficient, model weights, and output bias need to be periodically trained and dynamically updated based on operational data. Through techniques such as sliding-window training, residual compensation, federated learning, and multi-model management, the system builds an intelligent, sustainable, and adaptive PV modeling framework.
The conventional empirical formula for PV output can be expressed as:
$$P_{\mathrm{PV}} = \eta \, A \, G_t \left[ 1 - \gamma \left( T_c - T_{\mathrm{ref}} \right) \right] \qquad (1)$$
where:
$P_{\mathrm{PV}}$: Actual output power (W)
$\eta$: Module efficiency (affected by aging, cleanliness)
$A$: Total module area (m²)
$G_t$: Solar irradiance (W/m²)
$T_c$: Real-time module temperature (°C)
$T_{\mathrm{ref}}$: Reference module temperature (°C)
$\gamma$: Temperature derating coefficient (typically around 0.003–0.005/°C)
This model has a simple structure and strong physical interpretability, but it responds slowly to short-term weather changes and environmental impacts, and lacks predictive capability. The module efficiency needs to be updated quarterly, as it is affected by factors such as PV panel aging and dust accumulation. The temperature coefficient γ requires annual recalibration.
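As a minimal sketch of the empirical model, the snippet below evaluates the common form P = η·A·G_t·[1 − γ(T_c − T_ref)] using the symbols defined above. The function name `pv_power`, the default reference temperature of 25 °C, and the example array data are illustrative assumptions, not values taken from this paper.

```python
def pv_power(eta, area_m2, irradiance_w_m2, t_cell_c, t_ref_c=25.0, gamma=0.004):
    """Empirical PV output in watts (assumed form: eta * A * G_t * [1 - gamma * (T_c - T_ref)])."""
    derating = 1.0 - gamma * (t_cell_c - t_ref_c)  # linear temperature derating
    return eta * area_m2 * irradiance_w_m2 * derating

# Hypothetical array: 18 % efficiency, 100 m2, 800 W/m2 irradiance, module at 45 °C
p = pv_power(eta=0.18, area_m2=100.0, irradiance_w_m2=800.0, t_cell_c=45.0)
print(round(p, 1))  # about 13248 W (14400 W derated by 8 %)
```

Because η and γ enter only as multiplicative coefficients, the periodic recalibration described above amounts to refitting two scalars against measured output.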
To address these limitations, multivariate regression or time series forecasting models can be built using operational data:
$$\hat{P}_{t+1:t+h} = f_{\theta}\left( G, T, H, \ldots \right) \qquad (2)$$
where the inputs include recent irradiance G, temperature T, humidity H, timestamps, location, etc. The output is the PV power output in the next h steps (rolling prediction). Model architectures such as LSTM, GRU, TFT, Conv1D+LSTM, and XGBoost can be used. These models capture temporal patterns, adapt to different regional characteristics, and offer high prediction accuracy, though their input features must be periodically updated.
(1) Sliding Window Incremental Training
Use data from the most recent few days or one week for mini-batch training to achieve short-term adaptive correction and update model weights:
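$$\theta_{t+1} = \theta_t - \eta_{\mathrm{lr}} \, \nabla_{\theta} L\left( \theta_t \right) \qquad (3)$$
where:
$\eta_{\mathrm{lr}}$: Learning rate, controlling the step size of each update
$\nabla_{\theta} L(\theta_t)$: Gradient of the loss function with respect to the parameters $\theta$, evaluated on the current sample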
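The sliding-window update can be sketched as plain stochastic gradient descent restricted to the most recent samples. The example below is a toy under stated assumptions: a one-feature linear model (normalized irradiance to normalized power), a squared-error loss, and a hypothetical window of four samples. The names `sgd_step` and `retrain_on_window` are illustrative.

```python
from collections import deque

def sgd_step(params, x, y, lr):
    """One gradient step for y ≈ w*x + b with squared-error loss on a single sample."""
    w, b = params
    err = (w * x + b) - y                      # prediction error on the current sample
    return (w - lr * 2.0 * err * x, b - lr * 2.0 * err)

def retrain_on_window(params, window, lr=0.1, epochs=2000):
    """Repeatedly sweep the samples currently held in the sliding window."""
    for _ in range(epochs):
        for x, y in window:
            params = sgd_step(params, x, y, lr)
    return params

window = deque(maxlen=4)                       # only the 4 most recent samples are kept
stream = [(0.2, 0.10), (0.6, 0.30), (0.8, 0.40),                     # older regime: y = 0.50 * x
          (1.0, 0.45), (0.5, 0.225), (0.9, 0.405), (0.3, 0.135)]     # recent regime: y = 0.45 * x
for sample in stream:
    window.append(sample)                      # older samples fall out automatically

w, b = retrain_on_window((0.5, 0.0), window)   # start from the previously trained weights
print(round(w, 3), round(b, 3))
```

Starting from the old weights rather than from scratch is what makes the correction incremental: the model drifts toward the recent regime (for example, after soiling reduces output) without a full retraining pass.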
(2) Residual-Based Adaptive Correction Model
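An auxiliary model is constructed to fit the prediction error ε(t), which is then used to compensate for the output of the base model:
$$P_{\mathrm{final}}(t) = P_{\mathrm{base}}(t) + f_{\mathrm{res}}(t) \qquad (4)$$
$P_{\mathrm{final}}(t)$: Final corrected PV output
$P_{\mathrm{base}}(t)$: PV output predicted by the base physical or empirical model
$\varepsilon(t)$: Prediction error between the forecast and actual output, taken here as actual output minus $P_{\mathrm{base}}(t)$
$f_{\mathrm{res}}(\cdot)$: Residual model that fits the error term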
Equation (4) is particularly suitable for handling sudden weather changes or model underperformance scenarios.
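A minimal sketch of residual compensation follows, assuming the error is defined as actual output minus base prediction and that a single explanatory feature (here a hypothetical cloud-cover index) is available. The least-squares line stands in for whatever residual model is actually used; the data are made up for illustration.

```python
def fit_residual_model(features, errors):
    """Fit errors ≈ a * feature + b by ordinary least squares (stand-in residual model)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_e = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(features, errors))
    a = sxe / sxx
    b = mean_e - a * mean_x
    return lambda x: a * x + b

# Hypothetical history: base-model forecasts, measured output, cloud-cover index
base = [100.0, 200.0, 300.0, 400.0]
actual = [90.0, 180.0, 270.0, 360.0]
cloud = [0.1, 0.2, 0.3, 0.4]

errors = [p_act - p_base for p_act, p_base in zip(actual, base)]  # epsilon(t)
residual = fit_residual_model(cloud, errors)

# Correct a new base forecast of 500 W at cloud index 0.5
p_final = 500.0 + residual(0.5)
print(round(p_final, 1))  # about 450.0
```

The base model is left untouched, so the physical interpretability of the empirical formula is preserved while the auxiliary model absorbs systematic bias.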
(3) Federated Learning and Edge-Side Fine-Tuning
Due to the sensitive and private nature of operational data at individual PV power plants, federated learning can be employed. In this approach, model parameters are trained locally at each edge node without uploading raw data. The cloud then aggregates model weights to update the global model:
$$w_{\mathrm{global}} = \sum_{i} \frac{n_i}{n} \, w_i \qquad (5)$$
where:
$w_{\mathrm{global}}$: Global model parameters (weight vector) aggregated in the cloud
$n_i$: Number of training samples at the i-th edge node
$n$: Total number of training samples across all edge nodes
$w_i$: Model parameters trained locally at the i-th edge node
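The aggregation step is a sample-weighted average of the local weight vectors, which can be sketched in a few lines. The function name `fed_avg` and the two-node example are illustrative assumptions; a real deployment would add secure transport and handle stragglers.

```python
def fed_avg(local_weights, sample_counts):
    """Sample-weighted average of local weight vectors: sum_i (n_i / n) * w_i."""
    n_total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(n_i * w[k] for w, n_i in zip(local_weights, sample_counts)) / n_total
        for k in range(dim)
    ]

# Two hypothetical PV plants: only weights and sample counts leave the site
w_global = fed_avg(local_weights=[[1.0, 0.0], [3.0, 2.0]], sample_counts=[100, 300])
print(w_global)  # [2.5, 1.5]
```

Weighting by n_i means a plant with more training data pulls the global model further toward its local optimum, while raw measurements never leave the edge node.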
On the basis of conventional physical models, the PV model further incorporates factors such as component aging, conversion efficiency, and external environmental influences during operation, while also taking into account daily, monthly, and seasonal variations in solar irradiance. By training the corresponding parameters using global data from the power grid, the generalization capability of the PV model under various operating conditions can be enhanced, enabling it to satisfy computational requirements arising from dynamic changes across different regions and time periods of the grid, and allowing direct invocation during modeling at the edge and terminal layers.
Wind power models share similar characteristics with PV models and can follow the same data-driven modeling and update mechanisms.
Thermal power units involve parameters such as frequency regulation characteristics, upper and lower output limits, ramp rates, and heat rate curves, which vary over time and require periodic identification and adjustment using historical operational data.
Hydropower units are influenced by water head, penstock dynamics, and reservoir scheduling plans. Parameters related to their governor systems must be adaptively tuned online to maintain accuracy.
Energy storage systems (ESS) experience dynamic changes in parameters such as charge/discharge efficiency, capacity degradation coefficients, internal resistance, and voltage–power response characteristics due to time, temperature, and cycling. These parameters must be periodically identified and updated online.
Load models must support high-accuracy short-term forecasting, long-term adaptability, and multi-scenario transferability. AI-based load models are constructed by integrating meteorological data, time-of-day features, and user behavior patterns.
The accuracy of the power network model depends on real-time data such as network topology, line parameters (impedance, admittance), transformer status, and operating conditions. By integrating PMU and SCADA data and applying data-driven algorithms, the model parameters can be dynamically updated to reflect actual network behavior.
3.2. Model Organization
To support the multi-region, multi-task, and multi-scale computational requirements of power systems, the cloud model must be equipped with capabilities for task decoupling and unified interfacing. This enables the model to serve upward for system-level planning, assessment, and dispatch decision-making, while also supporting downward deployment of lightweight models to edge and terminal layers through distillation or pruning, ensuring adaptability to regional grids, microgrids, or local control systems.
To enable cross-scenario model transfer and composite training, all physical entities in the power system (generators, PV units, loads, storage systems, grid components, etc.) are modeled in a standardized manner. Each entity is abstracted as a “Model Meta-Object,” which includes the following components:
(1) Static Parameters: Rated capacity, response coefficients, controller parameters, device constraints, etc.
(2) Dynamic States: Frequency, voltage, power output, state of charge (SOC for storage), etc.
(3) Control Interfaces: Frequency regulation, voltage regulation, load response, storage control strategies, etc.
(4) Label Information: Region affiliation, equipment type, control hierarchy, etc.
All model meta-objects in the power system are encapsulated in a modular structure, facilitating the composition of edge-layer models and the invocation of models at the terminal layer.
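One way to picture the four-part Model Meta-Object is as a small record type. The sketch below is an assumption about how such an object could be encoded; the class name, field names, and the example storage unit are hypothetical and are not an interface defined in this paper.

```python
from dataclasses import dataclass, field

@dataclass
class ModelMetaObject:
    """Hypothetical container mirroring the four components listed above."""
    name: str
    static_params: dict = field(default_factory=dict)       # rated capacity, constraints, ...
    dynamic_states: dict = field(default_factory=dict)      # frequency, voltage, power, SOC, ...
    control_interfaces: list = field(default_factory=list)  # supported control actions
    labels: dict = field(default_factory=dict)               # region, equipment type, hierarchy

    def update_state(self, **measurements):
        """Overwrite dynamic states with the latest measurements."""
        self.dynamic_states.update(measurements)

storage = ModelMetaObject(
    name="ess_01",
    static_params={"rated_capacity_mwh": 10.0, "max_power_mw": 5.0},
    dynamic_states={"soc": 0.50},
    control_interfaces=["charge_discharge_control", "frequency_regulation"],
    labels={"region": "north", "type": "storage", "level": "terminal"},
)
storage.update_state(soc=0.62, power_mw=-1.5)
print(storage.dynamic_states)
```

Keeping static parameters separate from dynamic states lets the cloud recalibrate the former on a slow schedule while the edge and terminal layers refresh the latter at measurement rate.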
3.3. Graph-Structured Modeling
The cloud model serves as the central hub of the large-model architecture for new power systems. Building on the standardized representations of the various objects, it performs graph-structured modeling with unified inputs and outputs, supports collaborative training of different tasks, continual learning, model distillation, and model evolution. The cloud model establishes the global structural representation of the power system and, for online applications in local grids, provides model meta-objects and topological connectivity information. This enables the construction of efficient and accurate lightweight local models, supporting intelligent operation requirements across multiple regions, tasks, and scenarios.
The spatiotemporal coupling and connectivity among different objects can be organized for the entire network using graph-based methods, so as to meet the analysis needs of different grid areas and different types of problems.
The power grid is modeled as a graph G = (V,E), where:
Node set V: Physical system components (e.g., generators, storage units, loads, substations)
Edge set E: Electrical connections (transmission lines), control paths, or data communication channels
Node features Xv: Encapsulate object parameters and real-time states
Edge weights Aij: Represent physical relationships such as electrical admittance, line length, etc.
This graph-based modeling approach enables a unified representation of the global power grid structure and can be used as input to a graph neural network (GNN) for node state prediction and regional-level grid operation assessment. It also supports rapid extraction and training of local subgraphs (for example, for regional grid dispatch optimization problems).
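The graph representation and the extraction of a local subgraph can be sketched with plain dictionaries. The node names, edge weights, and helper names below are hypothetical; a production system would more likely use a graph library and attach the node feature vectors to each node.

```python
def build_graph(edges):
    """Undirected weighted adjacency map {node: {neighbor: weight}} from (i, j, weight) tuples."""
    adj = {}
    for i, j, weight in edges:
        adj.setdefault(i, {})[j] = weight
        adj.setdefault(j, {})[i] = weight
    return adj

def extract_subgraph(adj, nodes):
    """Keep only the selected nodes and the edges whose both endpoints are selected."""
    keep = set(nodes)
    return {u: {v: w for v, w in adj[u].items() if v in keep} for u in adj if u in keep}

# Hypothetical five-edge grid; weights stand for an admittance-like quantity
edges = [("gen1", "sub1", 5.0), ("sub1", "load1", 2.0), ("sub1", "ess1", 1.5),
         ("sub1", "sub2", 4.0), ("sub2", "load2", 2.5)]
grid = build_graph(edges)

# Regional problem that only involves substation 1 and its local devices
region = extract_subgraph(grid, ["sub1", "load1", "ess1"])
print(region)
```

The same extraction step is what allows a regional dispatch problem to be trained or solved on a small subgraph while the cloud retains the full network.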
3.4. Unified Model Training
In new power systems, generation, network, load, and storage exhibit strong coupling, dynamic variability, and pronounced regional heterogeneity. To achieve collaborative optimization and dynamic updating of each model meta-object, it is essential to rely on a unified cloud-based data training framework. This framework aggregates multi-source heterogeneous data—such as wind and PV output, power grid flows, load time series, electricity prices, and energy storage SOC—under a unified standard. Through standardized processing and feature extraction, it provides consistent, high-frequency, and traceable inputs for all types of model meta-objects.
During training, wind and solar forecasting models depend on meteorological and output data; load models integrate factors such as temperature, time series characteristics, and user behavior; grid models require the combination of power flow and stability status; and storage models focus on SOC and charge–discharge response. The cloud data platform can simultaneously meet these multidimensional input requirements for all such models.
In addition, by leveraging training feedback and operational error analysis, the system can dynamically adjust parameters such as efficiency, capacity limits, and response delays, thereby enabling continuous evolution of model meta-objects under a unified data-driven paradigm. This unified data training mechanism not only enhances modeling consistency and collaborative performance but also lays a solid data foundation for integrated scheduling of generation, grid, load, and storage.
4. Edge Model
In the cloud-edge-terminal architecture, the cloud layer centrally builds and maintains the various “generation–grid–load–storage” object models (such as renewable energy models, load models, grid models, and energy storage models), enforcing unified standards, continuous training, and dynamic updating. The edge-layer models, by contrast, target typical engineering problems in power systems (such as power flow calculation, reactive power optimization, and stability analysis). Centered on these problems, they combine, reconstruct, and parameterize the model meta-objects to form edge computing models that are deployable and capable of fast response.
As the “intermediate logical hub” in the cloud-edge-terminal collaborative framework, the edge-layer model is mainly oriented toward specific typical power system problems and is responsible for decomposing system-level tasks, conducting regional modeling, and performing rapid problem solving. Its core role is to construct problem-oriented analytical models tailored to different tasks, based on the unified object models and structural parameters provided by the cloud.
In the edge-layer modeling process, the required model parameters and state information for generators, loads, energy storage, and grid topology in the target region are first retrieved from the cloud. These are then combined with problem objectives and control constraints to construct the corresponding mathematical models. This paper mainly analyzes the edge-layer models for three typical problems: optimal power flow, reactive power optimization, and stability.
4.1. Edge Model Architecture Analysis
(1) Optimal Power Flow (OPF)
For the Optimal Power Flow (OPF) problem, an optimization model is constructed over the entire regional network, considering nodal and line constraints as well as economic objectives:
$$\min \; \sum_{i} c_i\left( P_{G,i} \right) + \lambda \, P_{\mathrm{spilled}} + \mu \, C_{\mathrm{ESS}}\left( P_{\mathrm{ch}}, P_{\mathrm{dis}} \right) \qquad (6)$$
subject to the nodal and line constraints of the regional network and the storage state-of-charge dynamics,
where:
$P_{L,i}$: Forecasted load at node i, obtained from the cloud-based load AI model, updated every 5–15 min
$P_{G,i}$: Generator output, obtained from the edge-side OPF solution, real-time optimization variable
$SOC$, $\eta_{\mathrm{ESS}}$: State of charge and efficiency of energy storage systems, obtained from the cloud-based storage model, calibrated daily or in real time
$P_{\mathrm{ch}}$, $P_{\mathrm{dis}}$: Charge/discharge power
$Y_{ij}$: Line parameters of the grid, obtained from the cloud-based grid model, considered quasi-static
$P_{\mathrm{spilled}}$: Renewable energy curtailment, derived in real time from the cloud-based wind/PV models
$c_i(P)$: Cost function of thermal generators, sourced from the cloud-based thermal unit model, updated periodically
$\lambda$: Penalty coefficient for curtailment, reflecting economic loss or carbon cost due to unutilized renewable energy
$C_{\mathrm{ESS}}$: Operational cost of energy storage, including charge/discharge loss, degradation, and price arbitrage
$\mu$: Weighting factor for storage cost, reflecting its importance in the overall optimization objective
This model integrates cloud-provided parameters and predictions with real-time variables optimized at the edge, allowing for region-specific, adaptive OPF solutions.
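To illustrate how cloud-supplied cost curves and forecasts turn into an edge-side dispatch, the sketch below solves a deliberately simplified version of the problem: a single-bus economic dispatch with quadratic generator costs, no network constraints, no storage, and no curtailment. These simplifications, the bisection on the marginal price, and all numbers are assumptions made for the example; a full OPF would require a proper optimization solver.

```python
def dispatch(gens, demand, iters=60):
    """Equal-incremental-cost dispatch by bisection on the marginal price.

    gens: list of (a, b, p_min, p_max) for cost a*P**2 + b*P (hypothetical units).
    """
    def output_at(price):
        # Each unit produces where its marginal cost 2*a*P + b equals the price, within limits
        return [min(max((price - b) / (2.0 * a), p_min), p_max) for a, b, p_min, p_max in gens]

    lo = min(b + 2.0 * a * p_min for a, b, p_min, p_max in gens)
    hi = max(b + 2.0 * a * p_max for a, b, p_min, p_max in gens)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(output_at(mid)) < demand:
            lo = mid   # price too low: units do not cover the demand
        else:
            hi = mid
    return output_at(hi)

# Two hypothetical thermal units; net demand = forecast load minus renewable output
gens = [(0.01, 10.0, 0.0, 100.0), (0.02, 12.0, 0.0, 100.0)]
p_gen = dispatch(gens, demand=150.0)
print([round(p, 3) for p in p_gen])  # the cheaper unit hits its 100 MW limit, the other supplies about 50 MW
```

In the architecture described here, the cost coefficients and the net demand would be refreshed from the cloud models, while the dispatch itself is recomputed at the edge.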
(2) Volt-VAR Optimization (VVO)
For the Volt-VAR optimization problem, information from local inverters and reactive compensation devices is extracted to construct a local voltage-constrained optimization model. The objective is to minimize voltage deviations or reactive power losses, thereby ensuring voltage compliance and achieving optimal reactive power distribution:
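$$\min \; \sum_{i} w_i \left( V_i - V_i^{\mathrm{ref}} \right)^2 + \sum_{(i,j)} k_{ij} \, Q_{ij}^2 \qquad (7)$$
where:
$V_i$: Voltage magnitude at node i
$V_i^{\mathrm{ref}}$: Target voltage at node i (set by cloud platform or dispatch center)
$Q_{ij}$: Reactive power flow on branch ij
$k_{ij}$: Reactive power transmission loss coefficient
$w_i$: Weighting factor for voltage importance at node i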
This localized optimization model ensures voltage stability and efficient VAR support within the regional distribution or microgrid environment.
(3) Transient Stability
For transient stability issues in power systems (such as rotor angle, frequency stability), a transient stability assessment model is constructed based on cloud-side dynamic models of generator control, excitation systems, renewable energy sources, loads, and energy storage. The structure of the edge-layer stability model is as follows:
$$\frac{d \Delta\delta}{dt} = \Delta\omega, \qquad 2H \frac{d \Delta\omega}{dt} = P_m - P_e - D \, \Delta\omega \qquad (8)$$
where:
$P_m$: Generator input power, determined by the cloud-side governor model and adjusted according to the control strategy;
$P_e$: Electrical power;
$D$: Overall system damping coefficient, pushed from the cloud-side model;
$H$: System equivalent inertia (including virtual inertia), obtained by aggregating the cloud-side generator, renewable energy, and energy storage models and then pushed to the edge;
$\Delta\delta$, $\Delta\omega$: Deviations of stability-related variables (rotor angle and speed).
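A swing-equation model of this kind can be integrated numerically to check whether a disturbance settles. The sketch below assumes a single equivalent machine against an infinite bus with P_e = P_max·sin(δ), speed deviation in per unit (so the angle derivative is scaled by the synchronous speed ω_s), and semi-implicit Euler integration. All parameter values are illustrative and not taken from this paper.

```python
import math

def simulate_swing(p_m, p_max, H, D, delta0, t_end=10.0, dt=0.001,
                   omega_s=2.0 * math.pi * 50.0):
    """Integrate 2H * d(dw)/dt = P_m - P_e - D*dw with d(delta)/dt = omega_s * dw."""
    delta, d_omega = delta0, 0.0
    for _ in range(round(t_end / dt)):
        p_e = p_max * math.sin(delta)                      # electrical power (assumed form)
        d_omega += dt * (p_m - p_e - D * d_omega) / (2.0 * H)
        delta += dt * omega_s * d_omega                    # uses the updated speed (semi-implicit)
    return delta, d_omega

# Rotor released away from equilibrium; it should settle where P_e = P_m
delta_end, d_omega_end = simulate_swing(p_m=1.0, p_max=2.0, H=5.0, D=20.0, delta0=0.2)
print(round(delta_end, 3), round(math.asin(0.5), 3))
```

With H and D pushed from the cloud as aggregated values, the edge layer only has to run this kind of low-order integration for each candidate disturbance.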
4.2. Edge Model Generation and Training
The edge-layer models are oriented toward the analysis needs of typical power system problems. By extracting relevant object models and global state information from the cloud-based global model, they can rapidly generate various problem-specific models (such as optimal power flow and stability analysis), improving response speed while ensuring computational accuracy, and thus offering good real-time performance and scalability.
In terms of modeling mechanisms, the edge-layer models are built on the graph structure of the cloud model and introduce a “task graph generation” mechanism. They select task-related device nodes (such as generation, load, and energy storage), extract the corresponding topology, and combine dispersed object models according to task requirements, thereby forming model structures tailored to typical power system problems.
The edge-layer models also need to be custom-trained and rapidly fine-tuned in conjunction with actual operating scenarios. When training data at the edge is insufficient, a few-shot learning strategy can be adopted: designing feature transfer modules based on the existing model structures for typical problems and coupling them with cloud pre-trained models to achieve rapid convergence of both structure and parameters, thereby enhancing the model’s generalization and adaptability in new scenarios.
To further accelerate analysis while maintaining result accuracy, the edge layer can deploy a “task template generation mechanism + fast optimization/decision engine,” such as heuristic search, approximate linear programming, and AI-assisted solvers. Edge-layer modeling should also support model pruning, compression, and distillation, providing training data and teacher-model outputs for the terminal side, and distributing “lightweight models” to enable low-cost deployment at terminals.
Within the cloud-edge-terminal architecture, the edge-layer models play a key bridging role in mapping “from global to local.” Through standardized composition of model components and graph-based modeling, the edge layer can rapidly construct models online that are consistent with actual operation, thereby supporting efficient, real-time analysis and practical deployment of typical power system applications.
5. Terminal Model
5.1. Terminal Model Distillation
The terminal model is deployed at the periphery of the power system, such as in distribution automation systems, microgrid master controllers, or campus energy management systems (EMS). It is responsible for high-frequency, localized decision-making and control, and must meet strict requirements for low computational load, real-time responsiveness, and strong adaptability.
Due to limited computing resources, terminal models cannot execute full-scale models. Therefore, a knowledge distillation mechanism—from edge model (teacher) to terminal model (student)—is adopted to train lightweight models.
The terminal model distillation process includes the following steps:
(1) Training Dataset Generation at the Edge Layer
This includes input features (such as load, voltage, and disturbance information) and teacher model outputs (such as optimal dispatch outputs, reactive power responses, and stability classification results).
(2) Student Model Architecture Design
Lightweight neural network structures are selected based on task requirements, such as multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and attention-based models.
(3) Distillation Loss Function Definition
The loss function may include a combination of supervised error and distillation KL divergence, along with regularization terms.
(4) Local Training and Fine-Tuning
Model parameters are fine-tuned using on-site operational data at the terminal to enhance adaptability.
(5) Online Deployment and Update
The terminal model supports periodic or event-triggered synchronization and updates to maintain performance in dynamic environments.
The terminal model can achieve “edge-level approximation with rapid response,” providing the local power grid with fast and efficient autonomous operation optimization strategies.
5.2. Terminal Model Architecture Analysis
(1) Optimal Power Flow (OPF)
To enable fast analysis of local optimal power flow (OPF) scheduling problems at the terminal, a lightweight multilayer perceptron (MLP) network is trained using a distillation approach, with the edge-layer OPF model outputs serving as the teacher model. The student model takes as inputs the local node loads, voltage states, and topology encodings, and outputs the predicted optimal generation setpoints for each node. A power-balance regularization term is introduced into the loss function to reinforce physical constraints within the model, ensuring that the results are more consistent with actual operating conditions.
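$$L_{\mathrm{OPF}} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{P}_{G,i} - P_{G,i}^{\mathrm{teacher}} \right)^2 + \lambda_{\mathrm{bal}} \left( \sum_{i} \hat{P}_{G,i} - \sum_{i} P_{L,i} \right)^2 \qquad (9)$$
The first term is the mean squared error (MSE) between the student's predicted generation outputs $\hat{P}_{G,i}$ and the teacher outputs $P_{G,i}^{\mathrm{teacher}}$.
The second term, weighted by $\lambda_{\mathrm{bal}}$, ensures power flow balance between total generation and total load $P_{L,i}$.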
This model is trained using supervised learning on typical regional grid datasets and supports millisecond-level online inference.
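The two-term loss can be sketched as follows, assuming a lossless single-area balance (total generation equals total load) and a squared imbalance penalty. The weight `lam` and the example numbers are hypothetical, and the exact regularizer used by the authors may differ.

```python
def opf_student_loss(p_student, p_teacher, loads, lam=0.1):
    """MSE against teacher setpoints plus a squared power-imbalance penalty."""
    n = len(p_student)
    mse = sum((s - t) ** 2 for s, t in zip(p_student, p_teacher)) / n
    imbalance = sum(p_student) - sum(loads)   # assumed lossless balance
    return mse + lam * imbalance ** 2

# A student that matches the teacher and balances the load incurs no loss
loss_good = opf_student_loss([1.0, 2.0], [1.0, 2.0], loads=[1.5, 1.5])
# Over-generating by 1.0 on the second unit is penalized by both terms
loss_bad = opf_student_loss([1.0, 3.0], [1.0, 2.0], loads=[1.5, 1.5])
print(loss_good, round(loss_bad, 3))  # 0.0 and about 0.6
```

The balance term matters most where the teacher data are sparse: it pushes the student toward physically feasible setpoints even for inputs it has not seen.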
(2) Volt-VAR Optimization
For voltage/reactive power optimization, the teacher model adopts the full edge-layer V–Q optimization algorithm to solve for the optimal reactive power distribution. The student model is built using a shallow convolutional neural network combined with an attention mechanism, taking local voltage states and equipment configurations as inputs and outputting the reactive power control commands for the corresponding nodes. During training, apparent power limits and voltage bounds are introduced as regularization terms to ensure that the voltage control results are executable and compliant.
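$$L_{\mathrm{VVO}} = \left\| \hat{Q} - Q^{\mathrm{teacher}} \right\|^2 + L_S + L_V \qquad (10)$$
where:
$Q^{\mathrm{teacher}}$: Reactive power outputs from the edge-side Volt-VAR optimization model (teacher);
$L_S$: Apparent power limit violation penalty;
$L_V$: Voltage limit violation penalty.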
This model can be deployed at substations or distribution automation nodes, enabling rapid reactive control in response to voltage disturbances. The CNN + Attention architecture enables fast and efficient decision-making for reactive power allocation (e.g., among inverters, SVCs, etc.).
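The penalty terms can be sketched as hinge-style violations added to the imitation error. In the example below, the squared-hinge form, the per-unit voltage band of 0.95 to 1.05, the weights, and the assumption that predicted voltages are available alongside the reactive power commands are all illustrative choices rather than details given in the paper.

```python
import math

def vvo_student_loss(q_student, q_teacher, p_inverter, s_max, v_pred,
                     v_min=0.95, v_max=1.05, lam_s=1.0, lam_v=1.0):
    """Imitation error plus penalties for apparent-power and voltage limit violations."""
    n = len(q_student)
    mse = sum((qs - qt) ** 2 for qs, qt in zip(q_student, q_teacher)) / n
    # Apparent power sqrt(P^2 + Q^2) must stay within the device rating
    pen_s = sum(max(0.0, math.hypot(p, q) - s) ** 2
                for p, q, s in zip(p_inverter, q_student, s_max))
    # Voltages must stay inside the permitted band
    pen_v = sum(max(0.0, v - v_max) ** 2 + max(0.0, v_min - v) ** 2 for v in v_pred)
    return mse + lam_s * pen_s + lam_v * pen_v

feasible = vvo_student_loss([0.3], [0.3], p_inverter=[0.4], s_max=[0.6], v_pred=[1.00])
violating = vvo_student_loss([0.5], [0.3], p_inverter=[0.4], s_max=[0.5], v_pred=[1.07])
print(feasible, violating > 0.04)
```

Commands that exceed the inverter rating or push the voltage out of band are penalized even when they are close to the teacher output, which keeps the student's commands executable.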
(3) Transient Stability
To enhance the terminal system’s online assessment capability for disturbances in frequency, voltage, and rotor angle, this paper develops a lightweight stability classification neural network that retains the key node objects and state variables of the terminal grid. The student model takes disturbance feature sequences and the system’s initial state as inputs, extracts temporal features via 1D convolution, and outputs a stability score through a shallow fully connected network. During training, in addition to label-supervised learning (cross-entropy loss), a temperature-scaled distillation KL-divergence term is incorporated to fully capture the boundary information of the more complex edge-layer stability model.
where:
L_CE: Standard classification loss between the student prediction and the true label;
α: Distillation balance factor, adjusting the weight between the two losses (range: 0–1);
T: Temperature parameter for softening logits in distillation;
KL(⋅): Kullback–Leibler divergence measuring distributional distance;
σ_T(⋅): Softmax probability distribution under temperature T;
z_t: Logits output from the teacher model, i.e., the unnormalized prediction scores.
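For reference, one common way to write a loss built from these terms is given here. It is an assumption based on the usual knowledge-distillation formulation: the symbols $z_s$ (student logits), $y$ (true label), the placement of the balance factor $\alpha$ on the distillation term, and the $T^2$ scaling are not confirmed by the text.

```latex
\mathcal{L}_{\mathrm{stab}}
  = (1-\alpha)\,\mathcal{L}_{CE}\big(y,\ \sigma_{1}(z_s)\big)
  + \alpha\,T^{2}\,\mathrm{KL}\big(\sigma_T(z_t)\,\big\|\,\sigma_T(z_s)\big),
\qquad
\sigma_T(z)_i = \frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}
```

The $T^2$ factor is the conventional correction that keeps the gradient magnitude of the soft-label term comparable across temperatures.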
This final model provides fast, local assessment of grid disturbance stability, enabling rapid decision-making for emergency control at the terminal level. It allows the system to quickly judge whether the current disturbance will result in stable or unstable behavior in terms of frequency, voltage, and rotor angle dynamics.
5.3. Distillation Parameter T
In model distillation, the temperature parameter T is a hyperparameter that controls the smoothness of the soft labels. It rescales the logits before the softmax and thereby adjusts the probability distribution output by the teacher model:
When T = 1: The output is the standard softmax. For a confident teacher, it is often close to hard labels (close to 0 or 1), with information highly concentrated.
When T > 1: The output becomes smoother, revealing more inter-class structure. This allows the student model to learn how the teacher ranks the alternative classes, not just its final decision.
When T → ∞: The output approaches a uniform distribution.
When T < 1: The output becomes sharply peaked and overly biased toward the maximum class, which reduces the training signal; as T → 0, it approaches a one-hot hard label.
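The cases above can be checked numerically with a small sketch. The teacher logits are hypothetical values chosen only for illustration.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 2.0, 1.0]  # hypothetical three-class teacher output
for t in (0.5, 1.0, 5.0, 100.0):
    probs = softmax_t(teacher_logits, t)
    print(f"T={t:>5}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As T grows, the probability of the top class falls toward 1/3 and the gap between the second and third classes becomes a larger share of the signal, which is the inter-class structure the student is meant to pick up.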
In intelligent modeling for new power systems, T is not only a training hyperparameter but also a core mechanism for balancing accuracy and generalization. For highly volatile wind and solar resources, smooth soft labels help the model better perceive boundary states and fuzzy classifications. For systems with multi-source coupling (e.g., combinations of generators, storage, and loads), a higher T reveals relative relationships among suboptimal outputs, helping the student model understand complex couplings. In distributed power structures (e.g., microgrids or industrial parks), tuning T allows the model to adapt to heterogeneous local topologies, enhancing transferability and generalization.
(1) Softmax with Temperature
The teacher model produces soft labels using temperature T, and the student model follows:
(2) Distillation Loss (KL Divergence)
KL Divergence measures the difference between the teacher’s and student’s softened output distributions:
(3) Cross-Entropy Loss
Standard classification loss between the student predictions and the true labels, evaluated on the student model's predicted probability for each class i:
(4) Distillation Temperature Regularization
To prevent T from drifting too far from its optimal range during training:
(5) Total Loss Function
Combining all components into the overall loss:
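The five components listed in items (1)–(5) can be written in a standard form as sketched here. Only the temperature softmax, KL divergence, and cross-entropy follow directly from their usual definitions; the symbols $z_t$, $z_s$, $y$, the $T^2$ factor, the quadratic form of the temperature regularizer, its reference value $T_0$, and the weights $\alpha$ and $\lambda_T$ are assumptions introduced for illustration.

```latex
\begin{align}
p^{(T)}_{i}(z) &= \frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}
  && \text{softmax with temperature, applied to } z_t \text{ and } z_s \\
\mathcal{L}_{KD} &= T^{2}\sum_{i} p^{(T)}_{i}(z_t)\,
  \log\frac{p^{(T)}_{i}(z_t)}{p^{(T)}_{i}(z_s)}
  && \text{distillation loss (KL divergence)} \\
\mathcal{L}_{CE} &= -\sum_{i} y_i \log p^{(1)}_{i}(z_s)
  && \text{cross-entropy with the true labels} \\
\mathcal{L}_{T} &= (T - T_0)^{2}
  && \text{temperature regularization (assumed form)} \\
\mathcal{L}_{\mathrm{total}} &= (1-\alpha)\,\mathcal{L}_{CE}
  + \alpha\,\mathcal{L}_{KD} + \lambda_T\,\mathcal{L}_{T}
  && \text{total loss (assumed weighting)}
\end{align}
```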
Training Steps per Batch:
(1) Forward pass: compute the teacher-model logits and the student-model logits for the batch, then compute the softened outputs of both models at temperature T.
(2) Compute the loss using Equation (17).
(3) Backpropagate and update the model parameters.
(4) Update the T parameter.
The model weights and T are updated with separate learning rates.
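The per-batch procedure can be sketched in a dependency-free way as follows. Several simplifications are assumptions rather than the paper's method: the student is represented directly by its logits instead of network weights, gradients are taken by central finite differences instead of backpropagation, the loss uses an assumed weighting `alpha`, an assumed quadratic temperature regularizer around `t0`, and T is clipped at an assumed lower bound of 1.

```python
import math


def softmax_t(logits, temperature=1.0):
    m = max(z / temperature for z in logits)
    exps = [math.exp(z / temperature - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def total_loss(student_logits, teacher_logits, label, temp,
               alpha=0.5, lam=0.01, t0=5.0):
    """Cross-entropy + T^2-scaled KL + temperature regularizer (assumed form)."""
    ce = -math.log(softmax_t(student_logits)[label])
    p_t = softmax_t(teacher_logits, temp)
    p_s = softmax_t(student_logits, temp)
    kl = sum(a * math.log(a / b) for a, b in zip(p_t, p_s))
    return (1 - alpha) * ce + alpha * temp ** 2 * kl + lam * (temp - t0) ** 2


def numeric_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at the list x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def train_step(student_logits, temp, teacher_logits, label,
               lr_w=0.1, lr_t=0.01):
    """One update of the student parameters and of T, each with its own rate."""
    g_w = numeric_grad(
        lambda z: total_loss(z, teacher_logits, label, temp), student_logits)
    g_t = numeric_grad(
        lambda t: total_loss(student_logits, teacher_logits, label, t[0]),
        [temp])[0]
    new_logits = [z - lr_w * g for z, g in zip(student_logits, g_w)]
    new_temp = max(1.0, temp - lr_t * g_t)  # assumed lower bound on T
    return new_logits, new_temp


# Hypothetical single-sample run: teacher prefers class 0, true label is 0.
logits, temp = [0.0, 0.0, 0.0], 5.0
for _ in range(50):
    logits, temp = train_step(logits, temp, [4.0, 2.0, 1.0], 0)
print([round(z, 3) for z in logits], round(temp, 3))
```

In a real implementation an automatic-differentiation framework would supply both gradients from a single backward pass; the finite differences here only keep the example self-contained.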
A proper choice of the distillation temperature T can significantly enhance the terminal model’s adaptability to dynamic disturbances, boundary operating conditions, and cross-regional applications, and is therefore crucial for enabling the practical deployment of AI large models in new-type power systems. The parameter T depends on many factors. In practice, an appropriate value of T can be determined through a certain amount of pre-training, and then further adjusted via additional training using task-specific data, before being deployed in the lightweight terminal model.
In principle, to cover all possible dynamic variations in the terminal grid, T should be obtained by training on the complete set of dynamic data and computing it according to Equations (13)–(20). However, it is difficult to acquire all such data in real applications. In particular, Equation (20) determines T by computing the various loss terms between the teacher-model and student-model outputs over the entire training set, and then selecting T based on the magnitude of these losses. Therefore, in practice, T may be determined using limited routine operating data via pre-training to obtain a locally suitable value, and then adjusted as needed. For example, a commonly used value is T = 5. In this paper, we further evaluate T over the range 1–10 and select different T values according to the corresponding results so as to match different operating conditions. A lightweight model trained with a well-chosen T can balance speed and accuracy, making it suitable for deployment in edge/local grids that require fast response.
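One simple way to turn a sweep over T into a single choice is sketched below. The selection rule (take the smallest T whose validation MSE lies within a relative tolerance of the best one) and the MSE values are hypothetical; they only illustrate how a moderate T could be preferred when larger values bring negligible gains.

```python
def select_temperature(mse_by_t, rel_tol=0.05):
    """Return the smallest T whose MSE is within rel_tol of the best MSE."""
    best = min(mse_by_t.values())
    for t in sorted(mse_by_t):
        if mse_by_t[t] <= best * (1 + rel_tol):
            return t
    raise ValueError("mse_by_t must not be empty")


# Hypothetical validation MSE values from a sweep over T (not from the paper).
sweep = {1: 0.0500, 2: 0.0350, 5: 0.0205, 8: 0.0201, 10: 0.0200}
print(select_temperature(sweep))
```

A tighter tolerance makes the rule follow the strict minimum, while a looser one favors smaller temperatures.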
6. Case Study
6.1. Simulation Case Scenarios and Data Generation
In this paper, a regional power grid is used as the foundation, with the various object models and topological structure of the grid taken as references. The IEEE 33-bus distribution system, the MG-14 microgrid system, and the local grid of an industrial park are selected as three independent terminals, which are connected into a unified grid to form test cases, on the basis of which the cloud-edge-terminal model framework is constructed.
On the cloud side, unified construction and training of the various object models in the grid are carried out. Based on the cloud models, the edge side organizes the modeling of typical problems such as optimal power flow, reactive power/voltage optimization, and stability analysis. Finally, these typical problem models are distilled into lightweight models and deployed in the three local terminal grids. The three simulation scenarios are summarized in Table 1. Scenario S1 corresponds to the IEEE 33-bus distribution system, where a 3-layer MLP model is used to implement terminal applications for power flow prediction and stability analysis. Scenario S2 corresponds to the MG-14 microgrid system, where an MLP combined with a constraint module is used to implement terminal applications for reactive power optimization and stability. Scenario S3 corresponds to the industrial park system, where a CNN+LSTM model supports terminal applications for optimal power flow and stability analysis.
In the simulation studies, large-scale multi-scenario samples are first generated using Latin hypercube sampling (LHS) based on the probability distributions of wind/PV output and load demand, yielding sufficiently rich joint time-series data for renewables and loads [21]. This is then fused with historical grid operation data (such as power flows, transmission corridor constraints, and equipment status) to form input datasets for power flow, reactive power optimization, and stability analysis. For the energy storage component, SOC-based sliding scheduling logic and disturbance discharge commands are defined to simulate multi-strategy response processes, thereby characterizing the charging and discharging behavior of storage under different scenarios. Grid topology disturbances are constructed via corridor reconfiguration, islanding switching, generator tripping, load transfer, and other operations to emulate changes in operating modes and the propagation of contingency chains. Stability events are created by artificially injecting short-circuit faults, power steps, and load steps, producing frequency and voltage dynamic response data. In this way, a joint dataset combining “multi-scenario steady-state + dynamic events” is ultimately formed, which is used to train the cloud-edge-terminal models and to carry out case validation.
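The LHS step can be illustrated with a short sketch. It draws stratified samples on the unit hypercube and maps them to per-unit ranges; the uniform ranges and the field names (`wind_pu`, `pv_pu`, `load_pu`) are placeholders, since the actual study would map each coordinate through the inverse CDF of the corresponding wind, PV, or load distribution.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the pairing between dimensions
        columns.append(strata)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def scale(u, low, high):
    """Map a unit-interval sample to [low, high) (placeholder for an inverse CDF)."""
    return low + u * (high - low)


rng = random.Random(0)
scenarios = [
    {"wind_pu": scale(w, 0.0, 1.0),
     "pv_pu": scale(p, 0.0, 1.0),
     "load_pu": scale(l, 0.6, 1.2)}
    for w, p, l in latin_hypercube(5, 3, rng)
]
for s in scenarios:
    print({k: round(v, 3) for k, v in s.items()})
```

Compared with plain random sampling, each variable's range is covered evenly even for a small number of scenarios, which is why LHS is attractive for building multi-scenario training sets.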
6.2. Development of the Cloud-Edge-Terminal Model Architecture in the Simulation System
As shown in Figure 2, the cloud layer acts as the global modeling hub of the power system, integrating real-time operational data, historical logs, and equipment status information from across the grid. Using deep modeling approaches such as GNNs and Transformers, it builds generalized model structures and parameter templates for a wide range of objects, including generators, energy storage units, PV systems, loads, and induction motors. Its core outputs include unified object models, trained parameter sets, and abstract graph representations of the grid structure, which serve as the foundation for the edge layer to construct typical problem models such as power flow and stability analysis.
The edge layer constructs models for typical problems not directly from raw data, but on the basis of the device models and parameters provided by the cloud. It builds problem-oriented model structures and dynamically generates models for specific tasks such as optimal power flow (OPF) and dynamic stability. In addition to performing global optimization, dispatch, and stability assessment for these typical problems, the edge layer also produces soft-label data and intermediate feature representations to guide the distillation training of lightweight terminal models.
Terminal models are deployed in local grids and focus on fast-response application scenarios such as real-time power flow prediction, reactive power optimization, and stability analysis. Based on the typical problem models at the edge and combined with local real-time operating data, the terminal performs parameter fine-tuning or rapid distillation to obtain lightweight neural networks (e.g., MLPs, CNNs). These terminal models preserve local regional characteristics while partially inheriting the structural knowledge of the cloud and edge models, thereby achieving efficient, high-accuracy, and locally aware control and decision-making capabilities. Terminal models are the final executors in the cloud-edge-terminal intelligent collaborative framework.
6.3. Computational Results and Analysis
Based on the cloud-edge-terminal framework, the final terminal models generated from the cloud and edge are used to carry out simulation calculations for three types of tasks (power flow prediction, reactive power optimization, and transient stability classification) in the three terminal grids, respectively. The results are then compared with those of traditional centralized models, as shown in Table 2.
In the power flow prediction task, the global generation–grid–load–storage model trained in the cloud achieves high accuracy, but its size is too large to be directly deployed at the terminal. Based on the edge-layer power flow model, a lightweight terminal model is obtained through data training and subsequent distillation, which yields a low error in terms of the MSE metric. As shown in Table 2, taking IEEE 33 as an example, the MSE of the traditional model for power flow prediction is 0.0221. As illustrated in Figure 3, for different values of T, the MSE varies across terminal models; for the distilled terminal model, when T = 5, the MSE is 0.0245, an increase of only about 10.9%. At the same time, the model size is reduced by 83% and the inference time is shortened by about 132 ms, significantly improving the real-time power flow response performance of the local grid.
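The relative increase in MSE quoted above follows directly from the two reported values:

```latex
\frac{0.0245 - 0.0221}{0.0221} = \frac{0.0024}{0.0221} \approx 0.1086 \approx 10.9\%
```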
In the reactive power optimization task, the traditional model requires a relatively long time per iteration and cannot satisfy the frequent control demands of various industrial processes in an industrial park. As shown in Figure 4, after distillation at the terminal, the edge-layer GNN-based reactive power optimization model can quickly output node-level reactive power adjustment strategies, with an average control error of less than 3% and a response delay kept within 30 ms. Therefore, when the load in the industrial park changes frequently, the lightweight terminal model provides rapid control and better meets the optimized power-use requirements of industrial production.
If only minor changes occur in the nodes or branches of a terminal grid, the existing lightweight terminal model can be adapted through parameter fine-tuning. For example, in the IEEE 33-bus system, when a PV generation unit with a capacity equal to 40% of the bus capacity is added at one node while the overall grid topology remains unchanged, the input power at that node becomes stochastically varying, which affects power-flow prediction to some extent. This change can be accommodated by fine-tuning the model parameters. As shown in Table 2, the response time increases by 17 ms, while the model size remains unchanged and the accuracy is almost unaffected.
For the transient stability classification task, various model objects that have been uniformly trained in the cloud are combined with local microgrid operating characteristics to construct the edge-layer model, and a compact and effective terminal stability classifier is obtained via distillation. In the MG-14-bus microgrid, this model achieves a rapid stability classification accuracy of 94.7% for three types of disturbances (voltage sag, load disturbance, and source tripping), which is about 7.2% higher than that of traditional methods.
The advantages of the cloud-edge-terminal architecture are reflected not only in the flexibility of model deployment and the improvement in inference efficiency, but more importantly in its ability to rapidly adapt to the operating conditions of terminal grids. Global modeling on the cloud ensures consistency of model structures and forward scalability; the edge layer can quickly construct models for typical power system problems; and on the terminal side, lightweight predictors or classifiers are formed through model distillation, meeting diverse analytical requirements for power grids with different structures and target tasks.
6.4. Analysis of the Distillation Parameter T
The terminal model is the final executor in the cloud-edge-terminal architecture, and its distillation parameter T directly determines the ultimate tradeoff between result accuracy and response speed. In practical settings, when the output varies smoothly, T can be chosen from [1, 2]; for small variations, from [2, 4]; and for larger variations, from [4, 10]. In our case studies, power changes in power-flow results can sometimes be large; in reactive power optimization, voltage variations are relatively small but reactive power can vary substantially; and stability outcomes are even more irregular. Therefore, we analyze a relatively wide range of T ∈ [1, 10]. Compared with Figure 3, more samples are used here, with a more uniform value distribution and wider coverage, resulting in higher accuracy as measured by MSE. In what follows, different values of T are selected for the terminal models in the three local grids, and their impact is computed and compared.
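The range guideline above can be encoded as a small lookup that produces the candidate temperatures to sweep. The category labels (`"smooth"`, `"small"`, `"large"`) and the integer step are hypothetical conveniences; only the three intervals come from the text.

```python
# Intervals for T taken from the guideline; the keys are illustrative labels.
T_RANGES = {"smooth": (1, 2), "small": (2, 4), "large": (4, 10)}


def candidate_temperatures(variation, step=1):
    """List the integer T values to evaluate for a given output variability."""
    low, high = T_RANGES[variation]
    return list(range(low, high + 1, step))


print(candidate_temperatures("large"))
```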
(1) IEEE 33-Bus System
After distilling lightweight terminal models from the cloud and edge, power flow prediction and stability analysis are carried out. The scenarios and task settings are shown in Table 3.
For distillation temperatures T = 1, 2, 5, and 10, each task is trained with 100,000 disturbance samples (covering both the power flow and stability tasks). The results are as follows:
As shown in Figure 5, the voltage prediction MSE is minimized at T = 10, but the MSE values at the six points within the range T = 5–10 differ only slightly. The current prediction MSE reaches its minimum at T = 8, yet the MSE values across T = 5–10 are also very close, and the three points at T = 8–10 are almost identical. Therefore, in this case, for convenience, T = 5 can be uniformly adopted in practical computations.
As shown in Table 4 and Table 5, when the distillation temperature T = 5, the terminal model achieves the best performance on both the power flow prediction and stability classification tasks. Not only is the power flow prediction error significantly reduced, but the stability classification accuracy is also greatly improved. A moderate T value strikes a good balance between imitating the structural complexity and decision behavior of the cloud–edge teacher models and maintaining efficiency, thereby enhancing the model’s capability to identify disturbances and atypical operating states.
(2) MG-14 Microgrid System
The microgrid includes a high penetration of renewable energy (PV + wind), energy storage systems, and hybrid loads. The control objective is to predict the optimal SVG and SVC output strategies to maintain voltage compliance and minimize reactive power losses.
The teacher model for power flow is a jointly optimized cloud-edge OPF model, while the student model is a lightweight DNN regression model trained via knowledge distillation. For the stability classification task, the goal is to determine whether the system can maintain dynamic voltage stability after a disturbance. The teacher model consists of dynamic stability simulation combined with a rule-based boundary model, and the student model is a lightweight MLP-based stability classifier trained with distillation.
For distillation parameters T = 1, 2, 5, 10, each task is trained with 50,000 disturbance samples and tested with 10,000 samples.
According to Figure 6, the voltage-deviation MSE reaches its minimum at T = 6, but the MSE values at the six points within the range T = 5–10 differ only slightly. The reactive control error MSE is minimized at T = 7, yet the MSE values at the three points within the range T = 5–7 are also very close. Therefore, in this case, T = 5 can be uniformly adopted in practice for computation.
As shown in Table 6 and Table 7, when T = 5, the terminal model also achieves optimal performance in terms of reactive power control error and stability classification accuracy. It significantly reduces voltage deviation and reactive power control error, better realizing nonlinear control strategies and enhancing stability pre-assessment capability. An appropriately chosen distillation temperature helps to keep the terminal model lightweight while strengthening its ability to predict voltage and stability states under ambiguous conditions, thus meeting the fast response requirements for microgrid terminal deployment.
(3) Industrial Park System
In industrial parks, there are many impact loads with strong nonlinear fluctuations. The power supply typically adopts a hybrid configuration of PV + energy storage + utility grid, supporting both islanded operation and grid-connected modes. The control objectives are high-precision power flow prediction and fast stability identification. To avoid affecting actual industrial production, voltage, power, and fault control are required to respond in real time with very low latency.
For the terminal OPF prediction model, the inputs include current load, power source status, and power boundaries, while the output is the optimal power flow distribution. The teacher model is an OPF solver based on global constraints, and the student model is a lightweight DNN-based regression model.
For stability classification, the inputs are the system state before a disturbance and the disturbance type, while the output is a binary classification of system stability. The teacher model combines cloud-edge stability criteria, and the student model is a lightweight MLP classifier trained via distillation.
Distillation parameters were set as T = 1, 2, 5, 10. Each task used 100,000 disturbance samples for training and 20,000 samples for testing.
According to the results in Figure 7, the voltage-error MSE is minimized at T = 5, whereas the power-error MSE is minimized at T = 10. However, the power-error MSE values at the six points within the range T = 5–10 do not differ significantly. Therefore, in this case, T = 5 can be uniformly adopted in practice for computation.
As shown in Table 8 and Table 9, when the distillation temperature T = 5, the terminal model achieves a significant reduction in voltage and reactive power errors in the power flow prediction task. In the stability classification task, the accuracy improves to 95.9%, with an F1-score of 0.94, clearly outperforming locally trained standalone models. These results show that a moderate distillation parameter helps simplify the terminal model’s decision strategy under complex boundary conditions, enhancing its ability to handle multiple types of disturbances and atypical states in the industrial park power grid. At the same time, the model inference time remains around 18 ms, meeting the requirements for online rapid deployment in industrial power control scenarios within the park.
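For readers who want to reproduce the two classification metrics reported here, a minimal sketch of their computation for a binary stability classifier is given below. Treating label 1 as the positive class is an assumption, and the label vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Hypothetical labels: 1 = positive class, 0 = negative class.
acc, f1 = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(round(acc, 3), round(f1, 3))
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when stable and unstable cases are imbalanced, which is why both are worth reporting.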
The distillation parameter is not just a training hyperparameter, but a key tuning factor for enhancing the practicality and deployability of terminal models in new power systems. Based on the results obtained for different T values, T = 5 generally yields the best performance, or performance close to the best, so the case studies in this paper adopt a unified setting of T = 5 for convenience. In practice, for a terminal grid, it is often only feasible to pre-train on limited operational data to obtain a value of T that is suitable for the current operating conditions; to accommodate other operating scenarios, T needs to be adjusted accordingly. For this reason, we evaluate T over the range 1–10 and determine appropriate T values based on the corresponding results so as to match different operating conditions. With a well-chosen T, the lightweight model can achieve a good balance between speed and accuracy, making it suitable for deployment in edge/local grids that require fast response.
7. Conclusions
Traditional power system models are typically developed separately for each canonical problem (such as power flow calculation and stability analysis), and thus lack global awareness of the overall coupling characteristics of the system. As a result, they struggle to meet the demand for fast analysis under the conditions of new power systems, which are characterized by multi-source, stochastic, heterogeneous data and highly complex, variable operating modes. In view of these challenges, this paper carries out an in-depth study and presents the following contributions:
(1) A cloud-edge-terminal large-model architecture for power systems is proposed to achieve unified global perception. On the cloud side, unified modeling and training are performed for various objects in generation, grid, load, and storage. On this basis, the edge side can uniformly deploy typical problem models, while at the terminal side, lightweight models are generated online via distillation to meet the requirements of fast and accurate computation in local grids. Taking typical problems such as power flow analysis, stability assessment, and reactive power optimization as examples, the paper investigates cloud-edge-terminal collaborative modeling methods for power systems.
(2) To address the need for fast and accurate modeling of local grids under the globally large-scale, high-dimensional, and dynamically stochastic structure of real-world power systems, a training method for setting the distillation parameter T is proposed. This method can effectively improve the accuracy and generalization capability of terminal grid models and exhibits strong adaptability across different application scenarios, making it a key means for enabling the practical deployment of intelligent terminal modeling in new power systems.
The cloud-edge-terminal model architecture for new power systems not only provides global perception, but also organizes edge-layer models and distills local models online, thereby meeting the requirements for fast and accurate local computation. It represents an important direction for the development of intelligent modeling in power systems and serves as a key enabling technology for the analysis of future power grids with a high penetration of renewable energy.