1. Introduction
The transformation of the traditional power grid into a smart grid, along with the continuous development of distributed energy resources, has gradually turned the traditional distribution network into an active distribution network with bi-directional power flow [1]. This has increased the complexity of its operating states and control modes. With the advent of Industry 4.0, the introduction of intelligent devices and systems has made the intelligent transformation of distribution networks possible and provided a pathway for various emerging digital technologies. Digital twins (DTs), as a part of Industry 4.0, play an important role. By utilizing advanced IoT, big data analytics, and AI technologies, DTs enhance production efficiency and product quality while providing strong support for predictive maintenance and optimized decision-making [2]. The application of this technology is not limited to manufacturing; the use of DTs in power systems marks a significant step towards smarter, more efficient, and more resilient energy infrastructure. By leveraging real-time data, advanced analytics, and predictive capabilities, DT technology transforms the management and operation of power systems, paving the way for a more sustainable energy future.
A DT is a virtual mapping of physical entities. It constructs models of physical objects using a hybrid data-model-driven approach based on measured data to achieve functions such as real-time state perception, future state projection, and closed-loop control. DTs have gradually been applied in practice in many fields, such as manufacturing, aerospace, and urban management [3]. In the field of power systems, DT technology has moved from initial research to application. Reference [4] develops a grid DT online analysis system with second-level response, which achieves millisecond-level data processing in the grid. Reference [5] focuses on a DT-based longitudinal protection method for DC grids, which compares real measurements with virtual measurements through a dynamic state estimation method. Reference [6] offers a DT-driven multi-agent coordinated optimization control strategy for the smart microgrid, which enhances the ability of microgrids to perceive, predict, and adapt. Reference [7] establishes a digital twin model of the steam turbine system in a thermal power plant, optimizing the system through the indicators provided by the model and effectively improving efficiency. Reference [8] achieves the precise localization of real-time cyber attacks by establishing a digital twin reference model for the distribution network. Reference [9] proposes a digital twin-based distributed energy coordination control method, minimizing the need for real-time communication and achieving the overall coordination of distributed energy resources.
The DT relies on the real-time input of distribution network data for modeling and analysis, so the accuracy of the measurement data is the main factor affecting the modeling of the DT distribution network. In practice, the measurement data are not completely accurate: a small amount of bad data enters the measurement data set due to sensor damage, data attacks, etc., which seriously affects the accuracy of distribution network modeling. Large-scale access to renewable energy makes distribution network measurements more complex and diverse, with randomness and volatility, which also poses challenges to the identification of bad data. Traditional bad data identification methods, such as the residual search identification method [10], the non-quadratic criterion identification method [11], and the estimation identification method [12], identify bad data only after the state estimation operation. They are inefficient and prone to misidentifying normal measurement data that fluctuate or change suddenly as bad data, which introduces inaccuracies into the DT distribution network and affects the subsequent dispatching and coordinated control of the distribution network.
State estimation (SE) is a crucial part of state perception and of the secure and stable operation of the distribution network, and it is also the foundation for DT distribution network modeling. Commonly used distribution network SE algorithms include the branch power method based on weighted least squares (WLS) or weighted least absolute value (WLAV) [13], as well as SE algorithms that consider robustness [14,15,16]. These algorithms are not applicable to the complex situation of multiple sources of real measurement data, and their computation process is complicated and time-consuming, so they fail to meet the real-time awareness needs of DT distribution networks. In [17], the authors combine WLS and WLAV to make the SE model more robust, but the computation is slow. In [18], branch current state estimation (BCSE) is used, which sets the measurement and state variables as current data to make the SE linear and speed up the iteration, but it still requires a power flow calculation to solve for the voltage state quantities. Reference [19] applies linear Bayesian theory to SE, analyzing the impact of measurement errors on the algorithm and improving the performance of state estimation. With the development of artificial intelligence technology, many scholars have started to adopt machine learning (ML) methods to estimate the state of distribution networks, such as References [20,21,22]. However, due to the complexity of the distribution network model, it is difficult to establish an accurate mapping by learning methods alone, and they require the support of a large amount of historical data.
This paper adopts SE as the method for the DT modeling of distribution networks with a high proportion of renewable energy, capturing the real-time state of the physical distribution network, and identifies and corrects the distribution network measurement data prior to SE. The contributions of this paper are as follows:
- A novel bad data identification and correction method is proposed, which identifies bad data based on temporal correlations and completes the data using a neural network training approach. 
- A linear SE model incorporating photovoltaic (PV) integration is established. The PV output is predicted using a neural network, and complete data for the distribution network are obtained through a linear SE algorithm based on multi-source data fusion. 
- A DT-based SE model and database for the distribution network are established on the server side and run synchronously with the simulated physical model in the real-time digital simulator (RTDS), verifying the accuracy and real-time performance of the DT SE model.
  2. DT for Distribution Network
The power system digital twin (PSDT) is an emerging technology driven by the increasing complexity of power system models, the dramatic growth of data, and the gradual improvement of DT technology, as illustrated in Figure 1. In contrast to traditional model-based simulation software and cyber physical systems, PSDT focuses more on real-time situational awareness and super-real-time virtual testing through data-driven or hybrid data-model-driven approaches to support power system operation and regulatory decision making. Especially in the smart grid domain, PSDT can provide more precise and real-time system state monitoring and analysis, enabling the smart grid to achieve self-optimization and intelligent decision-making, thereby improving grid reliability.
The DT architecture for distribution networks can be divided into four layers: the physical distribution network layer, the data processing layer, the twin distribution network layer, and the twin application layer [3]. The schematic diagram of the distribution network DT hierarchical architecture is shown in Figure 2. The physical distribution network layer is the information source of the DT distribution network and is responsible for metering with multi-source metering devices and for transmitting distribution network data; the data processing layer performs multi-source data fusion on the data received from the physical distribution network layer; the twin distribution network layer implements real-time sensing of the physical distribution network, including the renewable energy connected to it, and continuously updates the twin model to reflect the operational status in a timely manner; the twin application layer provides diversified solutions for various application scenarios in the distribution network based on the DT distribution network model. An accurate DT model is the groundwork for the DT distribution network; thus, the data processing layer and the twin distribution network layer are crucial. The effectiveness of data processing and model establishment directly impacts the accuracy and real-time nature of the DT model, which is also the focus of this study.
  4. DT Modeling Using State Estimation with PV Forecasting
In the actual distribution network, there are three main types of measurement sources: phasor measurement units (PMUs), supervisory control and data acquisition (SCADA) systems, and new energy pseudo-measurements. The type and accuracy of the measurement data from different sources can vary significantly. To use multi-source measurement data in a reasonable manner, increase SE redundancy, and ensure network observability, this paper employs a neural network to predict PV power and thereby establish a pseudo-measurement model of the PV system. The different measurement data are then linearly transformed into a common form in order to establish an efficient and precise state model of the distribution network.
  4.1. PV Power Generation Prediction Method Based on BILSTM
Due to external environmental factors and other influences, the output power of PV power plants is subject to significant randomness and uncertainty, which can lead to a non-negligible error in the SE. Conventional new energy prediction methodologies [23,24,25] are limited in their ability to provide accurate real-time output predictions in the presence of more complex fluctuations. In contrast, deep learning-based PV prediction methodologies can achieve satisfactory prediction outcomes in a diverse range of scenarios. However, traditional artificial neural networks (ANNs), such as the back propagation neural network (BPNN), are not well suited to time-series problems and may not effectively capture the long-term trends and periodicity in PV power generation. Consequently, this paper employs the BILSTM neural network, as detailed in Section 2, to predict the real-time PV output. Figure 6 illustrates this process.
The method uses meteorological data for the PV plant (solar irradiance, temperature, wind speed, and wind direction) as inputs and the corresponding historical output data as outputs, with the input time-series step set to 10, in order to train the BILSTM neural network. Given that the units and magnitudes of the meteorological input data differ, a normalization process is required. This is achieved through the application of the following equation:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x^{*}$ represents the normalized value; $x$ represents the original measurement data, including all PV solar irradiance, temperature, wind speed, and wind direction data; and $x_{\max}$ and $x_{\min}$ denote the maximum and minimum values of the data, respectively. All normalized data are in the range of 0 to 1. Using normalized data as input can accelerate the training process of neural networks, improve prediction accuracy, and help avoid gradient vanishing or exploding problems.
Feeding real-time meteorological data into the trained model yields the real-time PV system output, which serves as the pseudo-measurement required for the SE.
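The preprocessing described in this subsection can be sketched as follows. This is a minimal illustration rather than the code used in this paper: the function names `min_max_normalize` and `make_windows`, the column order of the toy weather array, and the choice to pair each 10-step window with the output at its last step are all assumptions made for the example.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of `data` to [0, 1]; returns (scaled, col_min, col_max)."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (data - col_min) / span, col_min, col_max

def make_windows(features, targets, steps=10):
    """Build inputs of shape (samples, steps, n_features).

    Assumption: each window is paired with the target at its last time step.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_samples = len(features) - steps + 1
    x = np.stack([features[k:k + steps] for k in range(n_samples)])
    y = targets[steps - 1:]
    return x, y

# Toy records (hypothetical values): irradiance, temperature, wind speed, wind direction.
weather = np.array([[0.0, 10.0, 1.0, 0.0],
                    [500.0, 20.0, 3.0, 180.0],
                    [1000.0, 30.0, 5.0, 360.0]])
scaled, col_min, col_max = min_max_normalize(weather)
print(scaled[1])  # middle record maps to 0.5 in every column
```

When the trained model is applied to real-time data, the minimum and maximum values obtained from the training set would normally be reused so that training and inference share the same scale.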
  4.2. Linear Transformation of Measured Data
This paper uses a measurement transformation method that uniformly converts the original measurements of the distribution network into voltage and current measurements separated into real and imaginary parts in rectangular coordinates. The PMU measures the node voltage phasors and branch current phasors, and these measurements are linearly transformed using the following equations.

$$U_{\mathrm{re}} = U\cos\theta_{u}, \qquad U_{\mathrm{im}} = U\sin\theta_{u} \tag{5}$$

$$I_{\mathrm{re}} = I\cos\theta_{i}, \qquad I_{\mathrm{im}} = I\sin\theta_{i} \tag{6}$$

where $U$ and $\theta_{u}$ are the amplitude and phase angle of the voltage phasor measured by the PMU; $I$ and $\theta_{i}$ are the amplitude and phase angle of the current phasor measured by the PMU.
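The SCADA measurements are converted in a similar manner. The voltage and current amplitudes measured by SCADA are converted based on Equations (5) and (6). The more common branch power measurements in SCADA can be converted to equivalent branch current measurements as follows.

$$I_{ij,\mathrm{re}} = \frac{P_{ij}U_{\mathrm{re}} + Q_{ij}U_{\mathrm{im}}}{U_{\mathrm{re}}^{2} + U_{\mathrm{im}}^{2}}, \qquad I_{ij,\mathrm{im}} = \frac{P_{ij}U_{\mathrm{im}} - Q_{ij}U_{\mathrm{re}}}{U_{\mathrm{re}}^{2} + U_{\mathrm{im}}^{2}}$$

where $P_{ij}$ and $Q_{ij}$ are the branch active and reactive power measured by SCADA, respectively; $U_{\mathrm{re}}$ and $U_{\mathrm{im}}$ are the real part and imaginary part of the node voltage obtained at each iteration, respectively.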
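In addition, the node injection power measurements in SCADA and the new energy pseudo-measurements can be converted into equivalent node injection current measurements as:

$$I_{i,\mathrm{re}} = \frac{P_{i}U_{\mathrm{re}} + Q_{i}U_{\mathrm{im}}}{U_{\mathrm{re}}^{2} + U_{\mathrm{im}}^{2}}, \qquad I_{i,\mathrm{im}} = \frac{P_{i}U_{\mathrm{im}} - Q_{i}U_{\mathrm{re}}}{U_{\mathrm{re}}^{2} + U_{\mathrm{im}}^{2}}$$

where the values of node injection power, both actual and pseudo-measured, are included in both $P_{i}$ and $Q_{i}$.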
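The two conversions in this subsection can be illustrated with a short sketch. It is not taken from this paper's implementation; the function names are hypothetical, and the power-to-current step relies only on the standard relation S = U·conj(I) with the node voltage taken from the current iterate.

```python
import math

def phasor_to_rect(magnitude, angle_rad):
    """Convert a phasor (amplitude, phase angle in radians) to (real, imaginary) parts."""
    return magnitude * math.cos(angle_rad), magnitude * math.sin(angle_rad)

def power_to_current(p, q, u_re, u_im):
    """Equivalent current from power and voltage.

    From S = U * conj(I) it follows that I = conj((P + jQ) / U).
    Returns the (real, imaginary) parts of the current.
    """
    current = ((p + 1j * q) / complex(u_re, u_im)).conjugate()
    return current.real, current.imag

# Example with per-unit values chosen only for illustration.
u_re, u_im = phasor_to_rect(1.0, 0.0)
i_re, i_im = power_to_current(1.0, 0.5, u_re, u_im)
print(i_re, i_im)
```

Because the voltage used in the conversion is updated at each iteration, the equivalent current measurements change between iterations even though the relationship between currents and the state variables stays linear.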
  4.3. Linear State Estimation Model for Distribution Network
After the above series of measurement transformations, the SE measurement equation is constructed as:

$$z = h(x) + v$$

where $z$ is the measurement vector, $h(x)$ is the measurement function, and $v$ is the error vector between $z$ and $h(x)$.
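The state variable $x$ is set to the real and imaginary parts of the voltages of the $n$ nodes of the distribution network, where:

$$x = \left[U_{1,\mathrm{re}}, \ldots, U_{n,\mathrm{re}}, U_{1,\mathrm{im}}, \ldots, U_{n,\mathrm{im}}\right]^{T}$$

The measurement vector $z$ is shown below:

$$z = \left[U_{\mathrm{re}}^{m}, U_{\mathrm{im}}^{m}, I_{\mathrm{inj,re}}^{m}, I_{\mathrm{inj,im}}^{m}, I_{\mathrm{br,re}}^{m}, I_{\mathrm{br,im}}^{m}\right]^{T}$$

where $U_{\mathrm{re}}^{m}$ and $U_{\mathrm{im}}^{m}$ are the converted real part and imaginary part voltage measurements; $I_{\mathrm{inj,re}}^{m}$ and $I_{\mathrm{inj,im}}^{m}$ are the converted node injection current real part and imaginary part measurements; $I_{\mathrm{br,re}}^{m}$ and $I_{\mathrm{br,im}}^{m}$ are the converted branch current real part and imaginary part measurements.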
Once the PV system is connected to the distribution network, the corresponding node injects active and reactive power. However, the inverter connected to the PV power supply generates reactive power at a high cost, which means that the PV system usually only outputs active power. Consequently, the reactive power $Q_{\mathrm{PV}}$ emitted by the PV system in the distribution network is zero. The SE model with equality constraints can be constructed by using the weighted least squares method and considering the reactive power injection constraints at PV nodes. This model can be described by the following equation.

$$\min J(x) = \left[z - h(x)\right]^{T} R^{-1} \left[z - h(x)\right], \qquad \text{s.t. } c(x) = 0$$

where $c(x)$ represents the zero injection node constraint function, $J(x)$ denotes the least squares estimation objective function, and $R^{-1}$ is the weight matrix.
The Lagrange multiplier method is used to solve the above mathematical model. The Lagrangian function is written here as

$$L(x, \lambda) = J(x) + \lambda^{T} c(x)$$

where $J(x)$ is the objective function and $\lambda$ is the Lagrange multiplier vector.
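The partial derivatives with respect to the state variable $x$ and the constraint multiplier $\lambda$ are as follows

$$\frac{\partial L}{\partial x} = -2H^{T}R^{-1}\left[z - h(x)\right] + C^{T}\lambda = 0, \qquad \frac{\partial L}{\partial \lambda} = c(x) = 0$$

where $H = \partial h(x)/\partial x$ is the measurement Jacobi matrix, $C = \partial c(x)/\partial x$ is the zero injection measurement constrained Jacobi matrix, and $R$ is the converted fused covariance matrix.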
Following the implementation of a linear transformation on a measurement, the Jacobi matrix 
H can be expressed as
        
        where 
 is the branch conductance. 
 is the branch susceptance.
Because the measurement equations are linear after the transformation, the Jacobi matrix H is a constant coefficient matrix that remains unchanged during the iteration process, which speeds up the computation of SE. Newton's method is employed to solve Equation (16) in order to obtain the state variable $x^{(k+1)}$ for the (k + 1)th iteration.
If the state variables satisfy $\max\left|x^{(k+1)} - x^{(k)}\right| < \varepsilon$, where $\varepsilon$ is the convergence tolerance, the value of $x^{(k+1)}$ is output as the final estimate and the iteration ends.
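To make the role of the constant Jacobi matrix concrete, the following sketch solves one linear, equality-constrained WLS step through the stationarity conditions of the Lagrangian. It is an illustration under simplifying assumptions, not this paper's implementation: the constraint is taken to be linear (C x = d), the weights are a diagonal of inverse variances, and the small matrices are toy values unrelated to the IEEE 33-node case. The function name `constrained_linear_wls` is hypothetical.

```python
import numpy as np

def constrained_linear_wls(H, z, weights, C, d):
    """Minimize (z - Hx)^T W (z - Hx) subject to C x = d.

    Setting the gradient of the Lagrangian to zero gives the linear system
        [2 H^T W H   C^T] [x     ]   [2 H^T W z]
        [C           0  ] [lambda] = [d        ]
    H is constant, so the left-hand matrix only needs to be formed once.
    """
    H = np.asarray(H, dtype=float)
    z = np.asarray(z, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    d = np.atleast_1d(np.asarray(d, dtype=float))
    W = np.diag(np.asarray(weights, dtype=float))
    n, m = H.shape[1], C.shape[0]
    kkt = np.block([[2.0 * H.T @ W @ H, C.T],
                    [C, np.zeros((m, m))]])
    rhs = np.concatenate([2.0 * H.T @ W @ z, d])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n], solution[n:]

# Toy example: three states, five linear measurements, one equality constraint.
H = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, -1, 0], [0, 1, -1]], dtype=float)
x_true = np.array([1.0, 0.98, 0.05])
noise = np.array([0.002, -0.001, 0.001, 0.0, -0.002])
z = H @ x_true + noise
C = np.array([[1.0, 1.0, 1.0]])
d = np.array([x_true.sum()])
x_hat, lam = constrained_linear_wls(H, z, np.ones(5), C, d)
print(x_hat)
```

In the setting of this section, the equivalent current measurements depend on the voltage from the previous iteration, so a step of this kind would be repeated with an updated z until the change in the state falls below the tolerance, while the coefficient matrix is reused.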
  4.4. DT Modeling Process Based on Distribution Network State Estimation
The specific procedure of DT modeling based on the distribution network SE can be summarized as follows:
1. Collect measurement data from the physical distribution network and transfer the data to the DT database in the server via communication methods;
2. Store the historical data in the database, identify the bad data in the real-time incoming measurements, and correct the bad data using the predicted measurements obtained from training on the historical data;
3. Predict the output of PV power plants based on real-time-measured meteorological data and historical meteorological data;
4. Use the linear SE method to iteratively solve the distribution network states with consideration of the effect of PV node power constraints;
5. Obtain a DT model that reflects the real-time state of the distribution network.
According to the description above, the specific flowchart of DT modeling based on distribution network SE is depicted in Figure 7.
  5. Case Study
To simulate the real operating conditions of the distribution network and verify the accuracy of the DT model established in this paper, a hardware-in-the-loop (HIL) simulation method is used. The distribution network model is built in RTDS using RSCAD software to simulate the operation of the actual physical distribution network. The DT mathematical model is established on the server side. Real-time simulation in RTDS communicates with the server via an Ethernet switch, exchanging information using the user datagram protocol (UDP). Compared to the transmission control protocol (TCP), UDP has lower communication latency, enabling faster data transmission from RTDS and ensuring the timeliness of the DT model. Various measurement modules in RTDS measure the distribution network data, which is output from RTDS through communication boards, passed through the Ethernet switch, and then input into the DT database established on the server based on MySQL 8.0.20 (Oracle Corporation, Austin, TX, USA). The DT mathematical model extracts data from the database for calculations, establishes an accurate DT model, and stores the calculation results in the historical database. The DT test platform is shown in 
Figure 8.
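As an illustration of the server-side end of the UDP link, the sketch below decodes one datagram of measurements. The packet layout is an assumption made for the example (a fixed number of big-endian 32-bit floats per datagram), as are the function names and the port number; the actual layout depends on how the RTDS communication board is configured, and the server in this paper is implemented in C++ rather than Python.

```python
import socket
import struct

def decode_measurements(payload, count):
    """Unpack `count` big-endian float32 values from a datagram payload.

    Assumed layout for illustration only: values packed back to back, no header.
    """
    return list(struct.unpack(f">{count}f", payload[:4 * count]))

def receive_once(sock, count):
    """Block until one datagram arrives on `sock` and decode it."""
    payload, _sender = sock.recvfrom(4 * count)
    return decode_measurements(payload, count)

def open_receiver(host="0.0.0.0", port=5005):
    """Create a bound UDP socket; the port number is a hypothetical placeholder."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock
```

UDP delivers each datagram without retransmission or ordering guarantees, so a receiver of this kind would typically also carry a sequence number or timestamp in the payload to detect lost or reordered samples before writing them to the database.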
The IEEE 33-node 12.66 kV distribution network model is built in RTDS, and the topology is shown in Figure 9. In order to simulate system fluctuations during real distribution network operation, the simulation connects real-time fluctuating solar photovoltaic power generation equipment at nodes 8, 16, 22 and 33, with an installed capacity of 1 MW, and adjusts the load data of the other nodes (except generator node 1) according to real load fluctuations. PMUs are installed at nodes 1, 3, 6, 11, 15, 21 and 29, and PMU measurements are highly accurate, with voltage amplitude and phase angle measurement errors of ±0.05% and ±0.005 rad, respectively. All branches are equipped with SCADA meters to measure branch power data, with a measurement error of ±1%. Additionally, the SCADA system measures power injection at nodes 4, 7, 10, 20, 21, 25, 27, and 31 to increase redundancy, with a measurement error of ±0.5%.
The server-side DT model of the distribution network is written in C++ on the Visual Studio 2022 software (Microsoft, Redmond, WA, USA) platform, where part of the C++ code is automatically generated or modified from code written in the Matlab 2021b software. The RTDS simulation of the distribution network model is built on RSCAD version 5.014.1, with the RTDS equipment version being NovaCor 2.0. Testing was performed in the following environment: the CPU was a Core i5-9300H with a base frequency of 2.40 GHz, the RAM was 16 GB, and the GPU was an NVIDIA GTX 1660Ti.
  5.1. Data Evaluation Index
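In this paper, we utilize three statistical measures, namely the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the coefficient of determination $R^{2}$, in order to assess the deviation between the predicted value, the estimated value, and the true value. MAPE, RMSE, and $R^{2}$ can be calculated as follows:

$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{\hat{y}_{i,t} - y_{i,t}}{y_{i,t}}\right| \times 100\%$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_{i,t} - y_{i,t}\right)^{2}}$$

$$R^{2} = 1 - \frac{\sum_{t=1}^{T}\left(\hat{y}_{i,t} - y_{i,t}\right)^{2}}{\sum_{t=1}^{T}\left(y_{i,t} - \bar{y}_{i}\right)^{2}}$$

where $\hat{y}_{i,t}$ is the predicted or estimated value of node $i$ at moment $t$; $y_{i,t}$ is the true value of node $i$ at moment $t$; $\bar{y}_{i}$ is the average value of node $i$ over time; and $T$ represents the number of consecutive time sections. MAPE represents the relative error between the predicted and actual values. RMSE expresses the absolute difference between predicted and actual values. $R^{2}$ stands for the goodness of fit of the model.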
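The three indices can be computed for a single node as in the sketch below. The function names and the sample values are chosen for illustration only, and the formulas are the standard definitions of MAPE, RMSE, and the coefficient of determination.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires nonzero true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mape(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

Note that MAPE becomes large or undefined when true values are close to zero, which is worth keeping in mind for quantities such as PV output around sunrise and sunset.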
  5.2. Identification of Bad Data of DT Database
Measurement data from 2000 time sections in the DT database were selected as the test set, and 5%, 10% and 15% of bad data were then added to this test set for validation. The bad data for voltage phasors and current phasors deviated from the original value by 20% to 30%, upward or downward. Similarly, the bad data for active and reactive power deviated from the original value by 40% to 50%, upward or downward. The measurement time series length T was set to 10, and the measurement data of the T−1 time steps preceding the current measurement moment were used to obtain the corrected measurement data. To compare the method proposed in this paper with alternative approaches, the residual search method and the DBSCAN clustering method were selected for analysis. The results of this comparison are presented in Table 1.
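The construction of such a test set can be sketched as follows. This is an illustrative reading of the setup, not the authors' script: the function name `inject_bad_data` is hypothetical, the corrupted samples are chosen uniformly at random, and each corrupted value is scaled up or down by a relative deviation drawn uniformly from the stated range.

```python
import numpy as np

def inject_bad_data(values, fraction, low, high, rng):
    """Corrupt a fraction of a 1-D series by a relative deviation in [low, high).

    Returns the corrupted copy and a boolean mask marking the corrupted samples.
    """
    values = np.asarray(values, dtype=float)
    corrupted = values.copy()
    n_bad = int(round(fraction * values.size))
    idx = rng.choice(values.size, size=n_bad, replace=False)
    magnitude = rng.uniform(low, high, size=n_bad)
    sign = rng.choice([-1.0, 1.0], size=n_bad)  # increase or decrease
    corrupted[idx] = values[idx] * (1.0 + sign * magnitude)
    mask = np.zeros(values.size, dtype=bool)
    mask[idx] = True
    return corrupted, mask

# 2000 time sections of a toy voltage series, 5% bad data with a 20% to 30% deviation.
rng = np.random.default_rng(0)
clean = np.full(2000, 1.0)
dirty, bad_mask = inject_bad_data(clean, 0.05, 0.20, 0.30, rng)
print(int(bad_mask.sum()))
```

Keeping the mask alongside the corrupted series makes it straightforward to count missed and false detections for each identification method.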
As can be observed in Table 1, the traditional residual search method and the clustering method fail to account for the temporal correlation of fluctuating data. Consequently, as the proportion of bad data increases, the missing detection rate and false detection rate of these two methods rise significantly, whereas the method proposed in this paper maintains a high degree of discrimination in all cases, with missing and false detection rates that are consistently lower than those of the residual search method and the clustering method. This enables the accurate identification of bad data.
  5.3. BILSTM Measurement Data Prediction Accuracy
SCADA measurements account for the largest proportion of measurement data, and the proportion of bad data among them is also higher. Consequently, this section mainly focuses on verifying the prediction accuracy of branch power based on SCADA measurements. The BP neural network [26], LSTM [27], and BILSTM methods are compared. Measurement data from 2500 consecutive time sections in the DT database were selected for experimentation, with 80% of these used as the training set and 20% as the test set. The predicted values of active and reactive power for branch 3–4 in the 500 time sections of the test set were selected for comparison. Figure 10 and Figure 11 demonstrate that the predicted values of BILSTM have the best fit to the true values.
In Table 2, the predictive accuracy of each method is evaluated by calculating the mean of the MAPE, RMSE and $R^{2}$ values for all predicted branch power measurements. As can be noted from Table 2, BP neural networks have lower accuracy for time-series problems. In contrast, LSTM networks are more effective than BP networks for sequential data such as time series. BILSTM networks combine forward and reverse LSTM structures to capture the context and dependencies in the time-series data more comprehensively, and achieve a higher prediction accuracy for bad data correction.
  5.4. DT Model Validation Based on State Estimation
  5.4.1. Accuracy of PV Output Prediction
The method proposed in this paper was used to predict the output of the PV power stations, and the predicted values of the PV power station at node 22 over 300 consecutive time intervals were selected for comparison. The ANN and BP methods were compared with the BILSTM method proposed in this paper for predicting photovoltaic output. Real-time meteorological data and historical meteorological data were input into the trained ANN, BP, and BILSTM models, and the photovoltaic output prediction results are shown in Figure 12.
From the figure, it can be seen that the proposed BILSTM method fits the actual photovoltaic output better than the ANN and BP methods. The calculated MAPE for the BILSTM predicted data is 16.95%, and the RMSE is 4.1%. It can be concluded that the BILSTM-based photovoltaic output prediction proposed in this paper exhibits good forecasting performance, and its predictions can serve as pseudo-measurements to enhance measurement redundancy in state estimation.
  5.4.2. Analysis of DT Model Accuracy
This section compares the DT model proposed in this paper with DT models constructed using the WLS SE method and the WLAV SE method, which are commonly used in distribution networks. A measurement dataset comprising 500 consecutive measurement time sections was constructed, with 10% bad data added to simulate the presence of bad data in a real-world measurement scenario. Data for PV nodes were predicted using the method proposed in this paper. The actual values of the voltage phasors at a randomly selected time section were compared with the estimated values, and the results are presented in Figure 13. Node 10 was selected at random, and the state estimation results for time sections 0–50 were obtained, as illustrated in Figure 14. Figure 13 and Figure 14 demonstrate that the DT model presented in this paper is more accurate and more closely aligned with the real distribution network model. It is capable of tracking the changes in the physical distribution network model in a synchronized manner.
The total error in SE was calculated for all nodes over a period of 500 time sections, as shown in Table 3. From the comparison in Table 3, it can be observed that the traditional WLS SE model relies on data redundancy to enhance the accuracy of SE. However, it lacks the capacity to identify and rectify bad data, resulting in a considerable error. In contrast, the WLAV SE model is capable of automatically reducing the weight of bad measurement data and therefore performs robustly. Nevertheless, it cannot correct the bad data, so the effective measurement redundancy is reduced. The method proposed in this paper is robust against bad data and is also able to rectify such data, thereby satisfying the SE redundancy requirement, which further improves the accuracy of the SE.
  5.4.3. Analysis of DT Model Efficiency
To fulfil the real-time requirements of the DT model of the distribution network, the SE process must be able to provide the real-time distribution network state efficiently. This section runs SE on the distribution network and compares the speed of the different SE models over a specific time period. Table 4 displays the running times of the various SE models.
As found in Table 4, the WLS method requires more iterations when confronted with complex measurement scenarios. Each iteration requires the recalculation of the Jacobi matrix, so the method is slower to converge and less computationally efficient. The WLAV method employs an absolute value loss function, which requires computing the absolute value of each residual in order to minimize the error, and its optimization problem is more complex. This makes it difficult to meet real-time demands, despite the method's robust performance. This paper adopts an efficient method for identifying and correcting bad data, which is used to preprocess the measurement data. This preprocessing makes the SE more robust to erroneous measurements. Furthermore, the linearization method reduces the Jacobi matrix to a constant matrix, which shortens each iteration, reduces the memory required, and greatly accelerates the computation. This approach is suitable for the real-time demands of the DT model.
  6. Conclusions
This paper addresses the challenge of developing an accurate DT model for distribution networks. It begins by adopting a method that considers the temporal correlation of measurement data to identify and exclude bad data. This is followed by the use of a BILSTM neural network training method to address the issue of missing measurement data, thereby ensuring the observability of the distribution network. Secondly, meteorological data and the BILSTM method are used to predict the real-time output of PV power plants. Subsequently, a linear SE algorithm is employed to model the DT of the distribution network in a rapid and efficacious manner. Finally, the constructed DT model is validated through the real-time synchronous operation of the RTDS and server models. Following verification of the simulation, the method for identifying bad data presented in this paper has a low missing and false detection rate. Moreover, the BILSTM neural network exhibits a higher degree of prediction accuracy than other neural networks in the prediction of measurement data. The linear SE method with PV integration proposed in this paper ensures the redundancy of SE measurements, improves the accuracy and iteration speed of the SE, and provides a guarantee for the accuracy and real-time performance of the DT model. The DT modeling method proposed in this paper provides new approaches and insights for the construction of smart grids. Based on efficient and precise real-time monitoring and state awareness, it facilitates the efficient operation and intelligent management of smart grids, thereby driving further optimization and enhancement of grid systems. However, the digital twin model established in this paper cannot track changes in the distribution network topology in real time, and the state estimation results do not vary with changes in the topology. In the future, we will further investigate digital twin modeling that can track changes in the distribution network topology. 
Based on the established DT model, we aim to achieve functions such as state prediction, fault location, and coordinated control, and explore its applicability and performance optimization in different grid scenarios.