Intelligent Identification of the Line-Transformer Relationship in Distribution Networks Based on GAN Processing Unbalanced Data

Wang, Yan; Zhang, Xinyu; Liu, Haofeng; Li, Boqiang; Yu, Jinyun; Liu, Kaipei; Qin, Liang

doi:10.3390/su14148611


by Yan Wang ¹, Xinyu Zhang ²,*, Haofeng Liu ², Boqiang Li ², Jinyun Yu ², Kaipei Liu ² and Liang Qin ²

¹ State Grid Corporation, Beijing 100053, China

² School of Electrical Engineering, Wuhan University, Wuhan 430072, China

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(14), 8611; https://doi.org/10.3390/su14148611

Submission received: 18 May 2022 / Revised: 5 July 2022 / Accepted: 9 July 2022 / Published: 14 July 2022

(This article belongs to the Special Issue Advances in Sustainable Electrical Engineering)


Abstract: An incorrect line-transformer relationship is one of the main reasons why line loss assessment fails in distribution networks with voltage levels of 10 kV and below. The traditional manual method of verifying the line-transformer relationship is time-consuming, labor-intensive and inefficient. At the same time, because samples with an abnormal line-transformer relationship are scarce, the unbalanced sample data reduces the accuracy of artificial intelligence algorithms. To this end, this paper proposes an intelligent identification method for the distribution network line-transformer relationship based on Generative Adversarial Networks (GAN) processing unbalanced data. Firstly, data preprocessing and feature extraction are performed on the input power of the distribution line and the power consumption of each distribution transformer; then, a GAN-based model is built to expand the small number of abnormal line-transformer relationship samples, so as to solve the problem of unbalanced sample data distribution; finally, a support vector machine (SVM) is established to classify the line-transformer relationship. The simulation results show that, compared with the traditional Synthetic Minority Oversampling Technique (SMOTE) for processing unbalanced data, the proposed GAN-based data augmentation method significantly improves the classification effect. In addition, the recall rate for all three types of line-transformer relationship (line hanging error, magnification error and normal) under the proposed identification method is more than 92%, which demonstrates the effectiveness and feasibility of the method.

Keywords: line-transformer relationship; unbalanced data; feature extraction; generative adversarial networks

1. Introduction

Line loss rate is an important technical and economic indicator of power enterprises [1]. On the one hand, it is a comprehensive response to the management level of the enterprise, and on the other hand, it is also directly related to the economic interests of the enterprise. In order to reduce the power loss of the power grid and promote the energy conservation and emission reduction of the power grid and the development of green power, the State Grid Corporation of China has built a “Line Loss Management Platform for the Same Period” to further improve the level of lean and information management of line losses [2].

In the treatment of line losses in the same period, lines with voltage levels of 35 kV and above are usually managed well, and their line loss assessment indicators reach the standard. This is because there are few lines at this voltage level, their topology is simple and rarely changes, and their historical account information can be updated in time. However, for distribution networks with voltage levels of 10 kV and below, the frequent operation of knife switches, the untimely update of historical account information, and the untimely maintenance of meters and equipment often make the recorded connection relationship between the distribution line and the distribution transformer (the line-transformer relationship) incorrect. As a result, the compliance rate of line loss assessment indicators at this voltage level is very low and the lines are difficult to manage, which brings great challenges to line loss control in the same period. Therefore, sorting out the line-transformer relationship of 10 kV lines has become the primary task for improving the compliance rate of line losses in the same period of 10 kV branch lines [3]. The traditional carpet-type on-site verification requires a lot of manpower and material resources, and its efficiency is very low. Consequently, it is necessary to study a low-cost 10 kV line loss control method that can quickly and effectively reveal the line-transformer relationship of 10 kV lines, accurately locate line-transformer relationship errors, and improve the efficiency of line loss control during the same period [4].

During the operation of the distribution network, a large amount of operating data is generated, such as the output voltage of the distribution transformer and the daily input power of the distribution line. These operating data not only directly reflect the operating status of the distribution network, but also indirectly reflect its line-transformer relationship. At present, identification of the line-transformer relationship is mainly based on correlation analysis of voltage curves [5]. The authors of [6,7] determine the similarity of user voltage curves according to the Pearson correlation coefficient, check the line to which each transformer belongs, and identify transformers that do not belong to the line. The author of [8] uses the correlation coefficient between user voltage sequences as the correlation measure in an outlier algorithm to calculate the degree of outliers in the user ring domain, so as to find users with the wrong line-transformer relationship. Although these methods based on the correlation analysis of voltage curves can solve the problem of an inaccurate line-transformer relationship to a certain extent, they have three limitations. Firstly, the uncertainty of distributed generation changes the voltage of each node of the distribution network, which in turn affects the calculated similarity of the voltage curves [9]. Secondly, for three-phase power supply users, when the load cannot be completely balanced, the phase voltages at the outlet of the distribution transformer are asymmetrical, so the line-transformer relationship cannot be discriminated effectively through the correlation of single-phase voltage fluctuation curves. Thirdly, these methods cannot identify incorrect equipment parameters among the line-transformer relationship errors.

Researchers have also made some new attempts. The author of [10] proposes a topology recognition method for distribution networks based on branch active power, but the method presupposes that the power of each branch of the distribution network is completely observable. The author of [11] proposes an automatic identification method for low-voltage distribution network topology based on HPLC, but this method is limited to users who support carrier communication.

In summary, this paper uses the electricity data of the distribution network and a support vector machine model to mine the potential correlation between the electricity information and the different line-transformer relationships, so as to realize intelligent identification of the line-transformer relationship.

At the same time, in the intelligent identification of the line-transformer relationship, distribution transformers with an abnormal line-transformer relationship often account for only a very small share. Therefore, the collected original data set is an imbalanced data set; that is, the number of samples in each category varies greatly [12,13,14]. If the original data set is fed directly into classifier training, the classification results will be biased toward the majority class. That is to say, the classification accuracy of the normal category is high, but the classification accuracy of the abnormal categories, which make up only a small proportion, is very low [15]. In engineering practice, misjudging an abnormal line-transformer relationship has serious consequences. Therefore, when classifying the line-transformer relationship, it is necessary to consider the limitations of traditional classifier algorithms on unbalanced data sets and adopt appropriate unbalanced data processing methods to improve their classification effect [16,17,18]. This article deals with imbalanced data from the perspective of data augmentation: by augmenting the minority class samples, the number of samples in each class can be balanced. The literature [19] proposes the Synthetic Minority Oversampling Technique (SMOTE) to deal with imbalanced data, but this method starts from the local neighborhood of each sample point and does not consider the overall distribution of the data set, so it is not suitable for sample sets whose class domains intersect or overlap. Compared with traditional generative models, Generative Adversarial Networks (GAN) can generate synthetic data that is close to the real data without being constructed directly from individual real samples, which expands data diversity and helps avoid over-fitting. GANs are widely used in computer vision, medicine, natural language processing and other fields. Therefore, this paper proposes an abnormal data expansion method based on Generative Adversarial Networks, which expands the abnormal line-transformer relationship samples that account for only a very small part of the whole feature set, so as to solve the classification problem under the imbalanced data set.

Combined with the above analysis, this paper proposes a new method for identifying the line-transformer relationship, which overcomes the limitation of the correlation analysis method based on the voltage curve and provides a new idea in the processing of the unbalanced data of the line-transformer relationship.

The remainder of the article is organized as follows: Section 2 details the methodology adopted to carry out the study; the experimental results obtained and their analysis are presented in Section 3; and the conclusions end the paper in Section 4.

2. Methodology

2.1. Technical Route

The proposed method for intelligent identification of the line-transformer relationship in distribution networks is based on the power supply of distribution lines and the power consumption of distribution transformers. It extracts feature information for the different line-transformer relationships from large volumes of data and uses it for classification. At the same time, a GAN-based data generation model for abnormal line-transformer relationship samples is proposed and used to expand the abnormal samples, which account for only a very small part of the whole feature set. In this way, the number of samples is balanced among the different line-transformer relationship categories, which improves the accuracy of line-transformer relationship recognition and provides a new idea for the study of unbalanced data processing methods.

The specific implementation process is as follows: firstly, collect from the user acquisition system the daily input power of each 10 kV distribution line and the daily power consumption of each distribution transformer connected to it; secondly, preprocess these data and calculate the daily power loss of each 10 kV distribution line; thirdly, establish an original data set consisting of the daily power loss of the distribution lines, the daily input power of the distribution lines and the daily power consumption of each transformer; fourthly, extract features from the electricity data set using four parameters (Pearson coefficient, relative variation coefficient, fluctuation coefficient and slope coefficient ratio); fifthly, build a GAN-based model for generating samples of abnormal line-transformer relationship, so as to balance the number of samples among the different line-transformer relationship categories; finally, establish a support vector machine classifier model to classify the line-transformer relationship of the distribution network.

2.2. Intelligent Identification of the Line-Transformer Relationship in Distribution Networks Based on GAN Processing Unbalanced Data

2.2.1. Feature Extraction

Feature extraction can improve the recognition of the line-transformer relationship. This paper extracts 12 characteristic quantities from the daily power loss of the distribution line, the daily input power of the distribution line, and the daily power consumption of each transformer. These characteristic quantities are built from four parameters: the Pearson coefficient, the relative variation coefficient, the fluctuation coefficient and the slope coefficient ratio. The specific implementation process is as follows:

1. Pearson coefficient, which measures the degree of correlation between two types of power data [6] and is calculated as:

r = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2} \sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}}

(1)

where r is the Pearson correlation coefficient, whose value lies between −1 and 1. The closer | r | is to 1, the higher the degree of linear correlation; y_{i} represents the daily input power or line loss power of the 10 kV line, and \bar{y} represents its average value; x_{i} represents the daily power consumption of each transformer connected to the line, and \bar{x} represents its average value; n represents the number of days. For convenience of discussion, the Pearson coefficient between the input power of the 10 kV line and the power consumption of each transformer is defined as r_{s}, and the Pearson coefficient between the power loss of the 10 kV line and the power consumption of each transformer is defined as r_{l}.

2. Relative variation coefficient, which is the ratio of two dispersion coefficients and is calculated as:

c_{s} = \frac{σ_{s}}{μ_{s}}, c_{l} = \frac{σ_{l}}{μ_{l}}, c_{c} = \frac{σ_{c}}{μ_{c}}

(2)

C_{c s} = \frac{c_{c}}{c_{s}}, C_{c l} = \frac{c_{c}}{c_{l}}

(3)

where c_{s} represents the dispersion coefficient of the time series of the input power of the 10 kV line; c_{l} represents the dispersion coefficient of the time series of the power loss of the 10 kV line; c_{c} represents the dispersion coefficient of the time series of the power consumption of each transformer under the line; σ represents the standard deviation and μ the average value of the corresponding time series; C_{c s} represents the relative dispersion coefficient between the power consumption of each transformer and the input power of the 10 kV line; C_{c l} represents the relative dispersion coefficient between the power consumption of each transformer and the power loss of the 10 kV line.

3. Fluctuation coefficient, which is used to measure the relative fluctuation between two kinds of power data, and its calculation formula is:

d = \frac{\sum_{i = 2}^{n} \frac{p_{i} - p_{i - 1}}{q_{i} - q_{i - 1}}}{n - 1}, i = 2, 3, \dots, n

(4)

where p represents the daily power consumption of each transformer; q represents the daily input power or daily power loss of the 10 kV line; n represents the number of days. For convenience of discussion, d_{s} is defined as the fluctuation coefficient between the time series of the input power of the 10 kV line and the time series of the power consumption of each transformer connected to the line, and d_{l} is defined as the fluctuation coefficient between the time series of the power loss of the 10 kV line and the time series of the power consumption of each transformer connected to the line.

4. Slope coefficient ratio. For each power data series, determine the start point, the end point, and the peak and valley points (excluding the start and end points); next, calculate the slopes between these points, where x represents the slope between the start point and the end point, y represents the slope between the end point and the peak point, and z represents the slope between the peak point and the valley point. The slope coefficient ratios of the power consumption of each transformer to the line input power are calculated according to Formula (5), and the slope coefficient ratios of the power consumption of each transformer to the power loss of the line are calculated according to Formula (6):

X_{s} = \frac{x_{c}}{x_{s}}, Y_{s} = \frac{y_{c}}{y_{s}}, Z_{s} = \frac{z_{c}}{z_{s}}

(5)

X_{l} = \frac{x_{c}}{x_{l}}, Y_{l} = \frac{y_{c}}{y_{l}}, Z_{l} = \frac{z_{c}}{z_{l}}

(6)

where X_{s}, Y_{s} and Z_{s} represent the slope coefficient ratios of the power consumption of each transformer to the line input power, and X_{l}, Y_{l} and Z_{l} represent the slope coefficient ratios of the power consumption of each transformer to the power loss of the line; the subscripts c, s and l denote the transformer power consumption, the line input power and the line power loss, respectively. Finally, the 12 characteristic quantities of each transformer are calculated by Formulas (1)–(6): r_{s}, r_{l}, C_{c s}, C_{c l}, d_{s}, d_{l}, X_{s}, Y_{s}, Z_{s}, X_{l}, Y_{l} and Z_{l}.
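As an illustration of Formulas (1)–(4), the following minimal Python sketch computes r_{s}, C_{c s} and d_{s} for one transformer. The function names and the five-day series are hypothetical, the population standard deviation is assumed for σ, and the fluctuation coefficient assumes that the line series changes every day (otherwise the ratio in Formula (4) is undefined). The slope coefficient ratios are left out because they depend on how the peak and valley points are chosen.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient, Formula (1)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def dispersion(series):
    """Dispersion coefficient sigma / mu, Formula (2); population sigma is an assumption."""
    mu = mean(series)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in series) / len(series))
    return sigma / mu

def relative_variation(consumption, reference):
    """Relative variation coefficient, Formula (3)."""
    return dispersion(consumption) / dispersion(reference)

def fluctuation(p, q):
    """Fluctuation coefficient, Formula (4); assumes q[i] != q[i - 1] for every day."""
    n = len(p)
    ratios = [(p[i] - p[i - 1]) / (q[i] - q[i - 1]) for i in range(1, n)]
    return sum(ratios) / (n - 1)

# Hypothetical five-day series (kWh): transformer consumption and line input power.
consumption = [10.0, 12.0, 11.0, 14.0, 13.0]
line_input = [100.0, 120.0, 110.0, 140.0, 130.0]

r_s = pearson(consumption, line_input)
C_cs = relative_variation(consumption, line_input)
d_s = fluctuation(consumption, line_input)
print(r_s, C_cs, d_s)
```

In this toy case the line input is exactly ten times the transformer consumption, so the two series are perfectly correlated, have the same dispersion coefficient, and every daily change ratio equals 0.1.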

2.2.2. The GAN-Based Model for Generating Samples of Abnormal Line-Transformer Relationship

Among methods for processing unbalanced data, the traditional Synthetic Minority Oversampling Technique (SMOTE) starts from the local neighborhood of each sample point and does not consider the overall distribution of the data set, so it is not applicable to sample sets with intersecting or overlapping class domains [20]. A GAN can generate data that follows the overall distribution of the real samples and can thus expand complex sample sets.
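To make the "local" nature of SMOTE concrete, the sketch below shows only its interpolation step: a synthetic point is placed at a random position on the segment between a minority sample and one of its minority-class neighbors. The function name and the two-dimensional points are hypothetical, and the nearest-neighbor search that normally precedes this step is omitted.

```python
import random

def smote_sample(sample, neighbor, rng):
    """Interpolate one synthetic point between a minority sample and a neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(0)
sample = [0.2, 0.8]      # hypothetical minority-class feature vector
neighbor = [0.6, 0.4]    # hypothetical nearest minority-class neighbor
synthetic = smote_sample(sample, neighbor, rng)
print(synthetic)
```

Because every synthetic point lies on such a segment, the new samples can fall inside another class's region when the class domains overlap, which is the limitation described above.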

In this paper, we propose a GAN-based method for generating samples of abnormal line-transformer relationship. The model structure of this method is shown in Figure 1 and consists mainly of a generator model and a discriminator model. The feature data of each sample in the line-transformer relationship data set is a one-dimensional tensor; therefore, both the generator model and the discriminator model are designed as fully connected neural networks. The generator produces synthetic data intended to fool the discriminator. The discriminator judges whether its input is real or generated; its purpose is to pick out the “fake data” made by the generator. In the process of training the GAN, the discriminator model needs to judge real samples as “true” and generated samples as “false” as often as possible, that is, to maximize the objective function, while the generator model needs to reduce the probability that the generated samples are identified by the discriminator, that is, to minimize the objective function. Therefore, the generator and the discriminator form a competitive relationship [21,22]. When training the generator and the discriminator, one network is fixed while the weights of the other are updated, and the two alternate until the generator and the discriminator reach a dynamic balance, that is, a Nash equilibrium [23].

The objective function of this model is as follows:

\min_{G} \max_{D} V (D, G) = E_{x ~ P_{d a t a} (x)} [\log (D (x))] + E_{z ~ P_{z} (z)} [\log (1 - D (G (z)))]

(7)

where D (x) represents the probability that the discriminator judges a real sample x to be real, and 1 - D (G (z)) is the probability that the discriminator judges the generated sample G (z) to be fake. x ~ P_{d a t a} (x) means that x is sampled from the true sample distribution P_{d a t a} (x), and z ~ P_{z} (z) means that z is sampled from the noise distribution P_{z} (z).

This is a min-max optimization problem [24]. The discriminator is trained to assign the correct labels to the training samples with maximum probability, that is, to maximize \log D (x) and \log (1 - D (G (z))). The generator is trained to minimize \log (1 - D (G (z))). In the training process, one network is fixed while the parameters of the other network are updated, and the two networks are iterated alternately. Therefore, the objective function can be decomposed into the two optimization problems shown in Formulas (8) and (9). When optimizing the discriminator, the discriminant network is a two-class classifier whose goal is to determine as accurately as possible whether an input sample is real or generated; that is, its output for a real sample should tend to 1 and its output for a generated sample, D (G (z)), should tend to 0. When optimizing the generator, the goal is for D (G (z)) to tend to 1, which minimizes the loss of the generator and thus reflects the idea of confrontation.

\max_{D} V (D, G) = E_{x ~ P_{d a t a} (x)} [\log (D (x))] + E_{z ~ P_{z} (z)} [\log (1 - D (G (z)))]

(8)

\min_{G} V (D, G) = E_{z ~ P_{z} (z)} [\log (1 - D (G (z)))]

(9)

During the training process, the generator model and the discriminator model are trained alternately and iteratively: first the generator model is fixed and the discriminator model is trained, then the discriminator model is fixed and the generator model is trained. When training the discriminator model, the sample data produced by the current generator model and the real sample data are concatenated as the input of the discriminator model, with the label of the fake samples set to 0 and the label of the real samples set to 1. The discriminator model then outputs a probability value between 0 and 1, and the loss function is formed from the difference between this probability value and the target label. The weights of the neural network in the discriminator are updated by stochastic gradient descent. When training the generator model, the generator model and the discriminator model are treated as a whole: a set of random vectors is input, the generator model produces sample data, and the discriminator model evaluates the generated sample data to obtain the output result. The loss function is formed from the difference between this result and 1, and stochastic gradient descent is used to update the weights of the neural network in the generator. The generator and the discriminator iterate alternately in this way until P_{g} is equal to P_{d a t a} (where P_{g} represents the discrimination result of the generated samples, and P_{d a t a} represents the discrimination result of the real sample data with abnormal line-transformer relationship). That is, when the discriminator model outputs a probability value of 0.5 for both the generated samples and the real samples, the objective function of the GAN model reaches its global optimal solution [25].
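The objective in Formulas (7)–(9) can be evaluated numerically from discriminator outputs alone. The sketch below is a hedged illustration, not the training code used in the paper: the function names and the discriminator outputs are hypothetical, and the expectations are replaced by sample averages. It shows that when the discriminator outputs 0.5 for every sample, V (D, G) equals −log 4, the value at the equilibrium described above.

```python
import math

def value_function(d_real, d_fake):
    """Sample-average estimate of V(D, G) in Formula (7).

    d_real: discriminator outputs D(x) on real samples
    d_fake: discriminator outputs D(G(z)) on generated samples
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """Sample-average estimate of the generator objective in Formula (9)."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)

# Hypothetical outputs early in training: D separates real from generated samples well.
early = value_function([0.9, 0.95], [0.1, 0.05])
# At equilibrium D outputs 0.5 for every sample, real or generated.
equilibrium = value_function([0.5, 0.5], [0.5, 0.5])
print(early, equilibrium, -math.log(4))
```

The generator objective decreases as D (G (z)) moves toward 1, which matches the statement that the generator is trained so that its samples are judged as real.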

The trained generator model is then used to perform data augmentation on the samples of abnormal line-transformer relationship, which account for only a very small part of the whole feature set. The generated samples follow the overall distribution of the real line-transformer relationship sample data, so the number of samples is balanced among the different line-transformer relationship categories [26].

2.2.3. Support Vector Machine

The Support Vector Machine (SVM) has a solid foundation in statistical learning theory and handles high-dimensional data well, which helps avoid the curse of dimensionality. It has become a widely studied machine learning classification technique. Its main idea is to find an optimal classification hyperplane in the sample space that correctly separates as many data points of the two classes as possible while keeping the separated data points as far as possible from the classification surface [27].

Given a training set T = {(x_{1}, y_{1}), \dots, (x_{l}, y_{l})} \in {(R^{n} \times Y)}^{l}, where x_{i} \in R^{n} in each sample point (x_{i}, y_{i}) is a vector of n attributes and y_{i} \in Y = {+ 1, - 1} is the corresponding class label, the support vector machine tries to find a real function g (x) = ω^{T} \cdot x + b on R^{n} that maximizes the classification margin, and then uses the decision function f (x) = sgn (g (x)) to infer the class y corresponding to any input x. For linear classification problems, finding the optimal classification hyperplane can be expressed as solving the following quadratic program:

\begin{array}{l} \min \frac{1}{2} {‖ ω ‖}^{2} + C \sum_{i = 1}^{l} ξ_{i} \\ s . t . y_{i} (ω^{T} \cdot x_{i} + b) \geq 1 - ξ_{i}, i = 1, 2, \dots, l \\ ξ_{i} \geq 0, i = 1, 2, \dots, l \end{array}

(10)

where ω is the normal vector of the hyperplane, b is the bias of the hyperplane, ξ_{i} \geq 0 is the slack variable that allows data point x_{i} to deviate, and C > 0 is the penalty factor.

To facilitate the solution, construct the Lagrangian function:

L (ω, b, ξ, α, β) = \frac{1}{2} {‖ ω ‖}^{2} + C \sum_{i = 1}^{l} ξ_{i} - \sum_{i = 1}^{l} α_{i} (y_{i} (ω^{T} \cdot x_{i} + b) - 1 + ξ_{i}) - \sum_{i = 1}^{l} β_{i} ξ_{i}

(11)

Minimize L with respect to ω, b and ξ and substitute the result back into (11); the dual problem of the original problem (10) is then obtained:

\begin{array}{l} \max \sum_{i = 1}^{l} α_{i} - \frac{1}{2} \sum_{i = 1}^{l} \sum_{j = 1}^{l} α_{i} α_{j} y_{i} y_{j} x_{i}^{T} x_{j} \\ s . t . \sum_{i = 1}^{l} α_{i} y_{i} = 0 \\ 0 \leq α_{i} \leq C, i = 1, 2, \dots, l \end{array}

(12)

Solve the dual problem (12) to get α_{i}, and then derive ω and b.

For nonlinear classification problems, it is necessary to map the nonlinear data set to a high-dimensional linear space through the kernel function transformation, so that the samples are linearly separable in the mapped feature space. Then the optimal classification surface can be obtained [28,29,30].
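As a small worked example of the linear case, the following sketch evaluates the objective of Formula (10) for a fixed, hand-picked hyperplane. It does not solve the quadratic program; the function names, samples, weights and penalty factor are all hypothetical, and each slack variable is set to the smallest value that satisfies the constraints, ξ_{i} = max(0, 1 − y_{i} (ω^{T} \cdot x_{i} + b)).

```python
def decision_value(w, b, x):
    """g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def primal_objective(w, b, samples, labels, C):
    """Objective of Formula (10) with each slack at its smallest feasible value."""
    slacks = [max(0.0, 1.0 - y * decision_value(w, b, x))
              for x, y in zip(samples, labels)]
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks

def classify(w, b, x):
    """Decision function f(x) = sgn(g(x)); g(x) = 0 is assigned to +1 in this sketch."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical two-dimensional samples; the third one lies on the wrong side.
samples = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.5]]
labels = [1, -1, -1]
w, b, C = [1.0, -1.0], 0.0, 10.0

objective, slacks = primal_objective(w, b, samples, labels, C)
print(objective, slacks)
```

Only the misclassified third sample needs a nonzero slack, and the penalty factor C controls how strongly that violation is weighted against the margin term.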

3. Experimental Results and Analysis

3.1. Data Description

We take as an example the line-transformer relationship management of eighteen 10 kV distribution lines in a certain area and their 449 connected distribution transformer users (including special transformers and public transformers). One month of daily input power data for each distribution line and the daily power consumption data of the transformers connected to it were collected from the user power collection system. Since these 18 distribution lines are demonstration lines for line loss control of the power grid company, the line-transformer relationship category of each transformer was obtained at the same time. In practice, line-transformer relationship errors take two main forms: the user is connected to the wrong distribution line, or the power meter magnification recorded by the system does not match the on-site value. These are referred to as line hanging error and magnification error, respectively. Table 1 shows the data distribution of each category.

The data from this area are used to verify the practical effect of the proposed line-transformer relationship identification method. The main steps of the method are shown in Figure 2.

3.2. Data Preprocessing

During data collection, we found missing or abnormal values in the collected power consumption of the transformers. To improve the accuracy of classification, the collected power data were preprocessed as follows. For missing data, if the amount of missing electricity consumption data is greater than 20% of the collection amount, the daily electricity consumption sequence of that transformer user is excluded from the sample; if it is less than 20% of the collection amount, each missing value is replaced by the average of its adjacent values. For abnormal data, if the amount of abnormal electricity consumption data is greater than 20% of the collection amount, the daily electricity consumption sequence of that transformer user is excluded from the sample; if it is less than 10% of the collection amount, each abnormal value is replaced by the average of its adjacent values; and if it is higher than 10% but lower than 20% of the collection amount, the abnormal values are corrected by the smoothing correction method.
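The missing-data rule can be sketched as follows. This is one possible reading rather than the authors' implementation: missing readings are represented by None, "adjacent values" is taken to mean the nearest observed reading on each side, and the function name and sample series are hypothetical. A series with exactly 20% missing values is not covered by the rule as stated and is treated here as repairable.

```python
def fill_missing(series, max_missing_ratio=0.2):
    """Return a repaired copy of a daily series, or None if it should be excluded."""
    missing = sum(1 for v in series if v is None)
    if missing / len(series) > max_missing_ratio:
        return None  # too many gaps: drop this transformer user from the sample
    repaired = list(series)
    for i, v in enumerate(series):
        if v is None:
            # Nearest observed readings on each side, taken from the original series.
            left = next((series[j] for j in range(i - 1, -1, -1)
                         if series[j] is not None), None)
            right = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            neighbors = [x for x in (left, right) if x is not None]
            repaired[i] = sum(neighbors) / len(neighbors)
    return repaired

# Hypothetical ten-day series with one gap (10% missing): the gap is filled.
series = [10.0, None, 14.0, 15.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0]
repaired = fill_missing(series)
# Hypothetical five-day series with two gaps (40% missing): the series is excluded.
excluded = fill_missing([1.0, None, None, 4.0, 5.0])
print(repaired, excluded)
```

The same structure would apply to abnormal values once a detection rule is fixed; the source does not specify how abnormal values are detected or how the smoothing correction is defined, so neither is sketched here.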

Using the preprocessed input power of each distribution line and the power consumption of each distribution transformer, the daily power loss of the distribution line was calculated. The original database for line-transformer relationship identification was then established from the input power of the distribution lines, the power consumption of each distribution transformer and the power loss of the distribution lines. Taking one of the lines, line A, as an example, the power data is shown in Table 2.
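The text does not spell out how the daily power loss is calculated. A minimal sketch, assuming the usual definition (daily line input minus the summed daily consumption of all transformers recorded on that line), is given below; the function name and the three-day values are hypothetical.

```python
def daily_line_loss(line_input, transformer_consumption):
    """Daily loss = line input minus the summed consumption of the connected transformers.

    line_input: one value per day
    transformer_consumption: one list per transformer, each with one value per day
    """
    return [
        supplied - sum(day_values)
        for supplied, day_values in zip(line_input, zip(*transformer_consumption))
    ]

# Hypothetical three-day data (kWh) for one line with two transformers.
line_input = [100.0, 110.0, 105.0]
transformer_consumption = [
    [60.0, 65.0, 62.0],  # transformer 1
    [36.0, 40.0, 39.0],  # transformer 2
]
loss = daily_line_loss(line_input, transformer_consumption)
print(loss)
```

Under this definition, a transformer recorded on the wrong line or with a wrong meter magnification distorts the loss series, which is why the loss appears among the features alongside the input power.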

3.3. Feature Extraction

Feature extraction can improve the performance of the classifier. In this paper, we extracted 12 characteristic quantities from the daily power loss of the distribution line, the daily input power of the distribution line and the daily power consumption of each transformer according to four parameters (Pearson coefficient, relative variation coefficient, fluctuation coefficient and slope coefficient ratio): r_{s}, r_{l}, C_{c s}, C_{c l}, d_{s}, d_{l}, X_{s}, Y_{s}, Z_{s}, X_{l}, Y_{l} and Z_{l}. Finally, a feature set of 449 × 28 can be obtained, which is shown in Table 3.

3.4. Generating Samples of Abnormal Line-Transformer Relationship Based on GAN

It can be seen from Table 1 that among the original 449 transformer samples, the numbers in the three categories of line hanging error, magnification error and normal are 12, 42 and 395, respectively. The distribution across the three categories is extremely unbalanced. If these data are used directly to train the classifier model, the classification results will be biased toward the majority class; that is, the classification accuracy of the normal category is high, but the classification accuracy of line hanging error and magnification error is very low. The GAN-based sample generation model is therefore used to expand the two minority categories, line hanging error and magnification error, so that the number of samples is balanced among the categories. This addresses the inaccurate classification that traditional classifier models produce on imbalanced datasets [31,32,33].

Following Figure 1, we build the GAN-based model for generating samples of abnormal line-transformer relationship in MATLAB R2020a. Because the feature data of each sample in the line-transformer relationship dataset is a one-dimensional tensor, the generator model and the discriminator model are both designed as fully connected neural networks. The generator model has one input layer, one hidden layer and one output layer, with 10, 50 and 12 nodes, respectively. The purelin function is selected as the activation function of the input layer, and the sigmoid function is selected as the activation function of the hidden layer and the output layer. The discriminator model is designed as a binary classifier composed of a 3-layer fully connected neural network, with 12, 50 and 1 nodes in the input layer, hidden layer and output layer, respectively. As in the generator, the purelin function is selected as the activation function of the input layer, and the sigmoid function is selected as the activation function of the hidden layer and the output layer. The learning rate is set to 0.01, and the total number of iterations is set to 5000.
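The layer sizes above can be checked with a short forward pass. The sketch below is written in Python rather than MATLAB and is only an illustration of the 10-50-12 generator shape: the weights are random rather than trained, the helper names are hypothetical, and the linear input layer is modeled as an identity mapping.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Small random weights; a trained model would supply learned values instead."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
hidden_w, hidden_b = random_layer(10, 50, rng)  # 10 input nodes -> 50 hidden nodes
output_w, output_b = random_layer(50, 12, rng)  # 50 hidden nodes -> 12 output nodes

def generator(noise):
    x = list(noise)                               # linear (identity) input layer
    h = dense(x, hidden_w, hidden_b, sigmoid)     # sigmoid hidden layer
    return dense(h, output_w, output_b, sigmoid)  # sigmoid output layer: 12 features

noise = [rng.gauss(0.0, 1.0) for _ in range(10)]
fake_sample = generator(noise)
print(len(fake_sample), fake_sample)
```

Because the output layer is a sigmoid, every generated feature lies strictly between 0 and 1; this suggests that the 12 features are scaled to that range before training, although the text does not state the normalization used.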

The trained generator model is then used to perform data augmentation on the samples with abnormal line-transformer relationship, producing a batch of generated samples whose distribution is similar to the overall distribution of the real data. In this way, the problem of unbalanced sample data is addressed [34].

Taking the data augmentation of the 42 samples in the magnification error category as an example, Figure 3 shows the variation of the mean square error of the generator and the discriminator (G-mse and D-mse).

The curves in Figure 3 trace the dynamic game between the generator and the discriminator. As the iterations progress, the mean squared errors of the generator and the discriminator move steadily closer together, and both sides of the game eventually reach the same "strength", thus reaching the Nash equilibrium.

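Figure 4 shows that the discriminant results of the final generated samples are all around 0.5. This agrees with the result derived from the GAN principle: at the optimum, the discriminator can no longer distinguish generated samples from real ones and outputs 0.5, which indicates that the objective function of the GAN model has essentially reached its global optimal solution.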
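For reference, the value 0.5 follows from the standard GAN analysis. The notation below is the usual one and is an assumption here; it may differ from the symbols used earlier in this paper. With $p_{data}$ the density of the real samples and $p_{g}$ the density of the generated samples, the optimal discriminator for a fixed generator is:

```latex
D^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{g}(x)},
\qquad
p_{g} = p_{data} \;\Rightarrow\; D^{*}(x) = \frac{1}{2} \ \text{for all } x
```

Discriminator outputs that cluster around 0.5 are therefore consistent with the generated distribution matching the real one, although on their own they do not prove it.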

Figure 5 compares the real data with the generated data. It can be seen from the figure that the data generated by the generative adversarial network is close to the overall distribution of the real data.

Based on the above analysis, the 42 generated samples can be treated as real samples for training. In the same way, we expand the two categories of line hanging error and magnification error from the initial 12 and 42 samples to 12 × 33 = 396 and 42 × 9 = 378 samples, respectively. Thus, the problem of an unbalanced number of samples among the categories is solved. Finally, a 1169 × 28 sample feature set is obtained after data expansion.
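The expansion arithmetic can be checked with a short sketch. The class counts come from Table 1 and the factors 33 and 9 from the text; the variable names are illustrative, and the sketch assumes that the quoted products are the class totals after expansion.

```python
# Class counts from Table 1.
original = {"line hanging error": 12, "magnification error": 42, "normal": 395}
# Expansion factors quoted in the text; the majority class is left unchanged.
factor = {"line hanging error": 33, "magnification error": 9, "normal": 1}

expanded = {name: count * factor[name] for name, count in original.items()}
total = sum(expanded.values())

# Ratio of the largest class to the smallest class, before and after expansion.
imbalance_before = max(original.values()) / min(original.values())
imbalance_after = max(expanded.values()) / min(expanded.values())

print(expanded)
print(total, round(imbalance_before, 2), round(imbalance_after, 2))
```

The three class totals sum to the 1169 samples used in the classification step, and the ratio between the largest and smallest class falls from about 33 to close to 1.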

3.5. Build the Classifier Model

In traditional classification problems, the classification effect is mainly evaluated by accuracy and error rate. However, on imbalanced datasets these two measures are dominated by the majority class, so they do not give a comprehensive picture. For binary classification problems with imbalanced data, evaluation metrics such as precision, recall, and G-mean are often used to evaluate the performance of the overall model [35]. Among them, precision is the proportion of samples predicted as the minority class that truly belong to it, and recall is the proportion of all minority-class samples that are correctly classified. The G-mean metric comprehensively considers the classification accuracy of both the majority class and the minority class. Its value is high only if both classes are classified well, so G-mean measures the classification effect on the overall dataset. The specific calculation is as follows:

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{13}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{14}$$

$$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \tag{15}$$

where TP is the number of minority-class samples that are correctly classified, FP is the number of majority-class samples that are wrongly classified as minority class, FN is the number of minority-class samples that are wrongly classified as majority class, and TN is the number of majority-class samples that are correctly classified.
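The three binary metrics can be written directly from the confusion counts. The sketch below is illustrative only: the function names are assumptions, and the counts describe a hypothetical degenerate classifier that assigns every sample to the majority class, using the magnification error (42) and normal (395) class sizes from Table 1.

```python
import math

def precision(tp, fp):
    """Share of predicted-minority samples that are truly minority."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of true minority samples that are correctly classified."""
    return tp / (tp + fn)

def g_mean(tp, fn, tn, fp):
    """Geometric mean of minority-class recall and majority-class recall."""
    return math.sqrt(recall(tp, fn) * (tn / (tn + fp)))

# Hypothetical classifier that labels everything as the majority class.
tp, fn, tn, fp = 0, 42, 395, 0
accuracy = (tp + tn) / (tp + fn + tn + fp)
minority_recall = recall(tp, fn)
overall_g_mean = g_mean(tp, fn, tn, fp)

print(round(accuracy, 3), minority_recall, overall_g_mean)
```

Accuracy stays above 90% even though no minority sample is found, while recall and G-mean both drop to zero, which is why accuracy alone is not used here. Precision is undefined for this classifier because it makes no minority predictions.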

For multi-classification problems, the multiclass G-mean (mGM) is proposed in the literature [36] to extend G-mean to multi-class evaluation; it is the geometric mean of the recalls of all classes. This measure becomes 0 if any class has zero recall, so it evaluates the classification performance of the classifier across the different classes. The calculation formula is as follows, where M is the number of categories:

$$\mathrm{mGM} = \left( \prod_{i = 1}^{M} \mathrm{recall}_{i} \right)^{\frac{1}{M}} \tag{16}$$
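As a worked check of the mGM formula, the sketch below recomputes the mGM column of Table 4 from the per-class values in the same table, assuming those percentages are the per-class recalls. The function name `multiclass_g_mean` is illustrative.

```python
import math

def multiclass_g_mean(recalls):
    """Geometric mean of per-class recalls; it is 0 if any recall is 0."""
    return math.prod(recalls) ** (1.0 / len(recalls))

# Per-class values from Table 4: line hanging error, magnification error, normal.
gan_recalls = [0.9732, 0.9227, 0.9810]
smote_recalls = [0.8605, 0.8387, 0.7746]

gan_mgm = multiclass_g_mean(gan_recalls)
smote_mgm = multiclass_g_mean(smote_recalls)

print(f"GAN mGM:   {gan_mgm:.2%}")
print(f"SMOTE mGM: {smote_mgm:.2%}")
print(f"Gap in percentage points: {(gan_mgm - smote_mgm) * 100:.2f}")
```

Both results agree with the reported 95.86% and 82.38% to two decimal places, and the gap between them is about 13.5 percentage points.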

The LIBSVM toolbox is used in MATLAB R2020a to build a support vector machine model. Then, the 1169 samples obtained after data expansion are fed into the support vector machine classifier to classify the line-transformer relationship. A polynomial kernel function is selected for the support vector machine, and the grid optimization method is used to tune its parameters. That is, the penalty parameter and the kernel function parameter of the SVM take discrete values within a certain range, and the optimal parameters are those that give the highest classification accuracy on the test set. The training samples and test samples of the support vector machine are selected using a 10-fold cross-validation strategy. The total sample set is randomly divided into 10 equal parts; each time, 9 of these parts are used as the training set and the remaining part as the test set, until every part has been used as the test set once. The accuracy obtained in each test is recorded, and the average value is then calculated. The classification results are shown in Table 4.
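The 10-fold splitting strategy can be sketched as follows. This is a plain-Python illustration, not the LIBSVM or MATLAB routine used in the paper; the function name `k_fold_indices` and the fixed seed are assumptions. Since 1169 is not divisible by 10, the sketch lets the fold sizes differ by at most one sample.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # random partition of the sample set
    base, extra = divmod(n_samples, k)     # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

folds = list(k_fold_indices(1169, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)
```

In a full experiment, a classifier would be trained on each `train_idx`, scored on the matching `test_idx`, and the ten scores averaged; the grid search repeats this for every candidate pair of penalty and kernel parameters.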

Figure 6 compares the unbalanced data processing methods: the proposed GAN-based data augmentation method outperforms the traditional synthetic minority oversampling technique (SMOTE). Under the mGM index, the GAN-based method scores nearly 13.5 percentage points higher than SMOTE (95.86% versus 82.38% in Table 4), which provides a useful idea for research on the processing of unbalanced data. The intelligent identification method of the line-transformer relationship based on GAN processing of unbalanced data proposed in this paper has a recall rate of more than 92% for the three types of line-transformer relationship (line hanging error, magnification error and normal), which demonstrates the effectiveness and feasibility of this method.

4. Conclusions

To overcome the limitations of the traditional identification method based on voltage distribution correlation analysis when distributed generation or three-phase users are present, this paper proposes a method for identifying the line-transformer relationship based on electricity consumption data. The experimental verification results show that the recall rate of the proposed method for the three types of line-transformer relationship (line hanging error, magnification error and normal) is more than 92%. This method can greatly reduce the workload of on-site investigation by employees, provide a new idea for the optimization and management of the line-transformer relationship of the 10 kV distribution network, and greatly save manpower and time costs.

At the same time, for the processing of unbalanced data, this paper proposes a GAN-based data generation model for abnormal line-transformer relationships. It is used to expand the data of abnormal line-transformer relationships, which account for only a very small share of the whole feature set. In this way, the number of samples is balanced among the different line-transformer relationship categories. The experimental results show that the GAN-based processing method for unbalanced data performs better than the traditional SMOTE method, which provides a useful idea for research on the processing of unbalanced data.

In future research, on the one hand, the new method proposed in this paper for identifying the line-transformer relationship could be extended to the anti-theft management of the distribution network. On the other hand, regarding imbalanced data processing, this paper proposes a method based on the generative adversarial network from the perspective of data expansion. Subsequent research can start from the algorithm level or the ensemble learning level. For example, the classifier algorithm used in this paper could be improved so that it can handle imbalanced datasets directly.

Author Contributions

Methodology, Y.W., X.Z., H.L., B.L. and J.Y.; writing—review & editing, K.L. and L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the science and technology project of State Grid Jibei Electric Power Company Limited (B70101220005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

We declare that there is no conflict of interest.

References

1. Gao, C. Research on the Application of Monitoring Technology Based on the Influencing Factors of Line Loss in the Power Consumption Area in the Power Consumption Information Collection System. In Proceedings of the 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, China, 25–27 February 2022; pp. 132–136.
2. Chen, B.; Xiang, K.; Yang, L.; Su, Q.; Huang, D.; Huang, T. Theoretical Line Loss Calculation of Distribution Network Based on the Integrated Electricity and Line Loss Management System. In Proceedings of the 2018 China International Conference on Electricity Distribution (CICED), Tianjin, China, 17–19 September 2018; pp. 2531–2535.
3. Li, S.; Gao, S.; Wu, J.; Xie, D.; Xi, G.; Zhao, Y.; Zuo, Z.; Huang, H.; Qi, L. Research on Topology Identification of Distribution Network Under the Background of Big Data. In Proceedings of the 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), Wuhan, China, 30 October–1 November 2020; pp. 4294–4297.
4. Lai, X.; Cao, M.; Liu, S.; Sun, C. Low-voltage distribution network topology identification method based on characteristic current. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021; pp. 1233–1238.
5. Zhao, G.; Chu, J.; Deng, L.; Pan, K. Research on Line-transformer-user Topological Anomaly Recognition Model Based on Multi-source Data Mining. In Proceedings of the 2020 5th Asia Conference on Power and Electrical Engineering (ACPEE), Chengdu, China, 4–7 June 2020; pp. 192–196.
6. Gao, Q.; Han, B.; Huang, X.; Zhang, P.; Liu, J.; Ge, L. Verification method of topological relationship of low voltage distribution equipment based on KNN and Pearson correlation coefficient. In Proceedings of the 2021 International Conference on Power System Technology (POWERCON), Haikou, China, 8–9 December 2021; pp. 127–132.
7. Bing, L.; Lou, B.; Li, C.; Deng, J.; Zhu, L.; Yang, C.; Chen, W. Low-voltage distribution network topology verification method based on Revised Pearson correlation coefficient. J. Phys. Conf. Ser. 2020, 1633, 012084.
8. Li, J.; Wu, D.; Jin, W.; Chu, Z.; Liu, S.; Ma, J.; Lin, Z.; Yang, L. Identification of distribution network topology parameters based on multidimensional operation data. Energy Rep. 2021, 7 (Suppl. 1), 304–311.
9. Ganguly, S.; Samajpati, D. Distributed Generation Allocation on Radial Distribution Networks Under Uncertainties of Load and Generation Using Genetic Algorithm. IEEE Trans. Sustain. Energy 2015, 6, 688–697.
10. Liu, B.; Wang, D.; Li, Y.; Qiao, L.; Chen, S. Topology identification method of distribution network based on branch active power. J. Phys. Conf. Ser. 2021, 2108, 012062.
11. Dong, Y.; Li, X.; Zhang, L.; Yang, J. Automatic Identification of Low Voltage Distribution Network Topology Based on HPLC. J. Phys. Conf. Ser. 2021, 1881, 022023.
12. Wang, L.; Han, M.; Li, X.; Zhang, N.; Cheng, H. Review of Classification Methods on Unbalanced Data Sets. IEEE Access 2021, 9, 64606–64628.
13. Yi, H.; Jiang, Q.; Yan, X.; Wang, B. Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique with Wind Turbine Fault Detection Application. IEEE Trans. Ind. Inform. 2021, 17, 5867–5875.
14. Arumugam, G. Handling Class Imbalance in Multiclass Datasets by using a Neighborhood based Adaptive Heterogeneous Oversampling Ensemble Classifier. In Proceedings of the 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 7–9 April 2022; pp. 1498–1501.
15. Janet, B.; Joshua, A.K.R.; Didugu, P.S.G. Credit Card Fraud Detection with Unbalanced Real and Synthetic dataset using Machine Learning models. In Proceedings of the 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC), Chennai, India, 22–23 April 2022; pp. 73–78.
16. Xiong, H. Unbalanced Data Set Classification Based on Convolutional Neural Network. In Proceedings of the 2021 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi’an, China, 24–26 September 2021; pp. 186–190.
17. Rathore, S.S.; Chouhan, S.S.; Jain, D.K.; Vachhani, A.G. Generative Oversampling Methods for Handling Imbalanced Data in Software Fault Prediction. IEEE Trans. Reliab. 2022, 71, 747–762.
18. Rosadi, D.; Arisanty, D.; Andriyani, W.; Peiris, S.; Agustina, D.; Dowe, D.; Fang, Z. Improving Machine Learning Prediction of Peatlands Fire Occurrence for Unbalanced Data Using SMOTE Approach. In Proceedings of the 2021 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA), Medan, Indonesia, 11–12 November 2021; pp. 160–163.
19. Ileberi, E.; Sun, Y.; Wang, Z. Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost. IEEE Access 2021, 9, 165286–165294.
20. Dharmasaputro, A.A.; Fauzan, N.M.; Kallista, M.; Wibawa, I.P.D.; Kusuma, P.D. Handling Missing and Imbalanced Data to Improve Generalization Performance of Machine Learning Classifier. In Proceedings of the 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), Jakarta, Indonesia, 29–30 January 2022; pp. 140–145.
21. Lu, Y.-W.; Liu, K.-L.; Hsu, C.-Y. Conditional Generative Adversarial Network for Defect Classification with Class Imbalance. In Proceedings of the 2019 IEEE International Conference on Smart Manufacturing, Industrial & Logistics Engineering (SMILE), Hangzhou, China, 20–21 April 2019; pp. 146–149.
22. Alnujaim, I.; Oh, D.; Kim, Y. Generative Adversarial Networks to Augment Micro-Doppler Signatures for the Classification of Human Activity. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9459–9461.
23. Liu, Z.; Tong, M.; Liu, X.; Du, Z.; Chen, W. Research on Extended Image Data Set Based on Deep Convolution Generative Adversarial Network. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; pp. 47–50.
24. Ayanoglu, E.; Davaslioglu, K.; Sagduyu, Y.E. Machine Learning in NextG Networks via Generative Adversarial Networks. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 480–501.
25. Jiang, T.; Xie, W.; Li, Y.; Du, Q. Discriminative Semi-Supervised Generative Adversarial Network for Hyperspectral Anomaly Detection. In Proceedings of the IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2420–2423.
26. Bhagwani, H.; Agarwal, S.; Kodipalli, A.; Martis, R.J. Targeting class imbalance problem using GAN. In Proceedings of the 2021 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), Mysuru, India, 10–11 December 2021; pp. 318–322.
27. Kalita, D.J.; Singh, S. SVM Hyper-parameters optimization using quantized multi-PSO in dynamic environment. Soft Comput. 2020, 24, 1225–1241.
28. Willsch, D.; Willsch, M.; De Raedt, H.; Michielsen, K. Support vector machines on the D-Wave quantum annealer. Comput. Phys. Commun. 2019, 248, 107006.
29. Altayef, E.; Anayi, F.; Packianather, M.; Benmahamed, Y.; Kherif, O. Detection and Classification of Lamination Faults in a 15 kVA Three-Phase Transformer Core Using SVM, KNN and DT Algorithms. IEEE Access 2022, 10, 50925–50932.
30. Ali, O.M.A.; Kareem, S.W.; Mohammed, A.S. Evaluation of Electrocardiogram Signals Classification Using CNN, SVM, and LSTM Algorithm: A review. In Proceedings of the 2022 8th International Engineering Conference on Sustainable Technology and Development (IEC), Erbil, Iraq, 23–24 February 2022; pp. 185–191.
31. Lee, C.Y.; Yang, M.R.; Chang, L.Y.; Lee, Z.J. A hybrid algorithm applied to classify unbalanced data. In Proceedings of the 6th International Conference on Networked Computing and Advanced Information Management, Seoul, Korea, 16–18 August 2010; pp. 618–621.
32. Mingyue, F.; Zao, F.; Xiaodong, W.; Jun, M. A Pipeline Blockage Identification Model Learning from Unbalanced Datasets Based on Random Forest. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 696–701.
33. Pereira, J.; Saraiva, F. A Comparative Analysis of Unbalanced Data Handling Techniques for Machine Learning Algorithms to Electricity Theft Detection. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8.
34. Lv, Y.; Lin, L.; Liu, J.; Guo, H.; Tong, C. Research on Imbalanced Data Classification Based on Classroom-Like Generative Adversarial Networks. Neural Comput. 2022, 34, 1045–1073.
35. Branco, P.; Torgo, L.; Ribeiro, R.P. Relevance-Based Evaluation Metrics for Multi-Class Imbalanced Domains; Springer International Publishing: Cham, Switzerland, 2017; pp. 698–710.
36. Ziherl, P.; Kamien, R.D. Maximizing Entropy by Minimizing Area: Towards a New Principle of Self-Organization. J. Phys. Chem. B 2001, 105, 10147–10158.

Figure 1. The GAN-based data generation model.

Figure 2. The main steps of the line-transformer relationship identification method.

Figure 3. The variation of the mse of the generator and discriminator.

Figure 4. The discriminant results of the generated samples.

Figure 5. The comparison between the real data and the generated data.

Figure 6. The comparison of imbalanced data processing methods.

Table 1. The data distribution of each category.

Category	Number of Samples	Label
Line hanging error	12	1
Magnification error	42	2
Normal	395	3
Total	449	/

Table 2. The power data and line-transformer relationship of line A.

Data Type		Daily Electricity Data (unit: kWh)					Category
		Day 1	Day 2	Day 3	…	Day 30	
Input power of line A		68,800	69,200	62,800	…	42,800	/
Power consumption of each transformer	transformer 1	273	263	282	…	279	Normal
	transformer 2	77	77	76	…	75	Line hanging error
	…	…	…	…	…	…	…
	transformer 22	320	316	400	…	524	Magnification error
The power loss of line A		65,460	65,485	59,148	…	38,542	/

Table 3. The feature set.

Feature	Transformer 1	Transformer 2	Transformer 3	…	Transformer 449
$r_{s}$	0.24414	−0.17513	0.27490	…	0.48841
$r_{l}$	0.24992	−0.18690	0.27370	…	0.48176
$C_{c s}$	0.12639	7.30734	0.69886	…	0.41743
$C_{c l}$	0.12216	6.86262	0.65633	…	0.39203
$d_{s}$	−0.00175	0.00026	−0.00016	…	0.00094
$d_{l}$	−0.02953	0.00087	0.00164	…	0.01527
$X_{s}$	−0.00023	−0.00002	0.00002	…	0.00028
$Y_{s}$	0.00019	0.00692	0.00058	…	0.00064
$Z_{s}$	0.00064	−0.00015	0.00088	…	0.00114
$X_{l}$	−0.00022	−0.00001	0.00002	…	0.00027
$Y_{l}$	0.00018	0.00676	0.00057	…	0.00062
$Z_{l}$	0.00064	−0.00015	0.00088	…	0.00115

Table 4. The classification results of the line-transformer relationship.

	Line Hanging Error	Magnification Error	Normal	mGM
SMOTE	86.05%	83.87%	77.46%	82.38%
GAN	97.32%	92.27%	98.10%	95.86%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
