Evaluation Model of Operation State Based on Deep Learning for Smart Meter

Detecting the operation state of large numbers of smart meters is a significant problem because it currently relies on manual on-site testing. This paper uses deep learning to improve the efficiency of malfunction detection for smart meters and proposes a novel evaluation model of the operation state of smart meters. The model uses recurrent neural networks (RNN) to predict power consumption, and a malfunctioning smart meter is detected from the prediction residual between the predicted and the observed power consumption. Transfer learning (TL) improves the training efficiency of the prediction models. The evaluation uses an accumulator algorithm and flexible threshold setting for anomaly detection. Simulation experiments demonstrate that the detection principle enables more efficient meter replacement and extends the average service time of smart meters. The effectiveness of the evaluation model was verified on a dataset from an actual station area, where it accurately detected the operation state of the smart meters.


Introduction
Smart grids integrate the electric network infrastructure with cyber systems [1]; they can integrate the information of all connected units, including generators, distributed energy systems, and consumers. In this way, smart grids help the power system operate in equilibrium with a high level of supply quality. The smart grid uses innovative communication, acquisition, and storage technologies to form a new type of power system network. With the development of advanced power technologies, the interconnectivity of power systems becomes more complex, and the cyber-physical part of the energy grid becomes increasingly important. Advanced metering infrastructure (AMI), a broad platform with abundant measurement, communication, and storage resources, gives energy companies access to a large amount of energy information [2]. This has stimulated discussion about in-operation state monitoring, a feature needed by every distributed instrument in the smart grid. Several methods and algorithms have been proposed to analyze meter data and discover the information it contains. Energy data analysis covers consumer profiling, load forecasting, electricity pricing, identification of violations, metering, and real-time operations. Smart meters are important and widely used metering equipment in smart grid construction [3], and the metering accuracy of the grid directly affects energy security and trade settlement. Unlike traditional meters, smart meters provide bidirectional communication, multiple metering modules, data transmission, and security modules in addition to energy metering. Because smart meters are deployed all over the world, anomaly detection for smart meters is a worthwhile problem.
With the growing complexity of low-voltage station areas and the increasing amount of measurement equipment, improving the anomaly detection capability for smart meters has become a focus for grid companies [4]. When the operation error of a smart meter exceeds the maximum allowable value, trade in the energy market becomes unfair. Smart meters with negative errors under-register consumption and cause losses to energy companies, while meters with positive errors over-register consumption and cause losses to consumers. To keep smart meters in a good operation state, they are replaced 8 years after installation, which wastes resources. If inaccurate smart meters could instead be removed on a targeted, as-needed basis, considerable money and human resources would be saved. Rapid, intelligent, and accurate methods for identifying abnormal instruments are therefore needed.
Numerous analytic methods based on meter operation error have been proposed for detecting malfunctioning smart meters. The operation state of a smart meter can be judged directly from its operation error, and on-site or laboratory checking is the mainstream method of measuring operation error. Energy companies hire professionals with instruments and equipment to check the operation error of smart meters on-site; in addition, some meters are removed from the installation site and tested in the laboratory. Both methods have high detection accuracy but low detection efficiency. Processing smart meter readings is a newer way to estimate meter operation error for large-scale station areas. In [5], the error contribution of each factor and their combined error were obtained by Monte Carlo calculation. Z. Zhang [6] proposed a method for estimating the operation error of smart meters based on parameter degradation, which considers load, temperature, and humidity. The studies in [5,6] examined the impact of load on the smart meter rather than the operation state of the meter itself. In [7], an error estimation model for watt-hour meters based on tree topology was proposed, which combines K-means and Tikhonov regularization to solve the ill-posed equation. In [8], the authors proposed a recursive algorithm to estimate meter error; its accuracy degrades in systems with large energy losses. Kong et al. [9] proposed a recursive algorithm based on limited memory that takes the influence of light load into account and can accurately calculate the recent operation error of a meter. The methods in [7][8][9] require a complete topology of the power network, which limits the generalization of the models. Kim et al. [10] proposed an intermediate monitor meter (IMM) and a non-technical loss (NTL) detection algorithm to detect NTL caused by meter malfunction. The algorithm can also detect energy losses. However, not all data in every time period meet its independence and orthogonality requirements, and the method lacks real-time performance. In [11], an adaptive varied-weight method is introduced to build a smart meter condition assessment model, which overcomes the problem of constant index weights and dynamically reflects the factors affecting the smart meter.
Data analytics is another way to detect malfunctioning smart meters, and anomaly detection is an active research area [12]. Power consumption is a time series: it reflects the customer's consumption behavior and changes according to temporal patterns. A time series is a sequence of data points at uniform time intervals [13]. In recent years, machine learning methods have been widely used to analyze time series. Classification and prediction are two forms of data analysis that extract important features and make forecasts. In [14], the authors proposed an approach for reducing the training effort: TL was incorporated into the construction of neural network models to take advantage of already trained models, which reduces training time when there are many target tasks. In [15], the authors proposed a hybrid model that combines several machine learning approaches; however, the abnormal dataset must be adjusted manually before anomalies can be detected. In [16], a deep learning method based on long short-term memory (LSTM) and a modified convolutional neural network (CNN) was proposed to extract the spatial and temporal characteristics of power consumption; it detects abnormal station areas from the prediction residual. Z. Zheng [17] introduced an electricity theft detection method based on a Wide & Deep CNN model that extracts features from both 1-D and 2-D power consumption data. Tsatsral Amarbayasgalan [18] proposed a deep learning-based unsupervised anomaly detection approach for time-series data, in which a time series is judged abnormal if its reconstruction error exceeds a threshold. In [19], the authors proposed a prediction method based on sequence-to-sequence (S2S) RNNs with attention, which improves prediction accuracy for time series with long time spans.
In [20], the authors proposed an improved residual network that adopts a convex k-parameter strategy adapted to different processing objects. Berhane Araya [12] recommended an ensemble anomaly detection (EAD) framework that combines the results of several anomaly detection classifiers. Kim and Cho [21] proposed a method that combines LSTM, CNN, and DNN to extract complex features. These studies motivated us to apply deep learning to the detection of malfunctioning smart meters.
The prediction accuracy of deep learning models depends strongly on the hyperparameter optimization (HPO) process. The authors of [16][17][18][19][20][21] did not address hyperparameter optimization, which limits the accuracy their models can achieve. In [22], the authors proposed an optimization method that combines generalized normal distribution optimization, the Nelder-Mead (NM) simplex direct search method, and a differential mutation operator; this method is computationally expensive and time-consuming. Wang et al. [23] presented a prediction model that uses secondary-variable data to improve accuracy, but its parameters are affected by noise in the primary and secondary variables. Duan et al. [24] recommended a prediction model that combines data decomposition techniques, recurrent neural network prediction algorithms, and error decomposition correction; the two-layer prediction structure ensures accuracy, but the hyperparameter tuning problem is not addressed. Qu et al. [25] introduced an ensemble model that uses grid search to optimize the hyperparameters of decision tree and neural network models; because it includes three base models, training is complex. This paper addresses anomaly detection under the assumptions stated below.
In this paper, the historical operation state of the smart meters is assumed to be normal; in other words, all historical power consumption data are normal. Spontaneous changes in customers' power consumption behavior are assumed to be bounded.
Although there are many anomaly detection methods for time series, efficient and fast anomaly detection for the operation state of smart meters is still lacking. At present, the most widely used method of smart meter inspection is manual on-site detection, which has a long detection cycle and is labor-intensive. We propose an evaluation model based on deep learning to address anomaly detection for smart meters. With this in mind, this paper makes the following main contributions.

1. An RNN is applied to time-series prediction of power consumption. RNNs have been reported to outperform other time-series prediction models [19]. The RNN cell contains an internal state in which information can be stored, which makes it well suited to time-series prediction. The grid-search method is adopted for HPO.

2. TL is applied to build prediction models for a large number of meters. The trained parameters (weights) of an existing model are used as the starting point for training the next prediction model, which greatly reduces the workload of building the prediction models.

3. Flexible threshold setting and an abnormal accumulator are added to the abnormal judgment, which prevents false positives caused by man-made or natural factors.

Figure 1 shows the workflow of the evaluation. The evaluation model is divided into three parts: data preprocessing, model training, and the anomaly detection notifier. The data preprocessing stage includes data collection, data cleaning, normalization, and load profile clustering. The model training stage combines TL with neural networks to build the prediction model. The anomaly detection notifier performs anomaly detection by means of threshold setting and an accumulator rule.

The rest of this paper is structured as follows. Section 2 describes the data processing procedure and the electricity curve analysis. Section 3 introduces the transfer learning process. Section 4 presents the evaluation model of the operation state. Section 5 presents the experimental analysis. Finally, Section 6 summarizes the paper.

Data Cleaning
The operation data collected by smart meters are not all valuable: some are incomplete and thus worthless [26]. To mitigate the negative impact of noisy and incomplete data on the performance of the evaluation framework, this paper uses the Local Outlier Factor (LOF) [27] to remove erroneous values and an interpolation method to recover incomplete or worthless values, where x_i denotes the readings of the smart meter.
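The cleaning step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: it treats each reading as a 1-D point for a simplified LOF (ties at the k-distance are ignored), and the neighbour count `k=3`, the outlier threshold `2.0`, the use of `None` for missing readings, linear interpolation between the nearest valid neighbours, the function names, and the toy readings are all assumptions. A full LOF implementation is available as scikit-learn's `LocalOutlierFactor`.

```python
def lof_scores(points, k=3):
    """Simplified Local Outlier Factor for 1-D readings (ties at the k-distance are ignored)."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        # (distance, index) to every other reading, nearest first
        dists = sorted((abs(points[i] - points[j]), j) for j in range(n) if j != i)
        knn = dists[:k]
        neighbours.append([j for _, j in knn])
        k_dist.append(knn[-1][0])  # distance to the k-th nearest neighbour
    # local reachability density; 1e-10 avoids division by zero for duplicate readings
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], abs(points[i] - points[j])) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))
    # LOF: average density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i]) for i in range(n)]


def clean_readings(readings, k=3, threshold=2.0):
    """Drop LOF outliers and missing readings (None), then refill them by linear interpolation.

    Hypothetical helper; assumes more than k present readings and at least one valid reading."""
    present = [i for i, x in enumerate(readings) if x is not None]
    scores = lof_scores([readings[i] for i in present], k)
    valid = [i for i, s in zip(present, scores) if s <= threshold]
    valid_set = set(valid)
    cleaned = list(readings)
    for i in range(len(readings)):
        if i in valid_set:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            cleaned[i] = readings[right]  # leading gap: copy the first valid reading
        elif right is None:
            cleaned[i] = readings[left]  # trailing gap: copy the last valid reading
        else:  # interior gap: linear interpolation between the nearest valid readings
            w = (i - left) / (right - left)
            cleaned[i] = readings[left] + w * (readings[right] - readings[left])
    return cleaned


# Toy daily readings with one spike (95.0) and one missing value (None)
readings = [10.0, 10.5, 11.0, 10.2, 95.0, 10.8, None, 10.4, 10.6, 10.1]
print(clean_readings(readings))
```

In this example, the spike and the missing reading are both replaced by the midpoint of their valid neighbours, while the ordinary readings pass through unchanged.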

Normalization
The fluctuation range of meter readings differs between customers and is closely related to their electricity demand. For example, the power consumption of larger households is generally considerably higher than that of smaller households, although their consumption patterns may be similar. Using raw metering values would reduce the effectiveness of clustering and TL, so power consumption is not input directly into the prediction model. Because the activation functions of the prediction model are most sensitive to inputs near zero, and to obtain the best clustering, the data are normalized with the min-max scaling method:

$$x_i^{*} = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where x_i is the reading value, min(x) is the minimum value in x, and max(x) is the maximum value in x. After min-max normalization, all data fall within the range [0, 1].
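A minimal Python sketch of the min-max scaling; the function name and the choice to map a constant profile (where max(x) = min(x) and the formula is undefined) to all zeros are assumptions. The example shows why normalization helps clustering and TL: two households with the same pattern at different magnitudes map to the same profile.

```python
def min_max_normalize(readings):
    """Scale readings to [0, 1] with x* = (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(readings), max(readings)
    if hi == lo:  # constant profile: formula undefined, map everything to 0 (assumption)
        return [0.0 for _ in readings]
    return [(x - lo) / (hi - lo) for x in readings]


small_household = [2.0, 4.0, 3.0, 6.0]
large_household = [20.0, 40.0, 30.0, 60.0]
print(min_max_normalize(small_household))  # [0.0, 0.5, 0.25, 1.0]
print(min_max_normalize(large_household))  # [0.0, 0.5, 0.25, 1.0]
```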

Electricity Curve Analysis
This paper analyzes the power consumption data collected by smart meters. The dataset, provided by State Grid Shanxi, contains the power consumption of 128 customers over 420 days. Figure 2a shows the power consumption of a smart meter operating in a normal state, arranged by day and by week, respectively. In the daily arrangement, the fluctuation of power consumption is visible but no regularity appears; key features of power consumption are hard to capture from 1-D data, which reduces the efficiency of neural networks. When the data are arranged by week, regularity emerges: the readings tend to peak on day 4 and day 7 of every week and reach a trough on day 6. Thus, the readings of a smart meter under normal operation show regularity. Such regularity is also observed for most meters when the readings of all 12 months are aligned [28].

In contrast, Figure 2b shows the readings of a malfunctioning smart meter over one month. As in Figure 2a, the power consumption is plotted by day and by week. The power consumption in the first two weeks fluctuates periodically, with peaks on day 2 and day 6. From day 6 of the third week onward, the power consumption drops significantly and remains at a low level. In this case, either the user's consumption habits have changed or the electric energy meter is malfunctioning.
With abundant historical electricity data for a user, however, accidental changes do not affect the overall trend, and as long as the historical data are sufficient, the user's power consumption can be predicted. If the electric energy meter operates normally, the prediction residual stays within a certain range. Conversely, when the operation state of the smart meter changes, the time-series characteristics hidden in the electricity data change accordingly, the prediction residual leaves its normal range, and the malfunctioning meter can be located.

Cluster-Based Chain Transfer Learning
TL is a machine learning method that transfers knowledge a neural network has already learned. It can take the form of feature representation transfer, pretrained network transfer, or parameter transfer. This evaluation adopts parameter transfer, which aims to reduce the number of training iterations of prediction models for learning domains that are similar. Without transfer, the training time of the evaluation model is the product of the training time of a single prediction model and the number of meters; incorporating TL into the training process greatly reduces the overall training time and improves the operating efficiency of the framework. The cluster-based chain transfer learning (CBCTL) approach builds prediction models for many meters by combining clustering and TL, and restricting parameter transfer to meters in the same cluster improves the efficiency of TL. The prediction model focuses on the internal features of the load profile rather than its magnitude, so the readings are not used directly but are first normalized. Each normalized reading is treated as a feature in K-means clustering, so the samples of all meters must start at the same time. Within each cluster, models are trained for the meters of that cluster. The framework uses the meter data to determine the transfer path and the training set to train the neural network models. The transfer path follows the similarity of the meters' electricity data within the cluster: the next meter selected is the one most similar to the source meters. Using the source meter's model as the starting point of the next model reduces the number of training epochs needed.

Load Profile Clustering
Building the evaluation model for a large number of meters is an onerous task. The weights learned during initial model training are transferred from the source domain to the target domain, and transfer can be expected to succeed more often when the two domains are more similar. Moreover, forecasting accuracy is higher when transfer occurs between more similar meters, and higher similarity also requires fewer training epochs, which reduces training time. The K-means algorithm [29] is a classic clustering algorithm. Starting from randomly chosen initial centers, it partitions the load profiles into K clusters by assigning each profile to the nearest cluster center in Euclidean distance, so that each user's daily power consumption is similar to the center of its cluster. The Euclidean distance between samples i and j, each with m features, is

$$d(i, j) = \sqrt{\sum_{l=1}^{m} \left(x_{il} - x_{jl}\right)^{2}}$$

The optimal number of clusters K is determined by the elbow criterion [30]. The number of clusters is varied from 1 to n, and the sum of squared errors (SSE) is computed for each setting. Because of the underlying structure of the data, the SSE drops rapidly as the number of clusters approaches the true number of clusters; once the number of clusters exceeds the true number, the SSE keeps declining but much more slowly. The inflection point (elbow) of the SSE curve therefore gives a good choice of K. Load profiles within a cluster have similar characteristics, which improves the efficiency of TL.
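The clustering and elbow steps can be sketched in pure Python as below, under stated assumptions: the centers are seeded deterministically by farthest-point selection instead of randomly (so the example is reproducible), the elbow is taken as the K with the largest second difference of the SSE curve (one simple way to locate the inflection point), and the function names and toy 2-D "profiles" are hypothetical. Real daily load profiles would simply be longer vectors.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=100):
    """Lloyd's K-means; returns (labels, centers, SSE)."""
    # Farthest-point seeding (assumption): start from the first profile, then
    # repeatedly add the profile farthest from all centers chosen so far.
    centers = [list(profiles[0])]
    while len(centers) < k:
        farthest = max(profiles, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))

    def assign(cs):
        # index of the nearest center (Euclidean distance) for every profile
        return [min(range(k), key=lambda c: squared_distance(p, cs[c])) for p in profiles]

    for _ in range(iterations):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            # move the center to the mean of its members; keep it if the cluster is empty
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(profiles, labels))
    return labels, centers, sse


def elbow_sse(profiles, max_k):
    """SSE for every number of clusters from 1 to max_k."""
    return {k: kmeans(profiles, k)[2] for k in range(1, max_k + 1)}


def pick_elbow(sse):
    """K with the largest second difference SSE[K-1] - 2*SSE[K] + SSE[K+1]."""
    ks = sorted(sse)[1:-1]
    return max(ks, key=lambda k: sse[k - 1] - 2 * sse[k] + sse[k + 1])


profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
sse = elbow_sse(profiles, max_k=5)
print({k: round(v, 2) for k, v in sse.items()})
print("elbow at K =", pick_elbow(sse))
```

On these three well-separated groups the SSE falls steeply up to K = 3 and then flattens, so the elbow rule selects K = 3.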

Transfer Learning
Determining the TL direction of the model weights begins with calculating the similarity between each pair of load curves within the cluster. The framework is interested in the characteristics of power consumption rather than the actual readings of smart meters [31], so the readings must be scaled to the same range. Min-max normalization is already performed for load profile clustering, and the similarity calculation reuses the scaled data.
When R meters belong to one cluster, the pairwise similarities form an R × R upper triangular matrix, denoted S_{R×R}. Each entry is a distance between two meters: the lower the value, the more similar the two customers' power consumption characteristics. Three metrics are considered: Euclidean, cosine, and Manhattan distances.
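The three candidate metrics and the upper triangular matrix can be sketched as follows. Cosine distance is taken as 1 minus cosine similarity so that, like the other two metrics, lower values mean more similar curves; this convention, the use of `None` on and below the diagonal, the function names, and the toy curves are assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, so 0 means the two curves point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def similarity_matrix(profiles, metric=euclidean):
    """R x R upper triangular matrix: entry [i][j] (i < j) is the distance between meters i and j."""
    r = len(profiles)
    return [[metric(profiles[i], profiles[j]) if j > i else None for j in range(r)]
            for i in range(r)]


profiles = [[0.0, 0.5, 1.0], [0.0, 0.6, 1.0], [1.0, 0.5, 0.0]]  # normalized load curves
for metric in (euclidean, cosine_distance, manhattan):
    print(metric.__name__, similarity_matrix(profiles, metric))
```

Under all three metrics, meters 0 and 1 (same shape) come out far closer than meters 0 and 2 (mirrored shape).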
Let T denote the set of meters without a prediction model and S the set of meters with an existing prediction model. Initially, no meter has a prediction model, so all meters belong to the target set T and the source set S is empty. Denote the prediction models by m_1, m_2, ..., m_p, m_q, ..., m_k ∈ M, where M is the set of all prediction models and k is the number of models, and denote the data of meter k by d_k. We assume that if two load curves are similar, so are the parameters of the prediction models trained on them. Therefore, the training time can be reduced by using the parameters of a similar meter's prediction model as the training starting point. Figure 3 shows the framework for building the evaluation model. The TL process is as follows:

1. Within each cluster, the meters are divided into the source set S and the target set T. TL starts with a similarity calculation that gives the similarity between each pair of meters within the cluster. Assume that meter p is the core of the cluster and is selected to build the initial prediction model m_p. The meter most similar to the initial meter p is selected as the TL target.

2. The existing model m_p, trained on its own power consumption data, is used as the starting point for training the next prediction model m_q: m_p is trained further on the target dataset d_q to build m_q. During transfer, the structure and hyperparameters of the initial model remain the same; only the weights change, through training on the target meter's own dataset.

3. The result of training the transferred model is the new source model m_q, which is added to the source set S and becomes available for TL. The electricity data of meters whose models have already been built, such as m_p, m_q, and m_k, belong to set S, and the historical electricity data of meters that do not yet have a model belong to set T.

4. Next, the direction of the chain transfer is determined by calculating the similarity between set S and set T. Figure 4 shows the direction of chain transfer learning. The chain transfer process ends when set T is empty, at which point every user has a well-trained evaluation model.
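The chain transfer steps above amount to a greedy walk that repeatedly moves from T to S the target meter closest to any meter already in S. A sketch in Python, assuming a full symmetric distance matrix as input; choosing the cluster medoid as the core meter p when none is given, breaking ties by index, and the function name are assumptions, and the actual fine-tuning of m_q from m_p (which would use a deep-learning framework) is omitted.

```python
def chain_transfer_order(dist, core=None):
    """Greedy chain-transfer path inside one cluster.

    dist: symmetric R x R matrix of pairwise distances (lower = more similar).
    core: index of the meter whose model is trained from scratch; if None, the
          medoid (smallest total distance to the others) is used (assumption).
    Returns the core index and the (source, target) pairs in training order."""
    r = len(dist)
    if core is None:
        core = min(range(r), key=lambda i: sum(dist[i]))
    source, target = {core}, set(range(r)) - {core}
    order = []
    while target:
        # most similar (source, target) pair; ties broken by index for reproducibility
        pair = min(((a, b) for a in source for b in target),
                   key=lambda st: (dist[st[0]][st[1]], st[0], st[1]))
        s, t = pair
        order.append((s, t))  # model t starts from model s's weights, then trains on d_t
        source.add(t)
        target.remove(t)
    return core, order


positions = [0, 1, 2, 10]  # toy 1-D "profiles"
dist = [[abs(a - b) for b in positions] for a in positions]
print(chain_transfer_order(dist))  # (1, [(1, 0), (1, 2), (2, 3)])
```

Each pair (s, t) reads as "initialize the model of meter t with the weights of meter s, then train it on d_t"; the walk ends when T is empty, matching step 4.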

Evaluation Model
In this section, TL, an Anomaly Detection Model (ADM), and an RNN are combined into TL-ADM-RNN for time-series anomaly detection.

Prediction Model
An RNN is selected as the core module of the prediction model. The gated recurrent unit (GRU) is a newer variant of RNN that is very similar to LSTM [32]; its structure is shown in Figure 5. The reset gate r determines how new input information is combined with the previous memory, and the update gate z determines how much of the previous memory is carried over to the current time step. The GRU differs from the LSTM cell in that it merges the cell state and the hidden state into a single hidden state h and combines the input and forget gates into the update gate z. The GRU therefore has fewer parameters while retaining long-term memory, and, more importantly, it mitigates the vanishing gradient problem. The reset gate r reduces the influence of the previous hidden state on the new hidden state, as shown in the update step in Formula (5). In the equations of a single GRU unit, h[t−1] is the hidden state at the previous time step and h[t] is the hidden state at the current time step. All weight matrices W* are updated by error backpropagation according to the difference between the output value and the actual value, and b* denotes the bias vectors. ⊙ denotes element-wise multiplication between two vectors, and σ denotes the gate activation function.
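Formula (5) itself did not survive extraction. For reference, one common way of writing the GRU cell in the notation of this section is given below; the recurrent weight matrices U*, the candidate state h̃[t], and the convention that (1 − z[t]) weights the previous state are assumptions (some references swap the roles of z[t] and 1 − z[t]).

```latex
\begin{aligned}
z[t] &= \sigma\left(W_z x[t] + U_z h[t-1] + b_z\right) \\
r[t] &= \sigma\left(W_r x[t] + U_r h[t-1] + b_r\right) \\
\tilde{h}[t] &= \tanh\left(W_h x[t] + U_h \left(r[t] \odot h[t-1]\right) + b_h\right) \\
h[t] &= \left(1 - z[t]\right) \odot h[t-1] + z[t] \odot \tilde{h}[t]
\end{aligned}
```

In this form, r[t] scales how much of h[t−1] enters the candidate state, and z[t] interpolates between keeping the old state and adopting the candidate.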
Figure 5. GRU structure diagram.
All models are built with the same neural network approach and trained for enough epochs for the weights to converge. To avoid getting stuck in a local minimum, a different set of random initial values is chosen at the start of training. The input sequence x[1], ..., x[T] is divided into smaller segments with a sliding window, which enables the GRU to discover the internal periodicity in the consumption data. The segments pass through the GRU [33], which extracts the temporal characteristics of the input and produces the prediction sequence y[1], ..., y[N]. The sliding-window process is illustrated in Figure 6. After the training set has passed through the model, the optimizer adjusts the parameters; the model is established when a fixed number of iterations is reached or the MSE meets the requirement.

Figure 6. Schematic diagram of a sliding-window process.

The evaluation model uses grid search for HPO. A range is set for the number of hidden layers and the number of neurons, the model is trained for every combination of these hyperparameters, and its prediction error is calculated. By comparing the errors of the different combinations, the combination with the best prediction performance is selected, which fixes the parameter settings of the model.
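The sliding-window segmentation and the grid search can be sketched as follows. Both functions are hypothetical illustrations: the window length is a free choice, and `evaluate` stands in for training a GRU with the given hyperparameters and returning its validation error, which would require a deep-learning framework and is not shown. The toy error function exists only to make the search observable.

```python
from itertools import product


def sliding_windows(series, window):
    """Split x[1..T] into (input segment, next value) pairs for one-step-ahead prediction."""
    return [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]


def grid_search(evaluate, grid):
    """Try every hyperparameter combination and keep the one with the lowest error.

    evaluate: callable mapping a dict of hyperparameters to a validation error
              (e.g. the MSE of a GRU trained with them); supplied by the caller.
    grid:     dict mapping hyperparameter name -> list of candidate values."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params):
    # hypothetical error surface, smallest at 2 hidden layers and 64 neurons
    return (params["hidden_layers"] - 2) ** 2 + abs(params["neurons"] - 64) / 64


print(sliding_windows([1, 2, 3, 4, 5], window=3))  # [([1, 2, 3], 4), ([2, 3, 4], 5)]
best, err = grid_search(toy_error, {"hidden_layers": [1, 2, 3], "neurons": [32, 64, 128]})
print(best, err)
```

Grid search is exhaustive, so its cost grows with the product of the candidate-list lengths; this is why TL, which reuses a trained model as the starting point, matters when many meters need models.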

Anomaly Detection Model
For a residential area whose meter states were manually inspected when the data were collected, the operation state of each smart meter is assumed to be normal by default. The operation state of a meter can then be determined by monitoring whether its data exceed a threshold. In this paper, the K-Sigma and Confidence Interval methods are used to set the threshold of the prediction residual. After the prediction model is trained, the K value (K-Sigma) or the C value (Confidence Interval) is determined from the test set. The prediction residual $E$ (the daily residual error) is calculated from $W_{\text{predict}}$, the daily predicted electricity consumption, and $W_{\text{rom}}$, the daily meter reading. The consumption data are divided into an 80% training set and a 20% test set, and the prediction residual threshold is determined from the residuals of the test set. The model's ability to predict the trend of power consumption matters more than its ability to predict exact values. The model's predictive capability is learned from the training set. As long as the operation state of the smart meter does not change, the prediction residual stays within a certain range determined by the model's own prediction error; the threshold is therefore set from the model's performance on the test set.
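A minimal sketch of the data split and residual computation, under two assumptions that the text does not fix: the 80/20 split is taken chronologically (no shuffling, as is usual for time series), and the residual is the relative error $(W_{\text{predict}} - W_{\text{rom}}) / W_{\text{rom}}$. The paper's own residual formula is not reproduced in this section, so `relative_residuals` is a hypothetical stand-in.

```python
def chronological_split(series, train_fraction=0.8):
    """Assumed split: first 80% of days for training, last 20% for testing (no shuffling)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def relative_residuals(w_predict, w_rom):
    """Hypothetical residual E = (W_predict - W_rom) / W_rom for each day.

    Assumes every daily meter reading W_rom is nonzero.
    """
    return [(p - o) / o for p, o in zip(w_predict, w_rom)]
```

For example, a predicted 11.0 kWh against a reading of 10.0 kWh gives a residual of 0.1 under this assumed definition.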

K-Sigma
The upper and lower thresholds of the prediction residual are determined by K-Sigma, where the K value is determined from the test set: K is the minimum value for which all prediction residuals of the test set fall within the thresholds. The thresholds are intended to capture changes in the time-series characteristics of the context, and the prediction residuals of normal data should fall within the normal range.

$$\text{Upper threshold} = T_{\text{mean}} + K \times I_{\text{range}}, \qquad \text{Lower threshold} = T_{\text{mean}} - K \times I_{\text{range}} \tag{7}$$

where $T_{\text{mean}}$ represents the trimmed mean of the data and $I_{\text{range}}$ represents the interquartile range of the data.
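Equation (7) and the minimum-K rule can be sketched with Python's `statistics` module. Assumptions, labeled as such: $T_{\text{mean}}$ and $I_{\text{range}}$ are computed from the test-set residuals themselves, the trimming proportion is 10% at each end, and quartiles use the module's default method. Under these assumptions the minimum K is simply the largest distance from $T_{\text{mean}}$ divided by $I_{\text{range}}$.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """T_mean: mean after dropping the lowest and highest `proportion` (assumed 10%) of values."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k]
    return statistics.mean(kept)


def interquartile_range(values):
    """I_range = Q3 - Q1, using the statistics module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def k_sigma_thresholds(test_residuals, proportion=0.1):
    """Smallest K whose band T_mean ± K * I_range contains every test-set residual."""
    center = trimmed_mean(test_residuals, proportion)
    spread = interquartile_range(test_residuals)
    if spread == 0:
        raise ValueError("interquartile range is zero; K is undefined")
    k = max(abs(r - center) for r in test_residuals) / spread
    return k, center - k * spread, center + k * spread
```

Using the trimmed mean and interquartile range rather than the ordinary mean and standard deviation keeps a few extreme test residuals from distorting the center and scale of the band.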

Confidence Interval
This paper sets a confidence interval on the prediction residual and obtains the confidence level C by comparison with the residual errors of the test set. Because the test data are known to have been collected while the meters were in a normal state, the confidence interval coefficients obtained from these normal data are credible.
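The text does not state how the confidence interval is constructed. A minimal sketch, assuming roughly Gaussian residuals so that an interval $\mu \pm z\sigma$ corresponds to the two-sided confidence level $C = 2\Phi(z) - 1$: the smallest C that covers every test residual is found first, and the same C can then be turned back into thresholds. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_level_from_test(test_residuals):
    """Smallest two-sided level C whose interval mu ± z*sigma covers all test residuals.

    Assumption: residuals are roughly Gaussian, so C = 2 * Phi(z) - 1.
    """
    mu, sigma = mean(test_residuals), stdev(test_residuals)
    z = max(abs(r - mu) for r in test_residuals) / sigma
    c = 2.0 * NormalDist().cdf(z) - 1.0
    return c, mu - z * sigma, mu + z * sigma


def interval_for_level(residuals, c):
    """Residual thresholds mu ± z*sigma for a chosen confidence level C (0 < C < 1)."""
    mu, sigma = mean(residuals), stdev(residuals)
    z = NormalDist().inv_cdf((1.0 + c) / 2.0)
    return mu - z * sigma, mu + z * sigma
```

Unlike the K-Sigma sketch, this interval uses the ordinary mean and standard deviation, so it is more sensitive to a few large test residuals.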

Abnormal Judgment
Abnormal judgment is a necessary part of anomaly detection. In [16], submeters are defined as inaccurate if their predicted residual errors exceed a threshold. [17] adopts the three-sigma rule for abnormal judgment: data are flagged as abnormal when they exceed the outlier threshold. In [34], the state of the observed quantity depends on whether a sufficient number of anomalies has been reported. In this paper, abnormal judgment must meet the following requirements:

1. Abnormal detection must be reliable and must effectively identify changes in the operation state of meters.
2. Abnormal detection must be robust: deviations caused by sudden short-term changes in power or by other non-human causes during power system operation must not be judged as meter anomalies.
The model defines an outlier as a prediction residual that exceeds the upper or lower threshold. Because of the volatility of the power system, a decision based on a single outlier easily produces false positives. This paper therefore proposes an accumulator rule, which requires multiple outliers to occur within a short period before an anomaly is signaled; using the accumulator improves decision performance. When the prediction residual exceeds the threshold range, the value of the accumulator grows; conversely, the accumulator shrinks while the prediction residual stays within the threshold range [35]. A smart meter is judged abnormal only when the prediction residual produces abnormal signals continuously over a certain period, that is, when the accumulator reaches the alarm value. In addition to these local anomaly statistics, a monitoring rule is applied to the whole dataset: when relative-error outliers exceed 20% of the total data, the meter is judged to be malfunctioning.
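The accumulator rule and the whole-data 20% rule can be sketched as follows. The increment, decrement, alarm value, and function names are hypothetical choices for illustration; the text does not specify them. Flooring the accumulator at zero is also an assumption.

```python
def accumulator_alarm(residuals, lower, upper, step_up=1, step_down=1, alarm_value=5):
    """Return the index of the day the accumulator first reaches `alarm_value`, else None.

    An outlier (residual outside [lower, upper]) raises the accumulator by `step_up`;
    an in-range residual lowers it by `step_down`, never below zero (assumed floor).
    """
    acc = 0
    for day, r in enumerate(residuals):
        if r < lower or r > upper:
            acc += step_up
        else:
            acc = max(0, acc - step_down)
        if acc >= alarm_value:
            return day
    return None


def global_outlier_rule(residuals, lower, upper, max_fraction=0.2):
    """Whole-data rule: malfunction if outliers exceed `max_fraction` (20%) of all residuals."""
    outliers = sum(1 for r in residuals if r < lower or r > upper)
    return outliers > max_fraction * len(residuals)
```

With thresholds of ±0.05, a single spike or an alternating outlier/normal pattern never triggers the accumulator, while ten consecutive outliers starting on day 5 raise the alarm on day 9. An alternating pattern can still be caught by the global rule, because half of its residuals are outliers.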

Evaluation
The data were collected in a residential area of Shanxi, China, and the meters were thoroughly inspected manually; all were found to be in good condition. Empty rows and erroneous values were removed during data processing, and all smart meters are therefore assumed to be in a good operation state. To simulate a real-world residential area containing malfunctioning meters, faults were injected manually. Following the technical specifications of smart meters, an inaccurate meter is modeled as an offset (shift) in the reading value, and a meter whose relative error exceeds 2% is judged abnormal. After the evaluation model was built from the consumption data, the artificial faults were injected; the prediction residuals before and after injection are shown in Figure 7.
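A minimal sketch of the fault injection and the 2% acceptance check. It assumes the offset is multiplicative (readings scaled by $1 + e$ from a chosen start day), which matches the "relative error" wording; an additive shift would be an equally plausible reading of the specification. The function names are hypothetical.

```python
def inject_offset(readings, relative_error, start_day):
    """Simulate a faulty meter: readings from `start_day` on are scaled by (1 + relative_error).

    Assumption: the offset is multiplicative, e.g. 0.03 means the meter over-registers by 3%.
    """
    return [r * (1.0 + relative_error) if day >= start_day else r
            for day, r in enumerate(readings)]


def exceeds_spec(reading, actual, limit=0.02):
    """True when the meter's relative error |reading / actual - 1| exceeds the 2% limit."""
    return abs(reading / actual - 1.0) > limit
```

For example, injecting a 3% error from day 2 into `[10.0, 12.0, 11.0, 9.0]` leaves the first two days unchanged and turns the reading of 11.0 into about 11.33, which `exceeds_spec` flags as out of specification.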

EX1
GRU and LSTM are two important variants of the RNN. Experiments show that the width of the sliding window strongly influences how well the time-series characteristics are captured and predicted. The prediction accuracy of GRU and LSTM for power consumption was therefore tested with sliding windows of 3, 7, 14, and 20 days; each experiment has […]. The results show that:
(1) The prediction accuracy of GRU is higher overall than that of LSTM, and the GRU model trains faster.
(2) Comparing MSE, RMSE, MAE, and other indicators of the power consumption predictions, the GRU with a 14-day sliding window gives the best prediction performance; its prediction is shown in Figure 8.
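The comparison indicators named above have standard definitions; a short sketch computing them for observed and predicted daily consumption follows. The function name is illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAE between observed and predicted daily consumption."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }
```

For observations `[1.0, 2.0, 3.0]` and predictions `[1.0, 2.0, 5.0]`, the errors are `[0, 0, 2]`, giving MSE = 4/3, RMSE ≈ 1.155, and MAE = 2/3. RMSE weights large errors more heavily than MAE, so the two can rank window widths differently.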

EX2
The abnormal judgment of the evaluation model is based on the prediction residual. Two methods are used to set the residual threshold: K-Sigma and Confidence Interval. Both the K and C values are set from the residuals of the test set. Because models built from different users' data have different thresholds, the model generalizes better. Faults with errors of 3%, 5%, 7%, and 10% were injected, and the model was tested with both threshold settings. The results are shown in Figure 9 and Table 3. The model detected the changed operation state of the energy meter for every injected error, and the K-Sigma method was more reliable.

EX3
The dataset was collected in a low-voltage network in which 40 meters are deployed. Four meters were selected, and fault offsets were injected artificially into their data; the offsets are listed in Table 4. For comparison, a cluster-based operation error method was also applied. Electric energy is regarded as a kind of generalized flow that satisfies the conservation law, so electric energy meters, water meters, gas meters, and other conventional flow meters can be abstracted as generalized flow meters. Under flow conservation, the actual flow increment of the master meter equals the sum of the increments of all sub-meters over the same period, which gives Equation (8), where $Y(i)$ is the reading of the master meter, $Q_j(i)$ is the reading of sub-meter $j$, $e_z$ is the operation error of the master meter (measured on-site or in the laboratory), $e_j$ is the operation error of sub-meter $j$, $e_y$ is the line loss rate of the area, and $e_0$ is the fixed loss of the area.
where $x_j$ represents the actual consumption of the meter. Figure 10 shows the results of the error estimator: the error of meter 28 is calculated as 3.19% and that of meter 37 as −4.9%, while the other two abnormal meters cannot be found. By contrast, all four abnormal smart meters were detected by the model proposed in this paper; the results are shown in Table 5, and the prediction residual of meter 9 is shown in Figure 11. The comparison leads to the following conclusions:
(1) When the cluster-based error estimation matrix of smart meters is constructed, the measurement result is the average operation error over the whole measurement cycle. Historical data therefore affect the estimate of a meter's recent state, so the error estimation lacks timeliness.
(2) The proposed evaluation model evaluates meters in real time. Combined with the abnormal judgment rules, it identifies the abnormal state of a smart meter quickly, sensitively, and reliably when the characteristics of the power consumption data change.
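The flow-conservation idea behind the comparison method can be illustrated with a least-squares fit. This is a simplified sketch, not the paper's Equation (8): it assumes the master meter is exact ($e_z = 0$) and ignores the proportional line loss ($e_y = 0$), so that $Y(i) = \sum_j Q_j(i)/(1 + e_j) + e_0$ is linear in $a_j = 1/(1 + e_j)$ and $e_0$. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_meter_errors(master, sub_readings):
    """Least-squares sub-meter errors under a SIMPLIFIED conservation model.

    Assumed model (not the full Eq. (8)): exact master meter, no proportional line loss,
    Y(i) = sum_j Q_j(i) / (1 + e_j) + e_0.
    master: shape (n_periods,); sub_readings: shape (n_periods, n_meters).
    """
    n_periods, n_meters = sub_readings.shape
    # Unknowns: a_j = 1 / (1 + e_j) for each sub-meter, plus the fixed loss e_0.
    design = np.hstack([sub_readings, np.ones((n_periods, 1))])
    coef, *_ = np.linalg.lstsq(design, master, rcond=None)
    a, fixed_loss = coef[:n_meters], coef[n_meters]
    return 1.0 / a - 1.0, fixed_loss
```

On noise-free synthetic data (for example 60 periods, four sub-meters with true errors of 0%, 3%, −5%, and 0%, and a fixed loss of 2.0) the fit recovers the errors exactly. It also shows why such an estimate lacks timeliness: every period in the fitting window contributes equally, so a fault that starts late in the window is averaged with the meter's earlier, healthy behavior.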
Figure 10. Results of the cluster-based operation error method for electric energy meters.

Figure 11. (a) Residual output of meter 9 in the normal state. (b) Residual output of meter 9 with a 3% injected error.

Table 5. Results of the evaluation model.
Meter | Evaluation result
9 | abnormal
13 | abnormal
22 | abnormal
37 | abnormal

Conclusions and Future Work
In this paper, the key technology of smart meter state evaluation is studied, and an online method for evaluating the operation state of smart meters is proposed. The state evaluation model of the smart meters is established from historical consumption data. The RNN accurately predicted users' power consumption, and when the operation state of a smart meter changed, the model accurately identified the malfunctioning meter. TL greatly reduces the time needed to establish the model and improves the efficiency of model training. The flexible threshold setting makes the method well suited to condition monitoring of smart meters. Using actual data from a station in Shanxi Province, the evaluation model was established and the operation state of the smart meters was evaluated. The evaluation can provide effective technical support for daily operation, maintenance management, smart meter state evaluation, and accurate replacement strategies, which can greatly improve the intelligent management level and work efficiency of power operation management departments.
