Identification of Abnormal Electricity Consumption Behavior of Low-Voltage Users in New Power Systems Based on a Combined Method

Gou, Jiaolong; Niu, Xudong; Chen, Xi; Dong, Shuxin; Xin, Jing

doi:10.3390/en18102528


by Jiaolong Gou ¹,², Xudong Niu ³, Xi Chen ⁴, Shuxin Dong ¹ and Jing Xin ¹,*

¹ School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
² Xi’an High Way Research Institute Co., Ltd., Shaanxi Transportation Holding Group Co., Ltd., Xi’an 710065, China
³ Department of New Quality Productive Forces, Shaanxi Transportation Holding Group Co., Ltd., Xi’an 710065, China
⁴ Information and Communication Company, State Grid Shaanxi Electric Power Company Limited, Xi’an 710004, China
* Author to whom correspondence should be addressed.

Energies 2025, 18(10), 2528; https://doi.org/10.3390/en18102528

Submission received: 1 April 2025 / Revised: 24 April 2025 / Accepted: 8 May 2025 / Published: 14 May 2025

(This article belongs to the Special Issue Hybrid Intelligent Modeling Technology and Optimization Strategy for Industrial Energy Consumption Processes: 2nd Edition)


Abstract

With the rapid growth in low-voltage electricity demand, abnormal electricity consumption behavior is becoming increasingly frequent, which not only threatens the safe and stable operation of power systems but also causes substantial economic losses. Monitoring and analyzing the abnormal power consumption of low-voltage users is therefore of great practical significance. This paper proposes a hybrid model, the K-GBDT model, for detecting abnormal electricity consumption behavior among low-voltage users in power systems. The model combines the GBDT (Gradient Boosting Decision Tree) algorithm with the KNN (K-Nearest Neighbor) algorithm, leveraging the strengths of both approaches through a two-stage classification strategy. In the first stage, the GBDT algorithm uses its robust feature learning and nonlinear classification capabilities to perform coarse-grained classification, extracting global patterns and categorical information. In the second stage, the data are partitioned into multiple subsets according to the coarse classification results from GBDT, and the KNN algorithm performs fine classification within each subset. This hybrid approach enables the K-GBDT model to integrate GBDT’s global modeling strength with KNN’s local classification advantages. The K-GBDT model was compared experimentally with standalone GBDT and KNN algorithms and applied in practice. To further validate the proposed method, a comparative analysis was conducted against the Long Short-Term Memory Autoencoder (LSTM-AE) model. The experimental results demonstrate that the proposed K-GBDT model outperforms single-algorithm models in both classification accuracy and generalization capability, enabling more accurate identification of abnormal electricity consumption behaviors among low-voltage users. This provides reliable technical support for intelligent management in power systems.

Keywords:

abnormal electricity consumption behavior; GBDT; KNN; K-GBDT

1. Introduction

The global energy landscape is undergoing a profound transformation, driving the power system’s transition from conventional paradigms to a new power system characterized by sustainability and intelligence [1]. This transformation not only emphasizes clean and low-carbon energy, but also prioritizes the goals of carbon peak and carbon neutrality, which have become the core issue in the development of the power industry. The construction of the new power system requires greater attention to the interaction between supply and demand sides, aimed at promoting renewable energy integration, optimizing electricity resources allocation, and enhancing the flexibility and intelligence of power system management. Against this backdrop, the power industry faces unprecedented challenges and opportunities, especially in the management and control of electricity demand.

As residential electricity consumption surges, the scientific and efficient management of low-voltage users’ electricity consumption behavior, while mitigating safety risks and economic losses from abnormal electricity usage, has become an urgent priority for the power sector. Common abnormal electricity behaviors include electricity theft, meter malfunctions, and connection errors. Among these, electricity theft poses a significant challenge due to its covert nature and widespread occurrence, making it one of the major hazards in the operation of power systems. The covert nature of electricity theft manifests in several ways: tampering with smart meters to circumvent conventional detection systems; stealing electricity intermittently by exploiting remote control capabilities and gaps in monitoring; and exploiting structural weaknesses in old residential areas, where complex wiring configurations and centralized meter installation make it easy to hide unauthorized devices. Electricity theft not only undermines the stability of the power grid but also leads to substantial waste of electricity resources, severely constraining the management efficiency and economic benefits of the power sector [2].

In recent years, with the continuous maturation and widespread adoption of smart meter technology [3], power utilities have been able to collect massive electricity consumption data from the user side in real time through the Advanced Metering Infrastructure (AMI). Ref. [4] describes an Intelligent Meter Information Management System (IMIMS) built on a three-layer architecture comprising smart meters (SM), data concentrators (DC), and head-end systems (HES). At the terminal layer, smart meters collect 18 types of electricity consumption characteristic parameters, such as voltage, current, and power factor, at a minute-level frequency. These real-time data are transmitted to the regional data concentrator via power line communication (PLC) or a wireless mesh network (WMN). Figure 1 shows a typical framework of a power data collection system.

The power cloud platform employs integrated time-series analytics and machine learning to establish consumption baselines and detect significant behavioral deviations. However, residential anomaly detection faces four key challenges: residential load volatility that exceeds industrial patterns, similar characteristics between normal and abnormal consumption, the inefficacy of conventional threshold methods against advanced tampering devices, and the modeling complexity introduced by bidirectional power flows from distributed photovoltaics.

With the development of new power systems, the electricity industry is entering a new era characterized by digitalization and intelligence. The deep integration of smart meters, IoT technologies, big data analytics, and AI is driving the transformation and upgrading of the power sector. By enhancing the intelligence level of power systems, more precise management of electricity consumption behavior can be achieved. This not only helps reduce the waste of power resources but also ensures the safe and stable operation of power systems, contributing to the realization of the “dual carbon” goals [5].

Currently, the primary methods for identifying abnormal electricity consumption in power systems include rule-based approaches [6], statistical-based methods [7], and machine learning-based approaches [8]. Ref. [9] proposed a deep variational autoencoder network (DVAE) for load curves and its anomaly analysis service. DVAE can help reconstruct the load curve and measure the difference between the original curve and the new curve. Its measurement indicators include reconstruction probability and Pearson similarity. Traditional rule-based and statistical methods detect anomalies by setting thresholds or assuming predefined models. Ref. [10] employs a rule-based approach to classify and identify electricity theft, utilizing predefined thresholds for electricity load and temporal characteristics to detect abnormal behaviors. Ref. [11] models data using statistical distributions such as normal distribution and Poisson distribution by calculating deviations between real-time and historical data to determine the presence of abnormal fluctuations. However, external variables such as weather, seasonal fluctuations, and equipment degradation can cause electricity usage to exhibit deviations from typical patterns, thereby compromising the reliability of rule-based methods. Similarly, equipment malfunctions and shifts in consumer behavior may alter underlying data distributions, impairing the ability of statistical models to accurately detect anomalous variations. Thus, these methods are susceptible to external interference and exhibit limited accuracy when dealing with complex and dynamic electricity consumption behaviors [12]. Ref. [13] leverages dimensionality reduction techniques such as Principal Component Analysis (PCA), combined with multi-dimensional statistical analysis, to further improve the accuracy and adaptability of anomaly detection. 
This integrated approach enables the system to extract meaningful features from complex and high-dimensional data, thereby improving the robustness and flexibility of the model. Ref. [14] employs Support Vector Machines (SVM) for anomaly detection in smart meter data, constructing a classification model to identify abnormal patterns in electricity usage data. By modeling electricity load and temporal features, this method can effectively distinguish between normal and anomalous electricity consumption behavior.

In recent years, with the development of smart grids and the widespread application of power big data, machine learning techniques, particularly deep learning [15,16] and ensemble methods [17], have attracted growing attention. Power distribution and management in smart grids rely on real-time and historical data. However, existing solutions do not meet the standard requirements for prediction, are difficult to deploy, and do not achieve the expected accuracy. Ref. [18] proposes an ensemble learning-based smart meter power consumption prediction model (EPC-PM). In this model, ensemble learning calculates the weights of the base predictors, and a voting engine selects the most accurate predictor to generate the final prediction output; the performance of the base predictors is then taken into account in the next prediction iteration. Ref. [19] proposed a hybrid approach integrating deep learning with conventional machine learning techniques to identify anomalous behaviors deviating from normal consumption patterns by modeling power data under diverse conditions. Studies demonstrate that combining convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) networks can uncover latent anomalies in time-series power data, while traditional methods such as SVM and Decision Trees further enhance classification accuracy. By learning from extensive historical electricity consumption data, machine learning approaches can automatically extract features from the data to identify potential anomalous behaviors, exhibiting strong adaptability and precision [20]. Ref. [21] enhances the traditional autoencoder (AE) for time-series data by replacing its fully connected layer with a two-layer LSTM to form the LSTM-AE, which is used as the base model in High-LowDAAE. Because LSTM handles long-term dependencies, this design extracts effective features and captures correlations in the data.

In existing research, anomaly detection methods based on single algorithms such as GBDT [22,23] and KNN [24,25] have achieved certain advancements. However, these methods often face challenges such as insufficient fusion of global and local features, as well as poor model generalization capabilities [26]. Additionally, although deep learning methods excel in feature extraction, they typically require large amounts of labeled data, involve complex training processes, and are susceptible to data quality and noise.

To address the aforementioned issues, this paper proposes a novel anomaly detection model for low-voltage electricity consumption behavior in power systems based on a hybrid approach—the K-GBDT model. In the data preprocessing stage, this paper employs the boxplot method to detect outliers and missing values, and then uses interpolation to fill in missing data, thereby enhancing data quality. This model integrates the global modeling strengths of the GBDT algorithm with the local classification advantages of the KNN algorithm. The GBDT algorithm extracts pattern information from data on a global scale through its robust feature learning ability, while KNN ensures precise classification within local regions of the data via neighborhood searches. This enhances the accuracy and robustness of anomaly detection. By adopting this hybrid methodology, the K-GBDT model can more effectively handle complex electricity consumption behavior data and significantly improve the precision of anomaly identification.

The contributions of this paper are primarily reflected in the following three aspects:

(1): Data Preprocessing and Optimization Strategies: To address the high dimensionality and complexity of low-voltage power consumption data, a variety of data preprocessing and optimization strategies were designed. These strategies further enhanced the robustness of the model against noisy data and outliers, thereby improving its generalization capabilities.
(2): Insensitivity to Anomalous Samples and SMOTE Integration: The proposed method incorporates the KNN algorithm’s insensitivity to anomalous samples while effectively enhancing its ability to handle class imbalance through the integration of the Synthetic Minority Over-sampling Technique (SMOTE).
(3): Innovative Heterogeneous Algorithmic Collaboration Mechanism: An innovative heterogeneous algorithmic collaboration mechanism that combines Gradient Boosting Decision Tree (GBDT) with K-Nearest Neighbors (KNN) was adopted to construct a K-GBDT hybrid model with dual-modal feature processing capabilities. This framework achieves multi-level feature interaction in both temporal and spatial dimensions via GBDT’s global feature abstraction and KNN’s local topological preservation characteristics, thereby enhancing the accuracy and robustness of abnormal electricity consumption behavior recognition.

The rest of this paper is organized as follows: Section 2 introduces the principle of the combined method for identifying abnormal electricity consumption behavior among low-voltage users in the power system. Section 3 presents the application scenarios and performance analysis of the K-GBDT model. Section 4 provides the discussion. Section 5 summarizes the paper and outlines the remaining issues for future work.

2. Principles of the Proposed Method

2.1. Problem Definition

With the rapid development of the social economy and the sharp increase in the number of low-voltage electricity users, power systems are facing growing challenges. Particularly in the field of low-voltage electricity consumption, frequent abnormal electricity consumption behaviors (such as electricity theft, equipment failures, and illegal electricity usage) not only undermine the revenue stability of power companies but also pose serious threats to the safe operation of the power grid. These abnormal behaviors waste electricity resources, increase the burden on the power grid, and reduce the efficiency of power system dispatching and management. Therefore, timely and accurate identification of abnormal electricity consumption behaviors among low-voltage users has become a critical technical issue for ensuring the safe and economic operation of power systems.

The detection of abnormal electricity consumption behavior is essentially a binary classification problem, which requires the design of appropriate detection methods to distinguish between normal and abnormal electricity consumption behaviors, and the calculation of classification accuracy based on evaluation metrics. This paper proposes a novel identification model for abnormal electricity consumption behavior among low-voltage electricity users, termed the K-GBDT model, which effectively enhances the identification accuracy and robustness of abnormal electricity consumption behavior. The proposed model and related data processing methods are described in detail in the following sections.

2.2. The Model and Related Data Processing

The proposed framework for anomaly detection in low-voltage electricity users is illustrated in Figure 2. The framework comprises four main components: data acquisition, data cleaning and quality enhancement, feature engineering and data transformation, and the construction of an anomaly detection model based on K-GBDT. The specific functions of each module are described as follows.

2.2.1. Data Acquisition

The data in this paper were obtained from the 2016 CCF Big Data and Intelligence Competition. The primary data content includes the identification number of each user and their annual electricity consumption. The specific data formats and types are shown in Table 1.

2.2.2. Data Cleaning and Quality Enhancement

This section employs the boxplot method to detect outliers in the daily cumulative electricity consumption and daily electricity consumption data of low-voltage consumers over a year (365 days). The identified outliers and missing values are then supplemented using an interpolation approach.
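The cleaning step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the whisker factor of 1.5, the use of linear interpolation, and the names `boxplot_bounds`, `clean_series`, and `daily_kwh` are all assumptions, since the text only states that the boxplot method and "an interpolation approach" are used.

```python
from statistics import quantiles


def boxplot_bounds(values, whisker=1.5):
    """Return the (lower, upper) boxplot fences, ignoring missing readings."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def clean_series(values, whisker=1.5):
    """Mark outliers as missing, then fill every gap by linear interpolation."""
    low, high = boxplot_bounds(values, whisker)
    marked = [v if v is not None and low <= v <= high else None for v in values]
    known = [i for i, v in enumerate(marked) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    filled = list(marked)
    for i, v in enumerate(marked):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: copy the first valid reading
            filled[i] = marked[right]
        elif right is None:     # gap at the end: copy the last valid reading
            filled[i] = marked[left]
        else:                   # interior gap: interpolate between neighbours
            t = (i - left) / (right - left)
            filled[i] = marked[left] + t * (marked[right] - marked[left])
    return filled


# Hypothetical daily readings (kWh): one missing value and one spike.
daily_kwh = [5.0, 5.2, None, 5.6, 48.0, 5.4, 5.1, 5.3]
cleaned = clean_series(daily_kwh)
```

In this toy series the spike of 48.0 falls outside the fences and is treated like a missing value, so both gaps are filled from their valid neighbours.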

2.2.3. Feature Engineering and Data Transformation

This study selects the raw daily cumulative electricity consumption and daily electricity consumption vectors as the primary features of the model. To mitigate the impact of different measurement scales on electricity consumption behavior analysis, both vectors are normalized using the Min–Max normalization method. Given the lengths of the daily cumulative electricity consumption vector and daily consumption vector are both 730, there exists a certain degree of redundancy among the feature vectors. To reduce the computational complexity of identifying abnormal electricity consumption behavior among low-voltage residential users, Principal Component Analysis (PCA) is utilized to decrease the number of features (dimensionality). This approach not only curtails computational complexity but also enhances algorithmic efficiency. Additionally, by projecting the data into a lower-dimensional space, PCA eliminates numerous irrelevant or redundant dimensions, thus alleviating the challenges associated with distance calculations in high-dimensional spaces.
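As a rough illustration of this transformation, the sketch below applies Min–Max scaling column by column and then extracts the leading principal direction by power iteration. It assumes only the Python standard library; the function names and the toy matrix are hypothetical, the rows are mean-centred before projection (the usual PCA convention, assumed here), and a real pipeline would more likely rely on a numerical library and keep several components.

```python
import math


def min_max_columns(rows):
    """Min-max scale each column of a row-major matrix to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def centre(rows):
    """Subtract the column means from every row."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    return [[r[j] - means[j] for j in range(n)] for r in rows]


def covariance(centred):
    """Sample covariance matrix of mean-centred rows."""
    m, n = len(centred), len(centred[0])
    return [[sum(r[a] * r[b] for r in centred) / (m - 1) for b in range(n)]
            for a in range(n)]


def leading_component(cov, iters=200):
    """Approximate the top eigenvector of `cov` by power iteration."""
    n = len(cov)
    w = [1.0 / math.sqrt(n)] * n  # start vector; must not be orthogonal to the answer
    for _ in range(iters):
        w_next = [sum(cov[a][b] * w[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w_next))
        if norm == 0.0:           # zero covariance: no direction to find
            return w
        w = [x / norm for x in w_next]
    return w


# Hypothetical features on very different scales (cumulative kWh, daily kWh).
raw = [[120.0, 1.0], [240.0, 2.0], [360.0, 3.0], [480.0, 4.0]]
scaled = min_max_columns(raw)
centred = centre(scaled)
w = leading_component(covariance(centred))
z = [sum(a * b for a, b in zip(row, w)) for row in centred]  # one column of Z = XW
```

After scaling, the two columns carry the same information, so a single component is enough to represent every sample; this is the kind of redundancy the paragraph above refers to.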

2.2.4. Construction of the Identification Model

To address the challenge of balancing global pattern extraction with local pattern extraction capabilities in single classification algorithms, this paper proposes the K-GBDT algorithm. By integrating the global pattern extraction strength of GBDT with the local classification ability of KNN, K-GBDT effectively overcomes the limitations of traditional single algorithms in complex data classification, class imbalance, and model generalization. This integration significantly enhances the classification accuracy and recognition performance for minority classes. The proposed method has been applied to construct an abnormal electricity consumption behavior model, and empirical validation demonstrates that this method outperforms single-algorithm models in both classification accuracy and generalization performance.

The K-GBDT model proposed in this paper effectively addresses complex data classification and class imbalance problems by combining the strengths of both GBDT and KNN algorithms. Its workflow consists of four steps: GBDT model construction, GBDT rough classification, KNN fine-grained classification, and K-GBDT model output.

1. GBDT model construction

GBDT employs the gradient boosting method, utilizing K decision trees to iteratively optimize the loss function $L = \sum_{i=1}^{m} l(y_i, \hat{y}_i)$, extracting global features and classification patterns from the data, and ultimately outputting a rough classification result:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \tag{1}$$

where $x_i$ represents the input feature vector of the i-th sample, $y_i$ represents the true category or target value of the i-th sample, $\hat{y}_i$ denotes the final prediction result of the GBDT model for the i-th sample $x_i$, $l(y_i, \hat{y}_i)$ represents the loss between the true label $y_i$ of the i-th sample and the predicted value $\hat{y}_i$, $f_k(x_i)$ is the prediction result of the k-th tree, and $L$ represents the loss function of the entire model, which is used to measure the difference between the predicted value and the true value.

2. GBDT rough classification

According to the class probability $P(y = c \mid x_i)$ output by GBDT, the samples are partitioned into class-specific subsets:

$$D_c = \{\, x_i \mid \arg\max_{c'} P(y = c' \mid x_i) = c \,\} \tag{2}$$

where $c$ is one of the possible categories, and $D_c$ represents the set of all samples $x_i$ that are predicted to belong to category $c$.

3. KNN fine-grained classification

In each subset, the KNN algorithm calculates the Euclidean distance between a sample and its neighboring samples according to Equation (3), and then performs majority voting classification based on the nearest neighbor set $T_k(x)$.

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{i,k} - x_{j,k})^2} \tag{3}$$

where $x_{i,k}$ represents the value of sample $x_i$ at the k-th feature, $x_{j,k}$ represents the value of sample $x_j$ at the k-th feature, and $d(x_i, x_j)$ denotes the Euclidean distance between the two samples.

4. K-GBDT model output

The results of the GBDT rough classification and the KNN fine-grained classification are combined to complete the final classification. The above K-GBDT model is defined as:

$$f_i^{\text{K-GBDT}}(x_i) = y_i \tag{4}$$

where $f_i^{\text{K-GBDT}}(x_i)$ represents the final classification result of sample $x_i$ by the K-GBDT model.
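To make the additive form of Equation (1) concrete, the following sketch fits a tiny gradient-boosted ensemble in pure Python. It is a simplification under stated assumptions rather than the model used in the paper: each "tree" is a depth-one stump, the loss is squared error on 0/1 labels, and a learning rate plus a constant base score are folded into the sum. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, threshold, lm, rm)
    if best is None:  # no valid split: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, threshold, lm, rm = best
    return lambda x: lm if x[j] <= threshold else rm


def fit_boosting(xs, ys, n_trees=20, learning_rate=0.5):
    """Return a scoring function base + learning_rate * sum_k f_k(x)."""
    base = sum(ys) / len(ys)
    trees, preds = [], [base] * len(xs)
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


# Hypothetical normalised features; label 1 marks abnormal consumption.
xs = [(0.1, 0.5), (0.2, 0.4), (0.3, 0.6), (0.7, 0.5), (0.8, 0.4), (0.9, 0.6)]
ys = [1, 1, 1, 0, 0, 0]
score = fit_boosting(xs, ys)
coarse_label = 1 if score((0.15, 0.5)) > 0.5 else 0
```

Each round fits a stump to what the current ensemble still gets wrong, so the score for a training-like sample moves steadily towards its label; thresholding the score at 0.5 gives a coarse class.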

2.2.5. Training of the Identification Model

A stepwise training approach is adopted. Initially, global features are extracted using GBDT to generate coarse classification results. Subsequently, subsets are divided according to these coarse classification outcomes, and local fine-grained classification is performed within each subset using KNN. This process enables the model to effectively integrate both global and local feature extraction and classification, significantly enhancing the recognition accuracy of abnormal electricity consumption behaviors and the model’s generalization capability.
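One possible reading of this stepwise procedure is sketched below. The text does not spell out exactly how the coarse and fine results are combined, so the routing rule here is an assumption: a coarse model assigns each sample to a subset, and a KNN vote inside that subset gives the final label. The coarse model is a hand-written threshold standing in for a trained GBDT (in practice something like scikit-learn's `GradientBoostingClassifier` could fill that role), and every name and data point is hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda s: euclidean(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class TwoStageClassifier:
    """A coarse model routes each sample to a subset; KNN decides inside it."""

    def __init__(self, coarse_predict, k=3):
        self.coarse_predict = coarse_predict  # stand-in for a trained GBDT
        self.k = k
        self.subsets = defaultdict(list)

    def fit(self, xs, ys):
        # Partition the training data by the coarse prediction.
        for x, y in zip(xs, ys):
            self.subsets[self.coarse_predict(x)].append((x, y))
        return self

    def predict(self, x):
        c = self.coarse_predict(x)
        members = self.subsets.get(c)
        if not members:   # no training sample was routed to this subset
            return c      # fall back to the coarse label
        sub_x, sub_y = zip(*members)
        return knn_predict(sub_x, sub_y, x, k=min(self.k, len(members)))


def coarse(x):
    """Hypothetical coarse rule: very low normalised usage looks suspicious."""
    return 1 if x[0] < 0.3 else 0


# Hypothetical (mean usage, variability) features; label 1 marks abnormal users.
train_x = [(0.10, 0.80), (0.15, 0.75), (0.20, 0.70), (0.25, 0.10), (0.28, 0.12),
           (0.60, 0.20), (0.70, 0.25), (0.80, 0.15), (0.65, 0.90)]
train_y = [1, 1, 1, 0, 0, 0, 0, 0, 1]

model = TwoStageClassifier(coarse, k=3).fit(train_x, train_y)
corrected = model.predict((0.26, 0.11))
```

In this toy data the coarse rule flags the query `(0.26, 0.11)` as abnormal, but its nearest neighbours inside that subset are mostly normal users, so the fine stage overturns the coarse label. This is the kind of local correction the two-stage design is meant to provide.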

The implementation process of the proposed K-GBDT-based method for identifying abnormal electricity consumption behavior of low-voltage users in new power systems is as follows:

Data Preprocessing

Given a training dataset $D = \{(x_i, y_i)\}_{i=1}^{m}$, where $x_i$ represents the input feature vector and $y_i$ represents the target label, the goal is to construct a model $f(x)$ that minimizes the loss function:

$$L = \sum_{i=1}^{m} l(y_i, f(x_i)) \tag{5}$$

where $l$ represents the loss function, such as mean squared error or cross-entropy.

Feature Engineering and Data Transformation

Min–Max Normalization: For the cleaned feature vector $x_i$, min–max normalization is used to scale each feature value to the range [0,1]:

$$x_i^{\text{norm}} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$

Principal Component Analysis (PCA): PCA is applied to the normalized feature matrix $X$ for dimensionality reduction:

$$Z = X W \tag{7}$$

where $W$ represents the projection matrix, and $Z$ represents the low-dimensional feature matrix. The projection matrix $W$ is composed of the leading eigenvectors of the covariance matrix:

$$W = \arg\max_{W} \operatorname{trace}(W^{\top} \Sigma W) \quad \text{subject to } W^{\top} W = I \tag{8}$$

where $\Sigma$ represents the covariance matrix of the feature matrix $X$.

K-GBDT Model Construction and Model Output

Combining the rough classification results from GBDT and the fine-grained classification results from KNN, the final classification is completed. The K-GBDT model is defined as:

$$f_i^{\text{K-GBDT}}(x_i) = y_i \tag{9}$$

where $f_i^{\text{K-GBDT}}$ represents a comprehensive model that combines global feature extraction with local classification.

3. Performance Analysis of the K-GBDT Model

Single models such as KNN or GBDT, when applied to complex data scenarios, often exhibit limitations such as insufficient global pattern extraction or suboptimal local classification accuracy. In contrast, K-GBDT addresses these issues by integrating global and local feature extraction, thereby enhancing overall classification performance. To validate the effectiveness of the proposed combined method for identifying abnormal electricity consumption behavior among low-voltage users in power systems, we used electricity consumption data from low-voltage users in a certain region as experimental data and conducted two sets of experiments. The first experiment compared the performance of the proposed model with single models (GBDT or KNN) without sample balancing. The second experiment applied sample balancing to the single models while the proposed model operated without sample balancing, allowing for a comparative analysis of their classification performance. Finally, the proposed method was applied to the identification of abnormal electricity consumption behavior among low-voltage residential users of State Grid Shaanxi Electric Power Company Limited in Shaanxi Province, China, where it delivered satisfactory practical benefits. The experimental settings and results are as follows.

3.1. The Evaluation Criterion and the Experimental Platform

The identification algorithm for abnormal electricity consumption behavior aims to detect abnormal electricity users accurately and quickly without affecting the electricity metering of normal users. The true positive rate (TPR) and false positive rate (FPR) are therefore selected as the evaluation indexes of the algorithm:

$$TPR = \frac{TP}{TP + FN} \tag{10}$$

$$FPR = \frac{FP}{FP + TN} \tag{11}$$

where TP (true positive) represents the actual anomalies that are correctly identified, TN (true negative) represents the actual normal cases that are correctly identified, FN (false negative) represents the actual anomalies that are incorrectly identified as normal, and FP (false positive) represents the actual normal cases that are incorrectly identified as anomalies. That is, TP and TN indicate correct classifications, whereas FN and FP indicate incorrect classifications. The higher the proportion of TP and TN, the better the model’s performance in identifying abnormal electricity usage; equivalently, the closer TPR is to 1 and FPR is to 0, the better the model performs.
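As a small worked example of these two rates, the sketch below counts the four outcome types from a list of true and predicted labels. The helper names and the ten-sample label lists are hypothetical and unrelated to the paper's dataset; label 1 is assumed to mark an abnormal user.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN with `positive` marking the abnormal class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def tpr_fpr(y_true, y_pred, positive=1):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Four abnormal users and six normal users; one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
```

Here three of the four abnormal users are caught (TPR = 0.75) and one of the six normal users is wrongly flagged (FPR = 1/6).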

All experiments were run on Python 3.12.4 under 64-bit Windows 10. The computational platform has a 1.80 GHz CPU and 8 GB of memory.

3.2. Experimental Result Analysis

In this paper, the desensitized public dataset of a new power system electricity consumption information collection system is used for analysis. The dataset includes cumulative electricity consumption data and daily electricity consumption information of users for 365 days, as well as the list of users who have been confirmed by the field staff to steal electricity, with a total of 9491 samples. The sample data are split into training set and testing set at a ratio of 7:3, and the labels are evenly distributed. The performance of the abnormal electricity consumption behavior identification model is verified, and good results are obtained.
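The 7:3 split can be illustrated with a stratified partition. This sketch assumes that "evenly distributed" labels means the class proportions are preserved in both sets, which is one reading of the sentence above; the function name, the seed, and the 80/20 toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx) with the label proportions preserved."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = round(train_ratio * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: 80 normal users (0) and 20 abnormal users (1).
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the share of abnormal users the same in the training and testing sets, which matters when the abnormal class is small.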

3.2.1. A Non-Sample Balancing Case

The performance of the proposed K-GBDT model was compared with traditional GBDT and KNN models without applying the SMOTE (Synthetic Minority Over-sampling Technique) for sample balancing. The experimental results shown in Table 2 show that K-GBDT maintains a significant advantage in key evaluation indicators, including accuracy, F1-score, true positive rate (TPR), and false positive rate (FPR).

In terms of accuracy, the K-GBDT model reached 0.8791, which was 1.31% higher than GBDT (0.8673) and 20.33% higher than KNN (0.7228). This improvement emphasizes the superiority of K-GBDT in the overall classification capability, because it can predict the sample category more accurately. The F1-score of the K-GBDT model was 0.7558, which was significantly higher than GBDT (0.6175) and KNN (0.6281). These results show that K-GBDT achieves a good balance between precision and recall between positive and negative categories, which reflects its greater stability in dealing with complex classification tasks.

From the perspective of TPR, K-GBDT achieved a true positive rate of 0.6529, which is better than GBDT (0.5107) and KNN (0.5272). This indicates that K-GBDT has a stronger capability to identify positive samples, even in the case of unbalanced class distributions. In addition, in terms of FPR, the value reported by K-GBDT is 0.0275, which is significantly lower than GBDT (0.0357) and KNN (0.0378). This result highlights the very low FPR of K-GBDT, indicating it can effectively suppress the misclassification of negative samples while maintaining a high TPR. These results verify the excellent performance of the model as a whole.

The performance advantages of K-GBDT are further confirmed by the ROC curve and AUC value. Figure 3 shows the ROC comparison curves of the models without sample balancing. It can be seen from Figure 3 that K-GBDT’s AUC value reached 0.90, surpassing GBDT (0.86) and KNN (0.83). This indicates that K-GBDT is more precise in defining classification boundaries and provides stronger predictive power. K-GBDT’s ROC curve consistently outperforms those of the other models, further demonstrating its superior classification performance.

These results indicate that the K-GBDT model still shows its unique advantages even when SMOTE technology is not used to balance the samples. The K-GBDT model effectively combines the local pattern learning ability of KNN and the global gradient optimization ability of GBDT. The experimental results on imbalanced datasets highlight the potential of K-GBDT as a practical solution to solve the problem of data imbalance, making it a valuable tool in the classification tasks of energy industry. In addition, the experimental results further emphasize the robustness and applicability of the K-GBDT model, indicating that it has great potential in real-world applications. Comparative results can be seen in Table 2 and Figure 3.

3.2.2. A Single Model Sample Balancing Case

The proposed K-GBDT algorithm, which integrates KNN and GBDT, has demonstrated significant performance improvements in dealing with imbalanced datasets, especially after employing the SMOTE for sample balancing. The experimental results shown in Table 3 indicate that the K-GBDT model outperforms traditional GBDT and KNN models in all key performance metrics, highlighting its exceptional capability in handling imbalanced data classification tasks.

In terms of accuracy, the K-GBDT model achieved 0.8851, representing a 1.34% improvement over GBDT (0.8737) and a 1.15% improvement over KNN (0.8754). Regarding the F1-score, the K-GBDT model attained 0.8333, significantly surpassing GBDT (0.7536) and KNN (0.6206). This enhancement in the F1-score emphasizes the superior ability of K-GBDT in effectively identifying minority class samples. Additionally, the K-GBDT model also showed significant improvements in TPR and FPR, achieving a TPR of 0.8323, over 10% higher than GBDT (0.7545). Meanwhile, the FPR of K-GBDT was reduced to 0.0871, better than GBDT (0.0912) and KNN (0.3213). These results demonstrate the balanced performance of the K-GBDT model, enabling it to more effectively identify positive samples while maintaining a low false positive rate.

The superiority of K-GBDT was further validated by analyzing its ROC curves and corresponding AUC values. The K-GBDT model achieved an AUC of 0.93, which outperforms that of GBDT (0.90) and KNN (0.89), confirming its robustness and accuracy in classification tasks. This performance is attributed to the complementary strengths of the two integrated algorithms: KNN effectively captures local patterns of the data through the neighborhood relationships, while GBDT provides powerful classification capabilities through its gradient boosting strategy. The introduction of the SMOTE further improves the distribution of minority class samples, ensuring that these samples are adequately represented and properly weighted in the model.
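The AUC values can be read as the probability that a randomly chosen abnormal user receives a higher score than a randomly chosen normal user. A minimal sketch of this pairwise formulation is given below; the function name, labels and scores are hypothetical, and the quadratic loop is meant for clarity rather than for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = abnormal) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

In this toy example 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12.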

In summary, the K-GBDT model successfully combines the advantages of KNN and GBDT and utilizes the SMOTE to effectively solve the sample imbalance problem and significantly improve the classification performance. The model excels in identifying minority class samples, providing an effective and innovative solution for classifying unbalanced datasets in the energy system. By enhancing the accuracy and reliability of predictive systems, the K-GBDT model demonstrates great potential in real-world applications and is valuable in facilitating data-driven decision making in energy systems. Comparative results can be seen in Table 3 and Figure 4.

3.2.3. Comparison with Recent Literature Methods

To further validate the effectiveness of the method proposed in this paper, a comparative experiment was conducted between the proposed K-GBDT method and the Long Short-Term Memory Autoencoder (LSTM-AE) method in Ref. [21]. The basic LSTM-AE model consists of an encoder and a decoder, each built from two connected LSTM layers. In this structure, the first LSTM layer compresses the input time-series data and extracts temporal features, and these features are then fed into the second LSTM layer, which extracts more condensed temporal features. The two-layer LSTM design can not only extract more effective temporal features but also output temporal features at different compression levels.

The test results are shown in Figure 5. The proposed K-GBDT model demonstrates superior performance over the LSTM-AE method across multiple evaluation metrics, both with and without the application of the SMOTE. Specifically, K-GBDT achieves higher accuracy (0.8791 without the SMOTE, 0.8851 with the SMOTE) compared to LSTM-AE (0.7287 without the SMOTE, 0.7168 with the SMOTE), highlighting its enhanced generalization capability. The F1-score, which balances precision and recall, is also significantly higher for K-GBDT (0.7558 without the SMOTE, 0.8333 with the SMOTE) than for LSTM-AE (0.2239 without the SMOTE, 0.2629 with the SMOTE), indicating its robustness in handling class imbalance. Additionally, the true positive rate (TPR) of K-GBDT improves markedly with the SMOTE (from 0.6529 to 0.8323), while LSTM-AE lags behind (0.196 to 0.2157), confirming K-GBDT’s superior detection capability. Moreover, K-GBDT maintains a consistently lower false positive rate (FPR), particularly when the SMOTE is used (0.0871 vs. LSTM-AE’s 0.13). Overall, these results affirm the effectiveness and reliability of the proposed K-GBDT model, making it a more suitable approach for imbalanced classification tasks.

3.2.4. Practical Application Results

We developed a User Electricity Anomaly Detection Tool (UEADT) based on the method proposed in this paper and applied it at the State Grid Shaanxi Electric Power Company, China. The login interface of the software and the identification results are illustrated in Figure 6, Figure 7 and Figure 8. Specifically, Figure 6 displays the login page of the anti-electricity theft tool for the State Grid Shaanxi Electric Power Company, China, Figure 7 presents the Power Information Collection System used to collect user electricity consumption data, and Figure 8 shows the anomaly identification results generated by the method proposed in this paper. The concrete benefits are as follows:

(1): In terms of economic benefits: Through the analysis and calculation of 260,000 electricity users in a prefecture-level city under State Grid Shaanxi, three rounds of on-site verification involving 47 households were conducted, successfully identifying 13 households with metering anomalies and 20 households engaged in electricity theft, recovering economic losses exceeding 500,000 RMB.
(2): In terms of social benefits: The application of big data and computational intelligence-based anti-electricity theft analysis has strengthened deterrence against electricity theft, maintained normal electricity usage order, and ensured user safety in electricity consumption.
(3): In terms of management benefits: After deploying the User Electricity Anomaly Detection Tool developed by the method proposed in this paper (Figure 6), on one hand, user electricity consumption data can be obtained through the Power Information Collection System (Figure 7), and on the other hand, a list of high-probability users can be obtained through electricity theft risk scenarios (Figure 8). In contrast, the State Grid Shaanxi Electric Power Company conducted a marketing survey on 18,710 low-voltage users in a city using traditional manual experience methods, deploying two inspectors per day for 75 days, and identified a total of 5 electricity theft users, with an electricity theft detection rate of only 0.07%. After using the User Electricity Anomaly Detection Tool based on the method proposed in this paper, a total of 190,000 low-voltage users were analyzed, identifying 7 electricity theft users and 2 metering anomaly users, with an abnormal electricity detection rate of 51.16%, which is 730.86 times the manual detection rate. Both the time and personnel required were significantly reduced compared to traditional household-by-household inspections, thereby substantially saving labor and material resources while enhancing operational efficiency.

As illustrated by the inspection case in Figure 9, the proposed method was employed to identify abnormal electricity consumption behavior for a low-voltage residential user. The calculated suspicion level of this user was 0.95, which marks the user as a high-probability suspect. On-site investigation revealed the presence of an illegal dual-circuit control system on the user side (shown in Figure 9), confirming the suspicion of electricity theft. Two switches were found installed on-site, one connected to the electricity meter and the other to the public grid line. When the first switch was turned off, engaging the second switch enabled electricity theft, whereas operating the first switch restored normal metered consumption.

4. Discussion

The K-GBDT model proposed in this paper has demonstrated remarkable performance in the task of identifying abnormal electricity consumption behaviors of low-voltage users. In this section, we will conduct an in-depth discussion on the research results from the following aspects:

4.1. Analysis of the Mechanism for Model Performance Enhancement

The advantage of the K-GBDT model stems from the synergy between GBDT and KNN. GBDT extracts global features (such as electricity consumption trends and periodic patterns) through the gradient boosting strategy, while KNN captures fine-grained anomalies (such as short-term electricity theft behaviors) through local neighborhood search. For example, in the case of electricity theft, GBDT can identify long-term electricity consumption decline trends, and KNN can further detect sudden changes in daily electricity consumption. This combined strategy effectively alleviates the shortcomings of a single model in the trade-off between global and local features, thereby improving classification accuracy (such as an increase of more than 10% in F1-score) and generalization ability.
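The two-stage idea can be sketched as follows: a coarse classifier first assigns each sample to a group, and KNN then votes only among training samples of the same group. This is a simplified illustration and not the implementation used in the paper. In particular, the coarse stage here is a hypothetical threshold rule on mean consumption that stands in for a trained GBDT, and the class name `TwoStageClassifier`, the threshold of 5.0 kWh and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(samples, query, k):
    """Majority vote among the k labelled samples closest to `query`."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class TwoStageClassifier:
    """Coarse partition first, then KNN inside the matching partition."""

    def __init__(self, coarse_fn, k=3):
        self.coarse_fn = coarse_fn  # ASSUMPTION: stand-in for a trained GBDT
        self.k = k
        self.subsets = {}
        self.all_samples = []

    def fit(self, X, y):
        self.all_samples = list(zip(X, y))
        self.subsets = {}
        for x, label in self.all_samples:
            self.subsets.setdefault(self.coarse_fn(x), []).append((x, label))
        return self

    def predict(self, x):
        # Stage 1 picks the group; stage 2 runs KNN restricted to that group.
        # If the group has no training samples, fall back to all samples.
        pool = self.subsets.get(self.coarse_fn(x)) or self.all_samples
        return knn_predict(pool, x, self.k)


def low_usage(x):
    """Hypothetical coarse rule: 1 if mean daily kWh is below 5.0, else 0."""
    return int(sum(x) / len(x) < 5.0)


# Toy daily kWh readings over three days; label 1 = abnormal, 0 = normal.
X = [[8.0, 8.2, 7.9], [7.5, 7.7, 7.4], [9.0, 0.5, 9.1], [8.8, 0.4, 8.6],
     [3.0, 3.1, 2.9], [3.2, 3.0, 3.1], [2.9, 0.1, 0.2], [3.1, 0.2, 0.1]]
y = [0, 0, 1, 1, 0, 0, 1, 1]

model = TwoStageClassifier(low_usage, k=3).fit(X, y)
sudden_drop = model.predict([8.5, 0.6, 8.9])  # compared with high-usage users
steady_low = model.predict([3.0, 3.2, 3.0])   # compared with low-usage users
```

Restricting the neighbour search to one group means that a user is compared only with users of a similar overall consumption level, which is the sense in which the local stage refines the global one.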

4.2. Comparative Analysis with Existing Methods

In this study, a comparative experiment was conducted between the proposed K-GBDT method and the LSTM-AE method from recent literature. The experiment aimed to evaluate the performance of both methods in identifying abnormal electricity consumption behaviors of low-voltage users. The results showed that K-GBDT outperformed LSTM-AE in terms of classification accuracy, especially in handling class imbalance problems. K-GBDT combined the advantages of GBDT and KNN to capture both global and local features, while LSTM-AE mainly focused on sequential data analysis. This comparison validates the superiority of the K-GBDT method in this specific application scenario.

4.3. Challenges and Improvement Directions in Practical Applications

Data Dependency: The performance of the model depends on the quality and representativeness of historical data. For instance, when the electricity meter data in old residential areas are missing or noisy, the error of interpolation methods may affect the classification results.

Computational Efficiency: The local classification of KNN requires real-time calculation of the distance between samples. In large-scale data scenarios, it may face computational bottlenecks. In the future, the computational speed can be optimized by introducing the Approximate Nearest Neighbor (ANN) algorithm.

Extremely Imbalanced Categories: Although K-GBDT performs well on imbalanced data, when the proportion of electricity theft samples is extremely low, the model still needs to be combined with active learning or cost-sensitive learning to further improve its recognition of minority classes.
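As one possible form of the cost-sensitive learning mentioned above, class weights can be set inversely proportional to class frequency and applied to the KNN vote. The sketch below uses the common "balanced" heuristic, weight = n / (number of classes × class count); the paper does not specify a weighting scheme, so the function names and the example class ratio are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_vote(neighbour_labels, weights):
    """Pick the class whose neighbours carry the largest total weight."""
    totals = {}
    for label in neighbour_labels:
        totals[label] = totals.get(label, 0.0) + weights[label]
    return max(totals, key=totals.get)


# Hypothetical training labels: 95 normal users (0) and 5 theft cases (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# Four normal neighbours and one theft neighbour among the 5 nearest.
decision = weighted_vote([0, 0, 0, 0, 1], weights)
```

With these weights a single theft neighbour outweighs four normal neighbours, which raises the TPR at the price of more false positives; in practice the weights would need to be tuned against the cost of on-site inspections.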

4.4. Insights for Intelligent Management of Power Systems

The successful application of this model indicates that the combination method has significant value in power big data analysis. For example, in the actual case of Shaanxi Electric Power Company, K-GBDT increased the electricity theft detection rate to 51.16%, far exceeding the traditional manual verification (0.07%). This result verifies the potential of intelligent tools in reducing labor costs and improving inspection efficiency, providing technical support for the optimization of power systems under the “dual carbon” goals.

5. Conclusions

To address the challenge of identifying abnormal electricity usage behaviors among low-voltage consumers, we proposed a hybrid model and designed multiple data preprocessing and optimization strategies. The details are as follows:

(1) To address the high dimensionality and complexity of low-voltage electricity consumption data, this paper introduces multiple data preprocessing and optimization strategies, further enhancing the model’s robustness against noisy data and outliers, as well as improving its generalization capability.
(2) The proposed approach takes advantage of the KNN algorithm’s insensitivity to anomalous samples. Moreover, by integrating the Synthetic Minority Over-sampling Technique (SMOTE), it effectively enhances the model’s capacity to handle class imbalance problems.
(3) A hybrid model based on K-GBDT is proposed, which combines the advantages of GBDT and KNN to simultaneously handle global and local features, thereby improving the accuracy and robustness of abnormal electricity consumption behavior identification.

In future work, we will leverage the SMOTE and other improvement methods to further optimize the K-GBDT model, enhancing its performance on highly imbalanced datasets. Additionally, we will conduct an in-depth investigation into the synergistic mechanisms between GBDT and KNN, examining the impact of various parameters (such as tree depth and the number of nearest neighbors) on model performance. This research aims to provide stronger theoretical support and experimental evidence for the practical application of the combined model.

Author Contributions

Conceptualization, J.G. and J.X.; methodology, J.G. and J.X.; software and validation, J.G., X.N. and S.D.; resources, J.G., J.X., S.D. and X.C.; writing—original draft preparation, J.G. and J.X.; writing—review and editing, J.G., J.X. and X.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Chinese National Natural Science Foundation under Grant No. 62473311.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Jiaolong Gou was employed by the company Xi’an High Way Research Institute Co., Ltd. Authors Jiaolong Gou and Xudong Niu were employed by the Shaanxi Transportation Holding Group Co., Ltd. Author Xi Chen was employed by the Information and Communication Company, State Grid Shaanxi Electric Power Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Nguyen, L.H.; Nguyen, V.L.; Hwang, R.H.; Kuo, J.-J.; Chen, Y.-W.; Huang, C.-C.; Pan, P.-I. Towards Secured Smart Grid 2.0: Exploring Security Threats, Protection Models, and Challenges. IEEE Commun. Surv. Tutor. 2024, 26, 1–39.
2. Zghaibeh, M.; Belgacem, I.B.; El Barhoumi, M.; Baloch, M.H.; Chauhdary, S.T.; Kumar, L.; Arıcı, M. Optimization of green hydrogen production in hydroelectric-photovoltaic grid connected power station. Int. J. Hydrogen Energy 2024, 52, 440–453.
3. Xia, Y.; Sun, G.; Wang, Y.; Yang, Q.; Wang, Q.; Ba, S. A novel carbon emission estimation method based on electricity-carbon nexus and non-intrusive load monitoring. Appl. Energy 2024, 360, 122773.
4. Farooq, A.; Shahid, K.; Olsen, R.L. Prioritization of smart meters based on data monitoring for enhanced grid resilience. Comput. Commun. 2025, 234, 108082.
5. Khalid, M. Energy 4.0: AI-enabled digital transformation for sustainable power networks. Comput. Ind. Eng. 2024, 193, 110253.
6. Abdulrahaman, O.O.; Mohd, W.M. A rule-based model for electricity theft prevention in advanced metering infrastructure. J. Electr. Syst. Inf. Technol. 2022, 9, 2.
7. Zhang, L.J. A Comparative Study on the Monitoring of Abnormal Power Consumption Behavior by Different Models. Int. J. High Speed Electron. Syst. 2025, 34, 2540093.
8. Wu, R.B. Behavioral analysis of electricity consumption characteristics for customer groups using the k-means algorithm. Syst. Soft Comput. 2024, 6, 200143.
9. Lin, R.H. Electricity Behavior Modeling and Anomaly Detection Services Based on a Deep Variational Autoencoder Network. Energies 2024, 17, 3904.
10. Jain, S.; Choksi, K. Rule-based classification of energy theft and anomalies in consumers’ load demand profile. IET Smart Grid 2019, 2, 612–624.
11. Ankur, S.; Amarjeet, S. Mahanti. Detecting anomalous energy consumption using contextual analysis of smart meter data. Wirel. Netw. 2021, 27, 4275–4292.
12. Hasan, M.N.; Toma, R.N. Electricity theft detection in smart grid systems: A CNN-LSTM based approach. Energies 2019, 12, 3310.
13. Erhan, L.; Ndubuaku, M. Smart anomaly detection in sensor systems: A multi-perspective review. Inf. Fusion 2021, 67, 64–79.
14. Zhang, W.; Dong, X.; Li, H.; Xu, J.; Wang, D. Unsupervised Detection of Abnormal Electricity Consumption Behavior Based on Feature Engineering. IEEE Access 2020, 8, 55483–55500.
15. Pan, H.P.; Yin, Z.Q. High-Dimensional Energy Consumption Anomaly Detection: A Deep Learning-Based Method for Detecting Anomalies. Energies 2022, 15, 6139.
16. Xia, X. ETD-Conv LSTM: A Deep Learning Approach for Electricity Theft Detection in Smart Grids. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2553–2568.
17. Qu, Z.J.; Liu, H.X. A combined genetic optimization with AdaBoost ensemble model for anomaly detection in buildings electricity consumption. Energy Build. 2021, 248, 111193.
18. Kumar, J.; Gupta, R. Power consumption forecast model using ensemble learning for smart grid. J. Supercomput. 2023, 79, 11007–11028.
19. Ilias, S.; Panagiotis, R.G. A unified deep learning anomaly detection and classification approach for smart grid environments. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1137–1151.
20. Ghosal, A.; Conti, M. Key management systems for smart grid advanced metering infrastructure: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 2831–2848.
21. Tang, C.; Qin, Y.; Liu, Y.; Pi, H.; Tang, Z. An Efficient Method for Detecting Abnormal Electricity Behavior. Energies 2024, 17, 2502.
22. Xu, B.; Wang, Y. Efficient fraud detection using deep boosting decision trees. Decis. Support Syst. 2023, 175, 114037.
23. Maryam, D.; Said, B. Anomaly detection model based on gradient boosting and decision tree for IoT environments security. J. Reliab. Intell. Environ. 2023, 9, 421–432.
24. Xie, J.; Xiang, X.; Xia, S.; Jiang, L.; Wang, G.; Gao, X. MGNR: A multi-granularity neighbor relationship and its application in KNN classification and clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 42, 7956–7972.
25. Xu, L.; Lin, J.; Yang, Y.; Zhao, Z.; Shi, X.; Ge, G.; Qian, J.; Shi, C.; Li, G.; Wang, S.; et al. Ultrahigh thermal stability and piezoelectricity of lead-free KNN-based texture piezoceramics. Nat. Commun. 2024, 15, 9018.
26. Huang, Z.; Ling, Z.; Gou, F.; Wu, J. Medical assisted-segmentation system based on global feature and stepwise feature integration for feature loss problem. Biomed. Signal Process. Control 2024, 89, 105814.

Figure 1. Typical Framework of a Power Data Collection System.

Figure 2. Overall Framework of the Proposed Methods.

Figure 3. ROC comparison curves of models without sample balancing.

Figure 4. ROC comparison curves of models with sample balancing.

Figure 5. Comparison of the test results of K-GBDT and LSTM-AE (with and without the SMOTE).

Figure 6. Login page of the User Electricity Anomaly Detection Tool for the State Grid Shaanxi Electric Power Company.

Figure 7. Usage data from the Power Information Collection System.

Figure 8. Abnormality identification results of the anti-electricity theft tool for the State Grid Shaanxi Electric Power Company.

Figure 9. Electricity theft scene diagram.

Table 1. Description of data fields.

Field Code	Field Name	Data Type	Is Nullable
CONS_NO	User Code	NUMBER (16)	Yes
DATA_DATE	Date	DATE	No
KWH_READING	Current Day Energy Reading	NUMBER (11,4)	No
KWH_READING1	Previous Day Energy Reading	NUMBER (11,4)	No
KWH	Energy Consumption	NUMBER (11,4)	No

Table 2. Performance metrics comparison of models without sample balancing.

Model	Accuracy	F1-Score	TPR	FPR
GBDT	0.8673	0.6175	0.5107	0.0357
KNN	0.7228	0.6281	0.5272	0.0378
K-GBDT	0.8791	0.7558	0.6529	0.0275

Table 3. Performance metrics comparison of models with single model sample balancing.

Model	Accuracy	F1-Score	TPR	FPR
GBDT	0.8737	0.7536	0.7545	0.0912
KNN	0.8754	0.6206	0.8426	0.3213
K-GBDT	0.8851	0.8333	0.8323	0.0871

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

