Electricity Theft Detection in Smart Grids Using a Hybrid BiGRU–BiLSTM Model with Feature Engineering-Based Preprocessing
Abstract
1. Introduction
2. List of Contributions
- To tackle the data imbalance issue, theft class data are synthesized using six theft variants. The synthesized data are then oversampled using the K-means synthetic minority oversampling technique (K-means SMOTE).
- A Tomek links technique is used to eliminate cross-pairs across the decision boundary.
- To overcome the data leakage problem, a simple stratified splitting approach is adopted.
- Cumulative and distinct features are engineered using stochastic feature engineering, which enables the model to learn data characterization and uniqueness.
- An integrated hybrid model of Bi-directional Gated Recurrent Units (Bi-GRU) and Bi-directional Long Short-Term Memory (Bi-LSTM) is used to tackle misclassification and the high false positive rate (FPR).
- Furthermore, to verify the robustness of the proposed model, an unseen variant of the theft data with moderate randomness is analyzed to assess the model's stability and integrity.
3. Literature Review
3.1. Considering Sequential Data
3.2. Monitoring Morphological Patterning
3.3. Tampering with Smart Meter Readings
3.4. Investigating Neighborhood Area Networks
4. Proposed System Model
- Step (1) is data preprocessing, in which missing values are filled using a mean-based strategy and outliers are removed. Both operations are necessary because noisy and ambiguous data reduce accuracy and increase misclassification. A simple imputer is used to fill the missing values.
- In step (2), the preprocessed data are augmented: because theft samples are rare, benign samples are modified and manipulated to generate them. A model trained on such imbalanced data exhibits skewness and bias, so the data must be balanced before the model is trained.
- In step (3), benign class data are manipulated and theft class data are generated.
- In step (4), cross-pairs associated with the decision boundary are identified and eliminated. A cross-pair is a pair of neighboring samples from opposite classes, so a Tomek links technique is used to find them. The majority class samples are removed and the minority class samples are retained in order to preserve data integrity.
- In step (5), the data are stratified to prevent samples from leaking between subsets when the data are split.
- In step (6), abstract features are engineered based on stochastic feature engineering.
- In step (7), the time-series data are fed to the developed Bi-GRU [38] and Bi-LSTM [39] model, and a binary sigmoid function classifies the samples [40]. Bi-LSTM [41] handles high-dimensional data, while Bi-GRU is used for its fast operation, which limits the computational complexity.
Algorithm 1: Bi-GRU- and Bi-LSTM-based Detection Scheme.
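Step (4) above can be illustrated with a minimal sketch. It assumes Euclidean distance, a label of 0 for the majority class, and a hypothetical helper name `remove_tomek_majority`; none of these details are stated in the text. The dense pairwise distance matrix makes it suitable only for small samples.

```python
import numpy as np


def remove_tomek_majority(X, y, majority_label=0):
    """Drop majority-class samples that form a Tomek link with a minority sample.

    Assumptions (not from the paper): Euclidean distance and a fixed
    majority label. Returns the filtered X, y and the number of removed rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; the diagonal is masked so that
    # a sample is never selected as its own nearest neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)

    idx = np.arange(len(X))
    mutual = nearest[nearest] == idx        # i and nearest[i] choose each other
    opposite = y != y[nearest]              # and they carry different labels
    in_link = mutual & opposite             # both members of each cross-pair
    drop = in_link & (y == majority_label)  # remove only the majority member
    return X[~drop], y[~drop], int(drop.sum())
```

Only the majority member of each cross-pair is discarded, which mirrors the statement that minority class samples are retained.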
4.1. Dataset
4.2. Data Leakage
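A stratified split, which the system model relies on for splitting the data, can be sketched as follows. The sketch uses scikit-learn's `train_test_split` with its `stratify` argument; the helper name `stratified_split`, the 80/20 ratio, and the seed are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, test_size=0.2, seed=0):
    """Split row indices so that both subsets keep the class ratio of y.

    test_size and seed are illustrative assumptions.
    """
    indices = np.arange(len(y))
    train_idx, test_idx = train_test_split(
        indices, test_size=test_size, stratify=y, random_state=seed
    )
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], train_idx, test_idx


# Hypothetical data: 1000 consumers, 24 readings each, 10% labelled as theft.
rng = np.random.default_rng(0)
X = rng.random((1000, 24))
y = np.array([0] * 900 + [1] * 100)
X_train, X_test, y_train, y_test, train_idx, test_idx = stratified_split(X, y)
```

Splitting indices rather than the arrays themselves makes it easy to confirm that no consumer appears in both subsets.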
4.3. Data Preprocessing
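The mean-based filling described for step (1) can be sketched with scikit-learn's `SimpleImputer`. The layout (one row per consumer, one column per time step) is an assumption, and so is the three-sigma capping used for outliers: the text only says that outliers are removed, without naming a rule.

```python
import numpy as np
from sklearn.impute import SimpleImputer


def preprocess(readings, n_sigma=3.0):
    """Mean-impute missing readings, then cap high outliers per consumer.

    Assumptions (not from the paper): rows are consumers, columns are time
    steps, and outliers are capped at mean + n_sigma * std of each row.
    """
    readings = np.asarray(readings, dtype=float)
    # Each NaN is replaced by the mean of its column (time step).
    # Note: by default, columns that are entirely NaN are dropped.
    filled = SimpleImputer(strategy="mean").fit_transform(readings)
    mean = filled.mean(axis=1, keepdims=True)
    std = filled.std(axis=1, keepdims=True)
    return np.minimum(filled, mean + n_sigma * std)
```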
4.4. Data Augmentation and Balancing
- In data manipulation technique 1, shown in Figure 2a, the benign class time-series data are multiplied by a random number in order to manipulate the fair consumption.
- Data manipulation technique 2 is shown in Figure 2b. To capture discontinuity in the consumption, the honest consumption data are multiplied by a random number, which produces a series-based discontinuity in the consumption pattern.
- Data manipulation technique 3 is shown in Figure 3a. The time-series data are randomly multiplied by 1 or 0, which yields either the original consumption or zero consumption. There is no ramping function between 1 and 0: it is a straightforward ON/OFF switching operation, with either the complete connected load or a cut-off. The multiplication is a way to copy the historic consumption profile, and it is not confined to continuous time-series data.
- In data manipulation technique 4, shown in Figure 3b, the total consumption is aggregated into a mean, which is multiplied by a random number in the range (0.1, 1.0).
- Data manipulation technique 5 is shown in Figure 4a. It is a two-part manipulation of the aggregated mean. In the first part, the mean, which is the central value of the continuous time-series data, is reported, so the maximum consumption is under-reported. In the second part, the same aggregated value is multiplied by a random number in the range (0.1, 0.9), so that the average value is under-reported as well, as an additional exploitation.
- Data manipulation technique 6 is shown in Figure 4b. Low-consumption and peak-consumption hours are continuously swapped: a couple of slabs of consumed energy are shifted from on-peak hours to off-peak hours and vice versa. With this technique, the consumer still pays the charges for the consumed energy; however, the careful swapping does not affect the utility providers (UPs) extensively.
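The six manipulations above can be sketched in a few lines of NumPy. This is an illustrative reading of the descriptions, not the authors' implementation: the ranges used for techniques 1 and 2, the choice between one factor per series and one factor per reading, and the swap size `k` in technique 6 are all assumptions. Only the ranges (0.1, 1.0) for technique 4 and (0.1, 0.9) for technique 5 come from the text.

```python
import numpy as np


def theft_variants(x, rng, k=2):
    """Return six manipulated copies of one benign consumption series x.

    Requires 2 * k <= len(x) so that the swapped slabs do not overlap.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t1 = x * rng.uniform(0.1, 0.9)                      # one factor for the whole series (range assumed)
    t2 = x * rng.uniform(0.1, 1.0, size=n)              # a fresh factor per reading (range assumed)
    t3 = x * rng.integers(0, 2, size=n)                 # ON/OFF mask of ones and zeros
    t4 = x.mean() * rng.uniform(0.1, 1.0, size=n)       # mean scaled per reading
    t5 = np.full(n, x.mean() * rng.uniform(0.1, 0.9))   # flat, under-reported mean
    # Technique 6: swap the k highest readings with the k lowest ones,
    # so the total reported energy stays the same.
    order = np.argsort(x)
    low, high = order[:k], order[-k:]
    t6 = x.copy()
    t6[low], t6[high] = x[high], x[low]
    return {"T1": t1, "T2": t2, "T3": t3, "T4": t4, "T5": t5, "T6": t6}
```

Applying `theft_variants` to every benign series with a seeded `np.random.default_rng` gives reproducible synthetic theft samples that can then be passed to the balancing step.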
4.5. Bi-Directional LSTM
4.6. Feature Engineering
5. Performance Evaluation
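The reported scores (F1 score, precision, recall, accuracy, and FPR) all follow from the confusion-matrix counts TP, FP, TN, and FN listed in the abbreviations. The helper below is a small sketch using the standard definitions; the function name is hypothetical, and it assumes that every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    Assumes tp + fp, tp + fn and fp + tn are all greater than zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also known as the detection rate (DR)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)           # share of benign consumers flagged as theft
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
        "accuracy": accuracy,
    }
```

On imbalanced data, accuracy can stay high even when the FPR is poor, which is why the FPR is tracked separately.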
6. Simulation Results
7. Robustness Analysis
8. Computational Complexity
9. Performance Validation
10. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
- The following abbreviations are used in this manuscript:
| AMI | Advanced Metering Infrastructure | 
| APD-HT | Anomaly Pattern Detection Hypothesis Testing | 
| Bi-GRU | Bi-directional Gated Recurrent Unit | 
| AUC | Area Under the Curve | 
| Bi-LSTM | Bi-directional Long Short-Term Memory | 
| CatBoost | Categorical Boosting | 
| CNN | Convolutional Neural Network | 
| DTKSVM | Decision Tree Combined K-Nearest Neighbor and Support Vector Machine | 
| EBT | Ensemble Bagged Tree | 
| ETD | Electricity Theft Detection | 
| DT | Decision Tree | 
| DR | Detection Rate | 
| DG | Distributed Generation | 
| XGBoost | Extreme Gradient Boosting | 
| FiTs | Feed-in Tariffs | 
| FN | False Negative | 
| FP | False Positive | 
| FPR | False Positive Rate | 
| GBCs | Gradient Boosting Classifiers | 
| LGBoost | Light Gradient Boosting | 
| MIC | Maximum Information Coefficient | 
| ML | Machine Learning | 
| NaN | Not a Number | 
| NAN | Neighborhood Area Network | 
| NTLs | Non-Technical Losses | 
| PV | Photovoltaic | 
| PRC | Precision Recall Curve | 
| RUSBoost | Random Under Sampling Boosting | 
| RF | Random Forest | 
| SSEA | Semi-Supervised Auto-Encoder | 
| SGCC | State Grid Corporation of China | 
| SMs | Smart Meters | 
| SSDAE | Stacked Sparse Denoising Auto-Encoder | 
| SCADA | Supervisory Control and Data Acquisition | 
| SVM | Support Vector Machine | 
| TLs | Technical Losses | 
| TN | True Negative | 
| TP | True Positive | 
| UP | Utility Provider | 
| WFI | Weighted Feature Importance | 
| C | Sample’s Unique Class | 
| O | Observations | 
| p | Population of the Samples | 
| S | Number of Samples | 
|  | Time-Series Data | 
| T | Theft Case | 
|  | Standard Deviation | 
|  | Mean | 
References
- Grigsby, L.L. Electric Power Generation, Transmission, and Distribution; CRC Press: Boca Raton, FL, USA, 2007.
- Yu, X.; Cecati, C.; Dillon, T.; Simoes, M.G. The new frontier of smart grids. IEEE Ind. Electron. Mag. 2011, 5, 49–63.
- Depuru, S.S.S.R.; Wang, L.; Devabhaktuni, V. Electricity theft: Overview, issues, prevention and a smart meter based approach to control theft. Energy Policy 2011, 39, 1007–1015.
- Buzau, M.M.; Tejedor-Aguilera, J.; Cruz-Romero, P.; Gómez-Expósito, A. Hybrid deep neural networks for detection of Non-Technical Losses in electricity Smart Meters. IEEE Trans. Power Syst. 2019, 35, 1254–1263.
- World Bank. World Development Report 2004: Making Services Work for Poor People; The World Bank: Washington, DC, USA, 2003.
- Gaur, V.; Gupta, E. The determinants of electricity theft: An empirical analysis of Indian states. Energy Policy 2016, 93, 127–136.
- Agüero, J.R. Improving the efficiency of power distribution systems through technical and Non-Technical Losses reduction. In Proceedings of the PES T&D 2012, Orlando, FL, USA, 7–10 May 2012; pp. 1–8.
- Viegas, J.L.; Esteves, P.R.; Melicio, R.; Mendes, V.M.F.; Vieira, S.M. Solutions for detection of Non-Technical Losses in the electricity grid: A review. Renew. Sustain. Energy Rev. 2017, 80, 1256–1268.
- Munawar, S.; Asif, M.; Kabir, B.; Ullah, A.; Javaid, N. Electricity Theft Detection in Smart Meters Using a Hybrid Bi-directional GRU Bi-directional LSTM Model. In Conference on Complex, Intelligent, and Software Intensive Systems; Springer: Cham, Switzerland, 2021; pp. 297–308.
- Salinas, S.; Li, M.; Li, P. Privacy-preserving energy theft detection in smart grids. In Proceedings of the 2012 9th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), Seoul, Korea, 18–21 June 2012; pp. 605–613.
- Saeed, M.S.; Mustafa, M.W.; Sheikh, U.U.; Jumani, T.A.; Mirjat, N.H. Ensemble bagged tree based classification for reducing Non-Technical Losses in multan electric power company of Pakistan. Electronics 2019, 8, 860.
- Punmiya, R.; Choe, S. Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing. IEEE Trans. Smart Grid 2019, 10, 2326–2329.
- Avila, N.F.; Figueroa, G.; Chu, C.C. NTL detection in electric distribution systems using the maximal overlap discrete wavelet-packet transform and random undersampling boosting. IEEE Trans. Power Syst. 2018, 33, 7171–7180.
- Adil, M.; Javaid, N.; Qasim, U.; Ullah, I.; Shafiq, M.; Choi, J.G. LSTM and bat-based RUSBoost approach for Electricity Theft Detection. Appl. Sci. 2020, 10, 4378.
- Li, S.; Han, Y.; Yao, X.; Yingchen, S.; Wang, J.; Zhao, Q. Electricity Theft Detection in power grids with deep learning and Random Forests. J. Electr. Comput. Eng. 2019, 2019, 4136874.
- Liu, Y.; Liu, T.; Sun, H.; Zhang, K.; Liu, P. Hidden electricity theft by exploiting multiple-pricing scheme in smart grids. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2453–2468.
- Kong, X.; Zhao, X.; Liu, C.; Li, Q.; Dong, D.; Li, Y. Electricity Theft Detection in low-voltage stations based on similarity measure and DT-KSVM. Int. J. Electr. Power Energy Syst. 2021, 125, 106544.
- Yan, Z.; Wen, H. Electricity Theft Detection base on extreme gradient boosting in AMI. IEEE Trans. Instrum. Meas. 2021, 70, 1–9.
- Gunturi, S.K.; Sarkar, D. Ensemble Machine Learning models for the detection of energy theft. Electr. Power Syst. Res. 2021, 192, 106904.
- Lepolesa, L.J.; Achari, S.; Cheng, L. Electricity Theft Detection in Smart Grids Based on Deep Neural Network. IEEE Access 2022, 10, 39638–39655.
- Yao, R.; Wang, N.; Ke, W.; Chen, P.; Sheng, X. Electricity Theft Detection in unbalanced sample distribution: A novel approach including a mechanism of sample augmentation. Appl. Intell. 2022, 1–20.
- Liao, W.; Yang, Z.; Liu, K.; Zhang, B.; Chen, X.; Song, R. Electricity Theft Detection Using Euclidean and Graph Convolutional Neural Networks. IEEE Trans. Power Syst. 2022, 1–13.
- Gu, D.; Gao, Y.; Chen, K.; Junhao, S.; Li, Y.; Cao, Y. Electricity Theft Detection in AMI with Low False Positive Rate Based on Deep Learning and Evolutionary Algorithm. IEEE Trans. Power Syst. 2022, 37, 4568–4578.
- Zheng, K.; Chen, Q.; Wang, Y.; Kang, C.; Xia, Q. A novel combined data-driven approach for Electricity Theft Detection. IEEE Trans. Ind. Inform. 2018, 15, 1809–1819.
- Aslam, Z.; Javaid, N.; Ahmad, A.; Ahmed, A.; Gulfam, S.M. A Combined Deep Learning and Ensemble Learning Methodology to Avoid Electricity Theft in Smart Grids. Energies 2020, 13, 5599.
- Huang, Y.; Xu, Q. Electricity Theft Detection based on stacked sparse denoising autoencoder. Int. J. Electr. Power Energy Syst. 2021, 125, 106448.
- Fenza, G.; Gallo, M.; Loia, V. Drift-aware methodology for anomaly detection in smart grid. IEEE Access 2019, 7, 9645–9657.
- Yip, S.C.; Wong, K.; Hew, W.P.; Gan, M.T.; Phan, R.C.W.; Tan, S.W. Detection of energy theft and defective Smart Meters in smart grids using linear regression. Int. J. Electr. Power Energy Syst. 2017, 91, 230–240.
- Park, C.H.; Kim, T. Energy Theft Detection in Advanced Metering Infrastructure Based on Anomaly Pattern Detection. Energies 2020, 13, 3832.
- Hu, J.; Li, S.; Hu, J.; Yang, G. A Hierarchical Feature Extraction Model for Multi-Label Mechanical Patent Classification. Sustainability 2018, 10, 219.
- Hasan, M.; Toma, R.N.; Nahid, A.A.; Islam, M.M.; Kim, J.M. Electricity Theft Detection in smart grid systems: A CNN-LSTM based approach. Energies 2019, 12, 3310.
- Khalid, R.; Javaid, N.; Al-Zahrani, F.A.; Aurangzeb, K.; Qazi, E.U.H.; Ashfaq, T. Electricity load and price forecasting using Jaya-Long Short Term Memory (JLSTM) in smart grids. Entropy 2020, 22, 10.
- Rostampour, V.; Keviczky, T. Probabilistic energy management for building climate comfort in smart thermal grids with seasonal storage systems. IEEE Trans. Smart Grid 2018, 10, 3687–3697.
- Jokar, P.; Arianpoo, N.; Leung, V.C. Electricity Theft Detection in AMI using customers’ consumption patterns. IEEE Trans. Smart Grid 2015, 7, 216–226.
- Buzau, M.M.; Tejedor-Aguilera, J.; Cruz-Romero, P.; Gómez-Expósito, A. Detection of Non-Technical Losses using smart meter data and supervised learning. IEEE Trans. Smart Grid 2018, 10, 2661–2670.
- Biswas, P.P.; Cai, H.; Zhou, B.; Chen, B.; Mashima, D.; Zheng, V.W. Electricity theft pinpointing through correlation analysis of master and individual meter readings. IEEE Trans. Smart Grid 2019, 11, 3031–3042.
- Ismail, M.; Shaaban, M.F.; Naidu, M.; Serpedin, E. Deep learning detection of electricity theft cyber-attacks in renewable Distributed Generation. IEEE Trans. Smart Grid 2020, 11, 3428–3437.
- Zhu, Q.; Zhang, F.; Liu, S.; Wu, Y.; Wang, L. A hybrid VMD–BiGRU model for rubber futures time series forecasting. Appl. Soft Comput. 2019, 84, 105739.
- Bhagat, R.C.; Patil, S.S. Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest. In Proceedings of the 2015 IEEE International Advance Computing Conference (IACC), Bangalore, India, 12–13 June 2015; pp. 403–408.
- Sun, J.; Shi, W.; Yang, Z.; Yang, J.; Gui, G. Behavioral modeling and linearization of wideband RF power amplifiers using BiLSTM networks for 5G wireless systems. IEEE Trans. Veh. Technol. 2019, 68, 10348–10356.
- Hussain, S.; Mustafa, M.W.; Jumani, T.A.; Baloch, S.K.; Alotaibi, H.; Khan, I.; Khan, A. A novel feature engineered-CatBoost-based supervised Machine Learning framework for Electricity Theft Detection. Energy Rep. 2021, 7, 4425–4436.
- Ullah, A.; Munawar, S.; Asif, M.; Kabir, B.; Javaid, N. Synthetic theft attacks implementation for data balancing and a gated recurrent unit based Electricity Theft Detection in smart grids. In Conference on Complex, Intelligent, and Software Intensive Systems; Springer: Cham, Switzerland, 2021; pp. 395–405.
- Asif, M.; Kabir, B.; Ullah, A.; Munawar, S.; Javaid, N. Towards Energy Efficient Smart Grids: Data Augmentation Through BiWGAN, Feature Extraction and Classification Using Hybrid 2DCNN and BiLSTM. In International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing; Springer: Cham, Switzerland, 2021; pp. 108–119.
- Asif, M.; Ullah, A.; Munawar, S.; Kabir, B.; Khan, A.; Javaid, N. Alexnet-AdaBoost-ABC based hybrid neural network for Electricity Theft Detection in smart grids. In Conference on Complex, Intelligent, and Software Intensive Systems; Springer: Cham, Switzerland, 2021; pp. 249–258.
- Kabir, B.; Ullah, A.; Munawar, S.; Asif, M.; Javaid, N. Detection of Non-Technical Losses Using MLP-GRU Based Neural Network to Secure Smart Grids. In Conference on Complex, Intelligent, and Software Intensive Systems; Springer: Cham, Switzerland, 2021; pp. 383–394.
- Dash, S.K.; Roccotelli, M.; Khansama, R.R.; Fanti, M.P.; Mangini, A.M. Long Term Household Electricity Demand Forecasting Based on RNN-GBRT Model and a Novel Energy Theft Detection Method. Appl. Sci. 2021, 11, 8612.
- Khan, Z.A.; Adil, M.; Javaid, N.; Saqib, M.N.; Shafiq, M.; Choi, J.G. Electricity Theft Detection using supervised learning techniques on smart meter data. Sustainability 2020, 12, 8023.
- Javaid, N.; Javaid, S.; Asif, M.; Javed, M.U.; Yahaya, A.S.; Aslam, S. Synthetic Theft Attacks and Long Short Term Memory-Based Preprocessing for Electricity Theft Detection Using Gated Recurrent Unit. Energies 2022, 15, 2778.
- Gul, H.; Javaid, N.; Ullah, I.; Qamar, A.M.; Afzal, M.K.; Joshi, G.P. Detection of Non-Technical Losses using SOSTLink and bidirectional gated recurrent unit to secure Smart Meters. Appl. Sci. 2020, 10, 3151.







| Limitation Number | Limitation Identified | Solution Number | Solution Proposed | Validations |
|---|---|---|---|---|
| L1 | Data imbalance issue | S1 | A K-means SMOTE technique is used to solve the data imbalance issue | V1: Performance comparison of the models |
| L2 | Misclassification due to cross-pairs | S2 | A Tomek links technique is used to identify the cross-pairs and remove them accordingly | V2: Table 3 Removal of cross-pairs |
| L3 | Data leakage during training | S3 | A simple stratified methodology is used to divide the data based on key attributes into subgroups for training of the model | V3: Equations (1)–(7) |
| L4 | High FPR | S4 | A hybrid model of Bi-GRU and Bi-LSTM is used to classify samples precisely and reduce high FPR | V4: Figure 6a,b AUC and PRC curve |
| L5 | Lack of abstract features | S5 | A stochastic feature engineering approach is adopted to generate abstract features | V5: Table 5 |

| Description | Value |
|---|---|
| Administering years of the dataset | 2014–2016 |
| Total number of benign consumers | 38,756 |
| Total number of fraudulent consumers | 3616 |

| Total Samples (Before) | Removal of Cross-Pairs | Remaining Samples |
|---|---|---|
| 10,500 | 105 | 10,395 |

| Models | F1 Score (%) | Precision (%) | Recall (%) | Accuracy (%) |
|---|---|---|---|---|
| Proposed | 80.7 | 80.6 | 80.9 | 88.7 |
| Existing [33] | 76.3 | 84.3 | 74.7 | 83.1 |
| SVM | 75.0 | 62.5 | 84.3 | 72.5 |
| DT | 75.7 | 62.3 | 79.5 | 76.3 |
| RF | 78.2 | 64.2 | 77.6 | 73.6 |

| Models | Without Feature Engineering | With Stochastic Features |
|---|---|---|
| Proposed Model | 88.7% | 95% |

| Models | Accuracy | AUC Score | F1 Score |
|---|---|---|---|
| Proposed Model | 88.3% | 57.6 | 54.9 |
| Existing Model | 86.9% | 54.9 | 53.6.7 |

| Input Batch Size | Execution Time Proposed Model (s) | Execution Time Existing Model (s) |
|---|---|---|
| 50 | 218 | 62 |
| 100 | 165 | 88 |
| 150 | 159 | 48 |
| 200 | 159 | 87 |
| 250 | 166 | 87 |
| 300 | 152 | 88 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Munawar, S.; Javaid, N.; Khan, Z.A.; Chaudhary, N.I.; Raja, M.A.Z.; Milyani, A.H.; Ahmed Azhari, A. Electricity Theft Detection in Smart Grids Using a Hybrid BiGRU–BiLSTM Model with Feature Engineering-Based Preprocessing. Sensors 2022, 22, 7818. https://doi.org/10.3390/s22207818