Detection of Non-Technical Losses in Power Utilities—A Comprehensive Systematic Review

Abstract: Electricity theft and fraud in energy consumption have been two of the major issues for power distribution companies (PDCs) for many years. PDCs around the world are trying different methodologies for detecting electricity theft. The traditional methods for non-technical loss (NTL) detection, such as onsite inspection and reward and penalty policies, have lost their place in the modern era because of their ineffective and time-consuming mechanisms. With the advancement in the field of Artificial Intelligence (AI), newer and more efficient NTL detection methods have been proposed by different researchers working in the fields of data mining and AI. The AI-based NTL detection methods are superior to the conventional methods in terms of accuracy, efficiency, time consumption, precision, and labor required. The importance of such AI-based NTL detection methods can be judged from the growing number of research articles on this important development. However, the authors felt the lack of a comprehensive study that can provide a one-stop source of information on these AI-based NTL methods, and this became the motivation for carrying out this comprehensive review of this significant field of science. This article systematically reviews and classifies the methods explored for NTL detection in recent literature, along with their benefits and limitations. To accomplish the mentioned objective, the research articles selected for the review are classified based on the algorithms used, the features extracted, and the metrics used for evaluation. Furthermore, a summary of the different types of algorithms used for NTL detection is provided along with their applications in the studied field of research. Lastly, a comparison among the major NTL categories, i.e., data-based, network-based, and hybrid methods, is provided on the basis of their performance, expenses, and response time.
It is expected that this comprehensive study will provide a one-stop source of information for all the new researchers and the experts working in the mentioned area of research.

Keywords: machine learning algorithms for NTL detection


Introduction
Losses of electrical energy in the power grids at the transmission and distribution level include both technical losses (TLs) and non-technical losses (NTLs) [1]. The computation of TL is generally needed for the correct estimation of NTL [2]. TLs are unavoidable, as these occur in the equipment during the transmission and distribution (T&D) process, whereas NTLs are labeled as administrative losses that occur because of non-billed electricity, malfunction of equipment, errors in billing, low-quality infrastructure, and illegal usage of electricity [3]. The fraudulent behavior of energy customers is usually associated with electricity theft, regularized corruption, and organized crime [4]. Therefore, such losses cannot be precisely estimated. Generally, the expenses linked with these NTL activities are compensated by legitimate customers. The effect of the NTLs is worse in under-developed or developing countries; however, it can affect developed economies too [5]. The researchers and experts in power industries and academia have been trying different methods to address the mentioned problem effectively. The traditional methods utilize the statistical analysis of data to understand the significant indicators of fraudulent behavior, allowing the development of effective policies to address the issue. The installation of smart meters has emerged as one of the most meaningful recent solutions to address the NTL detection issue [6]. However, their deployment, operational cost, and design involve massive costs, which makes them an impractical solution for weak economies [7]. Other methodologies include the utilization of machine learning algorithms for analyzing the data from meters and the evaluation of the consumption patterns that may imply fraudulent activities. Installations of specific equipment, sensors, and grid structures have also been suggested for efficient detection of NTLs [8].
There is a great number of research articles available in the literature that use several methods to address the NTL detection issue. Hence, a systematic compilation of all the related articles is the need of the hour, which consequently became the motivation behind carrying out the current systematic review on this important topic.
Even though the author in [9] provided a useful review in the mentioned field, it only covers papers up to the year 2013. Furthermore, the authors only discussed the technical aspects of the fraud detection system (FDS) while completely ignoring the features commonly used and the metrics used to evaluate the performance of the classifiers. The authors in [10] provided a detailed review of the NTL detection articles published in a few important conferences and journals, but a detailed description of the methodologies is lacking. Similarly, the author in [11] provides a comprehensive review in a similar area of research; however, it covers articles only up to 2017 and lacks a critical review of the literature. Furthermore, contrary to the current work, the authors of the mentioned work did not adopt a systematic review approach. Therefore, the authors of the current research work felt the necessity of a comprehensive review in the mentioned field of study, as there is a vast amount of research work with a variety of novel techniques left unattended, which needs to be summarized in order to provide a single comprehensive source of information on modern NTL detection methods. This article offers a comprehensive review of the literature on the subject of NTL detection, offering a helping hand to the utilities and researchers on the latest developments in NTL detection methods. The clear objectives of the paper are as follows:
i. Review of the available NTL detection solutions.
ii. Review of the various features used for NTL detection.
iii. Review of the various metrics required for the evaluation of machine learning classifiers.
iv. A detailed comparison of the current approaches with their advantages and limitations.
It is worthwhile to mention here that, to achieve the objectives of the current review, an analysis of 85 papers has been carried out. The selected articles are thoroughly reviewed and analyzed following the guidelines of a general systematic literature review.
The article is set out as follows: In Section 2, the review methodology adopted in the article is discussed. Section 3 sets out a description of NTL forms and sources. Section 4 provides a description of the outcomes of NTL detection approaches. Conclusions are drawn in Section 5.

Review Methodology
In this section, the methodology adopted for screening and selecting the articles for the current review is described in detail. The section is further divided into the following subsections based on the articles' selection process.

Search Terms
The articles that were published after the year 2000 were adopted, while the rest were discarded. A wide variety of keywords was used, such as electricity theft, non-technical losses, fraud in energy consumption, and supervised machine learning-based electricity theft detection methods.

Inclusion and Exclusion Criteria
Criteria for exclusion and inclusion are used to select the research articles from the pool of studies. The inclusion criteria in this review consist of:
i. The study should identify, estimate, or predict any form of NTLs in the electric grid.
ii. The study must present the analysis and determinants influencing the NTL.
iii. The study should suggest a novel strategy for detecting NTLs.
The exclusion criteria are stated below:
i. Studies must not be published before the year 2000.
ii. Duplicate articles with the same methodologies are discarded.
iii. It should not be a feasibility study of a specific area.
After applying the inclusion and exclusion criteria, only the 85 most relevant articles were selected for the review.

PRISMA Flowchart
The search process with the mentioned keywords was performed initially, and 273 articles were retrieved from the databases of the following four repositories: Science Direct, IEEE Xplore, Google Scholar, and ACM Digital Library. After eliminating the duplicates, all abstracts and titles were screened to choose the most appropriate research articles according to the inclusion and exclusion criteria. The finalized lists of the articles selected from the journals and the conferences are depicted in tabular form in Tables 1 and 2, respectively, and the flow chart of the article selection procedure is shown in Figure 1.

Table 1. Selected journal articles per venue (leading entries truncated).
(truncated entry) [52,53]
IET Generation, Transmission and Distribution: 2 [54,55]
Renewable and Sustainable Energy: 2 [7,56]
IEEE Journal on Selected Areas in Communications: 2 [8,57]
Computers and Electrical Engineering: 2 [58,59]
Energy Research and Social Science: 1 [3]
International Journal of Artificial Intelligence and Applications: 1 [2]
Tsinghua Science and Technology: 1 [9]
IEEE Transactions on Information Forensics and Security: 1 [60]
Energy: 1 [61]
Computers and Security: 1 [62]
Electronics: 1 [63]
ACM Transactions on Information and System Security: 1 [64]
Measurement: Journal of the International Measurement Confederation: 1 [65]
Knowledge-Based Systems: 1 [66]
Utilities Policy: 1 [67]
Expert Systems with Applications: 1 [68]
Machine Learning and Data Mining in Pattern Recognition: 1 [69]

Table 2. Selected conference articles.
IEEE/ACM International Conference on Big Data Computing [70]
IEEE PES Transmission and Distribution Conference and Exposition [71]
International Conference on Information Technology and Multimedia [72]
IEEE International Conference on Data Science and Advanced Analytics [73]
International Conference on Critical Information Infrastructures Security (CRITIS) [74]
International Conference on Power Systems (ICPS) [75]
North American Power Symposium, NAPS 2015 [76]
IEEE Symposium on Computational Intelligence and Applications in Smart Grid [77]
International Conference on Advances in Science, Engineering and Robotics Technology 2019 [78]
IEEE Power and Energy Society Innovative Smart Grid Technologies Conference [79]
2017 IEEE Power and Energy Conference at Illinois [80]

Energies 2020, 13, 4727 5 of 25

Features Used in NTL Detection
One of the most significant approaches available for NTL detection through machine learning algorithms is the feature-based NTL detection approach [61]. In such methods, time-series energy usage data are generally utilized to extract a few of the most relevant features for each consumer, and based on the analysis of the outcomes, the consumers are categorized as honest or fraudulent [11]. Different research studies use different types and numbers of features; therefore, it is a challenging task to describe all the features used for NTL detection precisely. The time resolution of these newly synthesized features relies heavily on the time resolution of the initial raw data. Two of the major advantages of feature-based classification are enhanced classification accuracy and a reduction in data size, as only the most relevant features are selected using a feature selection method, which provides an accurate picture of NTL activity [61]. However, the period over which new features are computed must be chosen carefully, i.e., hourly consumption features for one month may provide a different outcome than the monthly average consumption of one year.
Another important aspect of the feature-based NTL detection methods is the feature selection process. As mentioned earlier, the optimal and most relevant features are selected by utilizing a feature selection algorithm [62]. Feature selection algorithms have been utilized several times in the literature for enhancing the functioning of different algorithms used for NTL detection, such as in [12,23,30,58,59,81]. The authors in [23] tested all the possible fraud indications by utilizing a comparatively smaller number of features. Ramos et al. [12] utilized several types of meta-heuristic algorithms, such as the binary black hole algorithm, Harmony Search (HS), Particle Swarm Optimization (PSO), Differential Evolution (DE), and the Genetic Algorithm (GA), for selecting an optimal set of features. The authors in [58] evaluated the HS algorithm for the optimal feature selection process and compared it to PSO under identical conditions. The same authors extended their work in [30], used the Binary Gravitational Search Algorithm (BGSA) for the same purpose, and evaluated its performance against HS and PSO. In another article, Pereira et al. [59] used Social Spider Optimization (SSO) for tuning the parameters of the classifier and for the feature selection process. Martino et al. [82] proposed filter- and wrapper-based methods for selecting the best features. Table 3 provides a detailed list of the most commonly used features in the literature for the NTL detection process.

Table 3. List of most commonly used features in the literature for non-technical losses (NTL) detection.

Max/Min, Standard Deviation, Average, Monthly kWh consumption: Energy consumption features computed for a given period. [5,31,43,83]
Streaks: The number of times the energy consumption curve rises above and falls below the mean axis. [24,33]
Load factor: The ratio of the average load to the peak load over a given period. [12,30]
Wavelet coefficients: The gap between the wavelet coefficients determined from the consumption curve under analysis and the wavelet coefficients from previous years' consumption curves. [32,82]
Estimated readings: The number of estimated readings charged by the PDC because of its inability to obtain the actual reading. [73]
Reduction in energy consumption: The decrease in energy usage during a specified period compared to the previous reading of the same duration. [73]
Seasonal consumption difference: Total energy usage by the customer in a particular season compared to the energy consumption of another season. [34,43]
Euclidean distance to mean customer: The Euclidean distance from a customer's consumption curve to the mean active energy consumption curve of all customers in the dataset. [4,63]
Predicted kWh: The difference between the observed active energy consumption value and the anticipated value. [40,74]
PCA components: A set of factors measured by Kernel Principal Component Analysis (KPCA) or Principal Component Analysis (PCA) on the active energy consumption curves. [58,84]
Fractional order dynamic errors: Features that convey the difference between a profiled meter calculation and a real-time consumption time series. [13,14,54]
Discrete Cosine Transform coefficients: The first k coefficients of the discrete cosine transform. [76]
Pearson coefficient: Computed on the active energy usage curve over a given duration; it measures how well a linear relationship exists between time and actual energy consumption. [33]
Power factor: The ratio of real power to apparent power; its value lies between 0 and 1. [12,30,59]
Skewness: Computes the asymmetry or distortion in the normal distribution of a dataset. [43]
Kurtosis: Measures the number of outliers present in the distribution. [43]
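Several of the features listed above can be computed directly from a consumer's kWh series. The following is a minimal sketch, not taken from any of the cited studies; the 12-month consumption series is hypothetical, and the streak count is implemented as the number of mean-axis crossings:

```python
import numpy as np

def consumption_features(kwh):
    """Compute a few Table 3 features from a 1-D kWh consumption series."""
    kwh = np.asarray(kwh, dtype=float)
    mean = kwh.mean()
    std = kwh.std()
    # Streaks: how many times the curve crosses its mean axis.
    above = kwh > mean
    streaks = int(np.count_nonzero(above[1:] != above[:-1]))
    # Load factor: average load divided by peak load (always in (0, 1]).
    load_factor = mean / kwh.max()
    # Skewness and excess kurtosis of the consumption distribution.
    z = (kwh - mean) / std
    skewness = float((z ** 3).mean())
    kurtosis = float((z ** 4).mean() - 3.0)
    return {
        "max": float(kwh.max()), "min": float(kwh.min()),
        "avg": float(mean), "std": float(std),
        "streaks": streaks, "load_factor": float(load_factor),
        "skewness": skewness, "kurtosis": kurtosis,
    }

# Hypothetical 12-month series with a sudden drop (a possible NTL sign).
features = consumption_features([320, 310, 330, 325, 340, 150,
                                 145, 140, 150, 148, 152, 149])
print(features["load_factor"], features["streaks"])
```

A sudden, sustained drop such as the one above shifts the mean, reduces the load factor, and produces a small streak count, which is exactly the kind of signature a feature-based classifier is trained to pick up.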

Performance Metrics Used for NTL Detection
Generally, the performance of a classifier is assessed by evaluating different performance indicators. One such indicator is the confusion matrix. It presents the details of accurate classifications as "True" and of wrong classifications as "False." The True Positive (TP) in the confusion matrix signifies the fraudster customers who are rightly identified as fraudsters, whereas the False Positive (FP) portrays the honest customers who are incorrectly classified as fraudsters. Likewise, the True Negative (TN) reflects legitimate customers accurately identified as legitimate, and the False Negative (FN) signifies the fraudulent customers incorrectly classified as honest. The other well-known metrics that are evaluated through the confusion matrix in NTL classification problems are Accuracy (Acc), precision, Detection Rate (DR), True Negative Rate (TNR), False Negative Rate (FNR), False Positive Rate (FPR), and the F1 score.
The most commonly used metrics in the literature for NTL detection, and in almost all the data-based approaches, are Acc and DR. A better DR and good Acc reflect that the model operates well and has an excellent ability to classify the samples belonging to both classes. However, there is a need for other performance metrics in scenarios where the dataset is imbalanced, i.e., when the number of samples of the negative class (honest) is significantly higher than the number of samples of the positive class (fraudulent). Detection Rate (DR), Recall, or True Positive Rate are the metrics that are popularly used in such situations. These metrics describe the percentage of NTL samples classified accurately out of the total amount of NTL in the dataset. High DR values usually imply a well-operating NTL detection model; however, to verify this, other metrics should also be considered. Therefore, both accuracy and DR must be considered when determining the efficiency of the model. FPR and precision are the other two most utilized metrics for NTL detection. The precision or Positive Predictive Value (PPV) of the classifier can be calculated by dividing the number of fraudster customers correctly identified by the overall number of customers classified as fraudsters. A high precision of the classifier indicates that the majority of the customers classified as fraudsters are truly fraudsters. It is worthwhile to mention here that recall and precision are antagonistic metrics: an improvement in one usually results in a reduction in the other. Therefore, it is essential to achieve a good balance between both metrics. This balance between the two metrics is evaluated by computing the F1 score. A high value of the F1 score indicates that the model identifies the maximum number of fraud cases with a very low rate of false positives.
Although the F1 score is only occasionally used in the literature for NTL detection, it is still one of the most relevant and vital metrics when working with imbalanced data [73,85]. In fact, numerous research works have been conducted on this issue, as it offers a suitable solution for the correct assessment of the classifier [37]. Another important performance evaluation metric is the FPR. It is expressed as the number of samples wrongly marked as positive (fraudster) divided by the overall number of negative (honest) samples. If the value of the FPR is high for any classifier, it leads to excessive meter inspections that place a large operational cost burden on PDCs. Therefore, the PDCs always want to achieve a low value of FPR; moreover, the acceptable threshold level relies on the relative sizes of the two classified categories, i.e., theft and healthy.
It is important to note that making decisions on the basis of any single metric may provide misleading information. For example, for a collection of 1000 customers with ten fraudulent and 990 honest consumers, an FPR of 10 percent implies that 99 honest customers are flagged as fraudulent. This would lead to 99 needless meter inspections in order to identify the ten cases of fraud, whereas an FPR of 1% indicates that only about ten false reports would be generated while detecting the ten genuine fraud cases. Therefore, the selection of the performance metrics is very tricky when addressing class imbalance problems such as NTL detection. In the scenarios of imbalanced datasets, a combination of different metrics such as precision, accuracy, FPR, TNR, and DR should be utilized. The Bayesian Detection Rate (BDR) is another crucial metric, but it is not commonly used in the literature for NTL detection [64]. This parameter usually obtains small values in the intrusion and fraud detection domains because fraud in these scenarios is not that frequent. Table 4 provides the list of the most commonly used performance metrics for the evaluation of NTL detection models.

Energy balance mismatch: The difference in the amount of active energy at the consumer level and the substation level. [87,88]
Cost of an undetected attack: Defined as the cost of the worst possible undetected attack. [89]
Inspection costs: The expenses required for inspecting all customers classified as fraudsters. [17,25]
Average bill increase: The increase in the electricity bill if the NTLs are shared among all customers. [88]
Anomaly coverage index: The ratio of fraudster customers identified by RTUs to the actual number of fraudster customers. [53,90]
Minimum detected deviation: The smallest deviation identified from the typical profile. [18]
RTU cost: The total expenditure required for installing RTUs. [19,26]
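The confusion-matrix metrics discussed above, and the 1000-customer example, can be sketched as follows. Here the BDR is taken as the posterior probability of fraud given an alarm, which on a fixed dataset coincides with precision; this simplification is an assumption for illustration:

```python
def ntl_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics commonly reported for NTL classifiers."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    dr = tp / (tp + fn)            # Detection Rate / Recall / TPR
    precision = tp / (tp + fp)     # Positive Predictive Value
    fpr = fp / (fp + tn)           # False Positive Rate
    f1 = 2 * precision * dr / (precision + dr)
    bdr = precision                # P(fraud | flagged), on this dataset
    return {"Acc": acc, "DR": dr, "Precision": precision,
            "FPR": fpr, "F1": f1, "BDR": bdr}

# The 1000-customer example: 10 fraudsters, 990 honest consumers.
# A 10% FPR flags 99 honest customers; even with a perfect DR of 1.0
# (all ten fraudsters caught), precision is only 10/109.
m = ntl_metrics(tp=10, fp=99, tn=891, fn=0)
print(round(m["Acc"], 3), m["DR"], round(m["Precision"], 3), m["FPR"])
```

Note how accuracy (0.901) and DR (1.0) both look excellent while precision collapses to roughly 0.09: exactly the misleading single-metric situation described in the text.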

Categorization of NTL Detection Methods
Researchers are trying different methodologies for efficiently identifying fraudster customers. The existing methods for NTL detection can be broadly categorized into hardware-based and non-hardware-based methods. The hardware-based solutions mainly focus on installing meters with specific equipment to enable PDCs in identifying any malicious activity by consumers [56,60]. The recent advances in communication and data processing of the energy consumers' behavior have led to the development of non-hardware-based NTL detection methods. These non-hardware-based solutions are classified into three major categories, i.e., data-based methods, network-based methods, and hybrid methods. Furthermore, the relationship between demographic and socio-economic issues that helps PDCs to investigate the phenomenon of NTLs is categorized into different theoretical methods. A general classification of NTL detection schemes is given in Figure 2. The detailed aspects of every category, along with the methodology adopted, benefits, and drawbacks, are given in the subsequent sections.

Studies Based on Social and Economic Conditions
The literature discloses the presence of NTL over specific populations or geographical areas and the social aspects that relate to the fraud. The authors in [47,91] used statistical techniques to relate market variables and economic and social demographics to the amount of theft. The author in [92] proposed a scheme to understand the main parameters linked to NTL in India and regions of Tanzania using ethnographic fieldwork, surveys, and empirical analysis. The author in [67] examines different socio-economic characteristics of fraudulent customers using econometric analysis. The primary advantage of these studies is that they are beneficial in designing policies and have a great impact on decisions to decrease NTL. The complexity and the quantity of the data available in these types of studies can be managed easily even if they consist of indicators and variables that cover an entire region. The major drawback of these techniques is that their scope is limited, i.e., they usually focus on a specific country or region. These studies are not enough to identify instances of fraud and errors in billing or metering. The generalized additive model (GAM) has been used to demonstrate the spatial allocation of NTL activities in [32]. This model has been inspired by the subject domain of epidemiology. The research study assumes that electricity theft and fraudulent activities increase in specific areas according to specific technical and social characteristics. GAM can be utilized to compute the chances of fraudulent activities in any area and the effect of every technical and social feature if provided with the consumers' consumption data. A Markov chain model was utilized later to demonstrate how NTL can spread in a particular region. This algorithm calculates the spatial distribution and the likelihood of fraud but does not detect the NTL itself. These models are beneficial when planning and making long-term guidelines for lessening fraud.
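As a toy illustration of the Markov chain idea, a two-state chain (honest/fraud) with hypothetical transition probabilities converges to a stationary distribution that gives the long-run fraction of customers expected to be in the fraud state in a region. Nothing here is taken from [32]; the numbers are invented for the sketch:

```python
import numpy as np

# Toy two-state Markov chain for one neighborhood: state 0 = honest,
# state 1 = committing fraud. Transition probabilities are hypothetical.
P = np.array([[0.95, 0.05],   # honest -> (honest, fraud)
              [0.20, 0.80]])  # fraud  -> (honest, fraud)

state = np.array([1.0, 0.0])  # initially, everyone is honest
for _ in range(24):           # evolve the distribution over 24 periods
    state = state @ P

# The chain converges toward its stationary distribution; here the
# long-run fraud fraction is 0.05 / (0.05 + 0.20) = 0.2.
print(round(state[1], 3))
```

This is exactly the sense in which such a model estimates the likelihood and spread of fraud over a region without detecting any individual NTL case.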

Hardware-Based Solutions
The hardware-based solutions mainly propose methods in which the researchers emphasize the characterization and design of apparatus that allows the identification and estimation of any fraudulent activity [60]. The author in [93] proposes an anti-tampering algorithm that helps PDCs identify the tampering of energy meters. This anti-tampering algorithm helps the PDCs address the issues of bypassing the neutral line and main line, reversing problems, and opening of the frame cover or terminal. The author in [94] proposed a processor for the protection of energy meters from phase line bypassing, disconnection of the neutral line, and tampering of energy meters. A message is sent to the PDC if any of the mentioned discrepancies are observed. The authors in [95] addressed the NTL detection problem by proposing radio frequency identification (RFID) technology for the sealing of energy meters and speeding up the inspection process. The authors in [65] proposed a two-point reading scheme that identifies the difference between the electric current flowing from the power pole and that at the meter wire. A difference in the readings of the two points can lead to the detection of an NTL activity. The authors in [96] proposed the usage of a harmonic signal generator for the NTL detection problem; the signals were introduced in the distribution feeders after disconnecting the supplies of honest consumers in order to destroy the appliances of fraudulent consumers. The authors in [48] proposed a high-frequency signal generator to identify the location of NTL activity.

Non-Hardware Based Solutions
As the hardware-based methods need new infrastructure to be installed, which involves massive costs, they are not feasible for several PDCs, especially those in underdeveloped countries. Therefore, researchers these days are giving more importance to non-hardware-based solutions. Non-hardware-based solutions rely on studies in which the main focus of the researchers is to identify the presence of electricity theft from consumers' energy consumption data. The non-hardware-based solutions can be further classified into three major classes:
i. Data-based methods.
ii. Network-based methods.
iii. Hybrid methods.
The detail of each of the methods is given in the respective sections. Data-based methods are based mainly on machine learning techniques and data analytics. The main difference between data-based and network-based methods is the utilization of network information from the electricity grid [11]. The data-based methods are further subdivided into supervised methods and unsupervised methods, each of which is discussed in the subsequent subsections.

Supervised Methods
Supervised methods are those that utilize the data of both classes (positive class/fraudster and negative class/honest) for the training of the classifier [73]. These methods learn the patterns in the consumption data of energy consumers for prediction purposes. They use labeled data for training the classifier, which is later tested on a new dataset. The main drawback of the supervised methods is that, in the absence of labeled data for fraudster (positive class) consumers, or where the number of fraud cases is much lower than that of honest customers, it is very difficult to use this method [20]. The supervised machine learning techniques that have been used to solve the NTL detection problem are depicted in Figure 3. Each of the methods depicted in Figure 3 is described in the subsequent subsections along with their applications, merits, and limitations.

Support Vector Machine (SVM)
SVMs have been used many times as binary classifiers for the NTL detection problem because they are relatively immune to the class imbalance issue [1,5,15,21,28,31,40,43,59,72,78]. Many different methodologies have been used, including the cost-sensitive SVM (CS-SVM) and the one-class SVM (OC-SVM). The OC-SVM can be viewed as an outlier identification technique; it is normally trained on data that belong to one class (mostly the honest consumers, i.e., the negative class). Nagi et al. [5] trained an SVM-based model utilizing the risk and energy usage data of the customers to predict the presence of NTL. The model was tested in peninsular Malaysia to assist Tenaga Nasional Berhad (TNB). The SVM-based model increased the detection rate from 3% to 60%. The CS-SVM can give distinct weights to different classes; for example, a higher cost can be allotted to wrong classifications of the class with the smaller number of labels, which ultimately results in better performance, i.e., low FPR and high BDR. The study of recent literature indicates that SVMs can be used for NTL detection even though it is time-consuming and problematic to tune the parameters of an SVM. The other most frequently used classes of SVM are the radial basis function kernel SVM (SVM-RBF) and the linear kernel SVM (Linear-SVM). The difference between SVM-RBF and Linear-SVM is that in SVM-RBF, both the cost and gamma parameters are required to be tuned, whereas only the cost parameter must be tuned in the case of Linear-SVM. To increase the classification performance, SVMs are sometimes combined with other well-known classifiers such as a fuzzy inference system (FIS), DT, or artificial neural networks (ANN).
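A minimal cost-sensitive SVM sketch using scikit-learn on synthetic, imbalanced data is shown below. The dataset, feature dimensionality, and class separation are all invented for illustration and do not reproduce any of the cited studies; the `class_weight="balanced"` option plays the role of the CS-SVM cost weighting described above:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
# Synthetic, imbalanced data: 950 honest (0) vs. 50 fraudulent (1)
# customers, each described by an 8-dimensional feature vector
# (e.g., features like those in Table 3).
X_honest = rng.normal(0.0, 1.0, size=(950, 8))
X_fraud = rng.normal(1.5, 1.0, size=(50, 8))
X = np.vstack([X_honest, X_fraud])
y = np.array([0] * 950 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost-sensitive SVM: 'balanced' weights penalize misclassifying the
# minority (fraud) class more heavily; the RBF kernel needs both C and
# gamma tuned, whereas a linear kernel would need only C.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
clf.fit(X_tr, y_tr)
print("DR on held-out fraudsters:", recall_score(y_te, clf.predict(X_te)))
```

Dropping the `class_weight` argument on data this imbalanced typically pushes the classifier toward predicting the majority (honest) class, which is precisely the failure mode the cost-sensitive variant addresses.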

Artificial Neural Networks (ANN)
ANNs are multilayered machine learning models consisting of an interconnected set of artificial neurons used for addressing prediction and complex classification problems [97]. The multi-layer perceptron (MLP), usually trained with back propagation (BP-MLP), is the most frequently used version of an ANN as a binary classifier for NTL detection problems [1,2,29,41,43,98]. Furthermore, ANN models have also been utilized for time-series forecasting of energy consumption, where the difference between the measured value and the predicted value is used for detecting fraud. In this setting, the selection of the detection threshold, and the likelihood that fraudulent consumers' energy consumption data will affect the output of the ANN, must be considered. In both cases (forecasting and binary classification), the model must be trained after choosing the structure of the network. Although the selection of the optimal network structure is important, several methodologies still select the number of hidden layers and the corresponding number of neurons by trial and error; cross-validation is used in this scenario to ensure that the model has good generalization ability. Apart from the BP-MLP, the extreme learning machine (ELM) has also been proposed for forecasting and binary classification. ELMs have one or more layers of hidden nodes whose parameters do not need to be tuned; in most cases, the output weights of the hidden layers are computed in a single step. Therefore, ELM models can be trained much faster without a decline in their performance.
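A minimal sketch of the binary-classification use of an MLP follows, using scikit-learn's MLPClassifier on synthetic monthly-consumption features; the layer sizes stand in for the trial-and-error structure selection mentioned above, and all data are invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Synthetic 12-month consumption vectors: honest (label 0) vs fraudulent (1).
X_honest = rng.normal(10.0, 2.0, size=(300, 12))
X_fraud = rng.normal(4.0, 2.0, size=(300, 12))
X = np.vstack([X_honest, X_fraud])
y = np.array([0] * 300 + [1] * 300)

# A small back-propagation MLP; the (16, 8) hidden-layer sizes are an
# arbitrary choice for this sketch, not a recommendation.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=1)
mlp.fit(X, y)
print(round(mlp.score(X, y), 3))
```

In practice the hidden-layer sizes would be selected by cross-validation rather than fixed in advance, as the text notes.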

Optimum Path Forest
Optimum path forest (OPF) is a supervised, graph-based machine learning algorithm that is usually used for classification applications. The classification process in OPF comprises two steps, i.e., fit and predict. Unlike the previously discussed machine learning algorithms, which attempt to obtain the optimal hyperplane for the separation of the two classes, the OPF classifier works by splitting the graph into optimum path trees (two or more), each one characterizing a separate class. Every tree is rooted at a prototype sample, and the assembly of these trees makes up the OPF classifier. In the prediction step, the testing samples are assigned the labels of the prototypes with the help of a path-cost function. The main advantage of the OPF classifier is that it can handle the overlapped class issue with little training time, making it feasible for online training in NTL detection [12,23,30,38,58]. These features are critical in scenarios where the testing samples may differ significantly from the training samples.

Rule Induction Methods
Methods that utilize a set of rules for NTL detection fall in the category of rule induction methods. Expert knowledge and statistical analysis are mostly used in defining these rules; however, in most cases such knowledge is not enough. Therefore, rule induction methods are utilized for extracting the rules hidden in the (customarily labelled) data. The main aim of the process is to predict whether a sample belongs to a specific class using the values of the other features. FIS has been used in the NTL detection process to replicate the reasoning procedure of experts [24,31,33,35,44,66,68]. To improve their classification performance, rule-based systems are sometimes combined with well-known classifiers such as DTs, Bayesian networks, and SVMs.
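The kind of expert-defined if-then rules described above can be sketched as plain conditional logic; the thresholds, feature names, and score weights below are invented for this illustration and do not come from any utility or reviewed paper.

```python
# Illustrative hand-crafted rules of the kind an expert system or FIS encodes.
def suspicion_score(monthly_kwh, history_avg_kwh, meter_events):
    score = 0.0
    # Rule 1: consumption dropped sharply versus the customer's own history.
    if monthly_kwh < 0.5 * history_avg_kwh:
        score += 0.5
    # Rule 2: tamper/cover-open events were logged by the meter.
    if meter_events > 0:
        score += 0.3
    # Rule 3: near-zero usage at an occupied premise.
    if monthly_kwh < 10:
        score += 0.2
    return score

print(suspicion_score(monthly_kwh=40, history_avg_kwh=300, meter_events=1))
```

Rule induction methods automate the discovery of such rules from labelled data instead of relying on experts to write them by hand.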

Decision Tree
DTs have been used several times for addressing the NTL detection problem [25,33,40,43,62,63,99-101]. A DT is a decision-support tool that uses a flowchart-like graph or model to form a set of rules that help in classifying new samples. DT algorithms are regarded as one of the most promising supervised machine learning methods; DTs map non-linear relations much better than linear models and are used for both classification and regression problems. The rules formed by DTs help in better understanding the characteristics of NTL. Ensembles of DTs are formed by combining the predictions of DTs with other specific rules defined by experts [63]. DTs are highly dependent on the training dataset and are sensitive to the class imbalance issue. A number of different DT variants, such as QUEST, CART, C5.0, and EBT, have been used for solving NTL detection problems in the literature. A key benefit of DTs is their easy interpretation and transparency to operators.
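A short scikit-learn sketch can make both points above concrete: the class-imbalance sensitivity (mitigated here with a balanced class weight, one common workaround) and the interpretability of the learned rules. The data and parameters are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)

# Imbalanced synthetic set: 95% honest (label 0), 5% fraudulent (label 1).
X0 = rng.normal(10.0, 2.0, size=(380, 4))
X1 = rng.normal(3.0, 2.0, size=(20, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 380 + [1] * 20)

# class_weight="balanced" is one common mitigation for the
# class-imbalance sensitivity noted in the text.
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced",
                              random_state=0).fit(X, y)

# The learned rules are directly readable, which is the
# interpretability benefit mentioned above.
print(export_text(tree, feature_names=[f"kwh_q{i}" for i in range(4)]))
```

The printed rule list is exactly the kind of transparent artifact that lets an operator justify why a given customer was flagged.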

K-Nearest Neighbor
K-nearest neighbor (K-NN) is one of the simplest supervised machine learning algorithms and is used for both classification and regression. K-NN classifies new data by comparing it with the already present data using a similarity measure. In the literature, K-NN is usually used for NTL detection as a baseline for comparison with other machine learning algorithms [20,55,63,102]. In K-NN classification the output is a class membership: an object is classified by a plurality vote of its neighbors, and if k = 1, the object is simply assigned the class of its single nearest neighbor.
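The plurality-vote mechanism is small enough to sketch directly in NumPy; the toy consumption features below are invented for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by plurality vote of its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = y_train[nearest]
    return np.bincount(votes).argmax()

# Toy consumption features: honest (0) cluster near 10, fraudulent (1) near 3.
X_train = np.array([[10.0, 9.5], [10.5, 10.2], [9.8, 10.1],
                    [3.0, 2.8], [2.5, 3.2], [3.1, 2.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([2.9, 3.0]), k=3))  # -> 1
```

With k = 1 the function reduces to assigning the class of the single nearest neighbor, as described in the text.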

Bayesian Classifiers
Bayesian classifiers are probabilistic classifiers in which the class is used to predict the values of the features. A Bayesian classifier needs prior information on the NTL probability, which can be obtained from general statistics [27,33,62,64,103]. The working of this classifier is based on the fact that, if the class of a sample is known, it can be utilized to calculate the values of the different features. In NTL detection, this objective has been accomplished by using the non-intrusive load monitoring (NILM) procedure to learn the pattern of every device being used by the customer. For a new sample, NILM is performed repeatedly and the probability of fraud is estimated at the end of the process. These methods need a large amount of prior information, which significantly affects the output of the classifier. Bayesian networks utilize a set of variables to graphically represent the class probability; their main advantage is that they can be easily understood by humans and hence provide a human-friendly interface.
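A minimal Gaussian Naïve Bayes sketch shows how the prior NTL probability enters the classifier; the appliance-level features are only a stand-in for what a NILM pipeline might produce, and both the feature values and the 5% fraud prior are invented for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)

# Synthetic appliance-level consumption features (e.g. heating, lighting,
# appliances, in kWh/day); the feature meanings are invented for this sketch.
X_honest = rng.normal([5.0, 2.0, 1.0], 0.5, size=(200, 3))
X_fraud = rng.normal([1.0, 0.3, 0.1], 0.5, size=(200, 3))
X = np.vstack([X_honest, X_fraud])
y = np.array([0] * 200 + [1] * 200)

# `priors` encodes the prior NTL probability the text mentions:
# here, 5% of customers are assumed fraudulent a priori.
nb = GaussianNB(priors=[0.95, 0.05]).fit(X, y)
print(nb.predict([[0.9, 0.2, 0.1]]))
```

Even with a low fraud prior, a sample whose feature likelihood strongly favors the fraudulent class is still flagged, which is the Bayesian trade-off between prior and evidence.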

Unsupervised Methods
All those methods that do not require any labels (positive/negative) for the training of the classifiers are known as unsupervised methods [38]. These methods do not need supervision; the model works on its own to learn the information hidden in the provided data. Compared to supervised machine learning algorithms, unsupervised methods can perform more complicated processing tasks. The following types of unsupervised machine learning algorithms have been used for the NTL detection problem.
Each of the mentioned algorithms in Figure 4 is discussed in the subsequent subsections.

Clustering Algorithms
Clustering algorithms have been used several times for the NTL detection problem [4,16,38,42]. Clustering is mostly done at the initial stage for data preprocessing: customers with similar energy patterns are grouped together, and a classifier is then trained on each group to classify the unlabeled data. This process increases classification performance by reducing false-positive cases. Clustering algorithms can also be used to compute baseline energy consumption profiles for different customers; fraud is indicated when new samples substantially deviate from these baseline profiles. The clustering process can also be utilized as an unsupervised classification process by calculating the distance between a new sample and the center of its cluster. Fuzzy clustering has been used by correlating new samples with the odds of fraud, which allows the PDCs to develop and tune the method according to their requirements. Density-based clustering methods have also been used effectively for NTL detection, as they form clusters from the dense areas of the space and separate dissimilar samples lying in the less dense areas.
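The distance-to-centroid idea above can be sketched with k-means; the two synthetic customer groups and the scoring function are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Two synthetic groups of 24-hour load profiles (e.g. two customer types).
profiles = np.vstack([rng.normal(5.0, 0.5, size=(100, 24)),
                      rng.normal(15.0, 0.5, size=(100, 24))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)

def anomaly_score(x):
    """Distance from a new profile to its nearest cluster centroid."""
    return np.min(np.linalg.norm(km.cluster_centers_ - x, axis=1))

typical = np.full(24, 5.0)
odd = np.full(24, 0.5)  # near-zero usage: a possible bypassed meter
print(anomaly_score(typical) < anomaly_score(odd))
```

A threshold on this score would turn the clustering step into the unsupervised classifier described in the text.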

Expert Systems
Expert systems are based on rules described by the professionals responsible for identifying NTL in the distribution system [45,66,68,104]. They are utilized in both supervised and unsupervised settings; FIS is one example of such a system. The key aim of these systems is to solve complex issues through logic and analysis, mostly represented by if-then rules. These rules are very simple yet still achieve high classification performance. Advanced expert systems can incorporate new knowledge explicitly and hence update their models very quickly.

Statistical Methods
Control charts, particularly for time-series data, have been utilized for observing individual energy consumption and for describing the regions where the consumption pattern may be considered anomalous. These control charts examine the difference between the moving range and the actual consumption; violation of the chart rules is an indication of fraud and therefore triggers immediate inspection [32,66,69,75]. The non-parametric cumulative sum (CUSUM) control chart and the exponentially weighted moving average (EWMA) control chart are other types of charts that have been explored for NTL detection. These charts are popular in industry for the rapid identification of NTL, as they provide online information and enable visual examination of the data. However, such quick detections commonly result in a large number of FPs, reducing the overall outcome. One major limitation of these methods is that they fail to identify fraudulent activity if the consumer has been committing fraud since the very first monitoring phase, because their working mechanism is based on detecting changes in consumption patterns. Another limitation is that they interpret legitimate changes in energy consumption as fraud, which is not the case in a real scenario and hence results in many unnecessary FPs.
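An EWMA control chart of the kind mentioned above can be sketched in a few lines of NumPy; the smoothing weight, control-limit width, baseline window, and monthly readings below are conventional textbook defaults and invented data, not values from the reviewed papers.

```python
import numpy as np

def ewma_alarms(series, lam=0.2, L=3.0):
    """Flag indices where the EWMA statistic leaves its control limits.

    lam is the smoothing weight and L the limit width in sigmas; both are
    conventional defaults chosen for this sketch.
    """
    mu, sigma = np.mean(series[:12]), np.std(series[:12])  # baseline year
    # Asymptotic control limit for the EWMA statistic.
    limit = L * sigma * np.sqrt(lam / (2 - lam))
    z, alarms = mu, []
    for t, x in enumerate(series):
        z = lam * x + (1 - lam) * z
        if abs(z - mu) > limit:
            alarms.append(t)
    return alarms

# Twelve normal months followed by a sustained drop (possible tampering).
usage = [300, 310, 295, 305, 290, 300, 315, 298, 302, 296, 308, 299,
         150, 140, 145, 150]
print(ewma_alarms(usage))
```

The sustained drop trips the chart at the first low month, illustrating both the fast response and the core weakness noted above: a consumer who was fraudulent from the very first reading would define the baseline itself and never trigger an alarm.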

Regression Methods
The auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) are two well-known regression models applied for estimating time-series behavior. The methodology relies on the difference between the forecasted and the measured value once the regression model has been trained on the consumption data of one class of consumers: the larger this difference, the higher the chance of fraud [36,105]. ARIMA-based models have shown superior performance for domestic customers.
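The reviewed papers fit full ARMA/ARIMA models (typically via a library such as statsmodels); the sketch below substitutes a plain AR(1) fitted by least squares, which is enough to show the forecast-residual idea. The series and readings are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Honest consumption history: a stable series around 100 kWh.
history = 100 + rng.normal(0, 2, size=60)

# Fit x_t = a * x_{t-1} + b on the honest history (a simplified stand-in
# for the ARMA/ARIMA fitting used in the literature).
X = np.column_stack([history[:-1], np.ones(len(history) - 1)])
a, b = np.linalg.lstsq(X, history[1:], rcond=None)[0]

def residual(prev_reading, new_reading):
    """Gap between the forecast and the metered value; large gaps suggest fraud."""
    return abs(new_reading - (a * prev_reading + b))

# A sudden halving of reported usage produces a much larger forecast error.
print(residual(100, 101) < residual(100, 50))
```

In a deployed system the residual would be compared against a tuned threshold rather than another residual, but the ordering shown here is the detection signal.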

Outlier Detection
NTL detection frequently borrows concepts from outlier detection methods [38,42,51,106-108]. For example, consider a dataset of honest customers modeled by a multivariate Gaussian distribution (MGD), where every cluster of samples is represented as a Gaussian distribution [38]. The probability of a new sample belonging to each of the distributions is determined, and its likelihood is compared with a threshold value to decide whether the new sample is an anomaly. The difficult task in these scenarios is to decide the total number of clusters and the boundaries of the Gaussian distributions. K-means clustering and OPF are usually utilized for this outlier detection purpose. The local outlier factor (LOF), a density-based indicator of fraud, has also been used [109]. LOF computes the local density of a sample and compares it to the average density of its nearest neighbors; a sample is considered an anomaly if its local density is considerably lower than that of its neighbors. In this context, it is important to identify the true indication of fraud, as a very high LOF is not necessarily an indicator of fraud; therefore, additional rules should be defined and tested before marking a sample as an anomaly. The Kullback-Leibler divergence (KLD), computed as a distance between two probability distributions, has been used for NTL detection in [110]. KLD is utilized to compare the current distribution of consumption with a standard obtained from the historical distribution, and has been used to identify smart attacks that conceal fraudulent consumption by fitting it into authentic ARIMA models. One obvious benefit of these methodologies is that they can identify fraudulent activities even if such activities were not present in the training set.
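The LOF mechanism described above is available directly in scikit-learn; the sketch below plants one far-away reading in a dense cloud of synthetic honest readings (all values invented for the example).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(6)

# Honest customers form a dense cloud; the final reading sits far away.
readings = np.vstack([rng.normal(10.0, 1.0, size=(100, 2)),
                      [[1.0, 1.0]]])

# fit_predict scores the data itself; -1 marks density-based outliers.
lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(readings)
print(labels[-1])
```

As the text cautions, a high LOF score is a starting point for inspection, not a verdict: a customer on vacation produces the same density signature as a bypassed meter.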

Network-Based Methods
Network-based methods rely on information obtained from smart meters and on the calculation of different physical parameters of the electrical network to efficiently identify NTL activity [11,111]. Several studies have used the power flow process to calculate the total volume of NTL activity; the area where the fraudulent activity occurs is identified by calculating the energy balance and other parameters with the help of a central observer meter. In contrast, other studies have used distribution state estimation and false data detection for the problems mentioned above. Network-based methods are usually more accurate; however, they are not easy to implement, since dedicated sensors are generally required for detecting fraudulent activity. AI algorithms are generally utilized to calculate the minimum number of sensors and their optimal placement in the distribution grid. Figure 5 depicts the different types of network-based methods utilized in the literature for NTL detection.

Load Flow Approach
The calculation of energy flow in the distribution grid is a direct way to detect NTL activity [18,22,42]. A central observer meter, installed for monitoring the energy meters on the LV section of a distribution transformer, is required in such scenarios. The observer meter's measurement is compared with the total energy recorded by the smart meters, and the difference between the two readings is computed after accounting for the overall percentage of technical losses: the larger the remaining difference, the higher the chance of NTL activity. Fraudulent activities can be detected with these methodologies down to the secondary substation level; however, a central observer meter is needed to identify each individual fraudulent customer. In addition, the difficult task in these methods is the precise calculation of the technical losses in the network. The authors in [57,90] used this concept for NTL detection: different parameters related to the meter's behavior are computed through various methods and then matched against the parameters of a regular meter to indicate fraud.
Similarly, for scenarios in which the TLs are not known, the authors in [18] proposed a new model for calculating the network parameters and computing the TLs. The authors in [39] used a probabilistic power scheme for spotting NTL activity: the fraudulent activity is identified at the network level with the help of observer meters, and the exact location of fraud under a specific observer meter can also be identified. The authors in [112] proposed a smart substation-based methodology in which both observer meters and smart meters are present; the model can identify the location of fraudulent activity at the consumer level even if there is no significant difference between the observer meter reading and the smart meter readings. The authors in [113] used a similar approach to track network voltage changes by utilizing the smart meter data of honest customers; these changes are then used in a network model for computing voltages, provided the active power values are known.
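The observer-meter energy balance at the heart of the load flow approach reduces to a simple arithmetic check; the function name, loss rate, and readings below are invented for illustration.

```python
# Energy-balance sketch for one LV feeder: the observer meter reading is
# compared with the sum of customer smart-meter readings plus an assumed
# technical-loss margin. All figures are illustrative.
def ntl_estimate(observer_kwh, smart_meter_kwh, technical_loss_rate=0.05):
    expected = sum(smart_meter_kwh) * (1 + technical_loss_rate)
    return observer_kwh - expected

# The transformer supplied 1150 kWh, but billed consumption plus an
# estimated 5% technical loss accounts for only 1050 kWh.
gap = ntl_estimate(observer_kwh=1150, smart_meter_kwh=[400, 350, 250])
print(round(gap, 1))  # positive gap -> energy delivered but not billed
```

As the text notes, the hard part in practice is the `technical_loss_rate`: an imprecise technical-loss model turns directly into false NTL estimates.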

State Estimation Approach
Distribution state estimation (DSE) methodologies utilize data from smart meters to observe the grid and hence act as an excellent scheme for the correct identification of fraudulent activity [9,114,115]. DSE has mostly been employed on medium voltage (MV) networks. In state estimation approaches, fraudulent activity is usually treated as false data injection (FDI) attacks or as bad data. The significant difference is that an FDI attack may comprise several coordinated bad data points; therefore, FDI is much harder to identify than ordinary bad data, which generally appears in a random and isolated fashion. A successful FDI attack effectively fools state-estimation-based bad-data detectors, although the attacker requires substantial knowledge of the grid parameters to initiate the process successfully. The authors in [57] proposed a Kalman filter state-estimator-based centralized solution for locating biases in line currents. Users whose biases are higher than a predefined threshold are suspected of committing fraud. The Kalman filter ensured privacy by offering a distributed solution; in such scenarios, the operator does not need to acquire permission to access the voltage and power measurements of the users. The proposed solution provides excellent results in microgrids with small line spans. In [52], the fraudulent customer is assumed to have incomplete information of the network topology and hence lacks the ability to increase or reduce the values of many smart meters at the same time; such attempts go undetected by conventional electricity balance schemes. The authors proposed sensor placement, communication devices, and smart meters for identifying these types of attacks.
The authors in [114] proposed a weighted least squares (WLS)-based state estimator approach for calculating the total load of MV/LV transformers from three-phase current, voltage, reactive, and active power values. A considerable variation between the estimated and measured values indicates a potential NTL activity. The authors in [115] proposed a methodology based on network clustering for the detection of bad data; the process is repeated for each of the networks for the identification of bad data and network partitioning.
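The core computation behind a WLS state estimator is the weighted normal equation x = (H^T W H)^-1 H^T W z, with per-meter residuals flagging bad data. The measurement model below is a made-up two-state linear toy, not an actual feeder model from [114].

```python
import numpy as np

def wls(H, z, weights):
    """Weighted least squares estimate of the state vector x."""
    W = np.diag(weights)
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

# Three redundant measurements of two states; the third meter is noisier,
# so it receives a lower weight in the estimator.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.6])  # the last reading is biased high
x = wls(H, z, weights=[10.0, 10.0, 1.0])

# Large residuals on individual meters point at bad data / possible NTL.
residuals = z - H @ x
print(np.round(residuals, 2))
```

The normalized-residual test used in [118] builds on exactly this quantity: the biased third measurement carries most of the residual, singling it out for inspection.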

Sensor Network Approach
The installation of specific sensors in the power distribution system is an emerging development in network-oriented methods. The aim is to reduce infrastructure expenses by computing the optimal number and placement of sensors to localize and detect NTLs efficiently [8,17,52,116]. These approaches usually need accurate information on the network structure and are closely linked to the state estimation approaches, as the objective of the majority of these models is to increase the observability of the network. Besides locating the ideal position of smart meters with sensors, this line of work also examines the placement of redundant smart meters. The issue is resolved by installing a central observer meter and an inspector box before each customer's smart meter. The central observer meter exchanges information with the customers' smart meters and compares the energy usage information to detect NTL activity; the fraud is confirmed if the difference between the measured readings is significant.

Hybrid Methods
Hybrid approaches combine the techniques and algorithms of both data-based and network-based methods to detect NTL with suitable accuracy [16,26,40]. The authors in [16] used the SVM algorithm along with a central observer meter: the observer meter cross-checks the SVM output against the active power measurements of the network, from which the difference between the measured active power and the system's technical losses is calculated. An inspection is required if the customer is classified as fraudulent, i.e., if the SVM yields a positive output and the difference is higher than a predefined limit. Similarly, the authors in [40] followed the same methodology and proposed a combination of DT and SVM for the same purpose.
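The cross-check just described reduces to a conjunction of the two signals; the function name, threshold, and readings below are invented for this sketch and do not reproduce the exact criteria of [16] or [40].

```python
# Hybrid decision sketch: inspect only when the data-based classifier flags
# the customer AND the feeder energy balance confirms a gap.
def needs_inspection(classifier_flag, observer_kwh, billed_kwh, limit_kwh=50):
    balance_gap = observer_kwh - billed_kwh
    return classifier_flag and balance_gap > limit_kwh

print(needs_inspection(True, observer_kwh=1200, billed_kwh=1050))  # True
print(needs_inspection(True, observer_kwh=1200, billed_kwh=1190))  # False
```

Requiring both signals is what cuts the false positives of the data-based classifier: a legitimate change in consumption may trip the classifier, but it leaves the feeder energy balance intact.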
The authors in [118] used ANOVA and state estimation for NTL detection, requiring RTU data, smart metering data, and different network parameters. A distribution state estimation is performed utilizing the smart meters' consumption data, and the normalized residual procedure is then used to identify the exact location of the fraudulent activity at the LV transformer level. ANOVA is used at a later stage for comparing the outcomes with previously verified baselines, and its findings are then fed back to the state estimator to replace the bad data with improved estimates. The authors in [119] used the same idea to address the state estimation problem with semidefinite programming. The authors in [120] proposed the opposite framework, in which the density of anomalies per transformer is calculated by an unsupervised anomaly detection algorithm; that density then adjusts the weight matrix of the state estimator, which computes the loading position of the transformer utilizing pseudo-measurements and load forecasting. TLs and NTLs can then be calculated at the individual transformer level. The authors in [121] used a state estimation-based approach along with a supervised approach based on the OPF classifier to evaluate the level of fraudulent activities. The authors in [122] proposed a hybrid method using multivariate control charts, state estimation, and path search algorithms: state estimation is applied on the MV side utilizing data from the field devices, and the variance between the estimated and measured current/voltage values is used to define a multivariate process monitoring problem. The fraud is detected if any of the described differences fall outside the control region. Table 5 provides a detailed comparison of the data-based, network-based, and hybrid methods in terms of resources, class imbalance, response time, cost, and performance.

Table 5. Comparison of data-based, network-based, and hybrid NTL detection methods.

Resources
- Data-based: Need a huge amount of data to guarantee the generalization of models; need labelled data; rely on extensive data sets of low diversity.
- Network-based: Do not need data of significant volume, but need high-quality, high-resolution data from observer meters, smart meters, RTUs, and network information.
- Hybrid: Also need large volumes and a large variety of data.

Class imbalance
- Data-based: Sensitive to the class imbalance problem; the model tends to overfit if the data of one class is scarce.
- Network-based: Immune to class imbalance problems, since they do not require labelled data for training and validation.
- Hybrid: Counter similar issues as the data-based methods.

Response time
- Data-based: Need monthly, seasonal, or yearly consumption data, which increases the response time.
- Network-based: Fastest response time; do not need a large amount of data for finalizing the decision, and high-resolution real-time data speeds up the NTL detection process.
- Hybrid: Face similar issues as the data-based methods.

Cost
- Data-based: Do not require additional cost for purchasing, installing, and maintaining equipment; can be developed promptly with small costs on the existing infrastructure.
- Network-based: Require substantial expenses for additional communication devices, observer meters, and RTUs; their operating tools increase training/operating costs.
- Hybrid: Cost fluctuates between the two, according to how many specific devices from the network-based methods are needed.

Performance
- Data-based: Models can be trained easily on existing data and can therefore easily identify existing frauds; however, they produce a large number of FPs when legitimate changes in energy usage (e.g., a change of residents or house) look like fraud.
- Network-based: Perform better after the installation of specific devices, i.e., smart meters, observer meters, and RTUs.
- Hybrid: Perform adequately depending on the utilization of the model; adding an energy balancing condition to a data-based method can significantly enhance performance.

Limitations of Available Solutions and Future Recommendations
Supervised machine learning algorithms generally perform better than unsupervised machine learning algorithms [123]. However, the major drawback associated with supervised algorithms is the requirement for labelled data, which is not readily available; another major problem they face is class imbalance. Since every supervised machine learning algorithm has its pros and cons, it is crucial to select a suitable algorithm for a particular classification task. For example, SVM performs better when dealing with high-dimensional data and cases where the classes are separable; another key advantage of SVM-based algorithms is that outliers do not have much impact on the results. Despite the vast applicability of SVM classifiers, they suffer from a few drawbacks: they require a long time for processing data, perform poorly in scenarios of overlapped classes, and the selection of appropriate kernels and hyperparameters is a strenuous task. Naïve Bayes classifiers, on the other hand, are suitable for real-time classification/prediction and perform well with high-dimensional data.
The key limitation of this algorithm is that the significance of a single feature is not prominent, as a combination of all the features contributes to the outcome. The advantage of logistic regression (LR)-based algorithms is that they are simple and effective to implement, as tuning of hyperparameters and scaling of features are not needed; however, they perform poorly on non-linear data sets, where they are generally outperformed by their counterpart algorithms. Decision tree-based algorithms do not require scaling and normalization of data; they can tackle missing values and can be easily interpreted. The major drawbacks of DT-based algorithms are that they tend to overfit, are sensitive to class imbalance, and need massive time for model training. Similarly, ensemble methods like bagging and boosting reduce the error through the voting of weak learners to reach the final decision and, at the same time, perform well on imbalanced data sets. Another important aspect of this category of classifiers is that they can handle huge data volumes and missing values better than conventional algorithms such as SVMs and DTs. Furthermore, outliers have a very minute impact on the final predictions of ensemble methods, and hence these models do not overfit the training data. Despite the mentioned benefits, this approach also suffers from a few prominent drawbacks: the models behave as a black box, the trees need to be uncorrelated, and each feature needs to have predictive ability, otherwise it is not considered.
Contrary to supervised machine learning methods, unsupervised methods do not require labelled data for model training. They are typically applied in scenarios where positive samples (fraudster customers' data) are rare and a large number of negative samples (honest customers' data) are available. Unlike supervised ML methods, unsupervised methods are capable of detecting new fraud cases and can therefore be applied conveniently for NTL detection; however, they normally yield poorer performance, with a large number of false positives.
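An unsupervised detector of this kind can be sketched with an Isolation Forest, assuming scikit-learn; note that no labels are passed to the model, and the contamination rate and data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Unlabelled data: mostly honest consumption profiles, with a handful
# of anomalous (suppressed) profiles mixed in.
normal = rng.normal(10.0, 1.0, (980, 24))
anomalous = rng.normal(3.0, 1.0, (20, 24))
X = np.vstack([normal, anomalous])

# contamination is an assumed prior on the fraud rate (2% here);
# in practice this value is unknown and drives the false-positive rate.
iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = iso.predict(X)  # -1 = flagged as anomaly, +1 = normal
print("customers flagged for inspection:", int((flags == -1).sum()))
```

The `contamination` parameter makes the trade-off explicit: set it too high and the model floods inspectors with false positives, which is exactly the weakness of unsupervised methods noted above.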
Owing to the above-mentioned facts, it is evident that supervised methods should be used in scenarios where sufficient samples of the positive class (fraudsters) are available, while unsupervised methods are suitable when data for that class is scarce. Concluding the above discussion, it is important to note that all algorithms respond differently to different types of data. For example, Naïve Bayes performs best when the features are highly independent. SVM, on the other hand, is suited to medium-sized data sets with a large number of features. Linear regression and logistic regression are best suited for training models when there is a linear relationship between the independent and dependent variables. K-NN can be used when the data set is small and the relationship between the independent and dependent variables is unknown. Hence, different machine learning algorithms respond differently to different types of data sets. Therefore, there is no rule of thumb for accurately selecting an algorithm for a classification problem; however, based on the above discussion, one can try the most relevant algorithms considering the available data and its stated features.
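The advice to "try the most relevant algorithms" can be operationalized as a small cross-validated comparison, sketched below under the assumption that scikit-learn is available; the candidate list and synthetic data are illustrative choices, not a prescription from the reviewed studies.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
# Illustrative balanced data set of consumption features.
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)),
               rng.normal(1.0, 1.0, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

# Candidates motivated by the discussion above: Naive Bayes (independent
# features), SVM (many features), LR (linear relationships), K-NN (small
# data, unknown relationship).
candidates = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print("best candidate:", best, "with mean CV accuracy", scores[best])
```

Since no rule of thumb exists, an empirical shoot-out of this kind, scored with a metric appropriate to the data (F1 or AUC under imbalance rather than accuracy), is the pragmatic substitute.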
The detailed review of the literature reveals a scarcity of research assessing the impact of NTL in developing countries. Developed countries, in contrast, experience a much smaller NTL impact, yet it remains considerable. It is also worth mentioning that there is a lack of research assessing the financial impact of implementing network-based methods, i.e., the installation of specific sensors and smart meters. This assessment is important to determine whether the benefit of NTL reduction from installing new infrastructure exceeds the equipment cost. Most of the published research in this area focuses on a single aspect of NTL sources. The authors of the current research, after a detailed literature review, felt the need for a systematic study that considers all types of potential NTL sources and their implications. Furthermore, applications that incorporate numerous solutions to classify NTLs from a variety of possible sources are also lacking in the existing literature. The authors, therefore, envisage that future studies should emphasize developing applications that utilize multiple NTL detection solutions in an integrated manner.

Conclusions
This article presents a detailed review of state-of-the-art methodologies for identifying fraudulent activities in PDCs, drawn from three significant repositories: the ACM Digital Library, ScienceDirect, and IEEE Xplore, covering work published since 2000. The study focuses mainly on the proposed solutions, evaluation criteria, and drawbacks. The literature review in the current article focused primarily on non-hardware-based solutions; 79 of the 91 studies belong to this category. The summary of the literature revealed a sufficient gap in the category of theoretical methods, as there is a strong need to investigate the causes of NTL in developing countries. Developed countries have a much lower percentage of NTLs, yet the consequences are substantial. There is insufficient evaluation of the financial viability of hardware-based methods, which is essential for concluding whether the reduction in NTL activities can recover the cost of equipment. How hardware-based methods interact with non-hardware-based methods should also be examined in depth, since these methodologies typically go side by side.

Conflicts of Interest:
The authors declare no conflict of interest.