1. Introduction
Commercial losses are one of the main problems faced by power distribution companies and contribute directly to their financial losses. Several studies on energy loss detection have addressed this problem. Glauner et al. [1] presented a detailed and up-to-date analysis of the most advanced research efforts, covering algorithms, features, and datasets. They identified the most important scientific and technical challenges in detecting non-technical losses (NTLs) and suggested how these challenges can be addressed in the future. Shah et al. [2] proposed an algorithm to estimate actual energy losses when smart meter measurements are incorrect and to determine energy consumed by other sources of NTLs. They tested the algorithm using simulations to verify its effectiveness in the accurate identification of NTLs by determining technical losses.
To increase efficiency, power utility companies should minimize energy losses resulting from distribution system outages, anomalies in distribution grids, and fraudulent consumers. This is particularly crucial in rural distribution grids, which are usually large-sized and unautomated and operate on radial systems.
Our research question was: Is it possible to use artificial intelligence (AI) to identify irregular irrigation users? Irregular irrigation users are those committing fraudulent energy consumption. Identifying these irregularities allows inspection field teams to focus on the most suspicious sources of NTLs. To answer the research question, we defined a region to analyze consumers using irrigation systems for rice crops, which delimits the pilot study area. This work did not aim to prevent NTLs but to identify them in consumer units and help the power distribution company detect potential consumers causing NTLs.
Brazil is one of the top ten rice producers in the world, and rice production strongly depends on irrigation systems. In Brazil, crop production using different irrigation systems corresponds to 8.2 million hectares (Mha), and rice production occupies the second largest irrigated area, after sugarcane production, corresponding to 1.2 million hectares [3]. Thus, the analysis of energy consumption in irrigation systems becomes important.
In the pilot study area, the volume of water required by the flood-irrigated rice crops is approximately 6500 m³·ha⁻¹ during an average irrigation period of 90 days [4]. This large volume of water leads to high energy consumption in irrigation systems, which makes it important to identify the associated economic losses.
The following subsections present information on energy losses in Brazil and on rural distribution grids for irrigation systems, followed by a brief analysis of related studies on NTLs.
1.1. Energy Losses in Brazil
In Brazil, the total energy losses in power distribution accounted for approximately 14% of the total energy injected in 2021. This represented 43.9% of the low-voltage market [5].
Figure 1 compares the total losses, technical losses, and NTLs over the years. In this period, total energy losses remained between 13% and 15% of the total energy injected into the grid. NTLs are the difference between the total losses and the technical losses. In 2021, NTLs were 6.5%. The total amount of energy loss corresponding to NTLs in power distribution was about 34.66 TWh, while technical losses were 40.40 TWh. Therefore, these energy losses significantly affect power utility companies and consumers.
The lack of equipment efficiency, NTLs, and power distribution grid overload are pertinent concerns in rural distribution grids. NTLs (commercial losses) refer to the amount of unbilled energy. Non-metered energy and fraudulent consumers lead to NTLs [6]. The cost of NTLs is paid by the power utility companies and/or legitimate consumers. NTLs occur both in urban and rural power distribution grids.
Figure 2 shows the irrigated crop area in Brazil. In 2017, the irrigated crop area was 6.7 million hectares, of which 25% were concentrated in the southern region [3]. More than 70% of the Brazilian rice is produced in the state of Rio Grande do Sul [7]. Sugarcane is the main crop by irrigated area, accounting for 39.8% of irrigation.
The irrigation land use over time in the Brazilian regions shows that irrigation has been widely used in several crops. Over the decades, the irrigated crop area in southern Brazil has decreased.
Figure 3 shows that the Brazilian rice crop area has decreased in recent years [8]. However, the average productivity has increased since the irrigated crop area is proportionally more productive than the rainfed area. The productivity of irrigated rice crops is more than three times higher than that of non-irrigated rice crops.
Rice production has been relatively stable due to greater efficiency in water use and advances in crop management technology. Although the crop area has decreased, the crop yield levels have increased. In the last decade, the area under rainfed cultivation declined to less than 500 thousand hectares, while the area under irrigated cultivation was between 1.3 and 1.4 million hectares [8].
Figure 4 shows the amount of energy consumed by irrigation systems in the state of Rio Grande do Sul (RS) [9]. This location concentrates the irrigated rice crops in Brazil. The highest energy consumption was in western municipalities, ranging from 30 to 210 million kWh. These are the main irrigated rice crop areas. The energy cost for irrigation systems corresponds to approximately 7% of the total production cost, after the cost of fertilizers (15%) and water (9%) [10]. In the rice crop period, between October and February, energy consumption increases by more than 500 GWh to meet the irrigation demand. Irrigation efficiency should also be considered, along with concerns about meeting the demand in rural activities. The irrigation systems use motors and pumps to provide water to crops [11]. Thus, improving irrigation efficiency increases productivity and producer profits.
Rice irrigation by flooding uses ditches in the cropped area to conduct the water to where irrigation is required. This irrigation system has lower implementation and maintenance costs. Rice farms in southern Brazil have extensive irrigated areas, which require pumping stations to maintain a water layer during the irrigation cycle between 80 and 100 days. The energy consumption in pumping stations receives a differential tariff system, Horo-Seasonal green, and adopts a restricted schedule.
Köpp et al. [12] suggested reference values for sizing or operating pumping stations to define a performance index. The authors proposed an acceptable index corresponding to a “good” rating of energy consumption for each hectare of irrigated crop according to Equation (1).
Due to the high volume of rice crops in the state, the pilot study area considers irrigation users from one of the municipalities with the largest irrigated rice crop area. Irrigation users are good candidates for NTLs due to the large amount of energy consumed.
We cannot access accurate information about crop areas since the concessionaire does not have adequate mapping of crops, nor the cropped area in the western border region of the state. The equipment in rural grids is not telemetered, making it difficult to determine monthly consumption. As a result, most data are collected over long intervals. The commercial sector database is responsible for billing and presents many inconsistencies, making it impossible to use data from some consumers.
1.2. Detection of Non-Technical Losses
Several solutions for the detection of NTLs have been proposed, such as those of Omar et al. [13], Park and Kim [14], and Buzau et al. [15]. This diversity is due to different contexts and problem specificities.
De Oliveira Ventura et al. [16] studied the impact of NTLs on power distribution companies in Latin America. The NTLs reduce companies’ revenues, and the electricity tariffs paid by the final consumers include part of these revenue losses. The percentage of NTLs distributed to consumers varies according to the national regulator.
Some studies applied big data to power distribution grids. However, many studies focused on specific services in low- and medium-voltage distribution grids. Big data applications in power distribution grids can perform fault detection, predictive maintenance, transient stability, state estimation, power quality monitoring, topology identification, load and profile, load failure, and NTL detection [17].
Savian et al. [18] analyzed how NTLs affect countries, utilities, and society. They explained the main barriers and strategies for detecting NTLs and analyzed the most important regulations of NTLs from different countries. They demonstrated the impact of NTLs on the economy and society and presented strategies to mitigate electricity fraud.
Saeed et al. [19] classified the techniques for detecting NTLs as either hardware-based or non-hardware-based. Teles Faria et al. [20] linked NTLs to particular populations and locations considering social factors associated with electricity fraud.
Hardware-based NTL detection methods use meters with specific devices installed on consumer units. These devices enable power distribution companies (PDCs) to detect malicious activities by consumers, according to Viegas et al. [6] and Xia et al. [21]. Installing these devices on the consumer premises requires significant new infrastructure.
Advances in communications and in the processing of data on energy consumer behavior allowed for the development of non-hardware-based methods for detecting NTLs. Researchers are therefore investigating this type of NTL detection method, whose main focus is to detect electricity theft from energy consumption data, as reported by Cui et al. [22], Khan et al. [23], and Feng et al. [24].
Messinis and Hatziargyriou [25] and Saeed et al. [19] categorized non-hardware-based methods into three groups: data-driven, network-driven, and hybrid. These methods require energy consumption measurements. The data-driven method uses data related to the consumer, such as personal and spatial technical characteristics, social information, and financial information. The network-driven method uses data such as topology and measurements from remote terminal units (RTUs) and observer meters [26]. The hybrid method combines the two previous ones.
Data-driven methods are based solely on data analysis and machine learning and are categorized as either supervised or unsupervised [25]. Supervised methods, such as support vector machines (SVMs), artificial neural networks (ANNs), optimum-path forest (OPF), decision trees (DTs), and k-nearest neighbors (k-NN), use labeled data sorted into two classes: positive (fraud) or negative (not fraud). Unsupervised methods do not use labeled data.
Network-driven methods use information from smart meters and calculate various physical parameters of the distribution grid [27]. The methods are classified based on the main concept or algorithm used, namely, state estimation, load flow, or special sensors for detecting fraud [26].
Hybrid methods share characteristics of both data-driven and network-driven methods. Some combinations have been proposed by Messinis et al. [26], such as combining SVM with observer meters to verify the energy balance and combining SVM with decision trees and observer meters.
Ahmad et al. [28] proposed several approaches to detect unauthorized energy consumption and other methods used for electricity theft. They identified various setbacks and problems that arise in the implementation of measures to control unauthorized energy consumption.
The detection of NTLs is critical for power utility companies, and the challenge is even more pronounced for customers in rural distribution grids. Power utility companies conduct inspections to detect NTLs at customer locations selected based on predictions. Rural inspections are expensive due to the long distances that technicians must cover during on-site inspections for NTLs [1]. They are also difficult because technicians must travel to the sites and some consumers are inaccessible due to vegetation and environmental conditions. Thus, investing in the accuracy of prediction is important.
Despite the many studies on detecting NTLs, few address rural energy consumers or irrigated crops. This work proposes a methodology to identify areas of interest related to rural energy consumption for irrigation systems. We define a model that uses AI algorithms and apply it to selected data to detect NTLs. To validate the proposed methodology and data selection, a pilot study area is considered. We also analyze the accuracy of the proposed methodology.
The main contributions of this work are as follows:
- The identification of a dataset from the meteorological data of the study area, historical energy consumption data, and crop information.
- The analysis of the selected data for rural energy consumers served by non-automated distribution grids for irrigation systems.
- The definition of a methodology based on AI algorithms to detect NTLs in irrigated rice crops.
- A validation procedure for the proposed methodology.
2. Methodology
The proposed methodology describes the steps and techniques for detecting NTLs in crop irrigation systems. The tool development is part of a pilot project carried out in partnership between the Federal University of Santa Maria and the power utility company CPFL Energy (Companhia Paulista de Força e Luz). This project aims to detect NTLs in the state of Rio Grande do Sul, southern Brazil. This region concentrates most rice producers using crop irrigation systems.
We divided the proposed methodology into three main phases: (1) the selection of variables based on their relevance to the energy consumption for irrigation systems, (2) the development of an AI model, and (3) the validation and adjustment of the model based on field inspections.
Figure 5 shows the flowchart of the proposed methodology, presenting the study area, the detection system for NTLs, and the result analysis.
This flowchart starts with the analysis region selection by searching the consumer database. Then, the algorithm is executed to select the input variables for the fuzzy logic algorithm. These variables are selected from those related to energy consumption for irrigation systems. The fuzzy logic algorithm identifies suspicious consumer units of NTLs. The system prioritizes the suspicious consumer units, and an inspection list is sent to the field inspection team. Then, the results from field inspections are compared to the inspection list generated by the system.
First, we analyzed the most significant variables that affect the energy consumption in rice irrigation systems. We considered the meteorological variables, the variables related to rice crops, and the data from the energy supplier. We adopted correlation analysis to understand the relationships among these variables and with energy consumption, and we excluded redundant or unrelated variables.
Next, we treated the selected variables using normalization, synchronization, and missing data removal. In the second step, the previously processed variables serve as inputs to the AI algorithms, as described in Figure 6.
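The treatment of the selected variables can be illustrated with a minimal sketch. It assumes min-max scaling for the normalization and the removal of any record with an empty field for the missing data step; the source does not state which normalization was used, and the field names (`month`, `rain_mm`, `energy_kwh`) and values are hypothetical.

```python
def drop_missing(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(values):
    """Scale a list of numbers to the [0, 1] interval (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly records, already synchronized on the same time index.
records = [
    {"month": "2021-11", "rain_mm": 120.0, "energy_kwh": 30000.0},
    {"month": "2021-12", "rain_mm": None, "energy_kwh": 42000.0},
    {"month": "2022-01", "rain_mm": 60.0, "energy_kwh": 50000.0},
]

clean = drop_missing(records)
rain_norm = min_max_normalize([r["rain_mm"] for r in clean])
print(len(clean), rain_norm)  # 2 [1.0, 0.0]
```

Removing incomplete records before scaling keeps a missing value from distorting the minimum and maximum used for normalization.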
A fuzzy logic algorithm receives this information and predicts if additional irrigation is required by the rice crops analyzed. Next, an expert system indicates the consumer units that significantly differ from the expected energy consumption for the current rice crop. These consumer units are identified as suspicious sources of NTLs. A list of suspicious consumer units is generated for field inspection.
Finally, field inspection teams inspect the suspicious consumer units. The data generated by the field inspection will be used to improve the algorithm. This will foresee the expansion of the current pilot project into a permanent one at the power utility company.
2.1. Variable Selection
A study on technological mapping identified the most significant factors that contribute to energy consumption in rice irrigation systems in southern Brazil. The study contains a literature review, interviews with experts, and field inspections of irrigated rice crops.
Data on water usage in rice crops is directly associated with the need for supplemental irrigation. These data are related to the energy consumption by the water pumping system. Meteorological and crop variables directly impact energy consumption. The energy consumption pattern of the consumer unit in previous harvests is also a significant factor. Furthermore, the suspicious consumer units of NTLs are reinforced by the total energy loss rate of the energy supplier.
Four groups of variables are available: meteorological data, crop data, historical energy consumption, and electricity data from the energy supplier.
Meteorological data: the National Institute of Meteorology (INMET) [29] provides sixteen variables from weather stations in Brazil, such as precipitation; wind speed and direction; and average, maximum, and minimum values for temperature, humidity, atmospheric pressure, and dew point. Data from weather stations closest to a given region are interpolated to provide more accurate values.
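The source does not name the interpolation method applied to the weather station data. As one possible illustration, the sketch below uses inverse distance weighting (IDW), which is an assumption and not necessarily the method adopted in this work. The planar coordinates, the `power` parameter, and the station values are hypothetical.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `stations` is a list of ((x, y), value) pairs in planar coordinates.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a station: use its reading directly.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical stations at equal distance from the target point.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw((1.0, 0.0), stations))  # 15.0
```

Closer stations receive larger weights, so the estimate for a crop area is dominated by its nearest stations.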
Figure 7 shows the weather stations located in southern Brazil (dark ellipses).
Crop data: crop area, soil type, and crop type variables are available from satellite images and specific algorithms.
Figure 8 shows part of the satellite image data processing used to recognize the crop area and, subsequently, classify the crop type [30]. January is considered the optimal month for acquiring satellite images to identify rice crops. At the end of the satellite image processing, the crop area, in hectares, is identified and calculated for each segment of the rice crop.
Historical power consumption: historical energy consumption data are available from the power utility company database together with the installed capacity of the consumer unit.
Electricity data: electricity data for rural consumer units can be accessed in the power utility company database, while data on energy losses are based on Reference [31].
This large amount of data led us to select the most significant variables for the proposed methodology. Since some variables are mutually dependent, i.e., they carry redundant data, we can select only one of the correlated variables. Thus, we created a statistical model based on correlation analysis and direct selection.
For this step, we used a dataset comprising ten consumer units (irrigated rice crops) and three rice crop seasons: 2019/20, 2020/21, and 2021/22. We defined a correlation threshold of 0.80.
Figure 9 presents the results for the meteorological variables.
From the sixteen variables available, we selected rainfall (precipitation), wind speed, wind direction, average temperature, average humidity, and minimum humidity. The remaining variables were excluded since they had a high correlation value with at least one of the selected variables.
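The exclusion of redundant variables can be sketched as a simple correlation filter. The example below is a minimal illustration that assumes a greedy pass in which a variable is kept only if its absolute Pearson correlation with every variable already kept stays below the 0.80 threshold. The greedy order, the variable names, and the sample values are hypothetical; the actual selection also involved direct selection by the authors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def select_uncorrelated(series, threshold=0.80):
    """Keep each variable unless it is highly correlated with one already kept."""
    kept = []
    for name in series:
        if all(abs(pearson(series[name], series[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical readings: maximum temperature tracks average temperature exactly.
series = {
    "avg_temperature": [20.0, 22.0, 25.0, 27.0, 30.0],
    "max_temperature": [26.0, 28.0, 31.0, 33.0, 36.0],
    "rainfall": [100.0, 20.0, 80.0, 10.0, 60.0],
}
print(select_uncorrelated(series))  # ['avg_temperature', 'rainfall']
```

In this sketch the order of the variables decides which member of a correlated pair survives, so the preferred variables should be listed first.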
The crop area, soil type, and crop type variables were included in the correlation analysis with the six meteorological variables to test their correlation with energy consumption. Although we stated the correlation value of 0.80 as a standard for variable selection, we also considered variables with lower correlation values.
Table 1 presents the correlation analysis results for the meteorological and crop variables. Rainfall (precipitation), wind speed, average temperature, and crop area were identified as the four variables correlated with energy consumption for supplemental irrigation of rice crops.
Energy loss data from energy suppliers is not directly related to energy consumption in irrigation systems. These data will be used later in the expert system step.
Correlation analysis plays a vital role in improving the performance of the fuzzy logic algorithm. Reducing the number of variables decreases the number of rules and the programming complexity of the algorithm. Fewer inputs also reduce the processing and computational costs during operation.
2.2. Fuzzy Algorithm
The fuzzy logic algorithm predicts the need for the supplemental irrigation of rice crops. The primary irrigation source for these crops comes from natural replenishment through precipitation. Evaporation, plant transpiration, and vertical water percolation through soil lead to water losses that are not naturally replaced. The main irrigation systems in southern Brazil use electric motors.
Irrigation demand is inversely proportional to precipitation but directly proportional to wind speed, average temperature, and crop area. The irrigated crop area variable was defined by analyzing data from satellite images of the region. The parameters of the meteorological variables were determined based on climate normals made available by INMET [32].
Each variable was assigned to three membership functions (low, average, and high) based on the trapezoidal function. The parameters of each membership function were determined by combining statistical analysis and expert opinion.
Figure 10 shows the fuzzification of the rainfall variable (precipitation).
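The fuzzification step can be illustrated with a short sketch of trapezoidal membership functions for the three sets (low, average, and high). The breakpoints below are hypothetical placeholders in mm, not the parameters calibrated in this work from climate normals and expert opinion.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical breakpoints for the rainfall variable (not the calibrated values).
RAINFALL_SETS = {
    "low": (0.0, 0.0, 50.0, 100.0),
    "average": (50.0, 100.0, 150.0, 200.0),
    "high": (150.0, 200.0, 400.0, 400.0),
}


def fuzzify(x, sets):
    """Return the membership degree of x in each fuzzy set."""
    return {name: trapezoid(x, *params) for name, params in sets.items()}


print(fuzzify(75.0, RAINFALL_SETS))
# {'low': 0.5, 'average': 0.5, 'high': 0.0}
```

A reading of 75 mm belongs partly to the low set and partly to the average set, which is what lets the rule base blend the conclusions of neighboring rules.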
The set of rules for the fuzzy logic algorithm is based on expert opinions about the impact of each variable on the need for supplemental irrigation. We consulted experts in meteorology, phytotechnics, and rice crop production. Then, historical data from the last three crop seasons were used to refine the rules.
The output of the fuzzy logic algorithm is defined as the necessary irrigation period within the crop season, measured in hours and proportional to the irrigation needs in that crop season.
Figure 11 illustrates the defuzzification and parameterization of the output variable.
This irrigation period allowed us to predict the energy consumption for the rice crop (E), in kWh, as given by Equation (2), where P is the installed capacity recorded in the power utility company database. The energy consumption index per area, in kWh·ha⁻¹, can be calculated by Equation (3), where A represents the crop area.
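The two calculations can be sketched as follows. The forms used here are assumptions inferred from the units in the text: predicted energy (kWh) as installed capacity multiplied by the irrigation period in hours, with the capacity taken to be in kW, and the index as energy divided by crop area. The function names and the sample values are hypothetical.

```python
def predicted_energy_kwh(installed_capacity_kw, irrigation_hours):
    """Assumed form of Equation (2): E = P x irrigation period, in kWh."""
    return installed_capacity_kw * irrigation_hours


def energy_index_kwh_per_ha(energy_kwh, crop_area_ha):
    """Assumed form of Equation (3): index = E / A, in kWh per hectare."""
    if crop_area_ha <= 0:
        raise ValueError("crop area must be positive")
    return energy_kwh / crop_area_ha


# Hypothetical consumer unit: 75 kW installed, 1200 h of irrigation, 150 ha.
energy = predicted_energy_kwh(75.0, 1200.0)
index = energy_index_kwh_per_ha(energy, 150.0)
print(energy, index)  # 90000.0 600.0
```

Dividing by the crop area makes consumer units of different sizes comparable against a single regional reference.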
This metric is compared to the statistical normal values for the analyzed region to identify consumer units that differ from the standard behavior. The expert system performs this step.
2.3. Expert System Algorithm and NTLs Suspicious Consumer Indication
According to Köpp et al. [12], the average energy consumption index per area for rice irrigation in the pilot study area ranges from 550 to 750 kWh·ha⁻¹ per harvest. This approach aims to identify consumer units whose energy consumption is above or below the average range. We give particular attention to those consumer units whose energy consumption is below the average range, and they are tagged as suspicious consumer units of NTLs. The power utility company is also interested in consumer units whose energy consumption exceeds the average range since they indicate low efficiency in the irrigation system and may receive investments in energy efficiency projects.
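The comparison against the reference range can be sketched as a simple three-way check. The range limits come from the text; the function name and the returned labels are hypothetical and only illustrate the idea.

```python
def classify_index(index_kwh_per_ha, low=550.0, high=750.0):
    """Compare an energy consumption index with the reference range."""
    if index_kwh_per_ha < low:
        return "below range"   # candidate for suspicion of NTLs
    if index_kwh_per_ha > high:
        return "above range"   # candidate for an energy efficiency project
    return "within range"


for value in (420.0, 600.0, 810.0):
    print(value, classify_index(value))
```

In this sketch, values exactly equal to 550 or 750 kWh·ha⁻¹ are treated as within the range, which is an assumption about the boundary cases.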
The expert system considers the results of the energy consumption index per area of each consumer unit and intersects this information with the total energy loss index of the energy supplier in which the consumer unit is located.
The historical energy consumption behavior of the consumer unit is also compared to the nearby consumer units. For instance, in a particular crop, if the average behavior of the consumer group was to increase energy consumption and if the energy consumption varied in the opposite direction, it reinforces the suspicion of NTLs.
The expert system is developed using a fuzzy logic algorithm to incorporate the desired relationships between the three variables into the set of fuzzy rules. These results classify the suspicious consumer units of NTLs into very high suspicion, high suspicion, mean suspicion, low suspicion, or low efficiency.
Table 2 summarizes the standard inserted into the expert system to evaluate suspicious NTLs.
For instance, a consumer unit is classified as high suspicion of NTLs when its energy consumption index per area is below the average range, the total energy losses of the energy supplier are above 25%, and its energy consumption behavior between the last harvests differs by more than 35%.
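The high-suspicion example can be written as a crisp rule to make the three conditions explicit. This is a simplification: the expert system in this work is fuzzy, so the hard thresholds below only approximate one of its rules, and the remaining classes are not covered. The function and parameter names are hypothetical.

```python
def is_high_suspicion(index_kwh_per_ha, supplier_loss_pct,
                      harvest_variation_pct, range_low=550.0):
    """Crisp approximation of the high-suspicion rule described in the text."""
    return (
        index_kwh_per_ha < range_low          # index below the average range
        and supplier_loss_pct > 25.0          # supplier losses above 25%
        and harvest_variation_pct > 35.0      # behavior differs by more than 35%
    )


print(is_high_suspicion(480.0, 28.0, 40.0))  # True
print(is_high_suspicion(480.0, 18.0, 40.0))  # False
```

A fuzzy implementation would replace each comparison with a membership degree, so that a unit just short of one threshold can still receive a partial suspicion score.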
The expert system allowed us to generate a field inspection list, based on power utility company criteria.
2.4. Field Inspection
In the pilot project, we considered a total crop area of 75,800 hectares and about 475 consumer units for rice crop irrigation. Consumers were tested following the proposed methodology to generate an inspection list of the suspicious consumer units associated with NTLs. Of the total number of consumers, 90 were identified as the main suspects in the pilot study area. The inspection field teams inspected these units to verify possible tampering and fraud in the power measurement systems. The next section describes the results of the field inspections.
3. Results and Discussion
The pilot study area comprised 5702.098 km², as shown in Figure 12. This crop area was studied in two periods: the first between 2021 and 2022 and the second between 2022 and 2023. The first crop period (2021/2022) had 248 consumer units, while the second (2022/2023) had 227.
Applying the proposed methodology to all selected consumer units, 90 were flagged as potential sources of NTLs, as shown in Table 3. The suspicious consumer units of NTLs are classified from mean to very high suspicion. The proposed methodology reduced the search space for NTLs by 73.79% in the first rice crop and 88.99% in the second one. In general, the search space for NTLs is reduced by 81.40%, as shown in Figure 13.
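The per-crop reductions follow directly from the number of consumer units and the number flagged as suspicious in each crop period (65 of 248 and 25 of 227, as reported in this section). A short check, with a hypothetical function name:

```python
def search_space_reduction(total_units, suspicious_units):
    """Percentage of consumer units that no longer need to be inspected."""
    return 100.0 * (total_units - suspicious_units) / total_units


first = search_space_reduction(248, 65)    # first rice crop (2021/2022)
second = search_space_reduction(227, 25)   # second rice crop (2022/2023)
print(round(first, 2), round(second, 2))   # 73.79 88.99
```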
Field inspection lists were generated for all suspicious consumer units of NTLs detected. Since the number of unsuspicious consumer units is high, we selected a sample of about 10% of these consumer units to be inspected. Thus, for the first rice crop, 65 suspicious consumer units and 25 unsuspicious consumer units were selected for inspection. For the second rice crop, 25 suspicious consumer units and 20 unsuspicious consumer units were selected.
After field inspections, for the first rice crop, we confirmed 28 consumer units as responsible for NTLs of the total of 65 suspicious consumer units detected. All 25 unsuspicious consumer units sampled were confirmed unsuspicious. For the second crop, we confirmed 12 suspicious consumer units as responsible for NTLs of the total 25 detected, and all 20 unsuspicious consumer units sampled were confirmed.
Considering both crops together, 40 of the 90 suspicious consumer units were confirmed as responsible for NTLs, and all 45 sampled unsuspicious consumer units were confirmed as unsuspicious. This confusion matrix analysis is shown in Table 4.
The confusion matrix analysis can determine the accuracy and error rate of the proposed methodology. The accuracy is the ratio between the correctly classified results (true positives and true negatives) and the total number of observations, as given by Equation (4). The error rate is the ratio between the misclassified results (false positives and false negatives) and the total number of observations, as presented in Equation (5). We also determined the specificity and precision of the model. Specificity is the ratio between the true negatives and the sum of false positives and true negatives, as stated in Equation (6). Precision is the ratio between the true positives and the sum of true positives and false positives, as in Equation (7).
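The accuracy and error rate can be reproduced from the inspection results using the standard confusion matrix definitions, in which accuracy counts the correct classifications and the error rate counts the incorrect ones. The counts below are derived from the reported inspections and are an interpretation of the text: 40 of the 90 suspicious units were confirmed (true positives), the other 50 were not (false positives), and all 45 sampled unsuspicious units were confirmed as regular (true negatives, with no false negatives in the sample).

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct classifications among all inspected units."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """Share of incorrect classifications among all inspected units."""
    return (fp + fn) / (tp + tn + fp + fn)


# Counts derived from the field inspections of both rice crops.
counts = {"tp": 40, "tn": 45, "fp": 50, "fn": 0}
print(f"{accuracy(**counts):.0%}", f"{error_rate(**counts):.0%}")  # 63% 37%
```

The two measures always add up to 1, so the error rate is simply the complement of the accuracy.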
The accuracy of the proposed methodology was 63%, while the error rate was 37%. The accuracy was below that of Dominguez et al. [33], who applied a machine-learning model and reached 74% accuracy. Viegas et al. [34] used a fuzzy-based model and achieved similar accuracy (63.6%). Some studies reported accuracy results above 90%, such as Salman Saeed et al. [35], who used decision trees; Messinis et al. [26], who applied support vector machines; and Saeed et al. [36], who used ensemble bagged tree models. The proposed methodology may seem inferior to these last studies. However, accuracy is not a suitable measure when the dataset is unbalanced, since high values can be misleading. For example, if 90% of the cases in a dataset are negative and the model classifies every case as negative, it reaches an accuracy of 90%. In addition, the aforementioned studies did not focus on rural areas and covered many more consumers, given the customer density in urban grids. In this work, the number of consumer units in the pilot study area is quite small compared to the other studies. For this reason, specificity and precision are used as additional indicators. For both indicators, the proposed methodology reached 100%. Salman Saeed et al. [35] reached 98.2% and 93.2% for specificity and precision, respectively. Saeed et al. [36] reached 98.2% for specificity, and Messinis et al. [26] did not present such information.
The power distribution company currently uses correlation analysis and direct selection to identify the most significant variables in the meteorological and crop data and energy consumption records. Sampling and data collection are entirely manual, and the detection of NTLs has an accuracy of 57%. This work proposed a methodology that automates the retrieval of relevant data from the database, analyzes the areas of interest in full, and achieves higher accuracy. Thus, the fuzzy logic algorithm can bring greater flexibility and speed for application in different areas of interest.
For the pilot study area, the results were promising and indicate that the proposed methodology is effective. This work is restricted to one crop and one area due to the difficulty of accessing commercial data from the power utility company. Even so, we could evaluate the effectiveness of the proposed methodology.
We can observe that the error rate is linked only to false positives, i.e., consumer units flagged as suspicious of NTLs that proved to be energy-inefficient consumer units. This is due to the selection criterion for the suspicious consumer units of NTLs since this classification encompasses both the inefficiency of the consumer unit and the presence of NTLs.
By changing the criteria to consider only high and very high suspicious consumer units, there is a high chance that consumer units with NTLs would not be included in the list, which would be disastrous for a system seeking this fact. However, even considering a more adjusted suspicion criterion, we can guarantee that unsuspicious consumer units of NTLs do not have this characteristic. Thus, this criterion can be understood as the most interesting one. Some adjustments should be made to the modeling to include other factors that are not currently accessible, such as a long-term history for each consumer unit.
The results from field inspections classified the inconsistencies as follows: 3.5% as irregular with billing impact and 10.6% as irregular without billing impact. The first category includes fraud, equipment problems, and process errors. In practice, the field inspection teams should register the fraud detected as equipment failures and recovery as part of the loss since the legislation allows for the concessionaire to review the values of the last three months from inspection registration.
4. Conclusions
Power utility companies have faced the challenge of identifying NTLs in rural areas. Effectively, field inspections are difficult in the rural distribution grids in the state of Rio Grande do Sul, in southern Brazil, due to their characteristics.
In the pilot study area, the proposed methodology was effective at reducing the search space for NTLs in consumer units. This work established a multi-parametric and multi-criteria model that allowed us to classify consumer units based on the suspicion of NTLs.
In this pilot study, we selected rice crop irrigation to analyze energy consumption. Our proposed methodology achieved promising results and could be robust enough to establish standards for other crops, such as sugarcane and coffee, although its accuracy for those crops remains to be verified.
This work presents a comprehensive methodology to establish the criteria for suspicious consumer units of NTLs in irrigation systems. The contribution is significant since on-site inspection is costly due to large areas usually located far from urban centers. Reducing the search space for these consumer units helps to reduce inspection costs and increases the effectiveness of actions to prevent NTLs in irrigation systems.
Reducing commercial losses enables power utility companies to improve load forecasts, providing compensation during more critical moments. As a result, the quality of the energy delivered to all consumers served by the rural grids improves.