1. Introduction
1.1. Contextual Background
Electricity theft poses a significant global challenge, leading to substantial economic losses for power utilities and introducing safety risks, such as equipment damage and outages. According to a 2017 report by Northeast Group LLC [1], non-technical losses (including theft, fraud, and billing errors) amount to approximately USD 96 billion annually worldwide. In China, electricity theft has severely impacted the State Grid Corporation, prompting USD 84 billion in grid investments in 2024 to curb losses [2]. Despite substantial investments in grid modernization, theft remains prevalent, particularly in developing economies where electricity access is limited and regulatory enforcement is weak.
The Democratic Republic of Congo (DRC) exemplifies this paradox. Although the country possesses 100,000 MW of hydropower potential, only 2.5% is harnessed, and just 21.5% of the population had electricity access as of 2022 [3]. Rapid urban expansion in Kinshasa (13 million residents) and inflation (23.8% in 2023) [4] have weakened household purchasing power, making utility payments unaffordable [5]. The inability to collect revenue effectively restricts grid expansion, worsening service reliability and theft risks. Additionally, operational inefficiencies (including limited personnel, outdated infrastructure, and unreliable billing practices) create opportunities for electricity theft. Socioeconomic disparities, inconsistent billing practices, unreliable supply, and general awareness of tampering methods contribute to a complex environment where electricity theft becomes a coping mechanism for many [6].
1.2. Problem Statement
Despite extensive technical advancements in electricity distribution, theft remains an unresolved issue, particularly in developing economies. While previous studies have explored real-time fraud monitoring, smart metering, and AI-driven detection [7,8], they often overlook the role of socioeconomic factors and behavioral dynamics. In Kinshasa, the persistence of electricity theft stems from a complex interplay of economic hardships, regulatory weaknesses, and consumer perceptions [9]. Understanding these behavioral and infrastructural determinants is critical to formulating holistic solutions beyond mere technological interventions.
1.3. Aim and Objectives
1.3.1. Aim
This work aims to analyze the primary technical, socioeconomic, and behavioral factors influencing electricity theft in Kinshasa using logistic regression.
1.3.2. Objectives
Specifically, this work seeks to:
Estimate the likelihood of electricity theft in Kinshasa using logistic regression;
Determine key predictors, including billing methods, meter types, and socioeconomic indicators;
Assess the impact of behavioral factors, such as awareness of tampering, on theft probability;
Formulate strategic recommendations to mitigate electricity theft through infrastructure upgrades, policy interventions, and public awareness initiatives.
1.4. Work Structure
The remainder of this paper is organized as follows:
Section 2: Literature Review—a critical evaluation of existing theft detection methods.
Section 3: Methodology—survey design, variable selection, logistic regression modeling.
Section 4: Results—model findings and predictive validity assessment.
Section 5: Discussion—interpretation, limitations, and future directions.
Section 6: Conclusion—key results and policy implications.
2. Literature Review
Electricity theft remains a persistent challenge that requires robust detection and mitigation strategies. Research efforts have grouped existing approaches into four categories: hardware-based methods, non-hardware-based methods, game-theoretic models, and behavioral analysis. While technological solutions focus on real-time monitoring and anomaly detection, emerging studies highlight the role of socioeconomic determinants in shaping theft patterns. This section critically evaluates existing methodologies and identifies gaps relevant to Kinshasa’s electricity theft landscape.
2.1. Hardware-Based Methods
Hardware-based approaches deploy physical devices such as smart meters, tamper-resistant seals, and sensor-driven systems to deter unauthorized electricity usage. Zulu and Dzobo introduced a real-time theft monitoring framework using double-connected data capture, improving anomaly detection rates [5]. Additionally, Advanced Metering Infrastructure (AMI) enables utilities to track consumption patterns remotely and generate alerts for suspicious activities [10].
Despite their effectiveness, hardware-based approaches face deployment challenges, particularly in regions with outdated grids or limited funding. High installation costs hinder large-scale adoption, leaving utilities reliant on manual inspections and penalty enforcement, as noted by Bhakta et al. [11]. This underscores the need for hybrid approaches that combine hardware with predictive analytics.
2.2. Non-Hardware Methods
Data-driven approaches leverage network analysis, machine learning, and artificial intelligence (AI) to detect irregular consumption patterns. These techniques offer scalable alternatives to hardware solutions, utilizing grid data and meter readings for theft identification.
2.2.1. Network Data Analysis
Network analysis employs load flow assessments, state estimation, and anomaly detection algorithms to expose theft. Dehghanpour et al. [12] and Zhai et al. [13] demonstrated the utility of state estimation models, while clustering algorithms, as applied in [14,15], group consumption data to expose atypical usage patterns indicative of fraud. Additionally, Sen and Yang [16] utilize three-phase power data with neural network models to enhance theft detection capabilities. Although these methods successfully identify network-wide anomalies, pinpointing individual theft incidents remains difficult.
2.2.2. Consumer Meter Data with Artificial Intelligence
Artificial intelligence (AI) has become essential in consumer-level electricity theft detection, leveraging machine learning to analyze meter readings and identify anomalies.
Supervised learning models, including Support Vector Machines (SVMs), decision trees, and neural networks, have proven effective in classifying fraudulent users [8,17,18]. Hu et al. [19] highlight Random Forest algorithms, known for their ability to handle imbalanced datasets. Regression techniques [20] have also been applied to detect faulty smart meters and theft through consumption anomalies.
Despite their effectiveness, supervised models face data labeling constraints, class imbalance issues, and overfitting risks. To mitigate these challenges, unsupervised learning techniques, such as clustering, have gained traction. Sasmoko et al. [21] utilize anomaly detection models, Yang et al. [22] introduce ant colony clustering algorithms, and Žarković and Dobrić [23] leverage K-Means clustering, while Qi et al. [24] integrate wavelet-based feature extraction with fuzzy c-means (FCM) clustering. However, threshold definition and overlapping clusters remain obstacles.
Ensemble learning approaches have emerged as powerful alternatives. Kawoosa and Prashar [25] employ XGBoost, enhancing classification accuracy, while Qi et al. [24] integrate Minimal Gated Memory (MGM) networks and Adaptive Synthetic Sampling (ADASYN) to address class imbalance issues. Hybrid models [26,27] combining Recurrent Neural Networks (RNNs) and Bidirectional Long Short-Term Memory (Bi-LSTM) further enhance theft detection efficiency.
However, AI models face limitations such as overfitting risks, data labeling constraints, and high computational requirements [25]. A promising alternative is logistic regression modeling, which simplifies predictive frameworks and ensures practical usability in policy applications.
2.2.3. Hybrid Methods
Hybrid methods combine network-level data with consumer meter data to enhance theft detection accuracy. Wang et al. [28] demonstrate that clustering algorithms help detect anomalies in network data, while AI models refine individual consumption analysis, improving detection precision [29]. A key approach employs heuristic segmentation, transforming time-series consumption patterns and grid loss metrics to extract theft indicators. Consumers showing a strong correlation with line losses and consistent waveform anomalies are flagged as suspicious.
Despite their effectiveness, hybrid methods require advanced data infrastructure and high computational power. Nonetheless, they are integral to modern smart grids, enabling real-time, data-driven monitoring for theft mitigation.
2.3. Game Theory
Game theory evaluates consumer–utility interactions, treating electricity theft as a strategic decision-making process.
Wei et al. [30] integrate Benford’s Law with a Stackelberg model, positioning the utility company as the leader, while consumers react by optimizing theft strategies. Likelihood Ratio Tests (LRTs) further refine detection. In contrast to the studies reviewed in [31], Ref. [32] explored Bayesian intrusion detection frameworks, enhancing utility intervention tactics.
Despite their theoretical appeal, game-theoretic models rely on assumptions of rational consumer behavior, which limits their real-world applicability given data constraints and scalability issues [31]. Thus, behavioral analysis remains critical in understanding theft motivations beyond economic utility optimization.
2.4. Behavioral Analysis
Behavioral analysis examines the psychological and socioeconomic motivations behind electricity theft, identifying patterns in consumer decision-making.
Surveys and interviews have uncovered key theft drivers, such as financial hardship, lack of awareness, and dissatisfaction with utility services [33]. Socioeconomic models [34] integrating income levels, demographics, and billing history provide data-driven fraud detection. Studies [9,35] emphasize the need for enhanced billing systems, stricter enforcement, and public awareness campaigns to mitigate theft risks. Razavi and Fleury [9] highlight income inequality and inadequate infrastructure as key contributors to non-technical losses.
Increasingly, behavioral insights are combined with AI-based detection models, as demonstrated by Hussain [36] and Bansal et al. [37], improving predictive accuracy.
2.5. Research Gaps and Contribution
The existing literature predominantly focuses on hardware detection, AI-based monitoring, and economic modeling. However, limited research incorporates behavioral predictors within statistical fraud detection frameworks.
Table 1 provides a summary of currently available literature. This study bridges the gap by integrating billing stress, meter type, tampering awareness, financial hardship, and regulatory influences into logistic regression modeling, offering a comprehensive predictive framework for electricity theft detection in Kinshasa.
3. Methodology
This study employs a rigorous statistical approach to analyze electricity theft determinants in Kinshasa, leveraging logistic regression for predictive modeling. The research design integrates random sampling, bootstrapping techniques, and regularization-based feature selection to ensure robust, reliable insights into theft patterns.
3.1. Data Collection and Preprocessing
3.1.1. Survey Design
A structured survey was developed to assess technical, behavioral, and socioeconomic influences on electricity theft. The key predictor variables and their modalities are detailed in Table 2.
To ensure validity and reliability, domain relevance guided variable selection, grounded in existing electricity theft research [9,39]. Exploratory analyses assessed initial correlations between theft likelihood and independent variables.
3.1.2. Population and Sample Size Determination
A statistically valid sample size is essential for meaningful conclusions in survey-based studies [37]. Using the Krejcie and Morgan formula, an initial sample size of 385 participants was determined for Kinshasa’s population (13 million residents). A power analysis further validated its adequacy in capturing theft-related socioeconomic patterns:
where
N = population size;
Z = 1.96 (95% confidence level);
p = 0.5 (the population proportion assuming maximum variance);
e = 0.05 (margin of error or the degree of accuracy expressed as a proportion).
As Kinshasa’s urban population represents 40% of the DRC’s total population, Equation (1) simplifies under large-population assumptions:
resulting in a final sample size of 385 respondents.
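The equations referred to in this subsection are not reproduced in the text. As a minimal sketch, assuming they take the standard Krejcie and Morgan finite-population form and its large-population limit, written with the symbols N, Z, p, and e defined above, the reported sample size can be checked as follows (the function names are illustrative):

```python
import math

def sample_size_finite(N, z=1.96, p=0.5, e=0.05):
    """Assumed finite-population form:
    n = Z^2 * N * p(1-p) / (e^2 * (N-1) + Z^2 * p(1-p))."""
    numerator = z ** 2 * N * p * (1 - p)
    denominator = e ** 2 * (N - 1) + z ** 2 * p * (1 - p)
    return numerator / denominator

def sample_size_large(z=1.96, p=0.5, e=0.05):
    """Large-population limit (N -> infinity): n = Z^2 * p(1-p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2

# Kinshasa population as stated in the text (13 million residents).
n_finite = math.ceil(sample_size_finite(13_000_000))
n_large = math.ceil(sample_size_large())
print(n_finite, n_large)
```

Both forms round up to the same value because, for a population in the millions, the finite-population correction is negligible.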
To prevent demographic biases, random sampling ensured equal probability selection, complemented by bootstrapping techniques for enhanced model stability and reliability.
These methodological choices enable a holistic understanding of electricity theft drivers in Kinshasa, independent of the subgroup segmentation.
3.1.3. Preprocessing
Following data collection, the steps taken included the following:
Integrity checks confirmed completeness, eliminating the need for imputation.
Quantitative variables were standardized, ensuring uniform scales for regression modeling.
Categorical variables underwent dummy coding (binary) and ordinal transformations (ranked perceptions) for suitability in logistic regression.
Multicollinearity detection via the Variance Inflation Factor (VIF) safeguarded model stability, ensuring predictor independence.
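The standardization and VIF steps listed above can be sketched as follows. This is an illustrative, standard-library-only example on made-up data, not the authors' code; it uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. The column values and helper names are assumptions.

```python
import math

def standardize(col):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def correlation_matrix(cols):
    """Pearson correlation matrix of a list of equal-length columns."""
    z = [standardize(c) for c in cols]
    n = len(cols[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def vif(cols):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
x3 = [5, 1, 4, 2, 6, 3, 8, 7]
vifs = vif([x1, x2, x3])
print([round(v, 2) for v in vifs])
```

In this toy data the first two columns receive very large VIF values, which is the pattern a threshold such as VIF > 5 is meant to flag.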
3.2. Logistic Regression
3.2.1. Model Specification
A binary logistic regression model was formulated to predict the likelihood of electricity theft (Y = Yes or No) [40], with the logit function chosen as the link function, ensuring a proper classification framework:
where
Y is the binary dependent variable (theft occurrence);
Xn (n = 1 to 19) represents independent predictors (socioeconomic, technical, and behavioral variables);
βn (n = 1 to 19) represents the regression coefficients (interpreted using odds ratios).
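The model equation itself is not reproduced in this text. A standard form consistent with the definitions above would be the following, where the intercept \beta_0 is an assumption, since it is not listed among the defined symbols:

```latex
\operatorname{logit}\bigl(P(Y=1)\bigr)
  = \ln\frac{P(Y=1)}{1-P(Y=1)}
  = \beta_0 + \sum_{n=1}^{19} \beta_n X_n ,
\qquad
P(Y=1) = \frac{1}{1+\exp\!\Bigl(-\bigl(\beta_0 + \sum_{n=1}^{19}\beta_n X_n\bigr)\Bigr)} .
```

Under this form, \exp(\beta_n) is the odds ratio associated with a one-unit increase in X_n, holding the other predictors fixed.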
3.2.2. Variable Selection and Optimization
To prevent overfitting and enhance interpretability, the following analyses were performed:
Multicollinearity diagnostics used the Variance Inflation Factor (VIF) to exclude highly correlated predictors.
Feature importance analysis ranked the most influential variables.
Lasso regression with five-fold cross-validation (cv = 5) tuned the regularization parameter.
3.2.3. Parameter Estimation
The Newton–Raphson algorithm iteratively updated the model coefficients, converging when the likelihood change dropped below 0.001, ensuring accurate parameter estimation.
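To make the estimation step concrete, the sketch below implements Newton–Raphson updates for a logistic model with an intercept and a single predictor, stopping when the log-likelihood changes by less than 0.001, as described above. The data are made up and the restriction to one predictor is an assumption made for brevity; the study's model has 19 predictors.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_newton_raphson(xs, ys, tol=1e-3, max_iter=50):
    """Fit logit(P) = b0 + b1 * x by Newton-Raphson.

    Stops when the change in log-likelihood falls below `tol`.
    """
    b0, b1 = 0.0, 0.0
    ll_old = log_likelihood(b0, b1, xs, ys)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (explicit 2x2 inverse).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
        ll_new = log_likelihood(b0, b1, xs, ys)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return b0, b1

# Hypothetical, non-separable data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_newton_raphson(xs, ys)
print(round(b0, 3), round(b1, 3))
```

With more predictors the same update applies, with the 2x2 inverse replaced by a general linear solve.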
3.2.4. Model Evaluation
The predictive performance of the model was evaluated primarily using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Additional metrics, including McFadden’s R², Cox and Snell R², and Nagelkerke R², were computed to measure the goodness of fit. Furthermore, classification accuracy, sensitivity, and specificity were derived from the confusion matrix to quantify the overall performance and diagnostic effectiveness.
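The metrics named above can be computed from observed labels and predicted probabilities as in the following sketch. The data are hypothetical and the function names are illustrative; the pseudo-R² values use the usual definitions based on the fitted and intercept-only log-likelihoods.

```python
import math

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, scores, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at the given cutoff."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

def pseudo_r2(y_true, scores):
    """Return (McFadden, Cox and Snell, Nagelkerke) pseudo-R² values."""
    n = len(y_true)
    base = sum(y_true) / n
    ll_model = sum(y * math.log(s) + (1 - y) * math.log(1 - s)
                   for y, s in zip(y_true, scores))
    ll_null = n * (base * math.log(base) + (1 - base) * math.log(1 - base))
    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

auc_value = auc(y_true, probs)
accuracy, sensitivity, specificity = confusion_metrics(y_true, probs)
mcfadden, cox_snell, nagelkerke = pseudo_r2(y_true, probs)
print(auc_value, accuracy, sensitivity, specificity)
```

Nagelkerke's statistic rescales the Cox and Snell value by its maximum attainable value, which is why it is never smaller than Cox and Snell on the same model.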
3.2.5. Significance Testing and Validation
Predictors underwent Wald tests and Type II likelihood ratio tests to establish significance, with 95% confidence intervals calculated for odds ratio interpretations.
4. Results
4.1. Sample Adequacy and Validation
4.1.1. Sample Selection and Bootstrap Validation
The study employed a random bootstrapping approach to assess model stability across different samples. The AUC distribution over 100 bootstrap iterations yielded a mean AUC of 94.92%, with a standard deviation of 0.8%, indicating strong discriminative performance. Additionally, 94% of samples fell within the upper and lower confidence bounds, confirming consistent classification accuracy across different sampling conditions (Figure 1).
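A bootstrap distribution of the AUC can be obtained along the following lines. This sketch resamples (label, score) pairs from synthetic data with replacement and recomputes the AUC each time; whether the study refitted the model on each resample is not stated, so the resampling scheme, the seeds, and the data here are all assumptions.

```python
import random
import statistics

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, scores, n_boot=100, seed=0):
    """Return the AUC of n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for the AUC to be defined
            aucs.append(auc(ys, [scores[i] for i in idx]))
    return aucs

# Synthetic data: each label is drawn with probability equal to its score.
rng = random.Random(42)
scores = [rng.random() for _ in range(200)]
labels = [1 if rng.random() < s else 0 for s in scores]

aucs = bootstrap_auc(labels, scores)
mean_auc = statistics.mean(aucs)
sd_auc = statistics.stdev(aucs)
print(f"mean AUC = {mean_auc:.3f}, SD = {sd_auc:.3f}")
```

The mean and standard deviation of `aucs` correspond to the summary statistics reported above; percentile bounds can be read off the sorted list.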
4.1.2. Post-Hoc Power Analysis
Sample size: n = 385.
Observed effect size: the logistic regression model explained 61.4% of the variability in electricity theft likelihood (Nagelkerke R² = 0.614), and the predictors demonstrated strong influence (Cohen’s f² = 1.588), confirming the robustness of the model.
Significance level: α = 0.05. This is the probability of making a Type I error (rejecting a true null hypothesis), i.e., a 5% chance of falsely detecting an effect when none exists.
Computed power: 1 − β = 1.0000. This indicates an extremely high probability of detecting a true effect if one exists. Given this high power, the risk of a Type II error (false negative) is negligible, supporting the reliability of the model results.
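For reference, Cohen's f² is conventionally derived from a coefficient of determination as shown below. Applying this to the rounded Nagelkerke value is an assumption about how the reported effect size was obtained; the small difference from the reported 1.588 is consistent with the use of an unrounded R².

```latex
f^2 = \frac{R^2}{1 - R^2}
    = \frac{0.614}{1 - 0.614}
    \approx 1.59
```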
4.2. Descriptive Analysis
4.2.1. Quantitative Variables
The key variables exhibited moderate variations:
Household size (Mean = 4.699, Std Dev = 2.174).
Number of appliances (Mean = 3.418, Std Dev = 2.311).
Electricity supply quality (Mean = 5.036, Std Dev = 2.354).
The low standard deviation for backup energy sources (X5, Std Dev = 0.353) suggests consistent reliance on alternative power solutions.
Table 3 provides descriptive statistics for this analysis.
4.2.2. Qualitative Variables
The categorical variables provided the following key behavioral insights:
A total of 41.04% of households reported electricity theft occurrences, highlighting the prevalence of the issue.
A total of 70.91% of respondents were tenants, suggesting potential links between rental status and theft likelihood.
Prepaid electricity usage (57.66%) was more common than postpaid, possibly affecting theft patterns.
A total of 54.03% had tampering awareness, indicating a widespread understanding of meter manipulation techniques.
Table 4 provides the descriptive analysis of the qualitative variables.
4.3. Multicollinearity and Feature Selection
The variables X7 and X19 exhibit high VIF values (>5), potentially indicating strong collinearity, whereas X5 has the lowest VIF (1.18), suggesting minimal correlation with other predictors. The multicollinearity statistics are provided in Table 5.
Lasso regression optimized feature selection, with variables X8 (financial stress) and X9 (tampering awareness) showing the strongest positive coefficients. The features selected by Lasso regression for electricity theft detection are given in Table 6.
4.4. Regression Analysis of Variable Y
4.4.1. Model Specification
Table 7 summarizes the fitted model, highlighting its effectiveness and providing key insights into its explanatory power.
4.4.2. Test of Null Hypothesis
The p-values (<0.0001) indicate strong statistical significance, suggesting that the model is highly informative for explaining variations in the dependent variable. The model evaluation is summarized in Table 8.
4.4.3. Type II Analysis (Variable Y)
Table 9 presents the Type II analysis, detailing statistical measures for the predictors associated with variable Y. It includes the following:
Degrees of Freedom (DDL): These determine the number of independent comparisons per variable.
Wald Chi-Square (Khi2 Wald) and Pr > Wald: These assess the individual significance of predictors in the model. Smaller p-values (Pr > Wald) suggest stronger statistical significance.
Likelihood Ratio Chi-Square (Khi2 LR) and Pr > LR: These evaluate the contribution each variable makes to the model when it is included.
4.4.4. Hosmer–Lemeshow Test (Variable Y)
The Hosmer–Lemeshow test evaluates the goodness of fit of a logistic regression model by comparing observed and predicted probabilities across different risk groups.
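As a rough illustration of how this test works, the sketch below groups observations into ten bins of increasing predicted risk, compares observed and expected event counts in each bin, and refers the resulting statistic to a chi-square distribution with (groups − 2) degrees of freedom. The data are synthetic, and the closed-form tail probability used here is valid only for an even number of degrees of freedom, which holds for ten groups.

```python
import math
import random

def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow statistic over equal-sized groups sorted by predicted risk."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Synthetic, well-calibrated predictions: each label is drawn from its own probability.
rng = random.Random(1)
probs = [rng.uniform(0.05, 0.95) for _ in range(400)]
labels = [1 if rng.random() < p else 0 for p in probs]

hl_stat = hosmer_lemeshow(labels, probs)
p_value = chi2_sf_even_df(hl_stat, df=10 - 2)
print(round(hl_stat, 3), round(p_value, 3))
```

A large p-value means the observed event counts do not differ detectably from those the model predicts, which is read as no evidence of poor fit.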
4.4.5. Model Parameters (Y)
Table 11 presents the results of the logistic regression analysis for variable Y, detailing the following statistical indicators for the model predictors:
Values and Standard Errors: These estimate each variable’s effect size with its respective uncertainty.
Wald Chi-Square and Pr > Chi2: These evaluate statistical significance, where low p-values (<0.05) indicate strong predictor influence.
Confidence Intervals (95%): The lower and upper bounds reflect estimation precision.
Odds Ratio (OR): The OR is the multiplicative change in the odds of theft associated with each predictor; values above 1 indicate higher odds and values below 1 indicate lower odds.
The standardized coefficients are given in Table 12.
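The link between a coefficient and its odds ratio is OR = exp(β), with a Wald-type 95% confidence interval of exp(β ± 1.96 × SE). The sketch below applies this to the three coefficients quoted in Section 5.3; the standard error passed in the last call is a made-up value used only to show the interval calculation.

```python
import math

def odds_ratio(beta, se=None, z=1.96):
    """Return (OR, CI); CI is a Wald interval, or None when no standard error is given."""
    or_value = math.exp(beta)
    if se is None:
        return or_value, None
    return or_value, (math.exp(beta - z * se), math.exp(beta + z * se))

# Coefficients quoted in Section 5.3 for X2, X4, and X8.
coefficients = {"X2": -0.191, "X4": -0.391, "X8": -1.781}
ratios = {name: odds_ratio(beta)[0] for name, beta in coefficients.items()}
for name, value in ratios.items():
    print(name, round(value, 3))

# Hypothetical standard error, for illustration of the interval only.
or_x4, ci_x4 = odds_ratio(-0.391, se=0.10)
```

The computed ratios agree with the odds ratios reported alongside those coefficients (0.826, 0.676, and 0.168).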
4.5. Predictions and Residuals (Variable Y)
4.5.1. Prediction Accuracy
The model’s accuracy on the training, test, and validation sets was as follows:
Train Set: accuracy = 87.8%, F1-score = 0.88.
Test Set: accuracy = 77.8%, F1-score = 0.78.
Validation Set: accuracy = 83.9%, F1-score = 0.83.
These results indicate reasonably consistent predictive reliability, although the 10-point drop from training to test accuracy may indicate some overfitting. The model performance metrics for class 1, class 0, and both classes combined are given in Table 13, Table 14, and Table 15, respectively.
4.5.2. Residual Analysis
The mean deviance residuals shown in Figure 2 remained within acceptable thresholds, confirming model robustness. The residual metrics are provided in Table 16.
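For readers unfamiliar with deviance residuals, each one is the signed square root of an observation's contribution to the model deviance. A minimal sketch on hypothetical observations and predicted probabilities:

```python
import math

def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)

# Hypothetical outcomes and predicted theft probabilities.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.4, 0.7]

residuals = [deviance_residual(y, p) for y, p in zip(observed, predicted)]
deviance = sum(r * r for r in residuals)
print([round(r, 3) for r in residuals], round(deviance, 3))
```

Residuals are positive when theft was observed and negative otherwise, and their magnitude grows as the predicted probability moves away from the observed outcome.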
5. Discussion
5.1. Model Fit and Performance
The logistic regression model exhibited strong predictive capability, achieving an AUC of 96.51%, which indicates a high ability to discriminate between theft and non-theft cases. The Hosmer–Lemeshow test (p = 0.471) indicated a good model fit, with no statistically significant deviations between the observed and predicted probabilities.
To assess model stability, bootstrap resampling was applied across 100 resampled datasets, yielding an average AUC of 94.92%, with a standard deviation of 0.8%, reinforcing consistent classification performance. Additionally, 94% of the bootstrap samples fell within upper and lower confidence bounds, supporting the model’s robustness across varying conditions.
Further strengthening these results, a post-hoc power analysis validated the sample size (n = 385), confirming high statistical power (1 − β = 1.0000) and ensuring a negligible risk of Type II errors, meaning that true effects were accurately detected.
5.2. Addressing Potential Multicollinearity
Multicollinearity assessments identified high Variance Inflation Factor (VIF) values for X7 (meter experience, VIF = 7.47) and X19 (theft penalty opinion, VIF = 7.89), indicating strong predictor redundancy.
To refine feature selection, Lasso regression (cv = 5) was applied, ensuring optimal dimensionality reduction while preserving key explanatory variables. The strongest contributors to electricity theft prediction—X8 (financial stress, Lasso coefficient = 0.130) and X9 (tampering awareness, coefficient = 0.111)—were retained, reinforcing their significant impact on theft likelihood.
5.3. Significant Predictors and Their Implications
The key predictors with a significant influence on Y are discussed below.
5.3.1. Number of Appliances (X2) (p = 0.048)
The negative coefficient (−0.191) and odds ratio (0.826) suggest that households with more appliances are less likely to engage in electricity theft. This indicates that higher appliance usage might correlate with greater energy demand, leading to formalized billing relationships with energy providers rather than illicit connections. Olatunde et al. [41] found that energy-efficient appliances lower household electricity demand, reducing the incentive for illegal connections. Research from EBSCO Research Starters highlights that higher appliance usage does not necessarily lead to theft but rather encourages formal billing relationships [42]. A systematic review in Environment, Development and Sustainability suggests that subsidizing energy-efficient appliances can help low-income households transition to legal electricity use [43]. Governments can mitigate theft by subsidizing appliances, deploying smart meters, and educating consumers on efficient energy use.
5.3.2. Electricity Supply Quality (X4) (p < 0.001)
The strong negative coefficient (−0.391) and odds ratio (0.676) suggest that a better electricity supply reduces the likelihood of theft. Households that receive consistent and reliable electricity are less likely to engage in theft compared to those experiencing frequent blackouts and poor service quality. This highlights the need for investment in smart grids, improved distribution infrastructure, and proactive maintenance strategies. In Brazil, India, and South Africa, smart grid investments have significantly reduced electricity theft. A study by Northeast Group, LLC found that emerging markets deploying smart meters and grid automation could cut theft-related losses by billions [44].
5.3.3. Alternatives Used During Outages (X5)
The presence of backup power sources correlated with increased theft risk (p < 0.0001, OR = 7.308). This suggests that frequent outages drive consumers toward illegal connections. It highlights how greater awareness and access to legal alternatives can play a role in reducing unauthorized electricity connections. Governments can reduce reliance on stolen electricity by subsidizing renewable energy alternatives through tax credits, low-interest loans, and prepaid energy credits. Investing in solar-powered microgrids and smart meters enhances legal access while discouraging theft. Policies like net metering allow households to sell excess solar power, reducing financial strain. Public awareness campaigns and community-based renewable projects further promote legal electricity use [10].
5.3.4. Receipt of Electricity Bills (X6) (p = 0.072, OR = 0.329)
Households that receive bills regularly appear less likely to resort to theft, although this effect is only marginally significant and does not reach the 5% level (p = 0.072). This emphasizes the importance of transparent billing systems and improved meter installations. Utilities can implement digital invoicing for transparency and efficiency. Digital invoicing eliminates manual errors, ensures accurate, real-time billing for electricity consumption, and reduces the risk of billing manipulation and unauthorized adjustments.
5.3.5. Stress About Paying Bills (X8) (p < 0.001)
The large coefficient (−1.781) and odds ratio (0.168) indicate a strong association between financial stress and theft. Because the Lasso coefficient for X8 is positive, the negative sign here appears to reflect the coding of the reference category rather than a protective effect: households experiencing financial distress are significantly more likely to engage in theft. High unemployment, income disparity, and energy costs force households into illegal connections. Possible solutions include flexible payment plans (e.g., prepaid meters or pay-as-you-go electricity models) to ease financial strain, energy subsidies for low-income households, community solar projects to provide affordable, legal electricity, and public awareness campaigns to promote legal energy use.
5.3.6. Awareness of Tampering (X9)
The strong statistical significance of awareness suggests that communities knowledgeable about tampering techniques are at higher risk of theft. In areas with weak enforcement, electricity theft becomes socially accepted, often justified by economic hardship and lack of consequences. Community-wide participation and corrupt oversight further reinforce this behavior. Strengthening education and enforcement can reduce theft and improve electricity access. Examples include public awareness campaigns highlighting the impact of theft on grid stability, stricter penalties and enforcement to deter illegal connections, tamper-proof smart meters to prevent unauthorized access, and subsidized energy programs offering affordable legal alternatives.
5.4. Lesser-Impact Predictors
While the logistic regression model identified several strong predictors, some variables demonstrated lower statistical significance but may still offer contextual insights into electricity theft behavior.
5.4.1. Perception of Electricity Costs (X15)
Though X15 (perceived cost of electricity) influenced theft likelihood, its significance varied across subcategories:
High perceived costs (p = 0.087, OR = 0.205) showed a negative association that did not reach significance at the 5% level, suggesting consumers who view electricity as expensive may be more cautious about illegal connections.
Low- and medium-cost perceptions (p = 0.003 and p = 0.025, OR = 0.080 and 0.170, respectively) exhibited stronger negative associations, reinforcing the link between affordability and theft avoidance.
While not the strongest predictor, X15 provides useful insights for pricing strategies that could lower theft risks through tiered rates or affordability adjustments.
5.4.2. Difficulty Paying Bills (X18)
While financial hardship exhibited a significant impact through X8 (stress about bills), individual categories within X18 (difficulty paying bills) showed weaker predictive power:
“Never” (p = 0.027, OR = 0.177) was associated with significantly lower odds of theft.
“Always” (p = 0.504, OR = 2.721) had a high odds ratio but lacked statistical significance, likely due to the small sample representation.
This suggests that general financial stress (X8) is a more robust predictor than X18’s detailed payment difficulty categories, reinforcing the need for broad economic interventions rather than case-specific mitigation strategies.
5.4.3. Opinion on Theft Penalties (X19)
X19 exhibited high multicollinearity (VIF = 7.89), leading to weaker independent significance (p = 0.435, OR = 0.488). While opinions on penalties may shape theft behavior, other factors—such as awareness of tampering (X9) and billing transparency (X6)—appear more influential, indicating regulatory enforcement alone may not deter theft effectively.
5.4.4. Technical Ability to Manipulate Meters (X16)
Despite being conceptually relevant, X16 did not exhibit a strong statistical impact (p-values between 0.595 and 0.998 across skill levels). This may be due to the following:
Limited direct self-assessment accuracy, where respondents may underreport technical skills;
High dependency on awareness (X9), suggesting that knowing about tampering techniques (X9) is more impactful than the ability to execute them (X16).
5.5. Testing Interaction Effects Between Predictors
While individual predictors significantly influence electricity theft likelihood, interactions between them may reveal hidden dependencies that provide deeper insights into consumer behavior. Assessing interaction effects helps determine whether combined variables amplify or diminish theft probability beyond their individual contributions.
5.5.1. Key Interaction Terms Considered
Based on feature importance rankings from Lasso regression and Type II analysis, the following interactions were tested for statistical significance:
Financial stress (X8) × perception of electricity costs (X15): Economic hardship combined with high price perception may increase theft risk as affordability concerns worsen.
Tampering awareness (X9) × billing transparency (X6): Consumers aware of tampering techniques may exhibit lower theft rates if billing is transparent, suggesting that trust in the utility system counteracts fraud incentives.
Homeownership status (X10) × payment type (X11): Tenants using postpaid billing may show higher theft rates, as temporary residence reduces accountability for long-term billing obligations.
5.5.2. Findings from Interaction Effects Testing
Logistic regression models incorporating interaction terms showed the following:
X8 × X15 (financial stress × electricity cost perception): Significant interaction (p < 0.003, OR = 1.341), indicating that economic hardship combined with high-cost perception increases theft likelihood beyond the individual effects.
X9 × X6 (tampering awareness × billing transparency): The interaction effect was not statistically significant (p = 0.217), implying that billing transparency alone may not counteract fraud behavior in high-awareness groups.
X10 × X11 (homeownership status × payment type): There was moderate significance (p = 0.045, OR = 1.231), suggesting tenants using postpaid billing face higher theft risks.
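The form of the interaction models is not given in the text. As an illustration only, and treating each variable as a single coded term (an assumption, since some predictors have several categories), a model with the X8 × X15 interaction could be written as follows, where Σ stands for the remaining main effects:

```latex
\operatorname{logit}\bigl(P(Y=1)\bigr)
  = \beta_0 + \beta_8 X_8 + \beta_{15} X_{15}
  + \beta_{8\times 15}\,(X_8 \cdot X_{15}) + \sum_{n \neq 8,15} \beta_n X_n ,
\qquad
\mathrm{OR}_{8\times 15} = \exp(\beta_{8\times 15})
```

Under this reading, the reported odds ratio of 1.341 would correspond to an interaction coefficient of about ln(1.341) ≈ 0.293, meaning the joint presence of both factors multiplies the odds of theft by roughly 1.34 beyond what the two main effects imply.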
5.5.3. Implications for Theft Mitigation Policies
Our findings indicate the following:
Dynamic pricing strategies could help lower-income households manage payments effectively, reducing theft likelihood (X8 × X15).
Billing transparency alone is insufficient to deter tampering, requiring stronger fraud monitoring systems (X9 × X6).
Tenant-focused interventions, such as incentives for prepaid adoption, could lower theft risk among non-owner households (X10 × X11).
5.6. Policy Recommendations and Practical Implementation
Given the findings, multi-pronged interventions must integrate technical, behavioral, and regulatory measures.
5.6.1. Infrastructure Investment
To enhance grid stability and limit opportunities for illegal connections, the following are recommended:
Smart grid modernization to enhance reliability and reduce theft (X4: supply quality).
Tamper-resistant meter installations to prevent unauthorized access (X9: tampering awareness).
Renewable energy projects to mitigate reliance on illegal connections (X5: backup alternatives).
5.6.2. Behavioral Interventions
Electricity theft is often driven by financial hardship and lack of awareness. The following measures should therefore be considered:
Consumer awareness programs to discourage tampering (X9: awareness).
Flexible payment structures for economically vulnerable households (X8: financial stress).
Incentivized transition programs for informal electricity users.
5.6.3. Regulatory Enforcement
To strengthen compliance and deter illegal connections, the following should be considered:
Strengthening Legal Frameworks: Stricter penalties should be introduced for electricity theft while ensuring fair enforcement (X9: awareness of tampering).
Enhanced Billing Transparency: Utility companies should improve their billing accuracy and accessibility to reduce billing disputes (X6: receipt of electricity bills).
Fraud Detection Strategies: Utilities should invest in AI-driven monitoring techniques to detect theft patterns and prevent illegal meter tampering.
5.7. Alternative and Future Research
5.7.1. Improving Threshold Optimization for Classification
The default classification threshold (0.50) used in logistic regression may not be optimal for electricity theft detection. ROC curve analysis reveals variability across bootstrap samples, suggesting that dynamic threshold adjustments could enhance model sensitivity and specificity.
To refine classification precision, Youden’s Index (J = sensitivity + specificity − 1) could be employed to select the cutoff that maximizes J, balancing false positives and false negatives more effectively.
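As an illustration of how such a cutoff could be selected, the sketch below scans candidate thresholds and keeps the one that maximizes Youden's J. The function name `youden_threshold`, the labels, and the predicted probabilities are all hypothetical and are not drawn from the survey data. A case is treated as theft when its score is at or above the threshold, which is an assumed convention.

```python
from typing import List, Tuple


def youden_threshold(y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is classified as theft (1) when its score >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must be present.")

    best_threshold, best_j = 0.5, -1.0
    # Every distinct predicted score is a candidate cutoff.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical labels and predicted probabilities, for illustration only.
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90]

cutoff, j_stat = youden_threshold(labels, probs)
print(f"Selected cutoff: {cutoff:.2f}, Youden's J: {j_stat:.2f}")
```

With these made-up values the selected cutoff is 0.55 rather than the default 0.50. In practice, the scan would be repeated within each bootstrap sample to see how stable the chosen cutoff is.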
5.7.2. Exploring Alternative Statistical Techniques
While logistic regression provided interpretable, policy-relevant insights, future studies could benefit from the following:
Ensemble methods: Random Forest and XGBoost could capture the non-linear relationships in theft prediction. For comparison, the model achieved a classification accuracy of 89.35%, with high specificity (91.18%) for identifying “No” cases and sensitivity (86.71%) for identifying “Yes” cases.
Bayesian hierarchical modeling: This would help refine the probabilistic estimations of theft likelihood.
Recurrent Neural Networks (RNNs): RNNs could be used to analyze long-term behavioral patterns in electricity theft.
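When alternative techniques are compared, accuracy, sensitivity, and specificity all derive from the same confusion matrix, so they can be computed in a consistent way for each candidate model. The sketch below shows the arithmetic. The function name `classification_metrics` and the counts are hypothetical and do not reproduce the confusion matrix of this study.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: theft ("Yes") cases predicted correctly / missed.
    tn/fp: non-theft ("No") cases predicted correctly / flagged in error.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of "Yes" cases detected
        "specificity": tn / (tn + fp),   # share of "No" cases cleared
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting all three figures side by side helps show whether a more complex model improves the detection of theft cases or simply shifts errors from one class to the other.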
5.7.3. Limitations and Future Research Directions
Despite the model’s strong predictive accuracy, several limitations should be considered when interpreting the findings:
Residual sampling biases may persist despite the structured random sampling design. For instance, there may be underrepresentation of peripheral or densely populated informal settlements where unmetered or illegal access to electricity is more common. Additionally, the dataset may have excluded off-grid households or informal users lacking formal billing, and response bias may be present due to the sensitive nature of theft-related questions.
Self-reported survey data may be subject to social desirability or recall bias, particularly regarding sensitive behaviors like meter tampering. These limitations highlight the need for cross-regional validation using mixed-method approaches or objective consumption data when available.
Some predictors—including X3, X10, X13, and X18—exhibited weaker statistical significance, suggesting the need for stepwise regression refinements or interaction modeling in future studies.
Finally, electricity theft is inherently dynamic. Future research should integrate time-series modeling to capture behavioral changes and theft trends over time, particularly in response to infrastructure upgrades, policy reforms, or energy access interventions.
To improve generalizability and robustness, we recommend that future studies adopt oversampling strategies, prioritize the inclusion of targeted subpopulations, and leverage administrative data to supplement survey-based insights.
6. Conclusions
This study provides a comprehensive quantitative and behavioral analysis of electricity theft in Kinshasa, DRC, emphasizing its technical and socioeconomic drivers. The methodology incorporated random bootstrap sampling to enhance model reliability, while power analysis validated sample adequacy, ensuring statistical rigor in theft prediction. The logistic regression model, refined via Lasso regression feature selection, identified electricity supply quality (X4), financial stress (X8), tampering awareness (X9), and billing transparency (X6) as significant theft predictors. Households facing economic constraints and unreliable service demonstrated higher theft likelihood, reinforcing the need for flexible payment systems and consumer protection measures.
To combat electricity theft, the following strategic interventions must be incorporated:
Infrastructure upgrades, including grid modernization and smart meter deployment.
Behavioral-focused policies, such as public awareness programs and community engagement initiatives.
Regulatory enforcement, including stricter penalties and fraud detection mechanisms using AI-driven monitoring.
Future research should further explore the interaction effects between socioeconomic factors, optimize classification thresholds, and assess alternative statistical techniques, including Bayesian modeling and ensemble learning approaches. By integrating technical advancements with behavioral insights, utility providers can develop more effective anti-theft strategies, ensuring equitable energy distribution and sustainable economic progress.
Author Contributions
Conceptualization, P.K.; Formal analysis, P.K.; Investigation, P.K.; Writing—draft, P.K.; Writing—review & editing, P.B.; Supervision, review & editing, P.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Faculty of Engineering and the Built Environment, University of Johannesburg.
Data Availability Statement
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Louw, Q.; Bokoro, P. An Alternative technique for the detection and mitigation of electricity theft in South Africa. SAIEE Afr. Res. J. 2019, 110, 209–216.
- Saleh, A.M.; István, V.; Khan, M.A.; Waseem, M.; Ahmed, A.N.A. Power system stability in the Era of energy Transition: Importance, Opportunities, Challenges, and future directions. Energy Convers. Manag. X 2024, 24, 100820.
- DR Congo Court of Auditors. The 2022 Annual Public Report. Available online: https://www.ccomptes.fr/fr/publications/le-rapport-public-annuel-2022 (accessed on 8 June 2025).
- UNDP/DR Congo. DRC Statistical Yearbook 2020. Available online: https://www.undp.org/fr/drcongo/publications/annuaire-statistique-rdc-2020 (accessed on 8 June 2025).
- Zulu, C.L.; Dzobo, O. Real-time power theft monitoring and detection system with double connected data capture system. Electr. Eng. 2023, 105, 3065–3083.
- Saini, M.; Khan, S.; Singh, S.; Gupta, R.; Upadhyay, P.; Soni, S. Smart Grid: Problems, Avenues for Study & Attainable Solutions. In Proceedings of the 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021; pp. 513–518.
- Kim, S.; Sun, Y.; Lee, S.; Seon, J.; Hwang, B.; Kim, J.; Kim, J.; Kim, K.; Kim, J. Data-Driven Approaches for Energy Theft Detection: A Comprehensive Review. Energies 2024, 17, 3057.
- Abro, S.A.; Hua, L.G.; Laghari, J.A.; Bhayo, M.A.; Memon, A.A. Machine learning-based electricity theft detection using support vector machines. IJECE 2024, 14, 1240.
- Razavi, R.; Fleury, M. Socio-economic predictors of electricity theft in developing countries: An Indian case study. Energy Sustain. Dev. 2019, 49, 1–10.
- Kgaphola, P.M.; Marebane, S.M.; Hans, R.T. Electricity Theft Detection and Prevention Using Technology-Based Models: A Systematic Literature Review. Electricity 2024, 5, 334–350.
- Bhakta, P.; Debnath, S.; Debnath, P.; Das, P.; Pal, S. Power Theft Detection System. IJCRT 2022, 10.
- Dehghanpour, K.; Wang, Z.; Wang, J.; Yuan, Y.; Bu, F. A Survey on State Estimation Techniques and Challenges in Smart Distribution Systems. IEEE Trans. Smart Grid 2019, 10, 2312–2322.
- Zhai, B.; Yang, D.; Zhou, B.; Li, G. Distribution System State Estimation Based on Power Flow-Guided GraphSAGE. Energies 2024, 17, 4317.
- Abbasi, A.; Sultan, K.; Aziz, M.A.; Khan, A.U.; Khalid, H.A.; Guerrero, J.M.; Zafar, B.A. A Novel Dynamic Appliance Clustering Scheme in a Community Home Energy Management System for Improved Stability and Resiliency of Microgrids. IEEE Access 2021, 9, 142276–142288.
- Bagundang, E.; Rael, C. Clustering Commercial And Residential Electricity Consumption Using K-Means Algorithm. Int. J. Sci. Technol. Res. 2021, 10, 8–11.
- Sen, A.; Yang, N.-C. Power Theft Detection Using Advanced Neural Network in Three-phase Distribution Systems. IEEE Trans. Instrum. Meas. 2024, 73, 1–10.
- Nayak, R. Employing Feature Extraction, Feature Selection, and Machine Learning to Classify Electricity Consumption as Normal or Electricity Theft. SN Comput. Sci. 2023, 4, 1–15.
- Bello, H.O.; Idemudia, C.; Iyelolu, T.V. Integrating machine learning and blockchain: Conceptual frameworks for real-time fraud detection and prevention. World J. Adv. Res. Rev. 2024, 23, 56–68.
- Hu, Y.; Zhang, Y.; Huang, T.; Hu, Z.; Fan, Z.; Li, C. A Detection Method for Electricity Theft Based on Random Forest Algorithm. In Proceedings of the 2020 10th International Conference on Power and Energy Systems (ICPES), Chengdu, China, 25–27 December 2020; pp. 553–557.
- Yip, S.-C.; Wong, K.; Hew, W.-P.; Gan, M.-T.; Phan, R.C.-W.; Tan, S.-W. Detection of energy theft and defective smart meters in smart grids using linear regression. Int. J. Electr. Power Energy Syst. 2017, 91, 230–240.
- Sasmoko, R.P.; Setyonegoro, M.I.B.; Hidayah, I. Electricity Theft Detection Using K-means Clustering in Electricity Information System. In Proceedings of the 2024 International Conference on Smart Computing, IoT and Machine Learning (SIML), Surakarta, Indonesia, 6–7 June 2024; pp. 316–321.
- Yang, Z.; Liu, L.; Li, N.; Li, H. A self-decision ant colony clustering algorithm for electricity theft detection. Eng. Appl. Artif. Intell. 2024, 133, 108442.
- Žarković, M.; Dobrić, G. Artificial Intelligence for Energy Theft Detection in Distribution Networks. Energies 2024, 17, 1580.
- Qi, R.; Zheng, J.; Luo, Z.; Li, Q. A Novel Unsupervised Data-Driven Method for Electricity Theft Detection in AMI Using Observer Meters. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
- Kawoosa, A.I.; Prashar, D. Application of XGBoost ensemble method for energy theft detection in Smart Energy Meters. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022; pp. 1–6.
- Xu, L.; Shao, Z.; Chen, F. A combined unsupervised learning approach for electricity theft detection and loss estimation. IET Energy Syst. Integr. 2023, 5, 213–227.
- Khalid, A.; Mustafa, G.; Rana, M.R.R.; Alshahrani, S.M.; Alymani, M. RNN-BiLSTM-CRF based amalgamated deep learning model for electricity theft detection to secure smart grids. PeerJ Comput. Sci. 2024, 10, e1872.
- Wang, Y.; Jin, S.; Cheng, M. A Convolution–Non-Convolution Parallel Deep Network for Electricity Theft Detection. Sustainability 2023, 15, 10127.
- Yang, J.; Wei, M.; Huang, D. A High-loss Power Line Theft Detection Method Based on Segmented Dynamic Time Warping Distance. J. Phys. Conf. Ser. 2024, 2717, 012023.
- Wei, L.; Sundararajan, A.; Sarwat, A.I.; Biswas, S.; Ibrahim, E. A distributed intelligent framework for electricity theft detection using benford’s law and stackelberg game. In Proceedings of the 2017 Resilience Week (RWS), Wilmington, DE, USA, 18–22 September 2017; pp. 5–11.
- Amin, S.; Schwartz, G.A.; Cardenas, A.A.; Sastry, S.S. Game-Theoretic Models of Electricity Theft Detection in Smart Utility Networks: Providing New Capabilities with Advanced Metering Infrastructure. IEEE Control Syst. 2015, 35, 66–81.
- Sethi, A.R.; Amin, S.; Schwartz, G. Value of intrusion detection systems for countering energy fraud. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 2739–2746.
- Babar, Z.; Jamil, F.; Haq, W. Consumer’s perception towards electricity theft: A case study of Islamabad and Rawalpindi using a path analysis. Energy Policy 2022, 169, 113189.
- Pulz, J.; Muller, R.B.; Romero, F.; Meffe, A.; Neto, Á.F.G.; Jesus, A.S. Fraud detection in low-voltage electricity consumers using socio-economic indicators and billing profile in smart grids. CIRED - Open Access Proc. J. 2017, 2017, 2300–2303.
- Jamil, F.; Ahmad, E. Policy considerations for limiting electricity theft in the developing countries. Energy Policy 2019, 129, 452–458.
- Hussain, M.; Iacovides, I.; Lawton, T.; Sharma, V.; Porter, Z.; Cunningham, A.; Habli, I.; Hickey, S.; Jia, Y.; Morgan, P.; et al. Development and translation of human-AI interaction models into working prototypes for clinical decision-making. In Proceedings of the Designing Interactive Systems Conference (DIS’24), Copenhagen, Denmark, 1–5 July 2024; pp. 1607–1619.
- Pathak, A.; Bansal, V. AI as decision aid or delegated agent: The effects of trust dimensions on the adoption of AI digital agents. Comput. Hum. Behav. Artif. Hum. 2024, 2, 100094.
- De Souza, M.A.; Pereira, J.L.R.; Alves, G.D.O.; De Oliveira, B.C.; Melo, I.D.; Garcia, P.A.N. Detection and identification of energy theft in advanced metering infrastructures. Electr. Power Syst. Res. 2020, 182, 106258.
- Saini, S. Social and behavioral aspects of electricity theft: An explorative review. Int. J. Res. Econ. Soc. Sci. 2017, 7, 26–37.
- Maraden, Y.; Wibisono, G.; Nugraha, I.G.D.; Sudiarto, B.; Jufri, F.H. Enhancing Electricity Theft Detection through K-Nearest Neighbors and Logistic Regression Algorithms with Synthetic Minority Oversampling Technique: A Case Study on State Electricity Company (PLN) Customer Data. Energies 2023, 16, 5405.
- Olatunde, T.M.; Okwandu, A.C.; Akande, D.O. Reviewing the impact of energy-efficient appliances on household consumption. Int. J. Sci. Technol. Res. Arch. 2024, 6, 1–11.
- EBSCO. Appliances and energy consumption. Available online: https://www.ebsco.com/research-starters/power-and-energy/appliances-and-energy-consumption (accessed on 8 June 2025).
- Salami, H.; Okpara, K.; Choochuay, C.; Kuaanan, T.; Akeju, D.; Shitta, M. Domestic energy consumption, theories, and policies: A systematic review. Environ. Dev. Sustain. 2023, 27, 5821–5867.
- Mahmood, M.; Chowdhury, P.; Yeassin, R.; Hasan, M.; Ahmad, T.; Chowdhury, N.U.R. Impacts of digitalization on smart grids, renewable energy, and demand response: An updated review of current applications. Energy Convers. Manag. X 2024, 24, 100790.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).