Article

Bridging the Energy Divide: An Analysis of the Socioeconomic and Technical Factors Influencing Electricity Theft in Kinshasa, DR Congo

by Patrick Kankonde 1,2 and Pitshou Bokoro 2,*
1 Department of Basic Sciences, Faculty of Polytechnic, University of Kinshasa, Kinshasa P.O. Box 255, Democratic Republic of the Congo
2 Department of Electrical and Electronic Engineering Technology, Faculty of Engineering and the Built Environment, University of Johannesburg, Johannesburg 2024, South Africa
* Author to whom correspondence should be addressed.
Energies 2025, 18(13), 3566; https://doi.org/10.3390/en18133566
Submission received: 8 May 2025 / Revised: 16 June 2025 / Accepted: 28 June 2025 / Published: 7 July 2025
(This article belongs to the Section F4: Critical Energy Infrastructure)

Abstract

Electricity theft remains a persistent challenge, particularly in developing economies where infrastructure limitations and socioeconomic disparities contribute to illegal connections. This study analyzes the determinants of electricity theft in Kinshasa, the Democratic Republic of Congo, using a logistic regression model fitted to 385 observations, with random bootstrap resampling to assess model stability and a power analysis to confirm the adequacy of the sample size. The model achieved an AUC of 0.86, demonstrating strong discriminatory power, while the Hosmer–Lemeshow test (p = 0.471) indicated no evidence of poor fit. Our findings indicate that electricity supply quality, financial stress, tampering awareness, and billing transparency are key predictors of theft likelihood. Households experiencing unreliable service and economic hardship showed higher theft probability, while those receiving regular invoices and alternative legal energy solutions exhibited lower risk. Lasso regression was implemented to refine predictor selection, ensuring model efficiency. Based on these insights, a multifaceted policy approach that includes grid modernization, prepaid billing systems, awareness campaigns, and regulatory enforcement is recommended to mitigate electricity theft and promote sustainable energy access in urban environments.

1. Introduction

1.1. Contextual Background

Electricity theft poses a significant global challenge, leading to substantial economic losses for power utilities and introducing safety risks, such as equipment damage and outages. According to a 2017 report by Northeast Group LLC [1], non-technical losses—including theft, fraud, and billing errors—amount to approximately USD 96 billion annually worldwide. In China, electricity theft has severely impacted the State Grid Corporation, prompting USD 84 billion in grid investments in 2024 to curb losses [2]. Despite substantial investments in grid modernization, theft remains prevalent, particularly in developing economies where electricity access is limited and regulatory enforcement is weak.
The Democratic Republic of Congo (DRC) exemplifies this paradox. Despite possessing 100,000 MW of hydropower potential, only 2.5% is harnessed, leaving 21.5% of the population with electricity access as of 2022 [3]. Rapid urban expansion in Kinshasa (13 million residents) and inflation (23.8% in 2023) [4] have weakened household purchasing power, making utility payments unaffordable [5]. The inability to collect revenue effectively restricts grid expansion, worsening service reliability and theft risks. Additionally, operational inefficiencies—including limited personnel, outdated infrastructure, and unreliable billing practices—create opportunities for electricity theft. Socioeconomic disparities, inconsistent billing practices, unreliable supply, and general awareness of tampering methods contribute to a complex environment where electricity theft becomes a coping mechanism for many [6].

1.2. Problem Statement

Despite extensive technical advancements in electricity distribution, theft remains an unresolved issue, particularly in developing economies. While previous studies have explored real-time fraud monitoring, smart metering, and AI-driven detection [7,8], they often overlook the role of socioeconomic factors and behavioral dynamics. In Kinshasa, the persistence of electricity theft stems from a complex interplay of economic hardships, regulatory weaknesses, and consumer perceptions [9]. Understanding these behavioral and infrastructural determinants is critical to formulating holistic solutions beyond mere technological interventions.

1.3. Aim and Objectives

1.3.1. Aim

This work aims to analyze the primary technical, socioeconomic, and behavioral factors influencing electricity theft in Kinshasa using logistic regression.

1.3.2. Objectives

Specifically, this work seeks to:
  • Estimate the likelihood of electricity theft in Kinshasa using logistic regression;
  • Determine key predictors, including billing methods, meter types, and socioeconomic indicators;
  • Assess the impact of behavioral factors, such as awareness of tampering, on theft probability;
  • Formulate strategic recommendations to mitigate electricity theft through infrastructure upgrades, policy interventions, and public awareness initiatives.

1.4. Work Structure

The remainder of this paper is organized as follows:
  • Section 2: Literature Review—a critical evaluation of existing theft detection methods.
  • Section 3: Methodology—survey design, variable selection, logistic regression modeling.
  • Section 4: Results—model findings and predictive validity assessment.
  • Section 5: Discussion—interpretation, limitations, and future directions.
  • Section 6: Conclusion—key results and policy implications.

2. Literature Review

Electricity theft remains a persistent challenge that requires robust detection and mitigation strategies. Research efforts have categorized existing approaches into hardware-based, non-hardware-based, game-theoretic models, and behavioral analysis. While technological solutions focus on real-time monitoring and anomaly detection, emerging studies highlight the role of socioeconomic determinants in shaping theft patterns. This section critically evaluates existing methodologies and identifies gaps relevant to Kinshasa’s electricity theft landscape.

2.1. Hardware-Based Methods

Hardware-based approaches deploy physical devices such as smart meters, tamper-resistant seals, and sensor-driven systems to deter unauthorized electricity usage. Zulu and Dzobo introduced a real-time theft monitoring framework using double-connected data capture, improving anomaly detection rates [5]. Additionally, Advanced Metering Infrastructure (AMI) enables utilities to track consumption patterns remotely and generate alerts for suspicious activities [10].
Despite their effectiveness, hardware-based approaches face deployment challenges, particularly in regions with outdated grids or limited funding. High installation costs hinder large-scale adoption, leaving utilities reliant on manual inspections and penalty enforcement, as noted by Bhakta et al. [11]. This underscores the need for hybrid approaches that combine hardware with predictive analytics.

2.2. Non-Hardware Methods

Data-driven approaches leverage network analysis, machine learning, and artificial intelligence (AI) to detect irregular consumption patterns. These techniques offer scalable alternatives to hardware solutions, utilizing grid data and meter readings for theft identification.

2.2.1. Network Data Analysis

Network analysis employs load flow assessments, state estimation, and anomaly detection algorithms to expose theft. Dehghanpour et al. [12] and Zhai et al. [13] demonstrated the utility of state estimation models, while clustering algorithms, as applied in [14,15], group consumption data to expose atypical usage patterns indicative of fraud. Additionally, Sen and Yang [16] utilize three-phase power data with neural network models to enhance theft detection capabilities. Although these methods successfully identify network-wide anomalies, pinpointing individual theft incidents remains difficult.

2.2.2. Consumer Meter Data with Artificial Intelligence

Artificial intelligence (AI) has become essential in consumer-level electricity theft detection, leveraging machine learning to analyze meter readings and identify anomalies.
Supervised learning models, including Support Vector Machines (SVMs), decision trees, and neural networks, have proven effective in classifying fraudulent users [8,17,18]. Hu et al. [19] highlight Random Forest algorithms, known for their ability to handle imbalanced datasets. Regression techniques [20] have also been applied to detect faulty smart meters and theft through consumption anomalies.
Despite their effectiveness, supervised models face data labeling constraints, class imbalance issues, and overfitting risks. To mitigate these challenges, unsupervised learning techniques, such as clustering, have gained traction. Sasmoko et al. [21] utilize anomaly detection models, Yang et al. [22] introduce ant colony clustering algorithms, and Žarković and Dobrić [23] leverage K-Means clustering, while Qi et al. [24] integrate wavelet-based feature extraction with fuzzy c-means (FCM) clustering. However, threshold definition and overlapping clusters remain obstacles.
Ensemble learning approaches have emerged as powerful alternatives. Kawoosa and Prashar [25] employ XGBoost, enhancing classification accuracy, while Qi et al. [24] integrate Minimal Gated Memory (MGM) networks and Adaptive Synthetic Sampling (ADASYN) to address class imbalance issues. Hybrid models [26,27] combining Recurrent Neural Networks (RNNs) and Bidirectional Long Short-Term Memory (Bi-LSTM) further enhance theft detection efficiency.
However, AI models face limitations such as overfitting risks, data labeling constraints, and high computational requirements [25]. A promising alternative is logistic regression modeling, which simplifies predictive frameworks and ensures practical usability in policy applications.

2.2.3. Hybrid Methods

Hybrid methods combine network-level data with consumer meter data to enhance theft detection accuracy. Wang et al. [28] demonstrate that clustering algorithms help detect anomalies in network data, while AI models refine individual consumption analysis, improving detection precision [29]. A key approach employs heuristic segmentation, transforming time-series consumption patterns and grid loss metrics to extract theft indicators. Consumers showing a strong correlation with line losses and consistent waveform anomalies are flagged as suspicious.
Despite their effectiveness, hybrid methods require advanced data infrastructure and high computational power. Nonetheless, they are integral to modern smart grids, enabling real-time, data-driven monitoring for theft mitigation.

2.3. Game Theory

Game theory evaluates consumer–utility interactions, treating electricity theft as a strategic decision-making process.
Wei et al. [30] integrate Benford’s Law with a Stackelberg model that positions the utility company as the leader, while consumers react by optimizing their theft strategies; Likelihood Ratio Tests (LRTs) further refine detection. In contrast to the studies surveyed in [31], the authors of [32] explored Bayesian intrusion detection frameworks to enhance utility intervention tactics.
Despite its theoretical appeal, game theory models rely on assumptions of rational consumer behavior, limiting real-world applicability due to data constraints and scalability issues [31]. Thus, behavioral analysis remains critical in understanding theft motivations beyond economic utility optimization.

2.4. Behavioral Analysis

Behavioral analysis examines the psychological and socioeconomic motivations behind electricity theft, identifying patterns in consumer decision-making.
Surveys and interviews have uncovered key theft drivers, such as financial hardship, lack of awareness, and dissatisfaction with utility services [33]. Socioeconomic models [34] integrating income levels, demographics, and billing history provide data-driven fraud detection. Studies [9,35] emphasize the need for enhanced billing systems, stricter enforcement, and public awareness campaigns to mitigate theft risks. Razavi and Fleury [9] highlight income inequality and inadequate infrastructure as key contributors to non-technical losses.
Increasingly, behavioral insights are combined with AI-based detection models, as demonstrated by Hussain [36] and Bansal et al. [37], improving predictive accuracy.

2.5. Research Gaps and Contribution

The existing literature predominantly focuses on hardware detection, AI-based monitoring, and economic modeling. However, limited research incorporates behavioral predictors within statistical fraud detection frameworks. Table 1 provides a summary of currently available literature. This study bridges the gap by integrating billing stress, meter type, tampering awareness, financial hardship, and regulatory influences into logistic regression modeling, offering a comprehensive predictive framework for electricity theft detection in Kinshasa.

3. Methodology

This study employs a rigorous statistical approach to analyze electricity theft determinants in Kinshasa, leveraging logistic regression for predictive modeling. The research design integrates random sampling, bootstrapping techniques, and regularization-based feature selection to ensure robust, reliable insights into theft patterns.

3.1. Data Collection and Preprocessing

3.1.1. Survey Design

A structured survey was developed to assess technical, behavioral, and socioeconomic influences on electricity theft. The key predictor variables and their modalities are detailed in Table 2.
To ensure validity and reliability, domain relevance guided variable selection, grounded in existing electricity theft research [9,39]. Exploratory analyses assessed initial correlations between theft likelihood and independent variables.

3.1.2. Population and Sample Size Determination

A statistically valid sample size is essential for meaningful conclusions in survey-based studies [37]. An initial sample size of 385 participants was determined for Kinshasa’s population (13 million residents), and a power analysis further validated its adequacy in capturing theft-related socioeconomic patterns. The sample size was obtained from the Krejcie and Morgan formula:
$$n = \frac{N \cdot Z^{2} \cdot p \cdot (1 - p)}{e^{2} \cdot (N - 1) + Z^{2} \cdot p \cdot (1 - p)} \qquad (1)$$
where
  • N = population size;
  • Z = 1.96 (95% confidence level);
  • p = 0.5 (the population proportion assuming maximum variance);
  • e = 0.05 (margin of error or the degree of accuracy expressed as a proportion).
As Kinshasa’s urban population represents 40% of the DRC’s total population, Equation (1) simplifies under large-population assumptions:
$$\lim_{N \to \infty} n = \frac{Z^{2} \cdot p \cdot (1 - p)}{e^{2}} \qquad (2)$$
resulting in a final sample size of 385 respondents.
To prevent demographic biases, random sampling ensured equal probability selection, complemented by bootstrapping techniques for enhanced model stability and reliability.
These methodological choices enable a holistic understanding of electricity theft drivers in Kinshasa, independent of the subgroup segmentation.
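The sample-size calculation above can be checked numerically. The following is a minimal sketch, assuming the parameter values stated in the text (Z = 1.96, p = 0.5, e = 0.05, N = 13 million); the function names are illustrative and do not come from the original study.

```python
import math

def sample_size(N, Z=1.96, p=0.5, e=0.05):
    """Finite-population sample size (Krejcie and Morgan form)."""
    variance_term = Z**2 * p * (1 - p)
    return N * variance_term / (e**2 * (N - 1) + variance_term)

def sample_size_large_population(Z=1.96, p=0.5, e=0.05):
    """Limit of the formula above as N grows without bound."""
    return Z**2 * p * (1 - p) / e**2

N_KINSHASA = 13_000_000  # population figure quoted in the text

n_finite = sample_size(N_KINSHASA)
n_limit = sample_size_large_population()

# Both values are slightly above 384, so rounding up gives 385 respondents.
print(round(n_finite, 2), round(n_limit, 2))
print(math.ceil(n_finite), math.ceil(n_limit))
```

For a population this large the finite-population correction is negligible, which is why the simplified limit gives the same rounded sample size.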

3.1.3. Preprocessing

Following data collection, the steps taken included the following:
  • Integrity checks confirmed completeness, eliminating the need for imputation.
  • Quantitative variables were standardized, ensuring uniform scales for regression modeling.
  • Categorical variables underwent dummy coding (binary) and ordinal transformations (ranked perceptions) for suitability in logistic regression.
Multicollinearity detection via the Variance Inflation Factor (VIF) safeguarded model stability, ensuring predictor independence.
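The preprocessing steps listed above can be illustrated with a short sketch. It assumes population standard deviation for standardization and a reference-category scheme for dummy coding; the paper does not state these details, and the sample responses are hypothetical rather than survey data.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a quantitative variable to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def dummy_code(values, reference):
    """One 0/1 column per category, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [int(v == level) for v in values] for level in levels}

def ordinal_code(values, ordered_levels):
    """Map ranked responses to integers 0, 1, 2, ... in the given order."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]

# Hypothetical responses for five households (not from the survey data).
household_size = [2, 4, 5, 6, 8]
payment_type = ["prepaid", "postpaid", "prepaid", "prepaid", "postpaid"]
cost_perception = ["low", "high", "medium", "high", "very high"]

z = standardize(household_size)
dummies = dummy_code(payment_type, reference="postpaid")
ranks = ordinal_code(cost_perception, ["low", "medium", "high", "very high"])

print(z)
print(dummies)
print(ranks)
```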

3.2. Logistic Regression

3.2.1. Model Specification

A binary logistic regression model was formulated to predict the likelihood of electricity theft (Y = Yes or No) [40]. The logit function was chosen as the link function, which keeps predicted probabilities between 0 and 1:
$$P(Y = M_{2}) = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{19} X_{19})}} \qquad (3)$$
where
  • Y is the binary dependent variable (theft occurrence);
  • Xn (n = 1 to 19) represents independent predictors (socioeconomic, technical, and behavioral variables);
  • β0 is the intercept, and βn (n = 1 to 19) represents the regression coefficients (interpreted using odds ratios).
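As a brief illustration of the logit model specified above, the sketch below evaluates the predicted theft probability for a given predictor vector. The coefficient values are hypothetical and use only two predictors for readability; the fitted model in this study uses 19.

```python
import math

def theft_probability(x, beta0, betas):
    """P(Y = 1 | x) under a logit link with intercept beta0 and slopes betas."""
    linear_predictor = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical two-predictor model (not the paper's estimates).
beta0, betas = 0.5, [-0.4, 1.2]

p_low = theft_probability([5.0, 0.0], beta0, betas)   # linear predictor = -1.5
p_high = theft_probability([1.0, 1.0], beta0, betas)  # linear predictor = 1.3

print(round(p_low, 3), round(p_high, 3))
```

A linear predictor of zero corresponds to a probability of exactly 0.5, so negative values map below one half and positive values above it.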

3.2.2. Variable Selection and Optimization

To prevent overfitting and enhance interpretability, the following analyses were performed:
  • Multicollinearity diagnostics used the Variance Inflation Factor (VIF) to exclude highly correlated predictors.
  • Feature importance analysis ranked the most influential variables.
  • Lasso regression (cv = 5) optimized the regularization parameters.
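The VIF diagnostic mentioned above can be sketched without external libraries by using the identity that the VIF of predictor j equals the j-th diagonal entry of the inverse correlation matrix, which is equivalent to 1 / (1 − R_j²). The data below are hypothetical and chosen so that the first two columns are nearly collinear; the helper names are illustrative only.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mu = sum(col) / n
        centered.append([v - mu for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF per predictor, read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictors: x2 is almost a copy of x1, x3 is largely unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

vifs = vif([x1, x2, x3])
flagged = [i for i, v in enumerate(vifs) if v > 5]  # threshold used in the text
print([round(v, 2) for v in vifs], flagged)
```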

3.2.3. Parameter Estimation

The Newton–Raphson algorithm iteratively updated the model coefficients, converging when the likelihood change dropped below 0.001, ensuring accurate parameter estimation.
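To make the estimation step concrete, the following sketch applies Newton–Raphson to a logistic model with an intercept and a single predictor, stopping when the change in log-likelihood falls below 0.001. It is an illustration under assumptions: the data are hypothetical, the model is reduced to two parameters so the Hessian can be inverted by hand, and the stopping rule is interpreted as a change in log-likelihood.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a one-predictor logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_newton_raphson(xs, ys, tol=0.001, max_iter=50):
    """Return (b0, b1, iterations used) for a one-predictor logistic model."""
    b0 = b1 = 0.0
    previous = log_likelihood(b0, b1, xs, ys)
    for iteration in range(1, max_iter + 1):
        # Gradient g and negated Hessian H of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        current = log_likelihood(b0, b1, xs, ys)
        if abs(current - previous) < tol:
            return b0, b1, iteration
        previous = current
    return b0, b1, max_iter

# Hypothetical, non-separable data (not from the survey).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, n_iter = fit_newton_raphson(xs, ys)
print(round(b0, 3), round(b1, 3), n_iter)
```

With 19 predictors the same update applies, but the Hessian is a 20 × 20 matrix and is normally inverted by a linear algebra routine.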

3.2.4. Model Evaluation

The predictive performance of the model was evaluated primarily using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Additional metrics, including McFadden’s R2, Cox and Snell R2, and Nagelkerke R2, were computed to measure the goodness of fit. Furthermore, classification accuracy, sensitivity, and specificity were derived from the confusion matrix to quantify the overall performance and diagnostic effectiveness.
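The evaluation metrics named above can be computed from predicted probabilities and log-likelihoods as in the sketch below. The labels, scores, and log-likelihood values are hypothetical, and the 0.5 classification threshold is an assumption, since the paper does not state the cutoff used.

```python
import math

def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def pseudo_r2(ll_model, ll_null, n):
    """McFadden, Cox and Snell, and Nagelkerke pseudo-R2 from log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke

# Hypothetical labels and predicted probabilities for eight households.
y = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

model_auc = auc(y, scores)
metrics = confusion_metrics(y, scores)
# Hypothetical log-likelihoods: null model with p = 0.5 and a better-fitting model.
r2 = pseudo_r2(ll_model=-3.0, ll_null=8 * math.log(0.5), n=8)
print(model_auc, metrics, [round(v, 3) for v in r2])
```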

3.2.5. Significance Testing and Validation

Predictors underwent Wald tests and Type II likelihood ratio tests to establish significance, with 95% confidence intervals calculated for odds ratio interpretations.

4. Results

4.1. Sample Adequacy and Validation

4.1.1. Sample Selection and Bootstrap Validation

The study employed random bootstrap resampling to check model stability across different samples. The AUC distribution over 100 bootstrap iterations yielded a mean AUC of 94.92%, with a standard deviation of 0.8%, indicating strong discriminative performance. Additionally, 94% of samples fell within the upper and lower confidence bounds, indicating consistent classification performance across different sampling conditions. The distribution is shown in Figure 1.
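A bootstrap check of this kind can be sketched as follows. The example resamples households with replacement and recomputes the AUC of fixed predicted probabilities over 100 iterations. This is a simplification under assumptions: the paper does not specify whether the model was refitted on each resample, and the labels and scores here are synthetic.

```python
import random
from statistics import mean, stdev

def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, scores, n_iterations=100, seed=0):
    """AUC values over bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined unless both classes are present
            aucs.append(auc(ys, [scores[i] for i in idx]))
    return aucs

# Synthetic labels and scores (not survey data): theft cases score higher on average.
rng = random.Random(42)
y = [int(rng.random() < 0.4) for _ in range(200)]
scores = [min(1.0, max(0.0, 0.3 + 0.4 * yi + rng.gauss(0, 0.2))) for yi in y]

aucs = bootstrap_auc(y, scores)
ordered = sorted(aucs)
lower, upper = ordered[2], ordered[97]  # approximate 95% percentile bounds
print(round(mean(aucs), 4), round(stdev(aucs), 4), round(lower, 4), round(upper, 4))
```

The spread of the resampled AUC values, rather than any single value, is what indicates stability.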

4.1.2. Post-Hoc Power Analysis

  • Sample size: n = 385.
  • Observed effect size: the logistic regression model explained 61.4% of the variability in electricity theft likelihood (Nagelkerke R2 = 0.614), and the predictors demonstrated a strong influence (Cohen’s f2 = 1.588), confirming the robustness of the model.
  • Significance level: α = 0.05. This is the probability of making a Type I error (rejecting a true null hypothesis), so there is a 5% chance of falsely detecting an effect when none exists.
  • Computed power: 1 − β = 1.0000. This indicates an extremely high probability of detecting a true effect if one exists. Given this high power, the risk of a Type II error (false negative) is negligible, supporting the reliability of the model results.
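The effect size reported above follows from the standard relation f2 = R2 / (1 − R2). The sketch below reproduces it from the rounded Nagelkerke R2 and also computes a noncentrality parameter, assuming the common convention λ = f2 · n; the paper does not state which convention its power analysis used.

```python
def cohens_f2(r_squared):
    """Cohen's f2 effect size from a coefficient of determination."""
    return r_squared / (1.0 - r_squared)

R2_NAGELKERKE = 0.614  # value reported in the text
N_RESPONDENTS = 385    # sample size reported in the text

f2 = cohens_f2(R2_NAGELKERKE)
# Assumed convention for the noncentrality parameter of the F-test.
noncentrality = f2 * N_RESPONDENTS

print(round(f2, 3), round(noncentrality, 1))
```

The result is about 1.59, close to the reported 1.588; the small difference presumably comes from rounding R2 to three decimals. An effect size and sample size of this magnitude are consistent with a computed power that rounds to 1.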

4.2. Descriptive Analysis

4.2.1. Quantitative Variables

The key variables exhibited moderate variations:
  • Household size (Mean = 4.699, Std Dev = 2.174).
  • Number of appliances (Mean = 3.418, Std Dev = 2.311).
  • Electricity supply quality (Mean = 5.036, Std Dev = 2.354).
The low standard deviation for backup energy sources (X5 = 0.353) suggests consistent reliance on alternative power solutions. Table 3 provides descriptive statistics for this analysis.

4.2.2. Qualitative Variables

The categorical variables provided the following key behavioral insights:
  • A total of 41.04% of households reported electricity theft occurrences, highlighting the prevalence of the issue.
  • A total of 70.91% of respondents were tenants, suggesting potential links between rental status and theft likelihood.
  • Prepaid electricity usage (57.66%) was more common than postpaid, possibly affecting theft patterns.
  • A total of 54.03% had tampering awareness, indicating a widespread understanding of meter manipulation techniques.
Table 4 provides the descriptive analysis of the qualitative variables.

4.3. Multicollinearity and Feature Selection

The variables X7 and X19 exhibit high VIF values (>5), potentially indicating strong collinearity, whereas X5 has the lowest VIF (1.18), suggesting minimal correlation with other predictors. The multicollinearity statistics are provided in Table 5.
Lasso regression optimized feature selection, with variables X8 (financial stress) and X9 (tampering awareness) showing the strongest positive coefficients. The Lasso feature selection results for electricity theft detection are given in Table 6.

4.4. Regression Analysis of Variable Y

4.4.1. Model Specification

Table 7 summarizes the fitted model, highlighting its effectiveness and providing key insights into its explanatory power.

4.4.2. Test of Null Hypothesis

The p-values (<0.0001) indicate strong statistical significance, suggesting that the model is highly informative for explaining variations in the dependent variable. The model evaluation is summarized in Table 8.

4.4.3. Type II Analysis (Variable Y)

Table 9 presents the Type II analysis, detailing statistical measures for the predictors associated with variable Y. It includes the following:
  • Degrees of Freedom (DDL): These determine the number of independent comparisons per variable.
  • Wald Chi-Square (Khi2 Wald) and Pr > Wald: These assess the individual significance of predictors in the model. Smaller p-values (Pr > Wald) suggest stronger statistical significance.
  • Likelihood Ratio Chi-Square (Khi2 LR) and Pr > LR: These evaluate the contribution of each variable to the overall model.

4.4.4. Hosmer–Lemeshow Test (Variable Y)

The Hosmer–Lemeshow test evaluates the goodness of fit of a logistic regression model by comparing observed and predicted probabilities across different risk groups.
  • A Chi-Square value of 8.641 with eight degrees of freedom suggests the model’s predictions align reasonably well with the observed data.
  • The p-value (Pr > Khi2) of 0.374 indicates that there is no strong evidence of poor fit, meaning the model does not significantly deviate from the expected outcomes.
The goodness-of-fit test results are provided in Table 10.
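The relation between the Hosmer–Lemeshow statistic and its p-value can be verified directly. For an even number of degrees of freedom, the chi-square upper-tail probability has an exact closed form, which the sketch below uses; the function name is illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; exact for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1, built up term by term.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Statistic and degrees of freedom reported for the Hosmer-Lemeshow test.
p_value = chi2_sf_even_df(8.641, 8)
print(round(p_value, 4))
```

The computed tail probability is roughly 0.37, in line with the value reported in this subsection, and well above the 0.05 level at which poor fit would be indicated.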

4.4.5. Model Parameters (Y)

Table 11 presents the results of the logistic regression analysis for variable Y, and Table 12 gives the standardized coefficients. The following statistical indicators are reported for the model predictors:
  • Values and Standard Errors: These estimate each variable’s effect size with its respective uncertainty.
  • Wald Chi-Square and Pr > Chi2: These evaluate statistical significance, where low p-values (<0.05) indicate a statistically significant predictor.
  • Confidence Intervals (95%): The lower and upper bounds reflect estimation precision.
  • Odds Ratio (OR): The OR is the multiplicative change in the odds of the outcome associated with each predictor.
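The link between a logistic coefficient and its odds ratio is OR = exp(β). The sketch below applies this to the coefficients quoted in the Discussion for X2, X4, and X8, and shows how a 95% Wald interval would be formed. The standard error used in the interval is hypothetical, since the parameter table is not reproduced in the text.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def wald_ci(beta, se, z=1.96):
    """95% Wald confidence interval for the odds ratio."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

# Coefficients quoted in the Discussion for X2, X4, and X8.
coefficients = {"X2": -0.191, "X4": -0.391, "X8": -1.781}
ratios = {name: round(odds_ratio(b), 3) for name, b in coefficients.items()}
print(ratios)

# Hypothetical standard error, chosen only to demonstrate the call.
low, high = wald_ci(-0.391, se=0.10)
print(round(low, 3), round(high, 3))
```

The computed ratios match the odds ratios reported alongside those coefficients (0.826, 0.676, and 0.168). An interval that lies entirely below 1 corresponds to a significant reduction in the odds at the 5% level.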

4.5. Predictions and Residuals (Variable Y)

4.5.1. Prediction Accuracy

The model maintained high accuracy across training, test, and validation sets:
  • Train set: accuracy = 87.8%, F1-score = 0.88.
  • Test set: accuracy = 77.8%, F1-score = 0.78.
  • Validation set: accuracy = 83.9%, F1-score = 0.83.
These results highlight consistent predictive reliability. The model performance metrics for class 1, class 0, and both classes combined are given in Tables 13, 14, and 15, respectively.

4.5.2. Residual Analysis

The mean deviance residuals indicated in Figure 2 remained within acceptable thresholds, confirming model robustness. The residual metrics are provided in Table 16.

5. Discussion

5.1. Model Fit and Performance

The logistic regression model exhibited strong predictive capability, achieving an AUC of 96.51%, indicating a strong ability to discriminate between electricity theft and non-theft cases. The Hosmer–Lemeshow test (p = 0.471) indicated a good model fit, with no statistically significant deviations between the observed and predicted probabilities.
To assess model stability, random bootstrap resampling was applied across 100 resampled datasets, yielding an average AUC of 94.92%, with a standard deviation of 0.8%, reinforcing consistent classification performance. Additionally, 94% of the bootstrap samples fell within the upper and lower confidence bounds, supporting the model’s robustness across varying conditions.
Further strengthening these results, a post-hoc power analysis validated the sample size (n = 385), confirming high statistical power (1 − β = 1.0000) and ensuring a negligible risk of Type II errors, meaning that true effects were accurately detected.

5.2. Addressing Potential Multicollinearity

Multicollinearity assessments identified high Variance Inflation Factor (VIF) values for X7 (meter experience, VIF = 7.47) and X19 (theft penalty opinion, VIF = 7.89), indicating strong predictor redundancy.
To refine feature selection, Lasso regression (cv = 5) was applied, ensuring optimal dimensionality reduction while preserving key explanatory variables. The strongest contributors to electricity theft prediction—X8 (financial stress, Lasso coefficient = 0.130) and X9 (tampering awareness, coefficient = 0.111)—were retained, reinforcing their significant impact on theft likelihood.

5.3. Significant Predictors and Their Implications

The key predictors with a significant influence on Y are discussed below.

5.3.1. Number of Appliances (X2) (p = 0.048)

The negative coefficient (−0.191) and odds ratio (0.826) suggest that households with more appliances are less likely to engage in electricity theft. This indicates that higher appliance usage might correlate with greater energy demand, leading to formalized billing relationships with energy providers rather than illicit connections. Olatunde et al. [41] found that energy-efficient appliances lower household electricity demand, reducing the incentive for illegal connections. Research from EBSCO Research Starters highlights that higher appliance usage does not necessarily lead to theft but rather encourages formal billing relationships [42]. A systematic review in Environment, Development and Sustainability suggests that subsidizing energy-efficient appliances can help low-income households transition to legal electricity use [43]. Governments can mitigate theft by subsidizing appliances, deploying smart meters, and educating consumers on efficient energy use.

5.3.2. Electricity Supply Quality (X4) (p < 0.001)

The strong negative coefficient (−0.391) and odds ratio (0.676) suggest that a better electricity supply reduces the likelihood of theft. Households that receive consistent and reliable electricity are less likely to engage in theft compared to those experiencing frequent blackouts and poor service quality. This highlights the need for investment in smart grids, improved distribution infrastructure, and proactive maintenance strategies. In Brazil, India, and South Africa, smart grid investments have significantly reduced electricity theft. A study by Northeast Group, LLC found that emerging markets deploying smart meters and grid automation could cut theft-related losses by billions [44].

5.3.3. Alternatives Used During Outages (X5)

The presence of backup power sources correlated with increased theft risk (p < 0.0001, OR = 7.308). This suggests that frequent outages drive consumers toward illegal connections. It highlights how greater awareness and access to legal alternatives can play a role in reducing unauthorized electricity connections. Governments can reduce reliance on stolen electricity by subsidizing renewable energy alternatives through tax credits, low-interest loans, and prepaid energy credits. Investing in solar-powered microgrids and smart meters enhances legal access while discouraging theft. Policies like net metering allow households to sell excess solar power, reducing financial strain. Public awareness campaigns and community-based renewable projects further promote legal electricity use [10].

5.3.4. Receipt of Electricity Bills (X6) (p = 0.072, OR = 0.329)

Households that receive bills regularly are less likely to resort to theft. This emphasizes the importance of transparent billing systems and improved meter installations. Utilities can implement digital invoicing for transparency and efficiency. Digital invoicing eliminates manual errors, ensures accurate, real-time billing for electricity consumption, and reduces the risk of billing manipulation and unauthorized adjustments.

5.3.5. Stress About Paying Bills (X8) (p < 0.001)

The strong negative coefficient (−1.781) and odds ratio (0.168) show that households experiencing financial distress are significantly more likely to engage in theft. High unemployment, income disparity, and energy costs force households into illegal connections. Solutions could be flexible payment plans (e.g., prepaid meters or pay-as-you-go electricity models) to ease financial strain, energy subsidies for low-income households, community solar projects to provide affordable, legal electricity, and public awareness campaigns to promote legal energy use.

5.3.6. Awareness of Tampering (X9)

The strong statistical significance of awareness suggests that communities knowledgeable about tampering techniques are at higher risk of theft. In areas with weak enforcement, electricity theft becomes socially accepted, often justified by economic hardship and lack of consequences. Community-wide participation and corrupt oversight further reinforce this behavior. Strengthening education and enforcement can reduce theft and improve electricity access. Examples include public awareness campaigns highlighting the impact of theft on grid stability, stricter penalties and enforcement to deter illegal connections, tamper-proof smart meters to prevent unauthorized access, and subsidized energy programs offering affordable legal alternatives.

5.4. Lesser-Impact Predictors

While the logistic regression model identified several strong predictors, some variables demonstrated lower statistical significance but may still offer contextual insights into electricity theft behavior.

5.4.1. Perception of Electricity Costs (X15)

Though X15 (perceived cost of electricity) influenced theft likelihood, its significance varied across subcategories:
  • High perceived costs (p = 0.087, OR = 0.205) showed a mild negative association, suggesting consumers who view electricity as expensive may be more cautious about illegal connections.
  • Low- and medium-cost perceptions (p = 0.003 and p = 0.025, OR = 0.080 and 0.170, respectively) exhibited stronger negative associations, reinforcing the link between affordability and theft avoidance.
While not the strongest predictor, X15 provides useful insights for pricing strategies that could lower theft risks through tiered rates or affordability adjustments.

5.4.2. Difficulty Paying Bills (X18)

While financial hardship exhibited a significant impact through X8 (stress about bills), individual categories within X18 (difficulty paying bills) showed weaker predictive power:
  • “Never” (p = 0.027, OR = 0.177) displayed mild protection against theft.
  • “Always” (p = 0.504, OR = 2.721) had a high odds ratio but lacked statistical significance, likely due to the small sample representation.
This suggests that general financial stress (X8) is a more robust predictor than X18’s detailed payment difficulty categories, reinforcing the need for broad economic interventions rather than case-specific mitigation strategies.

5.4.3. Opinion on Theft Penalties (X19)

X19 exhibited high multicollinearity (VIF = 7.89), leading to weaker independent significance (p = 0.435, OR = 0.488). While opinions on penalties may shape theft behavior, other factors—such as awareness of tampering (X9) and billing transparency (X6)—appear more influential, indicating regulatory enforcement alone may not deter theft effectively.
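For reference, the variance inflation factor is the reciprocal of the tolerance, VIF = 1/(1 − R²), where R² is obtained by regressing the predictor on the remaining predictors. The sketch below recomputes VIF from a few tolerance values in Table 5 and flags those above 5; this cutoff is a common rule of thumb assumed here for illustration, not a threshold stated by the authors.

```python
# Tolerance values copied from Table 5 (tolerance = 1 - R^2 of each predictor
# regressed on all the others).
tolerances = {"X7": 0.1338, "X8": 0.3837, "X9": 0.4230, "X19": 0.1267}

VIF_CUTOFF = 5.0  # assumed rule-of-thumb threshold, not taken from the paper


def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tolerance


vifs = {name: vif_from_tolerance(t) for name, t in tolerances.items()}
flagged = [name for name, v in vifs.items() if v > VIF_CUTOFF]

for name, v in vifs.items():
    print(f"{name}: VIF = {v:.2f}")
print("Above cutoff:", flagged)
```

Under this assumed cutoff, X7 and X19 stand out, which is consistent with the weak independent significance reported for X19.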

5.4.4. Technical Ability to Manipulate Meters (X16)

Despite being conceptually relevant, X16 did not exhibit a statistically significant effect (p-values between 0.595 and 0.998 across skill levels). This may be due to the following:
  • Limited accuracy of self-assessment, as respondents may underreport their technical skills;
  • High dependency on awareness (X9), suggesting that knowing about tampering techniques (X9) is more impactful than the ability to execute them (X16).

5.5. Testing Interaction Effects Between Predictors

While individual predictors significantly influence electricity theft likelihood, interactions between them may reveal hidden dependencies that provide deeper insights into consumer behavior. Assessing interaction effects helps determine whether combined variables amplify or diminish theft probability beyond their individual contributions.

5.5.1. Key Interaction Terms Considered

Based on feature importance rankings from Lasso regression and Type II analysis, the following interactions were tested for statistical significance:
  • Financial stress (X8) × perception of electricity costs (X15): Economic hardship combined with high price perception may increase theft risk as affordability concerns worsen.
  • Tampering awareness (X9) × billing transparency (X6): Consumers aware of tampering techniques may exhibit lower theft rates if billing is transparent, suggesting that trust in the utility system counteracts fraud incentives.
  • Homeownership status (X10) × payment type (X11): Tenants using postpaid billing may show higher theft rates, as temporary residence reduces accountability for long-term billing obligations.
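The Lasso-based ranking referred to above can be reproduced from the coefficients in Table 6. The sketch below assumes that feature index i in Table 6 corresponds to predictor Xi and that importance is ranked by absolute coefficient size; both are assumptions made for illustration.

```python
# Lasso coefficients copied from Table 6, keyed by feature index.
lasso_coefs = {
    1: -0.00043049, 2: -0.06262616, 3: 0.0, 4: -0.07939336,
    5: 0.04721492, 6: 0.01983738, 7: 0.06509673, 8: 0.13018794,
    9: 0.11120266, 10: -0.01828389, 11: -0.01760688, 12: -0.01037067,
    13: -0.00282541, 14: 0.0, 15: -0.00454788, 16: -0.00663296,
    17: -0.00524908, 18: 0.02863104, 19: 0.05021244,
}

# Assumption: feature index i corresponds to predictor Xi.
dropped = [f"X{i}" for i, c in lasso_coefs.items() if c == 0]

# Rank the surviving features by the absolute size of their coefficient.
ranked = sorted(
    (i for i, c in lasso_coefs.items() if c != 0),
    key=lambda i: abs(lasso_coefs[i]),
    reverse=True,
)
ranked_names = [f"X{i}" for i in ranked]

print("Shrunk to zero:", dropped)
print("Largest absolute coefficients:", ranked_names[:5])
```

Under these assumptions, X3 and X14 are shrunk to zero, which matches their absence from the Type II analysis in Table 9, while X8, X9, and X4 carry the largest absolute coefficients.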

5.5.2. Findings from Interaction Effects Testing

Logistic regression models incorporating interaction terms showed the following:
  • X8 × X15 (financial stress × electricity cost perception): Significant interaction (p < 0.003, OR = 1.341), indicating that economic hardship combined with high-cost perception substantially increases theft likelihood.
  • X9 × X6 (tampering awareness × billing transparency): The interaction effect was not statistically significant (p = 0.217), implying that billing transparency alone may not counteract fraud behavior in high-awareness groups.
  • X10 × X11 (homeownership status × payment type): Moderately significant interaction (p = 0.045, OR = 1.231), suggesting that tenants on postpaid billing are more likely to engage in theft.
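To make the reported interaction odds ratio concrete, the sketch below shows how a product term enters a logistic model and how its odds ratio is read. It assumes 0/1 coding for both factors and uses hypothetical main-effect coefficients, since the main effects of the interaction model are not reported here; only the interaction odds ratio of 1.341 is taken from the text.

```python
import math

# Hypothetical main-effect coefficients (NOT from the paper); only the
# interaction odds ratio (1.341 for X8 x X15) comes from the text.
BETA_STRESS = 0.50
BETA_HIGH_COST = 0.30
BETA_INTERACTION = math.log(1.341)


def design_row(stress, high_cost):
    """Main effects plus their product term, for 0/1 coded indicators."""
    return [stress, high_cost, stress * high_cost]


def odds_vs_reference(stress, high_cost):
    """Odds relative to a household with neither condition."""
    betas = [BETA_STRESS, BETA_HIGH_COST, BETA_INTERACTION]
    linear = sum(b * x for b, x in zip(betas, design_row(stress, high_cost)))
    return math.exp(linear)


both = odds_vs_reference(1, 1)
separate = odds_vs_reference(1, 0) * odds_vs_reference(0, 1)
# The ratio isolates the extra multiplicative effect of the combination.
print(f"joint / product of separate effects = {both / separate:.3f}")
```

Whatever the main effects are, the combined odds exceed the product of the separate effects by the interaction factor, here 1.341.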

5.5.3. Implications for Theft Mitigation Policies

Our findings indicate the following:
  • Dynamic pricing strategies could help lower-income households manage payments effectively, reducing theft likelihood (X8 × X15).
  • Billing transparency alone is insufficient to deter tampering, requiring stronger fraud monitoring systems (X9 × X6).
  • Tenant-focused interventions, such as incentives for prepaid adoption, could lower theft risk among non-owner households (X10 × X11).

5.6. Policy Recommendations and Practical Implementation

Given the findings, multi-pronged interventions must integrate technical, behavioral, and regulatory measures.

5.6.1. Infrastructure Investment

To enhance grid stability and limit opportunities for illegal connections, the following are recommended:
  • Smart grid modernization to enhance reliability and reduce theft (X4: supply quality).
  • Tamper-resistant meter installations to prevent unauthorized access (X9: tampering awareness).
  • Renewable energy projects to mitigate reliance on illegal connections (X5: backup alternatives).

5.6.2. Behavioral Interventions

Electricity theft is often driven by financial hardship and lack of awareness. Therefore, the following measures should be considered:
  • Consumer awareness programs to discourage tampering (X9: awareness).
  • Flexible payment structures for economically vulnerable households (X8: financial stress).
  • Incentivized transition programs for informal electricity users.

5.6.3. Regulatory Enforcement

To strengthen compliance and deter illegal connections, the following should be considered:
  • Strengthening Legal Frameworks: Stricter penalties should be introduced for electricity theft while ensuring fair enforcement (X9: awareness of tampering).
  • Enhanced Billing Transparency: Utility companies should improve their billing accuracy and accessibility to reduce billing disputes (X6: receipt of electricity bills).
  • Fraud Detection Strategies: Investment should be made in AI-driven monitoring techniques to detect theft patterns and prevent illegal meter tampering.

5.7. Alternative and Future Research

5.7.1. Improving Threshold Optimization for Classification

The default classification threshold (0.50) used in logistic regression may not be optimal for electricity theft detection. ROC curve analysis reveals variability across bootstrap samples, suggesting that dynamic threshold adjustments could enhance model sensitivity and specificity.
To refine classification precision, Youden’s Index (J = sensitivity + specificity − 1) could be employed to select the cutoff that maximizes J, balancing false positives and false negatives more effectively than the fixed 0.50 default.
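A minimal sketch of this cutoff selection follows. It scans candidate thresholds and keeps the one that maximizes Youden's J = sensitivity + specificity − 1. The labels and predicted probabilities are toy values invented for illustration, not study data, and the function name is hypothetical.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_t, best_j = None, -1.0
    # Every observed score is a candidate cutoff; predict "Yes" when score >= t.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (invented): 1 = theft reported, 0 = no theft.
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.80, 0.95]

best_threshold, best_j = youden_threshold(labels, probs)
print(f"cutoff = {best_threshold:.2f}, J = {best_j:.2f}")
```

In a bootstrap setting, the same function could be applied to each resample to see how stable the selected cutoff is across samples.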

5.7.2. Exploring Alternative Statistical Techniques

While logistic regression provided interpretable, policy-relevant insights, future studies could benefit from the following:
  • Ensemble methods: The present model achieved a classification accuracy of 89.35%, with high specificity (91.18%) for identifying “No” and sensitivity (86.71%) for identifying “Yes”; Random Forest and XGBoost could additionally capture the non-linear relationships in theft prediction.
  • Bayesian hierarchical modeling: This would help refine the probabilistic estimations of theft likelihood.
  • Recurrent Neural Networks (RNNs): RNNs could be used to analyze long-term behavioral patterns in electricity theft.
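The accuracy, sensitivity, and specificity quoted in the first item follow directly from a confusion matrix. The sketch below illustrates the arithmetic with counts that are back-calculated to be consistent with the class totals in Table 4 (158 “Yes”, 227 “No”) and the reported percentages; these counts are an assumption made for illustration and are not reported by the authors in this section.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall for 'Yes'), specificity (recall for 'No')."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed counts, chosen to match the Table 4 class totals (158 Yes, 227 No).
metrics = classification_metrics(tp=137, fn=21, tn=207, fp=20)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these assumed counts, accuracy and sensitivity reproduce the reported 89.35% and 86.71%, and specificity evaluates to 91.19%, within rounding of the reported 91.18%.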

5.7.3. Limitations and Future Research Directions

Despite the model’s strong predictive accuracy, several limitations should be considered when interpreting the findings:
  • Residual sampling biases may persist despite the structured random sampling design. For instance, there may be underrepresentation of peripheral or densely populated informal settlements where unmetered or illegal access to electricity is more common. Additionally, the dataset may have excluded off-grid households or informal users lacking formal billing, and response bias may be present due to the sensitive nature of theft-related questions.
  • Self-reported survey data may be subject to social desirability or recall bias, particularly regarding sensitive behaviors like meter tampering. These limitations highlight the need for cross-regional validation using mixed-method approaches or objective consumption data when available.
  • Some predictors—including X3, X10, X13, and X18—exhibited weaker statistical significance, suggesting the need for stepwise regression refinements or interaction modeling in future studies.
  • Finally, electricity theft is inherently dynamic. Future research should integrate time-series modeling to capture behavioral changes and theft trends over time, particularly in response to infrastructure upgrades, policy reforms, or energy access interventions.
To improve generalizability and robustness, we recommend that future studies adopt oversampling strategies, prioritize the inclusion of targeted subpopulations, and leverage administrative data to supplement survey-based insights.

6. Conclusions

This study provides a comprehensive quantitative and behavioral analysis of electricity theft in Kinshasa, DRC, emphasizing its technical and socioeconomic drivers. The methodology incorporated random bootstrap sampling to enhance model reliability, while power analysis validated the adequacy of the sample size. The logistic regression model, refined via Lasso feature selection, identified electricity supply quality (X4), financial stress (X8), tampering awareness (X9), and billing transparency (X6) as significant theft predictors. Households facing economic constraints and unreliable service demonstrated higher theft likelihood, reinforcing the need for flexible payment systems and consumer protection measures.
To combat electricity theft, the following strategic interventions must be incorporated:
  • Infrastructure upgrades, including grid modernization and smart meter deployment.
  • Behavioral-focused policies, such as public awareness programs and community engagement initiatives.
  • Regulatory enforcement, including stricter penalties and fraud detection mechanisms using AI-driven monitoring.
Future research should further explore the interaction effects between socioeconomic factors, optimize classification thresholds, and assess alternative statistical techniques, including Bayesian modeling and ensemble learning approaches. By integrating technical advancements with behavioral insights, utility providers can develop more effective anti-theft strategies, ensuring equitable energy distribution and sustainable economic progress.

Author Contributions

Conceptualization, P.K.; Formal analysis, P.K.; Investigation, P.K.; Writing—draft, P.K.; Writing—review & editing, P.B.; Supervision, review & editing, P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Faculty of Engineering and the Built Environment, University of Johannesburg.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Louw, Q.; Bokoro, P. An alternative technique for the detection and mitigation of electricity theft in South Africa. SAIEE Afr. Res. J. 2019, 110, 209–216.
  2. Saleh, A.M.; István, V.; Khan, M.A.; Waseem, M.; Ahmed, A.N.A. Power system stability in the Era of energy Transition: Importance, Opportunities, Challenges, and future directions. Energy Convers. Manag. X 2024, 24, 100820.
  3. DR Congo Court of Auditors. The 2022 Annual Public Report. Available online: https://www.ccomptes.fr/fr/publications/le-rapport-public-annuel-2022 (accessed on 8 June 2025).
  4. UNDP/DR Congo. DRC Statistical Yearbook 2020. Available online: https://www.undp.org/fr/drcongo/publications/annuaire-statistique-rdc-2020 (accessed on 8 June 2025).
  5. Zulu, C.L.; Dzobo, O. Real-time power theft monitoring and detection system with double connected data capture system. Electr. Eng. 2023, 105, 3065–3083.
  6. Saini, M.; Khan, S.; Singh, S.; Gupta, R.; Upadhyay, P.; Soni, S. Smart Grid: Problems, Avenues for Study & Attainable Solutions. In Proceedings of the 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021; pp. 513–518.
  7. Kim, S.; Sun, Y.; Lee, S.; Seon, J.; Hwang, B.; Kim, J.; Kim, J.; Kim, K.; Kim, J. Data-Driven Approaches for Energy Theft Detection: A Comprehensive Review. Energies 2024, 17, 3057.
  8. Abro, S.A.; Hua, L.G.; Laghari, J.A.; Bhayo, M.A.; Memon, A.A. Machine learning-based electricity theft detection using support vector machines. IJECE 2024, 14, 1240.
  9. Razavi, R.; Fleury, M. Socio-economic predictors of electricity theft in developing countries: An Indian case study. Energy Sustain. Dev. 2019, 49, 1–10.
  10. Kgaphola, P.M.; Marebane, S.M.; Hans, R.T. Electricity Theft Detection and Prevention Using Technology-Based Models: A Systematic Literature Review. Electricity 2024, 5, 334–350.
  11. Bhakta, P.; Debnath, S.; Debnath, P.; Das, P.; Pal, S. Power Theft Detection System. IJCRT 2022, 10.
  12. Dehghanpour, K.; Wang, Z.; Wang, J.; Yuan, Y.; Bu, F. A Survey on State Estimation Techniques and Challenges in Smart Distribution Systems. IEEE Trans. Smart Grid 2019, 10, 2312–2322.
  13. Zhai, B.; Yang, D.; Zhou, B.; Li, G. Distribution System State Estimation Based on Power Flow-Guided GraphSAGE. Energies 2024, 17, 4317.
  14. Abbasi, A.; Sultan, K.; Aziz, M.A.; Khan, A.U.; Khalid, H.A.; Guerrero, J.M.; Zafar, B.A. A Novel Dynamic Appliance Clustering Scheme in a Community Home Energy Management System for Improved Stability and Resiliency of Microgrids. IEEE Access 2021, 9, 142276–142288.
  15. Bagundang, E.; Rael, C. Clustering Commercial And Residential Electricity Consumption Using K-Means Algorithm. Int. J. Sci. Technol. Res. 2021, 10, 8–11.
  16. Sen, A.; Yang, N.-C. Power Theft Detection Using Advanced Neural Network in Three-phase Distribution Systems. IEEE Trans. Instrum. Meas. 2024, 73, 1–10.
  17. Nayak, R. Employing Feature Extraction, Feature Selection, and Machine Learning to Classify Electricity Consumption as Normal or Electricity Theft. SN Comput. Sci. 2023, 4, 1–15.
  18. Bello, H.O.; Idemudia, C.; Iyelolu, T.V. Integrating machine learning and blockchain: Conceptual frameworks for real-time fraud detection and prevention. World J. Adv. Res. Rev. 2024, 23, 56–68.
  19. Hu, Y.; Zhang, Y.; Huang, T.; Hu, Z.; Fan, Z.; Li, C. A Detection Method for Electricity Theft Based on Random Forest Algorithm. In Proceedings of the 2020 10th International Conference on Power and Energy Systems (ICPES), Chengdu, China, 25–27 December 2020; pp. 553–557.
  20. Yip, S.-C.; Wong, K.; Hew, W.-P.; Gan, M.-T.; Phan, R.C.-W.; Tan, S.-W. Detection of energy theft and defective smart meters in smart grids using linear regression. Int. J. Electr. Power Energy Syst. 2017, 91, 230–240.
  21. Sasmoko, R.P.; Setyonegoro, M.I.B.; Hidayah, I. Electricity Theft Detection Using K-means Clustering in Electricity Information System. In Proceedings of the 2024 International Conference on Smart Computing, IoT and Machine Learning (SIML), Surakarta, Indonesia, 6–7 June 2024; pp. 316–321.
  22. Yang, Z.; Liu, L.; Li, N.; Li, H. A self-decision ant colony clustering algorithm for electricity theft detection. Eng. Appl. Artif. Intell. 2024, 133, 108442.
  23. Žarković, M.; Dobrić, G. Artificial Intelligence for Energy Theft Detection in Distribution Networks. Energies 2024, 17, 1580.
  24. Qi, R.; Zheng, J.; Luo, Z.; Li, Q. A Novel Unsupervised Data-Driven Method for Electricity Theft Detection in AMI Using Observer Meters. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
  25. Kawoosa, A.I.; Prashar, D. Application of XGBoost ensemble method for energy theft detection in Smart Energy Meters. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022; pp. 1–6.
  26. Xu, L.; Shao, Z.; Chen, F. A combined unsupervised learning approach for electricity theft detection and loss estimation. IET Energy Syst. Integr. 2023, 5, 213–227.
  27. Khalid, A.; Mustafa, G.; Rana, M.R.R.; Alshahrani, S.M.; Alymani, M. RNN-BiLSTM-CRF based amalgamated deep learning model for electricity theft detection to secure smart grids. PeerJ Comput. Sci. 2024, 10, e1872.
  28. Wang, Y.; Jin, S.; Cheng, M. A Convolution–Non-Convolution Parallel Deep Network for Electricity Theft Detection. Sustainability 2023, 15, 10127.
  29. Yang, J.; Wei, M.; Huang, D. A High-loss Power Line Theft Detection Method Based on Segmented Dynamic Time Warping Distance. J. Phys. Conf. Ser. 2024, 2717, 012023.
  30. Wei, L.; Sundararajan, A.; Sarwat, A.I.; Biswas, S.; Ibrahim, E. A distributed intelligent framework for electricity theft detection using benford’s law and stackelberg game. In Proceedings of the 2017 Resilience Week (RWS), Wilmington, DE, USA, 18–22 September 2017; pp. 5–11.
  31. Amin, S.; Schwartz, G.A.; Cardenas, A.A.; Sastry, S.S. Game-Theoretic Models of Electricity Theft Detection in Smart Utility Networks: Providing New Capabilities with Advanced Metering Infrastructure. IEEE Control Syst. 2015, 35, 66–81.
  32. Sethi, A.R.; Amin, S.; Schwartz, G. Value of intrusion detection systems for countering energy fraud. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 2739–2746.
  33. Babar, Z.; Jamil, F.; Haq, W. Consumer’s perception towards electricity theft: A case study of Islamabad and Rawalpindi using a path analysis. Energy Policy 2022, 169, 113189.
  34. Pulz, J.; Muller, R.B.; Romero, F.; Meffe, A.; Neto, Á.F.G.; Jesus, A.S. Fraud detection in low-voltage electricity consumers using socio-economic indicators and billing profile in smart grids. CIRED - Open Access Proc. J. 2017, 2017, 2300–2303.
  35. Jamil, F.; Ahmad, E. Policy considerations for limiting electricity theft in the developing countries. Energy Policy 2019, 129, 452–458.
  36. Hussain, M.; Iacovides, I.; Lawton, T.; Sharma, V.; Porter, Z.; Cunningham, A.; Habli, I.; Hickey, S.; Jia, Y.; Morgan, P.; et al. Development and translation of human-AI interaction models into working prototypes for clinical decision-making. In Proceedings of the Designing Interactive Systems Conference (DIS’24), Copenhagen, Denmark, 1–5 July 2024; pp. 1607–1619.
  37. Pathak, A.; Bansal, V. AI as decision aid or delegated agent: The effects of trust dimensions on the adoption of AI digital agents. Comput. Hum. Behav. Artif. Hum. 2024, 2, 100094.
  38. De Souza, M.A.; Pereira, J.L.R.; Alves, G.D.O.; De Oliveira, B.C.; Melo, I.D.; Garcia, P.A.N. Detection and identification of energy theft in advanced metering infrastructures. Electr. Power Syst. Res. 2020, 182, 106258.
  39. Saini, S. Social and behavioral aspects of electricity theft: An explorative review. Int. J. Res. Econ. Soc. Sci. 2017, 7, 26–37.
  40. Maraden, Y.; Wibisono, G.; Nugraha, I.G.D.; Sudiarto, B.; Jufri, F.H. Enhancing Electricity Theft Detection through K-Nearest Neighbors and Logistic Regression Algorithms with Synthetic Minority Oversampling Technique: A Case Study on State Electricity Company (PLN) Customer Data. Energies 2023, 16, 5405.
  41. Olatunde, T.M.; Okwandu, A.C.; Akande, D.O. Reviewing the impact of energy-efficient appliances on household consumption. Int. J. Sci. Technol. Res. Arch. 2024, 6, 1–11.
  42. EBSCO. Appliances and energy consumption. Available online: https://www.ebsco.com/research-starters/power-and-energy/appliances-and-energy-consumption (accessed on 8 June 2025).
  43. Salami, H.; Okpara, K.; Choochuay, C.; Kuaanan, T.; Akeju, D.; Shitta, M. Domestic energy consumption, theories, and policies: A systematic review. Environ. Dev. Sustain. 2023, 27, 5821–5867.
  44. Mahmood, M.; Chowdhury, P.; Yeassin, R.; Hasan, M.; Ahmad, T.; Chowdhury, N.U.R. Impacts of digitalization on smart grids, renewable energy, and demand response: An updated review of current applications. Energy Convers. Manag. X 2024, 24, 100790.
Figure 1. AUC trend per bootstrap sample.
Figure 2. ROC curve.
Table 1. Systematic literature review.
| Approach | References | Strengths | Weaknesses | Link to This Study |
|---|---|---|---|---|
| Hardware-Based Methods | [5] | Improved detection accuracy through redundancy | Limited adaptability to outdated grid infrastructure | Complements hardware detection by exploring theft motivations |
| | [10] | Effective smart metering | Lack of integration of behavioral factors | Bridges technological and behavioral insights |
| | [11] | Provide affordable theft monitoring solutions | Scalability issues in dense urban settings like Kinshasa | Offers a predictive framework complementing hardware approaches |
| | [7] | Effectively identify fraudulent connections | High deployment costs hinder mass implementation | Assesses affordability’s role in theft by incorporating economic stress factors |
| Network Data Analysis | [12,13] | High accuracy in anomaly detection | Struggles to pinpoint theft at consumer level | Enhance predictive precision with fine-grained household theft metrics |
| | [14] | Classifies irregular consumption patterns effectively | Cannot infer causality behind theft | Examines why theft occurs rather than just detecting it |
| Consumer Meter AI | [19,20] | Handles imbalanced datasets well | Computationally demanding | Use logistic regression for scalability and policy application |
| | [24,25,26,27] | Enhances predictive robustness with ensemble AI models | Complexity limits real-world deployment | Simplify predictive modeling for practical usability |
| Hybrid Methods | [29,38] | Improved detection accuracy using hybrid models | Face challenges in data integration | Streamline data processing via logistic regression modeling |
| Game Theory | [30,31,32] | Strategically optimizes theft deterrence | Assumes rational consumer behavior, which may not hold universally | Integrate real-world socio-economic stress factors into theft prediction |
| Behavioral Analysis | [9,34,35] | Identifies income and infrastructure gaps as theft enablers | Lacks predictive modeling | Quantify theft likelihood using logistic regression |
| | [36,37] | Examines AI-driven fraud detection through consumer psychology | High implementation complexity | Provide practical behavioral recommendations, including education campaigns |

Table 2. Variable descriptions and modalities.
| ID | Variable Name | Description | Modalities |
|---|---|---|---|
| Y | Electricity theft (Target) | Main outcome variable | No/Yes |
| X1 | Household size | Total number of individuals in the household | Continuous (count-based) |
| X2 | Number of appliances | Total count of electrical appliances used | Continuous (count-based) |
| X3 | Problems encountered with meter | Refers to the total count of problems or issues experienced with electricity meters | Continuous (count-based) |
| X4 | Electricity supply quality | Perceived supply quality (scale-based) | 0 to 10 |
| X5 | Alternatives used during power outages | Total number of possible backup energy sources used | Continuous (count-based) |
| X6 | Receipt of electricity bills | Whether households receive bills regularly | No/Yes |
| X7 | Experience with electricity meter | Prior experience managing electricity meters | No/Yes |
| X8 | Stress about paying bills | Financial stress related to paying bills | No/Yes |
| X9 | Awareness of tampering | Knowledge of electricity theft methods | No/Yes |
| X10 | Homeownership status | Ownership distinction | Owner/Tenant |
| X11 | Payment type | The payment method associated with a customer’s account | Prepaid/Postpaid |
| X12 | Type of electricity Meter | Type of installed meter | Flat-Rate Payment/Mechanical/Electronic/Smart |
| X13 | Electricity cost | Household electricity expenses | Less than USD 10/Between USD 10 and USD 30/Between USD 30 and USD 50/Between USD 50 and USD 100/Between USD 100 and USD 200/More than USD 200 |
| X14 | Household monthly Income | Monthly earnings category | Less than USD 1000/Between USD 1000 and USD 3000/Between USD 3000 and USD 5000/More than USD 5000/Not specific |
| X15 | Perception of electricity costs | Subjective evaluation of electricity pricing | Low/Medium/High/Uncertain |
| X16 | Technical ability to manipulate meters | Self-rated electrical skills | Very low/Low/Medium/High/Very high |
| X17 | Education level | Highest education achieved | None/Primary/Secondary/Higher Education/Hesitant |
| X18 | Difficulty paying bills | Frequency of bill payment struggles | Never/Rarely/Sometimes/Always/Often |
| X19 | Opinion on theft Penalties | Whether respondents consider penalties adequate | No/Yes |

Table 3. Descriptive statistics (quantitative variables).
| Variable | Observations | Obs. with Missing Data | Obs. without Missing Data | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|---|---|---|
| X1 | 385 | 0 | 385 | 1 | 10 | 4.699 | 2.174 |
| X2 | 385 | 0 | 385 | 1 | 11 | 3.418 | 2.311 |
| X3 | 385 | 0 | 385 | 0 | 3 | 0.587 | 0.632 |
| X4 | 385 | 0 | 385 | 1 | 10 | 5.036 | 2.354 |
| X5 | 385 | 0 | 385 | 1 | 4 | 1.091 | 0.353 |

Table 4. Descriptive analysis of qualitative variables.
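| Variable | Category | Count | Frequency (%) |
|---|---|---|---|
| Y | No | 227 | 58.961 |
| | Yes | 158 | 41.039 |
| X6 | No | 125 | 32.468 |
| | Yes | 260 | 67.532 |
| X7 | No | 214 | 55.584 |
| | Yes | 171 | 44.416 |
| X8 | No | 188 | 48.831 |
| | Yes | 197 | 51.169 |
| X9 | No | 177 | 45.974 |
| | Yes | 208 | 54.026 |
| X10 | Owner | 112 | 29.091 |
| | Tenant | 273 | 70.909 |
| X11 | Postpaid | 163 | 42.338 |
| | Prepaid | 222 | 57.662 |
| X12 | Electronic | 128 | 33.247 |
| | Mechanical | 108 | 28.052 |
| | Smart meters | 41 | 10.649 |
| | Flat-rate payment | 108 | 28.052 |
| X13 | Less than USD 10 | 101 | 26.234 |
| | Between USD 10 and USD 30 | 203 | 52.727 |
| | Between USD 30 and USD 50 | 56 | 14.545 |
| | Between USD 50 and USD 100 | 18 | 4.675 |
| | Between USD 100 and USD 200 | 5 | 1.299 |
| | More than USD 200 | 2 | 0.519 |
| X14 | Less than USD 1000 | 137 | 35.584 |
| | Between USD 1000 and USD 3000 | 75 | 19.481 |
| | Between USD 3000 and USD 5000 | 13 | 3.377 |
| | More than USD 5000 | 4 | 1.039 |
| | Not specific | 156 | 40.519 |
| X15 | Uncertain | 72 | 18.701 |
| | Low | 58 | 15.065 |
| | Medium | 192 | 49.870 |
| | High | 63 | 16.364 |
| X16 | Very low | 37 | 9.610 |
| | Low | 90 | 23.377 |
| | Medium | 238 | 61.818 |
| | High | 19 | 4.935 |
| | Very high | 1 | 0.260 |
| X17 | Hesitant | 22 | 5.714 |
| | Low | 43 | 11.169 |
| | Medium | 176 | 45.714 |
| | High | 90 | 23.377 |
| | Very high | 54 | 14.026 |
| X18 | Never | 175 | 45.455 |
| | Rarely | 100 | 25.974 |
| | Sometimes | 64 | 16.623 |
| | Often | 37 | 9.610 |
| | Always | 9 | 2.338 |
| X19 | No | 231 | 60.000 |
| | Yes | 154 | 40.000 |
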
Table 5. Multicollinearity statistics.
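| Variable_ID | Modality | Tolerance | VIF |
|---|---|---|---|
| X1 | | 0.7618 | 1.3127 |
| X2 | | 0.7026 | 1.4233 |
| X3 | | 0.5855 | 1.7079 |
| X4 | | 0.6750 | 1.4814 |
| X5 | | 0.8466 | 1.1812 |
| X6 | No | 0.5008 | 1.9967 |
| | Yes | 0.5008 | 1.9967 |
| X7 | No | 0.1338 | 7.4753 |
| | Yes | 0.1338 | 7.4753 |
| X8 | No | 0.3837 | 2.6062 |
| | Yes | 0.3837 | 2.6062 |
| X9 | No | 0.4230 | 2.3639 |
| | Yes | 0.4230 | 2.3639 |
| X10 | Owner | 0.8394 | 1.1914 |
| | Tenant | 0.8394 | 1.1914 |
| X11 | Postpaid | 0.6725 | 1.4869 |
| | Prepaid | 0.6725 | 1.4869 |
| X12 | Electronic | 0.7457 | 1.3411 |
| | Mechanical | 0.8233 | 1.2146 |
| | Smart meters | 0.7940 | 1.2594 |
| | Flat-rate payment | 0.4813 | 2.0776 |
| X13 | Less than USD 10 | 0.6705 | 1.4915 |
| | Between USD 10 and USD 30 | 0.7491 | 1.3349 |
| | Between USD 30 and USD 50 | 0.7998 | 1.2503 |
| | Between USD 50 and USD 100 | 0.8571 | 1.1667 |
| | Between USD 100 and USD 200 | 0.8542 | 1.1706 |
| | More than USD 200 | 0.9047 | 1.1053 |
| X14 | Less than USD 1000 | 0.8398 | 1.1907 |
| | Between USD 1000 and USD 3000 | 0.8324 | 1.2014 |
| | Between USD 3000 and USD 5000 | 0.8244 | 1.2131 |
| | More than USD 5000 | 0.8965 | 1.1154 |
| | Not specific | 0.6686 | 1.4956 |
| X15 | Uncertain | 0.3513 | 2.8469 |
| | Low | 0.8593 | 1.1637 |
| | Medium | 0.6108 | 1.6372 |
| | High | 0.7559 | 1.3230 |
| X16 | Very low | 0.8086 | 1.2367 |
| | Low | 0.8350 | 1.1976 |
| | Medium | 0.7273 | 1.3750 |
| | High | 0.8499 | 1.1766 |
| | Very high | 0.9074 | 1.1020 |
| X17 | Hesitant | 0.8351 | 1.1975 |
| | Low | 0.6624 | 1.5096 |
| | Medium | 0.6051 | 1.6525 |
| | High | 0.6709 | 1.4904 |
| | Very high | 0.4199 | 2.3815 |
| X18 | Never | 0.3203 | 3.1222 |
| | Rarely | 0.6383 | 1.5667 |
| | Sometimes | 0.7095 | 1.4094 |
| | Often | 0.8178 | 1.2227 |
| | Always | 0.8922 | 1.1208 |
| X19 | No | 0.1267 | 7.8926 |
| | Yes | 0.1267 | 7.8926 |
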
Table 6. Lasso regression feature selection results for electricity theft detection.
| Feature Index | Lasso Coefficient |
|---|---|
| 1 | −0.00043049 |
| 2 | −0.06262616 |
| 3 | 0 |
| 4 | −0.07939336 |
| 5 | 0.04721492 |
| 6 | 0.01983738 |
| 7 | 0.06509673 |
| 8 | 0.13018794 |
| 9 | 0.11120266 |
| 10 | −0.01828389 |
| 11 | −0.01760688 |
| 12 | −0.01037067 |
| 13 | −0.00282541 |
| 14 | 0 |
| 15 | −0.00454788 |
| 16 | −0.00663296 |
| 17 | −0.00524908 |
| 18 | 0.02863104 |
| 19 | 0.05021244 |

Table 7. Logistic regression model for variable Yes—model fit and performance metrics.
| Statistic | Independent Model | Complete Model |
|---|---|---|
| Observations | 385 | 385 |
| Sum of Weights | 385.000 | 385.000 |
| Degrees of Freedom | 384 | 350 |
| −2 Log-Likelihood | 521.290 | 179.857 |
| McFadden’s R² | 0.000 | 0.655 |
| Cox and Snell R² | 0.000 | 0.588 |
| Nagelkerke R² | 0.000 | 0.759 |
| Akaike Information Criterion (AIC) | 523.290 | 249.857 |
| Schwarz Bayesian Criterion (SBC) | 527.243 | 388.221 |
| Iterations | 0 | 16 |

Table 8. Model evaluation—null hypothesis test H0: Pr (Yes = Yes) = 0.41.
| Statistic | Degrees of Freedom | Chi-Square | Pr > Chi2 |
|---|---|---|---|
| −2 Log(Likelihood) | 34 | 341.433 | <0.0001 |
| Score | 34 | 253.330 | <0.0001 |
| Wald | 34 | 97.362 | <0.0001 |

Table 9. Type II analysis—statistical significance of predictors for variable Y.
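| Source | Degrees of Freedom | Chi-Square (Wald) | Pr > Wald | Chi-Square (LR) | Pr > LR |
|---|---|---|---|---|---|
| X1 | 1 | 0.102 | 0.749 | 0.102 | 0.749 |
| X2 | 1 | 3.895 | 0.048 | 4.048 | 0.044 |
| X4 | 1 | 15.021 | 0.000 | 17.353 | <0.0001 |
| X5 | 1 | 12.691 | 0.000 | 14.461 | 0.000 |
| X6 | 1 | 3.248 | 0.072 | 3.288 | 0.070 |
| X7 | 1 | 1.610 | 0.204 | 1.532 | 0.216 |
| X8 | 1 | 10.793 | 0.001 | 11.629 | 0.001 |
| X9 | 1 | 9.144 | 0.002 | 9.544 | 0.002 |
| X10 | 1 | 0.671 | 0.413 | 0.672 | 0.412 |
| X11 | 1 | 0.668 | 0.414 | 0.676 | 0.411 |
| X12 | 3 | 1.535 | 0.674 | 1.540 | 0.673 |
| X13 | 5 | 9.116 | 0.105 | 12.656 | 0.027 |
| X15 | 3 | 8.831 | 0.032 | 9.853 | 0.020 |
| X16 | 4 | 0.817 | 0.936 | 1.669 | 0.796 |
| X17 | 4 | 0.220 | 0.994 | 0.223 | 0.994 |
| X18 | 4 | 6.329 | 0.176 | 6.624 | 0.157 |
| X19 | 1 | 0.610 | 0.435 | 0.625 | 0.429 |
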
Table 10. Hosmer–Lemeshow goodness-of-fit test results.
| Statistic | Chi-Square | Degrees of Freedom | Pr > Chi-Square |
|---|---|---|---|
| Hosmer–Lemeshow | 8.641 | 8 | 0.374 |

Table 11. Logistic regression results for variable Y—parameter estimates and odds ratios.
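| Source | Value | Standard Error | Wald Chi-Square | Pr > Chi2 | Lower Bound (95%) | Upper Bound (95%) | Odds Ratio | Lower Bound OR (95%) | Upper Bound OR (95%) |
|---|---|---|---|---|---|---|---|---|---|
| Constant | −7.667 | 3871.355 | 0.000 | 0.998 | −7595.383 | 7580.049 | | | |
| X1 | −0.033 | 0.103 | 0.102 | 0.749 | −0.234 | 0.168 | 0.968 | 0.791 | 1.184 |
| X2 | −0.191 | 0.097 | 3.895 | 0.048 | −0.380 | −0.001 | 0.826 | 0.684 | 0.999 |
| X4 | −0.391 | 0.101 | 15.021 | 0.000 | −0.589 | −0.193 | 0.676 | 0.555 | 0.824 |
| X5 | 1.989 | 0.558 | 12.691 | 0.000 | 0.895 | 3.083 | 7.308 | 2.447 | 21.829 |
| X6: No | −1.111 | 0.616 | 3.248 | 0.072 | −2.318 | 0.097 | 0.329 | 0.098 | 1.102 |
| X6: Yes | 0.000 | 0.000 | | | | | | | |
| X7: No | −1.193 | 0.940 | 1.610 | 0.204 | −3.037 | 0.650 | 0.303 | 0.048 | 1.915 |
| X7: Yes | 0.000 | 0.000 | | | | | | | |
| X8: No | −1.781 | 0.542 | 10.793 | 0.001 | −2.844 | −0.718 | 0.168 | 0.058 | 0.487 |
| X8: Yes | 0.000 | 0.000 | | | | | | | |
| X9: No | −1.578 | 0.522 | 9.144 | 0.002 | −2.601 | −0.555 | 0.206 | 0.074 | 0.574 |
| X9: Yes | 0.000 | 0.000 | | | | | | | |
| X10: Owner | −0.385 | 0.470 | 0.671 | 0.413 | −1.306 | 0.536 | 0.681 | 0.271 | 1.710 |
| X10: Tenant | 0.000 | 0.000 | | | | | | | |
| X11: Postpaid | 0.399 | 0.488 | 0.668 | 0.414 | −0.557 | 1.354 | 1.490 | 0.573 | 3.873 |
| X11: Prepaid | 0.000 | 0.000 | | | | | | | |
| X12: Electronic | −0.273 | 0.672 | 0.166 | 0.684 | −1.590 | 1.043 | 0.761 | 0.204 | 2.838 |
| X12: Mechanical | 0.349 | 0.618 | 0.320 | 0.571 | −0.861 | 1.560 | 1.418 | 0.423 | 4.759 |
| X12: Smart meters | 0.434 | 0.955 | 0.207 | 0.649 | −1.437 | 2.306 | 1.544 | 0.238 | 10.035 |
| X12: Flat-rate payment | 0.000 | 0.000 | | | | | | | |
| X13: Between USD 10 and USD 30 | 12.114 | 3871.355 | 0.000 | 0.998 | −7575.602 | 7599.830 | | | |
| X13: Between USD 100 and USD 200 | 29.456 | 4523.761 | 0.000 | 0.995 | −8836.953 | 8895.866 | | | |
| X13: Between USD 30 and USD 50 | 12.159 | 3871.355 | 0.000 | 0.997 | −7575.557 | 7599.875 | | | |
| X13: Between USD 50 and USD 100 | 11.093 | 3871.355 | 0.000 | 0.998 | −7576.623 | 7598.809 | | | |
| X13: Less than USD 10 | 13.662 | 3871.355 | 0.000 | 0.997 | −7574.054 | 7601.377 | | | |
| X13: More than USD 200 | 0.000 | 0.000 | | | | | | | |
| X15: High | −1.583 | 0.924 | 2.935 | 0.087 | −3.394 | 0.228 | 0.205 | 0.034 | 1.256 |
| X15: Low | −2.526 | 0.861 | 8.610 | 0.003 | −4.213 | −0.839 | 0.080 | 0.015 | 0.432 |
| X15: Medium | −1.772 | 0.792 | 5.009 | 0.025 | −3.325 | −0.220 | 0.170 | 0.036 | 0.802 |
| X15: Uncertain | 0.000 | 0.000 | | | | | | | |
| X16: High | −0.579 | 1.546 | 0.140 | 0.708 | −3.608 | 2.450 | 0.561 | 0.027 | 11.594 |
| X16: Low | 0.446 | 0.839 | 0.283 | 0.595 | −1.199 | 2.092 | 1.563 | 0.302 | 8.099 |
| X16: Medium | 0.114 | 0.782 | 0.021 | 0.884 | −1.419 | 1.646 | 1.120 | 0.242 | 5.186 |
| X16: Very high | −16.697 | 5707.069 | 0.000 | 0.998 | −11,202.348 | 11,168.954 | | | |
| X16: Very low | 0.000 | 0.000 | | | | | | | |
| X17: High | −0.110 | 0.835 | 0.017 | 0.895 | −1.748 | 1.527 | 0.895 | 0.174 | 4.603 |
| X17: Low | 0.263 | 1.106 | 0.056 | 0.812 | −1.905 | 2.431 | 1.301 | 0.149 | 11.367 |
| X17: Medium | −0.009 | 0.817 | 0.000 | 0.991 | −1.610 | 1.592 | 0.991 | 0.200 | 4.913 |
| X17: Hesitant | −0.263 | 1.098 | 0.057 | 0.811 | −2.414 | 1.888 | 0.769 | 0.089 | 6.609 |
| X17: Very high | 0.000 | 0.000 | | | | | | | |
| X18: Always | 1.001 | 1.499 | 0.446 | 0.504 | −1.936 | 3.938 | 2.721 | 0.144 | 51.310 |
| X18: Never | −1.731 | 0.782 | 4.906 | 0.027 | −3.263 | −0.199 | 0.177 | 0.038 | 0.819 |
| X18: Often | −0.260 | 0.784 | 0.110 | 0.740 | −1.798 | 1.277 | 0.771 | 0.166 | 3.586 |
| X18: Rarely | −0.752 | 0.612 | 1.508 | 0.219 | −1.952 | 0.448 | 0.471 | 0.142 | 1.565 |
| X18: Sometimes | 0.000 | 0.000 | | | | | | | |
| X19: No | −0.718 | 0.920 | 0.610 | 0.435 | −2.521 | 1.084 | 0.488 | 0.080 | 2.958 |
| X19: Yes | 0.000 | 0.000 | | | | | | | |
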
Table 12. Standardized coefficients (variable Y).
Table 12. Standardized coefficients (variable Y).
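| Source | Value | Standard Error | Wald Chi-Square | Pr > Chi2 | Lower Bound (95%) | Upper Bound (95%) |
|---|---|---|---|---|---|---|
| X1 | −0.039 | 0.123 | 0.102 | 0.749 | −0.280 | 0.202 |
| X2 | −0.243 | 0.123 | 3.895 | 0.048 | −0.484 | −0.002 |
| X4 | −0.507 | 0.131 | 15.021 | 0.000 | −0.763 | −0.251 |
| X5 | 0.386 | 0.108 | 12.691 | 0.000 | 0.174 | 0.599 |
| X6—No | −0.287 | 0.159 | 3.248 | 0.072 | −0.599 | 0.025 |
| X6—Yes | 0.000 | 0.000 |  |  |  |  |
| X7—No | −0.327 | 0.258 | 1.610 | 0.204 | −0.832 | 0.178 |
| X7—Yes | 0.000 | 0.000 |  |  |  |  |
| X8—No | −0.491 | 0.149 | 10.793 | 0.001 | −0.784 | −0.198 |
| X8—Yes | 0.000 | 0.000 |  |  |  |  |
| X9—No | −0.434 | 0.143 | 9.144 | 0.002 | −0.715 | −0.153 |
| X9—Yes | 0.000 | 0.000 |  |  |  |  |
| X10—Owner | −0.096 | 0.118 | 0.671 | 0.413 | −0.327 | 0.134 |
| X10—Tenant | 0.000 | 0.000 |  |  |  |  |
| X11—Postpaid | 0.109 | 0.133 | 0.668 | 0.414 | −0.152 | 0.369 |
| X11—Prepaid | 0.000 | 0.000 |  |  |  |  |
| X12—Electronic | −0.071 | 0.174 | 0.166 | 0.684 | −0.413 | 0.271 |
| X12—Mechanical | 0.087 | 0.153 | 0.320 | 0.571 | −0.213 | 0.386 |
| X12—Smart meters | 0.074 | 0.162 | 0.207 | 0.649 | −0.244 | 0.392 |
| X12—Flat-rate payment | 0.000 | 0.000 |  |  |  |  |
| X13—Between USD 10 and USD 30 | 3.334 | 1065.606 | 0.000 | 0.998 | −2085.215 | 2091.884 |
| X13—Between USD 100 and USD 200 | 1.839 | 282.375 | 0.000 | 0.995 | −551.607 | 555.284 |
| X13—Between USD 30 and USD 50 | 2.363 | 752.497 | 0.000 | 0.997 | −1472.504 | 1477.231 |
| X13—Between USD 50 and USD 100 | 1.291 | 450.591 | 0.000 | 0.998 | −881.851 | 884.433 |
| X13—Less than USD 10 | 3.313 | 938.929 | 0.000 | 0.997 | −1836.954 | 1843.581 |
| X13—More than USD 200 | 0.000 | 0.000 |  |  |  |  |
| X15—High | −0.323 | 0.188 | 2.935 | 0.087 | −0.692 | 0.047 |
| X15—Low | −0.498 | 0.170 | 8.610 | 0.003 | −0.831 | −0.165 |
| X15—Medium | −0.489 | 0.218 | 5.009 | 0.025 | −0.917 | −0.061 |
| X15—Uncertain | 0.000 | 0.000 |  |  |  |  |
| X16—High | −0.069 | 0.185 | 0.140 | 0.708 | −0.431 | 0.293 |
| X16—Low | 0.104 | 0.196 | 0.283 | 0.595 | −0.280 | 0.488 |
| X16—Medium | 0.030 | 0.209 | 0.021 | 0.884 | −0.380 | 0.441 |
| X16—Very high | −0.469 | 160.151 | 0.000 | 0.998 | −314.358 | 313.421 |
| X16—Very low | 0.000 | 0.000 |  |  |  |  |
| X17—High | −0.026 | 0.195 | 0.017 | 0.895 | −0.408 | 0.356 |
| X17—Low | 0.046 | 0.192 | 0.056 | 0.812 | −0.331 | 0.422 |
| X17—Medium | −0.002 | 0.224 | 0.000 | 0.991 | −0.442 | 0.437 |
| X17—Hesitant | −0.034 | 0.140 | 0.057 | 0.811 | −0.309 | 0.242 |
| X17—Very high | 0.000 | 0.000 |  |  |  |  |
| X18—Always | 0.083 | 0.125 | 0.446 | 0.504 | −0.161 | 0.328 |
| X18—Never | −0.475 | 0.215 | 4.906 | 0.027 | −0.896 | −0.055 |
| X18—Often | −0.042 | 0.127 | 0.110 | 0.740 | −0.292 | 0.207 |
| X18—Rarely | −0.182 | 0.148 | 1.508 | 0.219 | −0.472 | 0.108 |
| X18—Sometimes | 0.000 | 0.000 |  |  |  |  |
| X19—No | −0.194 | 0.248 | 0.610 | 0.435 | −0.681 | 0.293 |
| X19—Yes | 0.000 | 0.000 |  |  |  |  |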
Table 13. Model performance metrics for Class 1.
| Dataset | Accuracy | Precision (Class 1) | Recall/Sensitivity (Class 1) | F1-Score (Class 1) |
|---|---|---|---|---|
| Train Set | 0.878 | 0.85 | 0.84 | 0.85 |
| Test Set | 0.778 | 0.78 | 0.72 | 0.75 |
| Validation Set | 0.839 | 0.72 | 0.78 | 0.75 |
Table 14. Model performance metrics for Class 0.
| Dataset | Accuracy | Precision (Class 0) | Recall/Sensitivity (Class 0) | F1-Score (Class 0) |
|---|---|---|---|---|
| Train Set | 0.878 | 0.9 | 0.9 | 0.9 |
| Test Set | 0.778 | 0.78 | 0.83 | 0.8 |
| Validation Set | 0.839 | 0.9 | 0.86 | 0.88 |
Table 15. Model performance metrics across datasets (combined Classes 0 and 1).
| Dataset | Overall Accuracy | Precision (Avg.) | Recall/Sensitivity (Avg.) | F1-Score (Avg.) |
|---|---|---|---|---|
| Train Set | 0.878 | 0.88 | 0.87 | 0.88 |
| Test Set | 0.778 | 0.78 | 0.77 | 0.78 |
| Validation Set | 0.839 | 0.84 | 0.82 | 0.83 |
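
The per-class F1-scores in Tables 13 and 14 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does this for the test-set rows. `f1_score` is a hypothetical helper written for illustration, not code from the study, and agreement is only expected to two decimals because the published inputs are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set (precision, recall) pairs as reported in Tables 13 and 14.
test_set = {"class_1": (0.78, 0.72), "class_0": (0.78, 0.83)}

for label, (precision, recall) in test_set.items():
    print(label, round(f1_score(precision, recall), 2))
```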
Table 16. Residual analysis.
| Dataset | Deviance Residuals (Mean) | Pearson Residuals (Mean) |
|---|---|---|
| Train Set | 0.05 | 0.159 |
| Test Set | 6.574 | −5.076 |
| Validation Set | 0.066 | −0.183 |
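
For a binary logistic model, Pearson and deviance residuals are computed per observation from the observed label and the fitted probability; Table 16 reports their means per dataset. The sketch below applies the standard textbook definitions to made-up labels and probabilities, not to the study's data. The function names and sample values are illustrative assumptions, and the paper does not state which software routine produced its residuals.

```python
import math


def pearson_residual(y, p):
    """Pearson residual for a binary outcome y in {0, 1} and fitted probability 0 < p < 1."""
    return (y - p) / math.sqrt(p * (1 - p))


def deviance_residual(y, p):
    """Deviance residual: signed square root of the observation's deviance contribution."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2 * log_lik), y - p)


# Hypothetical labels and fitted probabilities, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7]

mean_pearson = sum(pearson_residual(y, p) for y, p in zip(labels, probs)) / len(labels)
mean_deviance = sum(deviance_residual(y, p) for y, p in zip(labels, probs)) / len(labels)
print(mean_pearson, mean_deviance)
```

Means close to zero indicate that over- and under-predictions roughly balance out on that dataset.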
