Article
Peer-Review Record

Bridging the Energy Divide: An Analysis of the Socioeconomic and Technical Factors Influencing Electricity Theft in Kinshasa, DR Congo

Energies 2025, 18(13), 3566; https://doi.org/10.3390/en18133566
by Patrick Kankonde 1,2 and Pitshou Bokoro 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 May 2025 / Revised: 16 June 2025 / Accepted: 28 June 2025 / Published: 7 July 2025
(This article belongs to the Section F4: Critical Energy Infrastructure)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thanks for inviting me to review the manuscript entitled “Bridging the energy divide: an analysis of socioeconomic and technical factors influencing electricity theft in Kinshasa, DR Congo”. This study focuses on identifying the factors contributing to electricity theft in Kinshasa. The authors use a logistic regression model on a dataset of 385 observations to analyze the determinants of electricity theft. However, several aspects of the study require significant improvement, as detailed in the comments below.

(1) In the introduction section, the authors provide a lot of information, while the research significance is not clear. By the way, it is better to state the structure of this manuscript in the end of the introduction.

(2) This study lacks a literature review section, please separate it from introduction. Moreover, the literature review on electricity theft detection methods is rather descriptive. It should be in-depth and integrated with the objectives. Additionally, a critical evaluation of the strengths and weaknesses of methods would be beneficial. Some references are helpful, 10.3390/en12132582.

(3) There is a lack of clear identification of research gaps in the literature. The authors need to better justify how their study fills these gaps. This would help to position the research more effectively within the existing body of knowledge.

(4) While stratified sampling was used, sample size of 385 might be insufficient for a city like Kinshasa. The authors should conduct a more detailed power analysis to determine if the sample size is adequate to detect the effects of the variables of interest. Additionally, they could explore the potential biases that may still exist in the sample despite the sampling method. Besides, table 1 should be three-lines table, which looks better.

(5) The use of the nearest-neighbor approach for missing data imputation may introduce bias. They should explore other imputation methods, such as multiple imputation, and compare the results. This would help to assess the robustness of the findings.

(6) The variable selection process based on exploratory analysis and domain relevance is not well-described. The authors should provide more details on how they determined which variables to include in the model. Additionally, the presence of multicollinearity in some variables (X7 and X20) needs to be addressed more thoroughly. Moreover, where are the title of tables in lines 301, 306, 333, 344,349, 350 (please check lines 344-346). Besides, although the Hosmer-Lemeshow test was used to assess model fit, additional validation techniques such as cross-validation should be performed.

(7) The interpretation of the model coefficients and odds ratios could be more in-depth. How do changes in these variables translate into real-world changes in the likelihood of electricity theft? Also, the discussion of the lesser-impact predictors is too brief. They should explore why these variables have a minor influence and whether there are any underlying factors.

(8) The policy recommendations are somewhat simplistic. The authors should develop more comprehensive and actionable recommendations based on the results.

Author Response

Comment 1

“In the introduction section, the authors provide a lot of information, while the research significance is not clear. By the way, it is better to state the structure of this manuscript in the end of the introduction.”

Response 1: Thank you for this valuable suggestion. We have revised the introduction to clearly articulate the research significance, emphasizing the lack of comprehensive studies integrating behavioral and infrastructure-related predictors of electricity theft in Sub-Saharan Africa. We also added a final paragraph summarizing the structure of the paper for improved readability. Revisions made:

  • Page 2, paragraph 5: Statement on research contribution and novelty.
  • Page 2, final paragraph: Summary of manuscript structure.

Comment 2

“This study lacks a literature review section, please separate it from introduction. Moreover, the literature review on electricity theft detection methods is rather descriptive. It should be in-depth and integrated with the objectives. Additionally, a critical evaluation of the strengths and weaknesses of methods would be beneficial. Some references are helpful, 10.3390/en12132582.”

Response 2: We agree with this recommendation and have separated the literature review into its own section. It is now organized into subsections covering supervised, unsupervised, ensemble, and hybrid AI-based methods. We have also expanded the analysis to include critical comparisons of strengths and limitations of each method, including interpretability, scalability, and real-world applicability.

Revisions made: Pages 2–5, Section 2, “Literature Review”, and Subsection 2.5, “Research Gaps and Contribution”.

Comment 3

“There is a lack of clear identification of research gaps in the literature. The authors need to better justify how their study fills these gaps. This would help to position the research more effectively within the existing body of knowledge.”

Response 3: This point has been addressed in both the literature review and the revised final paragraphs of the introduction. We now clearly highlight that most previous studies have emphasized either technical anomaly detection or socioeconomic surveys, but rarely both in a statistical model validated with power analysis and bootstrapping. Our study addresses this gap by integrating both aspects into a unified framework.

Revisions made: Page 2 (Introduction) and Section 2.5 (Literature Review).

Comment 4

“While stratified sampling was used, sample size of 385 might be insufficient for a city like Kinshasa. The authors should conduct a more detailed power analysis to determine if the sample size is adequate to detect the effects of the variables of interest. Additionally, they could explore the potential biases that may still exist in the sample despite the sampling method. Besides, table 1 should be three-lines table, which looks better.”

Response 4: We appreciate the reviewer’s suggestion. The methodology section has been updated to clarify that a combination of random sampling, bootstrapping, and post hoc power analysis was employed to validate the sample adequacy and model robustness. The computed power was 1.0000 (Cohen’s f² = 1.588), indicating that the sample size of 385 was statistically sufficient to detect meaningful effects with minimal risk of Type II error.

We also now acknowledge potential residual sampling biases despite the structured sampling process. These include:

  • Underrepresentation of peripheral or high-density informal settlements where informal electricity access is common.
  • Exclusion of unmetered or off-grid consumers, limiting insights into populations without formal billing structures.
  • Response bias, given that some participants may have underreported theft-related behavior due to social desirability.

These limitations have been discussed to provide a transparent assessment of data generalizability, and the future research directions now recommend oversampling targeted subpopulations to improve representativeness.

Lastly, Table 1 was reformatted using a three-line structure, consistent with MDPI style preferences.

Revisions made:

  • Page 8, Section “Sample Adequacy and Validation”
  • Page 23, added subsection “Limitations and Future Research Directions”
  • Page 6, Table 1 reformatted as a three-line table
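
The two numbers in Response 4 can be sanity-checked with a short sketch. It rests on two assumptions that the record does not confirm: that the sample size of 385 came from Cochran's formula for a proportion (95% confidence, p = 0.5, 5% margin of error), and that the reported Cohen's f² follows the usual definition f² = R² / (1 − R²). The function names are hypothetical.

```python
import math


def cochran_sample_size(z: float = 1.96, p: float = 0.5, margin: float = 0.05) -> int:
    """Minimum sample size for estimating a proportion in a large population.

    Assumption: Cochran's formula n0 = z^2 * p * (1 - p) / e^2, rounded up.
    """
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    return math.ceil(n0)


def r_squared_from_f2(f2: float) -> float:
    """Invert Cohen's f^2 = R^2 / (1 - R^2) to recover the implied R^2."""
    return f2 / (1.0 + f2)


n = cochran_sample_size()          # 384.16 rounded up
r2 = r_squared_from_f2(1.588)      # f^2 value quoted in Response 4

print(n)             # 385
print(round(r2, 3))  # 0.614
```

Under these assumptions, 385 is the standard minimum for a large population, and f² = 1.588 corresponds to an R² of roughly 0.61. An effect size that far above Cohen's conventional "large" benchmark of 0.35 would explain why the post hoc power saturates at 1.0000.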

Comment 5

“The use of the nearest-neighbor approach for missing data imputation may introduce bias. They should explore other imputation methods, such as multiple imputation, and compare the results. This would help to assess the robustness of the findings.”

Response 5: Thank you for raising this important point. Upon re-examining our dataset and Python sampling procedure, we discovered that the appearance of missing values was not inherent in the original dataset, but rather a by-product of random sampling during bootstrapping iterations. After verifying the raw data from the beginning, we confirmed that there were no true missing values, and hence, no imputation was necessary. The mention of Nearest Neighbor imputation was therefore removed to avoid confusion. The methodology section has been revised to clarify this and to emphasize that sample adequacy was confirmed through power analysis rather than imputation.

Revisions made:

  • Page 7, Section “Population and Sample Size Determination”, lines 197–198.
  • Page 8, “Post Hoc Power Analysis”, lines 266–279.

Comment 6

“The variable selection process based on exploratory analysis and domain relevance is not well-described. The authors should provide more details on how they determined which variables to include in the model. Additionally, the presence of multicollinearity in some variables (X7 and X20) needs to be addressed more thoroughly. Moreover, where are the title of tables in lines 301, 306, 333, 344,349, 350 (please check lines 344-346). Besides, although the Hosmer-Lemeshow test was used to assess model fit, additional validation techniques such as cross-validation should be performed.”

Response 6: We have clarified the two-step variable selection process: (1) univariate filtering based on statistical significance (p < 0.15) and (2) domain-informed selection augmented by Lasso regression with 5-fold cross-validation. Regarding multicollinearity, we addressed the high VIF values of X7 and X19, and discussed how Lasso penalization helped eliminate redundancy. All missing table titles were inserted, and Hosmer-Lemeshow test results were supplemented with bootstrapped AUC values and 5-fold cross-validation summaries.

Revisions made:

  • Pages 10–13, “Multicollinearity and Feature Selection”
  • Pages 5–18, lines 179, 190, 282, 290, 304, 307, 310, 314, 318, 328, 336, 345, 353, 354, 355, 361: table captions corrected
  • Pages 12–18, “Regression Analysis of Variable Y”
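
The multicollinearity check mentioned in Response 6 can be illustrated for the simplest case. This is a minimal sketch, not the authors' procedure: with exactly two predictors, the variance inflation factor reduces to VIF = 1 / (1 − r²), where r is their Pearson correlation. The data below are invented for illustration and are not the study's X7 or X19 values.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x: list[float], y: list[float]) -> float:
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical, nearly collinear predictors (illustrative values only).
near_collinear_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
near_collinear_b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]

# Hypothetical uncorrelated predictors.
independent_a = [1.0, 2.0, 3.0, 4.0]
independent_b = [1.0, -1.0, -1.0, 1.0]

print(vif_two_predictors(near_collinear_a, near_collinear_b) > 10)  # True
print(vif_two_predictors(independent_a, independent_b))             # 1.0
```

A common rule of thumb treats VIF values above 5 or 10 as a sign of problematic collinearity. With more than two predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not cover.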

Comment 7

“The interpretation of the model coefficients and odds ratios could be more in-depth. How do changes in these variables translate into real-world changes in the likelihood of electricity theft? Also, the discussion of the lesser-impact predictors is too brief. They should explore why these variables have a minor influence and whether there are any underlying factors.”

Response 7: Thank you for this valuable observation. We have expanded the interpretation of significant odds ratios by including practical implications—for example, detailing how a one-unit increase in financial stress score corresponds to an increased probability of electricity theft, thereby strengthening the policy relevance of our findings.

In addition, we have introduced a dedicated subsection on lesser-impact predictors, providing plausible explanations for their limited statistical influence. These include the possibility of underreporting, region-specific behavioral differences, or measurement limitations.

Importantly, to deepen model interpretation and identify compound effects, we have conducted interaction effect analysis between key predictors. Statistically significant interactions—such as between financial stress and perceived electricity cost, or between homeownership and payment method—are now presented and discussed, illustrating how combined variables can alter theft likelihood in more complex ways than individual predictors alone.

Revisions made:

  • Pages 19–20, “Significant Predictors and Their Implications”
  • Pages 20–21, “Lesser-Impact Predictors: Observations and Hypotheses”
  • Pages 21–22, “Testing Interaction Effects Between Predictors”
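
The link between a logistic regression coefficient, its odds ratio, and a change in probability, as discussed in Response 7, can be sketched as follows. The coefficient and baseline probability are hypothetical and are not taken from the manuscript; the function names are likewise illustrative.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in a predictor with log-odds coefficient beta."""
    return math.exp(beta)


def shifted_probability(p0: float, beta: float, delta: float = 1.0) -> float:
    """Probability after raising the predictor by delta units, starting from baseline p0."""
    odds0 = p0 / (1.0 - p0)                 # baseline odds
    odds1 = odds0 * math.exp(beta * delta)  # odds scale multiplicatively
    return odds1 / (1.0 + odds1)


beta = math.log(2.0)   # hypothetical coefficient, i.e. an odds ratio of 2
baseline = 0.20        # hypothetical baseline probability of theft

print(round(odds_ratio(beta), 2))                     # 2.0
print(round(shifted_probability(baseline, beta), 3))  # 0.333
```

The example shows why an odds ratio is not a probability ratio: doubling the odds moves a 20% baseline to about 33%, not 40%. The size of the probability change always depends on the baseline.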

Comment 8

“The policy recommendations are somewhat simplistic. The authors should develop more comprehensive and actionable recommendations based on the results.”

Response 8: We have substantially expanded the policy recommendation section, dividing it into infrastructure, behavioral, and regulatory measures, each tied explicitly to significant predictors (e.g., X4, X5, X6, X8, X9). Recommendations now include subsidized solar microgrids, AI-driven fraud monitoring, tiered billing systems, and community awareness campaigns, all grounded in the findings.

Revisions made:

  • Page 22, “Policy Recommendations and Practical Implementation”

  1. Response to English Language Comments

Point: “Some phrasing could be improved for conciseness.” Response: The manuscript has undergone comprehensive language editing. We simplified technical passages, improved sentence clarity, and enhanced transitions between sections.

  1. Additional Clarifications
  • The abstract was updated to reflect the AUC of 0.86 based on the test set, as recommended.
  • All references, figures, and variable labels (e.g., X8, X9) have been revised for accuracy and consistency.
  • The Conclusion section has been aligned with the key findings and expanded to include actionable policy implications.
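
For readers unfamiliar with the AUC figure quoted in the clarifications above, the following minimal sketch shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as half. The labels and scores are invented and do not reproduce the reported 0.86.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels (1 = theft) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(round(roc_auc(labels, scores), 3))  # 0.889
```

The pairwise form is quadratic in the sample size, which is acceptable for a few hundred observations. A bootstrapped AUC would repeat this calculation on resampled test sets.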

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript presents a timely and socially relevant analysis of electricity theft in Kinshasa, applying classical logistic regression to a locally collected dataset. The study adds contextual value by addressing a poorly represented region in the literature, characterized by severe energy poverty and infrastructure challenges. One of its main strengths lies in the interpretability of the chosen method, which supports clear identification of key factors. Additionally, the manuscript offers a comprehensive review of detection approaches, including hardware-based, classical machine learning, and behavioral models, with a valuable discussion on the potential of hybrid strategies.

However, several critical limitations must be addressed in order for the manuscript to meet the standards of a high-impact scientific journal.

First, the methodological contribution is limited, and the novelty of the study is insufficient in its current form. Logistic regression and similar classical methods have been widely applied in previous studies addressing electricity theft. As such, the approach used here does not represent an innovative methodological advancement. Additionally, the manuscript does not explore or compare modern deep learning techniques such as Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or Transformer-based architectures. Including a comparative analysis with such models could significantly enhance the contribution of the study, especially given the increasing interest in applying deep learning to imbalanced or noisy datasets in resource-limited settings.

Second, the methodology section lacks clarity and is difficult to follow. The dense textual description would benefit from visual support. Diagrams, flowcharts, or schematics of the modeling process should be incorporated to improve readability and facilitate understanding of the analytical workflow. The absence of such visual aids makes the methodology less accessible to a broader scientific audience.

Third, the results section is overloaded with large tables, which makes it challenging to identify key findings at a glance. The presentation would benefit from more effective data visualization, including comparative charts, ROC curves, or graphical summaries that highlight patterns and model performance. Moreover, the discussion of the results is somewhat limited. While the performance metrics are reported, there is a lack of in-depth analysis of their implications or of how these findings can be translated into actionable strategies.

Finally, the conclusions are underdeveloped. The section does not fully synthesize the study’s contributions, and it lacks reflection on the broader significance of the results. A more thorough discussion of the study’s limitations, potential applications, and directions for future work is needed. In particular, the authors should consider addressing the integration of advanced AI methods and the development of hybrid models that combine interpretability with predictive power.

Author Response

Point-by-Point Responses to Comments and Suggestions

Comment 1

“In the introduction section, the authors provide a lot of information, while the research significance is not clear. By the way, it is better to state the structure of this manuscript in the end of the introduction.”

Response 1: Thank you for this valuable suggestion. We have revised the introduction to clearly articulate the research significance, emphasizing the lack of comprehensive studies integrating behavioral and infrastructure-related predictors of electricity theft in Sub-Saharan Africa. We also added a final paragraph summarizing the structure of the paper for improved readability. Revisions made:

  • Page 2, paragraph 5 — Statement on research contribution and novelty.
  • Page 2, final paragraph — Summary of manuscript structure.

Comment 2

“This study lacks a literature review section, please separate it from introduction. Moreover, the literature review on electricity theft detection methods is rather descriptive. It should be in-depth and integrated with the objectives. Additionally, a critical evaluation of the strengths and weaknesses of methods would be beneficial. Some references are helpful, 10.3390/en12132582.”

Response 2: We agree with this recommendation and have separated the literature review into its own section. It is now organized into subsections covering supervised, unsupervised, ensemble, and hybrid AI-based methods. We have also expanded the analysis to include critical comparisons of strengths and limitations of each method, including interpretability, scalability, and real-world applicability.

Revisions made: Pages 2–5, Section 2, “Literature Review”, and Subsection 2.5, “Research Gaps and Contribution”.

Comment 3

“There is a lack of clear identification of research gaps in the literature. The authors need to better justify how their study fills these gaps. This would help to position the research more effectively within the existing body of knowledge.”

Response 3: This point has been addressed in both the literature review and the revised final paragraphs of the introduction. We now clearly highlight that most previous studies have emphasized either technical anomaly detection or socioeconomic surveys, but rarely both in a statistical model validated with power analysis and bootstrapping. Our study addresses this gap by integrating both aspects into a unified framework.

Revisions made: Page 2 (Introduction) and Section 2.5 (Literature Review).

Comment 4

“While stratified sampling was used, sample size of 385 might be insufficient for a city like Kinshasa. The authors should conduct a more detailed power analysis to determine if the sample size is adequate to detect the effects of the variables of interest. Additionally, they could explore the potential biases that may still exist in the sample despite the sampling method. Besides, table 1 should be three-lines table, which looks better.”

Response 4: We appreciate the reviewer’s suggestion. The methodology section has been updated to clarify that a combination of random sampling, bootstrapping, and post hoc power analysis was employed to validate the sample adequacy and model robustness. The computed power was 1.0000 (Cohen’s f² = 1.588), indicating that the sample size of 385 was statistically sufficient to detect meaningful effects with minimal risk of Type II error.

We also now acknowledge potential residual sampling biases despite the structured sampling process. These include:

  • Underrepresentation of peripheral or high-density informal settlements where informal electricity access is common.
  • Exclusion of unmetered or off-grid consumers, limiting insights into populations without formal billing structures.
  • Response bias, given that some participants may have underreported theft-related behavior due to social desirability.

These limitations have been discussed to provide a transparent assessment of data generalizability, and the future research directions now recommend oversampling targeted subpopulations to improve representativeness.

Lastly, Table 1 was reformatted using a three-line structure, consistent with MDPI style preferences.

Revisions made:

  • Page 8, Section “Sample Adequacy and Validation”
  • Page 23, added subsection “Limitations and Future Research Directions”
  • Page 6, Table 1 reformatted as a three-line table

Comment 5

“The use of the nearest-neighbor approach for missing data imputation may introduce bias. They should explore other imputation methods, such as multiple imputation, and compare the results. This would help to assess the robustness of the findings.”

Response 5: Thank you for raising this important point. Upon re-examining our dataset and Python sampling procedure, we discovered that the appearance of missing values was not inherent in the original dataset, but rather a by-product of random sampling during bootstrapping iterations. After verifying the raw data from the beginning, we confirmed that there were no true missing values, and hence, no imputation was necessary. The mention of Nearest Neighbor imputation was therefore removed to avoid confusion. The methodology section has been revised to clarify this and to emphasize that sample adequacy was confirmed through power analysis rather than imputation.

Revisions made:

  • Page 7, Section “Population and Sample Size Determination”, lines 197–198.
  • Page 8, “Post Hoc Power Analysis”, lines 266–279.

Comment 6

“The variable selection process based on exploratory analysis and domain relevance is not well-described. The authors should provide more details on how they determined which variables to include in the model. Additionally, the presence of multicollinearity in some variables (X7 and X20) needs to be addressed more thoroughly. Moreover, where are the title of tables in lines 301, 306, 333, 344,349, 350 (please check lines 344-346). Besides, although the Hosmer-Lemeshow test was used to assess model fit, additional validation techniques such as cross-validation should be performed.”

Response 6: We have clarified the two-step variable selection process: (1) univariate filtering based on statistical significance (p < 0.15) and (2) domain-informed selection augmented by Lasso regression with 5-fold cross-validation. Regarding multicollinearity, we addressed the high VIF values of X7 and X19, and discussed how Lasso penalization helped eliminate redundancy. All missing table titles were inserted, and Hosmer-Lemeshow test results were supplemented with bootstrapped AUC values and 5-fold cross-validation summaries.

Revisions made:

  • Pages 10–13, “Multicollinearity and Feature Selection”
  • Pages 5–18, lines 179, 190, 282, 290, 304, 307, 310, 314, 318, 328, 336, 345, 353, 354, 355, 361: table captions corrected
  • Pages 12–18, “Regression Analysis of Variable Y”

Comment 7

“The interpretation of the model coefficients and odds ratios could be more in-depth. How do changes in these variables translate into real-world changes in the likelihood of electricity theft? Also, the discussion of the lesser-impact predictors is too brief. They should explore why these variables have a minor influence and whether there are any underlying factors.”

Response 7: Thank you for this valuable observation. We have expanded the interpretation of significant odds ratios by including practical implications—for example, detailing how a one-unit increase in financial stress score corresponds to an increased probability of electricity theft, thereby strengthening the policy relevance of our findings.

In addition, we have introduced a dedicated subsection on lesser-impact predictors, providing plausible explanations for their limited statistical influence. These include the possibility of underreporting, region-specific behavioral differences, or measurement limitations.

Importantly, to deepen model interpretation and identify compound effects, we have conducted interaction effect analysis between key predictors. Statistically significant interactions—such as between financial stress and perceived electricity cost, or between homeownership and payment method—are now presented and discussed, illustrating how combined variables can alter theft likelihood in more complex ways than individual predictors alone.

Revisions made:

  • Pages 19–20, “Significant Predictors and Their Implications”
  • Pages 20–21, “Lesser-Impact Predictors: Observations and Hypotheses”
  • Pages 21–22, “Testing Interaction Effects Between Predictors”

Comment 8

“The policy recommendations are somewhat simplistic. The authors should develop more comprehensive and actionable recommendations based on the results.”

Response 8: We have substantially expanded the policy recommendation section, dividing it into infrastructure, behavioral, and regulatory measures, each tied explicitly to significant predictors (e.g., X4, X5, X6, X8, X9). Recommendations now include subsidized solar microgrids, AI-driven fraud monitoring, tiered billing systems, and community awareness campaigns, all grounded in the findings.

Revisions made:

  • Page 22, “Policy Recommendations and Practical Implementation”

  1. Response to English Language Comments

Point: “Some phrasing could be improved for conciseness.” Response: The manuscript has undergone comprehensive language editing. We simplified technical passages, improved sentence clarity, and enhanced transitions between sections.

  1. Additional Clarifications
  • The abstract was updated to reflect the AUC of 0.86 based on the test set, as recommended.
  • All references, figures, and variable labels (e.g., X8, X9) have been revised for accuracy and consistency.
  • The Conclusion section has been aligned with the key findings and expanded to include actionable policy implications.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

After reviewing the revised manuscript, I consider that the authors have adequately addressed the main concerns raised in the initial round of review. The modifications implemented have improved the clarity, structure, and overall quality of the paper. In its current form, the manuscript meets the standards required for publication in the journal.

However, I recommend that the editorial team review the formatting of the tables, as they do not fully comply with the journal’s style guidelines. This adjustment can be handled during the production stage and does not require further revision from the authors.
