Power Outage Prediction on Overhead Power Lines on the Basis of Their Technical Parameters: Machine Learning Approach
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have provided a power outage prediction method based on a machine learning approach. The topic is interesting, but many areas still need to be modified, as follows:
1. The research review is not sufficient and lacks references from important journals in the past three years.
2. The proposed method lacks innovation and is only a simple application of existing machine learning methods.
3. The description of the problem to be solved by the proposed method and the method itself is not detailed enough.
Author Response
Dear Reviewer,
Firstly, we would like to thank you for your high assessment of our research and for the comprehensive review. We tried to extract the main issues from it and answered them step by step. All changes are highlighted in yellow in the manuscript. Please find the response cover letter attached.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Comment 1:
The manuscript is well-organized, with a clear flow from data exploration and preprocessing to model selection, evaluation, and feature importance analysis. To improve readability, consider breaking up some of the longer paragraphs, especially in Sections 3 and 4, so that key points are easier for readers to digest.
Comment 2:
The study uses a relatively small dataset (395 objects), which limits model consistency and generalization. It would be helpful to discuss potential ways to expand the dataset, such as incorporating power line data from other regions or including additional historical failure records.
Comment 3:
The feature importance analysis for both Logistic Regression and CatBoost models is well-presented. Adding a brief discussion on the practical implications of key features, for example “Length of PTL overhead sections” or the proportion of reinforced concrete supports, would provide more context for grid maintenance and planning. The inverse relationship between PTL length in populated areas and failure probability is intriguing; a short explanation, such as higher maintenance frequency in populated areas, could enrich the discussion.
Comment 4:
Using multiple metrics (ROC AUC, AUC-PR, Accuracy, Precision, Recall, F1) is appropriate. Consider clarifying which metric is most important for your conclusions, especially given the slight class imbalance. For Figures 5a and 5b, adding annotations for key threshold values could make it easier for readers to interpret trade-offs between true positive and false positive rates.
Comment 5:
Ensure consistent terminology (e.g., “PTL transit” vs. “transit power line”) and correct minor typographical errors. Figures and tables are informative but could be referenced more smoothly in the text to guide the reader.
Comment 6:
The manuscript rightly notes that the limited dataset affects model performance. Adding a short subsection summarizing limitations and potential future directions, such as using multi-year data, higher-voltage lines, or additional environmental features, would strengthen the study.
Comment 7:
The detailed description of preprocessing, encoding, scaling, and hyperparameter tuning is commendable. Providing code or pipeline configurations in supplementary materials would further enhance reproducibility and transparency.
Comment 8:
Consider adding flowcharts or schematic diagrams to illustrate the overall workflow of the study, including data preprocessing, model training, evaluation, and feature importance analysis. Visual aids can improve clarity and help readers quickly understand the methodology.
Comment 9:
This study presents a valuable application of machine learning for predicting power line outages, with clear methodology and useful insights. Minor revisions focusing on clarity, practical implications, and careful proofreading would improve the manuscript’s readability and impact.
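As a minimal sketch of the trade-off raised in Comment 4, the following computes precision, recall, false positive rate, and F1 at a chosen decision threshold, plus ROC AUC as a rank statistic. The labels and scores are hypothetical toy values, not the manuscript's data, and the function names are illustrative assumptions rather than the authors' code.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn); scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_report(y_true, scores, threshold):
    """Summarise the metrics that change when the decision threshold moves."""
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0     # false positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outage, 0 = no outage.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

for t in (0.5, 0.35):
    print(t, threshold_report(toy_labels, toy_scores, t))
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
```

On this toy data, lowering the threshold from 0.5 to 0.35 raises recall without changing the false positive rate, which is the kind of operating point an annotated ROC or precision-recall curve would make visible.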
Author Response
Dear Reviewer,
Firstly, we would like to thank you for your high assessment of our research and for the comprehensive review. We tried to extract the main issues from it and answered them step by step. All changes are highlighted in yellow in the manuscript. Please find the response cover letter attached.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
1) The first criticism is that the manuscript lacks a comprehensive literature review focused on similar models. It is necessary to rewrite Section 1 in its entirety, eliminating all lumped references and providing brief comments on each related work. The authors must discuss the limitations of similar approaches and clearly state the contributions and improvements of the proposed solution. Including more recent references published in reputable journals is a must.
2) While the best-performing model is logistic regression with class weighting, CatBoost and LightGBM presented similar results. Is this behavior expected for other scenarios? And why choose logistic regression over one of the gradient boosting models?
3) According to the study, service life and condition index had little influence on outage probability. It is unclear if this is due to the nature of the data, reporting practices, or genuine resilience of the infrastructure.
4) The dataset is relatively small and associated with a single region. Can the model really deliver reliable results when applied to other regions with diverse environmental and technical conditions? Please elaborate.
5) Which additional features could be added to the model to improve outage prediction accuracy?
6) Logistic regression reached a higher ROC AUC on the test set than on validation. However, this may be due to insufficient data. Applying other techniques, such as cross-region validation or temporal validation, could help assess model robustness.
7) Weighting worked better than class weighting and SMOTE-NC when dealing with imbalance. Perhaps combining these two approaches would lead to more satisfactory results.
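The validation schemes suggested in point 6 and the class weighting discussed in points 2 and 7 can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the region labels and years are hypothetical (the dataset under review covers a single region), and the weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), which may differ from the weighting used in the manuscript.

```python
from collections import Counter


def leave_one_region_out(regions):
    """Yield (held_out, train_idx, test_idx), holding out each region in turn."""
    for held_out in sorted(set(regions)):
        train_idx = [i for i, r in enumerate(regions) if r != held_out]
        test_idx = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train_idx, test_idx


def temporal_split(years, cutoff_year):
    """Train on records up to and including cutoff_year, test on later ones."""
    train_idx = [i for i, y in enumerate(years) if y <= cutoff_year]
    test_idx = [i for i, y in enumerate(years) if y > cutoff_year]
    return train_idx, test_idx


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency ("balanced" heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical toy inputs, one entry per power line record.
toy_regions = ["A", "A", "B", "C", "B"]
toy_years = [2019, 2020, 2021, 2022]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]

for region, train_idx, test_idx in leave_one_region_out(toy_regions):
    print(region, train_idx, test_idx)
print(temporal_split(toy_years, 2020))
print(balanced_class_weights(toy_labels))
```

Holding out a whole region or a later time period, instead of a random subset, tests whether performance survives a shift in conditions, which a single random train/validation/test split cannot show.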
Comments on the Quality of English Language
The manuscript requires thorough polishing. The definition of acronyms is inconsistent. Contractions should not be used.
Author Response
Dear Reviewer,
Firstly, we would like to thank you for your high assessment of our research and for the comprehensive review. We tried to extract the main issues from it and answered them step by step. All changes are highlighted in yellow in the manuscript. Please find the response cover letter attached.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
In this paper, the problem of estimating the probability of an overhead transmission line outage is reduced to a binary classification problem, and machine learning algorithms are proposed to solve it. The following are used as input features: Conductor, type and cross-section of the line transmission; line transmission relation to transit; Condition index; Overhead line length; Overexploitation; Reinforced Concrete Supports; line transmission length through the forest; line transmission length in populated areas. The target variable is whether an overhead transmission line outage occurred. The work is undoubtedly interesting and worthy of publication. However, a number of corrections and improvements are required.
- Both the abstract and the introduction of the paper use the term "predict". Using this term without mentioning classification may suggest a regression task, which can lead to confusion.
- Although the text provides a description of Fig. 1, there is no specific indication of which of the graphs in this figure is being discussed (a, b, c…).
- There is no detailed description of the input features. A short reference of 1-2 sentences is needed for each of them.
- There is no clear description and indication of the number of input features. Line 193 mentions "395 objects", and lines 184 and 185 mention "163 electrical lines" and "232 that". Summing up the latter, it becomes clear that "395 objects" refers to power transmission lines. Please describe the data preparation method in more detail.
- In my opinion, describing the Python packages used in a separate subsection 2.3 is redundant. It should be integrated with subsection 2.2.
- In Table 1, format the name of the logistic regression model in accordance with other models.
- The figure captions should be corrected. For example, Figure 1. The results of model testing: a – logistic regression; b – decision tree.
Author Response
Dear Reviewer,
Firstly, we would like to thank you for your high assessment of our research and for the comprehensive review. We tried to extract the main issues from it and answered them step by step. All changes are highlighted in yellow in the manuscript. Please find the response cover letter attached.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The paper can be accepted in the current form.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors addressed all raised concerns. However, as a final piece of advice, I would kindly ask them to avoid using the track changes mode when revising their work, as it makes the manuscript nearly unreadable.