A Method Combining Line Detection and Semantic Segmentation for Power Line Extraction from Unmanned Aerial Vehicle Images
Round 1
Reviewer 1 Report
Overall, I find the work interesting. IMHO it is fixable, and I would encourage the authors to address these points:
Literature review is somewhat shallow in the analysis (it just mentions some methods, but does not explore advantages and disadvantages of each - so must be rewritten) and would benefit from the comparison table.
Formulas are too generic. Add more details.
Add all hyper parameters and explain how they were selected.
Experimental setup is not clear. Processing happens where? Maybe also provide an architecture diagram for this solution.
No proper statistical reliability analysis. Must be added
No performance analysis. Must be added
I would recommend adding a detailed comparison to other works with a similar type of image processing (in the discussion). Also, it is not clear how Table 1 was made.
Add links to dataset and RAW experimental data.
Author Response
Dear Reviewer:
Thank you very much for your constructive and positive comments on our manuscript entitled “A Method Combining Line Detector and Semantic Segmentation for Power Line Extraction from Unmanned Aerial Vehicles Images” submitted to Remote Sensing.
We have revised our paper along the lines outlined by the reviewer. Our detailed responses follow, and the reviewer’s comments are in bold. Note also that changes in the manuscript are in yellow colour.
Point 1: Literature review is somewhat shallow in the analysis (it just mentions some methods, but does not explore advantages and disadvantages of each - so must be rewritten) and would benefit from the comparison table.
Reply: We rewrote the introduction of this manuscript, classified the methods mentioned, and summarized the advantages and limitations of each type of method. Please see lines 49-54 and 75-80 of the manuscript for details. In addition, we organized the methods mentioned into a table (Table 1) that briefly summarizes the characteristics of each method.
Point 2: Formulas are too generic. Add more details.
Reply: We reviewed every formula in this manuscript to ensure that every parameter has been explained, rewrote formulas 1-3 with more parameter descriptions, and added formula 23 to make the description of the method more comprehensive.
Point 3: Add all hyper parameters and explain how they were selected.
Reply: Three main external parameters are used in the manuscript: the threshold for constructing the Gaussian pyramid, the threshold of the object Markov random field (OMRF) and the threshold of Kalman filter (KF) tracking. We added lines 392-395 to give the threshold required for KF tracking and to explain the reason for its selection. The Gaussian pyramid and OMRF thresholds were selected based on the experiments in sections 4.1.1 and 4.1.2 respectively. We marked the selected threshold values in yellow colour for your review.
Point 4: Experimental setup is not clear. Processing happens where? Maybe also provide an architecture diagram for this solution.
Reply: We added lines 401-403 to describe the hardware platform on which the experiments were run, and reviewed the overall process of the method. The main algorithm process is summarized in a table after line 403. The process can also be seen in the flow chart (Figure 3).
Point 5: No proper statistical reliability analysis. Must be added.
Reply: We added a statistical chart and a table of the Recall values of the test images (Figure 17 and Table 2). This paper mainly uses Recall and Precision as the evaluation indexes of model extraction accuracy. These two indexes are also the most commonly used accuracy evaluation indicators in the fields of computer vision and machine learning. They can comprehensively reflect the accuracy level of a classification or segmentation model.
Point 6: No performance analysis. Must be added.
Reply: We added Table 3 to the discussion to show the time cost of the different methods for extracting power lines from the images. The method used in the manuscript needs to build a Gaussian pyramid in the line detection stage, and needs to iterate repeatedly to obtain the optimal solution when using OMRF. Therefore, the complexity of the algorithm is high. The advantage is that it can accurately extract the required objects against a complex background, but it also greatly reduces the efficiency of the algorithm in image processing. This is the biggest disadvantage of the method used in this paper, and it makes the method unsuitable for application scenarios that need real-time and rapid detection. Therefore, we do not devote much space to this issue in the manuscript, and only describe it briefly in the discussion. In the future, we need to further optimize the method to improve its efficiency as much as possible while maintaining the detection accuracy.
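To make the cost argument concrete, the following is a minimal sketch of a generic Gaussian pyramid (blur, then halve the resolution at each level). It is an illustration under stated assumptions, not the authors' construction: the 5-tap binomial kernel, the edge clamping, and the function names `blur`, `downsample` and `gaussian_pyramid` are all hypothetical choices, and the manuscript's threshold-based construction is not reproduced here.

```python
from typing import List

Image = List[List[float]]

# Assumed kernel: 5-tap binomial approximation of a Gaussian (weights sum to 1).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def _blur_rows(img: Image) -> Image:
    """Convolve every row with KERNEL, clamping indices at the borders."""
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(w * row[min(max(x + k - 2, 0), n - 1)]
                for k, w in enumerate(KERNEL))
            for x in range(n)
        ])
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def blur(img: Image) -> Image:
    """Separable blur: filter the rows, then the columns."""
    return _transpose(_blur_rows(_transpose(_blur_rows(img))))


def downsample(img: Image) -> Image:
    """Keep every second row and every second column."""
    return [row[::2] for row in img[::2]]


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [level 0 (input), level 1, ...], each half the size of the last."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid


# Hypothetical 8x8 constant test image, three pyramid levels.
image = [[1.0] * 8 for _ in range(8)]
pyramid = gaussian_pyramid(image, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

Every level requires a full blur pass over the previous level, so any line detector run on each level pays for that filtering in addition to its own work; an iterative optimizer such as a Markov random field then adds repeated passes on top.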
Point 7: I would recommend adding a detailed comparison to other works with similar type of image processing (in discussion). Also, it is not clear how table 1 was made.
Reply: We added lines 575-582 to the discussion to further qualitatively describe the characteristics and differences of each method and their impact on the extraction results. For the original Table 1 (now Table 2), 30 images were selected from the dataset for accuracy evaluation. The accuracy evaluation is mainly based on formulas 28 and 29. The specific conditions are described in lines 508-529.
Point 8: Add links to dataset and RAW experimental data.
Reply: Thank you for your suggestions on data disclosure. The funded project behind this research has not yet been concluded, and it also involves commercial companies engaged in UAV hardware development and data acquisition. The data used in the manuscript were acquired independently by the project participants rather than taken from public online datasets (such public image datasets are also very few). Therefore, these data cannot be disclosed before the project concludes. Regrettably, we cannot add the dataset link or the RAW experimental data at this time. We invite you to follow our subsequent research: after the project is completed, we will gradually release the data we use.
Special thanks to you for your positive and constructive comments on our work.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors present a method for the automatic extraction of power lines from UAV images. To that end, well-established methods are combined to obtain a very high final accuracy.
The article is sound even if, from a conceptual perspective, there are two points that the authors should explain, since the methodology sounds outdated.
1 - Several methods in the literature prove the advantage of exploiting AI segmentation methods. Why are they not discussed in this paper?
2 - as the dataset is UAV based, I might expect that a photogrammetric approach would improve the final result by exploiting the third dimension.
Author Response
Dear Reviewer:
Thank you very much for your constructive and positive comments on our manuscript entitled “A Method Combining Line Detector and Semantic Segmentation for Power Line Extraction from Unmanned Aerial Vehicles Images” submitted to Remote Sensing.
We have revised our paper along the lines outlined by the reviewer. Our detailed responses follow, and the reviewer’s comments are in bold. Note also that changes in the manuscript are in yellow colour.
Point 1: Several methods in the literature prove the advantage of exploiting AI segmentation methods. Why are they not discussed in this paper?
Reply: Thank you for your comment. With the wide application of deep learning in image processing, deep-learning-based algorithms for image classification, object recognition and image segmentation have developed rapidly. In recent years, researchers have made preliminary explorations of power line detection in images under the deep learning framework. Pan et al. [1] first designed an edge detector to obtain a set of edge images, then classified these edge images with a convolutional neural network (CNN) to obtain the edge images containing power lines, and finally fitted the power lines with the Hough transform. Yetgin et al. [2] proposed two CNN-based algorithms for power line recognition in aerial images. The first method adopts an end-to-end design: it uses VGG-19 and ResNet-50 as the main networks and replaces the last layer of the CNN with a randomly initialized softmax layer. The second method first uses the feature extraction layers of the CNN to extract features, then reduces the dimension of the output with principal component analysis (PCA), and then feeds the result into a support vector machine (SVM), a Bayesian classifier and a random forest classifier respectively, finally obtaining the images that contain power lines. In addition, this work shows that a CNN model pre-trained on the ImageNet database can improve the detection accuracy of power lines. Zhao et al. [3] proposed a linear IoU calculation method based on the linear characteristics of power lines, which improved the original IoU of Mask R-CNN and the performance of power line extraction; the improved network is trained on a power line dataset to obtain a rough extraction result, and a segment grouping and fitting algorithm then clusters and fits the rough extraction results.
Although deep learning methods such as CNNs have great advantages in image detection, classification and segmentation, AI methods still face the following problems in the specific task of recognizing and extracting power lines in images: (1) Power lines differ greatly from the objects conventionally detected in image processing. A power line in a UAV image is a relatively weak target with a long, thin physical structure, which makes feature extraction and feature matching very difficult. Therefore, it is extremely difficult to use a deep learning algorithm directly to locate the power line target end-to-end; (2) Deep learning methods need to train a model and place high demands on sample data. When the database is established, both a sufficient number of samples and as much scene diversity as possible must be ensured. The data must also be labelled, which usually has to be done manually and greatly increases the economic and time cost of the research. At present, the data used in this paper do not meet the data volume and sample richness required by a deep learning model. For these reasons, this paper does not go further into the application of AI methods such as deep learning to power line extraction from images.
The algorithm proposed in this paper treats power line extraction as an image segmentation problem. A review of the large body of existing research on power line extraction in images shows that power line extraction based on semantic segmentation algorithms is still in its infancy. This kind of algorithm has research potential and is suitable for application scenarios with no large sample database, no prior knowledge and low real-time requirements. In the future, with the continuous accumulation of data and further optimization of methods, we will focus on the application of AI methods such as deep learning to power line extraction. We added lines 595-599 to the discussion of this manuscript to address this point.
References
[1] Pan C, Cao X, Wu D. Power line detection via background noise removal. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016, 871-875.
[2] Yetgin Ö E, Benligiray B, Gerek Ö N. Power line recognition from aerial images with deep learning. IEEE Transactions on Aerospace and Electronic Systems, 2018, 55(5), 2241-2252.
[3] Zhao Y, Hu Y, Wang X, Zhao L. Automatic power line extraction algorithm in complex scene. Bulletin of Surveying and Mapping, 2021, (8), 1-6.
Point 2: as the dataset is UAV based, I might expect that a photogrammetric approach would improve the final result by exploiting the third dimension.
Reply: Thank you for your proposal. The use of photogrammetry to obtain three-dimensional information about objects mainly depends on oblique photography, which obtains rich high-resolution textures of the top and side views of objects by synchronously collecting images from five different perspectives (one vertical and four oblique). To some extent, it can faithfully reflect the actual state of objects, and it can further generate a realistic three-dimensional object model through positioning, fusion and modeling technologies. However, applying this technology in this paper faces the following difficulties:
(1) In order to obtain multiple angle information of the ground object, the oblique photogrammetry takes images of the ground object in the front, rear, left, right and vertical directions through five cameras carried by the aircraft or UAV (Figure 1a). Therefore, the camera system carried by oblique photogrammetry is more complex and generally requires multiple cameras to be combined (Figure 1b), which will greatly increase the economic cost of the experimental equipment. The equipment used in this paper is only equipped with a high spatial resolution camera, which can obtain the orthophoto image of the ground, that is, the image angle is perpendicular to the ground. Therefore, it is difficult to build the hardware system of photogrammetry in a short time, and it is impossible to obtain the three-dimensional information of objects through photogrammetry technology.
* Please refer to the Word file for figure details.
(2) The oblique images obtained also need a series of operations such as tie point extraction, multi-view image matching, control point correction and triangulation to form the three-dimensional model of ground objects, which greatly increases the difficulty of image processing and the time cost of the research. Moreover, power lines are always distributed in corridors that are narrow and very long. Some power line corridors extend for hundreds of kilometers, so building a three-dimensional model requires a lot of time. Therefore, the practical application of oblique photography to power line extraction is of little significance.
(3) Photogrammetry is mainly used in topographic mapping, digital city construction, geological disaster monitoring, archaeological excavation and other fields. The research objects of these tasks are often large-scale ground objects such as buildings, roads and mountains, which place lower demands on the detailed representation of ground objects, and the model accuracy is low in areas close to the surface and covered by other objects (Figure 2). Power lines are only a few pixels wide in an image. The modeling accuracy for such objects in photogrammetry is low, and fine details are not easy to obtain. Therefore, the applicability of photogrammetry to small ground objects such as power lines is relatively limited.
* Please refer to the Word file for figure details.
Therefore, this paper does not use photogrammetry to enhance the results, although three-dimensional information does strongly benefit the detection and extraction of ground objects. The UAV equipment used in this paper is also equipped with a light detection and ranging (LiDAR) system. LiDAR has the advantages of active remote sensing, rapid acquisition of three-dimensional information, high spatial resolution and strong penetration. At present, it has become the main means of obtaining 3D information about ground objects. Our follow-up research will focus on the fusion of power line image information and LiDAR point cloud information to further enrich the feature details and improve the extraction accuracy of the object. Due to limited space, this cannot be described in detail in this manuscript and will be discussed further in a follow-up paper. We added lines 595-599 to the discussion of this manuscript to address this point.
Special thanks to you for your positive and constructive comments on our work.
Author Response File: Author Response.docx
Reviewer 3 Report
Formal evaluation:
Electricity transmission and distribution systems are key elements providing the energy for almost all aspects of modern life. The key elements of these systems are power lines, mostly constructed as overhead power lines, exposed to the environment (severe weather, vegetation, …) and thus very vulnerable. Regular inspections of the power line corridor are needed to lower the risks related to vegetation growth in the corridor of the power lines, and also to identify possible structural problems (damage to pylon construction, destabilization of pylon bases due to landslides, flooding etc.). However, operators usually maintain several thousand kilometers of power lines, often crossing difficult terrain, which makes inspection a very difficult and time-consuming task. Aerial scanning (optical, lidar, radar) can significantly improve the inspection speed (compared to traditional methods) and provide an unprecedented level of detail related to the safety conditions of the power line; however, it produces huge amounts of data that must be processed semi-manually, increasing the time and costs. Therefore, effective algorithms for automatic detection of important features from the scanned data are crucial for further utilization of these modern inspection methods. The basic problem is of course effective detection and classification of important power line elements (towers, conductors, insulators etc.). From this point of view, the topic of the manuscript is current, important, interesting and fully falls into the scope of this journal.
The title of the manuscript is informative and describes the topic of this work well. The abstract provides sufficient information about the content of this manuscript. The introduction provides a fair insight into the problem, how it is currently covered in the literature and what the main motivation for this work is. The list of references is adequate. The general structure of the manuscript is OK, and the order of sections is logical.
The layout of the document should be improved in the final version (there are some problems like figure descriptions moved to another side, improper paragraph formatting at page 18 etc.) however I believe this is not the fault of authors but result of conversion into the format required by this journal (I personally think the previous template used by MDPI was much better than the current template).
Quality of figures is fine. Basically, all images are well readable even at a 100 % zoom, so the document is readable also when printed. Formatting of equations is OK.
I have to stress I’m not an English native speaker, so I don’t dare to make any strict judgements about the language level of the document. From my point of view, the document is easy to read and understand (the only thing decreasing the readability of the document is intensive use of abbreviations). However, some language polishing is desirable. I recommend checking the language of the document by a native speaker or use an English editing service to improve the language level of the manuscript. I also encourage authors to add a nomenclature section describing all the abbreviations used in the document.
Content evaluation:
To be honest, my primary specialization is electrical power engineering. I have experience with aerial scanning of power line corridors and subsequent processing of acquired data (mostly from LiDAR scanning). But image processing is not my primary area of expertise, so I will try to provide my point of view as a power engineer (as a final consumer of these results) and won't dive deep into the details of the proposed methodology (I believe this aspect of the manuscript will be covered in the review reports of my fellow reviewers).
The first problem (from the power engineer point of view) is the misleading terminology used across the document.
The authors are using somewhat misleading or inaccurate terminology related to electrical power systems. Please note that power system, power line and conductor are terms describing completely different things. The most problematic is using the term “power line” instead of “conductor”. What you are detecting in the images are conductors. In the case of transmission lines, individual conductors are usually bundled (typically 2-6 conductors are connected together in a uniform configuration and form one phase of the system). Bundled phase conductors are clearly visible in fig. 18 a).
The power line consists of several phase and ground conductors supported by towers (pylons), with the phase conductors separated from the tower by insulators. The power line is a system, not a single conductor. If you really want to detect powerlines in your figures, your algorithm has to figure out somehow what conductors detected in the figure are part of a single power line. (Fig. 16. a) shows crossing of two powerlines, each consisting of multiple conductors). Indeed, there are some software tools providing this functionality (the system tries to classify detected conductors as belonging to a single powerline, and also identifying individual phases of the power line). However, this is apparently not the goal of this work.
Power system is a general term describing all powerplants, substations, power lines as a whole (including both transmission and distribution level).
In this context, even the title of this manuscript is not entirely correct, however the context is clear enough to justify the improper terminology (in fact I don’t recommend changing the term power line in the title to conductor, because it could be somehow confusing for some readers). However, in the text of the manuscript, you should use the proper terminology.
If you are not sure about the correct meaning of these terms, please use the portal https://www.electropedia.org, this portal is an online dictionary of terms related to electrical engineering, it is basically an online form of the IEC 60050 series of international standards.
Another weak point of the document is the absence of some definition of the required accuracy to support the intended inspection tasks. You have provided a comparison of the accuracy for the proposed and several other detection methods and from the results it seems the proposed methodology provides significantly better results than the other mentioned methods. But is this accuracy enough to make the method practically applicable? From my experience with lidar data processing, the commercial software we used (TerraScan) was able to detect and classify the points belonging to a conductor with very high accuracy – I don’t remember the exact numbers, but I’m sure in most cases the accuracy was higher than 95 %. However even if the accuracy would be 99 %, it means there always will be some misclassified points, requiring subsequent manual classification. Therefore, it was often faster and easier to use a semi-manual classification than use a fully automatic classification routine. From this point of view, I as a reader would really appreciate to see what accuracy is necessary for some common inspection tasks, so I could see what the true potential of the proposed method in a real-world application is (or how much further improvement is needed to fulfil the requirements and allow a truly autonomous processing).
The last important question related to the practical application of this method is the computational performance: in other words, how much processing time it takes to process a sample picture or a set of pictures representing a given length of a power line (and of course a comparison with other methods). The problem is mentioned to some extent in the discussion; however, it would be interesting to see some exact numbers to see how much more complex and time-consuming the process is.
I have to say I appreciate that in the discussion, authors have clearly declared the limitations and drawbacks of the proposed method.
Just a small question (not directly related to the content of the article). If the proposed method still requires some user interaction, why not let the user manually define the general direction of the power line (which is usually a relatively easy task due to the linear nature of the power line; the user just has to connect the centers of the towers at both ends of a straight section of the power line)? Then, when the approximate direction in a given power line section is known, it should be very easy to effectively filter out all the detected segments with improper orientation and position (relative to the approximate axis of the power line).
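The filtering idea in this question can be sketched as follows. This is a minimal illustration, not part of the manuscript or of any existing tool: the function names, the pixel coordinates, and the tolerances `max_angle_deg` and `max_offset_px` are hypothetical, and detected segments are assumed to be given as pairs of (x, y) endpoints.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def direction_deg(p: Point, q: Point) -> float:
    """Undirected orientation of the line through p and q, in [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest difference between two undirected orientations, in [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def distance_to_axis(pt: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from pt to the infinite line through a and b."""
    cross = (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])


def filter_segments(
    segments: List[Segment],
    axis_start: Point,
    axis_end: Point,
    max_angle_deg: float = 10.0,   # assumed orientation tolerance
    max_offset_px: float = 50.0,   # assumed half-width of the corridor
) -> List[Segment]:
    """Keep segments roughly parallel to, and close to, the user-drawn axis."""
    axis_dir = direction_deg(axis_start, axis_end)
    kept = []
    for p, q in segments:
        midpoint = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        if angle_diff_deg(direction_deg(p, q), axis_dir) > max_angle_deg:
            continue  # wrong orientation
        if distance_to_axis(midpoint, axis_start, axis_end) > max_offset_px:
            continue  # too far from the axis
        kept.append((p, q))
    return kept


# Hypothetical axis connecting two tower centers, and four detected segments.
tower_a, tower_b = (0.0, 0.0), (100.0, 0.0)
segments = [
    ((10.0, 5.0), (60.0, 7.0)),     # nearly parallel and close: kept
    ((20.0, 0.0), (25.0, 40.0)),    # nearly perpendicular: dropped
    ((10.0, 80.0), (70.0, 81.0)),   # parallel but far from the axis: dropped
    ((90.0, -10.0), (30.0, -12.0)), # parallel, endpoints reversed: kept
]
kept = filter_segments(segments, tower_a, tower_b)
print(kept)
```

Orientations are compared modulo 180 degrees so that a segment is accepted regardless of the order in which its endpoints were detected.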
Appendix:
Few definitions from the Electropedia portal related to this work:
Power system - all installations and plant provided for the purpose of generating, transmitting and distributing electricity (IEV ref. 601-01-01)
Power line - device consisting of conductors, insulating materials and accessories for the purpose of conveying electromagnetic energy between two electric devices (IEV ref. 151-12-27)
Conductor - element intended to carry electric current (IEV ref. 151-12-05)
Conductor bundle - set of individual conductors connected in parallel and disposed in a uniform geometrical configuration, that constitutes one phase or pole of an overhead line (IEV ref. 466-10-20)
Author Response
Dear Reviewer:
Thank you very much for your constructive and positive comments on our manuscript entitled “A Method Combining Line Detector and Semantic Segmentation for Power Line Extraction from Unmanned Aerial Vehicles Images” submitted to Remote Sensing.
We have revised our paper along the lines outlined by the reviewer. Our detailed responses follow, and the reviewer’s comments are in bold. Note also that changes in the manuscript are in yellow colour.
Point 1: The first problem (from the power engineer point of view) is the misleading terminology used across the document.
Reply: Thank you for your suggestions on professional terms. We have carefully studied the definitions of the various facilities and equipment in the field of electrical engineering. According to previous research results and references, the term “power line” is widely used in remote sensing and computer vision work on power line detection and extraction. Using the professional term “conductor” may confuse or mislead readers, make the research object of this paper unclear, and feel unfamiliar to other researchers. Therefore, the term “power line” is still used in this paper, but a definition of the term is added in the introduction (lines 100-104). It emphasizes that the power line in this paper refers to the conductor defined in the field of electrical engineering, which is a transmission facility formed by multiple power lines tied together, and gives the website of the online dictionary of terms related to electrical engineering. Readers interested in this field can consult these materials further. This approach better balances the generality and professionalism of the wording of this manuscript, and helps readers from various fields to obtain useful information.
Point 2: Another weak point of the document is the absence of some definition of the required accuracy to support the intended inspection tasks.
Reply: We added a statistical chart and a table of the Recall values of the test images (Figure 17 and Table 2). This paper mainly uses Recall and Precision as the evaluation indexes of model extraction accuracy. These two indexes are also the most commonly used accuracy evaluation indicators in the fields of computer vision and machine learning. They can comprehensively reflect the accuracy level of a classification or segmentation model. These two indexes are briefly introduced here:
Generally, the predicted samples are divided into positive samples and negative samples. TP (true positive): a positive sample is predicted as a positive sample; FP (false positive): a negative sample is predicted as a positive sample; TN (true negative): a negative sample is predicted as a negative sample; FN (false negative): a positive sample is predicted as a negative sample.
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{1} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]
Recall indicates how many positive examples in the samples are predicted correctly, and Precision indicates how many of the samples predicted to be positive are really positive. The relationship between the two can be expressed in the figure below.
* Please refer to the Word file for figure details.
Of the two, Recall shows whether the prediction results are complete, and Precision shows whether the prediction results are accurate. Recall can therefore also be called Completeness, and Precision can be called Correctness. Taken together, these two indexes give a clear picture of the accuracy of a classification or segmentation model.
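As an illustration of the definitions above, the following minimal sketch counts TP, FP, TN and FN pixel by pixel and evaluates Recall and Precision. The binary-mask representation, the function names and the small example masks are assumptions made for illustration; they are not the evaluation code or data of the manuscript.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 1 = power line pixel, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for two equally sized binary masks."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # line pixel predicted as line
            elif p and not t:
                fp += 1      # background predicted as line
            elif not p and t:
                fn += 1      # line pixel missed
            else:
                tn += 1      # background predicted as background
    return tp, fp, tn, fn


def recall_precision(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Recall = TP / (TP + FN), Precision = TP / (TP + FP)."""
    tp, fp, _tn, fn = confusion_counts(pred, truth)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical 3x4 masks: a vertical line, partly missed and partly over-detected.
truth = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
pred = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
counts = confusion_counts(pred, truth)
recall, precision = recall_precision(pred, truth)
print(counts, recall, precision)
```

In this example two of the three true line pixels are found (Recall = 2/3) and two of the three predicted line pixels are correct (Precision = 2/3). The zero-denominator guards return 0.0 by convention, which is itself an assumption.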
Point 3: The last important question related to the practical application of this method is the computational performance.
Reply: We added Table 3 to the discussion to show the time cost of the different methods for extracting power lines from the images. The method used in the manuscript needs to build a Gaussian pyramid in the line detection stage, and needs to iterate repeatedly to obtain the optimal solution when using the Markov random field. Therefore, the complexity of the algorithm is high. The advantage is that it can accurately extract the required objects against a complex background, but it also greatly reduces the efficiency of the algorithm in image processing. This is the biggest disadvantage of the method used in this paper, and it makes the method unsuitable for application scenarios that need real-time and rapid detection. Therefore, we do not devote much space to this issue in the manuscript, and only describe it briefly in the discussion. In the future, we need to further optimize the method to improve its efficiency as much as possible while maintaining the detection accuracy.
Point 4: Just a small question (not directly related to the content of the article). If the proposed method still requires some user interaction, why not let the user manually define the general direction of the power line and then, when the approximate direction in a given power line section is known, it should be very easy to effectively filter out all the detected segments with improper orientation and position.
Reply: Human-computer interaction is indeed the best way to improve the extraction accuracy at this stage. However, every UAV flight mission produces thousands of images, and if human-computer interaction were used for all of them, the time and economic costs would increase greatly. Therefore, researchers are exploring automated methods, which can greatly reduce human intervention and manual labour and are closer to the practical needs of this work. The methods used in the manuscript should be automated as much as possible. Although some external parameters need to be set, these parameters were determined in the early research; the subsequent extraction work uses the same parameter values and does not need to repeat the parameter adjustment process, which greatly enhances the automation of the algorithm. In follow-up work, we need to consider adaptive parameter adjustment to obtain better parameter values for different data sources.
Special thanks to you for your positive and constructive comments on our work.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Because the authors have responded to my concerns, I can recommend accepting this work with minor revisions: (a) more works must be added to Table 1 (dealing with line processing in general); (b) the English grammar must be edited by a professional.
Author Response
Dear reviewer,
Thank you for your comments on this manuscript.
I checked and revised all the tables in the manuscript (the modified contents are marked with yellow colour), and completed the English editing (English editing ID: English-41048).
Thank you again for your contribution to our manuscript.