Automatic Boundary Extraction for Photovoltaic Plants Using the Deep Learning U-Net Model

Nowadays, the world is in a transition towards renewable energy, with solar being one of the most promising sources in use today. However, Solar Photovoltaic (PV) systems present great challenges to their proper performance, such as dirt and environmental conditions that may reduce the energy output of PV plants. For this reason, inspection and periodic maintenance are essential to extend their useful life. The use of unmanned aerial vehicles (UAVs) for the inspection and maintenance of PV plants favors a timely diagnosis. A UAV path-planning algorithm over the PV facility is required to better perform this task. Therefore, it is necessary to explore how to extract the boundary of PV facilities with suitable techniques. This research work focuses on an automatic boundary extraction method for PV plants from imagery using a deep neural network model with a U-net structure. The results obtained were evaluated by comparing them with other reported works. Additionally, to assess the boundary extraction process, the standard metrics Intersection over Union (IoU) and Dice Coefficient (DC) were considered to draw a better comparison among all methods. The experimental results evaluated on the Amir dataset show that the proposed approach can significantly improve the boundary and segmentation performance in the test stage, up to 90.42% and 91.42% as calculated by the IoU and DC metrics, respectively. Furthermore, the training stage was faster. Consequently, it is envisaged that the proposed U-Net model will be an advantage in remote sensing image segmentation.


Introduction
In the last decade, the world began the transition towards renewable energy, with the harvesting of solar energy being one of the most promising sources used today. Photovoltaic (PV) energy production is a fast-growing market: the Compound Annual Growth Rate (CAGR) of cumulative PV plants was 35% from 2010 to 2019. The main reasons for this accelerated growth are that the production cost of PV panels has decreased, the return on investment ranges from 0.7 to 1.5 years, some countries offer economic benefits for new facilities, and the performance ratio (which indicates how energy-efficient and reliable PV plants are relative to their theoretical production) is better nowadays: before 2000 it was 70%, whereas today it ranges from 80% to 90% [1,2].
Nonetheless, PV plants present some challenges for maintaining proper performance, with failures and defects being the most common ones. In general, failures in PV systems are concentrated in the inverters and PV modules. In the PV modules, dirty equipment, environmental conditions, or manufacturing problems can reduce the PV plant energy output by 31% [3][4][5]. To detect these problems, it is necessary to consider that PV systems are commonly located on roofs, rooftops, and farms. Therefore, the access, maintenance, and detection of possible problems in the panels must be carried out by trained and qualified personnel working at heights.
Convolutional neural networks are used for extracting dense semantic representations from input images and for predicting labels at the pixel level. To perform this task, it is necessary to obtain or create a dataset, pre-process the data, select an appropriate model, train it based on metrics, and then evaluate the results, as shown in [11]. This is a fundamental challenge in computer vision with wide applications in scene interpretation, medical imaging, robot vision, etc. [39]. Once the segmentation is done, the next step is to obtain the automatic Coverage Path Planning (CPP).
Although advances in GPS systems have improved accuracy to around 10 cm in low-cost Real-Time Kinematics (RTK) GPS systems [40], most projects use software tools provided by companies, such as Mission Planner [41], or by development groups, such as QGroundControl [42]. These tools are based on simple polygonal coverage areas and a zigzag coverage pattern. They require time when the area has a complex geometry or when the plant is in continuous expansion. Additionally, the programmer preloads waypoints without optimal coverage. As a consequence, developing a real-time path-planning algorithm for an autonomous monitoring system is a hard task on these platforms. Therefore, it is first necessary to determine the boundary of the PV plant. By extracting the boundaries of PV plants, aerial photogrammetry and mapping can be faster, more effective, economical, and customizable [27]; these considerations motivate this work.
The key contributions of this work are as follows:
• In the reviewed literature, there is no report of a U-net model used to extract the boundaries of PV plants; this work proposes such a model.
• The IoU and DC metrics were not used in previous related research works. For the training and testing of the U-net and FCN models, this work uses these metrics and finds a better solution.
This paper is structured as follows. In Section 2, the necessary definitions and techniques to obtain the results are described. In Section 3, the three techniques implemented for boundary extraction are compared to show the best method. Finally, in Section 4, some conclusions are presented.

Boundary Extraction Procedure
UAVs must have a precise set of coordinates to define the coverage path planning correctly and thus fly over the total area of the PV plant in the inspection mission. To achieve this task automatically, it is necessary to explore how to extract the boundary of photovoltaic facilities with suitable techniques. Semantic segmentation, in which each pixel is labeled with the class of its enclosing object or region, can extract the PV plants as a particular object in an image [11], subject to the constraints that this work addresses. Two techniques have been implemented so far: Traditional Image Processing (TIP) [10] and Deep Learning (DL) [11]. Figure 1 shows the steps followed to reach that result with the TIP- and DL-based techniques.

Traditional Image Processing (TIP)
The boundary pixels of a target can be obtained by means of traditional image processing techniques, using functions that extract, enhance, filter, and detect the features of an image and obtain its segmentation [27,32]. The following main stages were used to extract the borders of PV plants from an image, as shown in Figure 1 [10]. In the first stage, the original image was filtered using the "filter2D" function from OpenCV, which applies a convolution with a 5 × 5 averaging kernel, as shown in Algorithm 1. Such convolution kernels can implement both Low-Pass Filters (LPF) and High-Pass Filters (HPF): LPFs help in removing noise and blurring images, whereas HPFs help in finding edges.
In the second stage, the filtered image is transformed into the HSV (hue, saturation, and value) representation. This transformation lessens reflections caused by environmental light during aerial image collection. Furthermore, it helps with the color-based segmentation required in the next stages.
In the third stage, each channel was processed separately to extract the area of the PV plants. This was achieved by applying thresholding operations to the HSV image. To extract the blue PV color from the image, the HSV range limits for thresholding were determined: from (50, 0, 0) to (110, 255, 255). Thresholding was implemented using the inRange function of OpenCV.
In the fourth stage, two morphological operators were applied: the "erode" and "dilate" functions. Together, these operations help to reduce noise and to better define the boundaries of the PV devices; the application of erosion followed by dilation is also known as the opening operation. Erosion and dilation require a structuring element (also known as a kernel) to be applied to the images. In this case, a rectangular kernel of 2 × 2 pixels (MORPH_RECT, (2, 2)) was used for both operations. Lines 13, 14 and 15 of Algorithm 1 show the creation of the structuring element and the successive use of the erode and dilate functions.
Then, the "findCountours" function was used to help in extracting the contours from the image. The contour can be defined as a curve joining all the continuous points in the boundary of the PV installation. The input parameters for this function are: the image (dilated image from previous stage), the type of contour to be extracted (in this case only the external contours, RETR_EXTERNAL) and the contour approximation method (in this case not approximation, CHAIN_APPROX_NONE). Finally the area was recognized using a multi-stage algorithm to detect a wide range of edges in images, known as the Canny edge detection "Canny" [44].
The pseudo-code of the Traditional Image Processing approach is shown in Algorithm 1; it was implemented in Python 3 using the OpenCV library.
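As an illustration, a minimal Python/OpenCV sketch of these stages could look as follows. The averaging kernel, HSV range, and 2 × 2 structuring element follow the description above, while the input file name and the Canny thresholds are assumptions and not the exact values of Algorithm 1.

# Hypothetical sketch of the TIP pipeline; the file name and the Canny
# thresholds are assumptions, the remaining parameters follow the text.
import cv2
import numpy as np

# Stage 1: 5 x 5 averaging (low-pass) convolution filter.
image = cv2.imread("pv_plant.jpg")                       # hypothetical input image
kernel = np.ones((5, 5), np.float32) / 25.0
filtered = cv2.filter2D(image, -1, kernel)

# Stage 2: convert to HSV to reduce the effect of environmental light.
hsv = cv2.cvtColor(filtered, cv2.COLOR_BGR2HSV)

# Stage 3: threshold the blue PV modules within the reported HSV range.
mask = cv2.inRange(hsv, (50, 0, 0), (110, 255, 255))

# Stage 4: opening (erosion followed by dilation) with a 2 x 2 rectangular kernel.
element = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
opened = cv2.dilate(cv2.erode(mask, element), element)

# Stage 5: external contours and Canny edges of the segmented area.
contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
edges = cv2.Canny(opened, 100, 200)                      # thresholds are assumptions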

Deep Learning
Another approach to determine the boundaries of PV plants uses a DL-based technique, which consists of several steps:

Data Specifications
The first step is to select the data for training the neural networks. The criteria to take into account are: PV plants in orthophotos and aerial images, with the respective mask for each image [11].

Data Understanding
The data preparation phase can be subdivided into at least four steps. The first step is data selection within the dataset. The second step involves correcting individual data, which may be noisy, apparently incorrect, or absent. The third step involves resizing the data as needed. Finally, most of the available implementations assume that the data are given in a single table, so if the data are spread over several tables, they must be merged into a single one [45].
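For illustration only, a minimal sketch of these preparation steps is shown below, under the assumption that the dataset is organized as paired image and mask folders; the folder names and the 256 × 256 target size are hypothetical.

# Hypothetical sketch of the data preparation steps; the folder layout and
# target size are assumptions, not values taken from the Amir dataset.
import os
import numpy as np
from PIL import Image

IMAGE_DIR, MASK_DIR = "amir/images", "amir/masks"   # hypothetical folder layout
TARGET_SIZE = (256, 256)                            # assumed working resolution

def load_pair(name):
    # Load, resize, and normalize one image/mask pair.
    image = Image.open(os.path.join(IMAGE_DIR, name)).convert("RGB").resize(TARGET_SIZE)
    mask = Image.open(os.path.join(MASK_DIR, name)).convert("L").resize(TARGET_SIZE)
    x = np.asarray(image, dtype=np.float32) / 255.0                     # RGB in [0, 1]
    y = (np.asarray(mask, dtype=np.float32) > 127).astype(np.float32)   # binary mask
    return x, y[..., np.newaxis]

# Gather every available pair into single arrays ("a single table").
names = sorted(os.listdir(IMAGE_DIR))
X, Y = map(np.stack, zip(*(load_pair(n) for n in names)))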

Modeling
In the literature, many existing models can be selected for the semantic segmentation task. In this work, two deep-learning-based methods were selected, taking into account the following criteria: suitability for the type of task, the amount of data to be processed, the execution time, and the ease of implementation for predicting the label of each pixel. The methods were selected according to [11,[46][47][48][49]. The first one is the FCN model, proposed by [33] and used by [11]; its network architecture is delineated in Figure 2. The second one is the U-Net model, first proposed by [38] and selected for this project; its network architecture is illustrated in Figure 3.
(a). Fully Convolutional Network (FCN) model: This model has two blocks. The first block is a series of 13 convolutional layers that form a modified version of a VGG16 backbone (Figure 2), which was introduced for the first time by [50]. The VGG16 network has 16 weight layers, and its creators belong to the "Visual Geometry Group" team, hence the name VGG16. The backbone is the network that takes the image as input and extracts the feature map upon which the rest of the network is built. The second block consists of a series of deconvolutional layers that simply reverse the forward and backward passes of convolution. The last layer uses a softmax function to predict the probability of each category, as shown in Figure 2. As a result, the input of the FCN model is an RGB image, and the output is the predicted mask of the PV plants. For more details, see [33]. The parameters for the training process are listed in Table 1.
(b). U-net network model: This model has two blocks, a decreasing (contracting) path and an increasing (expanding) path, which give it its u-shaped architecture, or horizontal hourglass shape [51]. The decreasing path is a typical convolutional network that consists of the repeated application of convolutions, each followed by a rectified linear unit (ReLU), and a max-pooling operation. During the decrease, the spatial information is reduced whereas the feature information is increased. The increasing path combines the feature and spatial information through a sequence of upsampling layers followed by two layers of transposed convolution at each step [38,52], as illustrated in Figure 3. The parameters for the training process are listed in Table 1, and its architecture is shown in Table 2. The platform used in this work for the FCN and U-net models was TensorFlow with Keras [53]. The U-net model had not been used for this kind of application before. The FCN and U-net models additionally use a binary cross-entropy function (H_p) to calculate the loss during the training of the neural network [54]. Since the problem at hand is a semantic segmentation task, Equation (1) is used. This function examines each pixel and compares the binary-predicted values vector with the binary-encoded target vector.
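For reference, the standard form of the pixel-wise binary cross-entropy over the N pixels of an image, which Equation (1) presumably takes, can be written as

H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\big(p(y_i)\big) + (1 - y_i) \log\big(1 - p(y_i)\big) \right]    (1)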
where y is the label of each pixel, taking the value 1 for the PV plant area and 0 for other areas or elements, and p(y) is the predicted probability of the pixel belonging to the PV plant area, evaluated over all N pixels. The Adam optimization function is used to optimize the models [55]. Because semantic segmentation is the task at hand, it is essential to implement metrics to ensure that the model performs well.
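To make the training setup concrete, the following is a minimal, illustrative Keras sketch of a u-shaped encoder-decoder compiled with the Adam optimizer and binary cross-entropy. The filter counts, the 256 × 256 input size, the depth, and the sigmoid output are assumptions for illustration and do not reproduce the exact configuration of Tables 1 and 2; an FCN variant would keep a VGG16 backbone and replace the skip-connected decoder with plain transposed convolutions.

# Minimal U-Net-style sketch in Keras (illustrative only): a contracting path,
# a bottleneck, and an expanding path with skip connections, compiled with
# Adam and binary cross-entropy as described in the text.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(256, 256, 3)):            # input size is an assumption
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for filters in (64, 128, 256):                    # decreasing (contracting) path
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 512)                            # bottleneck
    for filters, skip in zip((256, 128, 64), reversed(skips)):   # increasing path
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])           # skip connection
        x = conv_block(x, filters)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)       # predicted PV mask
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(),   # Adam optimizer [55]
              loss="binary_crossentropy",             # Equation (1)
              metrics=["accuracy"])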

Metrics
The metrics evaluate the similarity between the predicted mask (N) and the original mask (S). Such a similarity assessment can be performed by considering spatial overlapping information, that is, by computing the true positives (TP), false positives (FP), and false negatives (FN), given by TP = |N ∩ S|, FP = |N \ S|, and FN = |S \ N|, respectively.
Three standard metrics are commonly employed to evaluate the effectiveness of the proposed semantic segmentation technique [29,48,49,56]: pixel accuracy (Acc), region Intersection over Union (IoU), and the Dice Coefficient (DC).
Pixel accuracy is the ratio of correctly classified PV plant pixels to the total number of PV plant pixels in the original mask image [57]; it can be mathematically represented as in Equation (2).
The IoU metric (the Jaccard index) is defined by Equation (3). This equation is the ratio between the intersection of the predicted mask N and the original mask S, and the union of both. More details can be found in [58].
The DC metric [56,58,59] is expressed in Equation (4). This equation divides twice the intersection of the predicted mask N and the original mask S by the sum of the sizes of N and S.
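As an illustrative reference, the two overlap metrics can be computed from binary masks as in the following sketch; the function names are hypothetical, and a small epsilon is added to avoid division by zero.

# Illustrative NumPy helpers for the overlap metrics described above,
# computed on binary masks (1 = PV plant pixel, 0 = background).
import numpy as np

def iou(pred, target, eps=1e-7):
    # IoU (Jaccard index) = TP / (TP + FP + FN) = |N ∩ S| / |N ∪ S|
    tp = np.logical_and(pred == 1, target == 1).sum()   # |N ∩ S|
    fp = np.logical_and(pred == 1, target == 0).sum()   # |N \ S|
    fn = np.logical_and(pred == 0, target == 1).sum()   # |S \ N|
    return tp / (tp + fp + fn + eps)

def dice(pred, target, eps=1e-7):
    # DC = 2|N ∩ S| / (|N| + |S|)
    intersection = np.logical_and(pred == 1, target == 1).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Tiny usage example: two 2 x 2 masks that overlap in a single pixel.
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
print(iou(pred, truth), dice(pred, truth))   # approximately 0.5 and 0.667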
To validate the results of the techniques described above, the FCN and U-net models were trained, and their performance was evaluated on the validation and test samples of the Amir dataset [43]. The next section describes these results and compares the models in detail.

Database Specification
For this work, the DeepSolar [60], Google Sun-Roof [61], OpenPV [62], and Amir [43] databases were accessed. Only the last database met the established parameters: it contained PV plants in orthophotos and aerial images with their respective masks. Furthermore, the PV plant images came from different countries around the world. Therefore, the "Amir" dataset was selected.

Results with TIP Technique
The results obtained in this work were compared with the results obtained in previous investigations where the TIP and the deep learning techniques were used alongside the FCN model [11].
The stages used to obtain the results are shown in Figure 4. In the first stage, a 2D filter was applied, as depicted in Figure 4a. In the second stage, the filtered image was transformed into the HSV representation, Figure 4b. In the third stage, the blue color was filtered out, Figure 4c. In the fourth stage, the opening operation was applied, as seen in Figure 4d. Finally, the area was recognized using the Canny method, as illustrated in Figure 4e. The results were satisfactory, and the processing can be adjusted depending on the environment.
The results are shown in Table 3. The TIP results were obtained by randomly selecting images from the test dataset, applying the functions described in the methodology section (Section 2), and finally comparing the obtained mask with the original mask. The IoU metric obtained was 71.62% and the DC was 71.62%.

Results with DL-Based Techniques
The training data consisted of 2864 aerial images selected at random: 90% of the training dataset in the Amir database. The validation data were the remaining 10% of the same training dataset. Figure 5a shows the loss function and IoU metric of the FCN model during the training and validation process. The general trend of the two curves is consistent, showing that the network converges rapidly and stabilizes at iteration 30, with the loss value tending to 0.04%. Figure 5b shows the DC metric of the model during the training and validation stage. The general trend of the two curves is consistent at iteration 30.
On the other hand, using the same metrics, the U-net model proposed in this work shows a better performance. Figure 6a shows the loss function and IoU metric of the model during the training and validation stage. The common trend of the two curves shows that the network converges quickly and stabilizes at iteration 16, with the loss value tending to 0.03%. Figure 6b shows the DC metric of the model throughout the training and validation phase. The prevalent trend of the two curves is consistent from iteration 16.
In the evaluation stage, 716 images were used with the trained FCN model for PV plant detection. Some relevant results are shown in Figure 7. In this figure, the columns correspond to different PV plants. The first row contains the original images; the second row, the original masks; and the third one, the predicted masks. The images used were taken in desert regions and vegetation zones. The FCN model detects the PV plants in vegetation zones with some false positives. As an example, the second and third predictions of Figure 7 identify a lake and vegetation as part of the PV plants. In desert regions, PV plants are detected more precisely. Although these images have very high precision, their predicted shape does not fully correspond to the original mask. Hence, it was necessary to review the performance metrics of the algorithm [63].
The segmentation results in the evaluation stage, using the same 716 images and the trained U-Net model, are shown in Figure 8. The arrangement is the same as in Figure 7. It is noteworthy that this model correctly segments the photovoltaic plants where the other model does not, as can be seen in the second and third predictions of Figure 8.
Afterwards, the trained models were tested on 716 samples. Table 3 shows the results and the comparison among the TIP technique, the proposed U-net model, and the FCN model used by [11], which was replicated in this study. The FCN and the proposed U-net models were compared. The accuracies obtained for the FCN model in the training and testing stages were 97.99% and 94.16%, respectively [11]. For the proposed U-Net, the accuracies obtained in the training and testing stages were 97.07% and 95.44%, respectively. Both results can be seen in Table 3.
To compare the FCN model proposed by Amir [11] and the U-net model proposed in this work, the two metrics most used in semantic segmentation problems were employed. For the FCN model, the standard IoU metric was 94.13% in the training stage, 90.91% in the validation stage, and 87.47% in the test stage. Its DC metric was 92.96% for validation and 89.61% for test, which deviate a little from the training value of 95.10%. However, using the same metrics, the U-net model proposed in this work shows a better performance. The IoU metric obtained was 93.57% in the training stage, 93.51% in the validation stage, and 91.42% in the test stage. The DC metric for validation, 94.44%, was almost the same as that for training, 94.03%, and deviates a little from the test value of 91.42%. Table 3 shows these results. From these values, a difference of 2.95% was found between the FCN and U-net models for the first metric, and of 1.81% for the second. All files and logs from the experiments are available at GitHub [64].

Discussion
The proposed U-net model reconstructs the segmented image and preserves the original image shape characteristics by storing the grouping indices of the max-pooling layers, a process that is not performed in the FCN model.
The training and testing accuracies are the percentages of pixels in the image that are classified correctly and cannot be taken as indicators of how similar the predicted PV plants and the original mask are [65]. For the purpose of comparing the similarity of the results, the IoU metric was used. This metric varies from 0 to 1 (0-100%), with 0 meaning no similarity and 1 meaning total similarity between the original and predicted masks [63].
The U-net model proposed in this work aimed to obtain a value closer to 1 in the IoU metric. The iteration times show that the model used is faster and therefore suitable for the training and processing stages, obtaining results virtually in real time [66]. The DC is the other metric used in this work. It also ranges from 0 to 1, with 1 signifying the greatest similarity between the predicted and original masks [63]. Both metrics were used to determine whether the U-net model was better than the FCN model in the validation and test stages. The values of the IoU and Dice metrics in Table 3 show that the U-net model performed better than the FCN model. This work used VGG16 as the encoder because it was the encoder used by Amir [11], the work used for comparison; in future work, other encoders such as ResNet or AlexNet could be used [37].
Finally, the results obtained with the TIP and FCN models agree with the results obtained by other authors [11,13]. Those authors mentioned that they did not use the standard metrics for these kinds of problems and that bias in the results was therefore expected. On the contrary, this work did take these metrics into account and found satisfactory results. The U-net network increased the processing speed, the accuracy of the segmentation process, and the overall performance of the model.

Conclusions
This work used three techniques, namely, the TIP technique and the DL-based FCN and U-net models, and applied the U-net model to PV plants. All of them were used to extract PV plant boundaries from an image. The TIP technique can be very precise but requires constant adjustment depending on the image, whereas the FCN and U-net network models are more useful when it comes to unknown PV plants.
The U-net network model is novel for this kind of problem. It allows greater processing speed and performance when predicting the area of PV plants, as well as better feature extraction. The results obtained open the door for further investigation of this model for this problem.
The U-net technique turned out to be satisfactory compared to the TIP technique and the FCN model used in previous studies. The values obtained with the implemented metrics guarantee that the predicted areas of the PV plants are similar to the real ones. The results also help to identify possible false positives, such as lakes in the vicinity of photovoltaic plants. The relevant features of an object can be obtained using this technique, which is not possible with the FCN technique.