Combined Multi-Layer Feature Fusion and Edge Detection Method for Distributed Photovoltaic Power Station Identification

Abstract: Distributed photovoltaic power stations are an effective way to develop and utilize solar energy resources. Using high-resolution remote sensing images to obtain the locations, distribution, and areas of distributed photovoltaic power stations over a large region is important to energy companies, government departments, and investors. In this paper, a deep convolutional neural network was used to extract distributed photovoltaic power stations from high-resolution remote sensing images automatically, accurately, and efficiently. Based on a semantic segmentation model with an encoder-decoder structure, a gated fusion module was introduced to address the problem that small photovoltaic panels are difficult to identify. Further, to solve the problems that edges in the segmentation results are blurred and that adjacent photovoltaic panels easily adhere, this work combines an edge detection network and a semantic segmentation network for multi-task learning to extract the boundaries of photovoltaic panels in a refined manner. Comparative experiments conducted on the Duke California Solar Array data set and a self-constructed Shanghai Distributed Photovoltaic Power Station data set show that, compared with SegNet, LinkNet, UNet, and FPN, the proposed method obtained the highest identification accuracy on both data sets, and its F1-scores reached 84.79% and 94.03%, respectively. These results indicate that effectively combining multi-layer features with a gated fusion module and introducing an edge detection network to refine the segmentation improves the accuracy of distributed photovoltaic power station identification.


Introduction
Renewable energy, which includes biomass energy, wind energy, solar energy, and other sources, is sustainable and inexhaustible and plays an important role in solving the energy crisis. Biomass energy can be converted into eco-fuels, which have been found to be a sustainable energy scenario at the local scale [1]. Wind energy is mainly used by converting it into electricity through wind turbines. Solar energy is a clean and safe renewable energy source (RES) with strong development potential and application value [2]. Photovoltaic power generation is an effective way to use solar energy [3], and it takes two main forms: centralized photovoltaic power generation and distributed photovoltaic power generation [4,5]. Centralized photovoltaic power stations are installed primarily in deserts and other open ground areas, and the generated electricity is usually incorporated into the national public power grid [6], while distributed photovoltaic power stations are generally installed on the tops of buildings and the generated electricity is mainly for the inhabitants' own use [7]. Distributed photovoltaic power stations have advantages such as unlimited installed capacity, no occupation of land resources [8], and no pollution. Thus, distributed photovoltaic power generation is an important solar energy development mode that has entered a stage of rapid development and is supported by Chinese policy [9,10]. The International Energy Agency predicts that the world's total renewable energy generation will grow by 50% between 2019 and 2024, with solar photovoltaic generation alone accounting for nearly 60% of the prospective growth. Distributed photovoltaic generation is expected to account for approximately half of the growth in total photovoltaic power generation [11]. The installed capacity of distributed photovoltaic power stations is currently growing rapidly.
Consequently, the ability to accurately and efficiently acquire the installation locations, distribution, and total area of distributed photovoltaic power stations over a wide range is of importance to energy companies, governmental departments, and investors. For example, obtaining information of distributed photovoltaic power stations can help optimize power system planning [12]. The information of distributed photovoltaic power stations and solar irradiance data of building surfaces can be combined to predict the power generation potential [13]. Moreover, it can also support the development of open data and energy systems and facilitate the development of the energy field [14]. However, due to the spontaneity and randomness of distributed photovoltaic power station construction, it is difficult to obtain accurate information regarding the quantity and distribution of distributed photovoltaic power stations solely from governmental department planning information. In addition, distributed photovoltaic power stations are generally installed on the tops of buildings, making it difficult to investigate their distribution and area manually. High-resolution remote sensing imagery has the characteristics of high spatial resolution, high efficiency, and wide coverage. Thus, it provides the possibility for automatic identification of large-scale distributed photovoltaic power stations.
Traditional distributed photovoltaic power station identification methods rely mainly on manually designed features, and it is difficult to accurately obtain the location and area of photovoltaic power stations. Malof [15] pioneered the use of manual features for extracting distributed photovoltaic power stations and proposed a method that first obtains all the maximally stable extreme regions (MSERs) [16] from an image and then filters out the areas with low confidence. Then, color features and shape features in the remaining candidate area are extracted for classification by a support vector machine (SVM) [17]. However, this method does not obtain photovoltaic panel areas accurately. Later, Malof [18] used color, texture, and other features in the neighborhood of each pixel to represent the pixel, and then used a random forest [19] to predict the category of each pixel. However, this method also has difficulty accurately obtaining the location and area information of photovoltaic panels. On the basis of the research conducted by the authors of [18], Malof [20] cascaded the random forest and convolutional neural network [21] to identify distributed photovoltaic power stations. However, this method still relies on feature information designed by humans. In a later work, Malof [22] proposed a distributed photovoltaic power station identification model based on a VGG model [23]. However, its ability to accurately obtain the locations and shapes of photovoltaic panels is limited.
As deep learning technology has developed, a series of convolutional neural network (CNN) models have been proposed [23][24][25][26][27][28][29][30]. Semantic segmentation technology based on deep learning can use a CNN, which has strong feature-learning ability, to automatically learn object features from massive amounts of data. Compared with earlier machine learning methods, such as SVMs and random forests, CNNs significantly improved object extraction accuracy. Semantic segmentation technology has been widely applied and has developed rapidly in fields such as medical image segmentation, automatic driving, and video segmentation. Jiang [31] used a CNN model and small data sets to extract the heart and lungs. Zhou [32] proposed the UNet++ model, which has achieved high accuracy in nodule, nuclei, and liver segmentation. In addition to 2D medical image segmentation, 3D fully convolutional neural networks can be used to realize organ segmentation in CT images [33]. Deep learning has become a robust and effective method for medical image segmentation [34]. In the field of automatic driving, CCNet [35] and ACFNet [36], respectively, used spatial context information and class context information to achieve the segmentation of objects in street scenes. Gated-SCNN [37] combined shape and semantic information to extract targets on the street. In addition, to improve the performance of target segmentation in automatic driving, the idea of knowledge distillation has been used to retain a model's high precision while reducing the computation [38]. For the video semantic segmentation task, Paul [39] proposed an efficient video segmentation method that combines a convolutional neural network running on the GPU with an optical flow method running on the CPU. Pfeuffer [40] added a recurrent neural network to the video segmentation model to make full use of the temporal information of video sequences and improved the accuracy of video segmentation.
Jain [41] proposed a video segmentation model with two input branches, which made use of the feature information of the current frame and the context information of the previous frame. Nekrasov [42] proposed a video segmentation algorithm that does not rely on optical flow, which further improved the efficiency of video segmentation. In addition to the natural image domain, semantic segmentation methods based on fully convolutional network (FCN) [43] models have been widely used for object identification from remote sensing imagery, including road extraction, building extraction, and water extraction. For example, Zhou [44] proposed a road extraction method based on an encoder-decoder structure and series-parallel dilated convolutions. Wu [45] added an attention mechanism to the model of [44], which further improved the accuracy of road extraction. Xu [46] designed a road extraction model based on DenseNet [30] with local and global attention. Gao [47] used a refined residual convolutional neural network to extract roads from high-resolution remote sensing images. Xu [48] used a deep convolutional neural network to extract buildings and optimized the results with guided filters. Yang [49] used DenseNet [30] and a spatial attention module to extract buildings. Huang [50] presented a residual refinement network for building extraction that fused aerial images and LiDAR point cloud data. Sun [51] proposed a building extraction method combining a multi-scale convolutional neural network and an SVM. Yu [52] proposed a water body extraction method based on convolutional neural networks, which used both spectral and spatial information from Landsat images. Chen [53] proposed a cascaded superpixel segmentation and convolutional neural network classification method to extract urban water bodies. Li [54] used a fully convolutional network to extract water bodies from GaoFen-2 images with limited training data.
Some previous deep learning-based semantic segmentation methods have been applied to the identification of distributed photovoltaic power stations. Yuan [55] was the first to introduce an FCN model for distributed photovoltaic power station identification. However, the adopted FCN model requires up-sampling by a large factor, which may cause the loss of feature information. Subsequently, SegNet [56] and UNet [57] were used to identify distributed photovoltaic power stations [58,59]. Although the identification results of those models are superior to those of traditional methods, they still do not solve the problems that photovoltaic panels with small areas are easily missed and that densely installed photovoltaic panels easily adhere in the results.
To solve the above problems, this paper proposes a distributed photovoltaic power station identification method that combines multi-layer features and edge detection. The main contributions of this paper are as follows:

• To address the problem that small photovoltaic panels are difficult to recognize, a gated fusion module is introduced into the encoder-decoder model to effectively fuse multi-layer features, which improves the model's ability to identify small photovoltaic panels.

• To address the problem of edge blurring, a multi-task learning model that combines edge detection and semantic segmentation is proposed to refine the edges of the segmentation results using feature information of the target edge.

• Comparative experiments are conducted on the Duke California Solar Array data set [60] and the Shanghai Distributed Photovoltaic Power Station data set, and the results verify the effectiveness of the proposed method.
The remainder of this article is organized as follows. Section 2 introduces the distributed photovoltaic power station identification model designed in this paper, including the encoder-decoder architecture, gated fusion module, and edge detection network. Section 3 presents the experiments and results analysis on the two data sets, including the experimental data, evaluation metrics, experimental settings, and the experimental results. The results are analyzed and compared with those of other methods. Finally, Section 4 concludes this paper.

Model Architecture and Design
The model proposed in this paper was composed of a semantic segmentation network and an edge detection network. These 2 networks were trained in parallel for multi-task learning, as shown in Figure 1. The semantic segmentation network was used to extract the semantic features of photovoltaic panels, and its architecture included an encoder-decoder structure based on UNet. The encoder was Efficientnet-B1 [61]. In the semantic segmentation network, a gated fusion module was introduced to control the transmission of valuable information, effectively fuse multi-layer features, and improve the ability to identify small photovoltaic panels. The edge detection network was used to extract the edge features of the photovoltaic panels and guide the semantic segmentation network to produce segmentation results with more refined edges to alleviate the problem of blurred and unrefined edges in segmentation results.

Semantic Segmentation Network with Gated Fusion Multi-Layer Features
A semantic segmentation network was used to extract the semantic features of photovoltaic panels. EfficientNet-B1 was used as the encoder, and a gated fusion module was introduced to effectively fuse multi-layer features.

Encoder and Decoder
This study adopted EfficientNet-B1, which has strong feature representation capabilities, as the encoder for feature extraction. The decoder is the same as that used in the original UNet. The EfficientNet-B1 network structure is shown in Figure 2. The basic component of EfficientNet-B1 is the MBConv module. In the MBConv module, a 1 × 1 convolution is first used to change the channels of the input features, followed by a depth-wise convolution. Then, the channel attention mechanism of SENet [62] is introduced, and finally, a 1 × 1 convolution is used to reduce the channels of the feature maps. The original UNet encoder structure consists of 5 stages. The feature resolution at each stage is successively halved relative to the previous stage through down-sampling, and the features of each stage are fused with the corresponding decoder features through skip connections. Based on the UNet structure, this paper adopted the output features of Stages 0, 2, 3, 5, and 7 of EfficientNet-B1 as the 5 encoder blocks used in the encoder of our model, as shown in Figure 3, which assumes that the size of the input image is 256 × 256 × 3. The decoder is mainly used to gradually up-sample the low-resolution high-level features to restore the original size of the input image. During the up-sampling process, the corresponding features of the encoder and decoder are concatenated through skip connections. The decoder block structure is shown in Figure 4. The decoding features are the output features of the previous decoder block, and the encoding features are the features passed to the corresponding decoder block through the skip connections. First, the decoding features are up-sampled by a factor of 2 and then concatenated with the encoding features on the channel dimension. The number of channels of the concatenated features is the sum of the numbers of channels of the two features.
After the concatenation and two 3 × 3 convolutional layers, the output features of the decoder block are obtained. The output features of the current decoder block are the input decoding features for the next decoder block.
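As an illustration, the decoder block described above (2× up-sampling, channel-wise concatenation with the skip feature, then two 3 × 3 convolutions) can be sketched in PyTorch; the channel sizes and the batch-norm/ReLU placement here are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One decoder block: 2x up-sampling, concatenation with the skip
    (encoder) feature, then two 3x3 conv layers (sketch, not the exact model)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)

    def forward(self, decoding, encoding):
        x = F.interpolate(decoding, scale_factor=2, mode="nearest")  # 2x up-sampling
        x = torch.cat([x, encoding], dim=1)   # concat on the channel dimension
        x = F.relu(self.bn1(self.conv1(x)))
        return F.relu(self.bn2(self.conv2(x)))
```

The output of one such block becomes the "decoding features" fed to the next block, exactly as the text describes.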

Gated Fusion Module
Inspired by the research conducted by the authors of [63], a gated fusion module was introduced to effectively fuse the multi-layer features and improve the ability to identify small photovoltaic panels. The gated fusion module structure is shown in Figure 5. The inputs are the features of adjacent layers of the encoder, and the features generated by the gating unit are used to measure the usefulness of the feature at each position in the spatial dimension. This arrangement controls the transmission of useful information and suppresses the transmission of useless information. The input to the gated fusion module consists of the features F_i from layer i and the features F_{i+1} from the adjacent layer i + 1. Due to the differences in feature sizes and channel numbers, F_{i+1} is first up-sampled by a factor of 2, and its number of channels is converted to be the same as that of F_i. Then, F_{i+1} is input into the gating unit G. The output of the gated fusion module is the fused feature, denoted F̃_i.
The gating unit G feeds the input features into a 1 × 1 convolution and then obtains the gated features G_i through the sigmoid function, as shown in Equation (1):

G_i = σ(w_i * F_i)    (1)

where σ is the sigmoid function, the asterisk ('*') represents the convolution operation, and w_i is the weight parameter of the convolution. The gated feature map is used to judge the usefulness of the input features at each spatial position. The range of the gated feature values is [0, 1]: a value less than 0.5 (approaching 0) corresponds to useless feature information, whereas a value greater than 0.5 (approaching 1) corresponds to useful feature information. The transfer of useful and useless information is controlled by element-by-element multiplication between the gated features and the input features of the gating unit. The entire gated fusion module process can be defined as shown in Equation (2):

F̃_i = (1 + G_i) ⊙ F_i + (1 − G_i) ⊙ (G_{i+1} ⊙ F_{i+1})    (2)

where ⊙ denotes element-by-element multiplication. For a position (x, y), when G_{i+1}(x, y) is larger and G_i(x, y) is smaller, F_{i+1} transmits useful information that F_i lacks at this position. When G_{i+1}(x, y) is smaller or G_i(x, y) is larger, this useless information is suppressed to reduce information redundancy.
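The gated fusion step can be sketched in PyTorch as follows, assuming single-channel gate maps produced by 1 × 1 convolutions and the gated-fully-fusion rule F̃_i = (1 + G_i) ⊙ F_i + (1 − G_i) ⊙ (G_{i+1} ⊙ F_{i+1}) in the spirit of [63]; the layer names and channel handling are illustrative, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Fuse feature F_i (layer i) with F_{i+1} (coarser adjacent layer),
    gating each spatial position by a sigmoid map in [0, 1]."""
    def __init__(self, ch_i, ch_i1):
        super().__init__()
        self.align = nn.Conv2d(ch_i1, ch_i, 1)   # match channels of F_{i+1} to F_i
        self.gate_i = nn.Conv2d(ch_i, 1, 1)      # gating unit for layer i
        self.gate_i1 = nn.Conv2d(ch_i, 1, 1)     # gating unit for layer i+1

    def forward(self, f_i, f_i1):
        # up-sample F_{i+1} by a factor of 2 and align its channel count
        f_i1 = F.interpolate(f_i1, scale_factor=2,
                             mode="bilinear", align_corners=False)
        f_i1 = self.align(f_i1)
        g_i = torch.sigmoid(self.gate_i(f_i))    # G_i = sigma(w_i * F_i)
        g_i1 = torch.sigmoid(self.gate_i1(f_i1))
        # keep layer i's own useful information, admit useful info from i+1
        return (1 + g_i) * f_i + (1 - g_i) * (g_i1 * f_i1)
```

The single-channel gates broadcast over the channel dimension, so each spatial position is weighted uniformly across channels, matching the per-position usefulness described in the text.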

Combining Edge Detection for Multi-Task Learning
The edge detection network was used to extract the edge features of photovoltaic panels. The semantic segmentation network was trained using multi-task learning so that the network model produced segmentation results with refined edges.

Edge Detection Network
Distributed photovoltaic power stations are often densely distributed, and the identification results of adjacent photovoltaic panels are prone to adhesion. In this paper, edge information extracted by the edge detection network was combined with the semantic segmentation network to ameliorate the problem of edge blurring.
In this paper, an encoder-decoder structure was adopted in the edge detection network, as shown in Figure 6. The encoder is shared with the semantic segmentation network, enabling feature extraction and feature sharing. The decoder structure of the edge detection network is also the same as that of the semantic segmentation network. The object edge feature information is gradually obtained through multiple up-sampling operations, and the edge features extracted by the encoder are fused via skip connections during the up-sampling process.
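The two-branch arrangement, one shared encoder feeding both a segmentation decoder and an edge decoder, can be sketched as follows; the tiny stand-in encoder and the 1 × 1 heads are placeholders for EfficientNet-B1 and the full UNet-style decoders, used only to show the multi-task wiring:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Minimal sketch of the two-branch design: one shared encoder,
    a segmentation head and an edge detection head (stand-ins only)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(16, 1, 1)    # semantic segmentation branch
        self.edge_head = nn.Conv2d(16, 1, 1)   # edge detection branch

    def forward(self, x):
        feat = self.encoder(x)                 # features shared by both branches
        return self.seg_head(feat), self.edge_head(feat)
```

Because both heads read the same encoder features, gradients from the edge loss also shape the shared representation used for segmentation, which is the mechanism by which edge supervision refines the segmentation edges.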

Loss Function
In the parallel training of the 2 networks, a semantic segmentation loss function and an edge detection loss function are used to supervise the learning of the semantic and edge features of photovoltaic panels, respectively. The semantic segmentation loss is calculated from the segmentation predictions and segmentation labels, while the edge detection loss is calculated from the edge predictions and edge labels. Both the semantic segmentation and edge detection of photovoltaic power stations are binary classification tasks. In addition, compared with the background, the segmentation labels and edge labels account for only a small proportion of the pixels. To alleviate this sample imbalance problem, a loss function composed of binary cross entropy (BCE) and the Dice loss function (Dice), namely, BCE + Dice [64,65], is used in both the semantic segmentation network and the edge detection network. During training, the 2 loss functions are summed to obtain the total model loss, as shown in Equation (3):

Loss_total = Loss_seg + Loss_edge    (3)

where Loss_total is the total loss function of our proposed model, Loss_seg is the loss function of the semantic segmentation network, and Loss_edge is the loss function of the edge detection network. The BCE loss function is shown in Equation (4), and the Dice loss function is given by Equation (5):

BCE = −(1/n) Σ_{i=1}^{n} [g_i log(p_i) + (1 − g_i) log(1 − p_i)]    (4)

Dice = 1 − 2|G ∩ P| / (|G| + |P|)    (5)

where n represents the number of pixels in the image, g_i represents the value of the i-th pixel in the label, p_i denotes the value of the i-th pixel in the prediction result map, and G and P denote the label and prediction result map, respectively.
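A minimal PyTorch sketch of the BCE + Dice loss and the summed total loss follows; the smoothing constant `eps` and the use of sigmoid logits are implementation assumptions, not details given in the paper:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(pred_logits, target, eps=1e-6):
    """BCE + Dice on a binary mask; pred_logits are raw (pre-sigmoid) scores."""
    prob = torch.sigmoid(pred_logits)
    bce = F.binary_cross_entropy_with_logits(pred_logits, target)
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + dice

def total_loss(seg_pred, seg_gt, edge_pred, edge_gt):
    """Total loss: segmentation loss plus edge detection loss."""
    return bce_dice_loss(seg_pred, seg_gt) + bce_dice_loss(edge_pred, edge_gt)
```

The Dice term counteracts the foreground/background imbalance because it depends on the overlap ratio rather than on a per-pixel average dominated by background pixels.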

Experimental Data
The experimental data in this study consisted of the Duke California Solar Array and Shanghai Distributed Photovoltaic Power Station data sets.
1. Duke California Solar Array Data Set

This data set is currently the largest manually labelled distributed photovoltaic power station data set, containing images and object boundary coordinate information that can be used to train semantic segmentation and object detection algorithms. The images in the data set were collected by the United States Geological Survey (USGS), which uses remote sensing technology to perform orthographic correction on the images, eliminating distortions caused by the camera and terrain. The image size is 5000 × 5000 pixels, the spatial resolution is 0.3 m, and each image includes three bands: red, green, and blue. To ensure comparable results, a total of 526 images from Fresno, Modesto, and Stockton were selected and split following SolarMapper [66]. Fifty percent of the images were randomly selected to form the test set, and the remaining 50% of the images were divided into a training set and a validation set at a ratio of 8:2.
Given the limited memory available on the graphics card, the original images in the training set were clipped into 256 × 256 image blocks and the data were augmented by horizontal and vertical mirroring and a rotation of 90 degrees. Finally, a total of 85,448 image blocks were collected for training. During the training of the edge detection network, photovoltaic panel edge labels are needed. In this study, the edge labels were obtained based on the semantic segmentation labels. Some sample images, segmentation labels, and edge labels from this data set are shown in Figure 7.
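The paper derives the edge labels from the segmentation labels but does not detail the procedure; one simple way to do this, shown here as an assumption, is to mark every foreground pixel that has at least one background 4-neighbour:

```python
import numpy as np

def edge_label(mask):
    """Derive a 1-pixel-wide edge label from a binary segmentation mask.
    A foreground pixel is an edge pixel if any of its 4-neighbours is
    background. (Illustrative; not the paper's documented procedure.)"""
    m = np.pad(mask.astype(bool), 1, mode="constant")
    # a pixel is interior if it and all four neighbours are foreground
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return (mask.astype(bool) & ~interior).astype(np.uint8)
```

Applied to every clipped 256 × 256 segmentation label, this yields the edge labels needed to supervise the edge detection branch.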

2. Shanghai Distributed Photovoltaic Power Station Data Set
To verify the effectiveness of the proposed method for identifying domestic distributed photovoltaic power stations, the Shanghai Distributed Photovoltaic Power Station data set was constructed. The images were collected from the Songjiang and Pudong New districts in Shanghai. The data set contains 1000 aerial images with a size of 2048 × 2048 pixels and a spatial resolution of 0.1 m, and the images include three bands: red, green, and blue. The data set images were randomly divided into a training set, a validation set, and a test set at a ratio of 7:1:2. The training set data were clipped into 256 × 256 image blocks. Then, the data were augmented by horizontal and vertical mirroring and rotations of 90, 180, and 270 degrees, and contrast and brightness transformations were also applied. Finally, a total of 55,560 image blocks were collected for training. Some sample images, segmentation labels, and edge labels for this data set are shown in Figure 8.

Evaluation Metrics
In this study, IoU, precision, recall, and F1-score were used as evaluation metrics. The IoU is the ratio of the intersection and union of the predicted result area and the labelled area. Precision represents the ratio of pixels correctly predicted as positive among all pixels predicted as positive. Recall represents the ratio of pixels correctly predicted as positive among all positive pixels. The F1-score is a metric that combines precision and recall. The four evaluation metrics are calculated as follows:

IoU = TP / (TP + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × Precision × Recall / (Precision + Recall)

where TP (true positive) represents the number of pixels that are both predicted and labelled as positive, FP (false positive) represents the number of pixels that are predicted as positive but labelled as negative, and FN (false negative) represents the number of pixels that are predicted as negative but labelled as positive.
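The four metrics can be computed directly from the pixel counts; the sketch below assumes binary NumPy masks for the prediction and label:

```python
import numpy as np

def evaluate(pred, label):
    """Pixel-wise IoU, precision, recall, and F1 for binary masks."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)     # predicted and labelled positive
    fp = np.sum(pred & ~label)    # predicted positive, labelled negative
    fn = np.sum(~pred & label)    # predicted negative, labelled positive
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1
```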

1. Experimental environment
The computer used in the experiments was equipped with an Ubuntu 16.04.5 LTS operating system, an Intel (R) Xeon (R) E5-2678 v3 CPU, and two NVIDIA TITAN XP graphics cards, each with 12 GB of memory. PyTorch was used to build all the semantic segmentation models.
2. Training strategy and hyperparameter settings

All the models were trained using the Adam optimizer to help ensure a fast convergence speed. The batch size of the input images in each training epoch was 64. The initial learning rate was 1 × 10⁻³, and the learning rate decay adopted the cosine annealing strategy, with a cycle of 10 and a minimum learning rate of 1 × 10⁻⁵.
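These settings map directly onto PyTorch's Adam optimizer and `CosineAnnealingLR` scheduler; the one-layer stand-in model and the bare epoch loop below are placeholders for the full network and training loop:

```python
import torch

# Reported hyperparameters: Adam, initial lr 1e-3, cosine annealing with
# a cycle of 10 and a minimum lr of 1e-5 (batch size 64 in the real loop).
model = torch.nn.Conv2d(3, 1, 1)  # stand-in for the full two-branch network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=1e-5)

for epoch in range(10):
    # ... forward pass, total loss, loss.backward() would go here ...
    optimizer.step()      # placeholder for the real per-batch updates
    scheduler.step()      # decay the learning rate once per epoch
```

After one full 10-epoch cycle, the learning rate has annealed from 1 × 10⁻³ down to the 1 × 10⁻⁵ floor.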

Experimental Results
To verify the effectiveness of the proposed method, EfficientNet-B1-UNet was considered as the baseline network. Then, the gated fusion module and edge detection network were added successively. The experiments used the Duke California Solar Array data set and the Shanghai Distributed Photovoltaic Power Station data set. The experimental results on the Duke California Solar Array data set are shown in Table 1. Effi-UNet represents UNet, which uses EfficientNet-B1 as the encoder; GFM represents the gated fusion module, and EDN represents the edge detection network.
On the Duke California Solar Array data set, by adding the gated fusion module, the IoU of the test set was increased from 72.41% to 73.33%, F1 was increased from 84.00% to 84.61%, and recall was increased from 82.64% to 83.24%. By adding the edge detection network, the IoU of the network model was further improved from 73.33% to 73.60% and F1 was improved from 84.61% to 84.79%.
The experimental results of the Shanghai Distributed Photovoltaic Power Station data set are shown in Table 2. On the Shanghai Distributed Photovoltaic Power Station data set, adding the gating fusion module increased the IoU of the test set from 87.40% to 88.34%, the F1-score from 93.27% to 93.81%, and the recall from 93.47% to 94.08%. After adding the edge detection network, the IoU of the network model was further improved to 88.74% and the F1-score improved to 94.03%.
The added modules improved all four evaluation metrics. This shows that the gated fusion module and edge detection network proposed in this paper can improve the accuracy of distributed photovoltaic panel identification. Figure 9 shows sample images and their segmentation results before and after adding the gated fusion module. The first two rows of images are from the Duke California Solar Array data set, and the second two rows are from the Shanghai Distributed Photovoltaic Power Station data set. The first column is the sample image, the second column is the labelled image, and the third column shows the segmentation results of Effi-UNet. Compared with the labelled image, the Effi-UNet results failed to detect some small photovoltaic panels. The fourth column shows the segmentation results of Effi-UNet + GFM, revealing that, with the help of the GFM module, the network's ability to identify small photovoltaic panels was improved, which verifies the effectiveness of the module.

1. The influence of the gated fusion module on the segmentation results
2. The influence of the edge detection network on the segmentation results

By extracting edge information and conducting multi-task learning of the edge detection and segmentation networks, more refined segmentation results can be generated. In Figure 10, the first two rows of sample images were sourced from the Duke California Solar Array data set, while the second two rows were sourced from the Shanghai Distributed Photovoltaic Power Station data set. The first column is the sample image, and the second column is the segmentation label. The third column shows the Effi-UNet + GFM segmentation results; compared with the segmentation label, the results for adjacent photovoltaic panels were adhered together. The fourth and fifth columns, respectively, show the semantic segmentation results and edge detection results of Effi-UNet + GFM + EDN, and the sixth column is the edge detection label. With the help of the edge detection network, fine edge results were obtained, distinguishing adjacent photovoltaic panels insofar as possible and alleviating the adhesion problem.

Comparisons with Other Methods
To further verify the effectiveness of the proposed method, the identification method proposed in this paper was compared with SegNet, LinkNet [67], UNet, and FPN [68] on the two adopted data sets. The results and analysis are as follows.
3.6.1. Results on the Duke California Solar Array Data Set

The experimental results of each method on the test set of the Duke California Solar Array data set are shown in Table 3. The results show that the proposed method outperformed the other methods on all the evaluation metrics. The IoU of the proposed method reached 73.60%, and its F1-score reached 84.79%. Moreover, the IoU of the proposed method was 6.6% better than that of SolarMapper [66]. The analysis of the results is as follows: (1) Although LinkNet, UNet, and FPN combine features from different layers, they do not consider the differences between high-level and low-level features, nor do they make full use of object edge information. (2) In this paper, based on an encoder-decoder network, the multi-layer features were fused effectively by the gated fusion module, and useful information was transferred by the gating mechanism, improving the ability to identify small photovoltaic panels. (3) Based on the semantic segmentation network, the method in this paper combined an edge detection network for multi-task learning to ameliorate the edge-blurring problem. Figure 11 shows some of the experimental results of each method on the Duke California Solar Array data set. The segmentation results in the first and second rows show that the method proposed in this paper was better at identifying small photovoltaic panels than the other methods. In the segmentation results shown in the third and fourth rows, although each method identified the photovoltaic panels in the image, the method in this paper obtained more refined edges.

3.6.2. Results on the Shanghai Distributed Photovoltaic Power Station Data Set

Table 4 shows the evaluation results of each model on the Shanghai Distributed Photovoltaic Power Station data set, revealing that the method proposed in this paper outperformed all the other methods on all the evaluation metrics.
The IoU of the method in this paper reached 88.74%, and its F1-score reached 94.03%. Due to the encoder-decoder structure, the method proposed in this paper effectively fused features from multiple layers, improved the ability to identify small photovoltaic panels, and refined the segmentation edges using the edge detection network. Therefore, compared with the other methods, the method in this paper achieved higher accuracy. Figure 12 shows examples of the experimental results of the proposed method and the compared methods on the Shanghai Distributed Photovoltaic Power Station data set. As seen from the results in the first row, the method proposed in this paper was better at identifying small photovoltaic panels, and the identification results were more complete. In the second row, the two separate photovoltaic panels were difficult to identify due to their small sizes. Compared with the other methods, the proposed method not only recognized them but also obtained more refined edges in the identification results. In the third row, multiple photovoltaic panels were close to each other, which was likely to cause adhesion problems in the identification process. Compared with the other methods, with the help of the edge detection network, the identification results of the method proposed in this paper had more refined edges and alleviated the adhesion problem. A comparison of the results in the fourth row shows that the identification results of the proposed method had more refined edges.

Conclusions
This paper presented a novel fully convolutional neural network model that automatically extracts distributed photovoltaic power stations from remote sensing imagery. A distributed photovoltaic power station identification method combining multi-layer features and edge detection was proposed to solve two problems: that small photovoltaic panels are difficult to identify and that adjacent photovoltaic panels easily adhere to each other. The model consists of a semantic segmentation network and an edge detection network. A gated fusion module was introduced into the semantic segmentation network for effective multi-layer feature fusion, and the edge detection network was used to guide the production of segmentation results with refined edges. Experiments on the Duke California Solar Array data set and the Shanghai Distributed Photovoltaic Power Station data set showed that introducing the gated fusion module reduced the number of missed small photovoltaic panels and enhanced the identification accuracy. By combining the edge detection network and the semantic segmentation network for multi-task learning, the edge information of the photovoltaic panels was used to constrain the segmentation results, yielding extracted panels with finer edges and further improving the identification accuracy. Compared with SegNet, LinkNet, UNet, and FPN, the proposed method achieved the highest identification accuracy on both data sets, with F1-scores of 84.79% and 94.03%, respectively.
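To make the gated fusion idea concrete, the sketch below shows one common way such a module can weigh low-level (edge-rich) against high-level (semantic) features with a per-pixel sigmoid gate. This is an illustrative simplification under assumed scalar weights; in the actual model the gate would be produced by learned convolution layers, and the function and parameter names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(low, high, w_low, w_high, bias):
    """Illustrative gated fusion of a low-level and a high-level
    feature map (two H x W arrays). The gate is a per-pixel sigmoid
    of a linear combination of both inputs; w_low, w_high, and bias
    stand in for what convolution layers would learn.
    """
    gate = sigmoid(w_low * low + w_high * high + bias)  # values in (0, 1)
    # The gate decides how much low-level detail passes through at
    # each pixel, so the fused map is a per-pixel convex combination.
    return gate * low + (1.0 - gate) * high

low = np.array([[0.9, 0.1], [0.2, 0.8]])   # edge-rich shallow features
high = np.array([[0.4, 0.5], [0.6, 0.3]])  # semantic deep features
fused = gated_fuse(low, high, w_low=2.0, w_high=-1.0, bias=0.0)
```

Because the gate is a convex combination, every fused value lies between the corresponding low-level and high-level activations, which is what lets the network selectively retain fine detail where it helps small-panel identification.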
However, this study also has some limitations: (1) In terms of data source, owing to the limitations of the current data sets, the trained model is only applicable to RGB optical images and cannot be directly applied to images containing more bands. (2) In terms of spatial resolution, the method was trained and tested on images with the same spatial resolution. Because solar panels appear differently in images of different resolutions, the accuracy is uncertain when the trained model is applied directly to images with other resolutions.
(3) Since the training data include only distributed photovoltaic power stations, the trained model cannot be used to identify centralized photovoltaic power stations. Future work will proceed along the following lines: (1) Explore the application of our method to multi-spectral images and further improve the segmentation performance with the additional spectral information. (2) Collect images of multiple spatial resolutions to train our method so that it can identify distributed photovoltaic power stations in images of different resolutions. (3) Construct a centralized photovoltaic power station data set and extend our method to the identification of centralized photovoltaic power stations. (4) Combine the extracted distributed photovoltaic power station results with solar radiation data to assess power generation potential.