Automatic Detection of Photovoltaic Farms Using Satellite Imagery and Convolutional Neural Networks

Abstract: The number of solar photovoltaic (PV) arrays in Greece has increased rapidly in recent years. As a result, there is a growing need for high-quality, up-to-date information on the status of PV farms, including their number, power capacity and the energy they generate. However, the available data are often outdated, mainly because it is difficult to track the status of PV investments (from licensing to investment completion and energy production). This article presents a novel approach that uses freely accessible high-resolution satellite imagery and a deep learning algorithm (a convolutional neural network, CNN) for the automatic detection of PV farms. Furthermore, in an effort to create an algorithm capable of generalizing better, all current locations with installed PV farms in the Greek territory (131,957 km²) were used, based on data provided by the Greek Regulatory Authority for Energy. To our knowledge, this is the first time such an algorithm has been used to determine the existence of PV farms, and the results showed satisfactory accuracy.


Introduction
During the last three decades, mankind has been witnessing an evolution in the energy sector, with a shift in energy production methods from fossil fuels (petroleum, natural gas, coal, etc.) to more environmentally friendly alternatives. This shift is driven mainly by the fact that a significant portion of the world's carbon dioxide emissions results from the use of fossil fuels for energy production [1][2][3].
However, as electricity consumption plays an important role for modern societies (and its usage cannot be reduced) other forms of energy production must be used in order to satisfy current and future energy demands [3][4][5][6][7].
Renewable energy methods can be considered as a viable solution for energy production and the reduction of CO2 emissions. These methods include the usage of sustainable sources based on wind, water, biomass, solar and geothermal energy for energy production which are in general called renewable energy sources (RES) [8].
The exploitation of solar energy is one of the most common types of RES. Solar panels transform the energy of incident sunlight into electricity using solar cells based on the photovoltaic effect; thus they are also called photovoltaic (PV) panels [9]. Nowadays, massive arrays of PV panels (in the form of solar or PV farms) are used for energy production throughout the world. The generating capacity of these farms ranges from 1 to 2000 MW, the latter in the case of mega projects covering thousands of hectares [10].
In Europe, PV farms account for 13% of the total RES production. Furthermore, solar power is the fastest-growing source: in 2008, it accounted for 1%. This means that the growth in electricity from solar power has been dramatic, rising from 7.4 TWh in 2008 to 125.7 TWh in 2019 [11].
In Greece, data provided by the Regulatory Authority for Energy (RAE) indicate that there are currently 9791 potential PV installations (farms) in a variety of stages (licensed investments, licensed installations, licensed production or under evaluation), producing 715.6 MW of electric power.
The variety of stages in which PV farms exist makes it difficult to track the penetration of PV into the Greek market, as in many cases the period from the initial evaluation of the energy production license to actual production can be years. Financial difficulties, public opposition to the investment and technical difficulties can all pause the installation process.
In this work we investigate a new method of collecting information on installed PV which is potentially cheaper and faster than existing methods. The proposed approach uses an algorithm which can automatically detect existing PV farms based on free-to-use high-resolution satellite imagery, current RAE data for training, and deep learning techniques. The entire methodology can be divided into two separate steps.
The first step involves associating the data provided by RAE with satellite images. To implement this step, we used an algorithm that automatically annotates the images and matches RAE data with satellite images in order to create two datasets: a high-resolution dataset and a low-resolution dataset.
The second step involves using the output of the first step to train a deep learning (DL) algorithm to automatically detect the locations of PV farms. Apart from determining the locations, the algorithm can also help scientists extract other information. As it is essentially a data-agnostic algorithm, it can also provide information such as the effect of land use on the selection of PV farm locations, the effect of micrometeorology on the installation locations, etc.
The proposed approach offers a series of benefits compared with other data analysis methods. First, it allows the produced results to scale and the data collection to improve automatically. Using higher-resolution images will provide the user with better results. Thus, the user is free to use data originating from a variety of sources, even from Google Earth, although the best results are to be expected with data from paid services such as LandSat [12,13].
Additionally, implementing the approach as a computer algorithm allows the process to be automated. The entire procedure is easy to use and can be executed multiple times in order to monitor the installation rate. The produced information can also help scientists predict the level of energy produced, help the Government initiate programs related to RES adoption, and provide a valuable tool to enhance the decision-making process regarding the determination of potential installation sites [2,14]. Finally, the presented methodology can be easily adapted in order to monitor other types of RES and reproduced in other regions.

Literature Review
Computer applications, sensor networks as well as the Internet of Things are responsible for the creation of enormous amounts of data [15]. For this reason, new and innovative techniques must be applied in order to perform sufficient analysis of the accumulated data. Deep Learning is a part of machine learning (ML) methods based on the usage of artificial neural networks with representation learning (supervised, semisupervised or unsupervised learning) [16].
Essentially, DL is a methodology in which many classifiers work together, and it is based on linear regression followed by activation functions. The foundations of DL rely on the same traditional statistical linear regression approach. The only difference is that deep learning uses many neural nodes instead of only one node (as in linear regression). Together these nodes form a neural network, and a single classifier (a node) is known as a perceptron. The network is organized in layers, and each layer can have many hundreds or even thousands of nodes. Layers situated between the input and output layers are called hidden layers, and accordingly the nodes that constitute them are known as hidden nodes. In contrast to traditional machine learning classifiers, where the user must write complex hypotheses, in deep neural network applications the hypothesis is generated by the network itself, making it a powerful tool for learning nonlinear relationships effectively [16].
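The description of a node as a linear regression term followed by an activation function can be illustrated with a minimal sketch. The weights, biases and layer sizes below are arbitrary values chosen for illustration and are not taken from this study; the sigmoid is only one possible activation function.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias, activation=sigmoid):
    """One node: a linear regression term followed by an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer(inputs, weight_rows, biases):
    """A layer is a set of perceptrons that all read the same inputs."""
    return [perceptron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical weights: 2 inputs -> hidden layer of 3 nodes -> 1 output node.
hidden_w = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.0, -1.0, 0.5]]
output_b = [0.2]

x = [1.0, 2.0]
hidden = layer(x, hidden_w, hidden_b)       # activations of the hidden nodes
output = layer(hidden, output_w, output_b)  # activation of the output node
print(hidden, output)
```

In a trained network the weights and biases would be learned from data rather than written by hand, which is what the text means by the hypothesis being generated by the network itself.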
ML can be divided into two development phases, shallow learning (SL) and deep learning. The most widespread SL methods include logistic regression, support vector machines (SVM) and Gaussian mixture models [17][18][19][20][21][22][23][24][25][26]. The main disadvantage of SL is that it cannot handle complex real-world problems such as voice and image recognition [16]. By contrast, DL specializes in solving problems such as image classification, voice recognition, etc. For example, image classification over 1000 kinds of images achieved a classification error rate of 3.5%, which corresponds to an accuracy higher than that of ordinary people [27].
Various DL algorithms have been used for disease determination. Quiroz and Alferez [28] used DL for image recognition of legacy blueberries in the rooting stage, planted in smart farms in Chile. For this purpose, they used a convolutional neural network (CNN) to detect the presence of trays with living blueberry plants, the presence of trays without living plants and the absence of trays. The model produced results with 86% accuracy, 86% precision, 88% recall and 86% F1 score.
Other researchers used DL for apple pathology image recognition and diagnosis [29]. For this purpose, they trained a CNN that obtained a recall rate of 98.4% using error back propagation analysis of sampled elements. In the study of Liu et al. [30], DL was used for the identification of citrus cancer based on the AlexNet model, with an optimized network structure which could reduce the network parameters while maintaining the same level of accuracy. The results showed that the recognition accuracy reached 98%. In the study of Amara et al. [31], DL was used for detecting two well-known banana diseases. For this purpose, they used a deep CNN based on the LeNet architecture, achieving an accuracy of 85.9%, a precision of 86.7%, a recall of 85.9% and an F1 score of 86.3%.
DL has also been used for other types of image recognition. Huang et al. [32] used DL to detect crack and leakage defects in metro shield tunnels, with very good results (an identification error of 0.8%). Yang et al. [33] used a DL algorithm (in this case a modified AlexNet model) to detect wind turbine blade damage in images taken from an unmanned aerial vehicle. The model provided better results (97.1% average accuracy) than the unmodified AlexNet model and support vector machine models. In [34], a DL approach was proposed for the classification of road surface conditions. For this purpose, the authors used a CNN and created a new activation function based on the rectified linear unit function. Their results showed a classification accuracy of 94.89% on the road state database. DL has also been used to perform breast cancer classification. A new method called BDR-CNN-CGN was used to classify breast cancer types, and the results showed improved detection rates (accuracy 96.10%) compared to other neural network models [35]. A CNN was also used to perform COVID-19 diagnosis. The proposed CNN employed several new techniques such as rank-based average pooling and multiple-way data augmentation. Among the eight proposed models, the model named FGCNet performed best, with performance higher than 97% [36]. Finally, Malog et al. [37] used high-resolution satellite imagery and a deep forest algorithm to detect rooftop-installed photovoltaic arrays. Their data included imagery from an area of 135 km², and the results showed 99.9% pixel-based detection accuracy and 90% object-based detection accuracy. Table 1 presents an overview of the aforementioned literature.

Materials and Methods
For the creation of the image datasets we used data provided by RAE as well as data available from Apple Maps. Apple Maps is a free map service based on satellite data provided by DigitalGlobe. The RAE data consisted of a series of polygons (in shapefile form) covering all PV farm investments in Greece (Figure 1). The data were categorized according to the status of the investment as follows:
• Investments with installation licenses;
• Investments with production licenses;
• Investments with operation licenses.
Each shapefile was first converted to GeoJSON format. GeoJSON is a geospatial data interchange format compatible with the GNU/General Public License (GPL) guidelines, based on JavaScript Object Notation (JSON). It defines several types of JSON objects and the manner in which they are combined to represent data about geographic features, their properties, and their spatial extents. GeoJSON uses a geographic coordinate reference system, World Geodetic System 1984, and units of decimal degrees [38].
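To make the GeoJSON structure concrete, the following is a minimal sketch of how one PV farm polygon might be represented and filtered by investment status. The coordinates are made up, and the `status` property name and its values are hypothetical; the actual attribute names in the RAE shapefiles are not given in this article.

```python
import json

def make_feature(ring, status):
    """Build a GeoJSON Feature for one PV farm polygon.

    ring: list of (longitude, latitude) pairs in decimal degrees (WGS 84).
    status: hypothetical property holding the licensing stage.
    """
    coords = [list(pt) for pt in ring]
    if coords[0] != coords[-1]:
        coords.append(coords[0])  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {"status": status},
    }

# Made-up coordinates, for illustration only.
farms = {
    "type": "FeatureCollection",
    "features": [
        make_feature([(22.10, 39.50), (22.11, 39.50), (22.11, 39.51), (22.10, 39.51)],
                     "operation"),
        make_feature([(23.70, 38.00), (23.71, 38.00), (23.71, 38.01)],
                     "installation"),
    ],
}

# Keep only the farms that already hold an operation license.
operating = [f for f in farms["features"] if f["properties"]["status"] == "operation"]
print(json.dumps(operating[0]["geometry"]))
```

Note that GeoJSON stores each position as longitude first and latitude second, which is the reverse of the order commonly used in map interfaces.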
A custom Python script was written in order to match the polygons with base map data. The script used a GNU/GPL library called jimutmap to read each polygon in GeoJSON form and create an image file, thus concluding the first step of the methodology. Jimutmap allows the user to select different zoom levels when annotating the data and thus create images of different resolutions.
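One way to match a polygon with base map imagery is to derive its bounding box and center, which can then be handed to an imagery downloader. The sketch below shows only that geometric step under the assumption that each farm is a GeoJSON Polygon feature; it does not reproduce the script used in this study and does not call the jimutmap API. The sample coordinates are made up.

```python
def bounding_box(feature):
    """Return (min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon feature."""
    exterior = feature["geometry"]["coordinates"][0]  # first ring is the exterior
    lons = [pt[0] for pt in exterior]
    lats = [pt[1] for pt in exterior]
    return min(lons), min(lats), max(lons), max(lats)

def center(bbox):
    """Return the (lon, lat) midpoint of a bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon + max_lon) / 2, (min_lat + max_lat) / 2

# Made-up polygon, for illustration only.
farm = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[22.10, 39.50], [22.11, 39.50], [22.11, 39.51],
                         [22.10, 39.51], [22.10, 39.50]]],
    },
    "properties": {},
}

bbox = bounding_box(farm)
mid_lon, mid_lat = center(bbox)
print(bbox, mid_lon, mid_lat)
```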
In Figure 2, we can observe that the library user can easily select the zoom level using the zoom variable, and thus determine the resolution of the images created (higher zoom level creates images with lower resolution). This is due to the fact that satellite imagery provided by free services has limited resolution. Additionally, the library allows the use of multiple threads in order to perform the required annotations more quickly.
The second step included training a convolutional neural network to automatically detect the locations of PV farms. The CNN was developed using Google Colaboratory, or Google Colab (GC) for short. GC is a product from Google Research that allows users to write and execute arbitrary Python code in their browser, and is especially well suited to machine learning and data analysis. Additionally, it provides access to advanced cloud resources, including graphics processing units (GPUs) and tensor processing units (TPUs). Unlike ordinary central processing units (CPUs), which are installed on all personal computers using the x64 architecture, GPUs are specialized electronic circuits designed to accelerate the creation and manipulation of images. Their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms that process large blocks of data in parallel [39]. TPUs are artificial intelligence accelerator application-specific integrated circuits (ASICs) developed by Google specifically for neural network machine learning using TensorFlow, a free and open-source software library for machine learning [40].

Convolutional Neural Networks
Convolutional neural networks (CNNs) are inspired by the cat's visual cortex and were first proposed in the 1980s [41]. A CNN has a structure similar to other multilayer neural networks and is composed of layers. Each layer is composed of a number of two-dimensional planes, and each plane has independent neurons. Sparse connections are used between layers, meaning that a neuron in each feature map connects only to the neurons in a small area of the preceding map, in contrast with traditional neural networks. The CNN structure relies mainly on shared weights, local receptive fields and subsampling to ensure the invariance of input data [42].
Figure 3 presents the layout of a CNN. In this case the network is composed of an input layer, four hidden layers and an output layer. This network was created for image processing, more specifically for the recognition of handwritten characters. The input layer is made up of 28 × 28 sensory nodes and receives images that have been approximately centered and normalized in size. Afterwards, the computational layouts alternate between convolution and subsampling as follows:
• The first hidden layer is responsible for the convolution. This layer consists of four feature maps, with each feature map consisting of 24 × 24 neurons. Each neuron is assigned a receptive field of size 5 × 5.
• The second hidden layer is responsible for subsampling and local averaging. Like the previous layer, it also consists of four feature maps, but each feature map is now made up of 12 × 12 neurons. Each neuron has a receptive field of size 2 × 2, a trainable coefficient, a trainable bias, and a sigmoid activation function. The trainable coefficient and bias control the operating point of the neuron.
• The third hidden layer is responsible for the second convolution. It consists of 12 feature maps, with each feature map consisting of 8 × 8 neurons. Each neuron in this hidden layer may have synaptic connections from several feature maps in the previous hidden layer. Otherwise, it operates in a manner similar to the first convolutional layer.
• The fourth hidden layer is responsible for performing a second subsampling and local averaging. It consists of 12 feature maps, with each feature map in this case consisting of 4 × 4 neurons. Otherwise, it operates in a manner similar to the first subsampling layer.
• Finally, the output layer is responsible for the final stage of convolution. This layer consists of 26 neurons, with each neuron assigned to one of 26 possible characters. As before, each neuron is assigned a receptive field of size 4 × 4 [42].
The result of the previously described processes is a bipyramidal effect. This means that with each convolutional or subsampling layer, the number of feature maps is increased while the spatial resolution is reduced, compared to the corresponding previous layer.
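The feature map sizes listed above follow from simple arithmetic, which the sketch below reproduces. It assumes stride-1 convolutions without padding and non-overlapping subsampling windows; the 5 × 5 receptive field of the third hidden layer is not stated explicitly in the text and is inferred here from the reduction from 12 × 12 to 8 × 8.

```python
def conv_output(size, kernel):
    """Side length of a feature map after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def subsample_output(size, window):
    """Side length of a feature map after non-overlapping subsampling."""
    return size // window

# (layer type, receptive field side) for the four hidden layers and the output layer.
layers = [("conv", 5), ("sub", 2), ("conv", 5), ("sub", 2), ("conv", 4)]

size = 28  # the input layer has 28 x 28 sensory nodes
sizes = [size]
for kind, k in layers:
    size = conv_output(size, k) if kind == "conv" else subsample_output(size, k)
    sizes.append(size)

print(sizes)  # [28, 24, 12, 8, 4, 1]
```

The final value of 1 corresponds to the output layer, where each of the 26 neurons covers an entire 4 × 4 map.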
CNNs were first used for the identification of handwritten checks in banks, but they were incapable of recognizing large images. For this reason, [43] developed LeNet-5, a classical convolutional neural network model with low error rates (only 0.9% on the MNIST dataset).
The main bottleneck in the application of CNNs is the long training time due to the many hidden nodes in the networks. However, weight sharing, which is a characteristic of CNNs, allows parallel processing of weights if the proper infrastructure exists. Today, as modern graphics processing units (GPUs) support parallel computing, the application of CNNs is easier. In [44], a GPU algorithm was used in order to solve the ImageNet problem.
The CNN implemented for automatically detecting PV farms was based on Keras 2.3.0, a deep learning application programming interface written in Python 3.7, running on top of the machine learning platform TensorFlow 2.4.1 supported by Google Colab. Keras was developed with a focus on enabling fast experimentation.

Building the Model
Keras supports various image classification models (Xception, ResNet, MobileNet, VGG, etc.). In this study we used the InceptionV3 model, mainly because it performs significantly better than the other Keras-supported models [45]. The images were randomly divided into two categories: training images, used for training and validating the model, and evaluation images, used for determining the network performance against new, unseen images.
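A random division of this kind can be sketched as follows. The file names, the 20% evaluation fraction and the random seed are assumptions made for illustration; the actual split used in this study is the one reported in its tables.

```python
import random

def split_dataset(paths, eval_fraction=0.2, seed=42):
    """Shuffle file paths and split them into training and evaluation lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

# Hypothetical file names, for illustration only.
image_paths = [f"pv_{i:03d}.png" for i in range(220)]
train, evaluation = split_dataset(image_paths)
print(len(train), len(evaluation))
```

Keeping the evaluation images completely separate from training and validation is what allows them to measure performance on unseen data.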
Before presenting the images to the network, we perform a series of augmentations which ensure that the model never sees exactly the same picture twice; this helps prevent the model from overfitting on the training data. For this purpose, we used the image data preprocessing function of Keras. This function has a series of arguments for manipulating the training image datasets. The following arguments were used:
• Rotation range, rotates the images randomly within the given range of degrees;
• Height shift range, shifts the image vertically (along the Y axis);
• Width shift range, shifts the image horizontally (along the X axis);
• Horizontal flip, randomly mirrors the image left to right (about the vertical axis);
• Vertical flip, randomly mirrors the image top to bottom (about the horizontal axis);
• Validation split, determines the fraction of images reserved from the training dataset for model validation;
• Zoom range, determines the range of the random zoom factor;
• Brightness range, modifies the image brightness level;
• Rescale, multiplies the pixel values by a constant factor (for example, to bring them into the range 0 to 1);
• Shear range, determines the image distortion along an axis in order to create or rectify the perception angle;
• Fill mode, determines how pixels that fall outside the boundaries of the original image after a transformation are filled.
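The arguments listed above correspond to keyword arguments of the Keras `ImageDataGenerator` class. The sketch below shows how they might be collected; every value is an illustrative assumption and not a setting used in this study. TensorFlow is imported only inside the helper function, so the configuration itself can be inspected without it.

```python
# Keyword arguments for tf.keras.preprocessing.image.ImageDataGenerator.
# All values are illustrative assumptions, not the settings used in the study.
AUGMENTATION_ARGS = {
    "rotation_range": 20,             # degrees
    "width_shift_range": 0.1,         # fraction of image width
    "height_shift_range": 0.1,        # fraction of image height
    "horizontal_flip": True,
    "vertical_flip": True,
    "validation_split": 0.2,          # fraction held out for validation
    "zoom_range": 0.2,
    "brightness_range": (0.8, 1.2),
    "rescale": 1.0 / 255,             # scale pixel values into [0, 1]
    "shear_range": 0.1,
    "fill_mode": "nearest",           # how newly exposed pixels are filled
}

def make_generator(args=AUGMENTATION_ARGS):
    """Create the Keras generator; TensorFlow is imported only when called."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**args)
```

With TensorFlow installed, a call such as `make_generator().flow_from_directory("train_dir", target_size=(299, 299), batch_size=15, subset="training")` would yield augmented batches, where `train_dir` is a hypothetical folder containing one subfolder per class.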
Next, we must set the number of training epochs and the image batch size. The number of epochs is the number of times the network is trained on the entire dataset, whereas the batch size determines the number of samples processed before each model update.
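The relationship between dataset size, batch size and epochs can be made explicit with a short calculation. The number of training images below is a hypothetical value; the batch size of 15 and the epoch counts of 15, 20 and 25 are the ones used in this study.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to pass once over the dataset."""
    return math.ceil(n_samples / batch_size)

def total_updates(n_samples, batch_size, epochs):
    """Number of weight updates over a whole training session."""
    return updates_per_epoch(n_samples, batch_size) * epochs

n_train = 400  # hypothetical number of training images
for epochs in (15, 20, 25):
    print(epochs, total_updates(n_train, batch_size=15, epochs=epochs))
```

The last batch of each epoch is smaller than the others whenever the dataset size is not a multiple of the batch size, which is why the division is rounded up.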
With InceptionV3, we can either use weights pre-trained on ImageNet or initialize the weights randomly. ImageNet is an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by thousands of images [46]. The use of this database has been shown to significantly increase a CNN's performance [47]. Figure 4 displays the entire workflow of the model applied.
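The choice between pre-trained and randomly initialized weights maps onto the `weights` argument of the Keras InceptionV3 constructor. The following is a sketch of one common way to build a classifier on top of it; the pooling layer, the softmax head, the optimizer, the loss and the input size are assumptions and are not described in this article. TensorFlow is imported inside the function, so the definition alone does not require it.

```python
def build_model(n_classes, pretrained=True, input_shape=(299, 299, 3)):
    """Assemble an InceptionV3-based classifier (TensorFlow imported lazily)."""
    import tensorflow as tf

    base = tf.keras.applications.InceptionV3(
        weights="imagenet" if pretrained else None,  # None means random initialization
        include_top=False,                           # drop the 1000-class ImageNet head
        input_shape=input_shape,
    )
    # Assumed classification head: global pooling followed by a softmax layer.
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model(2)` with TensorFlow installed downloads the ImageNet weights on first use, whereas `build_model(2, pretrained=False)` starts from random weights.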

Results
The Python script used for extracting the images of PV farms created 570 image files. Of these, 220 were high-resolution images (approximately 1 MB each) and 350 were low-resolution images (approximately 16 KB each). These images were divided randomly into training and evaluation datasets, as shown in Table 2. Following that, the datasets were augmented using the image data preprocessing function. The parameters used in this function are presented in Table 3. Next, the images were imported into Keras and the InceptionV3 model was applied for 15, 20 and 25 epochs with a batch size of 15, using the ImageNet pre-trained weights. This batch size was selected mainly because the number of images used for training and validation is rather small; generally, larger batch sizes are used with large datasets. The number of training epochs was selected based on the produced results (there is no guideline regarding the training period of a neural network). This means that if we notice overfitting in the results (meaning that the network cannot generalize properly), we reduce the number of training epochs.
Table 4 includes the results of the three training sessions. The results show the percentage of correct predictions on the training dataset and the validation dataset. From the table it is evident that the applied model does not provide better results when trained for more than 20 epochs, as can also be seen in the graphical representation of the results in Figure 5. Figure 5 also shows that the model performs erratically during the last validation session, with large fluctuations during validation, which suggests that the model overfitted when trained for 25 epochs. Additionally, the same figure shows that the model performs better when trained for 15 epochs (although the training performance in this session is slightly lower than in the next training session).
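The visual comparison of training and validation curves can also be expressed numerically. The sketch below computes two simple indicators: the mean gap between the two curves and the fluctuation of the validation curve from epoch to epoch. The accuracy values are made up for illustration and are not the results of this study, and these indicators are only one possible way to quantify overfitting.

```python
from statistics import mean, pstdev

def summarize(train_acc, val_acc):
    """Return (mean train/validation gap, fluctuation of validation accuracy)."""
    gap = mean(abs(t - v) for t, v in zip(train_acc, val_acc))
    steps = [b - a for a, b in zip(val_acc, val_acc[1:])]  # epoch-to-epoch changes
    return gap, pstdev(steps)

# Made-up accuracy curves, one value per epoch.
stable = summarize([0.50, 0.55, 0.58, 0.60], [0.48, 0.54, 0.56, 0.59])
erratic = summarize([0.50, 0.60, 0.70, 0.80], [0.48, 0.30, 0.55, 0.35])

print("stable:", stable)
print("erratic:", erratic)
```

A session in which both numbers are small has a validation curve that tracks the training curve closely, while large values point to the erratic behavior associated with overfitting.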
As can be seen in the model accuracy section of the diagrams, the model validation line follows the training line more closely for this session. Generally, models with smaller curve fluctuations as accuracy rises have better training convergence. Furthermore, model training is better when the two curves (training and validation) are closer. After training is completed, the model is also tested against new data which were not used during the training and validation sessions. The produced evaluation results are shown in Table 5 and Figure 6, which also indicate that the model trained for 15 epochs provides the best overall predictions. In Table 5, Pv 1 refers to the high-resolution image dataset, whereas Pv 2 refers to the low-resolution image dataset.
Precision is the ability of the classifier not to label as positive a sample that is negative. In other terms, precision is the number of correct results divided by the number of all returned results.
Recall is the ability of the classifier to find all the positive samples. Or in other terms, recall is the fraction of relevant documents that are successfully retrieved.
F1 score is a measure of the test's accuracy. It is the harmonic mean of precision and recall:
$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
The worst value for this measure is 0, whereas the best is obtained when it equals 1.
Accuracy is the weighted arithmetic mean of Precision and Inverse Precision (weighted by Bias) as well as the weighted arithmetic mean of Recall and Inverse Recall (weighted by Prevalence). Inverse Precision and Inverse Recall are simply the Precision and Recall of the inverse problem where positive and negative labels are exchanged.
Higher accuracy values demonstrate better model performance.
Macro Average computes the F1 for each label and returns the average without considering the proportion of each label (in our case high- and low-resolution PV images) in the dataset. Weighted Average computes the F1 for each label (in our case high- and low-resolution PV images) and returns the average considering the proportion of each label in the dataset. Finally, support is the number of occurrences of the given class (or label) in the dataset.
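The metrics defined above can be computed directly from the counts of true positives, false positives and false negatives per label. The counts in the sketch below are hypothetical and do not correspond to the results of this study; only the label names Pv 1 and Pv 2 are taken from the text.

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one label."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts per label: (true positives, false positives, false negatives).
counts = {"Pv 1": (30, 10, 10), "Pv 2": (50, 10, 10)}

f1_scores = {}
support = {}
for label, (tp, fp, fn) in counts.items():
    f1_scores[label] = per_class_metrics(tp, fp, fn)[2]
    support[label] = tp + fn  # number of true samples of this label

total = sum(support.values())
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
weighted_f1 = sum(f1_scores[k] * support[k] for k in counts) / total
accuracy = sum(tp for tp, _, _ in counts.values()) / total

print(macro_f1, weighted_f1, accuracy)
```

The macro average treats both labels equally, while the weighted average gives more influence to the label with the larger support.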
The results in Table 4 indicate that the trained model produces better results for 15 epochs in both datasets (high and low resolution).

Discussion
For most researchers, terms such as deep learning and machine learning seem interchangeable in the world of artificial intelligence. However, this view is mistaken. Deep learning is a specialized subset of machine learning which, in turn, is a specialized subset of artificial intelligence. Deep learning describes algorithms that analyze data with a structure similar to the way a human would draw a conclusion. The only drawback in the application of DL is the requirement for vast amounts of data and for substantial computing power.
However, the application of deep learning algorithms nowadays is a necessity. The evolution of the Internet of Things has created multiple devices capable of collecting a variety of unstructured data, ranging from simple arithmetic values to images from satellites. Therefore, the need arises to evaluate these data and extract useful patterns. DL algorithms have no requirement for human intervention, as the nested layers in the neural networks put data through hierarchies of different concepts and eventually learn through their own errors. Therefore, DL algorithms can greatly help in processing the collected data, mainly because these algorithms are agnostic to the type of data they process. Thus, if trained properly, they can be used for solving many problems, including image detection and classification.
This study presents a novel approach to the problem of automatic recognition of PV farms. The recognition is based on satellite imagery and image classification techniques which until recently were used for other purposes (face recognition, flora and fauna species recognition, etc.). According to our research, it is the first time that neural networks (in particular a CNN) were used for the automatic detection of PV farms. From the literature review we conducted, the only similar research used a CNN for the determination of small rooftop-installed PV arrays; we did not find any other similar research, which indicates that our approach is pioneering.
Furthermore, another novelty of our approach is that the datasets as well as the software (libraries, functions, algorithms) used for the implementation of this research are freely available to researchers, thus making our methodology easily replicable.
The results showed that (even though the original dataset was rather small) we can expect correct identification accuracy reaching 60% when using high-resolution imagery, and lower accuracy when using lower-resolution imagery. From the confusion matrices we can determine that 127 correct identifications were made for 15 epochs, 125 for 20 epochs and 125 for 25 epochs.
However, the identification results can be further improved if we use larger datasets. Additionally, the results showed that an increase in the number of training epochs does not provide significant improvements. Finally, the application of the algorithm also showed that high-resolution images perform significantly better, even in smaller datasets, than low-resolution imagery. This result was not expected, because we believed that increasing the number of low-resolution inputs could compensate for the lower resolution, mainly because the input data are characterized by a specific geometry.
The approach presented in this work can also be applied to the recognition of other types of RES, if trained properly. It can also be used in other cases where automatic image recognition is necessary. The results could be improved by using images provided by paid services (and therefore of high resolution) and by using larger datasets. Further improvements can be achieved if the user performs some kind of image pre-processing on the dataset (edge detection, color corrections, etc.) or uses deeper networks (more hidden layers).

Conclusions
Image recognition can provide a valuable tool for monitoring the adoption rate of renewable energy sources. Modern deep learning methods are agnostic to the data they process and therefore can be easily used to recognize the various forms of RES (wind turbines, PV panels, hydroelectric stations, etc.). However, large datasets are needed in order to train the algorithms properly. The existence of various satellite imagery services allows the user to collect these data in a variety of resolutions and create datasets which contain images of RES forms in a variety of installation environments, at various angles, and in different weather and at different times. Therefore, it is possible to create a tool capable of identifying them with increased accuracy. This paper examined a first approach towards this goal. The dataset is based on PV farms in Greece, and the results proved to be adequate given the size of the training dataset. As the years pass and more installations are completed, the algorithm can be trained again in order to increase its efficiency. Furthermore, advancements in computer technology and DL algorithms can also help towards this goal.
Finally, the combination of these algorithms with other types of software capable of calculating the annual solar energy output can help local and regional authorities plan their energy policy. The methodology can also be used by national authorities to continuously monitor current RES status, determine the investment/adoption rate of RES in the various regions and regional units, and act as an overall tool for the application of national policy.