1. Introduction
In recent years, photovoltaic (PV) installations have increased significantly in pursuit of decarbonization goals, making PV a key renewable energy source for the energy transition [1,2]. For optimized energy planning, territorial characterization is crucial, including knowledge of the locations of currently installed ground-mounted PV systems. Despite the growing interest in this technology, both in Italy and globally, there is a lack of available data regarding the distribution of PV installations across the territory. When such data are accessible, they are frequently outdated, rendering them less effective in a rapidly changing environment [3]. While operators are aware of the locations and capacities of these installations, there is often an absence of publicly accessible data that would facilitate analysis of the spatial distribution of existing installations and their development over time.
The most recent attempts to map renewable energy systems (RES) on a national or continental scale are based on data derived from crowdsourcing initiatives (e.g., OpenStreetMap), existing harmonized databases, or measurement campaigns [4,5,6]. While these initiatives are extremely useful, especially for humanitarian purposes, they rely on communities of volunteers who contribute to the maintenance and updating of the mapped information [7]. As a result, the coverage, accuracy, and currency of the mapped data are not uniform, depending heavily on the efforts of each community, and need to be verified and validated by expert users [8].
Remote sensing (RS) is a non-invasive surveying technique for studying and monitoring the Earth’s surface through long-distance observation and has been effectively used in the past to address various needs related to the development of PV systems [9]. In the literature, several examples can be found where RS data were applied to estimate the PV potential of a territory [10,11,12], to detect and monitor failures in PV systems [13,14], as well as to map ground-mounted or rooftop PV installations [15,16,17]. Thanks to the spatial coverage of the data, the update frequency, the availability of multiple types of free acquisitions, and the wide range of data processing techniques and algorithms, satellite or aerial RS can be successfully applied for the identification of PV installations for energy planning purposes. This constitutes an alternative to the aforementioned methods based on data collection from existing databases, field surveys, or crowdsourcing campaigns [18].
From the earliest studies on the subject [19,20], two main methods for PV system mapping were identified: (1) physically based approaches, using hyperspectral images; and (2) the application of machine learning (ML) algorithms to multispectral images. Physically based approaches exploit the reflectance characteristics of PV panels across different bands of the electromagnetic spectrum to extract installations from the surrounding background. The layered composition of PV modules generates a spectral signature characterized by low reflectance values in the visible wavelengths (i.e., between 400 nm and 700 nm), a rapid increase in reflectance between 900 nm and 1150 nm, and two strong absorption dips around 1730 nm and 2200 nm [20]. However, these details can only be detected through hyperspectral remote sensing, for example with the DESIS sensor (DLR Earth Sensing Imaging Spectrometer), mounted on the International Space Station (ISS) [21], or with the AVIRIS-NG (Airborne Visible InfraRed Imaging Spectrometer–Next Generation) sensor and satellite images from the PRISMA mission (PRecursore IperSpettrale della Missione Applicativa) [18].
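As an illustration only, the three spectral features listed above (low visible reflectance, a rise between 900 nm and 1150 nm, and dips near 1730 nm and 2200 nm) can be turned into a simple rule-based check. The sketch below is not the procedure used in the cited studies: the function names, the nearest-sample lookup, the 60 nm shoulder offset, and all threshold values are hypothetical choices made for the example, and the two spectra are synthetic.

```python
from typing import Dict

Spectrum = Dict[float, float]  # wavelength in nm -> reflectance in [0, 1]


def nearest(spectrum: Spectrum, wavelength_nm: float) -> float:
    """Reflectance at the sampled wavelength closest to the requested one."""
    closest = min(spectrum, key=lambda wl: abs(wl - wavelength_nm))
    return spectrum[closest]


def mean_visible(spectrum: Spectrum) -> float:
    """Mean reflectance over the samples between 400 nm and 700 nm."""
    values = [r for wl, r in spectrum.items() if 400 <= wl <= 700]
    if not values:
        raise ValueError("spectrum has no samples between 400 and 700 nm")
    return sum(values) / len(values)


def band_depth(spectrum: Spectrum, center_nm: float, shoulder_nm: float = 60.0) -> float:
    """Relative depth of a dip: 1 - R(center) / mean(R(left), R(right))."""
    shoulders = (nearest(spectrum, center_nm - shoulder_nm)
                 + nearest(spectrum, center_nm + shoulder_nm)) / 2
    if shoulders <= 0:
        return 0.0
    return 1.0 - nearest(spectrum, center_nm) / shoulders


def looks_like_pv(spectrum: Spectrum,
                  max_visible: float = 0.10,    # hypothetical threshold
                  min_nir_rise: float = 0.05,   # hypothetical threshold
                  min_depth: float = 0.10) -> bool:  # hypothetical threshold
    """True if all three signature features described in the text are present."""
    low_visible = mean_visible(spectrum) <= max_visible
    nir_rise = nearest(spectrum, 1150) - nearest(spectrum, 900) >= min_nir_rise
    dips = all(band_depth(spectrum, c) >= min_depth for c in (1730, 2200))
    return low_visible and nir_rise and dips


# Synthetic spectra, invented for the example (not measured data).
pv_like = {
    450: 0.04, 550: 0.05, 650: 0.05,     # low visible reflectance
    900: 0.06, 1000: 0.12, 1150: 0.20,   # rise between 900 and 1150 nm
    1670: 0.22, 1730: 0.15, 1790: 0.22,  # dip near 1730 nm
    2140: 0.20, 2200: 0.13, 2260: 0.20,  # dip near 2200 nm
}
flat = {wl: 0.30 for wl in pv_like}      # featureless bright surface

print(looks_like_pv(pv_like))  # True
print(looks_like_pv(flat))     # False
```

A fixed-threshold rule of this kind would need its thresholds calibrated for each sensor and module type, and it cannot by itself separate PV modules from other materials with a similar spectral response.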
Although these studies [18,20,21] demonstrate the effectiveness of physically based approaches for recognizing PV installations, this method requires prior knowledge of the spectral characteristics of the materials composing solar modules and the surrounding environment. The variety of existing PV system types and ongoing research into increasingly efficient new materials require constant updates of the typical spectral signatures of solar panels to ensure their correct identification. Additionally, with this method, mapping errors can arise due to confusion between PV systems and surfaces with similar spectral properties, such as agricultural films, polyethylene covers, and synthetic grass used in sports fields [18,22]. The spread of this approach is also limited by the availability and accessibility of data from hyperspectral missions.
The second method for identifying PV systems involves the use of ML algorithms combined with multispectral RS imagery, which is often preferred because of the large number of available sensors and the ease of access to data. Moreover, recent developments in image analysis techniques, including deep learning (DL) algorithms, allow for the rapid and efficient extraction of information from images [23]. Regarding PV system identification, the studies available in the literature differ mainly in the spectral and spatial resolution of the input images and in the data processing techniques, which mostly consist of classification with ML algorithms or object extraction using DL models. In terms of the spectral and spatial resolution of the images, studies can be divided into those using medium-spatial-resolution (between 10 m and 30 m) multiband images [24,25,26,27,28,29,30,31,32,33] and those using ultra-high-spatial-resolution (<1 m) natural color (RGB) images [15,23,34,35,36].
Multiband images include acquisitions in different bands of the electromagnetic spectrum, mainly in the visible (VIS) and near-infrared (NIR) wavelengths, so as to exploit the spectral characteristics of objects to identify the PV modules in a study area [33]. Medium-resolution imagery from the Landsat and Sentinel-2 constellations allows for detecting ground-mounted PV systems over a large study area, generally at the national level. Moreover, thanks to the availability of medium-to-long time series, these images can be used to monitor installations over time and verify how the territory has changed with the increasing penetration of PV installations. In [30], Landsat time series were employed to study the development of ground-mounted PV systems in northwestern China from 2007 to 2019, highlighting that the conversion to PV systems has mainly affected desert or sandy lands, as well as areas covered by herbaceous vegetation.
High-resolution RGB images, on the other hand, are primarily used for the recognition of rooftop PV modules, since the small size of rooftop panels requires detailed acquisitions with metric or centimetric resolution; however, they are also effective for mapping ground-mounted systems [23,35,37]. Studies based on RGB image analysis do not exploit the spectral properties of PV installations but rather their geometric characteristics to identify the shape of the panels and extract them from the surrounding background. In [34], the locations of rooftop PV modules were derived from RGB images with a spatial resolution of 12 cm, together with an estimate of the generated capacity. However, the high spatial resolution limits large-scale use, and many studies report analyses over a small area of interest, generally corresponding to a city [15,23,34].
Regarding image processing techniques, the literature review shows a prevalence of ML classification algorithms, especially Random Forest (RF) [38], applied to multiband or RGB satellite images to identify PV modules with accuracies up to 98% [24,29,30,31,39]. Among DL algorithms, semantic segmentation with convolutional neural networks (CNNs) predominates [16,32,40]. Recent studies have proposed the application of refined neural networks to medium-resolution Sentinel-2 images for national-scale mapping of ground-mounted PV systems [27,32,37]. In [32], the map with the best accuracy (92%) was obtained from multiband images (i.e., RGB + NIR) from Sentinel-2, which were used as input for a semantic segmentation model with a U-Net architecture [41]. The same model was employed to generate a global-scale dataset of PV systems with an accuracy close to 90%, including information on modules’ installation dates [37]. Similarly, in [27], multispectral acquisitions from the Sentinel-2 constellation were used to train a semantic segmentation model to identify locations and installation dates of ground-mounted PV systems distributed across as vast and heterogeneous a territory as India.
In the Italian context, the national mapping of ground-mounted PV installations with a power capacity greater than 100 kW dates to 2019 [42]. Yet, considering the highly dynamic nature of the current photovoltaic landscape, this map is already outdated and requires further updates. For some regions, a mapping of areas covered by PV systems can be derived from the regional land use map, where the classification is detailed down to the fourth level [43]. However, this level of detail is not available for the entire national territory; the update frequency of the maps can vary from region to region, and, most importantly, power data associated with the systems are not available.
The aim of this study is to provide a methodology for the automatic recognition of ground-mounted PV systems in Italy, at a national scale. In this work, we applied semantic segmentation to Sentinel-2 satellite images, with the goal of ensuring frequent mapping updates. The detection algorithm can then be complemented with power estimation models, enabling the mapping of both PV location and capacity data and thereby providing a precise geographical representation for energy planning purposes. The examination of the applicability of the model for energy planning constitutes the added value and innovation of our research with respect to the existing literature, which has primarily concentrated on the study and comparison of methods rather than their application. Furthermore, to the best of our knowledge, this study is the first to address the extensive mapping of ground-mounted PV systems in Italy.
Several challenges in PV plant detection must be addressed. First, the proposed methodology needs to be sufficiently robust to accommodate the diverse geographical contexts that characterize Italian landscapes, as well as the irregular distribution of plants throughout the territory and the wide range of technologies and system types. Second, achieving a high level of mapping accuracy is essential, as the locations of PV systems must include their spatial extent to enable reliable capacity estimation. Finally, a comprehensive automated approach should be evaluated to guarantee the replicability and regular updating of information essential for energy planning in the swiftly changing field of renewable energy facilities. Thus, the primary aim of this research is to develop a model that not only effectively identifies ground-mounted PV installations but is also designed for continuous data updating on a wide scale. Regular monitoring and integration of both new and existing systems are essential for maintaining an accurate and up-to-date picture of photovoltaic assets. Furthermore, the methodology must be flexible enough to accommodate various study areas, facilitating regular automatic updates that account for changes in the distribution and capacity of installations.
5. Conclusions
In recent years, the expansion of photovoltaic installations has accelerated, highlighting their crucial role as a renewable energy source in the global energy transition. However, comprehensive information on the spatial distribution of PV systems remains limited, posing challenges for effective energy planning.
This study introduces a methodology for automatic recognition of the location and extent of ground-mounted PV systems in Italy using semantic segmentation applied to 10 m resolution RGB images acquired from the Sentinel-2 satellites. The proposed model employs a U-Net architecture, achieving 99% accuracy during training, including on the validation dataset. The methodology relies on a multi-temporal approach, deploying the semantic segmentation model on a set of images collected throughout the year. Model outputs are then aggregated into a final map representing the probability of PV plant detection. This method allows for flexible accuracy optimization, depending on the application needs: lower probability thresholds can be chosen to increase producer accuracy (PA), while higher thresholds improve user accuracy (UA). Furthermore, lower probability thresholds ensure continuous area detection, proving more effective in estimating PV power output using an area-to-power relationship. Such thresholds are also advantageous for identifying new installations, despite the trade-off of increased false positives, which can be mitigated through post-processing techniques, such as filters for recognizing plastic-covered greenhouses.
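The threshold trade-off described above can be illustrated with a minimal sketch. It assumes, since the aggregation rule is not specified here, that the probability map is the share of acquisition dates on which a pixel was classified as PV, and that the area-to-power relationship is linear. All function names are hypothetical, `kw_per_m2` is a caller-supplied placeholder rather than a value from this study, and the masks are toy data.

```python
from typing import List, Tuple

Mask = List[List[int]]        # 1 = pixel classified as PV, 0 = background
ProbMap = List[List[float]]   # per-pixel detection probability in [0, 1]


def aggregate(masks: List[Mask]) -> ProbMap:
    """Share of acquisitions in which each pixel was classified as PV (assumed rule)."""
    n = len(masks)
    return [[sum(cell) / n for cell in zip(*rows)] for rows in zip(*masks)]


def apply_threshold(prob: ProbMap, t: float) -> Mask:
    """Binary PV map: 1 where the detection probability is at least t."""
    return [[1 if p >= t else 0 for p in row] for row in prob]


def producer_user_accuracy(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """PA = TP / (TP + FN) and UA = TP / (TP + FP), both for the PV class."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if t and not p)
    pa = tp / (tp + fn) if tp + fn else 0.0
    ua = tp / (tp + fp) if tp + fp else 0.0
    return pa, ua


def capacity_kw(mask: Mask, kw_per_m2: float, pixel_area_m2: float = 100.0) -> float:
    """Linear area-to-power estimate; 100 m^2 is the footprint of one 10 m pixel."""
    return sum(map(sum, mask)) * pixel_area_m2 * kw_per_m2


# Toy example: per-date model outputs for a 2 x 3 pixel tile on three dates.
masks = [
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 0]],
]
truth = [[1, 1, 0], [0, 0, 0]]  # reference PV footprint

prob = aggregate(masks)
low = apply_threshold(prob, 0.3)   # permissive threshold
high = apply_threshold(prob, 0.9)  # strict threshold

print(producer_user_accuracy(low, truth))   # (1.0, 0.5)
print(producer_user_accuracy(high, truth))  # (0.5, 1.0)
```

In this toy case the permissive threshold recovers the whole reference footprint at the cost of false positives, while the strict threshold removes the false positives but misses part of the plant, which mirrors the PA/UA behaviour reported in the text.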
Nevertheless, the proposed methodology has certain limitations. Detection of small installations is constrained by image resolution, and some specific PV plant layouts remain difficult to detect, which calls for a broader training dataset, more specialized model architectures, or the integration of additional spectral data. Another challenge lies in the model’s generalizability across diverse landscapes. To address this issue, future work will focus on re-training the model with images from various environmental contexts to facilitate PV plant recognition across different landscapes and PV configurations. Additionally, the complexity of terrain morphology has emerged as a significant obstacle; integrating topographic information, such as elevation maps, slope, and hillshade data, is proposed to enhance detection accuracy and robustness.