Towards Scalable Economic Photovoltaic Potential Analysis Using Aerial Images and Deep Learning

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. State-of-the-art approaches use 3D data to conduct potential analyses with high spatial resolution, which limits the study area to places where 3D data are available. Recent advances in deep learning allow the required roof information to be extracted from aerial images. Furthermore, most publications consider the technical photovoltaic potential, and only a few determine the economic photovoltaic potential. Therefore, this paper extends the state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper's methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.


Introduction
The world is increasing its efforts to move towards renewable energy production. Solar power is an important pillar of this endeavor to massively reduce carbon dioxide emissions. Large-scale, free-standing photovoltaic (PV) plants achieve a low levelized cost of energy (LCOE) and can outperform fossil power plants in this respect [1][2][3]. Though more expensive, rooftop PV systems play a major role in a sustainable energy system because they do not seal additional land area. In the past, the introduction of PV was enabled and accelerated by subsidies. To support policymakers, researchers have conducted rooftop PV potential analyses ranging from city scale to country scale. This paper extends existing approaches by presenting a method for scalable, building-level economic PV potential estimation using publicly available data sources and deep learning approaches. We first present the results and then discuss experiences with the implemented methodology to outline challenges and research opportunities in this interdisciplinary field.

Existing PV Potential Analysis with Respect to Potential Type
The physical potential is the solar radiation at a geographic location and consists of the direct, diffuse, and reflected components of the real-sky global irradiance. PV-GIS [9,10] and the Copernicus Atmosphere Monitoring Service [11] provide estimates of Europe's physical potential via application programming interfaces (APIs). Some researchers focus on the geographic PV potential, which is the solar radiation available on roofs considering the orientation of the roof planes and the shadowing effects of surrounding structures [12,13]. Other publications determine the technical potential, which takes the PV system efficiency into account and represents the actual energy generation [5,8,[14][15][16][17][18][19][20][21].
The fourth type of potential is the economic potential. Subsidies and policies have major leverage on the economic potential. Furthermore, a positive business case is one of the most important drivers for installing PV systems. Therefore, while it is important for policymakers to know the technical potential, it is even more critical to understand the economic potential, and analyses should be extended accordingly. Recent studies include the economic potential to account only for the share of the PV potential that is economically viable [4,7,[22][23][24][25][26][27]. The economic potential can be assessed with the LCOE, a common benchmark for energy generation that represents the cost per generated kWh over the lifetime of a PV system [28]. Comparing the generation costs (e.g., the LCOE) with an income threshold such as the feed-in tariff determines the economic viability of a PV installation [4,7,22,23]. In addition, some publications consider revenues and calculate the return on investment [25,26], the net present value [24], and the payback period [24][25][26][27]. This approach provides a more detailed profitability estimate. It also enables international comparison of the economic potential because it can include different feed-in tariffs or additional revenue streams, such as self-consumption models. In Germany, with decreasing feed-in tariffs, PV systems are increasingly designed for self-consumption and coupled with stationary battery storage [29]. While studies at site level commonly consider a mix of self-consumption and grid feed-in [29][30][31], only a few studies, such as those by Lee et al. [25,26], consider it at system level.
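To make these metrics concrete, the following Python sketch computes the LCOE with the standard discounted formula, LCOE = (I0 + Σ O_t/(1+r)^t) / Σ E_t/(1+r)^t, together with the net present value (NPV) and the simple payback period of a single PV system. It is a minimal illustration under simplifying assumptions (constant annual energy yield and O&M cost, end-of-year cash flows, no degradation, taxes, or battery storage); the function names and all example numbers are hypothetical and not taken from the cited studies.

```python
"""Minimal PV economics sketch: LCOE, NPV, and simple payback period.

Assumptions (hypothetical, for illustration only): constant annual energy
yield and O&M cost, end-of-year cash flows, no degradation or taxes.
"""


def lcoe(investment, annual_om, annual_energy_kwh, rate, years):
    """Discounted lifetime cost divided by discounted lifetime energy (per kWh)."""
    discount = [(1 + rate) ** t for t in range(1, years + 1)]
    costs = investment + sum(annual_om / d for d in discount)
    energy = sum(annual_energy_kwh / d for d in discount)
    return costs / energy


def npv(rate, cashflows):
    """Net present value; cashflows[0] occurs at t = 0 (usually the investment)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


def payback_period(cashflows):
    """First year in which the cumulative undiscounted cash flow is >= 0, else None."""
    cumulative = 0.0
    for year, cf in enumerate(cashflows):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical system: 10,000 EUR investment, 100 EUR/a O&M, 5,000 kWh/a, 20 years.
    print(f"LCOE: {lcoe(10_000, 100, 5_000, 0.03, 20):.3f} EUR/kWh")
    flows = [-10_000] + [1_000] * 20  # net annual revenue after O&M
    print(f"NPV at 3%: {npv(0.03, flows):.0f} EUR, payback: {payback_period(flows)} years")
```

Comparing the LCOE with a feed-in tariff gives the viability threshold test, whereas the NPV and payback period also use the revenue side, which is what allows self-consumption models to be included.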

Existing PV Potential Analysis with Respect to Method
Besides categorizing publications with respect to potential type, they can also be grouped by their method for estimating the PV potential. Reviews of methods for PV potential estimation have been published by Melius et al. [32] and Freitas et al. [33]. Additionally, a review by Assouline et al. [34] distinguished six types of methods for large-scale PV potential estimation: physical/empirical, geostatistical, constant values, sampling, Geographic Information System (GIS)/Light Detection and Ranging (LiDAR), and machine learning. The chosen method strongly depends on the available data. Therefore, we slightly adapted the categorization of Assouline et al. [34] to integrate the input data perspective. We differentiated four groups with increasing level of detail: statistical, geospatial, aerial image, and 3D. Approaches can also use multiple input data sources; in this case, we assigned the publication to the group with the highest level of detail. The statistical category uses statistical data to estimate the available roof area by assuming a roof area per capita. The effects of roof orientation, shadowing, and roof superstructures are considered using constant factors. Statistical approaches often examine large study areas such as the European Union [21,35] or Brazil [23] and take a top-down perspective. The geospatial approach determines available roof areas based on geospatial vector data such as building cadasters or maps, so the PV potential is estimated bottom-up. These studies apply constant factors to account for the reduction in PV potential due to shadowing and superstructures. The study areas are typically large, such as Spain [8], the Canary Islands [22], or the Fujian Province [24].
Energies 2021, 14, 3800 3 of 22
With the availability of 3D models, PV potential can be assessed with higher accuracy and higher resolution. Three-dimensional models usually cover buildings at city level and can be created using stereo photos or LiDAR. Approaches using existing semantic 3D city models based on the CityGML standard [36] have also been explored [37,38]. City-level studies were conducted for Feldkirch [13], Lisbon [14], Uppsala [20], Cambridge [39], and the Chao Yang District in Beijing [40], among others [12,15,38]. Hong et al. presented a 3D method for technical potential estimation of the Gangnam district in Seoul, which included a hillshade analysis based on building vector shapes and elevation information [17]. Lee et al. extended these results with an economic potential analysis [25]. Margolis et al. [16] analyzed 128 cities in the USA using LiDAR data. Gagnon et al. [41] expanded this study to the whole USA by statistically extrapolating the results of the 3D approach. Assouline et al. presented other mixed approaches in several publications [18,42], using 3D data and machine learning methods to extrapolate the technical potential estimated with high-detail data from Geneva to the whole of Switzerland. A study for Switzerland by Walch et al. [5] used a similar approach and quantified its uncertainty.
LiDAR-based 3D approaches can be considered the best methods with regard to the level of detail of the input data [34]. Furthermore, they can be coupled with machine learning methods such as random forests to extrapolate findings from a smaller to a larger study area. This combination shows high accuracy and relatively low computational time [5,42]. Examples of LiDAR-based solar cadasters made available in recent years include Mapdwell [43], Google's Project Sunroof [44], and tetraeder.solar [45]. However, even though LiDAR data are becoming increasingly available, coverage is still not exhaustive. Therefore, some researchers propose alternative methods for extracting roof information from aerial images.
Roof segment segmentation has also been investigated outside the context of PV potential analysis. For example, Hazelhoff [46] used a line detection approach [47] to detect roof ridges and gutters. Fan et al. used image processing methods to determine roof planes from LiDAR data [48]. Merabet et al. [49] presented a building roof segmentation method based on the watershed segmentation technique.
To estimate the technical PV potential of the city of Turin, Bergamasco et al. [19] explored a methodology that extracts roof segments by clustering an image's pixels into bins based on their color tones. Their qualitative validation revealed an accuracy of around 90% on their dataset. Mainzer et al. [7] assessed the economic potential of the city of Freiburg. They used image processing to detect roof ridges and roof outlines and estimated the azimuth correctly for over 70% of the roofs. Additionally, the authors proposed an approach based on contour detection [50] to identify roof superstructures that decrease the usable roof area. Furthermore, they used convolutional neural networks (CNNs) to identify existing PV modules. The reported accuracy exceeds 90% on their dataset, indicating the high potential of CNNs for this task [7].
In recent years, deep CNNs have outperformed the state of the art in many image recognition tasks [51]. Consequently, deep learning has gained significant relevance in the remote sensing field [52]. For example, CNNs were successfully used for building footprint segmentation [53,54]. In the PV context, studies exploited the advances in deep learning [57] demonstrated the applicability of CNNs for mapping PV modules in Switzerland. Furthermore, CNNs are used to estimate available roof areas. A study by Huang et al. [58] estimated the geographic potential of Wuhan, China, using semantic segmentation of roof footprints. The authors only detected whole roofs and did not consider individual roof segments and their orientation. To the best of our knowledge, the DeepRoof project presented by Lee et al. [59] is currently the only publication that extracted roof segments, including their azimuth values, from aerial images using CNNs. The authors labeled their own dataset containing 2274 buildings from six US cities. A comparison of their 2D approach for technical potential estimation with the LiDAR-based Google Project Sunroof results [44] shows small mean errors in the estimated available PV installation area. This indicates that PV potential analysis using deep learning and aerial images has great potential to become a viable, scalable alternative to 3D LiDAR methods.

Contributions of the Paper
The literature analysis shows that currently only a few publications consider the economic PV potential. Furthermore, state-of-the-art bottom-up approaches are based on 3D data, with the drawback that such data are only fragmentarily available. Hence, this paper combines the latest scientific developments and proposes a methodology for bottom-up economic PV potential analysis using aerial images and deep learning. To the best of our knowledge, there is currently only one paper that used aerial images and deep learning to extract roof segments for PV potential analysis [59]. This paper applies the same approach and extends the state of the art by exploring the application of deep learning to roof superstructure segmentation. The methodology relies only on publicly available data, making it potentially scalable to the global level. The economic potential assessment calculates the internal rate of return (IRR) instead of the LCOE; the IRR contains more information because it considers both revenues and costs. This paper's contributions are:

1. A methodology for scalable, bottom-up, economic PV potential analysis using aerial images and deep learning as well as publicly available data;
2. The application of CNNs for semantic segmentation of roof segments and roof superstructures, with initial results discussed to point out the advantages and disadvantages of the methodology;
3. A comprehensive summary of research challenges and opportunities for this novel approach.
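The IRR used in the economic assessment is the discount rate at which the NPV of a cash-flow series becomes zero. The following sketch finds it by bisection, assuming a conventional cash-flow pattern (one initial outflow followed by inflows), for which the NPV decreases monotonically with the rate. The cash-flow builder `pv_cashflows` and its parameters (self-consumption share, electricity price, feed-in tariff, O&M cost) are hypothetical placeholders, not the parameter set used in this paper.

```python
def npv(rate, cashflows):
    """Net present value with cashflows[0] at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows, lo=-0.99, hi=1.0, tol=1e-10, max_iter=200):
    """Internal rate of return by bisection on the NPV function."""
    f_lo = npv(lo, cashflows)
    if f_lo * npv(hi, cashflows) > 0:
        raise ValueError("NPV does not change sign in [lo, hi]; no IRR bracketed")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cashflows)
        if abs(f_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:  # sign change in the lower half: root is there
            hi = mid
        else:                 # otherwise the root lies in the upper half
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def pv_cashflows(investment, annual_kwh, self_share, price, tariff, annual_om, years):
    """Hypothetical roof-segment cash flows.

    Self-consumed energy saves the retail `price`; surplus energy earns the feed-in `tariff`.
    """
    revenue = annual_kwh * (self_share * price + (1 - self_share) * tariff)
    return [-investment] + [revenue - annual_om] * years


if __name__ == "__main__":
    flows = pv_cashflows(6_000, 4_000, 0.3, 0.30, 0.08, 60, 20)
    print(f"IRR: {irr(flows):.2%}")
```

Unlike the LCOE, the IRR reacts to the revenue side, so a higher self-consumption share directly raises the IRR of a roof segment.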

Materials and Methods
This section describes the methodology for scalable, large-scale economic PV potential analysis based on aerial images and deep learning. Figure 2 visualizes the individual steps, grouped by the four potential types. The input data rely mainly on public data from Copernicus [11] and OpenStreetMap (OSM) [60], technical specifications of PV systems, and aerial images accessed via the Google Maps Static API [61]. Furthermore, some steps require input parameters derived from the literature or online sources. The core element, the deep learning-based extraction of roof information from aerial images, is highlighted in green in Figure 2.

Physical Potential
This paper's approach used data on solar radiation on the horizontal plane from the Copernicus Atmosphere Monitoring Service [11]. The radiation data were provided at continuous spatial resolution by interpolation [62]. We downloaded the data for one representative location in the small study area instead of for each individual roof. This simplification is viable because the variance of the radiation within the small study area is negligible.

Geographic Potential
As outlined in this paper's contributions, we present a novel approach for determining the geographic potential based on aerial images.
First, we requested all building footprints in the study area from OSM [60] to create a list of buildings for the potential analysis. This avoids analyzing areas without buildings and ensures that all mapped roofs are covered. Although OSM offers high completeness and a spatial accuracy of 1.5 m [63], incorrect map data can lead to misestimation of the potential at the aggregated city level. Alternatively, other map services with higher data quality could be used, or a CNN trained for building footprint segmentation could supply a tailored list of roof objects. The second input data source is the Google Maps Static API [64], which provides aerial images with a resolution of up to 0.15 m/px [61]. High-resolution imagery is a prerequisite for detecting relatively small roof superstructures.
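The nominal ground resolution of Web Mercator map imagery can be estimated from the zoom level and the latitude. The sketch below uses the standard Web Mercator relation for 256-pixel tiles, resolution = cos(latitude) · 2π · 6,378,137 m / (256 · 2^zoom). It assumes that Google Maps Static API images follow this tiling and that the API's `scale=2` option doubles the pixel density over the same area; the function names are hypothetical. The result is a nominal value: the true detail also depends on the underlying imagery.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis used by Web Mercator
TILE_SIZE_PX = 256            # pixel size of a Web Mercator tile at zoom 0


def ground_resolution(lat_deg: float, zoom: int, scale: int = 1) -> float:
    """Nominal metres per pixel of Web Mercator imagery at a latitude and zoom level."""
    equator_m_per_px = 2 * math.pi * EARTH_RADIUS_M / (TILE_SIZE_PX * 2 ** zoom)
    # Mercator stretches higher latitudes, so one pixel covers less ground there.
    return equator_m_per_px * math.cos(math.radians(lat_deg)) / scale


def image_footprint_m(lat_deg: float, zoom: int, width_px: int, height_px: int,
                      scale: int = 1) -> tuple:
    """Approximate ground extent (width, height) in metres of a requested image."""
    res = ground_resolution(lat_deg, zoom, scale)
    return width_px * res, height_px * res


if __name__ == "__main__":
    for lat in (0.0, 48.1):
        print(f"lat {lat:5.1f}: {ground_resolution(lat, 20):.3f} m/px at zoom 20")
```

At zoom level 20, this gives about 0.15 m/px at the equator, and the value decreases with the cosine of the latitude.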

The input data are used for the semantic segmentation of roof segments and roof superstructures. The CNNs' outputs are further processed to pass the roof segment polygons, their orientation, and the respective superstructure polygons to the module placement algorithm. On inclined roof segments, the modules are projected onto the horizontal plane as a function of the tilt angle, assuming an orthogonal aerial image. Then, a grid of modules is placed on the segment, aligned with its longest side. Finally, modules intersecting with superstructures are removed. On flat roofs, the placement has more degrees of freedom: modules can be oriented south, east, or west, or aligned with the building, and space for maintenance and fire protection is taken into account. The usable area is the total area of all successfully placed modules.
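The placement step for an inclined segment can be sketched as follows. To stay self-contained, the sketch makes strong simplifying assumptions that are not part of the described method: the roof segment is a rectangle already rotated so that its longest side lies along the x-axis (slope direction along y), superstructures are axis-aligned bounding boxes in the same frame, and modules measure 1.0 m × 1.7 m in portrait orientation (hypothetical dimensions). A real implementation would operate on arbitrary polygons, for example with a geometry library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in the segment's local frame (metres, horizontal plane)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Open-interval test: boxes that only touch along an edge do not intersect.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def place_modules(seg_length, seg_depth, tilt_deg, obstacles=(),
                  module_width=1.0, module_height=1.7):
    """Place a module grid on a rectangular inclined segment as seen from above.

    seg_length: horizontal extent along the longest side (eaves direction).
    seg_depth: horizontal extent in the slope direction, as measured in the image.
    """
    # In a top view, the module's dimension along the slope shrinks by cos(tilt).
    proj_height = module_height * math.cos(math.radians(tilt_deg))
    n_cols = int(seg_length // module_width)
    n_rows = int(seg_depth // proj_height)
    placed = []
    for row in range(n_rows):
        for col in range(n_cols):
            module = Box(col * module_width, row * proj_height,
                         (col + 1) * module_width, (row + 1) * proj_height)
            # Drop every module that overlaps a detected superstructure.
            if not any(module.intersects(obs) for obs in obstacles):
                placed.append(module)
    return placed


def usable_area(n_modules, module_width=1.0, module_height=1.7):
    """Usable (in-plane) area: number of placed modules times the module area."""
    return n_modules * module_width * module_height


if __name__ == "__main__":
    chimney = Box(0.0, 0.0, 1.5, 1.5)
    modules = place_modules(10.0, 5.0, 37.0, obstacles=[chimney])
    print(len(modules), "modules,", round(usable_area(len(modules)), 1), "m2")
```

Because the orthophoto shows only the horizontal projection, the number of rows depends directly on the assumed tilt, which is why the slope estimation discussed next matters for the usable area.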
A critical factor for the module placement and the calculation of the irradiation is the inclination angle of the roof. Inferring it from 2D image data poses a serious challenge because height information is missing. To cope with this problem, the slope angle is determined statistically using a normal distribution with a mean of 37° and a standard deviation of 15°, as proposed by Mainzer et al. [7]. The same angle is assumed for all roof segments of a building. This assumption holds especially for gabled roofs, which are very common in Germany, but also for hip and pyramid roofs; more complex roof types may deviate from it. Flat roofs are recognized by the CNN. The statistical slope estimation leads to errors in the calculated available roof area and in the irradiation on the roof. Figure 3 shows the relative deviation of the roof area and the yearly energy generation compared to a 37°, south-facing roof. While the area of a 5° roof differs from that of a 37° roof by only about 20%, the error becomes large for higher slopes: a 70° slope means a deviation of more than 125% from the 37° area. The effect of the slope on the irradiance on a tilted surface is visualized by the relative yearly energy generation. A 60° tilted, south-facing PV module yields around 5% less energy than a 37° module. This effect becomes larger for east- or west-facing roofs, where the difference is around 10%. Statistical slope estimation results in a higher variance of PV system configurations but also leads to greater deviations from the real PV potential, especially if a flat roof is assigned a high slope value or vice versa. This approach could be improved by limiting the slope range based on additional roof information such as building type or roof area. For example, this could avoid assigning high slope values to industrial or large office buildings.
Figure 3. Relative deviation of roof area depending on roof tilt angle (a) and relative yearly energy generation (b) in comparison to a 37° tilted, south-facing roof surface located in Munich, Germany.
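The statistical slope assignment and its effect on the roof area can be sketched as follows. One angle is drawn per building from a normal distribution with a mean of 37° and a standard deviation of 15°, as described above. Because a normal distribution can yield negative or implausibly steep values, the sketch clips each sample to a range of 5° to 70°; this clipping range is our own assumption, not part of the described method. The in-plane roof area follows from the projected (footprint) area divided by cos(tilt), which reproduces the area deviations discussed for Figure 3 (about −20% at 5° and more than +125% at 70°, both relative to 37°).

```python
import math
import random

MEAN_TILT_DEG = 37.0
STD_TILT_DEG = 15.0


def sample_building_tilt(rng: random.Random, lo: float = 5.0, hi: float = 70.0) -> float:
    """Draw one tilt angle per building; all of its inclined segments share it.

    The clipping bounds lo/hi are an assumption of this sketch.
    """
    return min(hi, max(lo, rng.gauss(MEAN_TILT_DEG, STD_TILT_DEG)))


def roof_area_from_footprint(footprint_m2: float, tilt_deg: float) -> float:
    """In-plane area of a tilted segment whose horizontal projection is footprint_m2."""
    return footprint_m2 / math.cos(math.radians(tilt_deg))


def relative_area_deviation(tilt_deg: float, reference_deg: float = MEAN_TILT_DEG) -> float:
    """Relative in-plane area deviation versus the reference tilt for the same footprint."""
    return (roof_area_from_footprint(1.0, tilt_deg)
            / roof_area_from_footprint(1.0, reference_deg) - 1.0)


rng = random.Random(42)  # fixed seed for reproducible sampling
building_tilts = {b: sample_building_tilt(rng) for b in ("building_a", "building_b")}

if __name__ == "__main__":
    for tilt in (5, 37, 60, 70):
        print(f"{tilt:2d} deg: {relative_area_deviation(tilt):+.0%}")
```

Because the area grows with 1/cos(tilt), errors in the sampled slope are strongly asymmetric: underestimating a steep roof costs far more area than overestimating a shallow one.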
The solar irradiation on a tilted surface is calculated for each roof segment. It is split into a direct, a diffuse, and a ground-reflected component. The direct and ground-reflected components are calculated using isometric approaches, which rely on the trigonometric relationship between the radiation beams and the roof surface. We applied the model by Perez et al. [65] to calculate the diffuse component using pvlib [66]. This model has shown high accuracy compared with other models [67] and has already been used in PV potential analyses [39,68,69]. Shadowing effects and reduced sky view factors pose the same challenge as the slope estimation. They are accounted for with a constant loss of 15%, whereas values between 15% and 30% have been applied in the literature [19,[70][71][72][73]. This loss is applied to the whole PV system rather than to each module individually.
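The decomposition into direct, sky-diffuse, and ground-reflected components can be illustrated with the simpler isotropic-sky (Liu-Jordan) transposition model. Note that this sketch swaps in a different diffuse model for readability: the paper uses the Perez model via pvlib, which can be selected there with `pvlib.irradiance.get_total_irradiance(..., model='perez')` (that model additionally needs extraterrestrial irradiance and air mass). Azimuths are measured clockwise from north (180° = south), the albedo of 0.2 is an assumed default, and DNI is assumed to be zero when the sun is below the horizon. The constant 15% shading and sky-view loss from the text is applied to the total.

```python
import math

SHADING_LOSS = 0.15  # constant shading / sky-view loss applied to the whole system


def cos_incidence(tilt_deg, surface_az_deg, sun_zenith_deg, sun_az_deg):
    """Cosine of the angle between the sun beam and the surface normal."""
    t, sa, z, a = map(math.radians, (tilt_deg, surface_az_deg, sun_zenith_deg, sun_az_deg))
    return math.cos(z) * math.cos(t) + math.sin(z) * math.sin(t) * math.cos(a - sa)


def poa_isotropic(dni, ghi, dhi, tilt_deg, surface_az_deg, sun_zenith_deg, sun_az_deg,
                  albedo=0.2):
    """Plane-of-array irradiance in W/m^2 using an isotropic sky-diffuse model."""
    t = math.radians(tilt_deg)
    # Direct: beam projected onto the surface; zero if the sun is behind the plane.
    beam = dni * max(0.0, cos_incidence(tilt_deg, surface_az_deg, sun_zenith_deg, sun_az_deg))
    # Sky diffuse: fraction of the isotropic sky dome seen by the tilted plane.
    sky_diffuse = dhi * (1 + math.cos(t)) / 2
    # Ground-reflected: isotropic reflection of global horizontal irradiance.
    ground_reflected = ghi * albedo * (1 - math.cos(t)) / 2
    return beam + sky_diffuse + ground_reflected


def poa_after_shading(poa, loss=SHADING_LOSS):
    """Apply the constant system-level shading and sky-view loss."""
    return poa * (1 - loss)


if __name__ == "__main__":
    poa = poa_isotropic(800, 500, 100, 37, 180, 60, 180)
    print(f"{poa:.0f} W/m2 before and {poa_after_shading(poa):.0f} W/m2 after shading loss")
```

Summing this instantaneous irradiance over a year of time steps yields the yearly irradiation of each roof segment.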
The following sections describe the core step of this paper, the semantic segmentation for extracting roof information, in more detail.

Datasets for Semantic Segmentation
CNNs benefit from large training datasets. In the remote sensing context, such datasets exist, for example, for building footprints [74,75] or existing PV modules [76]. Lee et al. published a smaller dataset consisting of 444 images with 2274 buildings along with the DeepRoof paper [59]. We used this dataset for the roof segment training. It has 20 label classes: 16 for azimuth directions, one each for flat roofs and domes, one for nearby trees, and one for background pixels that do not belong to any other class. Each azimuth class, e.g., north, covers a span of 22.5°. To the best of our knowledge, there is currently no labeled dataset for roof superstructures. Therefore, we created our own preliminary dataset from aerial images of a small municipality in Bavaria, Germany. A total of 407 images with a size of 512 × 512 pixels were annotated with polygons of six semantic superstructure classes: window, dormer, chimney, ladder, PV module, and unknown/other.
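A possible mapping between roof azimuth and the 16 azimuth classes is sketched below, for example to convert a predicted class back into an azimuth for the irradiance calculation. It assumes that each 22.5° class is centered on a compass direction (so north spans 348.75° to 11.25°) and that azimuth is measured clockwise from north; the exact bin convention and class names of the DeepRoof labels may differ, and the names used here are hypothetical.

```python
SECTOR_DEG = 360.0 / 16  # 22.5 degrees per azimuth class
COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def azimuth_to_class(azimuth_deg: float) -> int:
    """Map an azimuth (degrees clockwise from north) to one of 16 class indices."""
    # Shift by half a sector so that each class is centered on its compass direction.
    return int(((azimuth_deg + SECTOR_DEG / 2) % 360.0) // SECTOR_DEG)


def class_to_azimuth(class_idx: int) -> float:
    """Representative (center) azimuth of a class, e.g. for irradiance calculations."""
    return (class_idx % 16) * SECTOR_DEG


if __name__ == "__main__":
    for az in (0.0, 95.0, 200.0, 350.0):
        idx = azimuth_to_class(az)
        print(f"{az:5.1f} deg -> {COMPASS[idx]} (center {class_to_azimuth(idx)} deg)")
```

Using the class center introduces an azimuth error of at most 11.25°, which is small compared with the uncertainty of the statistically estimated tilt.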

Performance Evaluation of Semantic Segmentation
This paper uses the Intersection over Union (IoU), also known as the Jaccard index, to evaluate the semantic segmentation results, as it is the most commonly used metric [77]. For a given class, the IoU is the ratio of the area of overlap between the predicted and the ground-truth regions to the area of their union.

$$\mathrm{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} \tag{1}$$
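A minimal per-class IoU computation following Equation (1) on flattened label maps is sketched below. The mean over a chosen subset of classes mirrors the reporting of an IoU averaged over all classes versus over roof classes only. Classes absent from both prediction and ground truth are skipped, which is one common convention among several; the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Sequence


def per_class_iou(y_true: Sequence[int], y_pred: Sequence[int],
                  classes: Iterable[int]) -> Dict[int, float]:
    """IoU = |overlap| / |union| for each class over flattened label maps."""
    if len(y_true) != len(y_pred):
        raise ValueError("label maps must have the same number of pixels")
    ious = {}
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        ious[c] = inter / union if union else math.nan  # undefined if the class is absent
    return ious


def mean_iou(ious: Dict[int, float], subset: Optional[Iterable[int]] = None) -> float:
    """Average IoU over all classes or a subset (e.g. roof classes), ignoring NaNs."""
    keys = ious.keys() if subset is None else subset
    values = [ious[k] for k in keys if not math.isnan(ious[k])]
    return sum(values) / len(values)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2]
    pred = [0, 1, 1, 1, 2]
    scores = per_class_iou(truth, pred, classes=range(4))
    print(scores, round(mean_iou(scores), 3))
```

For full images, the same counts are usually accumulated over the whole test set before dividing, rather than averaging per-image IoU values.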

Semantic Segmentation for Roof Segments
The first semantic segmentation step is the identification of individual roof segments and their orientation, similar to Lee et al. [59]. For this purpose, we train a CNN with a U-Net architecture [78] and a ResNet-152 [79] backbone. The DeepRoof dataset is randomly split into 60% training, 20% validation, and 20% test data. To increase the size of the dataset, each image is rotated 12 times in steps of 30°. Training is performed in two stages. In the first stage, the encoder weights are frozen, and only the decoder weights are optimized using the Adam optimizer [80]. This stage lasts 55 epochs of 66 iterations each, with a batch size of 64 images. The learning rate starts at 4 × 10⁻⁴ and is halved after 40 epochs. The second stage trains all weights for 100 epochs, starting at a learning rate of 4 × 10⁻⁵, which is halved after 60 and again after 80 epochs. During training, random augmentations are applied to the images to artificially enlarge the dataset and make the network less prone to overfitting [81]. The augmentations fall into three categories, which can be applied simultaneously. First, the image may be mirrored on its horizontal or vertical center axis. Afterward, it may be randomly cropped to 80% of its size. Finally, it is changed at pixel level by varying its brightness, contrast, gamma, or saturation. On the test set, we achieve an IoU of 0.89 averaged over all classes and 0.84 for the roof classes only.
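The two-stage step schedule described above can be written as a small function. It assumes 0-based epoch indices and interprets "halved after 40 epochs" as applying from epoch index 40 onward; the function name is hypothetical, and freezing the encoder in stage 1 is left to the training framework.

```python
def learning_rate(stage: int, epoch: int) -> float:
    """Step learning-rate schedule for the two-stage U-Net training (0-based epochs).

    Stage 1 (decoder only, 55 epochs): 4e-4, halved from epoch 40.
    Stage 2 (all weights, 100 epochs): 4e-5, halved from epochs 60 and 80.
    """
    if stage == 1:
        base, milestones, n_epochs = 4e-4, (40,), 55
    elif stage == 2:
        base, milestones, n_epochs = 4e-5, (60, 80), 100
    else:
        raise ValueError("stage must be 1 or 2")
    if not 0 <= epoch < n_epochs:
        raise ValueError(f"epoch must be in [0, {n_epochs})")
    # Each milestone already passed halves the base learning rate once.
    halvings = sum(1 for m in milestones if epoch >= m)
    return base * 0.5 ** halvings


TOTAL_STAGE1_ITERATIONS = 55 * 66  # 55 epochs x 66 iterations per epoch

if __name__ == "__main__":
    for stage, epoch in ((1, 0), (1, 40), (2, 0), (2, 60), (2, 80)):
        print(stage, epoch, learning_rate(stage, epoch))
```

If PyTorch is used, the stage-2 behavior corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 80]` and `gamma=0.5`, stepped once per epoch.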

Semantic Segmentation for Roof Superstructures
The second semantic segmentation step uses a deep learning approach for roof superstructure segmentation inspired by PV mapping approaches [55][56][57]. A CNN based on the U-Net is used because of its superior performance on small datasets [78]. We compared the backbone architectures EfficientNet-B3 [82], Inception-ResNet-v2 [83], and VGG-19 [84] and chose Inception-ResNet-v2 because it performed best. Due to the small size of the dataset, with only 407 images, a split of 86%, 7%, and 7% was used for training, validation, and testing, respectively. Mirroring and cropping were applied to the superstructure dataset, similar to the augmentation of the roof segment dataset. Furthermore, we merged all six superstructure classes into one superstructure class. In general, it would be preferable to detect each superstructure class individually, but this makes training more challenging.
The network is trained with an initial learning rate of 10⁻⁴, which is decreased using exponential decay scheduling. We initialized the weights with He initialization, the preferred choice for ReLU-activated neural networks [85]. The Adam optimizer is used for training [80], and dropout with a probability of 0.5 is applied after each layer. Figure 4 visualizes the class representation of roof pixels, superstructure pixels, and background pixels. All six superstructure classes combined make up only 1.44% of all pixels. Jadon [86] provided an overview of loss functions for semantic segmentation and recommended Focal Loss for highly imbalanced classes. Therefore, we implemented a combination of Focal Loss and weighted Dice Loss, which led to better performance than the Cross-Entropy Loss function.
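The combined loss for the binary superstructure-versus-rest task can be sketched as Focal Loss plus weighted Dice Loss on per-pixel probabilities. The focal parameters α = 0.25 and γ = 2 are the commonly used defaults from the original Focal Loss paper, and the Dice weight of 1.0 is a placeholder; the values used in this work are not stated here. With γ = 0 and α = 0.5, the focal term reduces to half the binary cross-entropy, which makes the down-weighting of easy pixels straightforward to check.

```python
import math
from typing import Sequence

EPS = 1e-7  # numerical floor to avoid log(0)


def focal_loss(y_true: Sequence[int], y_prob: Sequence[float],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, EPS), 1 - EPS)
        p_t = p if y == 1 else 1 - p              # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


def dice_loss(y_true: Sequence[int], y_prob: Sequence[float], smooth: float = 1.0) -> float:
    """Soft Dice loss for the positive (superstructure) class."""
    inter = sum(y * p for y, p in zip(y_true, y_prob))
    denom = sum(y_true) + sum(y_prob)
    return 1 - (2 * inter + smooth) / (denom + smooth)


def combined_loss(y_true, y_prob, dice_weight: float = 1.0) -> float:
    """Focal loss plus weighted Dice loss (weights are placeholders)."""
    return focal_loss(y_true, y_prob) + dice_weight * dice_loss(y_true, y_prob)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(combined_loss(labels, [0.95, 0.05, 0.9, 0.1]))
    print(combined_loss(labels, [0.1, 0.9, 0.2, 0.8]))
```

The focal term suppresses the many easy background pixels, while the Dice term works on overlap and is therefore largely insensitive to the 1.44% class share.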

Technical Potential
The technical potential is the electricity generation after accounting for losses caused by the technical properties of the solar system (Table A1). Following [87], the framework differentiates between the technical efficiency of the single module and the performance ratio at the plant level. The efficiency is specific to a given module type. In contrast, the performance ratio is a constant factor that comprises multiple aspects reducing the power output of the entire solar plant: losses from dirt, turndown, temperature dependencies, partial load operation, conduction losses, and shadowing effects. Quaschning [87] proposed a performance ratio of 0.7. We instead used a performance ratio of 0.8 (Table A3) because we considered shadowing losses separately in the geographic potential.
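A common textbook way to combine module efficiency and performance ratio is sketched below. This is a minimal sketch: the actual equations are listed in Table A1, which is not reproduced here, so the function names, the use of plane-of-array irradiation, and any module area or efficiency values are assumptions; only the performance ratio of 0.8 comes from the text.

```python
STC_IRRADIANCE_KW_PER_M2 = 1.0  # irradiance at standard test conditions (1000 W/m²)


def module_peak_power_kwp(area_m2: float, efficiency: float) -> float:
    """Peak power of one module: area × efficiency × STC irradiance."""
    return area_m2 * efficiency * STC_IRRADIANCE_KW_PER_M2


def annual_energy_kwh(peak_power_kwp: float, poa_irradiation_kwh_m2: float,
                      performance_ratio: float = 0.8) -> float:
    """Annual energy: E = P_peak · H_POA / G_STC · PR.

    H_POA is the yearly irradiation on the tilted module plane. Here it is assumed
    to come from the geographic potential, which already accounts for shading,
    so shading is not counted again in the performance ratio.
    """
    return peak_power_kwp / STC_IRRADIANCE_KW_PER_M2 * poa_irradiation_kwh_m2 * performance_ratio
```

Separating the two factors mirrors the distinction in the text: the efficiency changes with the chosen module type, while the performance ratio bundles the plant-level losses into one constant.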

Economic Potential
This paper extends existing economic potential estimations [7,23] from LCOE to net present value. In addition to costs, the approach includes savings and revenues and thus has more explanatory value than LCOE. However, it requires estimating the electricity consumption of buildings.
The first step is extracting building types from OSM building tags. We differentiated between residential, industrial, commercial, and public buildings because they differ in electricity consumption characteristics and electricity prices. According to standard load profiles for German buildings [88], commercial buildings exhibit their peak load at noon, whereas residential consumption peaks in the morning and evening. Additionally, commercial electricity prices are lower than residential ones. The load profile of the building is determined by applying the methodology of Alhamwi et al. [89]. The yearly energy consumption is estimated by multiplying the building area by the specific consumption per square meter. Standard load profiles [88] are then used to obtain a consumption time series.
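The consumption estimate described above can be sketched in two steps: an annual total from building area and specific consumption, and a time series obtained by scaling a standard load profile to that total. This is a minimal sketch; the profile shape and the specific consumption values used to exercise it are hypothetical placeholders, not values from the standard load profiles or from the appendix tables.

```python
import numpy as np


def annual_consumption_kwh(floor_area_m2: float, specific_kwh_per_m2: float) -> float:
    """Yearly consumption = building area × specific consumption per square meter."""
    return floor_area_m2 * specific_kwh_per_m2


def scale_load_profile(profile, annual_kwh: float) -> np.ndarray:
    """Scale a load profile so that its values sum to annual_kwh.

    `profile` holds one value per time step (e.g. hourly or quarter-hourly);
    only its shape matters, its absolute level is discarded.
    """
    shape = np.asarray(profile, dtype=float)
    total = shape.sum()
    if total <= 0:
        raise ValueError("load profile must have a positive sum")
    return shape / total * annual_kwh
```

Because the profile only supplies the shape, a residential profile with morning and evening peaks and a commercial profile with a noon peak yield different time series for the same annual total, which later affects how much PV generation can be self-consumed.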
Following Bertsch et al. [90], the economic return of a solar plant is calculated using the IRR. The shares of self-consumption and grid feed-in are calculated from the technical potential and the energy consumption. Revenues differ between the two cases: the feed-in tariff stays constant over the entire lifetime of the plant, whereas the electricity price increases based on historical data [21], which makes self-consumption even more attractive. Costs are divided into initial investment costs and yearly maintenance costs. Additionally, the fee on self-consumption for larger PV systems can be considered (Table A1). The economic potential can be described as the share of the technical potential for which the IRR exceeds a given threshold. This publication uses the weighted average cost of capital as the threshold.
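The cash-flow logic behind the IRR can be sketched as follows. This is a minimal sketch under simplifying assumptions that are not taken from the paper: self-consumption per time step is the minimum of generation and load, the self-consumption fee is a flat rate per self-consumed kilowatt-hour, the electricity price grows at a constant annual rate, module degradation and taxes are ignored, and the IRR is found by bisection. All parameter names are hypothetical; the actual equations and values are given in Tables A1, A2, and A4.

```python
import numpy as np


def energy_split(generation_kwh, load_kwh):
    """Return (self-consumed kWh, fed-in kWh) for aligned generation and load series."""
    gen = np.asarray(generation_kwh, dtype=float)
    load = np.asarray(load_kwh, dtype=float)
    self_consumed = np.minimum(gen, load)  # PV covers the load up to the smaller of the two
    fed_in = gen - self_consumed           # the surplus goes to the grid
    return float(self_consumed.sum()), float(fed_in.sum())


def cash_flows(investment_eur, self_kwh, feed_kwh, lifetime_years,
               feed_in_tariff_eur_kwh, price_year1_eur_kwh, price_growth,
               maintenance_eur_per_year, self_consumption_fee_eur_kwh=0.0):
    """Year 0 is the investment; later years are savings + feed-in revenue - maintenance."""
    flows = [-investment_eur]
    for year in range(lifetime_years):
        price = price_year1_eur_kwh * (1.0 + price_growth) ** year  # rising retail price
        savings = self_kwh * (price - self_consumption_fee_eur_kwh)
        revenue = feed_kwh * feed_in_tariff_eur_kwh                   # constant tariff
        flows.append(savings + revenue - maintenance_eur_per_year)
    return flows


def npv(rate, flows):
    """Net present value of yearly cash flows at a given discount rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(flows))


def irr(flows, low=-0.99, high=1.0, tol=1e-10, max_iter=200):
    """Internal rate of return via bisection; assumes NPV changes sign once in [low, high]."""
    f_low = npv(low, flows)
    if f_low * npv(high, flows) > 0:
        raise ValueError("IRR is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = 0.5 * (low + high)
        f_mid = npv(mid, flows)
        if f_low * f_mid <= 0:
            high = mid                 # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid    # root lies in [mid, high]
        if high - low < tol:
            break
    return 0.5 * (low + high)


def is_economic(flows, wacc):
    """A segment counts toward the economic potential if its IRR exceeds the WACC."""
    return irr(flows) > wacc
```

Splitting generation into self-consumed and fed-in energy is what makes the IRR sensitive to the load profile: each self-consumed kilowatt-hour is valued at the rising retail price, while surplus energy only earns the constant feed-in tariff.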

Case Study and Parameterization
In this paper, we conducted a case study to discuss the applicability of the proposed method. The method analyzes each building individually and takes 1-2 min per building on a laptop with an Intel i7-7820HQ processor. Scaling this building-specific analysis to a whole city or region would require optimizing the program; since each building is analyzed individually, the computation can be parallelized well to reduce the overall runtime. We selected a small residential area in the town of Grafing bei München as a case study because solar potential information from a LiDAR-based cadaster is available there [91]. The number of buildings was limited to 71 to allow manual in-depth examination and comparison. The suburban setting was chosen because of its similarity to the training data. The equations for the technical and economic calculations are given in Tables A1 and A2 of Appendix A. Appendix B summarizes the technical and economic assumptions in Tables A3 and A4.

Results Convolutional Neural Networks
The IoU values of the trained networks for semantic segmentation of roof segments and roof superstructures are 0.84 and 0.64, respectively. These values are comparable to those reported in similar publications: the DeepRoof paper [59] for roof segmentation and a paper on PV module detection [57]. However, these values are preliminary results and cannot serve as a benchmark. In deep learning, the selection or variation of the training, validation, and test data can significantly affect the training results. Within the scope of this paper, we did not analyze these effects in depth. Therefore, the following sections focus on a qualitative discussion of example images from the study area.
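For reference, the IoU metric behind these numbers can be computed per class from a predicted and a ground-truth label map. This is a minimal sketch; the class indices are hypothetical, and the paper does not state whether IoU is accumulated over the whole test set (as happens here when concatenated label maps are passed in) or averaged per image.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """IoU = |pred ∩ target| / |pred ∪ target| per class; NaN if a class is absent in both."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious


def mean_iou(pred, target, num_classes, classes=None) -> float:
    """Mean IoU over all classes, or only over a given subset (e.g. the roof classes)."""
    ious = per_class_iou(np.asarray(pred), np.asarray(target), num_classes)
    if classes is not None:
        ious = ious[list(classes)]
    return float(np.nanmean(ious))
```

Passing only the roof class indices via `classes` reproduces the distinction between an IoU averaged over all classes and one averaged over the roof classes only.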
The roof segmentation is a critical step in the methodology because the rest of the PV potential analysis is based on the roof geometry. Figure 5 presents four examples of the resulting roof segments for the selected case study area. Examples (a) and (b) show correct segmentation results. The buildings in the center are covered entirely by two segments, most of the surrounding building area is detected as well, and the azimuth classes of the segments are labeled correctly. In Figure 5a, a small part of a flat roof remains undetected. In Figure 5b, the roof segment of the building in the left center was labeled with two classes, but only the east-north-east (ENE) azimuth would be correct. Figure 5c displays a mediocre result. The center building is more complex than the gable roofs in (a) and (b) but was segmented well; both coverage and azimuth classes are correct. However, the surrounding buildings were detected only partly or not at all, possibly because trees cover parts of their roofs. Figure 5d shows a building with incorrect results, although gable roofs are usually detected well. The network proposes five different classes instead of two, and the roof is not covered well. A possible explanation is the ladder on the roof, which could be interpreted as a separating line between two buildings. The building in the upper right is not detected at all. The results confirm the potential of deep learning for PV potential analysis demonstrated by the DeepRoof paper [59]. However, the number of misclassified buildings is still too high to estimate the PV potential of a whole area with high accuracy for each individual building. Additional training of the roof segmentation network or quality control would be required.
The second CNN was trained to detect roof superstructures. We trained the network on a single superstructure class, as well as on multiple classes such as windows, PV modules, chimneys, and dormers, and achieved better results with the single-class training. Figure 6a shows a correct semantic segmentation for the center building. A superstructure of the top-left building is also detected. The network also labeled objects at the bottom of the image as superstructures, although they are parked cars. The network is able to detect existing PV modules as superstructures, as displayed in Figure 6b. However, additional modules on the smaller building attachment are not recognized, and further superstructures on the center building, such as the chimney or windows, remain unlabeled. Figure 6c shows the same building as Figure 5c. There are five superstructures on the center roof, but none of them are detected. As with the roof segmentation network, the superstructure segmentation network indicates the potential for correctly labeling obstructions on the roof.

Results Economic Potential
Based on the correct network outputs, the subsequent steps of the methodology calculate the geographic potential, the module placement, the technical potential, and the economic potential using the building's assumed energy consumption. The result is an IRR for each roof segment. Figure 7 shows an example of the module placement and the IRR. The existing modules are likely solar thermal collectors, and they were successfully detected as superstructures on the larger roof. The highest IRR, more than 10%, is achieved by the small south-east-facing roof segment. Because the system cost is assumed to be constant at 1071 €/kWp regardless of system size, small segments profit from a high self-consumption share and, consequently, a high IRR. Furthermore, as expected, the larger south-west-facing roof segment shows a higher IRR of 7% than the north-east-facing segment, with an IRR of less than 5%. The individual IRR of a segment does not represent the optimal PV system for the entire building because multiple segments can be combined. To transform the economic potential of each segment into an economically optimal PV system, the dependence of costs on the system design needs to be considered. For example, installation on multiple roof segments might be more time-consuming and, therefore, more costly.

Comparison of Aerial Image and LiDAR Based Approach
As described in the state of the art, solar cadasters based on LiDAR data are available to promote solar systems to citizens. We compared our results to a German solar cadaster from tetraeder.solar [91] to discuss the benefits and challenges of the aerial image-based methodology, using the same example as in Figure 7. Figure 8 shows an extract from the solar potential map (b) and the respective solar radiation (c), together with the roof superstructures and module placement from our tool (d). Without additional knowledge, our tool treats the building as one address, based on the information from OSM and Google Maps. However, tetraeder.solar uses official cadaster data from the state, which divides the building into two addresses (b). Furthermore, auxiliary buildings such as garages are mapped, too. The LiDAR data allow estimating the roof inclination and detecting protruding roof superstructures, as indicated by the different color at the location of the chimney in Figure 8c. However, flat obstructions such as windows or the existing solar modules are not detected. Therefore, the solar potential is displayed as high, and the solar cadaster proposes placing 16 modules on each of the south-west-facing roof parts. Figure 8d shows the module placement from our tool, which proposes 16 modules in total for the whole south-west-facing roof. Based on the LiDAR data, the roof slope is estimated at 20°, while the statistical roof slope assignment of our tool allocated a slope of 27.4° to the roof. Using the assumptions in Table A3, 16 modules have a peak power of 4.8 kWp. They produce 4123 kWh/a according to the tetraeder.solar cadaster and 4884 kWh/a according to our tool.
Furthermore, we compared our aggregated results for the whole study area of 71 buildings in Table 1. The LiDAR dataset comprises more buildings in the study area than our analysis because it includes auxiliary buildings and because the OSM mapping is incomplete. We therefore selected only the buildings contained in both datasets. The comparison shows that the technical potential calculated by the LiDAR approach is 118% larger than our result. The average number of placeable modules per roof differs by about 34 modules, or 75%, although the average roof area differs by only 11%. This deviation can be explained by the superstructure recognition, since roof superstructures prevent the installation of PV modules on the entire area, and by the significant influence of the assumed roof inclination. Our assumptions lead to a mean slope of 31°, which is more than 9°, or 30%, greater than the more accurate LiDAR estimate. Additionally, roof segmentation results such as the one displayed in Figure 5d constrain the module placement incorrectly and lead to fewer placed modules.
The comparison shows that, for a small study area, our approach yields results of the same order of magnitude as a LiDAR approach. The lower technical potential can be partly attributed to superstructure recognition, slope estimation, and incorrect roof segmentation. However, the overall estimation difference of more than 118% requires further analysis. Furthermore, the comparison needs to be expanded to different areas with varying roof architecture to demonstrate scalability.
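The single-building figures from the comparison can be related to installed capacity with simple arithmetic, which makes the two yields easier to compare. The input numbers are taken from the text; the helper name is illustrative.

```python
def specific_yield_kwh_per_kwp(annual_kwh: float, peak_kwp: float) -> float:
    """Annual energy per installed kilowatt-peak."""
    return annual_kwh / peak_kwp


MODULES = 16
PEAK_KWP = 4.8                       # peak power of the 16 modules (Table A3 assumptions)
kwp_per_module = PEAK_KWP / MODULES  # 0.3 kWp per module

lidar = specific_yield_kwh_per_kwp(4123.0, PEAK_KWP)  # tetraeder.solar cadaster
ours = specific_yield_kwh_per_kwp(4884.0, PEAK_KWP)   # aerial-image tool
relative_difference = ours / lidar - 1.0

print(f"{kwp_per_module:.2f} kWp/module, {lidar:.0f} vs {ours:.1f} kWh/kWp, "
      f"+{relative_difference:.1%}")
# prints: 0.30 kWp/module, 859 vs 1017.5 kWh/kWp, +18.5%
```

For this roof, the aerial-image tool thus estimates roughly 18% more energy per installed kilowatt-peak than the cadaster, even though both propose the same 4.8 kWp on the south-west-facing roof.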

Research Opportunities
Although solar potential analyses have been conducted increasingly in recent years, the use of deep learning and aerial images instead of 3D data has been sparsely researched. The challenge is evaluating building-specific potential with high accuracy. Furthermore, the economic PV potential is highly relevant for purchase decisions and policymaking but remains rarely considered in scientific studies. Therefore, many challenges remain in this and adjacent fields of research. We gathered them into three groups: (1) deep learning for roof information, (2) improving economic PV potential estimation, and (3) using economic PV potential for energy system analysis. Further research opportunities can be found in [92,93] on the related topics of PV mapping using aerial images and modeling PV power generation on a city scale, respectively.

Deep Learning for Extraction of Roof Information
Challenges for improving semantic segmentation of roof segments and roof superstructures using deep learning are increasing accuracy and generalization of the CNNs. Accordingly, future work should focus on extending the dataset, determining and increasing dataset quality, and improving the training approach.
Larger datasets covering multiple entire cities exist for the remote sensing tasks of building footprint segmentation [74,75] and PV module mapping [76]. However, with regard to roof segmentation, we are only aware of the DeepRoof dataset consisting of 2274 buildings in 444 images. To the best of our knowledge, there is no dataset for roof superstructures, so we labeled our own dataset consisting of 407 images. Besides increasing the number of images, labeling activities should also pay attention to the variety of labeled roofs. For example, roof features change with the degree of urbanization as visualized in Figure 9 for an urban, suburban, and rural area in the greater Munich area. Furthermore, there is a regional variation in roof architectures that needs to be incorporated in the dataset to enable applying the network internationally.
The annotation of labels for semantic segmentation is a labor-intensive task, and a higher degree of automation could accelerate it. Rausch et al. [94] enriched 3D city models with detected PV modules. Conversely, 3D city models could be used to generate 2D roof segment annotations for aerial images with high spatial alignment. This would allow transferring semantic segmentation of roof segments to areas without available 3D models. Another option is machine-assisted labeling, in which a CNN trained on a smaller dataset pre-labels images, as proposed, for example, by Bastani et al. [95]. However, the benefits of adjusting pre-labeled annotations as opposed to labeling from scratch need to be explored.
In addition to label quantity, dataset quality is a relevant challenge. A study by van Coilie et al. [96] shows that different labelers interpret the same image differently. Bradbury et al. [76] used two labelers per image to annotate PV modules and reported that only 70% of the labels were created by both labelers, meaning that 30% of the labels were missed by one of them. Furthermore, the Jaccard index of the two annotations was lower than 0.86 for half of the labels. This indicates a high likelihood of erroneous ground truth data for training, validating, and testing the CNNs, even if labels are created carefully. The effect of such errors on semantic segmentation tasks is sparsely studied. Besides ground truth quality, image quality can also vary: Google Maps provides different resolutions depending on the region, and the images are not orthorectified. The effect of resolution and distortion on the roof segmentation task could be quantified to understand the required image quality.
There is also potential to improve the training of the CNNs. In this publication, we applied the approach of Lee et al. [59] and used 16 classes, labeling each roof segment according to its azimuth group. A higher number of classes makes it harder for the CNN to decide on a single class. Therefore, an alternative approach could be explored that detects only one roof segment class and postprocesses the labels to calculate an azimuth value from the segment's outline and orientation. Additionally, gables and gutters could be detected to support the azimuth calculation.
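If a single roof segment class were detected and an azimuth computed in postprocessing, the continuous azimuth could still be mapped back to compass groups for comparison with the class-based approach. This is a minimal sketch, assuming 16 sectors of 22.5° centered on the compass directions (N, NNE, and so on up to NNW) and azimuths measured clockwise from north; the exact class definition used by Lee et al. [59] may differ, for example by including a separate flat-roof class.

```python
COMPASS_16 = ("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")
SECTOR_DEG = 360.0 / len(COMPASS_16)  # 22.5° per sector


def azimuth_class(azimuth_deg: float) -> str:
    """Map an azimuth (degrees clockwise from north) to the compass sector centered on it."""
    shifted = (azimuth_deg % 360.0) + SECTOR_DEG / 2  # shift so sector boundaries fall on multiples of 22.5°
    return COMPASS_16[int(shifted // SECTOR_DEG) % len(COMPASS_16)]


def sector_center_deg(label: str) -> float:
    """Representative azimuth of a compass class, e.g. for the geographic potential."""
    return COMPASS_16.index(label) * SECTOR_DEG
```

Keeping the continuous azimuth for the irradiation calculation and using the class only for evaluation would avoid the quantization error of up to 11.25° that the class-based approach implies.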
Superstructure pixels make up only 1.4% of the pixels in our dataset. Therefore, a cascaded network approach could be explored to reduce the dominance of background pixels in the images. Using the available building footprint datasets, an additional network can be trained to output the roof area; IoUs of more than 0.9 have been achieved for this task [53]. The footprint output can then be fed into the roof segmentation or superstructure segmentation network as an additional input layer to highlight the area of interest.
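The cascaded idea can be illustrated at the data level: a footprint mask predicted by a first network is appended to the RGB image as an extra input channel, and restricting attention to the footprint raises the share of superstructure pixels. This is a minimal NumPy sketch; the function names are hypothetical, and how the second network consumes the extra channel (for example, through a first convolution with four input channels) is an assumption.

```python
import numpy as np


def add_footprint_channel(rgb: np.ndarray, footprint: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) image and an (H, W) footprint mask into an (H, W, 4) float array."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) image")
    if footprint.shape != rgb.shape[:2]:
        raise ValueError("footprint mask must match the image height and width")
    image = rgb.astype(np.float32)
    if rgb.dtype == np.uint8:
        image /= 255.0  # scale 8-bit colors to [0, 1]
    mask = (footprint > 0).astype(np.float32)[..., np.newaxis]
    return np.concatenate([image, mask], axis=-1)


def class_share(labels: np.ndarray, class_id: int, region=None) -> float:
    """Fraction of pixels with `class_id`, optionally only inside a boolean region."""
    labels = np.asarray(labels)
    if region is not None:
        labels = labels[np.asarray(region, dtype=bool)]
    return float(np.mean(labels == class_id))
```

Comparing `class_share` over the whole image with the share inside the footprint shows how much a roof mask could shift the class balance before any loss weighting is applied.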
Finally, the networks and data for semantic segmentation of roof segments and roof superstructures can be transferred to other application areas. For example, existing 3D city models usually do not include superstructures. The networks can be used to enrich 3D models with superstructures similar to the approach presented by Rausch et al. [94], enabling research opportunities for the built environment field or architecture.

Improving Economic PV Potential Estimation Based on Aerial Images
The estimation of the economic PV potential also offers research opportunities. Improving the accuracy of aerial image-based PV potential estimates is challenging. The main drawbacks of 2D data are the absence of accurate slope values and the inability to conduct shadow analyses. Based on 3D data for one region, characteristic roof types and their respective inclinations could be derived statistically. Machine learning approaches could potentially be used to transfer slope estimation to new regions, similar to existing extrapolation approaches [5,42]. This paper implemented superstructure segmentation for a single class. However, if multiple classes are annotated, characteristic dimensions can be assumed for chimneys, dormers, and air conditioning ducts, allowing an approximate 3D roof representation for shadow analyses. Furthermore, the building type and its utilization play a key role in the site's energy consumption and economic PV potential. Determining building types from OSM or other map providers remains a challenge due to sparse annotation: more than 80% of buildings in the German state of Bavaria are labeled only with the generic type "yes". Figure 10 shows the most frequent remaining building type labels, excluding "yes", from a dataset of more than 5 million labels in Bavaria. The building type could be classified using aerial images and CNNs [97] or a combination of GIS data and aerial images [98]. The building dimensions also play a role in estimating energy consumption, but height information is not publicly available. Where 3D data exist, machine learning methods could also be applied to correlate geo-features with building height. Furthermore, estimating building energy consumption is challenging, even when sufficient geospatial data are available. The use of German standard load profiles neglects site-specific variations and cannot be transferred to the European or global level. Especially for larger buildings with higher energy consumption, such as industrial buildings, the standard load profiles become less valid. Another challenge is validating the results of the economic PV potential. The presented framework builds on previous, validated APIs for the physical and technical potential. However, the real economic potential can only be validated with customer data, which are not publicly available. The results of the PV potential analysis should also be compared to existing studies based on other approaches. Since few studies consider economic aspects, such a comparison could focus on the technical potential.

We presented preliminary results that assign an IRR to each roof segment. An optimization could be added to the framework to determine the most economical solution per building. Combining PV systems with energy storage can increase the economic benefit [29][30][31] and should be considered in the optimization.
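To make the per-segment IRR more concrete, the following is a minimal sketch of how an IRR could be computed for a single roof segment. It is not the paper's cash-flow model: the `RoofSegment` class, the `cash_flows`, `npv`, and `irr` functions, and every default parameter (investment cost per kWp, operating cost share, lifetime, self-consumption share, electricity price, feed-in tariff, degradation) are hypothetical assumptions chosen for illustration. The IRR is found by bisection as the discount rate at which the net present value (NPV) of the segment's cash flows equals zero.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoofSegment:
    """Hypothetical per-segment inputs for the economic assessment."""
    segment_id: str
    capacity_kwp: float        # installable PV capacity on the segment
    yield_kwh_per_kwp: float   # annual specific yield, e.g. derived from PV-GIS


def cash_flows(seg: RoofSegment, capex_per_kwp: float = 1400.0,
               opex_share: float = 0.015, lifetime: int = 20,
               self_consumption: float = 0.3, price: float = 0.30,
               feed_in: float = 0.08, degradation: float = 0.005) -> List[float]:
    """Year-0 investment (negative) followed by annual net revenues.

    All default values are illustrative assumptions, not the paper's inputs.
    """
    invest = seg.capacity_kwp * capex_per_kwp
    flows = [-invest]
    for year in range(1, lifetime + 1):
        # Module output degrades by a fixed share per year.
        energy = seg.capacity_kwp * seg.yield_kwh_per_kwp * (1 - degradation) ** (year - 1)
        # Self-consumed energy saves the retail price; the rest earns the feed-in tariff.
        revenue = energy * (self_consumption * price + (1 - self_consumption) * feed_in)
        flows.append(revenue - opex_share * invest)
    return flows


def npv(rate: float, flows: List[float]) -> float:
    """Net present value of cash flows indexed by year (t = 0, 1, ...)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


def irr(flows: List[float], low: float = -0.99, high: float = 1.0,
        tol: float = 1e-7, max_iter: int = 200) -> Optional[float]:
    """IRR by bisection: the rate in [low, high] at which the NPV is zero.

    Assumes a conventional profile (one outflow, then inflows), so the NPV
    falls monotonically with the rate. Returns None if there is no sign change.
    """
    f_low, f_high = npv(low, flows), npv(high, flows)
    if f_low == 0:
        return low
    if f_high == 0:
        return high
    if f_low * f_high > 0:
        return None
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, flows)
        if f_mid == 0 or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid                # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid   # root lies in [mid, high]
    return (low + high) / 2


if __name__ == "__main__":
    segments = [RoofSegment("south", 8.0, 1050.0), RoofSegment("north", 5.0, 650.0)]
    for seg in segments:
        rate = irr(cash_flows(seg))
        print(seg.segment_id, "no IRR" if rate is None else f"{rate:.2%}")
```

Bisection is used here only because it is simple and robust for conventional cash-flow profiles; a library IRR routine would serve equally well. Ranking segments by such an IRR is one possible input for the per-building optimization discussed above, with site-specific prices and costs replacing the assumed defaults.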

Using Economic PV Potential for Energy System Analysis
The analysis of the economic PV potential for cities, regions, or nations offers research opportunities on a system scale. Assuming penetration rates related to the economic potential, roll-out scenarios can be investigated with high geospatial resolution. The impact on the electricity grid and the requirements for energy storage systems can be analyzed at the local level. With increasing shares of renewable energy generation, sector coupling gains relevance because more volatile supply and demand must be balanced [99]. The introduction of electric vehicles represents a major transition in the transport sector. Using the economic PV potential, synergies between PV and electric vehicles can be studied. Current research challenges include evaluating these synergies at the city level with high spatial and temporal resolution [93]. Furthermore, policy measures play an important role in the PV adoption rate. Their influence could be evaluated to support policy decision-making, similar to [25]. This is especially relevant for new markets and regions with low PV coverage. Finally, the growing interest in using aerial and satellite images makes ethical considerations and a discussion of privacy aspects necessary.
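As an illustration of a roll-out scenario at high geospatial resolution, the sketch below aggregates segment-level results into expected installed PV capacity per grid area. It rests on two hypothetical scenario assumptions: a roof segment counts as economic when its IRR reaches a hurdle rate, and a uniform penetration rate applies to all economic capacity. The `SegmentResult` record, its `grid_area` key, the `rollout_capacity` function, and the hurdle and penetration values are illustrative placeholders, not part of the paper's framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class SegmentResult(NamedTuple):
    """Hypothetical output record of the economic potential analysis."""
    building_id: str
    grid_area: str          # e.g. a district or substation identifier (assumed)
    capacity_kwp: float
    irr: Optional[float]    # None if no IRR exists for the segment


def rollout_capacity(segments: Iterable[SegmentResult], hurdle_rate: float = 0.04,
                     penetration: float = 0.5) -> Dict[str, float]:
    """Expected installed PV capacity (kWp) per grid area in one scenario.

    Scenario assumptions: a segment is economic if its IRR reaches
    `hurdle_rate`, and a uniform share `penetration` of that economic
    capacity is actually installed.
    """
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        if seg.irr is not None and seg.irr >= hurdle_rate:
            totals[seg.grid_area] += penetration * seg.capacity_kwp
    return dict(totals)


if __name__ == "__main__":
    results = [
        SegmentResult("b1", "A", 8.0, 0.065),
        SegmentResult("b1", "A", 5.0, -0.001),
        SegmentResult("b2", "B", 10.0, 0.05),
        SegmentResult("b3", "B", 4.0, None),
    ]
    # Sweep penetration rates to compare roll-out scenarios per grid area.
    for share in (0.25, 0.5, 1.0):
        print(share, rollout_capacity(results, penetration=share))
```

With these example records, `rollout_capacity(results, penetration=0.5)` returns `{'A': 4.0, 'B': 5.0}`. Keeping the totals keyed by grid area preserves the spatial reference that a subsequent grid-impact or storage analysis would need.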

Conclusions
Roof-mounted PV systems play an important role in the global transition to renewable energy generation. PV potential analysis is an important tool for supporting decision-making and incentivizing new installations. Research in this area has progressed from calculating the physical potential to the geographic and technical potential. The analysis of the economic potential has the greatest practical relevance but has been adopted by only a few studies. Existing publications with high spatial resolution and a high level of detail are usually based on 3D data. Recent advances in deep learning make it possible to explore an approach based on aerial images that relies only on publicly available data. This paper presents a methodology for economic PV potential analysis using aerial images and deep learning. Two CNNs are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper's methodology with a 3D-based analysis discusses its benefits and disadvantages. The approach is potentially scalable to the global level but poses many challenges along the way. Therefore, the last section of this paper collects and discusses research opportunities in the fields of deep learning, improved economic PV potential analysis, and energy system analysis.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Equations for technical and economic potential calculation.

Calculation of the Technical Potential
Physical Potential
$E_{\mathrm{phy}} = I_{\mathrm{hor,glob}} = I_{\mathrm{hor,dir}} + I_{\mathrm{hor,diff}}$