Spatial Distribution of Wind Turbines, Photovoltaic Field Systems, Bioenergy, and River Hydro Power Plants in Germany

The expansion of renewable energy technologies, accompanied by an increasingly decentralized supply structure, raises many research questions regarding the structure, dimension, and impacts of the electricity supply network. In this context, information on renewable energy plants, particularly their spatial distribution and key parameters (e.g., installed capacity, total size, and required space), is increasingly important for public decision makers and different scientific domains, such as energy system analysis and impact assessment. The dataset described in this paper covers the spatial distribution, installed capacity, and commissioning year of wind turbines, photovoltaic field systems, and bio- and river hydro power plants in Germany. Collected from different online sources and authorities, the data have been thoroughly cross-checked, cleaned, and merged to generate validated and complete datasets. The paper concludes with notes on the practical use of the dataset in an environmental impact monitoring framework and other potential research or policy settings.

warming below 2 or even 1.5 degrees centigrade in 2050 [6]. However, the increasing number and spatial distribution of renewable energy plants also lead to growing conflicts with the objectives of nature conservation and with acceptance by the local, i.e., directly affected, population [7][8][9][10][11]. Changes in land use, reduction of cultivable land, fragmentation of the countryside, plant degradation, visual impact, interference with fauna and flora, microclimate change, glare, noise, and impacts associated with the construction phase are among the identified impacts of renewable energy plants, in particular of photovoltaic field systems and wind turbines [12,13]. To counteract these conflicts, a comprehensive understanding of the effects of renewable energy systems on humans and the environment is essential. This knowledge can be generated both by case studies and by comprehensive monitoring [14,15]. A prerequisite for such analyses is concrete knowledge of the location and characteristics of renewable energy plants. Such information is also important for the further technological planning of the conversion of the energy system.
The dataset described here comprises wind turbines, photovoltaic field systems, bioenergy plants, and river hydro power plants installed in Germany predominantly between 1990 and 2015. For the sake of completeness, available data on plants with a commissioning year prior to the 1990 to 2015 study period are also included. The data have been generated as part of the ongoing research project "EE-Monitoring - Monitoring of nature protection implications of the expansion of renewable energy in the power sector and the development of instruments for the mitigation of impacts on nature and the scenic value of the landscape". This project is supported by the Federal Agency for Nature Conservation and is carried out by a consortium consisting of several departments of two research institutes (Helmholtz Centre for Environmental Research UFZ and the Deutsche Biomasseforschungszentrum DBFZ) and three subcontractors (Bosch und Partner GmbH, Leipziger Institut für Energie and Ingenieurbüro Floecksmühle). The main objectives of this project are:
• Multitemporal detection of the spatial distribution of renewable electricity generation and electricity transport infrastructure (location of renewable energy power plants (on-shore wind turbines, photovoltaic field systems, bioenergy, hydro power) and transmission networks) and the identification of environmental and sociotechnological conflicts related to them;
• Derivation of appropriate indicators for monitoring the impact of past and future expansion of renewable energies;
• Derivation of recommendations for policy and planning.
The remainder of this paper is organized as follows. Section 2 describes the dataset, Section 3 explains the methodology used to generate the data, and Section 4 provides hints and examples for the potential reuse of the data in research, monitoring, and energy planning processes.

Data Description
The generated dataset of the renewable power plants in Germany is provided in vector format, readable by all common geographical information systems. Its spatial extent covers the area of the Federal Republic of Germany. The projected coordinate system is the European Terrestrial Reference System (ETRS89) Universal Transverse Mercator (UTM) zone 32N.
Statements on the quality and completeness of the dataset are made in the Methods and Results section of the paper.
The dataset contains the following spatially explicit objects: wind turbines, photovoltaic field systems, bioenergy plants, and river hydro power plants. In addition to their position in space, the listed objects include information (attributes) on commissioning year and installed capacity, as well as on the dimension of the plants (wind power) and the type of plant (gaseous, solid, or liquid fuels) for bioenergy plants.
The attributes of the dataset are included in the metadata description [16].

Methods and Results
Here, the data processing is described; data sources and data availability are explained, followed by an explanation of data generation, the resulting dataset, and the validation process. This is done for all technologies under consideration.

Data Source and Accessibility
The locations of wind turbines are provided by the competent authorities of the federal states. They are freely accessible, but user declarations have to be signed. These data contain information on the exact location (latitude/longitude and/or street/parcel of land), as well as the plant specification (nominal capacity, hub height, commissioning time, etc.). To some extent, manufacturer-specific attributes are listed (model name, system status). The datasets provided by the federal states varied considerably in data structure, completeness, and accuracy. For the city states, no information from the competent authorities was available. We refrained from attempts to fill these gaps with other sources to avoid inconsistencies in the data. The data sources used are listed in Appendix A of this paper.

Data Generation Process
In a first step, the compiled individual datasets were checked for incorrect entries, such as the declaration "30 December 1899" as the commissioning date. Implausible entries that could not be manually verified were deleted. Furthermore, incorrect information on the total height of the wind turbines was corrected. The total height of a wind turbine results from the hub height plus half of the rotor diameter. Depending on the federal state, the spatial locations of the wind turbines are available in different coordinate reference systems (UTM zone 32, UTM zone 33, Gauss-Krüger zone 4, etc.). With the help of the online tool "Coordinate Transformation 4.4", which is provided by the Federal Agency for Cartography and Geodesy (BKG) at http://sgs.geodatenzentrum.de/coordtrans/, the data were converted into a uniform ETRS 89 LAEA coordinate system. This simplifies implementation into common geospatial viewers, such as Google Earth, and makes the data comparable with, e.g., CORINE Land Cover data.
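The two cleaning rules named above (the placeholder commissioning date and the total-height relation) can be sketched as follows. This is a minimal illustration, not the project's actual processing script; the record field names (`commissioning_date`, `hub_height_m`, `rotor_diameter_m`, `total_height_m`) are hypothetical.

```python
from datetime import date

# "30 December 1899" is the example of an incorrect entry given in the text.
PLACEHOLDER_DATE = date(1899, 12, 30)


def total_height(hub_height_m, rotor_diameter_m):
    """Total height = hub height + half the rotor diameter."""
    return hub_height_m + rotor_diameter_m / 2.0


def clean_turbine(record):
    """Return a cleaned copy of one turbine record (hypothetical field names)."""
    cleaned = dict(record)
    # Remove the implausible commissioning date instead of keeping a wrong value.
    if cleaned.get("commissioning_date") == PLACEHOLDER_DATE:
        cleaned["commissioning_date"] = None
    hub = cleaned.get("hub_height_m")
    rotor = cleaned.get("rotor_diameter_m")
    # Recompute the total height only when both inputs are known.
    if hub is not None and rotor is not None:
        cleaned["total_height_m"] = total_height(hub, rotor)
    return cleaned
```

For a turbine with a 100 m hub height and an 80 m rotor diameter, `total_height(100.0, 80.0)` gives 140 m, so a stored total height of, say, 180 m would be overwritten.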
In a next step, the individual corrected and transformed datasets were merged into one dataset. Finally, the gaps in the dataset regarding rated power, hub height, rotor diameter, and commissioning year were filled using random forests and k-nearest neighbor approaches. The full methodology is explained in depth in Becker and Thrän [17]. Using a typical wind turbine dataset for Germany, the authors trained and tested the performance of five approaches in 84 set-ups consisting of various combinations of predictor and response variables. To test the results, the root-mean-square error was computed for 4-fold and 3-fold cross-validation test samples, respectively. Computing the differences in R² between the 4-fold and the 3-fold run yielded an absolute deviation of 0.001 for the random forests and k-nearest neighbor models, which is an indication of the robustness and generalizability of the approaches.
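To illustrate the general idea of k-nearest neighbor gap filling and fold-wise error estimation, a small pure-Python sketch is given here. It is not the procedure of Becker and Thrän [17], which uses its own predictors, models, and tuning; the function names and the simple "every n-th row" fold assignment are assumptions made for this example.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a missing value as the mean response of the k nearest rows.

    `train` is a list of (features, response) pairs; `query` is a feature tuple.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(response for _, response in nearest) / len(nearest)


def kfold_rmse(rows, n_folds, k=3):
    """Root-mean-square error of the k-NN predictor over n_folds held-out folds."""
    squared_errors = []
    for fold in range(n_folds):
        test = rows[fold::n_folds]  # every n-th row forms the test fold
        train = [row for i, row in enumerate(rows) if i % n_folds != fold]
        for features, response in test:
            error = knn_predict(train, features, k) - response
            squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Calling `kfold_rmse(rows, 4)` and `kfold_rmse(rows, 3)` on the same rows and comparing the two values mirrors the robustness check described above: similar errors across fold counts suggest that the result does not hinge on one particular split.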
The edited/completed data points are marked in the attribute table. Figure 1 shows an overview of the completeness of database attributes by federal state. Evidently, the database has gaps. The quality of the wind power data as of June 2017 differs significantly depending on the state. For example, there is no information on the rotor diameter and the hub height for Mecklenburg-Vorpommern; in Rhineland Palatinate, the type of plant is not documented. In Bavaria and Thuringia, the commissioning year is documented, but not the exact commissioning date. After preprocessing, the database contains a total of 24,475 entries. The total capacity of these plants is approx. 39,296 MW. An overview of the spatial locations is given in Figure 2.

Data Source and Accessibility
The general data basis for site identification of photovoltaic (pv) field systems consists of the following datasets: (a) Energymap register; (b) pv-relevant OpenStreetMap data; (c) Installation register of the Bundesnetzagentur (BNetzA).
(a) These data contain information about existing plants financed by the support scheme of the Renewable Energy Sources Act in Germany and are provided by the Deutsche Gesellschaft für Sonnenenergie e.V. (DGS) via its website (www.energymap.info). This dataset comprises plant-specific information, such as the commissioning date, energy source, system key, and nominal power. The addresses and/or the GPS coordinates of the ground-mounted systems are associated with a certain fuzziness and serve only as a rough guide.
(b) The freely available OpenStreetMap (OSM) geodata contain datasets for pv systems, which can be retrieved with the online tool Overpass Turbo (http://overpass-turbo.eu/) using the following tags: generator:source=solar and generator:method=photovoltaic. These georeferenced data relate to both pv field systems and pv roof systems. Occasionally, additional data such as rated power or the name of the solar park are stored. The data do not fundamentally distinguish between roof systems and field systems.
(c) These data form the basis for installations which became operational as of 1 August 2014. The installation register of the Bundesnetzagentur (BNetzA) registers all plants (except for pv roof systems) for the generation of electricity from renewable energies. The installation register contains system-specific information, such as the commissioning date, energy source, system key or nominal capacity.

Data Generation Process
Step 1: Processing the OSM data
For the separation of the pv field systems from the pv roof systems in the OSM data, the basic assumption was made that pv systems located in urban settlement areas and smaller in size than one hectare are roof systems. The latter assumption is based on two aspects. First, digitization inaccuracies or coarser resolutions (see CLC dataset) can lead to overlaps of pv field systems of the OSM dataset and settlement areas of the generated settlement mask. Second, even within settlement areas, larger brownfields or similar areas are used for pv field systems. To identify the pv roof systems, the following geo datasets containing information on settlement areas were used:
• "Corine Landcover (CLC)" dataset of the European Environment Agency (EEA) of 2012;
• OSM record "buildings";
• OSM settlement areas with the tag: landuse = residential.
Points and polygons that lie within these recorded settlement areas were removed from the database, whereas polygons with an area >1 ha were retained and reported as pv field systems. To unify the data structure, pv field systems that exist only as points were identified and converted into polygons by manual digitization.
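The separation rule of Step 1 can be written as a small decision function. This sketch reflects one reading of the rule (large polygons are kept even inside the settlement mask; everything else inside the mask counts as a roof system); the function and parameter names are hypothetical, and the settlement test itself, which requires a spatial overlay, is assumed to have been done beforehand.

```python
HECTARE_M2 = 10_000.0


def classify_pv_feature(geometry_type, area_m2, in_settlement):
    """Classify one OSM pv feature as 'roof' or 'field' (sketch of Step 1).

    geometry_type: 'point' or 'polygon'
    area_m2: polygon area in square metres (None for points)
    in_settlement: True if the feature lies inside the settlement mask
    """
    is_large = (
        geometry_type == "polygon"
        and area_m2 is not None
        and area_m2 > HECTARE_M2
    )
    # Large polygons are kept even inside settlements (e.g., brownfield sites).
    if is_large:
        return "field"
    # Everything else inside the settlement mask is treated as a roof system.
    if in_settlement:
        return "roof"
    return "field"
```

A 2.5 ha polygon inside a residential area is therefore kept as a field system, while a point or a 500 m² polygon in the same area is dropped as a roof system.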
Step 2: Linking Energymap Data and OSM Data
To link the exact spatial information (coordinates and areas) of the OSM database with the additional attribute information of the Energymap database, the fuzzy GPS coordinates of the latter were transformed to spatial point information. The postal codes given in the Energymap register allow each register entry to be located at the level of its postal code area. Based on this, postal code areas were identified that overlap with exactly one OSM pv field system and contain exactly one entry in the Energymap register (1:1). For the remaining pv field systems, manual assignment is unavoidable, since no systematic assignment rules can be set up. Random sampling identified the following reasons:
• A solar park consisting of several polygons can be assigned to one entry (m:1);
• Several register entries (Energymap) are assigned to a solar park polygon (1:n);
• Several register entries (Energymap) can be assigned to several solar park polygons (m:n).
It should be noted that both the number of polygons and the number of register entries to be allocated vary widely. Furthermore, the distance of each polygon to the nearest register point was computed to test a further spatial attribute for its suitability for assignment.
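The automatic 1:1 assignment by postal code area can be sketched as follows. The sketch assumes that each OSM polygon and each register entry has already been given a postal code (for polygons this would come from a spatial overlay that is not shown); the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def match_one_to_one(osm_polygons, register_entries):
    """Pair polygons and register entries that are alone in their postal code area.

    Both inputs are lists of (identifier, postal_code) tuples. Returns the
    automatic 1:1 pairs and the postal codes left for manual assignment.
    """
    polygons_by_code = defaultdict(list)
    entries_by_code = defaultdict(list)
    for polygon_id, code in osm_polygons:
        polygons_by_code[code].append(polygon_id)
    for entry_id, code in register_entries:
        entries_by_code[code].append(entry_id)

    pairs, manual = [], []
    for code in sorted(set(polygons_by_code) | set(entries_by_code)):
        polygons, entries = polygons_by_code[code], entries_by_code[code]
        if len(polygons) == 1 and len(entries) == 1:
            pairs.append((polygons[0], entries[0]))
        else:
            manual.append(code)  # m:1, 1:n, m:n, or unmatched cases
    return pairs, manual
```

Every postal code that ends up in the second list corresponds to one of the m:1, 1:n, or m:n situations listed above (or to a polygon or entry without a counterpart) and has to be resolved by hand.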
Step 3: Manual assignment
The total size of the OSM database amounts to approx. 3220 pv field systems in Germany. For the reasons mentioned above, only a relatively small proportion of these OSM data could be systematically assigned to the Energymap registry entries. In 85% of cases, the data could be linked only via manual assignment. The GPS coordinates of the Energymap dataset, which are available with an accuracy of 3 km, served as a rough orientation. To obtain the coordinates for newer pv field systems and to fill in missing coordinates, the installation register of the Bundesnetzagentur was evaluated. With the help of the geo-viewers Google Earth, Bing Maps, and the viewer of the Bundesamt für Kartographie und Geodäsie (BKG), the coordinates of the pv field systems were determined, and at the same time, the areas of the solar parks were digitized.

Resulting Dataset and Validation
The database for pv field systems includes 7076 systems, of which:
• 6752 with geo coordinates (including duplicates);
• 5653 with polygons (including duplicates);
• 7050 with information on the rated output;
• 7055 with a commissioning date.
The total capacity of these plants is approx. 9907 MW. This corresponds to approx. 95% of the installed capacity of the nationwide pv field systems at the end of 2015.
The information in the point datasets was overlaid with the available polygons. In this step, multiple points within a polygon were summarized with regard to their information. Via a plausibility check, polygons that obviously belong together were combined into one polygon, which now represents one pv field system. However, the compiled data yielded an unrealistic output for some pv farms, e.g., when the installed capacity was calculated in megawatts per hectare for each pv field system.
A problem that could not be completely excluded, owing to the volume and quality of the data, is the mixing of field systems and roof systems in terms of localization and/or a wrong declaration of the plant type. This can lead to errors in derived values (MW/ha).
To overcome this problem, the dataset was revised by manually comparing each polygon representing a pv field system with those depicted in two basemaps, i.e., OpenStreetMap (OSM) and Google Maps (more precisely, Google Satellite). These were loaded via the OpenLayers Plugin of the geographical information system QGIS. If there was a discrepancy between a polygon of the dataset and the basemaps, the vertices of this polygon were adjusted using the QGIS "Node Tool".
Initially, the polygons were matched to OSM. However, in cases where the respective pv field system was not included in the OSM data, it was corrected in accordance with the Google Satellite image. It should be taken into account that some polygons could not be validated because they were not represented in either of the available basemaps, even though a small portion of these (in total 17) was probably inaccurate, as they just consisted of a small circle. The pv field systems which could neither be verified nor falsified were kept unchanged in the dataset. Further, 58 polygons visible in the basemaps but not included in the created dataset were added subsequently, accounting for a combined surface area of approximately 108 hectares. However, detecting unaccounted for pv field systems was not the primary objective; thus, there is no guarantee for completeness.
Altogether, during these revision processes, about 26% of the polygons were modified. The total surface area was greater after the correction: Including the new polygons, approx. 375 hectares were added.
Each altered polygon was labelled as "edited" in the attribute table of the ultimately created files. In addition, the indication "implausible area performance" was given if the area output was greater than 1 MW/ha.
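The area-output check behind the "implausible area performance" flag amounts to a unit conversion and a threshold comparison. A minimal sketch, assuming polygon areas are stored in square metres and capacities in MW (the function names are hypothetical):

```python
def area_output_mw_per_ha(capacity_mw, area_m2):
    """Installed capacity per hectare of polygon area."""
    return capacity_mw / (area_m2 / 10_000.0)


def plausibility_flag(capacity_mw, area_m2, threshold_mw_per_ha=1.0):
    """Return the flag text named in the paper, or an empty string if plausible."""
    if area_output_mw_per_ha(capacity_mw, area_m2) > threshold_mw_per_ha:
        return "implausible area performance"
    return ""
```

A 5 MW system on a 10 ha polygon has an area output of 0.5 MW/ha and passes, whereas 3 MW on 2 ha (1.5 MW/ha) would be flagged, which typically points to an incomplete polygon or a wrongly assigned register entry.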
The final deduplicated and validated polygon dataset contained 4139 polygons, of which: • 874 without attribute information; • 3265 with attribute information; and an installed capacity of around 9351 MW (see Figure 3). In the deposited data, only the final dataset with complete attribute information is included.

Data Source and Accessibility
The basic data for bioenergy power plants in Germany were provided by the German national regulatory authority (BNetzA). They contain master and transaction data of all bioenergy power plants producing electricity according to the Renewable Energy Sources Act, the Erneuerbare-Energien-Gesetz (EEG). Master data contain information on the location, provided as the address of the grid connection point (federal state, postal code, locality, street), installed electrical capacity, and year of commissioning. The data themselves contain no information on the bioenergy carrier; instead, they contain specific remuneration keys for every bioenergy power plant, which depend on the bonuses granted to that plant. As additional data sources, the DBFZ database from yearly surveys (Scheftelowitz et al. [18]) and the power plant register of the BNetzA (BNetzA 2017) were used.

Data Generation Process
The fuel types were assigned according to the methodology in Scheftelowitz et al. [19] depending on remuneration keys and other characteristics, such as power plant size and year of commissioning. Four different biomass fuel types were assigned, namely biogas, biomethane, solid biomass, and liquid biomass. After assigning the biomass fuel types, the bioenergy power plants were georeferenced to obtain x/y coordinates in the WGS 84 system. Georeferencing was done by batch processing with an application of the Federal Agency for Cartography and Geodesy (BKG-Geokoder 2017). After batch processing, the geocoder also provides a quality measure for the results. For this purpose, the geocoder computes the distances between the sought and found attribute values by applying various distance measures, such as the Damerau-Levenshtein distance. The individual distances of the attributes are combined via a weighting algorithm to form the quality measure. If it is <96%, the results should be checked.
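How a string-distance-based quality measure of this kind can work is sketched below. The internals of the BKG geocoder are not described in the text, so everything beyond the use of a Damerau-Levenshtein distance and the 96% threshold is an assumption: the sketch uses the optimal string alignment variant of the distance, normalizes it by the longer string, and combines the attributes with made-up weights.

```python
def osa_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical strings (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def quality_measure(sought, found, weights):
    """Weighted mean similarity over the address attributes (assumed weighting)."""
    total = sum(weights.values())
    return sum(w * similarity(sought[k], found[k]) for k, w in weights.items()) / total


def needs_check(quality, threshold=0.96):
    """Results below the 96% threshold should be checked."""
    return quality < threshold
```

With the illustrative weights `{"postal_code": 2, "locality": 2, "street": 1}`, a single transposed letter pair in the locality ("Leipzig" vs. "Liepzig") already lowers the measure to about 0.94, so such a result would be marked for checking.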

Resulting Dataset and Validation
In total, we processed 14,236 plants with the geocoder (see Figure 4). Out of these, 117 locations, which correspond to <1% of the plants, had a quality measure for the postal code or locality below 96%. Due to their marginal number, these plant sites were not checked again during the project. Moreover, the addresses of the biogas plants do not match the actual location of the plants in every case. In cases of such deviation, the address is probably the office or residential address of the plant owner. It is unknown how many plant sites are affected, but the biogas plants are mostly located close to the given addresses. The total capacity of these plants is approx. 6833 MW.

Data Source and Accessibility
The general data basis for the site identification of hydropower plants consists of the following datasets:
(a) These data contain information about existing EEG plants in Germany and are made available via the Federal Network Agency (BNetzA). This dataset consists of plant-specific information, such as location coordinates (UTM coordinates with zone value), address data, installed power, approval data, and details of the grid connection, voltage level, and remote controllability.
(b) The Federal Network Agency (BNetzA) records the master and transaction data of all power generation plants that feed into the German electricity grid under the EEG, based on an annual report from the transmission system operators. The plants considered here are those with a reference to the energy source hydropower. This dataset contains address data of the plant; plant-specific information, such as rated output, voltage level, controllability, date of commissioning, and control zone; and transaction data of the plant (remuneration category, annual energy output, remuneration, distribution system operator).
(c) In an internal database, data from hydropower plants >1 MW were compiled from various projects of nationwide importance. The data include information on the power plant location as well as on plant-specific information, such as power, operator, etc.
(d) See (b), as data are structurally the same but cover different reporting periods.
(e) In all federal states except Baden-Württemberg, Berlin, Brandenburg, and Bremen, data are available in a central database (register of transverse structures). In North Rhine-Westphalia (NRW), Rhineland-Palatinate, and Thuringia, the data are georeferenced, but they are publicly available only in NRW. No details are available for the other federal states.

Data Generation Process
The process of creating the spatial database comprises four steps, described below:
Step 1 Preparation of BNetzA data
For the identification of the hydropower plants, the master data of the EEG annual statements 2015, which contain address data, EEG asset codes, and capacity data, were used. The facilities were georeferenced on the basis of their location data. However, since the location-specific data are address data, several problems arise during the allocation: (i) Address data correspond to postal addresses, so the data points do not always correspond exactly to a location along a river. (ii) Some addresses are incomplete or incorrect, leading to no allocation or a wrong one. (iii) Some entries contain only vague information such as "Kreisstr." or "At the waterfall". Two different programs for georeferencing were used for allocating the plants, but some addresses were not found by either system. If address information was missing, the location of the center was chosen as the location.
Step 2 Processing data from hydropower plants >1 MW
For installations >1 MW, the locations were partly taken from available documents. Missing data were revised, updated, and relocated. A small share of these facilities is funded under the EEG.
Step 3 Match georeferenced data to GIS and assign manually (c)
In the geographical information system, the georeferenced data were compared with the actual position of the water bodies. Here, different distance tolerances were used. In individual cases, locations were also searched for on maps and aerial photographs and located by hand. However, this procedure only worked for large systems that were visible in the aerial image. Smaller installations on small bodies of water, on the other hand, were almost invisible in the aerial photograph due to occlusion.
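A distance-tolerance check against a watercourse can be sketched with plain planar geometry. The sketch assumes projected coordinates in metres (as in the UTM system of the dataset) and a river represented as a simple list of vertices; the function names and any tolerance values are hypothetical, since the tolerances actually used are not stated.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_river(p, river):
    """Distance from p to a river given as a list of at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(river, river[1:]))


def within_tolerance(p, river, tolerance_m):
    """True if the geocoded point lies within the tolerance of the river line."""
    return distance_to_river(p, river) <= tolerance_m
```

Points that fail the check for every nearby watercourse are the candidates for the manual search on maps and aerial photographs described above.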
Step 4 Linking of BNetzA data and the in-house database WKA >1 MW (d)
To determine which plants are present both in the in-house database and in the BNetzA data, the exact spatial information (coordinates) of the BNetzA data was linked to the additional information of the in-house database by comparing the location and capacity data. If EEG system keys were present in the data, the assignment was made using this identifier.
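The linking logic of Step 4 (identifier first, location and capacity as a fallback) can be sketched as below. The record layout, the function name, and both tolerance values are illustrative assumptions and are not taken from the paper.

```python
import math


def link_plants(bnetza, inhouse, max_distance_m=500.0, max_capacity_diff=0.1):
    """Link BNetzA records to in-house records; returns (bnetza_id, inhouse_id) pairs.

    Each record is a dict with 'id', 'x', 'y', 'capacity_kw' and an optional
    'eeg_key'. The tolerances are illustrative assumptions.
    """
    by_key = {r["eeg_key"]: r for r in inhouse if r.get("eeg_key")}
    links, used = [], set()
    for rec in bnetza:
        def distance(r):
            return math.hypot(r["x"] - rec["x"], r["y"] - rec["y"])

        # Preferred: assignment via the EEG system key.
        match = by_key.get(rec.get("eeg_key"))
        if match is None:
            # Fallback: nearest unused in-house plant with similar capacity.
            candidates = [
                r for r in inhouse
                if r["id"] not in used
                and distance(r) <= max_distance_m
                and abs(r["capacity_kw"] - rec["capacity_kw"])
                <= max_capacity_diff * rec["capacity_kw"]
            ]
            match = min(candidates, key=distance, default=None)
        if match is not None and match["id"] not in used:
            used.add(match["id"])
            links.append((rec["id"], match["id"]))
    return links
```

Records that remain unlinked exist in only one of the two sources; in this sketch they are simply left out of the result, while in practice they would need manual review.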

Resulting Dataset and Validation
The dataset comprises 7153 individual plants (see Figure 5). The total capacity of these plants is approx. 4148 MW. For all plants, installed capacity, coordinates, and commissioning year are available.

User Notes
The datasets described in this paper are used for monitoring the spatiotemporal development of renewable energy in Germany for the period between 1990 and 2015 within the abovementioned research project. In particular, the data are used for location, distance, and density analyses of the renewable energy plants with regard to, e.g., land use, protection areas, and the impairment of open space. First results are presented in a WebGIS, probably available online by the end of the first quarter of 2019 (https://www.ufz.de/ee-monitor-app/).
The dataset will also be useful for other research, monitoring, and energy planning purposes that build upon highly resolved spatial data on renewable energy plant distribution. For example, GIS-based techniques have been used to estimate changes in land use due to pv field systems at the municipality level [20]. The comprehensive dataset for Germany can be used in similar investigations of the conflicts between renewable power plants and the objectives of nature conservation and landscaping, such as the disturbance of the scenic value of the landscape or fragmentation analysis [21,22].
Furthermore, the developed dataset will be of use for researchers and policy makers who are concerned with the optimized spatial planning of the allocation and expansion of renewable power supply in accordance with the energy policy target triangle comprising security of supply, cost effectiveness, and environmental soundness. Although a number of studies have addressed the spatial resolution of the corresponding energy system models on the level of countries or even world regions [23][24][25][26], more detailed models are relatively scarce [27][28][29] because the required comprehensive and validated datasets are usually not publicly available. The presented data were created at a high resolution and can facilitate the modeling of an efficient spatial connection between renewable energy supply and power demand by providing reliable information on the spatial patterns of already existing renewable energy plants in Germany.

Rhineland Palatinate
Ministerium für Wirtschaft, Klimaschutz, Energie und Landesplanung
The data were provided by the Department of Energy and Transport Infrastructure, Geoinformation on request.

Landesamt für Umwelt und Arbeitsschutz
The data were provided by Division 3 "Nature and Environmental Protection" Department 3.5 "Circular Economy" on request.

Saxony
Landesamt für Umwelt, Landwirtschaft und Geologie
The data were provided by the unit for immission protection and noise on request.

Saxony-Anhalt
Ministerium für Landesentwicklung und Verkehr
The data were provided by the Regional Development, Regional Observation and Spatial Planning Board on request.

Schleswig-Holstein
Landesamt für Landwirtschaft, Umwelt und ländliche Räume
The data were provided by the Department of Specialized Information System and Reporting on request.

Landesverwaltungsamt
The data were provided by the Department of Planning and Spatial Observation on request.

Berlin, Hamburg, Bremen
For the city states, no information is available. The data situation was not examined there in depth.

Table A1. Cont.