Analyzing Europe’s Biggest Offshore Wind Farms: A Data Set with 40 Years of Hourly Wind Speeds and Electricity Production

Abstract: We provide an open, available, and ready-to-use data set covering 40 years of hourly wind speeds and synthetic hourly production signals for the 29 biggest offshore wind farms in Europe. It enables researchers and industry experts to include realistic offshore time series into their analyses. In particular, we provide data from 1980 to 2019 for wind farms already in operation and those that will be in operation by 2024. We document in detail how the data set was generated from publicly available sources and provide manually collected details on the wind farms, such as the turbine power curves. Correspondingly, the users can easily keep the data set up to date and add further wind farm locations as needed. We give a descriptive analysis of the data and its correlation structure and find a relatively high volatility and intermittency for single locations, with balancing effects across wind farms.


Introduction
The EU has the largest maritime space in the world that can be exploited for wind energy. Consequently, to reduce net greenhouse gas emissions, the EU plans to expand the current offshore wind capacity of 12 GW to 60 GW by 2030 and 300 GW by 2050 [1,2]. Similarly, the United Kingdom aims to reach 40 GW of offshore wind capacity by 2030 in its Ten-Point Plan [3].
The main potential for such offshore wind farms in the EU and UK is located in the North Sea, due to relatively steady winds and shallow sea depths that allow for ground-based installations (see, e.g., [1,4,5] for recent studies). Accordingly, the vast majority of the 29 largest European offshore wind farms displayed in Figure 1 are located in the North Sea (further details in Section 2). In this context, data from wind farms are of particular interest for quantitative analyses related to the growing offshore capacities at these and nearby locations. Such data could feed into simulations, forecasts, models, or studies on the overall energy system. However, to the best of our knowledge, no such data are freely available to the broad public.
In this paper, we provide an hourly data set covering the last 40 years for the 29 biggest wind farms in Europe. We include wind speeds at hub height based on meteorological reanalysis data and further weather parameters as well as specifications of currently installed and planned wind turbines with parametric forms of their power curves. Furthermore, we provide synthetic energy time series of produced power over the considered horizon, i.e., power that would have been produced in the past at the respective locations with the current capacity and technology installed. These synthetic time series do not consider interactions of the installed turbines on their respective output, such as wake effects, or constraints in the connected power grids. The data set mainly focuses on the meteorological

Studies to which data sets similar to ours could contribute can be found in the area of economic analysis (see, e.g., [6][7][8][9][10]) or grid integration ([11][12][13][14][15][16][17][18][19]) as well as in climate change analysis [20][21][22]. Among these papers, ref. [19] implicitly uses a data set similar to ours to analyze future developments of an offshore (and energy) transmission grid in 2050. The authors consider 16 wind farms in the North Sea region and use a single reference wind turbine to calculate future feed-in data from meteorological reanalysis data, taking wake effects into account. A recent publication providing a short overview of calculating feed-in data from reanalysis data is [23] (see also the references therein). Further contributions methodologically related to ours include [24], which uses, among other data, high-resolution geo-spatial wind speed data to analyze renewable energy potentials in the European Union. In addition, the work of [25] synthetically calculates Swedish wind power production based on a single reference turbine for three years. With respect to modeling spatial and temporal dependency structures of wind power production, ref. [26] analyzed northern European countries and waters. For onshore locations, recent studies related to ours include, e.g., [27][28][29][30][31].
The data set provided in our paper can be of particular use and interest to researchers and industry experts as well as policymakers. It is prepared in a structured way so that a broad readership can analyze the data without much computational skill or effort. To better understand the data and potential insights derived from it, we give a first analysis. We include descriptive statistics and aggregate production figures over various time horizons for each wind farm as well as for total electricity production. This includes average production numbers, full-load hours, and numbers for site-specific volatility and intermittency. By calculating these variables, we obtain an overview of all wind farms and the possibility to compare production characteristics at different locations. Since offshore wind energy is intended to play an essential role in the future European power system, we further analyze the dependencies of wind speed and electricity production between the considered locations. Such an analysis is relevant for reducing wind power variability by aggregating production from diverse geographical locations. While highly correlated locations lead to high volatility and intermittency in the overall supply, low correlations balance the overall output. Interestingly, the correlations of offshore wind farms over distance behave similarly to those found in onshore studies [32][33][34] but show higher correlations in neighboring locations.
The remainder of the paper is structured as follows. The data published with this paper, including weather data, information about wind farms, and derived wind power generation, are presented in Section 2. We explain the data generation step by step so that readers can keep the data set updated and extend it according to their needs. Section 3 includes the results of the analysis of the 29 wind farms. It is divided into a descriptive analysis, both for each wind farm and for all of them combined, and a dependence analysis. We conclude in Section 4.

Data
We provide 40 years of hourly wind and production data for Europe's 29 largest offshore wind farms in terms of installed capacity. The data include wind farms that are still under construction but will begin commercial operation in the next three years (by 2024). The locations of these farms are shown in Figure 1 and listed with technical details in Table 1. Our data set consists of hourly wind speeds and synthetic hourly power generation signals for each site. Wind speeds were determined by matching the wind farms' locations to the nearest grid point in the ERA5 data set [35] and transforming the wind speeds at 100 m to the hub heights of the turbines. Afterwards, the wind speeds were converted to production signals using the power transfer function of each turbine, which we also provide in this paper. All steps are explained in detail below.
Details about the considered wind farms in Table 1 include their approximate location, hub height (Hub (m)), turbine types, resulting capacity in MW, and the start of commercial operations. The location of each wind farm is rounded to the nearest quarter degree in longitude and latitude; thus, the positions are projected to a grid corresponding to the resolution of the weather data described below. Lastly, we assign a letter to each wind farm for improved visualization. Baseline data of each wind farm were manually collected from publicly available information given by the operator of each wind farm or other public information (see Table A1 in Appendix A). Note that although we speak of 29 wind farms, this number is not clear-cut, and one could, turning to Table 1, also speak of 28 or 32 considered parks. For example, we count Horns Rev Phase 1-3, a wind project built from three turbine types, as one wind farm, since the three parts map onto the same weather coordinates. In contrast, we count, e.g., the two projects Hollandse Kust Zuid/Noord as two wind farms. The decisive criterion for us is therefore the set of locations we are able to distinguish in the grid of weather data.
For the weather data, we extract ERA5 data for every wind farm location from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [35]. There, the weather data are provided on a grid of quarter degrees in longitude and latitude, which is why each wind farm is already matched to the nearest grid point in Table 1. For each location, we extract the lateral wind speed components $u$ and $v$ (in m/s) at 100 m above ground. We neglect the vertical wind component and compute the absolute wind speed $\mathrm{speed}_{100}$ from these two orthogonal components by

$$\mathrm{speed}_{100} = \sqrt{u^2 + v^2} \qquad (1)$$

and the wind direction $\varphi$ by

$$\varphi = \operatorname{mod}\left(180^\circ + \frac{180^\circ}{\pi}\,\operatorname{atan2}(u, v),\ 360^\circ\right)$$

where we have $\varphi = 0^\circ$ for a northerly wind and the angle increases clockwise. Here, atan2 is the 2-argument arctangent function.
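The conversion from wind components to speed and direction can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the data set. The function name `wind_speed_and_direction` is hypothetical, and the direction follows the common meteorological convention (0° for wind coming from the north, increasing clockwise), with $u$ taken as the eastward and $v$ as the northward component.

```python
import math


def wind_speed_and_direction(u, v):
    """Return (speed in m/s, direction in degrees) from the wind components.

    u: eastward component in m/s, v: northward component in m/s.
    The direction is the compass bearing the wind comes FROM:
    0 degrees for a northerly wind, increasing clockwise.
    """
    speed = math.hypot(u, v)  # sqrt(u**2 + v**2)
    direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    return speed, direction


# A wind blowing from the east towards the west (u < 0, v = 0)
# is an easterly wind, i.e., a direction of about 90 degrees.
speed, direction = wind_speed_and_direction(-5.0, 0.0)
print(round(speed, 2), round(direction, 1))
```

Note the argument order `atan2(u, v)`, which differs from the usual mathematical order `atan2(y, x)` because compass bearings are measured clockwise from north.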
In the subsequent step, we follow [32,36] and assume a logarithmic velocity profile to scale the wind speeds to the different hub heights $h_{\mathrm{hub}}$ (in m) of the turbines based on [37]:

$$\mathrm{speed}_{\mathrm{hub}} = \mathrm{speed}_{100} \cdot \frac{\ln(h_{\mathrm{hub}} / z_0)}{\ln(100 / z_0)}$$

Here, $z_0$ corresponds to the surface roughness depending on the actual ocean state (characteristic height of waves, depth, etc.), which is also provided in the ERA5 data set [35]. For wind farms with unknown hub height (where we neither managed to find reliable information nor were able to calculate it, see Table 1), we set the hub height to 100 m. Figure 2 displays the wind roses of wind at 100 m for the wind farms Gwynt y Mor and Kriegers Flak. Note that Gwynt y Mor is the westernmost wind farm in our data, and Kriegers Flak is the easternmost. Despite the notable difference in their position, the resulting wind roses indicate quite similar main wind directions for both wind farms and very few northern as well as north-eastern winds. However, we observe a higher proportion of low wind speed hours at the Gwynt y Mor wind farm, which is located in a bay of the Irish Sea near the shore (wind farm C in Figure 1). This difference will also be visible in the descriptive statistics of generated power in Section 3, where, e.g., we observe more downtimes of Gwynt y Mor due to low winds (below 4 m/s, which is typically the cut-in speed) compared to the Kriegers Flak wind farm.
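The logarithmic height scaling described above can be illustrated as follows. This is a sketch under stated assumptions: the function name `scale_to_hub_height` is hypothetical, and the roughness value in the usage example is only an illustrative order of magnitude for open sea, not a value taken from the data set (where the roughness comes from ERA5 for every hour).

```python
import math


def scale_to_hub_height(speed_100, hub_height, z0, ref_height=100.0):
    """Scale a wind speed from the reference height to hub height.

    Assumes a logarithmic velocity profile:
    speed_hub = speed_100 * ln(hub_height / z0) / ln(ref_height / z0)
    speed_100: wind speed at the reference height in m/s
    hub_height, ref_height, z0: heights and surface roughness in m
    """
    return speed_100 * math.log(hub_height / z0) / math.log(ref_height / z0)


# Illustrative values only: 10 m/s at 100 m, hub at 120 m, roughness 0.0002 m.
print(round(scale_to_hub_height(10.0, 120.0, 0.0002), 3))
```

Because the sea surface is very smooth, the correction between 100 m and typical hub heights is small, on the order of a few percent.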
As a final step in data preparation, we convert the wind speed data into synthetic power output using the turbines' power curves. However, in the datasheets of most turbine types, the power curve is only given for individual points, i.e., in the form of a table with discrete wind speeds and corresponding nominal output power. We follow [38] and fit a combination of third-order polynomials to the nominal power at each wind speed to get a functional relationship. A piece-wise definition of the function is given by

$$P(\mathrm{speed}) = \begin{cases} 0, & \mathrm{speed} < \mathrm{speed}_{\min} \\ p_1(\mathrm{speed}), & \mathrm{speed}_{\min} \le \mathrm{speed} < \mathrm{speed}_{\mathrm{split}} \\ p_2(\mathrm{speed}), & \mathrm{speed}_{\mathrm{split}} \le \mathrm{speed} < \mathrm{speed}_{\mathrm{rated}} \\ P_{\mathrm{rated}}, & \mathrm{speed}_{\mathrm{rated}} \le \mathrm{speed} \le \mathrm{speed}_{\max} \\ 0, & \mathrm{speed} > \mathrm{speed}_{\max} \end{cases}$$

where $p_1$ and $p_2$ denote the two fitted third-order polynomials, $P_{\mathrm{rated}}$ is the rated power, and, for each turbine type, $\mathrm{speed}_{\min}$ is the cut-in speed, i.e., the minimum wind speed required for any power, and $\mathrm{speed}_{\mathrm{rated}}$ is the minimum wind speed for the rated power. $\mathrm{speed}_{\max}$ is defined as the cut-out speed, i.e., the speed at which the turbine is stopped or braked, and is set to $\mathrm{speed}_{\max} = 25$ m/s for all turbines. In addition to these technical parameters of the turbines, $\mathrm{speed}_{\mathrm{split}}$ is the turning point within our functional representation, where we change to the second polynomial. As proposed by [38], we fitted a third-order polynomial to find this point, where the concavity of the power curve changes sign. The resulting power curves for the wind turbines Siemens SWT-3.6-107 and Siemens Gamesa SG 8.0-167 DD, installed in the wind farms Gwynt y Mor and Kriegers Flak, are shown in Figure 3. Fitted polynomials and plots for the other turbine types are given in Appendix C. Note that data on nominal power at different wind speeds were not available for four turbines (Vestas V164-8.25, V164-9.0, V164-10.0, and Siemens Gamesa SG 11.0-200 DD). In these cases, we used a scaled version of the most similar Vestas V164-9.5 power curve instead. To be more specific about the scaling, consider, e.g., an unknown turbine with a nominal capacity of 8 MW and an unknown power curve. Then, the unknown power curve is approximated by adopting the shape of the 9.5 MW Vestas V164-9.5 turbine, and each value is scaled by 8/9.5.
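To make the piece-wise structure and the scaling of a reference curve concrete, the following Python sketch evaluates such a power curve. All numbers are hypothetical: the cut-in, split, and rated speeds as well as the two cubic pieces `p1_example` and `p2_example` are chosen only so that the example is continuous and changes concavity at the split point. They are not the fitted coefficients of any real turbine, which are given in Appendix C.

```python
def power_curve(speed, speed_min, speed_split, speed_rated, speed_max,
                p1, p2, rated_power):
    """Piece-wise power curve: zero, first cubic, second cubic, rated, zero."""
    if speed < speed_min or speed > speed_max:
        return 0.0                  # below cut-in or above cut-out
    if speed < speed_split:
        return p1(speed)            # first (convex) polynomial piece
    if speed < speed_rated:
        return p2(speed)            # second (concave) polynomial piece
    return rated_power              # plateau at rated power


# Hypothetical 9.5 MW reference turbine (illustrative parameters only).
RATED, V_MIN, V_SPLIT, V_RATED, V_MAX = 9.5, 4.0, 8.0, 12.0, 25.0


def p1_example(v):
    s = (v - V_MIN) / (V_RATED - V_MIN)
    return RATED * 4.0 * s ** 3


def p2_example(v):
    s = (v - V_MIN) / (V_RATED - V_MIN)
    return RATED * (1.0 - 4.0 * (1.0 - s) ** 3)


def reference_curve(v):
    return power_curve(v, V_MIN, V_SPLIT, V_RATED, V_MAX,
                       p1_example, p2_example, RATED)


def scaled_curve(v, target_rated, base_rated=RATED):
    """Approximate an unknown curve by scaling the reference curve's shape."""
    return reference_curve(v) * target_rated / base_rated


print(reference_curve(8.0), scaled_curve(12.0, 8.0))
```

The scaling step keeps the cut-in, split, rated, and cut-out speeds of the reference turbine and only changes the vertical axis, which is the simplification described in the text.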
Having a time series of power output of every single turbine at each wind farm, we sum up all turbines belonging to the same wind farm to model the farms' overall resulting power output. Without a doubt, this aggregation is a simplification of the actual effects of how individual wind turbines combine to form a wind farm. However, it suffices to provide insights into overall variations, intermittencies, their time constants, as well as distributional characteristics of power production at certain locations and dependency patterns between locations. For studies where the absolute level of generated power of particular wind farms is needed as accurately as possible, we recommend taking the interactions of the wind turbines such as wake effects into account. An implicit way of doing that would be by calibrating the synthetic power data calculated here with the help of measured power data over a short period of time. Depending on the amount of measured data available, one could use different calibrations for different wind conditions such as wind direction or light wind and strong wind scenarios. Alternatively, one could try to consider wake effects within the aggregation step using a theoretical model that incorporates relevant parameters about the wind farm's outline. Various approaches to model these effects have been proposed, and detailed overviews are given, e.g., in [39][40][41].
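The aggregation step described above amounts to a weighted sum over turbine types. A minimal sketch, assuming that all turbines of one type see the same wind speed and do not interact (no wake effects), could look as follows. The function name `farm_power` and the turbine type labels are hypothetical.

```python
def farm_power(per_turbine_power_mw, turbine_counts):
    """Sum hourly power over all turbines of a wind farm.

    per_turbine_power_mw: dict mapping turbine type -> list of hourly power
                          values (MW) of ONE turbine of that type
    turbine_counts:       dict mapping turbine type -> number of turbines
    Returns a list with the farm's total hourly power in MW.
    """
    n_hours = len(next(iter(per_turbine_power_mw.values())))
    total = [0.0] * n_hours
    for turbine_type, series in per_turbine_power_mw.items():
        count = turbine_counts[turbine_type]
        for hour, power in enumerate(series):
            total[hour] += count * power
    return total


# Two hypothetical turbine types over two hours.
example = farm_power({"type_a": [1.0, 2.0], "type_b": [0.5, 0.0]},
                     {"type_a": 10, "type_b": 4})
print(example)
```

A calibration against measured data, as suggested in the text, would replace the plain multiplication by the turbine count with a factor that depends on wind speed or direction.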
The resulting total produced electricity of the Gwynt y Mor and Kriegers Flak wind farms in 2019 is displayed in Figure 4a,b for illustration. The figure shows that the power outputs at these locations are highly volatile and vary between no output at all and the maximum, i.e., the rated power. For better visualization, Figure 5 gives a more detailed view of Gwynt y Mor for January 2019. Here, flat plateaus are visible where the wind speed falls below the cut-in speed or exceeds the speed of rated power. The high volatility of the series might not be surprising for readers familiar with offshore wind power, but it clearly shows that the idea of the wind blowing continuously on the sea is not accurate. However, turning back to Figure 4, we can observe that the upper/lower bound of the power output is often reached at different points in time and, consequently, we expect that aggregating the power from multiple sites will have a flattening effect on the overall production.

We end this section with an overview of the exact format in which we provide the data set before we give a brief (descriptive) analysis in the next section. The data set may be downloaded as a zip archive under the provided DOI. It consists of 31 CSV files: one for each wind farm (29 in total), one file summarizing the wind speed, and one for the resulting power outputs. In the first 29 files, we report detailed data for each wind farm, including the wind components u and v, the forecast surface roughness (fsr), calculated wind speed, wind direction, scaled wind speed at hub height, and estimated power for each turbine type in the columns. As in the last two files, which report wind speed at hub height and total power for each wind farm, each row represents one point in time. Starting from 1 January 1980, 00:00 UTC in the first row, the data set ranges up to 31 December 2019, 23:00 UTC in the last of 350,640 rows.
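The stated number of rows can be checked with a short calculation using only the Python standard library. It counts the hourly time stamps from the first to the last row, including both end points.

```python
from datetime import datetime, timedelta

# First and last hourly time stamp of the data set (UTC).
start = datetime(1980, 1, 1, 0)
end = datetime(2019, 12, 31, 23)

# Number of hourly steps between the two, plus one for the first row.
n_rows = (end - start) // timedelta(hours=1) + 1
print(n_rows)  # prints 350640
```

The result equals 40 years of 8760 hours plus 24 extra hours for each of the 10 leap years between 1980 and 2016.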

Descriptive Statistics
In this section, we give results of the descriptive analysis of the data provided. The analysis is intended to give first insights into the data and to enable other scientists to work with the data more quickly. We report quantiles, mean, standard deviations, number of rated power hours (R.-Power), full-load hours (Full-L.), number of cut-out hours (Offs), and number of hours below cut-in (Null) of the year 2019 in Table 2. The same tables for earlier years (1980, 1990, 2000, 2010) are given in Appendix B.1. In total, the time series of 2019 consists of 8760 data points, i.e., the hours of a non-leap year. Across all wind farms, we observe a relatively small number of cut-out hours, in which the wind speed was too high for the wind turbines, which were thus turned off. At the other extreme, 506 hours on average per wind farm had wind conditions too light to generate electricity. This corresponds to a share of 5.8% of all hours in 2019. However, on average, each wind farm generates its rated power in 17.6% of all hours in 2019. In total across all wind farms, this corresponds to 4704 full-load hours. To get a more detailed picture, Figure 6 shows histograms of the hourly output power at Gwynt y Mor (C) and Kriegers Flak (R). The U-shape visualizes the full-load hours in the right tail and the overpowered/underpowered times and low wind hours in the left tail. For example, we observe more hours in the left tail at the Gwynt y Mor wind farm due to the higher proportion of lower winds observed in Figure 2.
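The hour counts reported in this section can be derived from the hourly series in a few lines. The sketch below assumes the common definition of full-load hours as the produced energy divided by the rated power. The function name `describe_power` and the sample values are hypothetical, and the exact computation behind Table 2 may differ in details such as tolerances.

```python
def describe_power(power_mw, wind_speed, rated_power, cut_in, cut_out):
    """Return simple descriptive figures for one wind farm and one period.

    power_mw:   hourly power values in MW
    wind_speed: hourly wind speeds at hub height in m/s (same length)
    """
    tol = 1e-9  # tolerance for comparing floating point power values
    return {
        "rated_power_hours": sum(1 for p in power_mw if p >= rated_power - tol),
        "full_load_hours": sum(power_mw) / rated_power,
        "null_hours": sum(1 for s in wind_speed if s < cut_in),
        "off_hours": sum(1 for s in wind_speed if s > cut_out),
    }


# Six made-up hours for a hypothetical 10 MW farm.
stats = describe_power(
    power_mw=[0.0, 0.0, 5.0, 10.0, 10.0, 0.0],
    wind_speed=[2.0, 3.0, 8.0, 13.0, 20.0, 26.0],
    rated_power=10.0, cut_in=4.0, cut_out=25.0,
)
print(stats)
```

As a plausibility check of the figures in the text, 506 hours out of 8760 correspond to about 5.8% of the year.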

Dependence Patterns
This section provides an overview of the correlation structure of synthetic power series from different locations. Such a dependence analysis is crucial, e.g., for modeling and analyzing uncertainty in the sum of overall produced power and potential balancing effects of geographically more widely distributed installations. For onshore turbines, such effects are investigated, e.g., in [26,32,34,42]. In general, we observe that dependence and joint distributional information across units or time are increasingly considered by researchers contributing to the energy literature, e.g., [43][44][45][46] in the forecasting context.
We study the correlation structure of the data to evaluate the overall potential to reduce the offshore wind power variability by considering wind farms in diverse geographical locations and with diverse technical specifications in Europe. Therefore, we first analyze the correlation between wind speeds at different locations before turning to the correlations of the resulting power output. Looking at both wind speed and power output enables us to disentangle the effect of the turbines' power curves from the effect of different wind speeds at geographical locations. The correlation matrix of wind speeds is shown in Figure 8. For clearer representation, the matrix elements are colored from dark blue (correlation is equal to −1) to dark red (correlation is equal to 1).
At first glance, the correlation of wind speed ranges from 0.11 up to values of 1.0, i.e., perfect correlation. It is notable that all correlations are positive, indicating a similar behavior of wind speeds over all locations, which is driven by synoptic-scale weather effects. Hence, no perfect balancing effects in energy production due to negative correlations in wind speeds can be expected. Moreover, we observe that the wind speed at Îles d'Yeu et de Noirmoutier (S) seems to be least dependent on the wind speeds of the other wind farms. This, by far the most southwestern location, is the only one in the Atlantic Ocean. Indeed, a comparison with the distance matrix of the wind farms reveals that the correlation is mainly dependent on the distances between the wind farms (the distance matrix of wind farms is shown in Appendix D). This is also illustrated in Figure 9, which plots pairwise correlations of two sites against their distance. A fitted exponential model ρ ∝ exp(−distance/D) results in a decay parameter D = 668.66 km and an intercept of ρ = 1.02 for zero distance. Most interestingly, this decay parameter is of similar size as in studies for onshore locations. For example, refs. [32][33][34] found exponential decay parameters of 305, 455, and 723 km, respectively. Here, ref. [32] used data from Texas, while ref. [33] used European data and ref. [34] focused on Germany. Note that these numbers should be compared with care, since the considered frequencies vary from 15 min data over hourly data to data from one 10 min average every 3 h in those studies. However, a characteristic difference we see compared to the onshore studies is that the intercept of the fitted relationship is nearly 1. This has been around 0.9 for the onshore studies.
This result seems to be intuitive, since specific surface conditions or obstacles might reduce the correlation between neighboring onshore locations, while this does not hold for offshore locations, at least as long as we abstract from the interaction between neighboring turbines. In the second step of the dependence analysis, we analyze the correlation of derived power signals from each wind farm (see Section 2 for the conversion of wind signals to power signals). The corresponding correlation matrix is shown in Figure 10. Here, a similar but slightly weaker correlation is observed. We explain this by the flattening effect of cut-in speeds and nominal wind speeds in the time series.

Figure 10. Correlation matrix of generated power of considered wind farms. Shown are pairwise correlation coefficients, each calculated from the two vectors of hourly production data from the respective sites. Again, for clearer representation, the matrix elements are colored from dark blue (correlation is equal to −1) to dark red (correlation is equal to 1).
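The exponential relationship between correlation and distance discussed above can be estimated, for example, by a log-linear least-squares fit. The sketch below is one possible way to do this and is not necessarily the fitting procedure behind the reported parameters. The function name `fit_exponential_decay` is hypothetical, the input values are synthetic, and the approach requires strictly positive correlations (which holds for the wind speed correlations reported here).

```python
import math


def fit_exponential_decay(distances_km, correlations):
    """Fit rho = a * exp(-distance / D) by least squares on ln(rho).

    Returns (a, D), where a is the intercept at zero distance and
    D is the decay parameter in km.
    """
    xs = list(distances_km)
    ys = [math.log(rho) for rho in correlations]  # needs rho > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals -1 / D
    intercept = mean_y - slope * mean_x # equals ln(a)
    return math.exp(intercept), -1.0 / slope


# Synthetic check: data generated with a = 1 and D = 500 km are recovered.
distances = [50.0, 200.0, 400.0, 800.0, 1200.0]
rhos = [math.exp(-d / 500.0) for d in distances]
a_hat, d_hat = fit_exponential_decay(distances, rhos)
print(round(a_hat, 3), round(d_hat, 1))
```

A nonlinear least-squares fit on the untransformed correlations would weight short distances more strongly and can therefore give somewhat different parameters.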

[Figure 9. Axis labels: Approximate Distance in [km]; Correlation of Wind Speed.]
To sum up, the lower correlation values of power signals suggest that the geographic diversification and technical details of wind farms lead to a reduced variance in the total energy production, i.e., balancing effects across wind farms and a more stable energy production compared to the power output of a single wind farm. It now seems reasonable to look at the cumulative distribution function (CDF) of the total energy generation from all wind farms. This graphical representation of the data enables us to analyze the nature of the total energy generation in Europe, e.g., observing lower bounds of energy production or quantiles. The CDF of the total hourly electricity generation in 2019 is shown in Figure 11. The figure displays the percentage of total rated power versus the percentage of all hours in 2019. We observe that the 1% quantile (5%/10%) of total power generation is 1772.27 MW (3165.34 MW/4415.74 MW), which corresponds to 7.7% (13.7%/19.2%) of the rated power. This means that the considered wind farms generated more than 1.7 GW in 99% of the time and produced more than 13.7% of the total rated power in 95% of the time. Such a high availability of offshore wind energy production is particularly important at night, when other renewable energy sources such as solar power produce less electricity.

Figure 11. Cumulative distribution function of total power generation in percent and corresponding 5% quantile in 2019. Note that the CDF starts with a zero slope at the beginning, illustrating that there are no hours without power production. [Axis labels: Percentage of Hours; Quantile.]
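Quantiles of the total production, as read off the CDF above, can be computed directly from the hourly series. The following sketch uses one simple definition of the empirical quantile (the smallest observation such that at least a share q of all observations is less than or equal to it). Other interpolation rules exist and may give slightly different values. The function names and the sample numbers are hypothetical.

```python
import math


def empirical_quantile(values, q):
    """Smallest observed value with at least a share q of values <= it."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def share_of_rated(power_mw, total_rated_mw):
    """Express a power value as a fraction of the total rated power."""
    return power_mw / total_rated_mw


# Ten made-up hourly totals (MW) of a hypothetical 200 MW system.
total_power = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
q10 = empirical_quantile(total_power, 0.1)
print(q10, share_of_rated(q10, 200.0))
```

Applied to the 8760 hourly totals of one year, the 1%, 5%, and 10% quantiles give the lower bounds of production that are exceeded in 99%, 95%, and 90% of all hours.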
In general, the results observed for 2019 do not appear to be exceptional among the years covered by our study. In Figure 12, we show the 1%, 5%, and 10% quantiles of total power generation from 1980 to 2019. It is interesting to see that these quantiles remain relatively stationary across years, with no discernible trend or multi-year seasonal variation. Moreover, we find that there are no hours without power generation and only very few hours in which the entire system collectively generates the rated power (the annual rated power hours vary around 20 throughout the data). The rarity of such extreme situations in the system demonstrates the overall balancing effect despite the observed positive correlations.

Figure 12. The 1%, 5%, and 10% quantiles of total power production from 1980 to 2019. Every line illustrates a quantile over the years derived from the yearly CDF of total power generation analogously to the CDF shown in Figure 11 for 2019. [Axis label: Power in [MW]; legend: quantiles of total power.]

Conclusions
In this paper, we compile an openly available data set covering 40 years of hourly wind speeds and synthetic historic production signals for the 29 biggest offshore wind farms in Europe. More precisely, the data set contains data from 1980 to 2019 for wind farms already in operation as well as those under construction up to 2024. We provide detailed information about the currently installed or planned wind turbine types and give analytical expressions for the power curves. Furthermore, we present a first descriptive analysis of the data and the joint electricity generation of these current offshore wind farms in Europe. We find relatively high volatility and intermittency at single locations, with balancing potential when interconnecting spatially more distant locations, since dependency patterns between locations weaken with growing distance. We explain in detail how the wind speed data set and production signals were generated based on the ERA5 data set [35], so that researchers making use of our data may easily integrate more current data as soon as they become available. The same holds for the planned extension of the historical period of the ERA5 data set back to 1950, which would also open up the possibility of very long-term studies.

Acknowledgments: The work was partly supported by the German Federal Ministry of Economic Affairs and Climate Action through the research project ProKoMo within the Systems Analysis Research Network of the 6th energy research program. Furthermore, we thank Tim Michael Seeberger for programming assistance and acknowledge support by the KIT-Publication Fund of the Karlsruhe Institute of Technology. We also thank three anonymous referees for their insightful comments, which have helped us to improve the paper.

Conflicts of Interest: The authors declare no conflict of interest.

Table A1. Information sources for wind farms.

Appendix B. Descriptive Statistics for Various Time Horizons
Appendix B.1. Tables

Appendix D. Distance Matrix of Wind Farms
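We calculated the distances between the wind farms based on their positions, assuming that the Earth is a sphere. Thus, we model the Earth with a radius of $r = 6370$ km and calculate the distance $d(A, B)$ of two points $A$ and $B$ as the great-circle distance

$$d(A, B) = r \cdot \arccos\bigl(\sin\theta_A \sin\theta_B + \cos\theta_A \cos\theta_B \cos(\lambda_A - \lambda_B)\bigr)$$

where $\theta$ and $\lambda$ denote the latitude and longitude of the respective point in radians. For clearer representation, the matrix elements are colored from dark red (same position) to dark blue.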
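A distance on a spherical Earth with radius 6370 km can be computed, for instance, with the haversine formula, which gives the same great-circle distance as the spherical law of cosines but is numerically more stable for nearby points. This is a sketch only: the function name `great_circle_distance_km` is hypothetical, the coordinates in the example are arbitrary grid points and not the positions of particular wind farms, and the exact formula used for the distance matrix may be written differently.

```python
import math

EARTH_RADIUS_KM = 6370.0  # spherical Earth radius used in this appendix


def great_circle_distance_km(lat_a, lon_a, lat_b, lon_b,
                             radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    phi_a = math.radians(lat_a)
    phi_b = math.radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(lon_b - lon_a)
    # Haversine of the central angle between the two points.
    h = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2.0) ** 2)
    # Guard against rounding slightly above 1 for antipodal points.
    return 2.0 * radius_km * math.asin(math.sqrt(min(1.0, h)))


# Two arbitrary quarter-degree grid points (illustrative only).
print(round(great_circle_distance_km(53.5, -3.5, 55.0, 13.0), 1))
```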