Validation of Improved Sampling Concepts for Offshore Wind Turbine Fatigue Design

Fatigue damage is a design-driving phenomenon for substructures of offshore wind turbines. However, fatigue design based on numerical simulations is quite uncertain. One main reason for this uncertainty is scattering offshore conditions combined with a limited number of simulations (samples). According to current standards, environmental conditions are sampled using a deterministic grid of the most important environmental conditions (e.g., wind speed and direction, significant wave height, and wave period). Recently, there has been some effort to reduce the inherent uncertainty of damage calculations due to limited data by applying other sampling concepts. Still, the investigation of this uncertainty and of methods to reduce it is a subject of ongoing research. In this work, two improved sampling concepts—previously proposed by the authors and reducing the uncertainty due to limited sampling—are validated. The use of strain measurement data enables a realistic estimate of the inherent uncertainty due to limited samples, as numerical effects, etc., are excluded. Furthermore, an extensive data set of three years of data of two turbines of the Belgian wind farm Northwind is available. It is demonstrated that two previously developed sampling methods are generally valid. For a broad range of model types (i.e., input dimensions as well as degrees of non-linearity), they outperform standard sampling concepts such as deterministic grid sampling or Monte Carlo sampling. Hence, they can reduce the uncertainty while keeping the sampling effort constant, or vice versa.


Introduction
The share of offshore wind energy in overall energy production is growing rapidly. Even though the first subsidy-free offshore wind auction bids have been made recently, costs are still relatively high compared to other renewable energies such as onshore wind energy [1]. On the one hand, the waiver of subsidies is driven by the expected increase in electricity prices. On the other hand, it is forecast that the levelized cost of energy (LCOE) will decrease [2]. Hence, to enable subsidy-free offshore wind energy, optimizations of the whole structure are needed. Here, the improvement of the substructure (in most cases, monopile substructures/foundations) is an important possibility to reduce costs.
For substructures made of steel, in most cases, the fatigue lifetime is decisive for the design. To calculate fatigue damages of substructures in the design phase, numerous time-domain simulations are needed. Commonly, these simulations are conducted with state-of-the-art aero-hydro-servo-elastic codes such as FAST [3] or HAWC2 [4]. Current standards [5] define that these simulations should mirror changing environmental conditions (EC) at the precise site of a wind turbine. Commonly, this is achieved by sampling EC using a deterministic grid. In academia, mostly, one-dimensional grids (only wind speed) are applied, leading to a "quasi-deterministic" fatigue damage calculation, which is roughly done as follows [6][7][8]: The range of wind speeds in power production (e.g., 3 m s −1 to 25 m s −1 ) is split up into steps (so-called bins) of 2 m s −1 or less. All other EC are set to constant values within these wind speed bins. Turbulent wind and irregular waves are considered in a stochastic process in each bin by conducting simulations of an overall length of one hour (normally divided into six 10 min simulations) per bin, as it is proposed by standards [5]. In industry and some academic studies [9][10][11], the same approach but with finer binning and bins for more EC (e.g., wind direction, wave height and period) is used to increase the accuracy and to reduce the uncertainty at the cost of higher computing times [9].
For the simplified one-dimensional academic approach, Müller and Cheng [12] showed that it cannot reproduce the scattering observed in offshore fatigue measurements. Moreover, it leads to highly uncertain approximations of the lifetime damage due to a very limited number of samples [6]. In the present contribution, we only investigate the uncertainty introduced by so-called finite sampling (i.e., limited cases/samples) and the selection of these samples during the design phase. Other sources of uncertainty, for example the error in Miner's rule or the uncertainty of stress concentration factors, are not the topic of this investigation. The multi-dimensional grid-based approach used in industry leads to more accurate approximations of the long-term fatigue damage. However, it was repeatedly shown that it is numerically inefficient [10,11]. Therefore, either meta models [9,13,14] are applied, which replace time-domain simulations and significantly reduce the computing time of each model evaluation, or alternative sampling concepts are needed, which improve the computational efficiency by reducing the number of model evaluations while conserving the same level of uncertainty. In this work, the latter approach is investigated.
Two different types of improved sampling concepts can be differentiated. First, there are approaches that still consider a deterministic grid, but only use the most important grid points. Stieng and Muskulus [15,16] determine the most important grid points in a computationally expensive preliminary study using a full grid. Velarde and Bachynski [17] estimate the relevant grid points by applying a simple sea state-damage correlation, while Stewart [9] assumes in his "probability sorting method" that "important" is equivalent to "probable".
The second type of improved sampling concepts comprises probabilistic approaches taking scattering EC into account, which makes simulations more realistic and addresses the deviations from measurement results [12]. The most frequently applied probabilistic sampling concept is Monte Carlo sampling (MCS). In the context of fatigue design of offshore wind turbine substructures, MCS has recently been investigated by various authors [10,11,16,18,19]. On the one hand, probabilistic approaches can reproduce measurement results more precisely [20], and MCS is more suitable for high-dimensional input spaces [10,11]. On the other hand, fatigue damage approximations using probabilistic approaches and limited samples are quite uncertain [18,19], and MCS becomes inefficient (i.e., converges slowly) for highly non-linear model functions [10]. Therefore, to limit computing times while keeping uncertainties at an adequate level, improved sampling techniques are needed. For example, Hübler et al. [18] and Stieng and Muskulus [16] concentrate their sampling on the input subspace leading to high damages, while Müller and Cheng [21] apply quasi-random sampling based on Sobol' sequences.
One shortcoming of all previously mentioned studies is the fact that they are all based on pure simulation results. The use of simulated data for design purposes is not surprising, since measurement data is not available at this stage; still, the general validity of these numerical findings is not warranted. Are these sampling methods valid independent of the considered system (i.e., turbine, substructure, site, etc.), the used simulation code, the dimension of the input space, etc.? This is why the objective of this study is to validate two sampling concepts that were previously developed by Hübler et al. [18]. In this context, "validation" does not mean that their benefit is proven mathematically, but that their performance is assessed using a broad range of wind turbine specific applications. For this purpose, first, a generic test function is applied, and second, real offshore strain measurements are used. The measurements are treated like simulation results for fatigue design. This may seem slightly odd at first, since in the design phase, measurements are not available. Nevertheless, it has two major advantages. First, although measurement data is not perfect due to measurement errors, purely numerical effects and simulation/model errors are excluded by using "realistic simulation data" (i.e., measurement data). Second, using measurement data, an extensive number of samples (144 samples per day and several years of measurements) for different turbines is available. This enables an assessment of these concepts not only for small sample sizes, as is common in academia, but also for large data sets, as used in industry approaches, so that the convergence of the approaches can be analyzed. Convergence studies through simulations would hardly be feasible in an academic context. After all, this enables an assessment of the general validity of numerical approaches for different systems and sample sizes, independent of simulation code specifics.
After a general introduction to damage calculation and extrapolation (Section 2) and a brief presentation of different sampling concepts (Section 3), a test example is analyzed (Section 4). This test example mimics the real fatigue behavior and helps to get a first insight into the performance of the various sampling concepts for changing input space dimensions and degrees of non-linearity. Subsequently, real offshore measurement data of the Belgian wind farm Northwind is taken as "realistic simulation data" (Section 5). Damage calculations using this data are conducted, and the uncertainty of this calculation is analyzed. For this purpose, bootstrapping in combination with different numbers of samples and various sampling concepts is used for the damage extrapolation. Therefore, it is possible to assess the general validity of reduction concepts proposed in the literature.

Damage Calculation
The fatigue damage calculation procedure for offshore wind turbine substructures is a two-stage process. First, the short-term damage of a single short (normally 10 min) measurement or simulation is determined. Second, the long-term or lifetime damage is calculated by extrapolating short-term damages of several samples (i.e., measurements or simulations) to the entire lifetime. Here, the choice of the samples, and therefore the sampling concept, is essential. Improved sampling concepts are the focus of this work and are discussed in the following sections. In this section, the general procedure of calculating and extrapolating fatigue damages-being applied in this work and being mostly independent of the sampling concept-is presented. Some shortcomings and alternatives are briefly discussed. While the short-term fatigue damage calculation procedure is relatively standardized, there is no consensus on how to extrapolate short-term values to long-term lifetime damages.

Short-Term Damage
An overview of the standard short-term damage calculation procedure is given in Figure 1.

Figure 1. Flowchart presenting the short-term damage calculation procedure based on strain measurements or stress simulations (FA: fore-aft, SS: side-to-side). Explicitly stated methods such as Hooke's law are common examples but could also be replaced by more accurate methods.
If the damage calculation is based on strain measurements, as in this work (c.f. Section 5.1), the measured axial strain signals (ε_z) are used to calculate tensile stresses (σ_z) according to Hooke's law:

σ_z = E ε_z. (1)

Here, E is the Young's modulus of steel, and for measurement positions above mean sea level, it is assumed that longitudinal and radial stresses are negligible, as hydrostatic loads do not act. In case of simulation data, stresses are normally directly available, so that this first step is not necessary. With three sensors around the circumference of the monopile (substructure), the tensile stresses of all sensors can be used to calculate the bending moments in two perpendicular directions (M_north and M_west) and the normal force (F_N) using the following equation:

σ_z(θ_{i_s}) = (M_north/S) cos(θ_{i_s}) + (M_west/S) sin(θ_{i_s}) + F_N/A, i_s = 1, 2, 3, (2)

with θ_{i_s} being the angle of the i_s th sensor to the northern direction and A and S being the cross-section area and section modulus, respectively. Solving this system of three equations yields the two moments and the normal force. For a cylindrical monopile, the section modulus is defined as follows:

S = π (r_out^4 − r_in^4) / (4 r_out), (3)

where r_out and r_in are the outer and inner radius of the monopile, respectively. Subsequently, the stresses parallel (fore-aft (FA)) and perpendicular (side-to-side (SS)) to the wind direction can be computed by rotating the moments M_north and M_west by an angle φ to the wind direction.
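The sensor-to-moment step above can be sketched numerically. This is not the authors' code: the sensor angles, radii, and strain values below are illustrative assumptions, and the beam-theory relation between stress, moments, and normal force follows the equations described in the text.

```python
import numpy as np

# Sketch: recover M_north, M_west, and F_N from three axial stress readings
# around the circumference, assuming sigma_i = (M_north/S)*cos(theta_i)
# + (M_west/S)*sin(theta_i) + F_N/A. All numeric values are hypothetical.
E = 210e9                  # Young's modulus of steel [Pa]
r_out, r_in = 2.6, 2.55    # assumed monopile radii [m] (5.2 m diameter)

A = np.pi * (r_out**2 - r_in**2)                  # cross-section area [m^2]
S = np.pi * (r_out**4 - r_in**4) / (4 * r_out)    # section modulus [m^3]

theta = np.deg2rad([0.0, 120.0, 240.0])           # assumed sensor angles to north
strains = np.array([50e-6, -10e-6, -25e-6])       # example axial strains [-]
sigma = E * strains                                # Hooke's law

# Linear system sigma = B @ [M_north, M_west, F_N], solved for the loads
B = np.column_stack([np.cos(theta) / S, np.sin(theta) / S, np.ones(3) / A])
M_north, M_west, F_N = np.linalg.solve(B, sigma)

# Rotate the moments by the wind direction phi to get FA / SS components
phi = np.deg2rad(30.0)                             # example wind direction
M_FA = M_north * np.cos(phi) + M_west * np.sin(phi)
M_SS = -M_north * np.sin(phi) + M_west * np.cos(phi)
```

With three sensors, the 3x3 system is exactly determined; more sensors would allow a least-squares fit instead of `np.linalg.solve`.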
Knowing the FA and SS stresses, a rainflow cycle counting of stress ranges (∆σ_z,i) is performed for both directions. Here, ∆σ_z,i is the stress range of the ith band (also called block or bin) in the factored stress spectrum (cf. Annex A of Eurocode 3 [22]). The number of required stress bands (n_σ) was determined in a preliminary convergence study. Fulfilling the requirements of current standards [23,24], n_σ = 500 bands, logarithmically spaced between 10 kPa and 1 GPa, are used. A conservative approach is applied that uses the higher cycle count of the FA and SS directions; this is equivalent to assuming a single dominant wind direction. More sophisticated approaches, taking the wind direction into account, can reduce conservatism, but should not influence the performance of the sampling concepts, and therefore, are not investigated in this work.
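The band discretization described above can be sketched as follows. The cycle ranges would come from a rainflow count (e.g., a dedicated rainflow library); here a handful of made-up stress ranges stand in for real counted cycles.

```python
import numpy as np

# Sketch of the stress-range binning: n_sigma = 500 bands, logarithmically
# spaced between 10 kPa and 1 GPa. The stress ranges below are hypothetical
# placeholders for rainflow-counted cycles.
n_sigma = 500
edges = np.logspace(np.log10(10e3), np.log10(1e9), n_sigma + 1)  # band edges [Pa]

stress_ranges = np.array([2e5, 3.5e6, 8e6, 8e6, 4e7])  # example rainflow ranges [Pa]
counts, _ = np.histogram(stress_ranges, bins=edges)

# Geometric mean of the edges represents each band's stress range
mids = np.sqrt(edges[:-1] * edges[1:])
```

The `counts` vector is then the n_ij input to the Miner summation of the following paragraphs.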
Subsequently, an extrapolation of the loads to other positions (i.e., heights) can be performed for all stress bands. Extrapolations are needed if measurements at critical positions are not possible or have not been performed. In this case, for example, the so-called "virtual sensing concept" of Maes et al. [25], which is based on a modal approach, can be applied. In this work, we do not apply any load extrapolation, but evaluate the fatigue damage at the precise measurement position. This position might not be the design-driving location (e.g., the most critical weld). However, for the present analysis, this is not essential, since we do not analyze the design itself.
For nominal stresses at the position of interest (here: the measurement position), an overall safety factor (SF) consisting of several sub-factors is applied to get a representative value for the concentrated stresses at the specific detail. First, a stress concentration factor for the present detail is used (here: SCF = 1.0 according to a recommended practice of Det Norske Veritas [23]). Second, a correction for high wall thicknesses, the so-called size effect (SE) correction, is applied [23]. Third, a material safety factor (here: MSF = 1.25 due to limited accessibility, and therefore, no inspections [26]) is used. In this work, these three factors are taken from the original industry design of the monopiles. Additionally, when measuring with welded (instead of glued) fiber Bragg gratings (FBG), a correction of the reduced sensitivity of welded FBG (FSF = 0.9) [27] is needed. Last, an additional safety factor (ASF), covering unexpected behavior, etc., and being easily adjustable, is introduced. All these factors can be regarded as uncertain themselves. However, since, as stated in the introduction, only the uncertainty due to limited sampling is the topic of this investigation, this is not relevant here. The corrected stress ranges, accounting for the stress concentration at the present detail, can be calculated using:

∆σ_cor,ij = SF ∆σ_z,ij. (4)

The last step to calculate the damage of each sample (10 min measurement) is the application of a linear damage accumulation according to the Palmgren-Miner rule and of S-N curves according to the standards [23] and the state of the art [6,9,12]. Here, DNV S-N curve D in air is applied. The fatigue damage for each measurement time series (D_j) according to the Palmgren-Miner rule can be calculated as follows:

D_j = Σ_i n_ij / N_i, (5)

where i and j are indices for the stress band and the time series, respectively, and n_ij is the number of cycles associated with the stress range ∆σ_cor,ij.
The endurance (N i ; number of maximum cycles) for the same stress range is obtained from the corresponding S-N curve.
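A minimal sketch of this short-term damage step (safety factoring plus Miner summation with a bilinear S-N curve) is given below. The curve-D parameters (log a = 12.164 / 15.606, slopes m = 3 / 5, knee at 10^7 cycles) follow DNV-RP-C203 but should be checked against the current edition; the SE value, the direction of the FSF correction, and all stress ranges and cycle counts are illustrative assumptions, not the values of this study.

```python
import numpy as np

# Sketch: combined safety factor, then Palmgren-Miner accumulation with a
# bilinear S-N curve (DNV curve D in air; parameters assumed, verify against
# DNV-RP-C203). SE and the FSF direction are hypothetical choices.
SCF, SE, MSF, FSF, ASF = 1.0, 1.05, 1.25, 1 / 0.9, 1.0
SF = SCF * SE * MSF * FSF * ASF

def endurance(d_sigma_mpa):
    """Allowed cycles N_i for stress ranges in MPa (bilinear S-N curve)."""
    N = 10 ** (12.164 - 3.0 * np.log10(d_sigma_mpa))       # slope m = 3 branch
    return np.where(N > 1e7,
                    10 ** (15.606 - 5.0 * np.log10(d_sigma_mpa)),  # m = 5 branch
                    N)

d_sigma = np.array([20.0, 50.0, 120.0])   # nominal stress ranges [MPa], made up
n_cycles = np.array([4000, 300, 5])       # counted cycles per band, made up

d_sigma_cor = SF * d_sigma                         # corrected stress ranges
D_j = np.sum(n_cycles / endurance(d_sigma_cor))    # Miner sum for this sample
```

The two S-N branches intersect at roughly 52.6 MPa (10^7 cycles), which is a quick consistency check on the assumed parameters.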

Long-Term Damage
Having calculated the short-term damages, one of the most uncertain and unreliable aspects of the lifetime damage calculation, which has not been sufficiently investigated so far, follows next: the extrapolation to the long term. Here, only one method of extrapolating damages D_j to a lifetime damage D_LT, in combination with the different sampling concepts presented in the next section, is applied. Alternative extrapolations are mentioned, but for details, the reader is referred to, for example, Hübler et al. [28], who investigate the effect of alternative approaches on extrapolated lifetimes for service life extensions. To extrapolate fatigue damages, all damages (D_j) are sorted into several (M) bins of EC. Depending on the sampling concept, the dimension of the binning (d_g) can differ. For example, for pure MCS, no bins are applied (d_g = 0), while for a deterministic grid approach, one (wind speed; d_g = 1) or more (wind direction, significant wave height, and wave period; d_g = 4) dimensions of the binning are possible. For each of these bins, the mean value of all J(m) corresponding damage values is calculated. Alternatively, the 90th percentile can be used for a more conservative estimate (c.f. Hübler et al. [28]). Each bin has a certain occurrence probability (Pr(m)) that is either given in design documents (as it is for this work) or must be determined by using environmental measurement data (e.g., SCADA wind data of several years). The mean damage of each bin is now weighted with the corresponding occurrence probability. To get the lifetime damage (D_LT), the weighted mean damages of all M^{d_g} bins must be summed up and multiplied by a time factor (N_LT; the number of 10 min intervals in the design lifetime). From this, it follows:

D_LT = N_LT Σ_{m=1}^{M^{d_g}} Pr(m) (1/J(m)) Σ_{j=1}^{J(m)} D_j(m). (6)

Finally, the overall lifetime L in years is the inverse of the lifetime damage multiplied by the design lifetime of 20 years:

L = 20 years / D_LT. (7)
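The extrapolation described above can be sketched in a few lines. The bin probabilities and 10 min damages below are made-up numbers for illustration, and a one-dimensional binning (d_g = 1) is assumed.

```python
import numpy as np

# Sketch of the long-term extrapolation: weight the mean short-term (10 min)
# damage of each bin by its occurrence probability, scale to 20 years, and
# invert to a lifetime. All numbers are hypothetical.
Pr = np.array([0.30, 0.45, 0.20, 0.05])        # bin occurrence probabilities
D_bins = [np.array([1e-7, 2e-7]),              # 10 min damages per bin
          np.array([5e-7, 4e-7, 6e-7]),
          np.array([2e-6, 1e-6]),
          np.array([8e-6])]

mean_D = np.array([d.mean() for d in D_bins])  # mean damage per bin
N_10min = 6 * 24 * 365.25 * 20                 # 10 min intervals in 20 years
D_LT = N_10min * np.sum(Pr * mean_D)           # weighted, time-scaled damage
L = 20.0 / D_LT                                # lifetime in years
```

Replacing `d.mean()` by `np.percentile(d, 90)` would give the more conservative variant mentioned in the text.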

Damage Uncertainty
It has been shown how fatigue damages can be calculated (c.f. Section 2.1) and extrapolated (c.f. Section 2.2) using a specific number of samples (i.e., damage values D_j). As offshore conditions are scattering, some uncertainty of the extrapolated design lifetimes is introduced that depends on the number of used samples. In the literature, the number of simulations required to achieve acceptable uncertainty levels was investigated in various computational studies [6,18,19]. However, first, this number depends significantly on the sampling concept [10,18]. Second, all these studies are limited to simulation data. Therefore, an isolated investigation of the performance of the sampling concepts regarding fatigue damage uncertainties, without including purely numerical effects or model errors, has not been done before. That is why in this work, various sampling concepts (see Section 3) are assessed and validated using measurement data. For this purpose, the convergence of the sampling concepts for increasing numbers of samples is analyzed. The methodology to determine the resulting uncertainty of the different sampling concepts using measurement data is the following:
1. Measure strain values at real offshore wind turbine substructures.
2. Calculate hot spot stresses for a relevant position using Equations (1) to (4).
3. Calculate damages of all 10 min measurements using Equation (5).
4. Sample short-term damages using the chosen sampling concept (Section 3).
5. Extrapolate short-term damages to a lifetime value using Equations (6) and (7).
6. Repeat steps 4 and 5 N_BT = 10,000 times using bootstrapping (i.e., sampling from all available data (here: about 120,000 usable samples) with replacement).
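Steps 4 to 6 of this methodology can be sketched as follows. The damage pool is synthetic, and the extrapolation is collapsed to a plain mean; both stand in for the real data and for the binned extrapolation of the previous section.

```python
import numpy as np

# Sketch of the bootstrap loop: resample the pool of short-term damages with
# replacement and re-estimate the lifetime each time. Pool and extrapolation
# are simplified stand-ins for the real procedure.
rng = np.random.default_rng(42)
pool = rng.lognormal(mean=-14.0, sigma=1.0, size=120_000)  # synthetic 10 min damages

def lifetime(samples):
    D_LT = 6 * 24 * 365.25 * 20 * samples.mean()  # simplified extrapolation
    return 20.0 / D_LT

N_BT, J = 10_000, 100          # bootstrap repetitions, samples per draw
lifetimes = np.array([lifetime(rng.choice(pool, size=J, replace=True))
                      for _ in range(N_BT)])
L_p1 = np.percentile(lifetimes, 1)   # 1st-percentile lifetime estimate
```

The spread of `lifetimes` directly visualizes the finite-sampling uncertainty; increasing `J` narrows it.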

Deterministic Grid (DG)
The standard sampling approach proposed by the standards is a uniform, deterministic, rectangular grid of EC (variables). For each of the d variables, the design space is separated into M bins (e.g., the wind speed range is split up into bins of 2 m s −1 or less). For all M^d combinations of variables (i.e., each bin), at least one sample is needed. In each bin, all EC are kept constant at their mean value, making the approach quasi-deterministic. This means that, for example, in the wind speed bin 6.5-8.5 m s −1, the wind speed is always 7.5 m s −1. DG becomes very inefficient for high input dimensions (d), as M^d samples are required.
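The exponential growth of the grid can be illustrated directly. The EC ranges and the bin count below are illustrative, not the values of any specific standard.

```python
import itertools
import numpy as np

# Sketch of deterministic-grid sampling: every EC is fixed at its bin mean,
# so the sample count grows as M**d. Ranges and M are hypothetical.
ranges = {"wind_speed": (3.0, 25.0), "wave_height": (0.5, 6.5),
          "wave_period": (4.0, 12.0), "wind_dir": (0.0, 360.0)}
M = 11  # bins per variable

# Bin mid-points per variable (M values each)
mids = {k: np.linspace(lo, hi, 2 * M + 1)[1::2] for k, (lo, hi) in ranges.items()}
grid = list(itertools.product(*mids.values()))   # one sample per bin combination
```

With four variables and eleven bins each, the grid already contains 11^4 = 14,641 required simulations, which is the inefficiency discussed above.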

Monte Carlo sampling (MCS)
MCS is a standard approach for probabilistic simulations that generates all samples (J_Σ) by applying the (dependent/joint) statistical distributions of all variables. For linear systems, this has the advantage of a constant convergence rate of J_Σ^{-1/2}, independent of the input dimension. However, for highly non-linear systems, MCS becomes inefficient, since rarely occurring events determine the convergence behavior [10].

Equally Distributed Monte Carlo Sampling (EMCS)
EMCS is a probabilistic version of DG. Just as for DG, a grid of EC is set up. However, the grid dimension (d_g) is chosen to be smaller than the input dimension d; normally, it is set to one. In contrast to DG, the EC are not kept constant in each bin, but MCS is applied. Hence, in the wind speed bin 6.5-8.5 m s −1, wind speed values between 6.5 m s −1 and 8.5 m s −1 are possible, and other EC are sampled from their (dependent) distributions. For d_g = 0, EMCS becomes MCS, and for d_g = d, it is DG. For more details regarding EMCS, the reader is referred to Hübler et al. [18].
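A minimal sketch of EMCS with d_g = 1 follows. The uniform in-bin wind speed and the conditional wave-height model are made-up placeholders; in practice, the conditional distributions would be fitted to site data.

```python
import numpy as np

# Sketch of EMCS (d_g = 1): the wind speed axis keeps its deterministic bins,
# but within each bin the wind speed and all other EC scatter randomly.
# The Hs-given-v model below is purely hypothetical.
rng = np.random.default_rng(1)
edges = np.arange(3.0, 26.0, 2.0)     # wind speed bin edges [m/s]: 3, 5, ..., 25

def emcs_sample(bin_idx, n):
    lo, hi = edges[bin_idx], edges[bin_idx + 1]
    v = rng.uniform(lo, hi, n)                     # wind speed varies within bin
    hs = rng.weibull(2.0, n) * (0.1 * v + 0.3)     # hypothetical Hs | v model
    return v, hs

v, hs = emcs_sample(bin_idx=4, n=6)   # six samples from the 11-13 m/s bin
```

Setting the bin width to the full range recovers plain MCS; shrinking bins to points recovers DG, mirroring the limits stated in the text.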

Damage Distribution-Based Monte Carlo Sampling (DMCS)
DMCS by Hübler et al. [18] is based on EMCS, but its idea is to focus samples on load cases leading to high damages by applying the damage distribution (i.e., weighted lifetime damage versus wind speed) to the sampling. This is comparable to importance sampling. For example, if 15% of the damage is produced by a bin, 15% of the samples should be drawn from this bin. In theory, DMCS improves the accuracy significantly, as more data is available where it is influential. However, since the damage distribution is normally not known in advance, in a first step, an approximated damage distribution (prior function) must be determined by sampling, for example, N_approx = 20 cases in M = 14 bins (e.g., 20 × 14 = 280 EMCS cases). As an approximation based on only 20 samples per bin is not precise enough, Bayesian statistics is applied to update the initially approximated damage distribution after each new sample. It becomes apparent that DMCS is more suitable for larger sample sizes, as typical for industry applications, since the estimation of the damage distribution improves with every additional sample. The DMCS procedure is the following:
1. Sample N_approx × M cases (e.g., 280 EMCS cases).
2. Calculate the prior function (i.e., the initial damage distribution), being the weighted mean damage of the N_approx cases in each bin.
3. Generate the next sample (j + 1) according to the damage distribution (i.e., prior function). This means that it is sampled from the bin (m_{j+1}) where the quotient of the current number of samples (J(m)) and the number of samples required by the prior (J_req(m)) is minimal: m_{j+1} = arg min_m J(m)/J_req(m).
4. Calculate the damage of the sample (D_{j+1}) and update the damage distribution.
5. Continue with steps 3 and 4 until the desired number of overall samples (e.g., J_Σ = 1000) is generated.
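The steps above can be sketched as follows. The synthetic damage model, uniform bin probabilities, and the plain re-estimation of the damage distribution (in place of the full Bayesian update) are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch of the DMCS loop. `damage(m)` is a synthetic stand-in; in practice
# each call would be one 10 min simulation. The distribution update is a
# simplified re-estimate, not the full Bayesian scheme of the paper.
rng = np.random.default_rng(7)
M, N_approx, J_total = 14, 20, 1000
Pr = np.full(M, 1.0 / M)                        # assumed bin probabilities

def damage(m):                                   # hypothetical damage model
    return rng.lognormal(mean=-14 + 0.3 * m, sigma=0.8)

# Steps 1-2: initial EMCS sampling and prior damage distribution
D = [[damage(m) for _ in range(N_approx)] for m in range(M)]
weighted = np.array([Pr[m] * np.mean(D[m]) for m in range(M)])
prior = weighted / weighted.sum()                # share of damage per bin

# Steps 3-5: draw each new sample from the most "under-sampled" bin
for _ in range(J_total - N_approx * M):
    counts = np.array([len(D[m]) for m in range(M)])
    m_next = int(np.argmin(counts / (prior * J_total)))  # arg-min quotient
    D[m_next].append(damage(m_next))                      # new sample + damage
    weighted = np.array([Pr[m] * np.mean(D[m]) for m in range(M)])
    prior = weighted / weighted.sum()                     # simplified update
```

High-damage bins end up with proportionally more samples, which is the importance-sampling effect described above.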

Reduced Bin Monte Carlo Sampling (RBMCS)
To be independent of an approximated damage distribution, another concept is RBMCS of Hübler et al. [18]. It is also based on EMCS and reduces the number of bins by merging bins with similar physical and generalized damage behavior. For example, for small monopiles with a wind-dominated behavior and highest loads around rated wind speed, low wind speed bins can be merged, and for high wind speed bins, the bin sizes are slightly increased: <4.5 m s −1 , 4.5-6.5 m s −1 , 6.5-8. RBMCS reduces the number of bins and leads to more cases in each bin. As in each bin random sampling (MCS) is conducted, RBMCS converges to the "correct" value for a sufficient number of samples. Since there are more cases in bins with similar behavior, the uncertainty in each bin can be reduced. Nonetheless, it can be a challenge to determine the optimal combination of merged bins.

Test Example
To gain a better insight into the performance of the various sampling techniques, a test function is analyzed in a first step, before measurement data is used for the validation (c.f. Section 5).

Theory
The test function is based on the test function of Graf et al. [10]. It mimics real fatigue behavior by representing a damage distribution and is capable of modelling different input space dimensions as well as degrees of non-linearity. The test function is defined as:

f(x) = ( Σ_{i_d=1}^{d} (1/2)^{i_d − 1} x_{i_d} )^{m_mat},

where x is the d-dimensional input vector and m_mat mimics the real fatigue behavior by introducing a "material" (Wöhler) exponent. All inputs are sampled from independent, truncated Weibull distributions (scale parameter a = 3, shape parameter b = 1.12, x_max = 24) and are weighted with the factor (1/2)^{i_d − 1}. The decreasing importance of the inputs with increasing dimension also mimics real fatigue behavior, as the influence of the first random inputs (e.g., wind speed) is more pronounced than that of others (e.g., wind direction). For all five sampling concepts (c.f. Section 3), this function is evaluated for dimensions up to six and exponents up to 20. Increasing dimensions represent a growing number of influential EC, and higher exponents lead to a more pronounced non-linear model behavior. The grid of DG is equidistant between zero and x_max. For EMCS, d_g = 1 and M = 15 are used. For DMCS, d_g = 1, N_approx = 5, and M = 15 are applied. The merging of the bins for RBMCS (d_g = 1) depends on m_mat, since the overall behavior changes with an increasing exponent. The M = 15 bins are merged to eight bins. For small m_mat, higher bins are merged, and vice versa.

Results
The performance of the five sampling concepts for different dimensions and degrees of non-linearity is displayed in Figure 2. Graf et al. [10] have already shown that for linear systems and low dimensions (Figure 2a), DG performs tremendously well, since it converges with J_Σ^{-1}. MCS requires the highest number of samples to converge. However, for higher dimensions, DG becomes very inefficient (Figure 2b), as its convergence rate reduces to J_Σ^{-1/d}, while MCS still converges with J_Σ^{-1/2}. The challenge of fatigue damage extrapolations is that the dimension of the input space is high and, in addition, that single events (samples) can determine the whole fatigue behavior. Hence, the model is high-dimensional and non-linear (Figure 2c,d). The problem of DG for high dimensions has already been discussed and is also visible for non-linear models, although it becomes less relevant. Increasing d is uncritical for MCS, but high m_mat reduces the convergence rate, so that MCS is not efficiently applicable to highly non-linear models (Figure 2d). Alternatives to DG and MCS are the probabilistic bin-based approaches. EMCS and RBMCS combine advantages of DG and MCS. Therefore, they perform similarly to MCS for high dimensions, where DG is not applicable. If DG performs better than MCS (e.g., for low dimensions) or the damage distribution (here: f(x)) does not resemble the sampling distributions (here: Weibull), as is the case for high m_mat, EMCS and RBMCS outperform MCS. DMCS has the advantage of concentrating samples in "important" bins. Its convergence is comparable to the other bin-based approaches for linear models, as in this case, its importance sampling does not differ significantly from the uniform sampling in EMCS. For non-linear models, DMCS outperforms all other approaches.
To summarize, as the performance of DG and MCS depends on the dimension of the input space and the degree of non-linearity of the model, neither can generally be applied efficiently. EMCS leads to similar results as MCS, while, for a direct comparison, the similarity of the sampling and the damage distributions is relevant. In most cases, RBMCS is quite similar to EMCS. However, when equipped with well-founded expert knowledge to merge bins appropriately, RBMCS can be quite beneficial, as demonstrated in Hübler et al. [18]. Finally, DMCS always converges relatively fast and can be regarded as the most appropriate sampling concept for this test function.

Validation
In the previous section, sampling concepts were tested using a test function, which yields a first insight into their performance. However, for a profound validation, measurement data is needed.

Measurement Set-Up
In this work, measurement data of a large measurement campaign in the Belgian Northwind offshore wind farm is used. Data of this measurement campaign was used in several previous investigations. For detailed information regarding raw data and data quality, it is referred, for example, to Weijtjens et al. [29].
Northwind is located about 37 km off the Belgian coast (see Figure 3a) and has moderate water depths of 16 m to 29 m. The wind farm consists of 72 Vestas V112-3 MW turbines. For all turbines, monopile foundations with diameters of 5.2 m are used. Since October 2014, strain measurements of two instrumented turbines have been available. In this work, we use three years of data from 1st November 2014 to 31st October 2017. The two turbines, instrumented by OWI-lab, are marked in Figure 3b. The positions of the turbines on both sides of the wind farm enable an analysis of slightly different wind conditions, as free inflow conditions are given for different wind directions. Moreover, both turbines are located at different water depths. This leads to slightly different designs of the two monopiles, and therefore, to varying eigenfrequencies (see Table 1). Both turbines are instrumented, inter alia, with seven FBG as strain gauges spread over two different levels (see Figure 3c,d). The strain gauges are positioned at the interface between tower and transition piece (TP) and the interface between TP and monopile. Here, the lower measurement layer between TP and monopile is used. For this layer, the FBG are welded to the wall, making a correction due to reduced sensitivities necessary [27]. The chosen configuration of the strain gauges (spread around the circumference) and a temperature compensation enable a determination of bending moments at these interface levels. In addition to the strain data, metocean data from various sources is available. In the first instance, wind data (e.g., wind speeds or turbulence intensities) can be directly derived using SCADA data of the turbines. Wave conditions are measured at several locations around the wind farm. High-frequency wave data (sampling frequency of 1 Hz) is available from a wave radar at Belwind. The position of the wave radar is marked in Figure 3a.
However, as the wave radar was not measuring during the whole measurement period (it was removed in June 2016), additional information of the offshore high voltage substation (OHVS) on "Bligh Bank" (also marked in Figure 3a) has to be used, if no data of the wave radar is available. SCADA data is also used to exclude time periods with a curtailed turbine and down-times.

Resulting Uncertainty
Short-term damages of all measured strain signals are calculated using the procedure in Section 2.1. If the 10 min damages of wind speed bins are analyzed separately, the scattering of these values qualitatively shows the amount of uncertainty in short-term damages. Figure 4a displays, for wind speeds of 16.5 m s −1 < v_s < 18.5 m s −1 and 3 years of data (about 5000 samples for this wind speed bin), the occurrence frequency of fatigue damages (D_j). All damages are normalized with the mean value of all damages at these wind speeds. Damages scatter significantly and reach values of more than 2.5 times the mean value with a probability of 5%. High outliers, reaching values of more than 100 times the mean value, are not shown, but can significantly influence the overall fatigue damage behavior and demonstrate a non-linear model behavior comparable to the analyzed test function.
The uncertainty in lifetime damages can be assessed by calculating the probability density function (PDF) of the fatigue lifetime. It can be determined by using the long-term extrapolation in Section 2.2 and the bootstrapping in Section 2.3. Exemplary PDFs for different numbers of samples and EMCS are presented in Figure 4b. For reasons of confidentiality, lifetimes are normalized with the mean lifetime obtained using the whole 3-year data set (µ_3years). It is demonstrated that using only a few samples (commonly done in academia [6,12]) leads to high uncertainties. At the expense of higher computing times, the uncertainty can be reduced by increasing the number of samples, as is done in industry. This trade-off is the reason improved sampling concepts are investigated in this work.
In most cases, the full lifetime PDF (see Figure 4b) is not the focus, as the main interest is to guarantee safe designs. Hence, the lowest lifetime approximations (e.g., the 1st percentile) are more relevant. Therefore, the 1% error (∆L_1) is defined as the deviation of the lifetime at the 1st percentile (marked in Figure 4b) from the "real" lifetime (estimated using the whole three years of strain data). For EMCS and six samples per bin, the 1% error is about 40% (∆L_{1,J(m)=6} = (1 − 0.61)/1 = 0.39; see Figure 4b). Hence, a higher number of samples and/or another sampling concept is recommended. The number of samples per bin is not the best comparative value, as it does not incorporate the number of bins. Therefore, in the remainder of this work, we always refer to the overall number of samples (J_Σ).
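Given a bootstrap sample of normalized lifetimes, the 1% error can be evaluated directly. The sketch below assumes lifetimes already normalized by µ_3years, so the "real" lifetime is 1; the function name is hypothetical.

```python
import numpy as np


def one_percent_error(norm_lifetimes):
    """1% error: relative deviation of the 1st-percentile lifetime
    from the "real" normalized lifetime of 1."""
    l1 = np.percentile(norm_lifetimes, 1)
    return (1.0 - l1) / 1.0
```

For a bootstrap distribution whose 1st percentile is 0.61, this returns the 0.39 (about 40%) quoted in the text.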
To gain a deeper understanding of the uncertainty due to finite sampling for the present measurement data, the reasons for it are determined in the next section.

Reasons for High Uncertainty
In general, relatively rare but highly damaging events are the reason for high uncertainties due to finite sampling (i.e., slow convergence with an increasing number of samples). These events are only covered if a high number of samples is used. On the one hand, highly damaging single events with low occurrence probabilities (for example, a 100-year storm) do not contribute much to the overall damage [21]. On the other hand, a relatively small number of load situations leads to a high proportion of the overall damage. Hence, it is important to determine the reasons for highly damaging but not too unlikely situations for the Northwind monopiles. Figure 5 gives insight into the scattering and the EC that are responsible for the damage. High wind speeds lead to high mean damages. However, these values do not scatter significantly and, furthermore, are not responsible for the highest damage values. The highest "outliers" occur for high turbulence intensities and wind speeds around rated wind speed (cf. Figure 5b). For these wind speeds, the maximum thrust loads occur, as the blades have not been pitched out yet. Furthermore, for high turbulence intensities, fluctuations in the wind speed are more pronounced and cannot always be compensated by the relatively slow pitch controller actions. However, normal operation is not responsible for the maximum damages. Rotor stops, shown in Figure 6, increase damages dramatically, as at least one very large cycle is introduced by the rotor stop event. These rare stopping events at rated wind speed drive the uncertainty for the present measurement data. Other conditions increasing the uncertainty are, for example, highly scattering (significant) wave heights or turbulence intensities (cf. Figure 5a). Hence, the current results make clear that there can be various reasons for sampling-induced uncertainty. In Hübler et al. [18], the main reason is wave resonance. In contrast, for the present measurement data, controller actions are decisive.
If the reason for the high uncertainty is known, these rare cases can be excluded from the standard load case (DLC 1.2 in IEC 61400-3 [5]) and treated separately. For normal shut-downs, etc., this is usually done. They are not included in DLC 1.2, but have their own (deterministic) load cases (DLC 4.1, etc.). However, it is not possible to exclude all rare, highly damaging cases. Therefore, sampling concepts such as RBMCS and DMCS that are designed to reduce the uncertainty compared to standard approaches are needed to keep the sampling effort small. In the next section, the previously developed probabilistic bin-based approaches by Hübler et al. [18] (RBMCS and DMCS) are validated and compared to DG, MCS, and EMCS.

Convergence of Improved Sampling Concepts
For an assessment of the different concepts, the lifetime (L) is calculated according to Section 2. For this calculation, different numbers of overall samples (J_Σ) are generated using the various sampling concepts and the available 3-year data (about 120,000 usable samples). The bootstrap procedure is repeated 10,000 times to estimate the statistical variation in L. Last, the lifetime distribution is normalized by the "real" lifetime (using 3 years of data), i.e., by µ_3years. To evaluate the performance of RBMCS and DMCS, the same procedure is conducted for samples that are generated using DG, MCS, and EMCS. For EMCS, RBMCS, and DMCS, d_g = 1 and M = 14 are used. For RBMCS, the applied merging of the bins to M = 9 is explained in Section 3.5. For DMCS, N_approx = 20 is applied. DG uses a grid dimension of d_g = 5 (i.e., wind speed and direction, (significant) wave height and period, and turbulence intensity). However, for high-dimensional grids using measurement data, data is not available for all grid points. Very unlikely EC combinations, which did not occur during the three years of measurements and make up more than 50% of all grid points for M ≥ 3, are not taken into account. Hence, the applied (sparse) DG approach for measurement data is only partly comparable to the (full) standard DG for simulation data.
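The core difference between the bin-based concepts is how the overall sample budget J_Σ is distributed over the M bins. The sketch below contrasts an equal allocation per bin (EMCS-like) with a damage-weighted allocation (the DMCS-like idea of concentrating samples on highly damaging bins). It is a simplified illustration: the actual DMCS additionally draws N_approx prior samples per bin to build the damage estimate first, which is not reproduced here.

```python
import numpy as np


def allocate_samples(j_total, m_bins, damage_prior=None):
    """Distribute an overall sample budget of j_total over m_bins.

    Without a prior: equal allocation per bin (EMCS-like).
    With a damage prior: allocation proportional to the estimated
    damage contribution of each bin (DMCS-like idea)."""
    if damage_prior is None:
        alloc = np.full(m_bins, j_total // m_bins)
    else:
        w = np.asarray(damage_prior, dtype=float)
        alloc = np.floor(j_total * w / w.sum()).astype(int)
    # Distribute the rounding remainder over the first bins.
    alloc[: j_total - alloc.sum()] += 1
    return alloc
```

With a prior that concentrates the damage in a few bins, most samples end up there, which is why low lifetime outliers are covered much better for the same J_Σ.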
To illustrate the general performance of the two improved concepts, Figure 7 shows the lifetime PDFs that are generated using RBMCS and DMCS compared to EMCS. As in Hübler et al. [18], EMCS serves as the reference here. The uncertainty is significantly reduced, and more reliable lifetime approximations are achieved, while the number of samples remains constant. Especially for DMCS, it becomes apparent that the lowest lifetimes are much less uncertain. As samples are concentrated on bins with high damages, low lifetime outliers are effectively removed. For an objective assessment of the concepts, the convergence of three evaluation criteria (L, or rather its deviation from µ_3years, ∆µ; ∆L_1; and the coefficient of variation, CV) for an increasing number of overall cases is illustrated for all concepts and both turbines in Figures 8 and 9. The coefficient of variation is defined as the ratio of the standard deviation (σ) to the mean value (µ): CV = σ/µ. It is introduced because, for a reduced (biased) mean value, the 1st percentile is closer to the "real" mean by definition. Here, a biased mean value can have two reasons. First, if the sample size is relatively low, it might not have converged yet. Second, the binning procedure can influence the mean value if measurement data is used in combination with long-term design EC distributions for the bin probabilities (cf. Section 2.2). In this case, bin probabilities do not correspond completely to the occurrence probability of the samples in each bin. Therefore, errors at high percentiles can have less informative value. In addition to the illustration of the convergence in Figures 8 and 9, some quantitative results are given in Table 2.

Table 2. Errors and uncertainties in lifetime using different concepts and 1000 as well as 10,000 overall samples; measurement data of H05. Changes (of CV and ∆L_1) refer to the reference approach (EMCS).
First, it becomes apparent that increasing the number of cases is a possible but not very effective way to reduce uncertainties. Furthermore, for both turbines, the uncertainties and the possible reductions of the two improved concepts are quite similar. Although the sparse version of DG theoretically improves the performance of DG, as "impossible" EC combinations are not taken into account, DG requires a much higher number of samples to achieve adequate uncertainties. Furthermore, it converges to a different mean value due to the varied binning procedure. This biased mean value only occurs for measurement data, where the occurrence probability of each bin does not completely match the long-term EC distributions. As postulated before [10,11], MCS outperforms DG. The good performance of MCS is supported by the similarity of the sampling and damage distributions (see Figure 10a). For such similar distributions, it has already been shown with the help of the test function (cf. Figure 2c) that MCS can outperform EMCS and RBMCS. Hence, EMCS still converges faster than DG, but does not perform as well as MCS. RBMCS is an improvement of EMCS and performs better than MCS for the present measurement data. Improvements of ∆L_1 of about 10% and 30% compared to MCS and EMCS, respectively, are achieved. DMCS shows the best convergence behavior. It reduces ∆L_1 by around 20% and 40% compared to MCS and EMCS, respectively. This may not sound like much. However, bearing in mind that for plain MCS, a reduction of ∆L_1 of 25% means doubling the sampling effort, these are considerable reductions. Furthermore, the slower convergence of the mean value (the bias of the mean value for small sample sizes described by Hübler et al. [18]) is not that relevant. It is mainly apparent for sample sizes just above N_approx × M = 280, where only a few samples are generated according to the damage distribution. The main disadvantage of DMCS is that it requires some samples to generate an initial prior.
For sample sizes smaller than J_Σ = N_approx × M, DMCS is equal to EMCS.

Comparison of the Performance for Simulation and Measurement Data
Finally, the performance of MCS, EMCS, RBMCS, and DMCS is evaluated for previously published simulation results [18] and for the present measurement data (treated as "realistic simulation data"), as well as for the different designs (OC3 monopile and different Northwind designs). DG is not taken into account, as, first, no simulation data is available; second, DG is hardly applicable to measurement data; and third, DG features a poorer performance than all other concepts. This variety of applications can demonstrate the general performance of the concepts. Table 3 gives an overview of the reduction concepts based on the previous computational analysis [18]. Two different sample sizes (academia and industry) are shown.

Table 3. Errors and uncertainties in lifetime using different concepts and 1000 as well as 10,000 overall samples; simulation data [18]. Changes (of CV and ∆L_1) refer to the reference approach (EMCS).

In contrast to the measurement data, for simulation data, MCS does not perform better than EMCS. This clarifies the relevance of the sampling distributions in comparison with the damage distribution, and therefore, the missing general applicability of MCS. For the simulation data, the distributions are shown in Figure 10b, where D(m) is more similar to Pr_EMCS(m).

For RBMCS, the results are quite consistent again. Reductions of the 1% error of about 20% compared to EMCS and MCS are achieved without introducing any bias.
For DMCS, the reductions of ∆L_1 are above 50%, independent of the sample size. This approximately matches the measurement results for high numbers of samples (e.g., J_Σ > 500). The "poorer" performance for measurement data, if a small number of samples is chosen (e.g., J_Σ = 300), is a result of the previously mentioned disadvantage of DMCS (i.e., it needs some samples for the prior creation). For simulation data and DMCS, there is a pronounced bias of the mean value, especially for low numbers of samples. To understand this bias for simulation data, its reason is briefly explained. As DMCS concentrates its sampling on bins with high damages, it cannot represent damages correctly if there are high outliers in bins with low median damages. In this case, the prior estimates too low damages, and therefore, only a few samples are generated for these bins. This leads to an insufficient coverage of high outliers (see Hübler et al. [18] for more information). For the simulation data of the OC3 monopile, there are such outliers, caused by wave resonance, in bins of low median damages (e.g., low wind speeds). For the measurement data of the wind-dominated Northwind monopiles, this is not the case. Hence, the reason for the bias of DMCS is not the simulation data itself, but the wave resonance. In any case, the bias diminishes for a sufficient number of samples.
To conclude: the performance of RBMCS is robust, decreasing the 1% error by about 20-30% compared to EMCS, independent of other conditions. DMCS can achieve significantly higher error reductions of 40-50% compared to EMCS and 20-50% in comparison with MCS. Hence, it achieves the best results for the test function, the simulation data, and the measurement data. However, for very small sample numbers, its performance is limited. For wave resonance-dominated structures, biased results can occur if the number of cases is chosen too small to guarantee a convergence of the mean value. Both problems are not relevant for the large sample sizes that are common in industry.

Benefits and Limitations
As the fatigue design of substructures of offshore wind turbines is a time-consuming process that must be accurate at the same time, the present study focuses on the validation of previously developed sampling methods that reduce the uncertainty due to finite sampling. For this validation, no actual simulation data is used but, first, a test function, and second, real offshore strain measurements that are considered to be "realistic simulations". The test function helps to demonstrate the general performance for different types of problems. It is shown that the concepts perform well for a broad range of input dimensions as well as degrees of non-linearity. Measurement data has, on the one hand, the advantage that purely numerical effects (e.g., errors of the aero-elastic model) are excluded. Furthermore, a very extensive number of samples for two different structures is available, which would not have been feasible through simulations in an academic context. Hence, the current work can assess the proposed methods for different systems (turbines, substructures, sites, etc.) and sample sizes (convergence study), independent of any simulation code specifications. On the other hand, measurement errors occur and must be treated carefully. For example, spikes in strain signals due to measurement errors must be filtered out or removed manually.
The present outcomes underpin the recently and repeatedly formulated presumption [6,12,18] that standard recommendations concerning lifetime extrapolations (in academia) lead to unacceptable uncertainties due to finite sampling (cf. J(m) = 6 in Figure 4b), which are of the same order of magnitude as other important types of uncertainty that are not the topic of this investigation (e.g., the error of Miner's rule [31]).
Regarding sampling methods that reduce the uncertainty in the damage approximation without increasing the number of samples (i.e., computing time), two concepts are validated. Previously, they performed well for simulation data and one set-up. Here, they are tested for measurement data, different turbine designs, and a test function, and are compared to standard approaches such as MCS or deterministic grid sampling. It is shown that these two concepts are generally valid, independent of the structure (e.g., OC3 or Northwind monopile) and the reason for the uncertainty (e.g., wave resonance or rotor stops). They lead to 1% error reductions of about 30% and 40% compared to EMCS for RBMCS and DMCS, respectively. A reduction of the 1% error of 25% is tantamount to halving the required samples while keeping the 1% error constant. DMCS enables higher uncertainty reductions compared to RBMCS, but it can be slightly biased for smaller sample sizes. An important factor for industry is that the benefit of DMCS compared to other sampling concepts grows with increasing sampling effort.
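The quoted equivalences between error reduction and sampling effort can be sketched with a back-of-the-envelope argument, assuming the typical Monte Carlo convergence rate for the 1% error (an illustrative assumption, not a derivation from the paper):

```latex
% Assuming the 1% error decays with the square root of the overall sample size:
\Delta L_1 \propto J_\Sigma^{-1/2}
\quad\Rightarrow\quad
\frac{J_{\Sigma,2}}{J_{\Sigma,1}}
= \left(\frac{\Delta L_{1,1}}{\Delta L_{1,2}}\right)^{2}.
```

Under this scaling, a concept that is 25% more accurate reaches the original error with (0.75)² ≈ 0.56, i.e., roughly half, of the samples; conversely, plain MCS would need (1/0.75)² ≈ 1.78, i.e., roughly twice, the samples to match it.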

Conclusions
The main objective of this study is the validation of sampling concepts that reduce the uncertainty due to finite sampling. This validation is performed by assessing the performance of the concepts for a broad range of applications and data types. For this purpose, real offshore measurement data is used and treated as "realistic simulation data". This enables an assessment of the previously developed methods with simulation errors, etc., excluded, and with a large data set available for the validation.
It is shown that uncertainties are high, depend significantly on the design of the (monopile) substructure and turbine, and are considerably influenced by controller actions. Therefore, quantitative conclusions concerning the uncertainty are difficult. However, commonly applied approaches in academia should be reconsidered, as deterministic approaches cannot reproduce the real uncertainty due to rare, highly damaging situations. One possibility to reduce the sampling effort is to exclude rare, highly damaging events from probabilistic analyses and to add them as additional deterministic damages. For normal shut-down events, etc., this is commonly done. For wave resonance, this is unusual and more challenging.
To overcome the problem of uncertain damage extrapolations, in industry, the number of simulated load situations is much higher than in academia, at the expense of high computational costs. Alternatives to the plain increase of sampling are valuable. One such alternative is the use of advanced sampling concepts. Without adding computational effort, it is possible to increase the reliability of the damage extrapolation (error reductions of up to 50%, or a reduction of the sampling effort to approximately a fifth) by applying advanced sampling concepts. Here, the relatively general validity of such concepts was proven. It is recommended to use DMCS, especially for the larger sample sizes common in industry. RBMCS can be an alternative for smaller sample sizes, for example in academia. In the future, such concepts could replace the inefficient grid-based approach that is recommended by the standards.
Overall, based on the present findings and previous simulation-based results, it can be concluded that the lifetimes of designs relying on very limited simulation data can be quite uncertain. Therefore, the consideration of measurement-based fatigue lifetime calculations is valuable. Certainly, it is hardly possible to use measurements during the design phase, before the turbine has been built. However, for lifetime extension, current standards [32] already recommend using measurement data, if possible. Hübler et al. [28] show that measurement-based lifetime approximations are not completely certain either. Still, if available, strain measurements for lifetime approximations are definitely a valuable addition to design simulations and should be further investigated.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
The following nomenclature is used in this manuscript: