We begin by analysing wind vane stalling events at 10 m throughout the entire period, determining the frequency and associated meteorological conditions during and preceding the stalling event. Next, we determine which supplementary observations (features) are the most informative for the machine learning to identify stalling events. The selected features are then used by the supervised and semi-supervised methods to classify stalling events and their performance is evaluated.
4.1. Meteorological Conditions During Wind Vane Stalling
To better understand wind vane stalling at the Cabauw site and establish training and testing datasets, we begin by analysing the stalling at each height along the masts and across various temporal scales. Here, we identified wind vane stalling (detailed in
Section 3.2) using outlier detection. Between 2001 and 2022, a total of 1456 wind vane stalling events across all heights were reported (per height: 315, 206, 282, 213, 224, and 216 cases for heights from 10 to 200 m, respectively). The percentage of time that the wind vane at each height stalled throughout the entire 22 years is shown in
Figure 3a. In general, we observe that, at each height, the wind vane stalled approximately 3–4% of the time (cumulative ∼11–15 days annually). The relatively high frequency observed at 40 m is likely related to the base of the tower influencing the turbulence at that height. Nonetheless, as expected, stalling occurs most frequently at the 10 m level (4.54%) because the wind speed is lowest near the surface. Due to the higher frequency of events, this height is chosen for the remainder of our analysis because more stalling instances facilitate more effective machine learning from the data.
Figure 3b shows the relative occurrence of wind vane stalling at 10 m for each year. Distinct yearly variations are observable throughout the entire period. For instance, for more than half of the years, stalling occurred less than 5% of the time; meanwhile, stalling occurred up to 20% of the time during other years. Although stalling appears more frequent after 2014, no clear trend emerges from the yearly distribution. The years 2011, 2012, and 2022, which exhibit significantly more stalling (p-values < 0.05 and z-scores (distance from the mean in standard deviations) ≥ 2), were reported as ‘extremely sunny’ [29,30]. In fact, 2022 set the record as the sunniest year in the Netherlands since measurements began in 1901 [31]. The prevalence of sunny conditions suggests that these years experienced a higher frequency of fair weather, which is often associated with lower wind speeds. Fair weather typically results from high-pressure systems, in which cooler air subsides, inhibiting convection at the surface. As a result, stable atmospheric conditions develop, limiting or preventing horizontal air movement and leading to low wind speeds. Climate change may exacerbate such extreme conditions, causing more frequent low wind scenarios, which is highly relevant to various industries. However, we exclude these outlier years (2011, 2012, and 2022) from further analysis in order to focus on assessing more typical stalling behaviour.
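The z-score screening of anomalous years described above can be sketched in a few lines; the yearly percentages below are hypothetical placeholders for illustration, not the measured Cabauw values.

```python
import numpy as np

def flag_outlier_years(years, stalling_pct, z_thresh=2.0):
    """Flag years whose relative stalling occurrence lies at least
    z_thresh standard deviations above the long-term mean."""
    pct = np.asarray(stalling_pct, dtype=float)
    z = (pct - pct.mean()) / pct.std()
    return [year for year, score in zip(years, z) if score >= z_thresh]

# Hypothetical yearly stalling percentages, for illustration only:
years = list(range(2001, 2011))
pct = [3, 4, 3, 5, 4, 3, 4, 3, 4, 20]
print(flag_outlier_years(years, pct))  # the single anomalous year is flagged
```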
A more discernible pattern emerges when examining seasonal variations.
Figure 3c shows the average relative wind vane stalling occurrence at 10 m for each month. Two distinct periods with increased wind vane stalling emerge: spring to early summer (March–July) and, even more prominently, late summer to early autumn (August–October). We observe that wind vane stalling is more likely during these periods, peaking significantly during September (p-values < 0.05 and z-scores ≫ 2), where stalling occurs approximately 20% of the month on average (cumulative ∼6 days). The increased frequency of stalling during these months may be attributed to large-scale dynamics. For the Netherlands, both March–May and August–October are seasons with a higher tendency for fair weather, as high-pressure areas dominate across Southern-to-Central Europe [32], favouring lower wind speeds.
Diurnal variations in air flow are also expected to influence wind vane stalling. To this end,
Figure 3d shows the frequency of wind vane stalling at 10 m on average for each hour. We observe a parabolic-shaped diurnal pattern, such that wind vane stalling occurs most frequently (approximately 90% of the time) during nocturnal conditions (18–6 UTC). The example case from
Figure 2 supports the suggested relationship between low wind speeds, (nocturnal) atmospheric stability, and wind vane stalling at the 10 m level. To further investigate this relationship, we also consider additional in situ observations that can offer insight into the meteorological conditions surrounding the events.
The meteorological conditions preceding and during stalling events offer additional insight into the underlying causes. Figure 4 displays the average vertical profiles of the measured wind speed (U), wind direction, potential temperature (θ), lapse rate, dew point depression, and Fog Stability Index (FSI). The black line indicates the average vertical profile when the stalling event occurs at 10 m, and the red line represents the average profile six hours prior to the stalling event. The average wind speed profile (Figure 4a) during wind vane stalling at 10 m displays the characteristic logarithmic shape with height, where the lowest values are observed at the 10 m level, as anticipated. Notably, the average wind speed at each height leading up to the stalling event at 10 m is also low, and it decreases further during the event, diminishing by roughly 2 m s−1 within the lowest 20 m. Most importantly, at 10 m, the wind speed is typically less than 1 m s−1 during the stalling event, suggesting that the primary cause of wind vane stalling is low wind speed and that freezing/ice accumulation or other environmental factors are likely negligible. Furthermore, in Figure 4b, we observe a high standard deviation in wind direction both prior to and during stalling events, suggesting that no distinct wind direction is prominent. This result emphasises that the reported wind direction values are less meaningful during these events.
As alluded to in Section 2, the prevalence of low wind speed depends on the atmospheric boundary layer conditions. Regarding atmospheric stability, the potential temperature and lapse rate profiles (Figure 4c,d) exhibit relatively cooler average temperatures and strong gradients near the surface when stalling occurs at 10 m. The average potential temperature at 10 m is too high for freezing to occur, further supporting that most stalling events are not caused by this mechanical restriction. Furthermore, we determined that the majority of the wind vane stalling instances at 10 m (94.43%) are associated with an average air temperature T > 0 °C. Below 80 m, we observe stable conditions where the lapse rate is 0.10 K m−1. Moreover, relatively colder air (roughly 5 °C) resides within this layer, pushing the warmer air aloft and creating stable (stratified) conditions. Additionally, we determined that the mean surface pressure during wind vane stalling is 1022 hPa. These findings corroborate that stable stratification of the lower atmosphere is a common condition for wind vane stalling at 10 m. By assessing the profiles six hours prior to the stalling event, we observe a clear development of this stability, as the lapse rate increases by 0.05 K m−1.
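As a minimal illustration of the stability diagnostic above, the bulk potential-temperature lapse rate between two mast levels is a finite difference (the function name and the example values below are illustrative, not from the dataset):

```python
def lapse_rate(theta_low, theta_high, z_low, z_high):
    """Bulk potential-temperature lapse rate (K per metre) between two
    mast levels; positive values indicate stable stratification."""
    return (theta_high - theta_low) / (z_high - z_low)

# A 7 K increase between 10 m and 80 m yields the 0.10 K/m value quoted above:
print(lapse_rate(280.0, 287.0, 10.0, 80.0))  # 0.1
```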
These developing stable conditions allow for radiation fog formation, which can therefore serve as an indicator for wind vane stalling. This type of fog occurs in colder or nocturnal conditions with (strong) radiative cooling, provided that sufficient moisture is available [18]. The potential for fog formation is highlighted by the dew point depression and FSI (Figure 4e,f). We observe that the average difference between air temperature (T) and dew point temperature (Td) during wind vane stalling is 2.5 °C in the lowest 20 m (where T is generally between 0 and 10 °C). Therefore, air saturation is probable, as is resultant fog formation in the lower atmosphere. The FSI further supports the likelihood of fog formation, as it decreases leading up to the wind vane stalling event, with negative or near-zero values indicating a very strong probability of fog formation. This likelihood is corroborated by the visibility measured at 1.5 m, which is roughly four times lower during wind vane stalling than six hours prior, decreasing from 25 km to only 6 km.
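The moisture diagnostics can be sketched as follows. Note that the tower-based FSI in this study is computed from mast-level measurements; the function below implements the common 850 hPa formulation (surface temperature and dew point in °C, upper-level wind in knots) only as an illustrative stand-in for that variant.

```python
def dew_point_depression(t_air, t_dew):
    """Dew point depression in deg C; values near zero indicate
    near-saturated air and a higher chance of fog."""
    return t_air - t_dew

def fog_stability_index(t_sfc, td_sfc, t_850, wind_850):
    """Classic Fog Stability Index: lower values indicate a higher
    probability of radiation fog."""
    return 4.0 * t_sfc - 2.0 * (t_850 + td_sfc) + wind_850

# Illustrative values resembling the profiles discussed above:
print(dew_point_depression(5.0, 2.5))            # 2.5
print(fog_stability_index(5.0, 2.5, 6.0, 3.0))   # 6.0
```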
In the context of wind vane stalling, the observed relationship between low wind speeds, atmospheric stability, and fog formation is also consistent with previous research. In 2018, Izett et al. conducted a study of 254 fog events at the Cabauw site between 2012 and 2016 [
17], categorising them using the methodology outlined in Menut et al., 2014 [
33]. The authors classified the majority of fog events as radiation fog (63.4%), occurring under stable, stratified conditions in the lower atmosphere. Moreover, these fog events primarily occurred between 18 and 8 UTC, during the months of January, February, and March, as well as August, September, and October. Although wind vane stalling was less frequent during winter months, this pattern reflects the relatively high frequency of wind vane stalling that we observed in March, August, September, and October, also occurring between 18 and 8 UTC. Therefore, fog formation may also serve as an indicative variable for machine-learning methods towards identifying wind vane stalling events.
4.2. Feature Evaluation
We use an MI-score (see
Section 3.3) to evaluate which features (variables) are most effective in training the machine-learning methods to predict wind vane stalling at 10 m. We evaluated 12 different meteorological variables measured at heights ranging from 1.5 to 200 m (see
Table 1). Of the 60 evaluated features,
Figure 5 shows the ten highest-ranked features based on their MI-score, alongside the mean, median, standard deviation, and inter-quartile range of the MI-scores of all evaluated features. As expected, the standard deviation of the wind direction and the wind speed (U) at 10 m are the most relevant features for determining the behaviour of the wind vane sensor at 10 m. However, the wind direction and speed at higher levels were not as indicative. In fact, we observe that the remaining most influential features are predominantly related to stability.
This established relevance of the stability parameters is consistent with the findings of the previous section, given that wind vane stalling was observed predominantly during (nocturnal) stable atmospheric conditions in the lower 20 m of the atmosphere. Furthermore, the MI-score suggests that, in addition to the 10 m level, stability features at the 20, 40, and 80 m levels are also related and can explain part of the variability in wind vane stalling at the 10 m level. Moreover, the relatively high MI-score of these features at different heights emphasises that the influence of stability is not limited to what is observable at the 10 m height of the wind vane and larger-scale processes are at play. These results corroborate observations from our statistical analysis in
Section 4.1. However, beyond the case study of wind vane stalling, our findings demonstrate that machine-learning methods can be applied to atmospheric phenomena to identify relevant parameters, which can then be cross-referenced for data quality assurance. Additionally, this knowledge can enhance outlier detection such that events identified outside of these conditions can be flagged as suspicious.
Though these ten variables are the highest-ranked, the MI-score itself also provides insight into the meteorological conditions surrounding wind vane stalling events. For instance, specific humidity, air pressure, and air temperature reported low to near-zero MI-scores, especially at the highest levels of 140 and 200 m. Also, visibility (meteorological optical range), which depends on fog formation, ranked 15th. These values suggest that wind vane stalling at 10 m is relatively independent of the temperature and humidity higher in the boundary layer. Additionally, when evaluating which features emerge as the most relevant for particular meteorological seasons (spring, summer, autumn, and winter), no significant differences were observed. As these ten features consistently emerge as dominant across all seasons, no significant differences in behaviour of the machine-learning methods are anticipated under varying seasonal meteorological conditions. However, different variables are expected to be indicative for other meteorological phenomena, or even wind vane stalling at other heights, and these methods can provide useful insight.
Our findings confirm that anticipating wind vane stalling events goes beyond simply assessing the current status of the wind at 10 m. The variables with a high MI-score enhance our understanding of the meteorological conditions surrounding these events and can be applied with multi-class and one-class machine-learning methods. To this end, we select the seven features with an MI-score above the 25th percentile (Q1): the wind direction and its standard deviation at 10 m, U at 10 and 20 m, FSI at 20 and 40 m, and the potential temperature lapse rate at 10 m.
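An MI-score ranking of this kind can be sketched as below; in practice one would typically use scikit-learn's `mutual_info_classif`, while this simplified histogram estimator (with a hypothetical stalling rule and synthetic data) is for illustration only.

```python
import numpy as np

def mi_score(feature, label, bins=10):
    """Crude mutual information (nats) between a continuous feature and
    a binary label, estimated from a joint histogram."""
    pxy, _, _ = np.histogram2d(feature, label, bins=(bins, 2))
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of the feature
    py = pxy.sum(axis=0, keepdims=True)   # marginal of the label
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# A feature that determines the label scores far higher than pure noise:
rng = np.random.default_rng(0)
wind_speed = rng.normal(size=2000)
stalling = (wind_speed < -0.5).astype(float)   # hypothetical stalling rule
noise = rng.normal(size=2000)
print(mi_score(wind_speed, stalling) > mi_score(noise, stalling))  # True
```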
4.3. Supervised Multi-Class Evaluation
We first assess the performance of five supervised multi-class machine-learning methods using the seven relevant features selected in the previous section. As outlined in
Section 3.3, each method is trained to classify stalling and non-stalling events using these features. During the training phase, the available data are incrementally balanced by retaining n points of non-stalling data, up to one hour (6 points), both before and after each stalling event, yielding datasets with which we can evaluate the performance of the machine-learning methods. Furthermore, the division of the data into training and test datasets is varied. During training, the optimal configuration is determined through a grid search, using the (incrementally balanced) data for each partitioning of the training and test datasets.
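The incremental balancing step can be sketched as follows, assuming a boolean stalling flag on the 10-min time series (function and variable names are ours, not from the paper):

```python
import numpy as np

def balanced_indices(stall_flags, n):
    """Keep every stalling sample plus up to n non-stalling samples
    immediately before and after each one (10-min sampling, so n = 6
    corresponds to one hour on either side)."""
    stall = np.asarray(stall_flags, dtype=bool)
    keep = stall.copy()
    for i in np.flatnonzero(stall):
        keep[max(0, i - n):min(len(stall), i + n + 1)] = True
    return np.flatnonzero(keep)

# A single two-sample stalling event with a one-point (10-min) buffer:
flags = [0, 0, 0, 1, 1, 0, 0, 0]
print(balanced_indices(flags, n=1))  # [2 3 4 5]
```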
To illustrate the performance of the machine-learning methods during a wind vane stalling event,
Figure 6 displays a time series of wind speed and direction alongside the output from the machine-learning methods on 9 July 2015. The figure also showcases our approach to incrementally balancing the dataset, where different time windows influence the results. This example utilises a typical 80/20 division of training and test data (80% training data, 20% test data) across all machine-learning methods with the best-configured parameter combinations.
Figure 6 demonstrates that most of the machine-learning methods explored in this study are capable of successfully identifying a stalling event, with varying performance depending on how the dataset is balanced. More specifically, in this example, we observe that each of the five selected machine-learning methods correctly identifies the wind vane stalling event, with GNB, RF, and LR exhibiting the fewest classification errors. Including data points close to the wind vane stalling event can enhance the performance of most methods. However, KNN and SVM demonstrate greater difficulty in accurately identifying the stalling instance, either reporting its occurrence too often or failing to detect it altogether. It is important to note that the performance observed during this specific wind vane stalling event is not necessarily representative of all events. Therefore, we analyse the overall performance over 22 years using a set of metrics, which are used to evaluate the performance of the machine-learning methods with the best-configured parameter combinations (summarised in
Figure 7).
Figure 7a,b show the performance metrics (Accuracy, Precision, Recall, and F1-score) for each of the five multi-class machine-learning methods, regarding their ability to classify points as non-stalling and stalling, respectively. These metrics describe the ability of the machine-learning methods to correctly identify whether a data point represents a wind vane stalling event or a non-stalling event. As mentioned, the performance of the machine-learning methods depends on how the training dataset is balanced (bars). With this balancing approach, as the number of data points retained before and after a stalling event (n) increases, the number of stalling cases remains constant while the reported cases of non-stalling increase linearly. The shaded area (grey) represents the range of values based on how the dataset was divided into training and test sets.
Overall, we observe that both the non-stalling and stalling performance metrics are significantly affected by incrementally balancing the dataset. In particular, for each of the machine-learning methods, the non-stalling performance metrics (
Figure 7a) significantly improve, approaching a value of 1, when more non-stalling points are included for training. In contrast, as more non-stalling events are retained, the performance metrics for identifying stalling events (
Figure 7b) worsen in most cases, some even dropping below 0.5. This behaviour is expected when the dataset becomes more biased towards non-stalling events. Due to this clear impact on each method’s performance, we first consider a suitable balance before analysing these metrics in more detail.
Effective supervised multi-class methods require a balance between stalling and non-stalling events. Training with a balanced dataset ensures that the machine-learning methods perform optimally in distinguishing between these two classes. In this case, achieving this balance depends on optimally defining the time window before and after the stalling event, such that the methods can learn the distinction. Comparing the performance of the different methods in
Figure 7, we observe that the performance metrics for classifying both non-stalling and stalling events are balanced when retaining roughly two to three data points before and after the event (40 to 60 min window). We observe that when
n = 2–3, nearly all performance metrics for the five machine-learning methods exhibit a sharp increase or decrease for identifying non-stalling and stalling events, respectively. Therefore, with this data balance, many of the supervised multi-class methods demonstrate good performance in effectively distinguishing the data.
Although each supervised multi-class method generally benefited from a similar data balance, the different machine-learning methods yield varying performance. Overall, we observe that the GNB method demonstrates the lowest performance, exhibiting a clear bias toward the majority class (non-stalling data points). The accuracy of identifying non-stalling cases increases, approaching 1, when trained on an imbalanced dataset. However, the accuracy of identifying stalling cases plummets from approximately 80% to 50%. The precision and F1-score show similar trends. In contrast to GNB, the KNN and RF methods demonstrate relatively stable performance as the dataset becomes more saturated with non-stalling data points. Even with the largest window (n = 6, total 120 min), most performance metrics are >0.75 for classifying both stalling and non-stalling cases. When the datasets are balanced (n = 2), KNN and RF identify stalling events with an accuracy of 0.76 and 0.82, respectively, demonstrating that both methods show potential for identifying wind vane stalling events. Furthermore, the F1-scores of KNN and RF are 0.77 and 0.82, respectively. The relatively higher F1-score for RF indicates that this method is better at correctly identifying wind vane stalling events while making fewer mistakes by missing or incorrectly flagging them. The LR and SVM methods perform similarly to KNN and RF for classifying non-stalling events; however, the F1-score indicates drastically poorer performance for classifying stalling events. Furthermore, both methods also exhibit a high sensitivity to the volume of data used for training and testing, as indicated by the shaded grey area.
Our results establish that while KNN and RF yield promising performance metrics, data imbalance poses a significant challenge affecting all five supervised multi-class machine-learning methods. While we demonstrate which supervised machine-learning methods have the most potential, this methodology may be insufficient for comprehensively addressing wind vane stalling because these events can be subtle and may manifest outside the predefined boundaries. Moreover, directly implementing these methods in a practical, real-time application along the Cabauw tower requires further development. We anticipate that extending these findings to wind vane stalling at other heights or to other meteorological phenomena will depend on different meteorological factors, which should be investigated with the MI-score and may yield different performance. Nonetheless, the development of a dedicated filtering system, in which the KNN and RF methods function together, can improve the reliability and efficiency needed to meet our data quality standards.
We demonstrate that these machine-learning methods, when appropriately balanced, can significantly enhance data quality assurance. By flagging suspicious events or corroborating outlier detection, these methods demonstrate considerable potential in addressing wind vane stalling and other similar atmospheric phenomena. Still, we anticipate that quality control for various meteorological data issues will also be constrained by data imbalance. Consequently, as an alternative approach to this obstacle, we analyse the performance of the one-class machine-learning method in the next section.
4.4. Semi-Supervised One-Class Evaluation
In contrast to the supervised learning methods explored in the previous section, data imbalance does not impact the semi-supervised One-Class Support Vector Machine (OCSVM) method. Moreover, this method learns exclusively from instances classified as stalling, which are considered inliers. The OCSVM method has only two parameters: ν and γ (refer to Table 2). The parameter ν determines the influence of outliers (non-stalling data points), whereas γ describes the relative influence (dependence) of each data point [21,25]. Due to the single class, we evaluate the performance of OCSVM considering the relative mean error (normalised by the number of cases) for classifying stalling data points. We explore this method’s aptitude by varying the (ν, γ)-parameter combinations, as shown in Figure 8.
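A parameter exploration of this kind can be sketched with scikit-learn's `OneClassSVM`; the synthetic "stalling" cluster and the small (ν, γ) grid below are illustrative stand-ins for the actual features and grid of the study.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "stalling" inliers: a tight two-feature cluster
X_stall = rng.normal(loc=0.0, scale=0.3, size=(300, 2))

errors = {}
for nu in (0.05, 0.2, 0.5):
    for gamma in (0.1, 1.0):
        clf = OneClassSVM(nu=nu, gamma=gamma).fit(X_stall)
        # relative mean error: fraction of inliers misclassified as outliers
        errors[(nu, gamma)] = float((clf.predict(X_stall) == -1).mean())

# Larger nu permits (and therefore produces) more rejected inliers:
print(errors[(0.5, 1.0)] > errors[(0.05, 1.0)])  # True
```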
Figure 8 shows that the number of points that are incorrectly classified as stalling is clearly impacted by the model parameters. We observe that as ν increases, the mean relative error increases. This trend can be attributed to the fact that larger values of ν allow a larger influence of non-stalling data points, and therefore the method predicts outliers better than inliers. Consequently, for this dataset, ν should be kept low (<0.5). The influence of ν also depends on γ. In general, we observe that γ-values ≤ 1 yield a smaller mean classification error, but the relative influence of ν becomes higher. As γ approaches 1, the relative classification error becomes increasingly sensitive to ν. Increasing the value of γ increases the relative dependency of the individual data points; in other words, more data points contribute to the overall performance of the model. Therefore, when combined with a higher ν, the error is largest, ultimately leading to an overall decrease in this machine-learning method’s performance.
These results suggest that the semi-supervised one-class machine-learning method, tuned with ν < 0.2 and γ ≤ 1, could effectively identify stalling data points, with up to 90% accuracy. Furthermore, our findings highlight the importance of the model parameters: the effectiveness of a machine-learning method is not based solely on the key features. Considering the classification accuracy, this semi-supervised method shows no significant improvement in performance compared to the two best-performing supervised multi-class methods (KNN and RF); both supervised and semi-supervised methods can effectively identify the stalling data points. However, unlike the supervised methods, this approach circumvents the issue of data imbalance, enabling the development of a more straightforward and practical application. Nonetheless, the one-class semi-supervised method is not without limitations. Because this machine-learning method is exclusively designed and validated using a single class (stalling), developing an effective and practical application still requires further exploration.
Ultimately, both supervised and semi-supervised methods can be developed and integrated at various stages of the data filtering process, improving quality assurance and helping to resolve wind vane stalling events and other meteorological data issues. Our results demonstrate that these methods can provide valuable insight for initial checks, generating warnings based on a range of incoming observations, facilitating manual filtering, and improving overall confidence in the data quality.