A Review of Wind Clustering Methods Based on the Wind Speed and Trend in Malaysia

Abstract: Wind mapping has played a significant role in the selection of wind harvesting areas and engineering objectives. This research aims to find the best clustering method to cluster the wind speed of Malaysia. The wind speed trend of Malaysia is affected by two major monsoons: the southwest and the northeast monsoon. The research found multiple, worldwide studies using various methods to accomplish the clustering of wind speed in multiple wind conditions. The methods used are the k-means method, Ward’s method, hierarchical clustering, trend-based time series data clustering, and Anderberg hierarchical clustering. The clustering methods commonly used by the researchers are the k-means method and Ward’s method. The k-means method has been a popular choice in the clustering of wind speed. Each research study has its objectives and variables to deal with. Consequently, the variables play a significant role in deciding which method is to be used in the studies. The k-means method shortened the clustering time. However, the calculation’s relative error was higher than that of Ward’s method. Therefore, in terms of accuracy, Ward’s method was chosen because of its acceptance of multiple variables, its accuracy, and its acceptable calculation time. The method used in the research plays an important role in the result obtained. There are various aspects that the researcher needs to focus on to decide the best method to be used in predicting the result.


Introduction
Wind clustering plays an important role in determining the various aspects of the research objective, such as energy, engineering, and public safety. Therefore, the choice of clustering method is largely determined by the objective of the study and the parameters involved in the study. The sensitivity of the data also plays an important role in determining the method of clustering. It is important for the researcher to have an expectation of what the result should be and will be so that the method can be used efficiently.
This paper focuses on a comparison of the clustering methods used by researchers in terms of wind speed clustering. The areas considered in this paper are in Malaysia, Qatar, France, Iran, Turkey, the United States, India, South Africa, Switzerland, and Columbia.
The winds in Malaysia are influenced by two monsoon seasons: the southwest monsoon from late May to September and the northeast monsoon of Peninsular Malaysia from November to March. The heavy rain to the east of Peninsular Malaysia and of western Sarawak is caused by the northeast monsoon, whereas the southwest monsoon brings drought to the nation [1].
Figure 1 shows the northeast monsoon storms; the east of Peninsular Malaysia and Mersing are the windiest areas of Peninsular Malaysia. Therefore, according to various wind energy potential research studies in Malaysia, Mersing is often the best location for wind farming [2].
Figure 1. The direction of the northeast and southwest monsoons in Peninsular Malaysia [1].
The monitoring of the wind speed trend is crucial for the prediction of future events or in seeing the continuity of the wind supply in certain areas. For example, the mapping conducted and adopted by the Malaysian Standard Code of Practice on Wind Loading for Building Structure uses mean wind speed. The standard is used by engineers in Malaysia, especially mechanical engineers and those involved with civil structures, to predict the wind speed in such areas as telecommunication antenna deployment. The mean wind speed itself may increase from year to year, as previously reported by Young in 2011. Figure 2 shows the recommendation by the Malaysian Standard on Basic Wind Speed with regard to a mean wind speed of 33.5 m/s [3].
Energies 2023, 14, x FOR PEER REVIEW
Global trends show that wind speed is increasing. Research published in 2011 found that the global wind speed is increasing and that extreme events are growing faster than the mean condition. The wind speed of most of the world's oceans has increased by at least 0.25% to 0.5% per year. The strongest increasing trend was found in the southern hemisphere. In the northern hemisphere, the central North Pacific stands out: the increase there was less than 0.25%, and some areas show a negative trend. As shown in Figure 3 below, the area surrounding Malaysia is also experiencing an increasing trend, especially to the southwest of Malaysia, where the southern Indian Ocean is located [4].
A study conducted in 2019 confirmed the above research by Young in 2011. As shown in Figure 4a, the research found that the global mean annual wind speed had increased over the previous ten years and that the pattern was increasing yearly. The Asian mean annual wind speed also shows an increasing pattern. However, the wind speed in the Asian region began to increase earlier than the global speed. As shown in Figure 4b, the increase in wind speed in the Asian region started as early as 2002, whereas the global mean annual wind speed has been increasing since 2010. The research uses diagnostic statistics for regression, which include the goodness of fit, R², and the Pearson correlation coefficient, P. A Pearson correlation coefficient of less than 0.001 was considered to be satisfactory in this study. However, the research found oscillation patterns that decreased the global wind speed; therefore, according to this research, wind energy production may decrease in the future [5].
In 2003, a study was conducted on the coastline of Peninsular Malaysia. The study focused on analyzing the annual vector mean wind speed and direction according to two seasons, i.e., the northeast and southwest monsoons. The wind direction was northeast during the northeast monsoon season and southwest during the southwest monsoon season [6].
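A vector mean of the kind mentioned above averages the wind components rather than the raw speeds and angles, so that directions either side of north (for example 350° and 10°) average to north instead of south. The following is a minimal sketch, not the procedure of the cited study; the function name `vector_mean_wind`, the meteorological "direction the wind blows from" convention, and the sample values are assumptions made for illustration.

```python
import math


def vector_mean_wind(speeds, directions_deg):
    """Return (vector-mean speed, vector-mean direction in degrees).

    Assumption: directions are the bearing the wind blows FROM, measured
    clockwise from north (meteorological convention).
    """
    if len(speeds) != len(directions_deg) or not speeds:
        raise ValueError("speeds and directions must be non-empty and equal in length")
    # East and north components of each observation's "from" vector.
    east = [s * math.sin(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    north = [s * math.cos(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    east_mean = sum(east) / len(east)
    north_mean = sum(north) / len(north)
    # Length and bearing of the averaged vector.
    speed = math.hypot(east_mean, north_mean)
    direction = math.degrees(math.atan2(east_mean, north_mean)) % 360.0
    return speed, direction


# Hypothetical observations (m/s, degrees); not measured data.
mean_speed, mean_direction = vector_mean_wind([5.0, 5.0], [40.0, 50.0])
print(mean_speed, mean_direction)
```

The vector-mean speed is never larger than the arithmetic mean of the speeds, and it shrinks as the directions become more scattered, which is why the two statistics are usually reported separately.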
Research conducted in 2015 by Kok et al. found that the dynamics of the wind stress system had an important influence on the physical characteristics of the sea. The study used the wind stress curl to examine the mechanism responsible for the formation of the thermal front during both Malaysian monsoon seasons [7].
The positive and negative values of the wind stress curl cause cyclonic and anti-cyclonic motion in the northern hemisphere. This action causes divergence or convergence in the surface layer of seawater. Therefore, the cooler or warmer water from the deep rises and replaces the diverging or converging water. This results in the upwelling and downwelling of the seawater. This upwelling is caused by the wind, which makes the water close to shore cooler [8].
Therefore, with regard to all of the wind characteristics mentioned, there is a need for wind trend monitoring and clustering, especially in Malaysia. Factors such as global warming have increased the temperature of the sea, causing fluctuation in the global wind speed [9]. When wind standards commenced in 2002 in Malaysia, the need for wind clustering was foreseen; wind clustering can increase the accuracy of wind mapping and wind forecasting in Malaysia.
It is important for engineers and wind experts to be able to see the wind trend and clustering according to objectives such as those considering area or demand. Each objective can show different results, which also depend on the method of clustering used. This paper aims to investigate the best method to cluster the wind trend. The specific objective is to determine the best method to cluster the wind trend in relation to the Peninsular Malaysia and Borneo regions.

Wind Speed Trend Observation
Wind speed observation has been conducted by researchers for various objectives. Wind energy researchers in particular rely on the observation of wind speed trends. Wind energy research requires wind trend observations to ensure the continuity of the wind supply that powers the wind harvesting equipment.
Research to evaluate the wind energy potential in Peninsular Malaysia was conducted from 2007 to 2009 by Masseran et al. at 10 wind stations. The research, which focuses on the wind speed persistence in Peninsular Malaysia, is based on hourly data. The research found that for Peninsular Malaysia, the hourly wind speed at the wind stations exhibits a stationary state. The smallest hourly wind speed observed at the Chuping station and Mersing station showed its suitability for the generation of energy due to its hourly wind trend. Therefore, the research shows the importance of wind trend observation in wind energy research methodology. Figure 5 below shows the wind speed trend for one week at Ipoh wind station, Perak [10].
Wind trend observation was also used in research conducted in Qatar by Aboobacker in 2021. The research uses monthly mean data to simulate the trend of the wind speed and to further estimate the wind power produced in the area [11]. The research covers the wind around the Arabian Gulf coast, with a focus on the Qatar peninsula. The research found that offshore Ruwais was the windiest location, with both the highest wind speed and the highest mean wind speed. Table 1 below shows the wind speed statistics at the research locations from 1979 to 2018 [11].
The trend observations also showed that there were similarities in the onshore and offshore wind trends, as shown in Figure 6. However, the two windiest stations were Ras Laffan and Ruwais. The research using mean wind speed trend observations was similar to the current research in terms of the finding of the strongest wind recorded in the area. Therefore, the method of observation by trend was applicable in finding the windiest area or the area with the strongest wind.
In 2012, Tiang and Ishak studied the wind speed at the measurement site of Bayan Lepas, Pulau Pinang, from January to December 2008. The study used wind trend observation to assess the potential wind energy in Pulau Pinang. By observing the trend of the wind speed, the researchers were able to find the windiest period in Pulau Pinang. Based on the findings, the maximum wind speed in Pulau Pinang was achieved in September, and the slowest was recorded in November. Using the trend observation, the researchers were able to determine the months in Pulau Pinang that were the windiest and had the highest wind speed; these were May, July, and September. The causes of the higher wind speed period were the southwest monsoon season and the geographical location of Pulau Pinang. Figure 7 below shows the monthly mean wind speed trend in 2008 from July to October [12].
However, for engineering purposes, the wind trend observation focused more on sudden spikes in wind speed and on the highest wind speed recorded in the research area. The research conducted by Shanmugasundaram et al. in 1998 was based on the tropical cyclone wind conditions which occurred in June and December 1996. The research produced a wind trend observation of the cyclone which indicated the highest mean and maximum wind speed recorded during the event. The wind speed trend observation helped the researchers to locate the maximum wind speed during the event and to calculate the damping ratio increase for the 52 m steel lattice tower. Figure 8 below shows the mean and maximum wind speeds during the cyclone of the year 1996 [13].
Research on air pollution also used wind trend analysis to simulate the severity of the pollution affecting the area. The direction and speed of the wind play an important role in the air pollution effect. In 2016, Sokolov et al. conducted a cluster analysis of the atmospheric dynamics and pollution transport in the coastal area of industrialized Dunkerque in northern France. The research aimed to determine the trajectories in the context of pollution transport. The trajectories were based on the largest and most dispersed areas of low wind speeds, which make the pollution worse. The data of this research were based on the meteorological data of the wind speed and its direction and on pollution measurements. The wind trend observation was visualized based on the wind rose. The wind rose modeling was successful in showing the trend in terms of the direction and the wind speed at Maregraph station. Figure 9 below shows the wind rose modeling for Maregraph station from 1 May to 1 October 2006 [14].
Therefore, the wind trend observation requirement is based on the objective of the research. The wind trend observation can assist with multiple factors and can contribute to the objective of the research. However, wind trend observation for a longer period may require grouping or clustering to ease the analysis and to localize the wind trend according to the area.
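One simple way to quantify a trend of the kind observed in these studies is to fit a least-squares line to a series of mean wind speeds and report the slope, the slope as a percentage of the mean speed, and the goodness of fit R². The sketch below is an illustration only and is not the procedure of any of the cited studies; the function names and the sample series are hypothetical.

```python
def linear_trend(years, speeds):
    """Fit speed = intercept + slope * year by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    if n != len(speeds) or n < 2:
        raise ValueError("need at least two (year, speed) pairs")
    mean_x = sum(years) / n
    mean_y = sum(speeds) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, speeds))
    syy = sum((y - mean_y) ** 2 for y in speeds)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 is undefined for a constant series; 0.0 is returned by convention here.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def percent_per_year(slope, speeds):
    """Express the slope as a percentage of the mean speed of the series."""
    return 100.0 * slope / (sum(speeds) / len(speeds))


# Hypothetical annual mean wind speeds in m/s; not measured data.
years = [2010, 2011, 2012, 2013, 2014]
speeds = [3.0, 3.1, 3.2, 3.3, 3.4]
slope, intercept, r_squared = linear_trend(years, speeds)
print(slope, r_squared, percent_per_year(slope, speeds))
```

A line fitted this way summarizes only the long-term drift; seasonal effects such as the monsoons would need to be averaged out or modeled separately before the slope is interpreted.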

Clustering Wind Speed

Linkage-Ward Clustering Method
Probabilistic wind speed clustering was used in the study cases at Khaaf, Iran, in 2018 [15]. Azizi et al. reported using the Linkage-Ward clustering method to cluster the wind speed in the area. The research reported that the Ward clustering method was more accurate than the k-means method. The Ward method, however, was more complex than the k-means method. The study used two years of wind speed measurements recorded at 60 min intervals at the wind stations around Binalood, Iran. The wind stations vary in height, soil, and distance to residential areas. The focus of the study was to select the proper site to install the wind turbine in Binalood. Therefore, the study focuses on the windiest area, which can be correlated with the current study. Although the study used the Linkage-Ward clustering method instead of k-means, the Linkage-Ward clustering method required more computational effort to solve.
The research found that the Linkage-Ward clustering method was the most common and accurate for use in the study. The method calculated the dissimilarity between clusters based on the centroid of the cluster, as shown in (1), where d_ik, d_jk, and d_ij are the pairwise distances between the clusters i and k, j and k, and i and j; i, j, and k are the indexes of the clusters; and n_i, n_j, and n_k are the numbers of members within clusters i, j, and k, respectively.
In (1), a and b are defined as in (2), and c = 0 in the Linkage-Ward clustering method. The parameters a and b depend on the cluster sizes and determine the clustering algorithm, with d_ij being the distance between clusters.
The pair of clusters whose merger gives the lowest increase in the distance between the cluster centroids in (1) is combined. The Ward method uses as its objective function the sum of the squares of the distances from the points to the centroids of the clusters. Figure 10 below shows the step-by-step algorithm of Linkage-Ward clustering.
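As a point of reference, the quantities defined above are those of the standard Lance-Williams recurrence for updating distances after clusters i and j are merged. The following is a sketch under that assumption and is not a reproduction of the source's (1) and (2); in particular, writing the parameter a separately as a_i and a_j follows the textbook form, and the distances are assumed to be squared Euclidean distances between centroids.

```latex
\begin{align}
d_{(ij)k} &= a_i\, d_{ik} + a_j\, d_{jk} + b\, d_{ij} + c\,\lvert d_{ik} - d_{jk} \rvert \\
a_i &= \frac{n_i + n_k}{n_i + n_j + n_k}, \qquad
a_j = \frac{n_j + n_k}{n_i + n_j + n_k}, \qquad
b = -\frac{n_k}{n_i + n_j + n_k}, \qquad
c = 0
\end{align}
```

With these parameters, merging the pair with the smallest d_ij at each step is equivalent to choosing the merger that increases the total within-cluster sum of squares the least, which is the objective function described above.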
The calculation above selects the combination with the lowest increase in the cost function of (1).
Figure 11.
The researchers found the centroid of each cluster after the mean of the wind speed had been found earlier in the research. This is a reverse method of finding the centroid of the cluster and may affect the result. Figure 12 below shows the cluster centers of the measured wind speed.
The number of clusters was chosen by calculating the error between each cluster's centroid and its members. As expected, a small number of clusters groups dissimilar objects together. The optimal number of clusters is important to ensure the effectiveness and the accuracy of the data. The Euclidean error between each cluster is calculated as in (3).
In (3), N_Cluster is the number of clusters, n_j is the number of members within cluster j, x is each observation in the dataset, and c_i is the centroid of cluster i.
The error calculation found that four clusters gave the minimum error for this research, with a relative error of only 8%. Therefore, the research used four clusters as the basis of the clustering for the dataset. Figure 13 below shows the error calculation result in determining the number of clusters.
Azizi et al. found that, of the four clusters created, cluster 2 had the highest probability of occurrence, at 38%. The higher probability of occurrence suggests that cluster 2 is more suitable for wind farming. Figure 14 below shows the probability of occurrence of each cluster.
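The workflow described in this subsection (merge clusters by Ward's criterion, compare an error measure across candidate numbers of clusters, then read off the probability of occurrence of each cluster) can be sketched for one-dimensional wind speeds as follows. This is a minimal illustration and not the implementation of Azizi et al.: the function names are hypothetical, the wind speeds are made-up values, and `relative_error` is an assumed stand-in because the exact form of (3) is not reproduced here.

```python
def merge_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b merge."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


def ward_clusters(speeds, n_clusters):
    """Agglomerative clustering of 1-D wind speeds with Ward's criterion."""
    if not 1 <= n_clusters <= len(speeds):
        raise ValueError("n_clusters must be between 1 and len(speeds)")
    clusters = [[s] for s in speeds]  # start with one cluster per observation
    while len(clusters) > n_clusters:
        # Find the pair whose merger increases the sum of squares the least.
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def relative_error(clusters):
    """Assumed error measure: total |x - centroid| divided by total |x|.

    Requires at least one non-zero observation.
    """
    deviation = 0.0
    total = 0.0
    for members in clusters:
        centroid = sum(members) / len(members)
        deviation += sum(abs(x - centroid) for x in members)
        total += sum(abs(x) for x in members)
    return deviation / total


def occurrence_probability(clusters):
    """Share of all observations that falls in each cluster."""
    n = sum(len(members) for members in clusters)
    return [len(members) / n for members in clusters]


# Hypothetical hourly wind speeds in m/s; illustrative values, not measured data.
speeds = [2.0, 2.1, 2.2, 6.0, 6.1, 9.9, 10.0, 10.1]
for k in (2, 3, 4):
    print(k, round(relative_error(ward_clusters(speeds, k)), 4))
print(occurrence_probability(ward_clusters(speeds, 3)))
```

The pairwise search makes this sketch cubic in the number of observations, so it is only suitable for small samples; it is meant to show the criterion, not to scale to two years of hourly data.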

k-Means Approach for Wind Clustering
The annual wind speed patterns can be grouped when the study area is the same. Yesilbudak et al. conducted a clustering analysis of multidimensional wind speeds for 75 provinces in Turkey. The method used in the clustering was the k-means approach. In this research, the silhouette coefficient was used to determine the effectiveness of the distance measure. The analysis found that the prominent cities in terms of average wind speed were Canakkale and Mardin, located in cluster 4, where the mean silhouette coefficient of the cluster was 0.5224. On the other hand, cluster 1 contained Duzce, Amasya, and Siirt, which were determined to be poorly matched areas with the silhouette coefficients of 0.7294, 0.7198, and 0.7111. Figure 15 below shows the silhouette coefficients for k = 5 and the squared Euclidean distance measure [16].
In this research, the study mentioned k-means as one of the partitioning methods in the literature. The k-means algorithm assumes that D is the dataset that contains n observations and k is the number of clusters. The k-means algorithm calculates the dissimilarity between each pair of observations differently according to the distance measure. Four types of distance measures were used: squared Euclidean, city-block, cosine, and Pearson. Figure 16 below shows the k-means algorithm used in the study.
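The four distance measures named above can be sketched in Python as follows. This is a minimal illustration that assumes the common conventions (cosine and Pearson distances taken as one minus the cosine similarity and one minus the correlation coefficient, respectively); the exact definitions used by Yesilbudak et al. are not given here, and the function names and sample profiles are hypothetical.

```python
import math

def squared_euclidean(x, y):
    """Sum of squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def city_block(x, y):
    """Sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def pearson_distance(x, y):
    """One minus the Pearson correlation (undefined for constant profiles)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical wind speed profiles (m/s) for three sites.
site_a = (4.0, 5.0, 6.0)
site_b = (8.0, 10.0, 12.0)   # same shape as site_a, twice the magnitude
site_c = (6.0, 5.0, 4.0)     # opposite trend to site_a

for measure in (squared_euclidean, city_block, cosine_distance, pearson_distance):
    print(measure.__name__, measure(site_a, site_b), measure(site_a, site_c))
```

The example shows why the choice matters: the cosine and Pearson distances treat site_a and site_b as identical in shape, whereas the squared Euclidean and city-block distances separate them by magnitude.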
To determine the best distance measure, the silhouette coefficient, which varies between −1 and +1, was used to measure how well each observation was assigned to its cluster. A silhouette coefficient closer to 1 indicated that the observation belonged to its cluster. The silhouette was defined as in (4) below.

$$s(y_i) = \frac{b(y_i) - a(y_i)}{\max\{a(y_i),\, b(y_i)\}} \qquad (4)$$
where $a(y_i)$ is the average dissimilarity of $y_i \in S_k$ to all other $y_j \in S_k$, and $b(y_i)$ is the minimum average dissimilarity of $y_i \in S_k$ to all $y_j \in S_l$, $l \neq k$.
As shown in Figure 17, the study plots the annual wind speed data using star glyph plots. The plots in Figure 17 show the wind pattern of the 75 areas around Turkey. The analysis by the k-means algorithm with the silhouette coefficient gives a stronger clustering solution. The research found that the squared Euclidean distance measure gives a more accurate clustering result than the other three distance measures. Therefore, the clustering result was obtained using the squared Euclidean distance measure, as shown in Table 2 below.

Table 2. The provinces categorized into each cluster by the k-means approach [16].

Cluster Name Cluster Observations
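The silhouette coefficient in (4) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in [16]: the function names and the four sample observations are hypothetical, and assigning a score of 0 to an observation in a singleton cluster is an assumed convention.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def silhouette_scores(points, labels, dist):
    """Silhouette s(y_i) = (b - a) / max(a, b) for every observation."""
    clusters = sorted(set(labels))
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean dissimilarity from p to the members of each cluster, excluding p itself.
        mean_to = {}
        for c in clusters:
            d = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            if d:
                mean_to[c] = sum(d) / len(d)
        others = [m for c, m in mean_to.items() if c != own]
        if own not in mean_to or not others:
            scores.append(0.0)  # assumed convention: singleton cluster or only one cluster
            continue
        a, b = mean_to[own], min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

# Hypothetical one-dimensional observations in two well-separated clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels, squared_euclidean)
print(scores)                       # all values close to +1
print(sum(scores) / len(scores))    # mean silhouette of the clustering
```

Passing a different `dist` function allows the same clustering to be scored under each candidate distance measure.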
Time series clustering has been widely used in predicting wind speed. For example, Kusiak et al. conducted wind speed clustering to predict the power output generation based on the wind speed. The study was based on the long- and short-term prediction of power using the k-nearest neighbor (k-NN) algorithm [17].
In this research, multiple parameters were considered during the clustering calculation. The parameters also made the clustering much more detailed and precise. Therefore, a clustering method that can cater for a larger number of variables has to be used for the clustering exercise to be successful. Table 3 below shows the list of parameters used in the research by Kusiak et al.

However, the current wind speed data were unavailable during the study. Therefore, the prediction of the power generated from the wind speed was not validated [17].
In 2012, Andrew Clifton demonstrated the usage of k-means clustering to identify the relationship between the wind at turbine height and climate oscillation. The study used fourteen years of data from an 80 m tower at the National Wind Technology Center (NWTC) in Colorado. During the study, the k-means method of clustering identified four dominant wind flows in the area. The study first identified the frequency of the wind direction. However, for the frequency study, the data were limited to the wind speed of 3.5 m/s and grouped into 5° and 1 m/s bins. The contours show the relative frequency in each bin on a linear scale. Figure 18 below shows the wind frequency visualized in contours.

Figure 18. Frequency of wind at 80 m binned by wind speed and direction [18].
The researcher applied the k-means clustering approach to zonal and meridional wind speeds. The k-means clustering splits N data points into k clusters and assumes that each data point belongs to the cluster with the nearest mean value. The researcher repeated the clustering 100 times using random initial centroids and generated an optimum set of centroids. The research used a function from the "Statistics Toolbox" in the software MATLAB R2010b to generate the k-means analysis. Thereby, four dominant flows were found: the weak northerly (N), weak southerly (S), weak westerly (W(L)), and strong westerly flows. The clustering of the flows is shown in Figure 19 below [18].
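The repeated-initialization procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the MATLAB routine used in [18]: the function names, the synthetic (zonal, meridional) pairs, and the choice of keeping the run with the smallest within-cluster sum of squares are assumptions.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from randomly chosen initial centroids."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return sse, centroids, labels

def kmeans_restarts(points, k, n_init=100, seed=0):
    """Repeat k-means n_init times and keep the run with the smallest sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_init)),
               key=lambda run: run[0])

# Synthetic (zonal, meridional) wind components in m/s around four hypothetical flows.
flow_centres = [(0.0, -2.0), (0.0, 2.0), (3.0, 0.0), (10.0, 0.0)]
offsets = [(-0.2, 0.0), (0.2, 0.0), (0.0, -0.2), (0.0, 0.2), (0.0, 0.0)]
winds = [(u + du, v + dv) for u, v in flow_centres for du, dv in offsets]

best_sse, best_centroids, best_labels = kmeans_restarts(winds, k=4)
print(best_sse)
print(sorted(best_centroids))
```

A single run can converge to a poor local solution when two initial centroids fall in the same flow; keeping the best of many runs makes the result far less sensitive to the random starting points.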
The optimum number of clusters was obtained in Andrew Clifton's research using the Bayesian information criterion (BIC) method. The BIC method increased k up to the point where a further increase in k would not meaningfully improve the quality of the result. The method performs well in two-dimensional datasets, especially when using a machine learning application such as MATLAB [19]. Figure 20 below shows the variation of the normalized BIC value with the number of clusters, and the result shows that the optimum number of k is 4.

Non-Parametric Approach Hierarchical Clustering
Guldal et al. used hierarchical clustering algorithms to cluster the wind speed and blow number, a parameter which causes evaporation in Lake Egirdir, Turkey. The research used a non-parametric approach of the hierarchical clustering algorithm where the monthly evaporation losses and the mean wind speeds with the blow number were clustered. The clustering method was determined by the mutual neighbor distance (MND) algorithm. Figure 21a shows the patterns labelled A, B, C, D, E, F, and G, which fall into three clusters. The clustering can be further refined using a single-link algorithm, as shown in Figure 21b [20].
Figure 21 shows the hierarchical clustering algorithm in a two-dimensional dataset. Figure 21a shows that there are seven observations, labelled as A, B, C, D, E, F, and G, in three clusters. In Figure 21b, the dendrogram shows the grouping of the seven patterns and the similarity levels of the observations. Figure 21b shows that the clustering can be broken into multiple levels. For example, level 1 comprises three clusters, (A, B, and C), (D and E), and (F and G) [20].
The mutual neighbor distance (MND) used by this study is described in Figure 22 and by Equation (5) below:

$$MND(x_i, x_j) = NN(x_i, x_j) + NN(x_j, x_i) \qquad (5)$$

where $NN(x_i, x_j)$ is the neighbor number of $x_j$ with respect to $x_i$. Figure 22 shows an example of the MND. The nearest neighbor of A is B, and the nearest neighbor of B is A. Therefore, NN(A,B) = NN(B,A) = 1, and the MND between A and B becomes 2 according to Equation (5). Similarly, NN(B,C) = 1 and NN(C,B) = 2; therefore, MND(B,C) = 3.
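The MND calculation can be sketched in Python as follows. This is a minimal illustration that assumes the neighbor number is the rank of one point in the distance-sorted list of neighbors of the other; the function names and the three coordinates are hypothetical and chosen only so that A and B are mutual nearest neighbors, and only the symmetric MND values are reported.

```python
import math

def neighbor_rank(points, i, j):
    """Rank of point j among the neighbors of point i (1 = nearest)."""
    others = sorted((k for k in range(len(points)) if k != i),
                    key=lambda k: math.dist(points[i], points[k]))
    return others.index(j) + 1

def mnd(points, i, j):
    """Mutual neighbor distance: sum of the two neighbor ranks."""
    return neighbor_rank(points, i, j) + neighbor_rank(points, j, i)

# Hypothetical coordinates for three observations A, B, and C.
names = ["A", "B", "C"]
coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0)]

for i in range(len(coords)):
    for j in range(i + 1, len(coords)):
        print(names[i], names[j], mnd(coords, i, j))
```

Because the MND depends on neighbor ranks rather than on raw distances, adding further observations near one point can change the MND between two others even though their coordinates stay the same.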
The result from the above method shows the similarity (S) levels (l) S6 (l = 6) and S8 (l = 8) and the strong relation with the evaporation rate for June, July, August, and September ($R^2$ = 0.29 for wind speed change and evaporation rate; $R^2$ = 0.85 for wind blow number and evaporation rate). The strongest relationship is the clustering at l = 6 (S6), as shown in Figure 23a; the detail of the similarity level S6 (l6) cluster analysis is shown in Figure 23b, where the coefficient of the evaporation rate is 0.96. Therefore, the clustering should determine different operation levels to make efficient operating decisions and accurate predictions. Furthermore, this prediction should produce scientific meaning by representing the actual object in the best way [20]. However, the research of Guldal et al. does not discuss the relative error or a comparison between methods since the research only uses the non-parametric approach.

Trend-Based Time Series Data Clustering Using Statistical Model
The wind prediction method has been studied and revised with multiple hybrid methods to simplify and increase the accuracy of the algorithm. Kushwah et al. studied wind forecasting by using a time series. Wind components such as seasonal trends can be monitored in the time series application. In this research, the clustering method was based on the seasonal trend. As shown in Figure 24 below, the proposed model for wind speed prediction uses the trend as the major component during the study [21].
The study used standard deviation for data analysis. The result from the standard deviation analysis was then converted into a time series for clustering purposes. The wind prediction was evaluated in four models: the autoregressive integrated moving average (ARIMA), the generalized autoregressive score (GAS), a hybrid model of C-ARIMA, and a hybrid model of C-GAS. The finding was that both hybrid models performed better than the original ARIMA and GAS models in terms of forecasting wind trends. Figure 25, in the left, middle, and right panels, shows the wind speed prediction using the GAS model for the first, second, and third clusters. The result also shows that the mean absolute error (MAE) and root mean square error (RMSE) for the hybrid models are lower than those of the original models, as shown in Tables 4 and 5 below. The bolded numbers in the tables are the lowest error values obtained during the analysis [21].
The study above, however, did not reveal the result of the wind clustering and only reviewed the precision of both hybrid methods of wind forecasting.
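The two error measures used to compare the forecasting models can be sketched in Python as follows. The function names and the short observed and forecast series are hypothetical and serve only to illustrate the standard definitions of MAE and RMSE.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical observed and forecast wind speeds (m/s).
observed = [3.0, 5.0, 4.0]
forecast = [2.0, 5.0, 6.0]

print(mae(observed, forecast))
print(rmse(observed, forecast))
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE for the same forecast, and a large gap between the two indicates a few large errors.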
In 2019, research on the mean daily wind speed in the Komsberg area of South Africa was conducted by Vuuren and Vermeulen. The study focused on clustering the mean daily wind speed and comparing it with the customers' demands. The research then further analyzed the tariff to optimize the siting areas for wind energy facilities. The study used multiple clustering algorithms to cluster wind resource datasets. The algorithms used were k-means, partitioning around medoids (PAM), the clustering large application algorithm (CLARA), agglomerative clustering, the divisive analysis algorithm, and fuzzy c-means clustering. The research also used the Euclidean distance and the Pearson correlation as distance measures. The research used the standard deviation method to obtain the mean high wind speed. Figure 26 shows the daily mean, median, and variance characteristics of the wind speed profiles for the REDZs for the 2013 period, using the standard deviation method [22].
The research used a dendrogram to show the cluster assignment obtained by using the hierarchical agglomerative algorithm. Figure 28 below shows the tree-like clustering structure used to represent the four clusters assigned to the data based on the clustering method. Therefore, the mean wind speed can be visualized by the tree diagram and is easy to understand.
Based on the clustering analysis, Table 6 below shows the validation result of the research. The result shows that the PAM and CLARA algorithms gave the best validation result. It was found that the CLARA algorithm reduced the computing time for the large datasets without decreasing the accuracy. The CLARA algorithm also gave the highest silhouette coefficient. Therefore, it was concluded that the CLARA algorithm was the most suitable method to use in this research.
In 1996, Kaufmann et al. used the hierarchical clustering method in research in which the wind speed was an absolute value with vector differences at the station. The research took place over a duration of one year in the city of Basel. The period reflected the diurnal and seasonal airflow variation in the complex terrain. The study analyzed the normalized hourly mean of the wind fields. The distances measured for the study were defined as the mean absolute values of the vector differences at all the stations involved [23]. The study is comparable to the study of Gassmann et al., in which they used Ward's clustering method with Euclidean distances [24].
However, Ward's method was found to be unsuitable for use in the study. Therefore, the study used the complete linkage method (Anderberg), which tended to build groups of similar size but focused on the ranking of the distances. The study found that 15 clusters could be produced based on the analysis using complete linkage clustering. A clear diurnal variation of wind patterns was observed, and it fit with the physical mechanism of the mountain valley wind and the characteristics of the sample of the cluster for normalized wind vectors obtained during the study, as in Figure 29 below. The research, however, did not discuss the error analysis of the method used [23].
A separate study found five different wind patterns by using a two-step clustering analysis in the city of Cartagena. The analysis clustered the wind direction into five clusters. For example, the first cluster found that 6.5% of the cases of wind direction were north-northwest and north. The second cluster had wind of a south-southwest and south direction, which comprised 24.7% of the data. The method used in this research was a two-step clustering analysis procedure that used the hierarchical (average linkage) and non-hierarchical (k-means) methods [25].
Other algorithms have also been combined for wind studies, such as the density-based spatial clustering of applications with noise (DBSCAN) and the autoregressive integrated moving average (ARIMA). Dokuz et al. used both the DBSCAN and the ARIMA algorithms in their research on wind speed forecasting. The study found that using both methods provided a better performance than using a single method. In addition, the hybrid method decreased the root mean square error (RMSE) by up to 20% [26].

Recommendation and Conclusions
As mentioned in the above topics, there are many methods of clustering used to cluster wind speed. The non-parametric hierarchical clustering using the mutual neighbor distance algorithm is a complex method of clustering that gives an acceptable result. The method supported efficient operating decisions and accurate predictions during the research [20].
The trend-based time series clustering shows that the method produces excellent accuracy. Even though the research focuses on forecasting the wind speed, the study shows that the wind speed can be clustered according to its trend. This was shown in the research of Kushwah et al. for the yearly trend. Therefore, the trend can be predicted as it follows a seasonal pattern, and the application of this research is good for research with a localized wind speed trend prediction. The clustering using the trend-based method was successfully shown in the research of Vuuren et al., where the researchers clustered the mean daily wind speeds for the high demand season using the clustering large application algorithm (CLARA).
However, there are two main methods that wind clustering researchers usually use: the k-means and the Ward methods. Both methods are based on the k-value to determine the partition size of the cluster. The cluster size is important to the researcher when determining the number of desired clusters according to the research objective.
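For readers less familiar with Ward's method, its agglomerative procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: every observation starts as its own cluster, and at each step the two clusters whose merger gives the smallest increase in the within-cluster sum of squares are merged. The function names and the one-dimensional wind speeds are hypothetical, and the quadratic search over cluster pairs is chosen for clarity rather than speed.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(members):
    return tuple(sum(values) / len(members) for values in zip(*members))

def merge_cost(ca, cb):
    """Increase in the within-cluster sum of squares if ca and cb are merged."""
    na, nb = len(ca), len(cb)
    return na * nb / (na + nb) * sq_dist(centroid(ca), centroid(cb))

def ward_clusters(points, k):
    """Merge clusters with the smallest Ward cost until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = merge_cost(clusters[a], clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical mean wind speeds (m/s) at five sites.
speeds = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,)]
three = ward_clusters(speeds, 3)
two = ward_clusters(speeds, 2)
print(three)
print(two)
```

In this example the two-cluster result is obtained by merging two of the clusters from the three-cluster result, which illustrates the nested structure of the partitions that a hierarchical method produces.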
For the k-means method, the algorithm gives no guidance on the number of clusters k. Ward's method, however, gives partitions for a range of sizes, and its partitions of size k and k + 1 are nested, since one is obtained from the other by merging two clusters. Because of this constraint, Ward's method does not produce a sum of squares as small as that of the k-means method. Between the two, Ward's method gives more accurate results than the k-means method. The trade-off for this accuracy is that, due to its complexity, Ward's method takes more time to calculate, although it shows less error, as shown in Tables 8 and 9.
Therefore, with regard to the accuracy of wind clustering, Ward's method shows higher precision compared to the other clustering methods. The method is also easily applied to numerous parameters, such as speed, direction, frequency, and others, to suit the researcher's target objectives. This paper focuses on the best method of wind clustering according to wind speed. It was found that, to cluster wind speed at a particular location and over a period of time, the clustering should be able to segregate a time lapse, such as with wind speed trend clustering. Table 10 below shows the comparison of each method discussed in this research.
It is concluded that, in terms of accuracy, readability in machine learning software, and suitability for larger datasets, the most suitable method to cluster the wind trend nationally is the Linkage-Ward clustering method. The selection of the Linkage-Ward clustering method is due to the impact of the result and its accuracy. Although the calculations using Ward's method are more complex than those of the other methods, the complexity can be ignored because of the impact of the result. The result of the research aims to create a guideline for researchers, engineers, and wind experts to improve knowledge and design, especially regarding wind speed trends. The findings will have an impact on the civil design, wind harvesting, and weather safety sectors.

Figure 1. The direction of the northeast and southwest monsoons in Peninsular Malaysia [1].

Figure 6. Monthly mean wind speeds at (a) onshore and (b) offshore locations of Qatar at a height of 90 m from 1979 to 2018 [11].

Figure 7. Monthly mean hourly wind speed in 2008 from July to October [12].

However, for engineering purposes, the wind trend observation focused more on the sudden spike in wind speed and the highest wind speed recorded in the research area. The research conducted by Shanmugasundaram et al. in 1998 was based on the tropical cyclone wind conditions which occurred in June and December 1996. The research produced a wind trend observation of the cyclone which indicated the highest mean and maximum wind speeds recorded during the event. The wind speed trend observation helped the researchers to locate the maximum wind speed during the event and to calculate the damping ratio increase for the 52 m steel lattice tower. Figure 8 below shows the mean and maximum wind speeds during the cyclone of the year 1996 [13].

Figure 8. Mean and maximum wind speeds during cyclone [13].

The research which was based on air pollution also used wind trend analysis to simulate the severity of the pollution affecting the area. The direction and speed of the wind play an important role in the air pollution effect. In 2016, Sokolov et al. conducted a cluster analysis of the atmospheric dynamics and pollution transport in the coastal area of industrialized Dunkerque in northern France. The research aimed to determine the
below shows the average wind speed value sample at a 40 m height with 10 min intervals in the study area.The color lines indicate 50 days chosen randomly by the researchers [15].

Figure 13. The error for different numbers of clusters [15].

Figure 17. The star glyph plots created for visualizing multidimensional wind speed data [16].


Figure 20. Variation of normalized BIC value with number of clusters when M2 meridional and zonal winds are grouped into k clusters at each height [18].

Figure 21 shows the hierarchical clustering algorithm applied to a two-dimensional dataset. Figure 21a shows seven observations, labelled A, B, C, D, E, F, and G, in three clusters. In Figure 21b, the dendrogram shows the grouping of the seven patterns and the similarity levels of the observations. Figure 21b also shows that the clustering can be broken into multiple levels. For example, level 1 comprises three clusters: (A, B, and C), (D and E), and (F and G) [20]. The mutual neighbor distance (MND) used by this study is illustrated in Figure 22 and defined by Equation (5) below:

$$\mathrm{MND}(x_i, x_j) = \mathrm{NN}(x_i, x_j) + \mathrm{NN}(x_j, x_i) \quad (5)$$

where $\mathrm{NN}(x_i, x_j)$ is the neighbor number of $x_j$ with respect to $x_i$. Figure 22 shows an example of MND. The nearest neighbor of A is B, and the nearest neighbor of B is A. Therefore, NN(A, B) = NN(B, A) = 1, and the MND between A and B is 2 according to Equation (5). NN(B, C) = 1 and NN(C, B) = 2; therefore, MND(B, C) = 3.
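The MND calculation can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in [20]: the point coordinates and the helper names `neighbor_rank` and `mnd` are assumptions chosen so that A and B are mutual nearest neighbors and C lies farther away, as in the example above. Ranks follow the definition given with Equation (5), and because MND is symmetric, the sum does not depend on the order of the two points.

```python
import math

def neighbor_rank(points, i, j):
    """Neighbor number of point j with respect to point i (1 = nearest)."""
    others = [k for k in points if k != i]
    # Sort the other points by Euclidean distance from point i.
    others.sort(key=lambda k: math.dist(points[i], points[k]))
    return others.index(j) + 1

def mnd(points, i, j):
    """Mutual neighbor distance: NN(x_i, x_j) + NN(x_j, x_i)."""
    return neighbor_rank(points, i, j) + neighbor_rank(points, j, i)

# Hypothetical coordinates (assumption): A and B are close, C is farther out.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.5, 0.0)}

mnd_ab = mnd(points, "A", "B")
mnd_bc = mnd(points, "B", "C")
print("MND(A, B) =", mnd_ab)
print("MND(B, C) =", mnd_bc)
```

With these assumed coordinates the sketch reproduces the two values quoted above, MND(A, B) = 2 and MND(B, C) = 3. Ties in distance are broken by insertion order here, which a real implementation would need to handle explicitly.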

Figure 22. A and B are more similar than A and C.

Figure 23. The dendrogram depends on the hierarchical single linkage for the second application (a) and detail of similarity level S6 (l6) cluster analysis (b) [20].

However, the research of Guldal et al. does not discuss the relative error or a comparison between methods, since the research uses only the non-parametric approach.

2.2.4. Trend-Based Time Series Data Clustering Using Statistical Model

The wind prediction method has been studied and revised with multiple hybrid methods to simplify the algorithm and increase its accuracy. Kushwah et al. studied

Figure 25. The wind speed prediction using the GAS model on dataset #1 [21].

Figure 26. Boxplot showing the daily mean, median, and variance characteristics of the wind speed using the standard deviation method [22].

The research used three clustering methods: the k-means algorithm, the partitioning around medoids (PAM) algorithm, and the clustering large applications (CLARA) algorithm. The k-means clustering algorithm produced non-overlapping clusters for the Komsberg wind speed profile. Figure 27 below shows a 2D representation of the variables through principal component analysis.
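As a rough illustration of the k-means step named above, the following sketch runs Lloyd's algorithm on scalar data. It is not the procedure or the data of [22]: the function name `kmeans_1d`, the daily mean wind speeds, and the fixed starting centers are all assumptions made to keep the example small and deterministic. PAM and CLARA are not shown.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalar data; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # Converged: the centers no longer move.
        centers = new_centers
    return centers, labels

# Hypothetical daily mean wind speeds in m/s (assumption, not data from [22]).
daily_mean_speed = [2.0, 2.2, 2.4, 7.8, 8.0, 8.3]
centers, labels = kmeans_1d(daily_mean_speed, centers=[2.0, 8.3])
print(centers, labels)
```

On this toy input the low-speed and high-speed days end up in separate, non-overlapping clusters. In practice the starting centers are usually chosen at random or by a seeding scheme, so results can vary between runs.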

Figure 28. Dendrogram representation of the tree-like structure obtained with the hierarchical agglomerative algorithm [22].

Figure 29. Cluster averages of normalized wind vectors at all measurement sites for (a) cluster 5 and (b) cluster 14. "C" labels the station on the TV tower at St. Chrischona [23].

2.2.6. Other Methods of Data Clustering

Angosto et al. conducted a wind clustering analysis to predict atmospheric pollution.
below, produced by Azizi et al. in 2019.

Table 4. MAE and RMSE values using the ARIMA and clustered ARIMA models [21].

Table 5. MAE and RMSE values using the GAS and clustered GAS models [21]. MAE: mean absolute error; RMSE: root mean square error; GAS: generalized autoregressive score. Bold numeric values of MAE and RMSE indicate that the prediction model corresponding to that column has the least prediction error and performed best on the dataset representing that row.

Table 6. Validation result for the various clustering algorithms.

Table 7 below summarizes the results obtained in the study of Kaufmann et al., where 15 clusters were found based on the criteria given [23].

Table 8. Time of clustering with different methods [15].

Table 9. Relative error between cluster members and their centers in different methods [15].

Table 10. Comparison table on clustering method.