Suitability Assessment of Weather Networks for Wind Data Measurements in the Athabasca Oil Sands Area

The Athabasca Oil Sands Area (AOSA) in Alberta, Canada, is considered to have a high density of weather stations. Therefore, our objective was to determine an optimal network for wind data measurement that could sufficiently represent the wind variability in the area. We used the available historical data records of the weather stations in three networks in the AOSA, i.e., the oil sands monitoring (OSM) water quantity program (WQP) and the Wood Buffalo Environmental Association (WBEA) edge sites (ES) and meteorological towers (MT) of the air program. Both graphical and quantitative methods were implemented to find the correlations and similarities in the measurements between weather stations in each network. The graphical method (wind rose diagram) was found to be a functional tool for understanding the patterns of wind directions, but it was not appropriate for quantifying and comparing the wind speed data of weather stations. Therefore, we applied the quantitative methods of the Pearson correlation coefficient (r) and absolute average error (AAE) to find the relationship between the wind data of station pairs, and the percentage of similarity (PS) method to quantify the closeness/similarity. In the correlation analyses, we found weak to strong correlations in the wind data of OSM WQP (r = 0.04–0.69) and WBEA ES (r = 0.32–0.77), and a strong correlation (r = 0.33–0.86) in most of the station pairs of the WBEA MT network. In the case of AAE, we did not find any acceptable value within the standard operating procedure (SOP) threshold when logically combining the values of the u and v components together. In the similarity analysis, only minor similarities were identified between the stations in the three networks. Hence, we presumed that all weather stations would be required to measure wind data in the AOSA.


Introduction
Wind is an important atmospheric element for describing current weather conditions and predicting future ones. It transports heat and moisture from one place to another, and therefore, weather conditions vary with shifts in wind speed and direction. Both wind speed and wind direction are critical for monitoring and predicting weather patterns and climate from the global to the local scale. The wind blows due to uneven heating of the Earth's surface by the Sun (solar radiation). In this process, the Sun heats the Earth's surface and warms up the surface air. The warm air becomes less dense and tends to rise, creating a low-pressure zone. Subsequently, denser cold air from the surrounding high-pressure zone blows toward the low-pressure zone due to the pressure gradient, and that causes surface wind [1]. The surface wind recorded at a weather station is directly related to the characteristics of the landscape of the site, i.e., latitude, the roughness of the terrain, surrounding vegetation, and any elevated surface structures [2,3].
10 and 20 km/h, respectively. Here, the wind data were 5° with 10 km/h and 5° with 20 km/h for station A and station B, respectively, and they are not similar. Therefore, it would be appropriate to use both components (speed and direction) together as a single entity in a wind data analysis to find similarities. We also observed in the literature that several approaches used both wind components for comparing two wind datasets derived from two weather stations and assessing their similarity or closeness [28,29]. These analyses could be grouped into two major categories: graphical representation and quantitative measures. The wind rose diagram, a graphical representation, shows the frequency distribution of wind speed and direction over a certain period for any weather station [30].
In contrast, the quantitative measures comprised analyses of association and of coincidence applied to the u (zonal) and v (meridional) components of the two-dimensional (2D) surface wind vector, i.e., the wind speed resolved along the x-direction (east-west) and y-direction (north-south), respectively [28,29,31]. Examples of association-related measures were the Pearson correlation coefficient (r), coefficient of determination (R²), Spearman's correlation coefficient (R_S), Nash-Sutcliffe efficiency (E), and cosine similarity (Cosθ) [29,32], and the coincidence-related measures were the absolute average error (AAE), relative difference (RD), mean squared error (MSE), root mean square error (RMSE), and bias (B) [1,33,34].
In this study, we considered the approaches that used both the u and v components of the surface wind data together as vectors for estimating the similarity or closeness between two datasets of the station pairs in the AOSA. In the literature, the graphical measure, i.e., the wind rose diagram, provided a very good visual for comparing the datasets between stations; however, it was not possible to derive any quantitative estimates from it. Therefore, we also opted to apply the measures of association (r, R², R_S, E, and Cosθ) and coincidence (AAE, RD, MSE, RMSE, and B). Among the association-related measures, r was widely used in the literature for its capacity to determine the strength and direction of a relationship [32]. AAE was also broadly used among the coincidence-related measures because it provides a more natural measure of the average error and is relatively simple to calculate [35]. These two measures (i.e., r and AAE from the two groups) were found sufficient to quantify how well one of two datasets predicts the other [36] but did not estimate the number of similar values in each station pair. Additionally, we did not find any approach in the literature that integrated the error of the measuring instrument to quantify the similarity in the wind data. Therefore, we set our overall goal for this study to perform a similarity analysis of historical wind data records of the weather stations in the AOSA and identify the minimum number of weather stations required for wind measurements by integrating the instrumental error. The specific objectives to fulfill the overall goal were as follows:
i. Evaluation of graphical and quantitative measures on wind data among weather stations to identify the best representative ones for a similarity analysis;
ii. Calculation of the percentage of similarity in the wind data records using the best measures and integrating the instrumental errors to find the correlations among the weather stations; and
iii. Determination of optimal weather networks for wind data measurements in the study area based on the estimated percentage of the similarity analysis.

Study Area
Our study area was the Athabasca oil sands area, which is in the lower Athabasca River Basin of Northern Alberta, Canada (Figure 1). The Athabasca River drains through the area from the southwest towards the north. The landscape varies from upland boreal forests to poorly drained wetlands within the lowland regions [37]. The area is in a subarctic climatic regime with an average annual air temperature from 0.7 to 1 °C and four seasons: a long cold winter, a short wet summer, and a short spring and fall. The spring and fall seasons receive a considerable amount of precipitation in terms of snowfall, with an annual total of 376-456 mm. Here, the driest months are from November to April, and the wettest is July. The yearly average wind speed is 9.6 km/h according to the 1961 to 1990 climate normal, where the highest is during spring (10.7 km/h) and the lowest in winter (8.8 km/h) [38]. According to the climate normals from 1981 to 2010, our study area receives an average annual solar radiation of 108-128 W/m² [39] and records annual averages of atmospheric pressure of 96.9-97.2 kPa, relative humidity of 40.1-87.5%, and snow depth of up to 30 cm [40].
Climate 2022, 10, 10
For monitoring purposes, three networks of weather stations (i.e., OSM WQP, WBEA ES, and WBEA MT) measure the wind speed and direction, among other meteorological parameters, in the study area. The networks span 109° W to 114° W longitude and 56° N to 58° N latitude (Figure 1). Here, the altitudes of the weather stations range over 294-559 m, 299-520 m, and 256-626 m above mean sea level for the OSM WQP, WBEA ES, and WBEA MT networks, respectively.

Data Availability
We collected the available wind speed and wind direction data for 17 stations of the three weather networks from the OSM WQP of Alberta Environment and Parks (AEP) and WBEA for this study. The measurement height, frequency (averaging window), and period of record of each station are shown in Table 1.

Graphical Measure
We prepared wind rose diagrams as a graphical visual measure that presents the frequency distribution of both the wind speed and direction data for a period of interest. Here, we synthesized wind rose plots of station pairs in each weather network for each year and compared them side by side to visualize the dynamics in the wind patterns.
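As an illustration only, the frequency distribution that underlies a wind rose can be tabulated as the percentage of observations in each direction sector and speed class. The sketch below assumes 16 direction sectors centred on north and speed class edges of 0, 5, 10, 20, and 30 km/h; these bins and the function name `wind_rose_table` are hypothetical choices for the example and are not taken from the study.

```python
import numpy as np

def wind_rose_table(speed, direction_deg, n_sectors=16,
                    speed_edges=(0, 5, 10, 20, 30, np.inf)):
    """Percentage of observations per (direction sector, speed class).

    The sector count and speed class edges are illustrative assumptions.
    """
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction_deg, dtype=float) % 360.0
    width = 360.0 / n_sectors
    # Centre sector 0 on north so that 350 deg and 10 deg share one bin.
    sector = np.floor(((direction + width / 2.0) % 360.0) / width).astype(int)
    # Interior edges only: class 0 is below the first edge, the last class is open-ended.
    speed_class = np.digitize(speed, speed_edges[1:-1])
    table = np.zeros((n_sectors, len(speed_edges) - 1))
    np.add.at(table, (sector, speed_class), 1)  # count observations per cell
    return 100.0 * table / speed.size

if __name__ == "__main__":
    freq = wind_rose_table([3, 12, 12, 25], [350, 10, 90, 180])
    print(freq.sum(axis=1))  # percentage of observations per direction sector
```

Two stations can then be compared sector by sector, although, as with the plotted diagrams, such a table describes each station separately rather than the agreement of paired observations.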

Quantitative Measures
We resolved the wind speed data into scalar quantities (i.e., the u and v components along the x- and y-directions, respectively) to find the correlation between the measurements of the two stations in each station pair in a weather network, as shown in Figure 2. In wind direction measurements, direction refers to the angle from which the wind comes. Here, positive and negative values of the u component indicate wind coming from the west and east, respectively. In the case of the v component, positive and negative values indicate wind coming from the south and north, respectively. Therefore, we first transformed the wind direction measurements into the mathematical convention for resolving the scalar wind quantities. Here, we computed the math direction as 270 minus the measured wind direction and added 360 to the negative values. Finally, we computed the u and v components for each station's data and then compared the stations in each station pair using the measures of both association and coincidence (see Section 3.2.1. Association-Related Measures and Section 3.2.2. Coincidence-Related Measures).
Figure 2. Derivation of the x-direction (east-west) and y-direction (north-south) components of the wind data.
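The direction conversion and component derivation described above can be sketched as follows. The conversion (270 minus the measured direction, plus 360 for negative results) follows the text; the relations u = speed × cos(math direction) and v = speed × sin(math direction) are the standard resolution of a vector and are assumed here because the derivation is given only in Figure 2. The function name `wind_components` is hypothetical.

```python
import numpy as np

def wind_components(speed, met_direction_deg):
    """Resolve wind speed and meteorological direction into u and v.

    met_direction_deg is the direction the wind comes FROM, in degrees
    clockwise from north. The cos/sin resolution is an assumed standard form.
    """
    speed = np.asarray(speed, dtype=float)
    met_dir = np.asarray(met_direction_deg, dtype=float)
    # Mathematical convention: 270 minus measured direction, +360 if negative.
    math_dir = 270.0 - met_dir
    math_dir = np.where(math_dir < 0.0, math_dir + 360.0, math_dir)
    theta = np.deg2rad(math_dir)
    u = speed * np.cos(theta)  # positive: wind coming from the west
    v = speed * np.sin(theta)  # positive: wind coming from the south
    return u, v

if __name__ == "__main__":
    # Winds of 10 km/h coming from the west, south, east, and north.
    u, v = wind_components([10, 10, 10, 10], [270, 180, 90, 0])
    print(np.round(u, 6), np.round(v, 6))
```

With this sign convention, a westerly wind gives a positive u and a southerly wind gives a positive v, which matches the description in the text.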

Association-Related Measures
We performed a set of association-related measures, such as R², r, R_S, Cosθ, and E, on the entire dataset according to the following equations (Equations (1)-(5)).
where D1 and D2 are the observational data recorded at Station A and Station B, respectively, the number of observations is n, the residual sum of squares is RSS, and the total sum of squares is TSS.

Coincidence-Related Measures
We applied several coincidence-related measures, including MSE, AAE, RMSE, B, and RD, which are given by the following equations (Equations (6)-(10)). All the symbols used in these equations have the same meaning as in Section 3.2.1. Association-Related Measures.
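A minimal sketch of the coincidence-related measures is given below, again using common definitions that are assumed rather than copied from Equations (6)-(10). The sign convention of B (D1 minus D2) and the form of RD (total absolute difference as a percentage of the total absolute value of D1) are assumptions made for the example, and `coincidence_measures` is a hypothetical name.

```python
import numpy as np

def coincidence_measures(d1, d2):
    """Common forms (assumed) of MSE, AAE, RMSE, B, and RD."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    diff = d1 - d2
    mse = np.mean(diff ** 2)
    aae = np.mean(np.abs(diff))        # absolute average error
    rmse = np.sqrt(mse)
    bias = np.mean(diff)               # signed: positive and negative errors cancel
    # Assumed form of RD: total absolute difference relative to total |D1|, in percent.
    rd = 100.0 * np.sum(np.abs(diff)) / np.sum(np.abs(d1))
    return {"MSE": mse, "AAE": aae, "RMSE": rmse, "B": bias, "RD": rd}

if __name__ == "__main__":
    print(coincidence_measures([1, 2, 3, 4], [2, 2, 2, 2]))
```

Because B keeps the sign of each difference, it can be small even when AAE is large, whereas MSE and RMSE grow monotonically with the size of the errors.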

Determination of the Best Representative Measures
We identified a representative measure from each group of the association and coincidence measures to minimize the ambiguity of using several measures. Here, we executed linear regressions among measures in each group to identify the representative measure. In this process, we also identified outliers where the points were significantly away from the best fit regression line [41].
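The selection step can be sketched as a simple linear regression of one measure against a candidate representative across all station pairs, reporting the R² of the fit. The outlier rule used below (residuals larger than k standard deviations of the residuals) is an assumption for illustration; the criterion of [41] is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def regression_r2(x, y):
    """Fit y = slope * x + intercept and return (R2, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, residuals

def flag_outliers(residuals, k=3.0):
    """Flag points far from the fitted line (assumed k-sigma rule)."""
    residuals = np.asarray(residuals, dtype=float)
    return np.abs(residuals) > k * np.std(residuals)

if __name__ == "__main__":
    # Hypothetical values of two measures over five station pairs.
    candidate = [0.1, 0.3, 0.5, 0.7, 0.9]
    other = [0.12, 0.28, 0.55, 0.69, 0.88]
    r2, res = regression_r2(candidate, other)
    print(round(r2, 3), flag_outliers(res))
```

A candidate that yields a high R² against every other measure in its group can stand in for that group in the subsequent analysis.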

Similarity Analysis
We conducted a similarity analysis (i.e., percentage of similarity, PS) on the wind data of the station pairs using the acceptable values of instrumental error recommended in the standard operating procedure (SOP) [42,43], according to the following equation (Equation (11)):
\[ PS = \frac{N2}{N1} \times 100 \qquad (11) \]
where N1 refers to the total data count, and N2 is the number of data counts for which the absolute difference between D1 and D2 is within ±0.5 m/s (1.8 km/h) for the operational wind speed and ±5° for the wind direction, as suggested in the SOP.

Wind Rose Diagram
From the visual comparison of the wind rose diagrams (Figures 3-6), we observed significantly different patterns of wind magnitude (speed) and direction among stations. Such variability is likely due to the location of the stations at different altitudes [44] with variations in site characteristics, including the surface roughness and the size, shape, and height of the surrounding vegetation and structures [2,3]. It was straightforward for us to visually identify the dominant wind direction from wind rose diagrams but not the wind speed. Moreover, we could not quantify the similarity among the wind data measured in the stations from the wind rose diagrams (Figures 3-6).

Measures of Association
Our scatter plots of the association-related measures (i.e., R², R_S, Cosθ, and E) against r for the u and v components of the entire wind dataset (see Figure 7a,b) indicated that r could be a representative measure because of its strong association with the other measures (i.e., R² > 0.71). Research studies indicated that R² ≥ 0.50 was significant and acceptable [45,46], and R² ≥ 0.70 showed a strong association between two variables [47,48]. Note that we used only positive values of E in this analysis, because values less than zero (i.e., negative values) indicated an unacceptable model performance [49,50]. Therefore, we decided to use r as the representative of the measures of association in finding further similarities between two datasets of station pairs.

Measures of Coincidence
Our scatter plots of the coincidence-related measures (i.e., AAE against MSE, RMSE, B, and RD) for the u and v components of the wind datasets (see Figure 8a-d) showed that AAE was strongly correlated with RMSE and MSE (R² > 0.99). However, we found very low to insignificant relationships of AAE with RD and B. This was because RD is a percentage error (i.e., a ratio of the absolute difference), while AAE provides only the actual differences [45,46], and because the positive and negative differences cancel each other to give low values of B [51]. Hence, we used AAE as the representative of the coincidence-related measures in the subsequent analysis of similarity between two datasets of station pairs.

Relationship and Similarity Analysis
We identified two measures, r and AAE (see Sections 4.2 and 4.3), as the best representatives for determining the relationship between two datasets of the wind components for each station pair in the three networks (supported by the results from another study [36]). Here, we considered a relationship strong when r ≥ 0.70 [52,53] and AAE acceptable at ≤1.79 km/h (= 1.8 × cos 5°) and ≤0.16 km/h (= 1.8 × sin 5°) for the u and v components, respectively. These acceptable AAE values were computed from the acceptable absolute difference values recommended in the SOP, i.e., 1.8 km/h for the wind speed and 5° for the wind direction [42,43]. In addition, we required a PS value of at least 75% in the similarity analysis to consider the data values of the two stations in a station pair close.
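The acceptance criteria above, together with the percentage of similarity (PS), can be sketched as follows. The thresholds (1.8 km/h, 5°, r ≥ 0.70, PS ≥ 75%) come from the text. Several details are assumptions made for the example: PS is evaluated on paired speed and direction records, direction differences are wrapped around 360°, and r is required to be strong for both components. The function names are hypothetical.

```python
import math
import numpy as np

SPEED_TOL_KMH = 1.8  # SOP tolerance for wind speed (0.5 m/s)
DIR_TOL_DEG = 5.0    # SOP tolerance for wind direction
AAE_U_MAX = SPEED_TOL_KMH * math.cos(math.radians(DIR_TOL_DEG))  # about 1.79 km/h
AAE_V_MAX = SPEED_TOL_KMH * math.sin(math.radians(DIR_TOL_DEG))  # about 0.16 km/h

def percentage_of_similarity(speed1, dir1, speed2, dir2):
    """PS = 100 * N2 / N1 for paired records of two stations."""
    s1, s2 = np.asarray(speed1, dtype=float), np.asarray(speed2, dtype=float)
    d1, d2 = np.asarray(dir1, dtype=float), np.asarray(dir2, dtype=float)
    speed_ok = np.abs(s1 - s2) <= SPEED_TOL_KMH
    # Assumption: treat direction as circular, so 358 deg and 2 deg differ by 4 deg.
    dir_diff = np.abs(d1 - d2) % 360.0
    dir_diff = np.minimum(dir_diff, 360.0 - dir_diff)
    dir_ok = dir_diff <= DIR_TOL_DEG
    n1 = s1.size                               # total data count
    n2 = np.count_nonzero(speed_ok & dir_ok)   # records within both tolerances
    return 100.0 * n2 / n1

def pair_is_acceptable(r_u, r_v, aae_u, aae_v, ps, r_min=0.70, ps_min=75.0):
    """Logical AND of the r, AAE (u and v), and PS criteria."""
    return bool(r_u >= r_min and r_v >= r_min
                and aae_u <= AAE_U_MAX and aae_v <= AAE_V_MAX
                and ps >= ps_min)

if __name__ == "__main__":
    ps = percentage_of_similarity([10, 10, 10, 10], [358, 90, 180, 270],
                                  [11, 13, 10, 11.5], [2, 92, 190, 270])
    print(ps, round(AAE_U_MAX, 2), round(AAE_V_MAX, 2))
```

Because the v threshold is derived from sin 5°, it is roughly an order of magnitude tighter than the u threshold, which makes the combined criterion demanding.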

Correlation Analysis
OSM WQP network: We observed very weak to moderate correlations (i.e., r from 0.04 to 0.69) for both the u and v components of all the station pairs, except the station pair C2 vs. C4, which showed a strong correlation for the v component (r = 0.77) (see Table 2). In addition, we found that the AAE values were not acceptable for either the u (2.26-4.14) or v (2.84-5.38) components considering the required acceptable values of ≤1.79 and ≤0.16, respectively. Such weak-to-moderate correlations and unacceptable AAE values were likely due to factors associated with elevation differences, surface friction, and the surrounding vegetation of the stations [2,3,44].

WBEA ES network: For both the u and v components, r was generally satisfactory (from 0.32 to 0.77), but AAE was not (from 2.05 to 4.33 and 1.57 to 4.26 for the u and v components, respectively). Here, we found two station pairs (i.e., JE306 vs. JE312 and JE312 vs. JE316) where at least one component showed a strong correlation (i.e., r ≥ 0.70) and the other was very close to it (see Table 2). Nevertheless, the absence of any acceptable AAE value among the wind datasets was likely due to site-associated factors that cause the wind magnitude (speed) and direction to vary from place to place [54].
WBEA MT network: We found weak to strong correlation values (i.e., r) from 0.35 to 0.82 and 0.33 to 0.86 for the u and v components, respectively (see Table 3). In general, moderate to strong correlations were observed at heights of 16 m and above, which was probably due to the measurements being taken above the vegetation canopy with less interference from the surface roughness [55]. Here, AAE ranged from 0.14 to 0.36, 0.37 to 0.98, 0.63 to 1.43, and 0.93 to 1.83 at 2, 16, 21, and 29-m heights, respectively, for the u component and 0. 16 Table 4). Note that we considered the SOP values for both the u and v components together in this analysis.
Considering such low PS values, we determined that none of the station pairs showed any similarity. Such dissimilarity in wind datasets was likely due to altitude variations among the weather stations [44]. Landscape or hill forms and their steepness and orientation toward the wind would also potentially affect the wind speed and direction [56]. In addition, characteristics of the surrounding vegetation and topographic obstructions at the weather stations would be other factors [2,3]. Moreover, wind direction is difficult to compare even at the same place, because it is highly affected by a lack of synchronization between measurements that allows turbulent motion to make directions quite different [57]. Therefore, wind measurements from all weather stations in the three networks would be required to represent the observed variability in the study area.
Overall, we found variable relationships and similarities in the station pairs of each network: some correlations (i.e., r) in all the networks, no acceptable AAE for the OSM WQP and WBEA ES networks, some acceptable AAE values for the u or v components individually for the WBEA MT network, and no acceptable PS value for any network. Therefore, we did not find any station pair that was acceptable with the logical combination of the "r value" AND "AAE value of the u component" AND "AAE value of the v component" AND "PS value".
We noticed that some stations are spatially very close to each other in the study area but belong to different networks. Since we did not find any acceptable station pair within each weather network, we further analyzed r, AAE (both u and v components), and PS for the closest station pairs across the networks, as an example, with the hope of obtaining an acceptable correlation and similarity (see Table 5).
For such an analysis, we required the data of station pairs across the networks that were measured at the same height, because wind data measured at different heights should not be compared. The closest two station pairs across the networks were JP104 vs. R2 and JP316 vs. JE316, where all the stations measured data at a 2-m height. The analysis showed weak to moderate correlations (r = 0.23 and 0.60) with AAE values of 1.59 and 4.54 (for the u component) and 2.23 and 3.90 (for the v component) and PS values of 6.64 and 4.09% (see Table 5). While some correlation existed in the station pairs, the combination logic of the "r value" AND "AAE value of the u component" AND "AAE value of the v component" AND "PS value" did not identify any acceptable station pair across the networks. This indicated that wind data are more variable at the lower height (e.g., 2 m), where the wind direction might not vary much but the wind speed is greatly affected by the surrounding landcover, vegetation, topography, and other obstructions and therefore varies considerably.

Conclusions
Weather stations in the AOSA were found to be significantly closer to each other than recommended by the WMO. To understand the redundancy of the stations, we demonstrated that a wind rose diagram would be appropriate for visual comparisons of the wind datasets among weather stations to understand various patterns of the wind magnitude (speed) and direction in the networks. Comparing the dominant wind directions from wind rose diagrams for a station pair was straightforward, but it was difficult for the wind speeds. Therefore, it was not possible to quantify the similarity among the datasets of station pairs from the wind rose diagrams. However, a correlation analysis (i.e., r and AAE) and similarity analysis (i.e., PS) made it possible to quantify the relationship and similarity of the wind data considering the integrated u and v vector components. In the correlation analysis, we observed insignificant correlations in the OSM WQP and WBEA ES networks and a strong correlation in most of the station pairs of the WBEA MT network at heights of 16 m and above. However, none of the three networks showed any similarities between the wind data of the station pairs in the similarity analysis. As an example, we performed a similarity analysis between stations across networks for the two closest station pairs but found only minor similarities. Note that we did not perform a similarity analysis on the wind data measured across different heights of any station in the WBEA MT network, because the wind speed varies with the level of measurement from the ground. We concluded that all weather stations in the three networks would be required to measure the variability of wind in the study area. Nevertheless, we demonstrated that a similarity analysis could serve as a decision tool to rationalize/optimize weather stations in a network for wind data measurements.
This method of finding similarities would be applicable in optimizing a weather network to minimize the associated costs without sacrificing the scientific credibility of a monitoring program. However, we recommend evaluating these methods thoroughly before applying them to other weather networks in Canada and elsewhere during any decision-making process.

Funding: This research was funded by the Oil Sands Monitoring (OSM) Program of Alberta Environment and Parks (AEP). It was independent of any position of the OSM Program. The fund was awarded to Q.K.H. under agreement number 19GRAEM25. OSM had no role in the study design, data collection and analysis, decision to publish, and preparation of the manuscript.

Data Availability Statement:
The data used in this study are freely accessible and downloadable from their respective websites. The links are as follows: <http://www.ramp-alberta.org/data/map/default.aspx?c=Climate> (accessed on 1 December 2021) and <https://wbea.org/network-and-data/monitoring-stations/> (accessed on 1 December 2021).