Accuracy Assessment of WRF Model in the Context of Air Quality Modeling in Complex Terrain

: Output data from the Weather Research and Forecasting (WRF) model are frequently used in air quality modeling for scientiﬁc, practical and regulatory purposes. Therefore, it is crucial to determine whether the accuracy of WRF predictions is suitable for application in air quality models on a local scale (<50 km) and in complex terrain. The presented research is unique because, to assess the accuracy of the WRF model, data from experimental data sets for the assessment of air quality models were used, which contained information about the actual conditions of selected meteorological parameters along the vertical proﬁle of the atmosphere. The aim of the study was to conduct an evaluation of the WRF model using data derived from three ﬁeld experiments designated to conduct air quality model evaluation studies for models such as AERMOD, ADMS or CALPUFF. Accuracy evaluation was carried out in relation to the grid resolution, station location (on-site and weather airport) and vertical proﬁle of the atmosphere. Obtained results of the evaluation for temperature, wind speed and direction were analyzed with regard to the possibilities of application in air quality modeling systems. It was stated that the use of a grid with a resolution of 1 km generally resulted in statistically signiﬁcantly lower values of errors for wind speed compared to a 4 km resolution. The outcomes of simulations for temperature and wind speed were sensitive with regard to the location. In on-site locations (complex terrain) signiﬁcantly higher values of prediction errors (MB, MGE, RMSE) were obtained compared to the standard weather station locations (airport). In addition, wind speed predictions in on-site locations were generally biased (overestimated). Along the vertical proﬁle of the atmosphere, up to the altitude of 100 m a.g.l., statistically signiﬁcantly different outcomes of accuracy evaluation were achieved for wind speed and direction. Considering the above, caution should be exercised when using data from meteorological simulations in air quality modeling.


Introduction
Accurate air quality predictions on a local scale (<50 km) and in complex terrain constitute one of the scientific challenges due to, among others, the necessity of replicating local meteorological conditions while maintaining adequate computational efficiency. The fundamental problem associated with air quality modeling for impact assessments is the availability of representative meteorological data for the considered area, as noted by Tartakovsky et al. [1], Abril et al. [2], Karthick and Devadoss [3] or Ottosen et al. [4]. In many cases, the application of data from existing monitoring systems of meteorological parameters is not adequate. Spatial representativeness of the measurement data is limited. The use of meteorological data from sites located several to dozens of kilometers from the source may lead to unrepresentative results of dispersion modeling, especially in complex terrain [5,6]. Found in the literature, one of the solutions to this problem is to obtain data from an advanced meteorological model, e.g., WRF (Weather Research and Forecasting model) or MM5 (Fifth-Generation Penn State/NCAR Mescoscale Model). The importance of meteorological modeling in regional air quality assessment was emphasized by, among others, Sistla et al. [7], Gilliam et al. [8], Pielke and Uliasz [9], and McNider and Pour-Biazar [10].
Two of the most recognizable mesoscale meteorological models used for providing input to the air pollution models are the WRF [11] and MM5 [12]. However, the MM5 model is no longer supported and developed since 2005 and the results of research included in [1] provide evidence that for the purpose of air quality modeling the application of the simulation outputs from WRF is more effective. WRF was designed for general atmospheric research purposes as well as for operational forecasting. Its applications include, among many others, short-term meteorological forecasting [13,14], climatologic research [15,16], prediction of extreme events such as hurricanes [17,18], wind power forecasting [19,20] and air quality assessment and forecasting [2,5,[21][22][23][24][25][26][27][28][29][30]. Apart from regional applications, the results from the WRF model are implemented in air quality modeling systems on a local scale (<50 km) for air quality impact assessment, e.g., [3,5,[20][21][22][23][24] in the areas of urban agglomerations in order to assess air quality, determine the location of air quality monitoring stations and to quantify the contribution of each emission source in air quality deterioration [31][32][33][34][35][36][37]. Many of the aforementioned studies focused on conducting an air quality model assessment, but meteorological model assessment was occasionally neglected. In effect, conclusions from the studies are limited as they do not provide information about the accuracy of the input meteorological data.
According to the United States Environmental Protection Agency (U.S. EPA) Revisions to the Guideline on Air Quality Models [38], wind direction measurement error of 5-10 • results in uncertainty ranging from 20% to 70% of the concentration determination (in a given time and space) by Gaussian plume models. It means that relatively small modification of the meteorological parameter leads to exponential increase in the error of the surface air pollutant concentration. This effect is always crucial, since the application of meteorological data from weather prediction modeling systems in the process of air quality modeling may yield unrepresentative outcomes and lead to inadequate conclusions. It can also compensate for the errors in air quality predictions, which in turn prevent the actual assessment of both the structural and parametric uncertainty of the model. This issue is particularly significant on a local and micro scale, where reliable representation of meteorological conditions is essential and there are no on-site measurements available. Publicly accessible ISD [39] and IGRA [40] meteorological parameter measurements are generally conducted at the airports located on the cities' outskirts [4]. This problem was also addressed in [41], where default wind speed correction factor (f_roof) in the OSPM model was responsible for the underestimation of PM 10 and PM 2.5 particulate matter concentration predictions. The use of meteorological data from weather prediction models in air quality modeling (including radioactive pollutants) is therefore unavoidable on a local scale, which is confirmed by numerous studies that used data from meteorological models [21,23,35,[42][43][44][45][46][47]. On the other hand, many studies [2,5,6,48] demonstrated that the use of data from meteorological models results in decreased accuracy in predicted air pollutant concentration levels.
The above is reflected in the U.S. EPA Revisions to the Guideline on Air Quality Models [38]. The use of meteorological data from weather prediction models is allowed, but only when representative measurement data are not available or conducting meteorological measurements is not feasible. The application of these data must be preceded by an accuracy assessment. Noting the research on the evaluation of dispersion models on a local scale, including complex terrain, it should be emphasized that regardless of the study area, the best air quality predictions were obtained when a simultaneous measurement campaign of the meteorological parameters was conducted in the vertical profile of the atmosphere [5,6,49]. This implies that the extrapolation methods along the vertical profile of the atmosphere included in the air quality models for selected meteorological parameters do not yield reliable results. Therefore, it is particularly significant to pay attention to the accuracy of the representation of the vertical profile of meteorological parameters by weather prediction models, since the acquisition of such data for impact assessments of emission sources on air quality is expensive and probably not economically justified in many situations.
The aim of the study was to evaluate the accuracy of the WRF model in the scope of basic meteorological parameters used in the air quality modeling process (temperature, wind speed and direction) at different levels above the ground level on a local scale and complex terrain. The paper evaluated whether an increase in the density of the computational grid from 4 km to 1 km resulted in more accurate outcomes of the meteorological parameter predictions as, according to [50,51], an increase in the grid resolution does not always yield better wind field predictions. The accuracy assessment results of selected meteorological parameters were compared with regard to different locations of the sites (synoptic, on-site). It allowed for the examination of the stability of the weather prediction model subjected to varying local terrain conditions. The research experiment was designed in a way that allowed for the application of conducted simulations for other purposes, such as evaluation of air quality modeling systems using meteorological data from WRF simulations or parameterization of air quality models with regard to input meteorological data. Therefore, it was decided to apply three experimental dispersion model assessment datasets made available by the U.S. EPA [52]. These experiments are standardly used to evaluate the accuracy of the development versions of AERMOD and ADMS models. The choice of multiple experiments was also motivated by the conclusions in [53], where it was shown that it is not always feasible to obtain homogenous conclusions for the same air quality model with an analogous configuration for different study areas that have common features relevant to air quality modeling.

Study Area and Data Completeness
Study areas consisted of three experimental datasets included in the Model Evaluation Databases (MED), i.e., Martin Creek (MC), Lovett (LV) and Tracy Power Plant (TR). These databases were originally made available by the U.S. EPA [52] for the evaluation and development of air pollutant dispersion models used for regulatory purposes. Results of studies focused on the evaluation of dispersion models such as CALPUFF and AERMOD with the use of the aforementioned datasets were presented, among others, in [5,[53][54][55][56][57]. These areas of study were selected because they are characterized by complex terrain. As part of the experiments, measurements of selected meteorological parameters were conducted in the vertical profile of the atmosphere. Graphical representation of the study areas is shown in Figure 1, and the characteristics of the meteorological sites are presented in Table 1. stable atmospheric conditions dominated. Wind speed and direction measurements were conducted to the altitude of 150 m a.g.l. from an instrumented tower. Wind measurements were extended above 150 m using Doppler acoustic sounder. Temperature measurements were extended with a tethersonde [58,61]. In addition to the on-site meteorological data (site-specific location meteorological station for experiments) there were also weather stations within computational domains, from which measurements are available through the ISD database [39]. Measurement meteorological data from the MED (ONS for each experiment) and ISD (LVIA, WCA, SIA, RTIA) databases were characterized by a time resolution of 1 h.  As part of each experiment, measurements of temperature, wind direction and speed at various altitudes were carried out ( Figure 2). The Martin Creek experiment was conducted from 1 May 1992 to 19 May 1993 on the Pennsylvania/New Jersey border. Measurements of temperature, wind speed, wind direction at the altitude of 10 m a.g.l. were recorded from an instrumented tower, and the hourly measurements of wind speed at multiple heights were conducted using sodar [54,57,58]. The Lovett experiment was conducted from December 1987 to December 1988 in a rural area of New York State. Measurements of meteorological parameters were conducted at the heights of 50, 100 and 150 m a.g.l. from the meteorological tower [58,59]. The Tracy Power Plant experiment was carried out in Nevada State, approximately 27 km from Reno. It was part of the EPA Complex Terrain Model Development (CTMD) program [60]. Measurement period covered 128 h in August 1984, consisting of 14 measurement sub-periods during which stable atmospheric conditions dominated. Wind speed and direction measurements were conducted to the altitude of 150 m a.g.l. from an instrumented tower. Wind measurements were extended above 150 m using Doppler acoustic sounder. Temperature measurements were extended with a tethersonde [58,61]. In addition to the on-site meteorological data (site-specific location meteorological station for experiments) there were also weather stations within computational domains, from which measurements are available through the ISD database [39]. Measurement meteorological data from the MED (ONS for each experiment) and ISD (LVIA, WCA, SIA, RTIA) databases were characterized by a time resolution of 1 h.
Results of the analysis verifying completeness of meteorological data derived from standard meteorological stations (ISD) and experiment-specific measurements in the vicinity of the emission source (ONS) are presented in Table 2 and Figure 2, respectively. Data from the LVIA (MC) and RTIA (TR) stations were characterized by a high completeness at the level of minimum 90%. In the case of the Lovett experiment, completeness of data from weather stations was visibly lower, especially for wind direction records manifested by the availability of data at the level of 59% and 83% for the SIA and WCA stations, respectively. For the on-site measurements, conducted as part of field experiments, the completeness of data was significantly higher. In the case of measurements carried out at the altitude of 10 m a.g.l. data availability ranged from 98% to 100%. However, the completeness of data decreased with the increase in altitude to 40-50% at the highest measurement levels, depending on the experiment and meteorological parameter. It should be noted that, accordingly to Figure 2, in the case of the MC experiment there were no temperature measurements in the vertical profile of the atmosphere available. In the TR experiment wind speed was not measured at the heights of 15 m a.g.l. and 50 m a.g.l. and the wind direction was not available for the level of 75 m a.g.l.
Analyzed experiments are not homogenous. First of all, data from multiple levels along the vertical profile of the atmosphere are available for the TR experiment, but the measurement collection period was exceptionally short with 128 h of total data availability. For other experiments the measurement periods were at least one year long, but for the LV the data was collected only from the levels of 10, 50 and 150 m a.g.l. All experiments concern areas characterized by complex terrain, but they are strongly differentiated in terms of the measurement campaign period or the number of data collected. Therefore, caution should be exercised when comparing the results of accuracy assessment of the WRF model between experiments.

Model Setup and Data Processing
The WRF v3.8 model runs were initialized using the North American Regional Reanalysis (NARR) dataset (ds608.0) developed by the National Centers for Environmental Prediction (NCEP). The NARR model assimilates (3DVAR) observational data such as temperatures, winds and moisture from radiosondes or pressure data from surface observations as well as other measurements, including precipitation [62]. The output is available with a 3-h timestamp at 29 Eta pressure levels for the period of 1979-2023 and continuing. The land use information for the WRF runs was derived from the National Land Cover Data 1992 (NLCD 1992) with a native resolution of 30 m, which was assumed representative as all the experiments were conducted during the period of 1984-1993. Recent studies show that the impact of land use on the WRF model performance is not negligible but subtle [63,64]. However, urban areas tend to block incoming solar radiation and wind profiles which may impact the temperature, wind speed and direction [65] and their adequate representation is crucial for the model's performance in highly urbanized regions [66]. Therefore, the land use data from 1992 applied in the study should not be the main determinant of the differences between the obtained simulation results and observations.
Depending on the experiment, three (MC, LV) or four (TR) computational domains were set. The coarsest domain in the TR experiment had a resolution of 36 km and in both MC and LV 24 km. In all cases the innermost, one-way nested domains had the resolutions of 4 km and 1 km, as presented in Figure 1. Nesting was carried out using 3-6 grid ratio, since [67] found no significant differences between ratios in the range of 3-7 in relation to the model performance. Simulations were carried out at 32-43 pressure levels depending on the experiment. In order to better represent the boundary layer in regions with complex terrain, a higher number of levels of pressure was used compared to the studies presented in [68]. Details concerning domain setup in WRF model for different experiments are summed up in Table 3. Simulations were carried out for time periods corresponding to the specific experiments described in Section 2.1 with a 12-h model spin-up to develop smaller-scale convective features as noted by [69]. Results of the analysis verifying completeness of meteorological data derived from standard meteorological stations (ISD) and experiment-specific measurements in the vicinity of the emission source (ONS) are presented in Table 2 and Figure 2, respectively. Data from the LVIA (MC) and RTIA (TR) stations were characterized by a high completeness at the level of minimum 90%. In the case of the Lovett experiment, completeness of data from weather stations was visibly lower, especially for wind direction records manifested by the availability of data at the level of 59% and 83% for the SIA and WCA stations, respectively. For the on-site measurements, conducted as part of field experiments, the completeness of data was significantly higher. In the case of The WRF physics configuration for the experiments was set based on the literature findings and summarized in Table 4. Boundary layer physics were resolved using the YSU scheme [70], which was found to perform relatively better in terms of reproducing vertical wind speed profile and the 10-m wind speed compared to other schemes [71]. The microphysics, on the other hand, was resolved using the WSM6 scheme [72]. For the cumulus parameterization the Kain-Fritsch (new Eta) scheme [73] was selected, with no cumulus physics in the inner domains. The Noah LSM [74] selection for the land surface model was based on its ability to predict near-surface water mixing ratios [75], however it tends to miscalculate the boundary layer evolution in the morning hours [76]. For the longwave and shortwave radiation, the Rapid Radiative Transfer Model [77] and Dudhia [78] schemes were selected, respectively. The MM5 Dudhia scheme was found to perform slightly better in the terms of wind direction prediction than the Goddard and GFDL schemes [75]. Extraction of the WRF model simulation results was carried out using MMIF [79], which is designed to process meteorological data from WRF and MM5 for use in air quality models such as AERMOD, SCICHEM and CALPUFF. Application of this tool was necessary for the acquisition of meteorological data at defined altitudes a.g.l. WRF model simulates values of meteorological parameters on given pressure levels; therefore, for the comparison of outcomes in the vertical profile of the atmosphere it was essential to determine simulation values at given heights relative to the ground level. According to Jiménez and Dudhia [68,80], careful selection of grid points to assess model accuracy is important. In this case, the closest grid points were selected because the variation in terrain elevation within about 1 km of the station location was small. Terrain mapping for grid resolution ranging from 100 to 4000 m is presented in [53].

Methodology for Evaluating the Accuracy of WRF Model
The evaluation of the WRF model simulation results was conducted for temperature, wind speed and direction. Error (RMSE), Index Of Agreement (IOA). Equations for calculation of the model performance measures are presented below [81][82][83].
Model performance measures such as MB, MGE, RMSE return values in units of the evaluated parameter. A value of 0 indicates an ideal model. In the case of MB, caution should be exercised, as the value of 0 might mean that the negative and positive bias of predictions towards observations have balanced out. In contrast, for the NMB and NMGE a value of 0 always indicates an ideal model. Normalized values of systematic errors were applied in comparative analysis of selected parameters involving assigned groups where mean observed values differed, e.g., the change in the mean wind speed value with the altitude. In this case, Index of Agreement (IOA) indicates refined index of model performance, which was introduced in [83]. IOA is a dimensionless parameter with value in the range from −1 to 1, where 1 indicates an ideal model. The value of 0 means that the sum of model errors equals the sum of observations deviations in relation to their mean value. The value of 0.5 denotes that the sum of model errors constitutes half of the observed values deviations in relation to the mean value [82].
Emery et al. [81] proposed sets of benchmarks for the abovementioned statistical measures of the model performance for various atmospheric parameters from meteorological models used in air quality modeling. Benchmarks were proposed for wind speed Above values were determined based on multiple different PSU/NCAR simulations with MM5 mesoscale model but have been applied for WRF model as well [84][85][86]. In accordance with the interpretation included in [86] the above reference values are to be considered as guidelines rather than final tests of acceptance or rejection of the model. Outcomes of the meteorological evaluation may depend on time and location, which is de facto the subject of this study. Moreover, determination of the model evaluation criteria based on non-normalized parameters, only in the units of evaluated parameters, creates certain discomfort in terms of scientific correctness. It is difficult to assume that the accuracy of the model at the level of 1 m/s is equally relevant where the mean wind speed values and the sample variances at diverse locations are statistically significantly different. Application of homogenous criteria is not always justified, but they should be taken into account and the analysis results should be interpreted cautiously. The abovementioned guidelines were developed for the MM5 model; however they are used as a reference point for assessing the performance of meteorological models [76,87]. What is more, in accordance with U.S. EPA [88] these benchmarks should not be applied as indicators of model simulation acceptance. The results of statistical evaluation might point to the occurrence of specific, frequently local effects that require further examination. In this study, the evaluation of meteorological predictions was divided into 3 stages. In the first stage, outcomes from computational domains of different horizontal resolutions were compared. Deciding in the first place which resolution of the computational grid produces significantly lower deviations of meteorological parameters in relation to the observations assisted in narrowing down the range of presented outcomes in the next stages.
Second stage focused on the analysis of the results obtained for meteorological measurements carried out at the ground level. In this case, measurement data from field experiments (ONS), as well as synoptic stations (ISD) were available. The ONS sites, compared to the synoptic stations (ISD), were generally located in river valleys surrounded by hills. Therefore, the ONS sites were located in the areas defined as complex terrain according to the definition in [38]. In those locations, terrain effects influence greatly the characteristics of wind field, as well as transport and diffusion of air pollutants. Based on the observations concerning data completeness (see Section 2.1) in this stage the number of observations was limited in order to maintain an equal amount of observations between compared stations. In the case of Lovett experiment, two ISD stations (WCA and SIA) were located within the computational domain; therefore, for each station data comparison was conducted separately, with two distinct datasets characterized by different numbers of observations.
Third stage involved the evaluation of the WRF model along the vertical profile of the atmosphere and was limited to the on-site stations. Analysis focused on the comparison of accuracy of the meteorological predictions with increasing altitude. This stage is of significant importance due to the possibility of implementing the data in air quality modeling. In typical air quality models applied for environmental impact assessment it is generally necessary to determine wind speed at the height of flue gases discharge point, increased by the plume rise. Values of meteorological parameters in the atmospheric vertical profile are most commonly estimated based on meteorological data at ground level, measured at standard ISD stations and atmospheric soundings, which are characterized by the measurement frequency of 2-6 times per day.
For the purpose of examining whether there were differences between analyzed cases at individual stages, the model comparison measure (MCM) was applied, defined as: In order to verify whether the differences were statistically significant, confidence intervals (CI) at the confidence level of 95% were determined. CI intervals were estimated using non-parametric method of bootstrap available through the boot package [89,90] for the R programming language [91] in the RStudio integrated environment [92]. The methodology proposed above for comparison of the model performance outcomes was derived from the air quality model evaluation procedures described in [93,94]. The Supplementary Materials section of this paper contains a supplement to the accuracy assessment of the WRF model as well as a graphical presentation of the results. Graphical methods and selected algorithms were taken from the presentation of research results [81,95].

Grid Resolution
The results of the WRF model accuracy evaluation concerning two set computational grid resolutions with regard to the assessed meteorological parameters, experiments and sites are shown in Table 5 Figure 3, together with the confidence interval at the confidence level of 95%, indicate that apart from the TR (RTIA) experiment the obtained differences between variants are statistically significant (the confidence interval does not coincide with zero) for wind speed. This suggests that the grid resolution has a statistically significant impact on the outcomes of simulation predictions for wind speed in specific situations.
regardless of the experiment and site location for the grid resolutions of 1 km (MGE: 1.2-1.6; RMSE: 1.6-2.0) and 4 km (MGE: 1.0-1.7; RMSE: 1.2-2.0). This indicates that the obtained modeling outcomes are characterized by a comparable scatter in relation to the mean observed wind speed. Simulated wind speed predictions are visibly biased. In the case of the site-specific (ONS) measurements, overestimation of simulated wind speed in relation to the observations at the level of 0.9-1.4 m/s was found for the WRF1 variant. An analogous relationship was noticed for the WRF4 variant as well, but the range of values is wider (from 0.3 to 1.5 m/s). In other cases, an underestimation of wind speed at the level of −1.1 to −0.1 m/s was observed (weather airport stations). A lower scatter of error values between experiments for the resolution of 1 km indicates that the use of higher resolutions of orography representation results in more stable (with a narrower range of error values) wind speed predictions. The achieved values of MB and IOA are not conclusive in terms of whether the increase in grid resolution results in the improvement of wind speed predictions. On the other hand, values of MCM shown in Figure 3, together with the confidence interval at the confidence level of 95%, indicate that apart from the TR (RTIA) experiment the obtained differences between variants are statistically significant (the confidence interval does not coincide with zero) for wind speed. This suggests that the grid resolution has a statistically significant impact on the outcomes of simulation predictions for wind speed in specific situations.  In the case of the SIA and LVIA stations significantly lower values of mean gross error (MGE) were obtained for the computational grid resolution of 1 km compared to 4 km. However, for the WCA station the opposite observations were made. This is interesting due to the fact that in the LV experiment for two different locations (WCA and SIA) distinct results were obtained. The WCA station, located by the Westchester County Airport, is situated less than 10 km from the shoreline, which may cause discrepancies in the obtained evaluation results for different locations. Coastal regions are strongly affected by atmospheric forcing [96] that may not be accurately represented by numerical models if no additional, more accurate input data (e.g., sea surface temperature) are included in calculations [97,98] or the applied resolution is not sufficient to resolve terrain complexity and small-scale weather phenomena [84,99].
Increase in the computational grid resolution lowered the prediction bias, but simultaneously caused an increase in the absolute values of deviations (MGE). As a consequence, mean long-term predictions reflected well the actual conditions in the case of the resolution of 1 km, but this was not necessarily the case for short-term, e.g., 1 h, predictions. This finding is crucial in the context of air pollutant dispersion modeling for regulatory purposes. Achieving similar mean values for a simulation period (e.g., one year) indicates that a sample of results was obtained that will be statistically similar to reality, but will not always correctly represent momentary values. This is also a certain problem for air quality models used for regulatory purposes, which are mainly used to predict 1-h maximum and annual averaged values. When assessing the accuracy of predictions of air pollutant concentration levels, pairing of samples in time and space is generally neglected due to the imperfect nature of the applied methods. Awareness of these facts leads to a simple conclusion. In the evaluation of meteorological predictions carried out for air quality modeling, most likely it will be more critical to pay attention to the similarity of the prediction distributions compared to the observations. Thus, it is more important to obtain a sample of predictions, whose statistical characteristics will reflect the actual conditions. On the other hand, the disadvantage of such approach is the need to run simulations for a period of at least several years, in order to obtain a sample of data characterized by a full range of possible variations of meteorological conditions. The results of the wind speed prediction assessment for the vertical profile of the atmosphere at each location are shown in Figure 4. In the case of the MC and LV experiments, significantly lower values of MGE for wind speed and 1 km resolution (WRF1) were obtained. In contrast, results for the TR experiment indicate that the obtained accuracy assessment results were not statistically significantly different. A wide confidence interval for the TR experiment is a consequence of the sample size (Supplementary Materials) and the accuracy of the WRF model is dependent on the season of the year [84]. Additional analyses, included in Supplementary Materials, indicate that when using the same data sample (the same time range of the year), significantly wider confidence intervals were obtained for the MC and LV experiments compared to those shown in Figures 3-5. Moreover, statistical model evaluation measures were generally higher compared to the results obtained for the entire period of the MC and LV experiments. Hence, each of the considered experiments (MC, LV and TR) should be treated independently. racy assessment results were not statistically significantly different. A wide confidence interval for the TR experiment is a consequence of the sample size (Supplementary Materials) and the accuracy of the WRF model is dependent on the season of the year [84]. Additional analyses, included in Supplementary Materials, indicate that when using the same data sample (the same time range of the year), significantly wider confidence intervals were obtained for the MC and LV experiments compared to those shown in Figures  3-5. Moreover, statistical model evaluation measures were generally higher compared to the results obtained for the entire period of the MC and LV experiments. Hence, each of the considered experiments (MC, LV and TR) should be treated independently. The above observations are important in the contest of the aim of the study, since increasing grid resolution generally results in a statistically significant increase in the accuracy of wind speed predictions at the on-site locations (in the case of datasets collected for at least one year). This in turn suggests that the use of meteorological predictions with a resolution of 1 km for the purpose of air quality modeling will provide wind speeds characterized by lower deviations from the actual values compared to coarser resolutions.
The presented results of the statistical parameters (MB, MGE, RMSE) of the wind direction assessment do not provide evidence that the increase in the computational grid resolution yields more reliable predictions ( Table 5). The values of MGE and RMSE are similar at the resolutions of 1 km and 4 km. Values of MCM shown in Figure 3 demonstrate that only in the case of LV-SIA was a statistically significant difference between variants obtained. In other cases, the confidence intervals coincide with zero, indicating that the increase in the resolution of the computational grid did not result in an increase in the wind prediction accuracy. Values of MCM in the vertical profile ( Figure 4) overwhelmingly indicate that there are no significant differences between the results obtained using 1 km and 4 km resolution. In the case of the MC experiment, different results were obtained for the comparative evaluation of different computational grids depending on the altitude above ground level. At the heights of 10 and 120 m negative values of MCM (−2.61 to −0.47) and for the heights of 300, 330 and 360 m positive values (0.02-0.97) were achieved. At the remaining heights above ground level the values of the MCM confidence interval coincide with zero. In conclusion, the obtained outcomes of the comparative evaluation of the WRF model's accuracy indicate that the use of 1 km grid resolution does not result in more reliable wind direction predictions. On the other hand, it is impossible to conclude that the resolution of 4 km yielded statistically significantly lower error values. However, in this case, the quality of predictions depends on the experiment or locations. A similar conclusion was presented by Giovannini et al. [50]. They pointed out that the increase in the computational grid from a resolution of 2 km to 1.2 km resulted in higher accuracy of prediction only in selected locations (approximately 50% of the analyzed sites). The studies carried out by Giovannini et al. [50] refer to only one geographical area (Europe, Alps). They did not include the analysis of the accuracy of the WRF model in relation to the vertical profile, but measurements of meteorological parameters were made at 16 stations.
The obtained results of the WRF4 accuracy evaluation in the context of temperature (Table 5) indicate that, regardless of the grid resolution, the MGE values were greater than 2 for almost all cases. Similarly, for MB in six out of eight cases the values exceeded the range of |MB| < 0.5. Therefore, the WRF evaluation results do not meet the benchmarks proposed by [81]. However, in the case of the MC and LV experiments, the index of agreement reached IOA ≥ 0.8 (see Supplementary Materials), which is in agreement with the criteria established in [81]. It should be emphasized that the presented results of the temperature prediction accuracy do not differ from the outcomes in other studies concerning computational grid resolution [1,63,86,100]. Comparative analysis of the temperature modeling evaluation unambiguously indicates that the use of a finer grid resolution (1 km) yields higher accuracy. Both in Figures 3 and 4, the MCM parameter with confidence intervals at the confidence level of 95% has positive values for the MC and LV experiments. In contrast, the results for the TR experiment are inconclusive. Statistically significant and lower errors for temperature predictions were obtained only for the altitudes above 250 m a.g.l. In other cases, generally no statistically significant differences were found ( Figure 4). The presented conclusions are in agreement with the results of studies [63,100] where a similar increase in temperature prediction accuracy was identified with the use of finer grid resolutions. However, [100] pointed out that the resolution of 3 km is satisfying, since lesser computational costs allowed more simulations to be carried out in the same amount of time. On the other hand, it should be emphasized that in the aforementioned studies the confidence intervals of the model evaluation measures were not determined, and thereby no statistically significant differences between evaluated configurations were estimated. Therefore, a direct comparison of the results between studies is impossible. In addition, the study above did not include an accuracy assessment of the WRF model along the vertical profile of the atmosphere.
Taking into account the obtained results, we recommend using a resolution of 1 km, because it allows the achieving of statistically significant and lower MGE values compared to a grid with a resolution of 4 km. Therefore, the remaining Sections 3.2 and 3.3 are limited to the results only for the resolution of 1 km. The outcomes presented above and in the discussion indicate that the use of 1 km resolution compared to 4 km usually increases the accuracy of wind speed and temperature predictions. The results for wind direction were ambiguous but no contradictions to the above conclusion were found. Future research on grid resolution should focus on parameterizing the model by geographic region and number of pressure levels to see if boundary layer issues in complex terrain are correctly resolved. Better representation of topography will not increase the accuracy of numerical model predictions due to the inability of numerical models to represent near-surface structures, e.g., temperature inversions [101]. This problem is very well illustrated by the animation of vertical temperature profiles presented in the Supplementary Materials (Chapter S4).

Location of the Station
A comparison of the results was conducted for two types of the stations, i.e., on-site (ONS) and synoptic (ISD) situated in different locations. The analysis was aimed at verifying whether the accuracy of WRF model predictions is similar in non-typical locations compared to standard meteorological measurements at sites located near airports. Unlike many other studies [84,85,102], the presented evaluation includes campaigns datasets from the MC, LV and TR experiments. Measurements were carried out at various levels relative to the ground surface. In addition, these data were not assimilated to the simulations at any stage. In effect, the evaluation of meteorological predictions was possible in areas where standard measurement data are not available. This situation corresponds to the realities of air quality modeling of pollutant dispersion in the atmosphere, where in situ meteorological data are not always available. In the case of prediction accuracy comparison for two different locations it was necessary to unify the frequency of observations between data from stations located near airports (ISD) and data obtained as part of experiments (ONS). MB, MGE and RMSE results with confidence intervals at the 95% confidence level, calculated for temperature, wind speed and direction for each experiment, are shown in Figure 5. The MGE values obtained for wind speed are closer to zero for ISD stations compared to ONS sites. The 95% confidence intervals did not overlap for the MC and LV experiments. This proves that the achieved WRF accuracy results at different locations within The MGE values obtained for wind speed are closer to zero for ISD stations compared to ONS sites. The 95% confidence intervals did not overlap for the MC and LV experiments. This proves that the achieved WRF accuracy results at different locations within the same computational domain are statistically significantly different. The MB value explicitly demonstrates that the WRF model was biased with respect to the station location. For the ONS sites the results were overestimated, which is well illustrated by the scatter plots in Supplementary Materials (Chapter S2.2.1). Similar conclusions regarding the WRF model bias were presented in [101,103]. In the case of ISD stations, wind speeds were underestimated in the range from −1.12 m/s to −0.12 m/s. The RMSE values indicated that the scatter of the results for the MC (1.83 m/s-1.86 m/s) and TR (1.48 m/s-1.62 m/s) experiments was similar for different sites (95% confidence intervals overlapped) and for both comparable cases in the LV experiment it was different (ranging from 1.5 m/s to 2.06 m/s). Wind speed deviations were high depending on the site location. It is manifested by the MB values which, in some ONS cases, were at the level of mean observed values. According to Hanna and Yang [104] the uncertainty of wind speed stems majorly from random turbulent processes as well as errors in both land use and topography representation. Taking into account the outcomes for ONS and ISD stations it is suspected that in this study the main component of the prediction uncertainty was the location of the ONS stations in complex terrain. It is suspected that the problem of overestimation of wind speed in complex terrain can be corrected by using the parameterization of the WRF model described in Jiménez and Dudhia [68]. In the future, parameterization of the WRF model should be considered in order to minimize wind speed errors in complex terrain (on-site station locations).
The presented results raise doubts in the context of the use of wind speed prediction for air quality modeling. The use of the overestimated predictions in the air quality model will result in lower air pollutant concentrations compared to the observed values and, what is more, the maximum concentrations will be located at a greater distance from the emission source. In the case of the air quality model, additional attention should be drawn to the plume rise. It is dependent on the wind speed and temperature difference between the plume and the environment, as well as the plume ejection velocity. For wind speed that dependence is inversely proportional [105]. In effect, the wind speed overprediction in the WRF model will result in a lower plume rise in the considered experiments.
All of the considered model performance measures for wind direction in almost every case were statistically significantly different ( Figure 5). This parameter was the most sensitive to the representation of actual conditions in different locations. Therefore, for this parameter, the values of wind direction deviations are presented using polar plots in Figure 6. Graphical representation indicates unambiguously that in the case of the ONS sites deviations exceeding ±90 • are more frequent. This is reflected by the statistical model performance measures. The MGE values for ISD locations were at the level of 40 • but for the ONS sites they were significantly higher and exceeded the value of 60 • (LV, MC). Similar observations were noted for RMSE (ISD-RMSE < 60 • , ONS-RMSE > 60 • ). The above remarks are also confirmed by the comparative analysis of wind roses included in Supplementary Materials. For example, in the MC experiment, due to the location of the ONS station in a relatively narrow valley of the Delaware River (with a width not greater than 15 km in this region), two dominant, opposite wind directions are noticeable-SW and NE. These directions were partially reproduced by the model as manifested by MGE and RMSE values of 42 • and 61 • , respectively. However, due to the lack of natural obstacles in the vicinity of the airport, the wind rose for the ISD (LVIA) station does not show the presence of dominant directions of air inflow, only a typical predominance of winds blowing from the west. This situation was represented more accurately by the model simulations and resulted in MGE and RMSE values for wind direction at the level of 36 • and 51 • , respectively. Similar conclusions were shown in [106], where it was observed that for a location with complex terrain the RMSE values of wind direction were twice as high compared to areas located on flat terrain. Liu et al. [106] unlike this study, used the assimilation of observational data, and yet they did not obtain an effective improvement in the results of wind direction forecasts in complex terrain. As in Jiménez and Dudhia [80], it was observed that wind direction errors are inversely related to wind speed. This is presented in the graphs of MGE and RMSE values depending on wind speed, included in Supplementary Materials (Chapters S2.2 and S3.2.2). This effect is a consequence of topography, which strongly modifies circulation at the synoptic scale. This results in a large spatial variability of the airflow in complex terrain. The reason for these errors is the limitations of the WRF model's representation of topography, as synoptic circulation is generally well represented by mesoscale models [80]. In contrast to these studies, they used a horizontal grid resolution of 2 km, and the accuracy assessment referred only to wind direction measurements at a height of 10 m a.g.l. Therefore, it can be concluded that reducing the horizontal grid resolution to 1 km has not solved this problem in complex terrain. However, in contrast to previous studies, the presented results of the WRF model accuracy assessment (Supplementary Materials-Chapter S3.2.2), provide evidence that the problem is not only related to the representation of wind direction at a height of 10 m a.g.l. It affected all vertical layers for which the accuracy of the WRF model was evaluated. It is suspected to be the consequence of greater variability in wind direction at low wind speeds, which is more difficult to represent in meteorological simulations.
In the case of the TR experiment both MB and RMSE proved the opposite observations. Lower error values were noted for the ONS station. On the other hand, the RTIA site is situated in Reno which is located on the border of the Mount Rose wilderness. Moreover, the RTIA measurement site is located relatively close to the computational domain border in conducted simulations (about 4.5 km away). Complex topography and location of the station might be the reasons for higher deviations of wind direction, especially considering that the remaining ISD stations (SIA, WCA, LVIA) are located in areas characterized by a less varied topography in terms of the absolute altitude. Extending the size of the domain might address this issue and enable dynamics and physics to fully Liu et al. [106] unlike this study, used the assimilation of observational data, and yet they did not obtain an effective improvement in the results of wind direction forecasts in complex terrain. As in Jiménez and Dudhia [80], it was observed that wind direction errors are inversely related to wind speed. This is presented in the graphs of MGE and RMSE values depending on wind speed, included in Supplementary Materials (Chapters S2.2 and S3.2.2). This effect is a consequence of topography, which strongly modifies circulation at the synoptic scale. This results in a large spatial variability of the airflow in complex terrain. The reason for these errors is the limitations of the WRF model's representation of topography, as synoptic circulation is generally well represented by mesoscale models [80]. In contrast to these studies, they used a horizontal grid resolution of 2 km, and the accuracy assessment referred only to wind direction measurements at a height of 10 m a.g.l. Therefore, it can be concluded that reducing the horizontal grid resolution to 1 km has not solved this problem in complex terrain. However, in contrast to previous studies, the presented results of the WRF model accuracy assessment (Supplementary Materials-Chapter S3.2.2), provide evidence that the problem is not only related to the representation of wind direction at a height of 10 m a.g.l. It affected all vertical layers for which the accuracy of the WRF model was evaluated. It is suspected to be the consequence of greater variability in wind direction at low wind speeds, which is more difficult to represent in meteorological simulations.
In the case of the TR experiment both MB and RMSE proved the opposite observations. Lower error values were noted for the ONS station. On the other hand, the RTIA site is situated in Reno which is located on the border of the Mount Rose wilderness. Moreover, the RTIA measurement site is located relatively close to the computational domain border in conducted simulations (about 4.5 km away). Complex topography and location of the station might be the reasons for higher deviations of wind direction, especially considering that the remaining ISD stations (SIA, WCA, LVIA) are located in areas characterized by a less varied topography in terms of the absolute altitude. Extending the size of the domain might address this issue and enable dynamics and physics to fully evolve inside and correct possibly improper large-scale features inherited from the lateral boundary conditions [107,108].
Application of the WRF meteorological model data in air pollutant dispersion modeling will result in specific consequences. Atmospheric wind direction is equivalent to the direction of the plume from the emission source. In effect the air pollutant concentration levels obtained from the air dispersion model can be observed in completely different areas. Measurement of the error of wind direction at the level of 5 • -10 • results in calculated concentration uncertainty ranging from 20% to 70% [38]. The wind roses included in Supplementary Materials show that the contributions of the main wind directions were represented properly in experiments TR and LV. Theoretically, it can be presumed that a sufficient criterion for air quality modeling may be the appropriate representation of the wind directions. This results from the methodology for the air quality model evaluation for the purposes of impact assessment of the source on air quality. Suitable representation of the upper distributions of the 26 highest 1-h concentrations using the RHC (Robust Highest Concentration) parameter is the most important [109,110]. On the other hand, the overlapping of errors originating from meteorological simulations and determination of air pollutant concentrations can create a "domino effect". This might lead to unrepresentative modeling results, significantly deviating from the actual values. In addition, an opposite effect may be achieved, which leads to more accurate air quality predictions, as a result of unintended compensation of errors. Both of these scenarios are plausible, hence practical application of these techniques must allow for certain risks. Determination of the origins of discrepancies requires further research comparing the accuracy of air quality predictions from the analyzed experiments with the application of various input meteorological data. However, the most direct proof of the possibility of error compensation is the results of research prepared for the MC experiment by US. EPA [6]. Comparison of the wind roses obtained in this study and those presented in US. EPA [6] implies that the model failed to adequately represent dominant wind directions. The wind speed was biased as well, as evidenced by the presented MB = 0.59 m/s. However, this value is still lower than the one obtained in this study (MB = 1.11 m/s). It is suspected that the reason for the meteorological prediction discrepancy is associated with the WRF domains set up. Grid cells did not overlap which could have resulted in different outcomes due to the terrain elevation. Despite the fact that the accuracy of the meteorological simulations differed from the actual conditions, the AERMOD evaluation results presented in US. EPA [6] did not differ statistically significantly compared to simulations using measurement data from the weather tower. Moreover, a number of studies have successfully applied meteorological simulation data to model pollutant concentration levels in the air [21,23,35,[42][43][44][45][46][47]. Accuracy of the used meteorological simulations differed as well from the actual conditions. Therefore, it is suspected that the main reason for similar AERMOD evaluation results for different meteorological input data is a consequence of disparate methodologies for model evaluation and unintended compensation of the output data.
Evaluation results for temperature demonstrated that the MB values were closer to zero for on-site stations compared to weather stations ( Figure 5), in contrast to the wind speed and direction evaluation. Achieved MB values together with 95% confidence intervals did not overlap. Therefore, the WRF evaluation outcomes with regard to the location are statistically significantly different. Similar conclusions can be drawn from the MGE analysis except for the LV-WCA case. On the other hand, RMSE values for different experiments and locations within a given experiment are not conclusive. In the case of the LV experiment, opposite outcomes were obtained depending on the locations (LV-SIA vs. LV-WCA). It is worth highlighting that, in the case of the TR experiment, the MB values are positive, whereas for the other experiments negative values were obtained. Data from the TR experiment were collected only in the summer period (August 1984) which implies that the WRF model is sensitive to seasonality. This can be observed on the "Statistical metrics" graphs presented in Supplementary Materials, where, depending on the temperature range, the MB values were different. In general, for temperatures lower than 15 • C, the MB values were lower than zero (underestimation of observations). For higher temperatures MB values were greater than zero (overestimation of observations). The presented outcomes are atypical in the case of the WRF model, which generally struggles with representation of the highest temperatures, especially for long-term simulations [16]. On the other hand, the TR experiment was conducted in 1984. For this study, the NLCD 1992 land use data was used, which most likely was characterized by a higher share of urban tissue and built-up area compared to the actual conditions. The presence of urban and built-up areas affects simulated sensible and latent heat fluxes causing an increase in air temperature which may lead to overestimation by the meteorological model [65].

Vertical Profile of the Atmosphere
In the case of the wind speed evaluation, it was decided to apply normalized evaluation measures (NMB, NMGE) since they are resistant to data with different ranges of values. For instance, wind speed values at the highest measurement points were up to 3-4 times greater than the values at 10 m a.g.l. Comparison using basic statistics, i.e., MB, MGE, RMSE could distort the interpretation and lead to incorrect observations. Error values presented in Figure 7 indicate that the meteorological simulation accuracy depends on the altitude above ground level as well. As the height above ground level increased for the MC, LV and TR experiments, the NMGE values decreased. In the case of the MC experiment, they decreased from 0.78 to 0.39 (10 m-420 m a.g.l.) and for the LV the difference was more apparent from 1.27 to 0.74 with a change in altitude from 10 m to 100 m a.g.l. For the TR experiment, NMGE decreased from 0.77 to 0.48 (10 m-400 m a.g.l.). It should be emphasized that the greatest differences between the model evaluation outcomes (statistically significantly different) for particular altitudes above terrain occur up to 120 m about ground level. For wind direction (Figure 8) there is a similar dependency; however, it applies to the lowest layer. At higher altitudes differences between given layers are generally lower and often statistically insignificant, as proven by not overlapping 95% confidence intervals (Figures 7 and 8). The presented outcomes unambiguously indicate that the representation of local conditions, where complex terrain issue is of great importance, constitutes a major challenge. Moreover, it is strongly location-dependent, since for the MC and LV experiments statistically significantly different values of NMBE were achieved (respectively, 0.62 and 1.62 at the surface layer). This presents the problem of applying the meteorological simulation data in air quality models, because, depending on the emission source (or group of sources) location, the quality of input data in the form of meteorological predictions may be statistically significantly different. Since a more accurate representation of topographic characteristics does not result in a clear improvement in performance [50,101], future studies should also consider increasing the resolution of vertical layers.
The obtained meteorological predictions of wind direction were biased regardless of the experiment (Figure 8). The positive values of MB indicate that the meteorological predictions were mostly right-sided compared to the observations. The values of absolute errors (MGE) were 2-3 times greater than the MB values. This implies that there are both left-and right-sided deviations of wind direction, on average considerably higher than indicated by the MB value. Included in Supplementary Materials the "wind direction error" and "wind direction error histogram" graphs point out that the deviations in the range from −45 • to 60 • were dominant for the MC and LV experiments. In the case of the TR experiment the histograms were bimodal, with two dominant ranges of deviations present, i.e., from −180 • to −90 • and from −20 • to 90 • . It should be noted that starting from the 100 m a.g.l. the accuracy of the model was similar in the case of the MC experiment and for the TR experiment there was a noticeable improvement in the prediction accuracy with altitude. An important finding is that regardless of altitude, meteorological simulations of wind direction have an error that is inversely proportional to wind speed in complex terrain (Supplementary Materials-Chapter S3.2.2). As wind speed increases, the MGE and RMSE values approach zero, regardless of the height above ground level. This is a known problem of the WRF model [80], but its existence has so far been proven for the near-surface layer (heights of about 10 m a.g.l.). is a known problem of the WRF model [80], but its existence has so far been proven for the near-surface layer (heights of about 10 m a.g.l.).  The aspect of wind speed and direction prediction accuracy along the vertical profile is important for the purpose of air quality modeling. Emission sources diverge in terms of, among others, the height at which gases and particulate matter are introduced into the atmospheric air. In the case of point sources, the height of emitters reaches from several to 200 m, and the height of most power industry fossil fuel combustion emitters often exceeds 300 m, and in extreme cases even 400 m (Ekibastuz GRES-2 Power Station). In air  is a known problem of the WRF model [80], but its existence has so far been proven for the near-surface layer (heights of about 10 m a.g.l.).  The aspect of wind speed and direction prediction accuracy along the vertical profile is important for the purpose of air quality modeling. Emission sources diverge in terms of, among others, the height at which gases and particulate matter are introduced into the atmospheric air. In the case of point sources, the height of emitters reaches from several to 200 m, and the height of most power industry fossil fuel combustion emitters often exceeds 300 m, and in extreme cases even 400 m (Ekibastuz GRES-2 Power Station). In air The aspect of wind speed and direction prediction accuracy along the vertical profile is important for the purpose of air quality modeling. Emission sources diverge in terms of, among others, the height at which gases and particulate matter are introduced into the atmospheric air. In the case of point sources, the height of emitters reaches from several to 200 m, and the height of most power industry fossil fuel combustion emitters often exceeds 300 m, and in extreme cases even 400 m (Ekibastuz GRES-2 Power Station). In air pollutant dispersion models, the determination of wind speed and direction on the effective height of the emitter is crucial. This value affects the process of transport and diffusion of air pollutants. In the absence of measurement data, wind speed is estimated using different meteorological preprocessors, such as AERMET or CALMET. The methodology based on the Monin-Obukhov (M-O) similarity theory for wind profile determination depending on the atmospheric stratification is frequently applied. It should be emphasized that the involved method constitutes only an approximation. The greatest discrepancies occur for stable stratification conditions due to the formation of intermittent layers and for complex terrain areas [111,112]. Therefore, in such situations, using data from meteorological models which provide additional information about vertical wind speed and direction profiles, appears reasonable. Considering the presented results of the WRF prediction evaluation, in terms of wind speed and direction along the vertical profile, a certain issue emerges. Application of meteorological predictions in air quality modeling for a group of emitters characterized by various heights (e.g., 40 m, 100 m, 200 m) may result in entirely different errors of pollutant concentration levels due to the incorrect representation of meteorological conditions along the vertical profile of the atmosphere. In addition, the location of the emission source impact area may also be oriented in other directions.
The obtained results of the WRF evaluation in the case of NMB unambiguously indicate that regardless of the experiment, the model was biased. For MC and LV experiments the overestimation of wind speed was found (MC: NMB = 0.19-0.62; LV: NMB = 0.54-1.13), which decreased with the increase in altitude. In many cases the differences between individual vertical layers were statistically significantly different. In general, the greatest difference of NMB, NMGE and IOA values between altitudes were obtained for the heights from 10 to 120 m. This implies that the WRF model is not able to represent the actual meteorological conditions in locations where topography characteristics vary within one cell of computational grid. This is proven by the comparison of NMB, NMGE, IOA (Figure 7), MB, MGE, RMSE ( Figure 8) and wind roses included in Supplementary Materials (Chapter S3.1.2) obtained for the LV experiment. WRF predictions for wind direction in the LV experiment indicate that the winds blowing from the west were dominant with the share of 22%. The values recorded by the on-site station show that the actual contribution of western winds was only 6%. Moreover, the WRF model failed to appropriately represent winds blowing from the northern and south-eastern directions, i.e., winds along the Hudson River valley. Based on the analysis of actual topography compared to the applied digital terrain model with 1 km accuracy it can be observed that there was a clear flattening of Buckberg Mountain. The peak of this hill was located approximately 600-700 m from the meteorological tower and both of these points were mapped within one cell of the computational grid. In consequence the representation of terrain blocking effects was impossible, which resulted in reduction in the western winds. It is not completely understandable why the WRF model failed to accurately represent the contribution of northern and south-eastern winds. The conducted discussion of this issue leads to a conclusion. In the case of meteorological prediction application (when measurement data are unavailable) in the air quality modeling process, first, the expert analysis of wind roses should be carried out in the locations of considered emission sources with regard to the topography. In many cases it would be possible at the expert level to determine the potential of using meteorological predictions.
In Figure 9 the values of NMB, NMGE and IOA calculated for temperature values with regard to the altitude above ground level are presented. NMB and NMGE were, for most cases, statistically significantly different depending on the altitude above ground. These differences are particularly clear for the values at the ground surface. IOA values for the MC and LV experiments were at the levels from 0.86 to 0.87 and met the criteria recommended by [81]. In the case of the TR experiment the values of IOA were lower and, depending on the altitude above ground level, fell within the range of 0.44 to 0.62. On the other hand, the NMGE values were statistically significantly lower compared to the values obtained for the MC and LV experiments. In order to better understand these issues, the animations showing vertical profiles of temperature for the TR experiment are presented in Supplementary Materials. In the analyzed period, an increase in temperature with altitude (temperature inversions) prevailed-most often at heights up to 200 m a.g.l. Modeling results generally misrepresented the typical vertical temperature profile. In most situations, the temperature decreased with the increase in altitude. The values of the temperature gradient are commonly used for atmospheric stability determination, which affects both the horizontal and vertical diffusion coefficients applied in the air quality models. obtained for the MC and LV experiments. In order to better understand these issues, the animations showing vertical profiles of temperature for the TR experiment are presented in Supplementary Materials. In the analyzed period, an increase in temperature with altitude (temperature inversions) prevailed-most often at heights up to 200 m a.g.l. Modeling results generally misrepresented the typical vertical temperature profile. In most situations, the temperature decreased with the increase in altitude. The values of the temperature gradient are commonly used for atmospheric stability determination, which affects both the horizontal and vertical diffusion coefficients applied in the air quality models.

Conclusions
The study was carried out concerning the evaluation of the WRF meteorological simulations (temperature, wind speed and direction), with regard to the computational grid resolution, location in complex terrain and the vertical profile of the atmosphere. The conducted research allowed for the formulation of several conclusions in the context of the application of meteorological predictions in air quality models.
Increasing the grid resolution to 1 km did not always result in statistically significantly lower error values compared to a resolution of 4 km. In most cases wind roses at 4 km did not represent the dominant wind directions, the same as for the on-site station in the LV experiment at 1 km. Hence, an increase in the resolution up to 1 km will not always assure adequate accuracy of the input data for air pollutant dispersion models. Therefore, this aspect should be treated individually in relation to the orography of the area. On the other hand, a further increase in the resolution is not recommended due to the suggestions presented in [113]. It is noteworthy that the optimization of the WRF simulation setup [114,115] and the application of model coupling [116][117][118] proved to be effective to further improve the accuracy of modeling outcomes.
A dependency between locations of the meteorological measurements and the outcomes of meteorological simulation accuracy was observed for wind speed and temperature. In the case of on-site stations (complex terrain) greater error values were obtained compared to the standard ISD location (airport). Similar issues were reported in [50,84]. Therefore, depending on the location of the emission source within the computational domain, the accuracy of air pollutant dispersion simulations may be different. This issue

Conclusions
The study was carried out concerning the evaluation of the WRF meteorological simulations (temperature, wind speed and direction), with regard to the computational grid resolution, location in complex terrain and the vertical profile of the atmosphere. The conducted research allowed for the formulation of several conclusions in the context of the application of meteorological predictions in air quality models.
Increasing the grid resolution to 1 km did not always result in statistically significantly lower error values compared to a resolution of 4 km. In most cases wind roses at 4 km did not represent the dominant wind directions, the same as for the on-site station in the LV experiment at 1 km. Hence, an increase in the resolution up to 1 km will not always assure adequate accuracy of the input data for air pollutant dispersion models. Therefore, this aspect should be treated individually in relation to the orography of the area. On the other hand, a further increase in the resolution is not recommended due to the suggestions presented in [113]. It is noteworthy that the optimization of the WRF simulation setup [114,115] and the application of model coupling [116][117][118] proved to be effective to further improve the accuracy of modeling outcomes.
A dependency between locations of the meteorological measurements and the outcomes of meteorological simulation accuracy was observed for wind speed and temperature. In the case of on-site stations (complex terrain) greater error values were obtained compared to the standard ISD location (airport). Similar issues were reported in [50,84]. Therefore, depending on the location of the emission source within the computational domain, the accuracy of air pollutant dispersion simulations may be different. This issue is crucial in the case of applications where emission sources are located in complex terrain and nearby representative measurement data are unavailable.
The evaluation of the WRF meteorological simulations in the vertical profile of the atmosphere with regard to wind speed and direction in the on-site locations implies that the accuracy changes are statistically significantly at the confidence level of 95%. This refers to the vertical profile up to the altitude of 100 m a.g.l. Generally, the accuracy of predictions increases with altitude, which is most likely the consequence of decreasing effect of terrain orography. Above the height of 100 m a.g.l. the prediction accuracy stabilizes at a given level or increases further with the altitude, depending on the experiment. The increase in accuracy with altitude has certain consequences for the diffusion of air pollutants. Depending on the type of emission sources, with a particular attention to the height at which gases and particulate matter are introduced into the atmospheric air, the uncertainty of these simulations will depend on the prediction accuracy at different altitudes.
Clear positive bias of the meteorological predictions in areas characterized by complex terrain in relation to the observed wind speed values, in the case of application in air quality models for regulatory purposes, may result in obtaining significantly lower values of concentration levels of the considered air pollutants compared to the actual values. In effect, this may lead to severe consequences during the implementation of recovery programs for air quality, e.g., qualifying a given source or group of sources as negligible based on false evidence. It is possible that the use of the WRF model parameterization described by Jiménez and Dudhia [68,80] will allow the elimination of these effects.
Considering the accuracy of meteorological predictions carried out using the WRF model, their use in air quality modeling is recommend only in certain circumstances. Firstly, when the measurement data are not available and their acquisition is impossible or economically unjustified. The above is mostly in line with the recommendations [38]. Moreover, for each case study, an evaluation of the meteorological prediction accuracy should be conducted with regard to the orography features in the vicinity of emission sources, in particular when the study area is characterized by complex terrain. It should also be emphasized that in the case of lack of measurement on-site data, the evaluation of the WRF model using information from the ISD database might not be sufficient. In addition, it is advised that in areas planned for investments that could potentially significantly affect air quality, meteorological conditions should be measured before issuing an environmental decision.
It has been demonstrated that there is a risk of an unintended compensation of the uncertainty of air quality modeling outcomes [6] or lowering the accuracy of predictions by introduction of inaccurate meteorological data [2,5,48]. It has been stated that this might result from differences in methodologies for air quality and meteorological model evaluation.

Funding:
The work was carried out as part of research associated with the Ministry of Education and Science (Poland) subsidy to maintain scientific potential (Contract no. 16.16.150.545).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The additional data presented in this study are available in Supplementary Materials. Other data sharing is not applicable to this article. It is possible to provide modeling results on request.