Skill of Mesoscale Models in Forecasting Springtime Macrophysical Cloud Properties at the Savannah River Site in the Southeastern US

Predicting boundary layer clouds is important for the accurate modeling of pollutant dispersion. Higher resolution mesoscale models would be expected to produce better forecasts of cloud properties that affect dispersion. Using ceilometer observations, we assess the skill of two operational mesoscale models (RAMS and WRF) to forecast cloud base altitude and cloud fraction at the Savannah River Site in the southeastern US during the springtime. Verifications were performed at small spatial and temporal scales necessary for dispersion modeling. Both models were unreliable with a 50% (RAMS) and a 46% (WRF) rate of predicting clouds observed by the ceilometer which led to low cloud fraction predictions. Results indicated that WRF better predicted daytime cloud bases from convection that occurred frequently later in the period and RAMS better predicted nighttime cloud bases. Using root mean squared error (RMSE) to score the forecast periods also highlighted this diurnal dichotomy, with WRF scores better during the day and RAMS scores better at night. Analysis of forecast errors revealed divergent model cloud base biases—WRF low and RAMS high. A hybrid solution which weighs more heavily the RAMS nighttime forecasts and WRF daytime forecasts will likely provide the best prediction of cloud properties for dispersion.


Introduction
Scavenging and absorption of atmospheric particles and gases by boundary layer clouds affect the dispersion of pollutants. The processes of washout, chemical transformations, and sedimentation upon droplet evaporation of constituents all impact the rate of atmospheric removal or deposition. To predict the interactions between atmospheric aerosols and cloud droplets, models need to accurately predict clouds within the boundary layer. Large spatial or temporal variability in cloud field predictions leads to incorrectly located clouds that fail to interact correctly with the plume, limiting the accuracy of dispersion predictions on small scales.
The variable-scale nature of clouds and cloud processes makes them difficult to model. Small-scale interactions within clouds require state-of-the-art microphysical and cumulus parameterizations [1] to estimate cloud properties for most mesoscale to global-scale meteorological simulations. Correctly parameterizing cloud processes is complicated by feedbacks by which clouds modify their own environment. For example, the formation of clouds modifies the local environment by moderating Earth's energy balance and releasing latent heat to the atmosphere [2][3][4].
In dispersion modeling, clouds and precipitation drive wet deposition. Transformations of atmospheric aerosols and chemical constituents occur in clouds which, upon evaporation, leave altered aerosols [5] and modify airborne concentrations. Additionally, clouds moderate photochemistry.
Observational data sets that have previously been used to perform cloud forecast assessments include satellite data (cloud fraction/mask and brightness temperatures) [8,13], cloud radar and lidar networks [9,11,18], observation networks [14], pyranometer networks [7], and radiosondes [12]. Techniques for verifying cloud forecasts using satellite data include global analysis to compare with 3 h global models [8] and grid point or area comparisons [7,13,19]. Verification techniques using observation networks include area averaging [11,14] and probabilistic comparisons [18]. Radiosonde relative humidity profiles were also used to verify model cloud base height [12].
In this study, we provide an assessment of model capabilities in forecasting cloud macrophysical properties that affect atmospheric constituent dispersion at fine spatial and temporal scales. We use 15-min averaged ceilometer data at SRS to compare with 15-min WRF and RAMS forecast output of cloud properties. Comparisons between single point temporal measurements and instantaneous spatial measurements are highly dependent on resolution and advection [14]. For high-quality dispersion modeling, model resolution becomes very important. Horizontal grid spacings used in this assessment are 1.3 km for WRF and 1 km for RAMS. We use a spatial averaging method based on cloud field advection by the model wind speed to compare model forecasts every 15 min to 15-min averaged ceilometer cloud fraction and cloud base height. Synoptic conditions during the springtime in the southeastern US can be quite variable. Cloud formation can be driven by forcing from local diabatic heating, large-scale low-pressure systems and frontal passages, and sea breeze penetration among others. Clouds range from low level stratus and fog to deep convective thunderstorms. Therefore, the time period from April to June provides a robust evaluation of these two models.

Methods
To facilitate comparisons in this study, data are taken from both RAMS and WRF runs that were initiated at 18:00 UTC using the 18:00 UTC forecast of the 12 km North American Mesoscale Forecast System (NAM), available through the National Centers for Environmental Prediction [20]. The first six hours of each simulation were used to spin up the model, and assessments for this study began with the forecast period at 00:00 UTC. RAMS simulations were for 42 h and WRF simulations were for 36 h, producing 36 h RAMS forecasts and 30 h WRF forecasts. Both models generate gridded forecasts every 15 min. These model configurations have been tuned over the years to provide the best regional meteorological forecasts for key variables which affect atmospheric dispersion processes: temperature, humidity, wind speed, and wind direction.

RAMS
The RAMS model (v6.2.06) [21] was configured with two domains: a 600 × 500 km outer domain with 5 km horizontal grid spacing and a 102 × 92 km nested domain (Figure 1) with 1 km grid spacing. Both domains were centered on the Savannah River Site and have 37 vertical grid boxes, with the lowest grid box 25 m deep and the highest 1000 m deep. Intermediate levels are stretched by a factor of 1.2 until reaching 1000 m. Five model soil levels are used at 2.5, 15, 30, 50 and 70 cm below the surface. Soil moisture initialization uses a constant saturation fraction of 0.29. Soil temperature is initialized by the offset in temperature between three measured soil temperature levels (1, 6 and 12 inches) and the two-meter temperature measurement from the relative humidity probe at the SRS Climatology Site. Temperature offsets for the lowest two soil levels are estimated by calculating the slope of the 6- and 12-inch measurements and dividing by 2, for the 50 cm level, or 4, for the 70 cm level. This creates a more realistic soil temperature profile for model initialization. Additional information for topography, soil, and vegetation is read in from standard RAMS data files with a 30 arc second resolution.
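The vertical stretching described above can be illustrated with a short sketch. It assumes the simplest reading of the text: each layer is 1.2 times deeper than the one below it until the depth is capped at 1000 m. The function name and the exact capping rule are illustrative assumptions, not the RAMS grid generator itself.

```python
def stretched_layer_depths(n_layers=37, first_depth_m=25.0, ratio=1.2, max_depth_m=1000.0):
    """Return layer depths (m) that grow geometrically until capped at max_depth_m.

    Assumption: depth of layer k is first_depth_m * ratio**k, limited to max_depth_m.
    """
    depths = []
    depth = first_depth_m
    for _ in range(n_layers):
        depths.append(min(depth, max_depth_m))
        depth *= ratio
    return depths


depths = stretched_layer_depths()
n_stretched = sum(1 for d in depths if d < 1000.0)  # layers still being stretched
approx_model_top_m = sum(depths)                    # implied grid top under this assumption
print(n_stretched, round(approx_model_top_m))
```

Under this assumption most of the vertical resolution is concentrated in the lowest few kilometers, which is where the boundary layer clouds of interest form.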
Following the guidelines of the model developers [22], no cumulus parameterizations were used for either grid. Traditional convective parameterizations have been shown to overpredict precipitation and initiate convection too early in the southeastern United States [23]. The operational RAMS model utilizes the Harrington two-stream radiation parameterization [24], updated every 1200 s, and the Mellor and Yamada [25] diffusion parameterization scheme. The minimum value for the horizontal diffusion coefficient was set to 0.9 for the outer grid and 0.7 for the inner nested grid. The operational RAMS model utilizes the two-moment bin-emulating bulk microphysical parameterization [26,27] with aerosol regeneration and wet and dry deposition activated. The ice nucleation scheme uses the DeMott 2010 composite aerosol ice nuclei formula [28]. Cloud condensation nuclei are initialized with a concentration of 500 mg⁻¹ and diameter of 100 nm and 120 nm when regenerated with typical solubility fractions.
During the time period 31 March to 30 June (2020), only one of the RAMS operational runs failed (missing data for 22 April). The failure was due to aerosols exceeding the maximum values close to steep terrain near the outer grid boundary.

WRF
The operational WRF (v3.81) [29] model was configured with two domains: a 400 × 400 km outer domain with 4 km horizontal grid spacing and a nested domain of about 89 × 89 km (Figure 1) with a grid spacing of 1.33 km. Both domains surrounded the Savannah River Site and have 35 vertical levels. The model uses four soil levels, and soil initialization is provided by the operational North American Mesoscale (NAM) [30] forecast model grids. Information for soil type and vegetation [31] is provided by the NLCD2011 database. Topography is 30 arc second for the large domain and 1 arc second for the inner domain. Assimilation of temperature, moisture and wind information from the SRS meteorological towers is used operationally, with observations available at a 15-min frequency for the first 2 h of model integration.
The Thompson microphysical parameterization [32], which includes double moment parameters for rain and ice but not for cloud, was used for the WRF operational model. This operational model used the improved Grell 3D cumulus parameterization [33] that can be used on higher resolution domains. The radiation parameterization uses an improved Rapid Radiative Transfer Model (RRTMG) scheme [34] and the boundary layer parameterization uses the Mellor-Yamada-Janjic scheme [35].
Cloud data collection from WRF runs began on 2 April, so data are missing for 1 April. WRF cloud data are also missing for 19 April and 28 June because the 18 UTC runs failed on those days.

Cloud Base Altitude and Cloud Fractions
At SRS, a Vaisala CL31 ceilometer [36] measured cloud base altitude (CBA), when present, every 15 s. These measurements were used to calculate a 15-min averaged CBA and a 15-min cloud fraction (CF). CF was calculated as the fraction of time that a cloud base was detected during that period. The averaging was performed to center the 15-min periods on the quarter hours to provide the nearest representation to the 15-min model forecast times. For additional analysis, daily averaged CBA and CF were also calculated. On two separate days the ceilometer data were unavailable for short periods of 3.25 h (18 April) and 2.5 h (27 June).
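A minimal sketch of the ceilometer averaging described above is given here. It assumes the 15 s samples are available as (time in seconds, CBA in meters or None when no cloud base is detected) pairs, and that a window centered on a quarter hour spans 7.5 min on either side. The function name and data layout are hypothetical.

```python
def ceilometer_window_stats(samples, center_s, half_width_s=450.0):
    """Return (mean CBA in m or None, cloud fraction) for one centered 15-min window.

    samples: iterable of (time_s, cba_m) where cba_m is None if no cloud base was detected.
    Cloud fraction is the share of samples in the window with a detected cloud base.
    """
    in_window = [cba for t, cba in samples
                 if center_s - half_width_s <= t < center_s + half_width_s]
    if not in_window:
        return None, None  # no ceilometer data in this window
    detected = [cba for cba in in_window if cba is not None]
    cloud_fraction = len(detected) / len(in_window)
    mean_cba = sum(detected) / len(detected) if detected else None
    return mean_cba, cloud_fraction


# Synthetic example: 60 samples at 15 s spacing, cloud at 1200 m in the first quarter.
samples = [(15 * i, 1200.0 if i < 15 else None) for i in range(60)]
mean_cba, cf = ceilometer_window_stats(samples, center_s=450.0)
print(mean_cba, cf)  # 1200.0 0.25
```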
CBA and CF for the models were determined by examining the cloud properties in the grid cells in the proximity of the ceilometer. The specific grid cells were selected using model horizontal wind components (U and V) at the location of the ceilometer. The U and V components were taken from the model level nearest 2 km above the surface (the median climatological daily CBA at SRS). From U and V (m/s), we calculated the distance (km) that a possible cloud field would travel in 15 min in the west-east (U) and south-north (V) directions. These distances were used as an estimate of cloud advection and therefore to select the number of grid boxes within a rectangle centered on the ceilometer to be used for CBA and CF calculations. The rectangle sizing and centering were performed to better match the ceilometer averaging mentioned previously. To increase statistics in cases with low wind speeds, an extra grid box is added in each direction. The lowest level of the vertical grid with a cloud water mixing ratio of at least 0.01 g/kg [37] or an ice water mixing ratio of at least 0.003 g/kg [38,39] is determined for each horizontal grid box within the rectangle. These limits are imposed to provide a better comparison with the clouds measured by the ceilometer. Model CBA used for comparisons is the average CBA of all the grid cells within the rectangle, and model CF is the fraction of grid cells that have a valid CBA within that same rectangle. This method differs from other commonly used methods, which cover a region observed by satellite or by multiple cloud-measurement sites and focus on identifying whether the model produced the observed behavior within a reasonable vicinity of the measurement. It instead allows the instantaneous model spatial forecast data to be compared with time-averaged ceilometer cloud properties at a specific location.
For this application, we are more concerned with whether the modeled clouds occurred at the same location and time as the measurement as would be necessary for dispersion calculations. Only the nested domains were used for this analysis.
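The grid-cell selection and the model CBA and CF calculation can be sketched as follows. The text does not state exactly how the 15-min advection distance is converted into a number of grid boxes, so this sketch assumes that the rectangle spans the advection distance centered on the ceilometer (half the distance on each side) plus the one extra box in each direction. All function names and the column data layout are hypothetical.

```python
QC_MIN_GKG = 0.01   # cloud water mixing ratio threshold (g/kg)
QI_MIN_GKG = 0.003  # ice water mixing ratio threshold (g/kg)


def half_widths(u_ms, v_ms, dx_km, minutes=15.0):
    """Grid boxes on each side of the ceilometer cell in x and y.

    Assumption: half of the advection distance on each side, plus one extra box.
    """
    dist_x_km = abs(u_ms) * minutes * 60.0 / 1000.0
    dist_y_km = abs(v_ms) * minutes * 60.0 / 1000.0
    return int(dist_x_km / (2.0 * dx_km)) + 1, int(dist_y_km / (2.0 * dx_km)) + 1


def column_cloud_base(heights_m, qc_gkg, qi_gkg):
    """Height of the lowest level meeting either threshold, or None if cloud free."""
    for z, qc, qi in zip(heights_m, qc_gkg, qi_gkg):
        if qc >= QC_MIN_GKG or qi >= QI_MIN_GKG:
            return z
    return None


def rectangle_cba_cf(columns):
    """Mean CBA over cloudy columns and fraction of columns with a valid CBA."""
    bases = [column_cloud_base(*col) for col in columns]
    valid = [b for b in bases if b is not None]
    cf = len(valid) / len(bases)
    cba = sum(valid) / len(valid) if valid else None
    return cba, cf


heights = [100.0, 500.0, 1500.0]
columns = [
    (heights, [0.0, 0.02, 0.05], [0.0, 0.0, 0.0]),   # liquid cloud base at 500 m
    (heights, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),     # clear
    (heights, [0.0, 0.0, 0.0], [0.0, 0.0, 0.004]),   # ice cloud base at 1500 m
    (heights, [0.0, 0.005, 0.0], [0.0, 0.0, 0.0]),   # below threshold, counted as clear
]
print(half_widths(10.0, 0.0, dx_km=1.0), rectangle_cba_cf(columns))
```

In this sketch a 10 m/s westerly wind on a 1 km grid gives a rectangle that is longer in the west-east direction than in the south-north direction, which mirrors the intent of matching the spatial sample to the air that passes over the ceilometer during its averaging window.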

Cloud Forecast Scoring
Root mean squared error (RMSE) was used to score models on their ability to closely forecast CBA and CF. For CF, this was a simple process of taking the square root of the mean of the squared difference between the model value (predicted) and the ceilometer value (observed) for all 15-min values in a daily forecast. CBA scores were more complicated. Clouds near the surface within the boundary layer are most likely to affect dispersion. Thus, we set a maximum error limit of 2500 m because any cloud base error beyond that would likely be irrelevant to the dispersion. Additionally, we use the same maximum error for cases when models failed to develop cloud when observed by the ceilometer or created clouds when absent in ceilometer observations. All CBA scores were then normalized by a value of 2500; this allows us to quickly identify all cases when there was a discrepancy between the ceilometer and the models regarding whether clouds had formed because these cases will all be scored as a 1.0. After normalizing, the RMSE for CBA follows the process for CF. Because we use RMSE, lower scores indicate better forecasts (less error). Only the 15-min forecast times for the 30 h forecasts were used in scoring to provide a fair scoring comparison between RAMS and WRF. Besides scoring each 30 h forecast, each forecast was also divided into five 6 h periods and scored.
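The scoring rules above can be written out as a short sketch. It assumes that missing cloud is represented by None, that a time with no cloud in either the model or the observations contributes zero error (the text does not state this explicitly), and that a 30 h forecast holds 120 values at 15-min spacing. Function names are illustrative.

```python
import math

MAX_ERROR_M = 2500.0  # cap on CBA error; also assigned to cloud/no-cloud mismatches


def cba_error(model_cba_m, obs_cba_m):
    """Normalized CBA error in [0, 1] for one 15-min time."""
    if model_cba_m is None and obs_cba_m is None:
        return 0.0  # assumption: agreement on clear sky scores as no error
    if model_cba_m is None or obs_cba_m is None:
        return 1.0  # mismatch in cloud presence gets the maximum error
    return min(abs(model_cba_m - obs_cba_m), MAX_ERROR_M) / MAX_ERROR_M


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def cba_score(model_cba, obs_cba):
    return rmse([cba_error(m, o) for m, o in zip(model_cba, obs_cba)])


def cf_score(model_cf, obs_cf):
    return rmse([m - o for m, o in zip(model_cf, obs_cf)])


def split_periods(values, per_period=24):
    """Split a forecast into consecutive 6 h periods (24 values at 15-min spacing)."""
    return [values[i:i + per_period] for i in range(0, len(values), per_period)]


print(cba_score([None, 1000.0, None], [None, 1500.0, 800.0]))
print(cf_score([0.5, 0.0], [0.0, 0.0]))
print(len(split_periods(list(range(120)))))
```

Because a single mismatch contributes the maximum error of 1.0, a forecast period with many missed clouds scores near 1 regardless of how well the remaining cloud bases agree.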

Atmosphere 2020, 11, x FOR PEER REVIEW

Fifteen-Minute Monthly Averaged Forecasts
Forecasts and observations are averaged monthly for each 15-min forecast time and corresponding observation time. Monthly ceilometer CF averages showed the least cloud in April (0.28), increasing in May (0.30) and June (0.34), which is opposite the climatological trend, where monthly CF decreases from April to June. The fewer clouds (smaller sample size) in April could be what leads to the apparent variability in ceilometer CBA in Figure 2a that the models have difficulty in forecasting. Additionally, clouds in April are more likely to be driven by synoptic scale events compared to May and June, when local convection is more prominent. One such example was a widespread tornado outbreak in South Carolina from a frontal passage on 13 April 2020. Thus, deviations in both RAMS and WRF CBA compared to the ceilometer CBA (Figure 2a) could also be due to issues with the timing of these synoptic scale events in the models.
Differences in the accuracy of cloud predictions from RAMS and WRF are more apparent in May (Figure 2b) and June (Figure 2c). In May, RAMS CBA forecasts compare well with the overnight ceilometer CBA but demonstrate a tendency to increase CBA too early in the day (Figure 2b). However, RAMS does well in predicting the maximum CBA, as its predictions compare much more favorably in the late afternoon and evening hours. In contrast, WRF forecasts CBA much lower than observed overnight but compares well with ceilometer CBA in the daytime. WRF seemingly develops daytime convection correctly, but it likely benefits from starting with an early morning CBA well below the measured ceilometer CBA. RAMS increases daytime CBA with convection at a similar rate but begins with a CBA much too high. In the evening, WRF lowers CBA much too soon compared to the observations or RAMS.
Both WRF and RAMS predict similar timing for when CBA begins to rise, suggesting that the onset of convection is treated correctly within the models, but both struggle to predict the behavior of CBA during the transition periods around sunset and sunrise. Similar patterns continue for RAMS and WRF in June (Figure 2c).
CF comparisons in April (Figure 3a) show that RAMS-forecasted CF compares well with the ceilometer CF except during early morning time periods before sunrise, when the ceilometer CF increases. WRF forecasts CF well below the ceilometer CF and does not properly develop the early morning high CF. These high CF values measured by the ceilometer may be due to fog or low cloud formation that the models do not develop correctly. In May (Figure 3b), both models start with CF too low but do develop the increase in CF in the early morning hours. Later in the day, WRF CF is lower than the ceilometer CF but RAMS CF compares well. A similar pattern exists in June (Figure 3c) without the development of the high morning CF, and RAMS CF continues to compare much better with ceilometer CF later in the forecast than WRF CF does. RAMS forecasts daytime CF better than WRF during this time period (Figure 3d).

Daily Averaged Forecasts
Averaged CBA and CF were calculated for the ceilometer, RAMS and WRF for each 30 h forecast period. WRF CBA was generally lower than the ceilometer CBA (Figure 4). This became more consistent from mid-May (Figure 4b) into June, when WRF missed many of the time periods of mid-level clouds. This time period (May-June) is more consistent with daytime convection and higher cloud bases than was apparent in Figure 2. RAMS CBA fluctuated from above to below the ceilometer measurements, but in mid to late May, RAMS more closely forecasted the mid-level clouds detected by the ceilometer for these averaged forecasts. In June, however, the RAMS-forecasted CBA was higher than the measured CBA on many days. This is highlighted in Table 1, where a greater number of RAMS-forecasted CBA values were higher than the ceilometer values while WRF was reversed. It also supports the trends in Figure 2, which showed RAMS lifting CBA too quickly in the morning while WRF generally dropped the CBA too early in the evening. However, for these averaged forecasts, RAMS CBA was more often closer to the ceilometer CBA than WRF (62% of the time, Table 1).
RAMS and WRF forecasts of CF exhibit periods of increased and decreased forecast skill (Figure 5). The WRF model showed a tendency to exhibit errors leading to the underprediction of CF while the RAMS model showed cases with both overprediction and underprediction of CF (Table 1 and Figure 5).
Overall, the RAMS model seems to more accurately predict trends of CF, particularly in April and early May. As convective conditions begin to dominate the weather patterns in late May and June, the RAMS model shows increasing variability and error in its predictions, a trend mirrored in the WRF predictions. For CF, it is interesting to note that in most periods when WRF accurately predicted CF values, the RAMS model likewise predicted CF well (Figure 5). The reverse cannot be said, as there are numerous examples where RAMS predicted CF much better than WRF. In 54% of cases, RAMS predicted an averaged CF nearer the observed CF than WRF (Table 1). Periods when RAMS and WRF agreed on the CF occurred primarily for cases where both forecasted clear sky for the entire period.
Comparing the trends of CF prediction with those of CBA prediction (Figure 4), there are apparent relationships between the skill of CF forecasts and the measured CBA. Primarily, this seems to occur when the ceilometer CBA indicates extended periods of low clouds, generally fog or low stratus cases where the average CBA was about 2 km or less. This relates to the results illustrated in Figure 3 for the early morning hours, typically periods with lower CBA, which had more poorly forecasted CF values. It seems that both models struggle to correctly resolve low stratus clouds or fog in these cases, which is consistent with previous findings [7,14].
Table 2 provides a comparison of the number of 15 min periods for all 30 h forecasts for periods with CF or clear sky for the ceilometer and both models. RAMS failed to predict correct cloud conditions 31.5% of the time. False clear sky predictions made up 71% of failed predictions and false cloud predictions made up 29%. The WRF error rate in predicting cloud conditions was higher at 32.8%, also with more false clear sky predictions than false cloud predictions (76% to 24%).
The ceilometer averaged slightly lower CBA when RAMS falsely predicted clear sky (650 m) than when WRF did (731 m). However, when the models falsely predicted cloud, the WRF averaged CBA was lower (88 m) than that of RAMS (350 m). The low clouds from WRF false predictions appear to be associated with the low nighttime CBA in Figure 2.
Table 2. Frequency distribution describing the number of cases when clouds were present or absent in both the model and the observations (ceil. for ceilometer).
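The percentages quoted from Table 2 follow from a two-by-two contingency count of cloud presence. The sketch below shows one way to compute such counts and the derived rates from paired 15-min flags. The input values are synthetic and the function names are hypothetical; they do not reproduce the Table 2 data.

```python
def cloud_contingency(model_cloudy, obs_cloudy):
    """Count agreement and disagreement in cloud presence for paired 15-min times."""
    counts = {"hit": 0, "false_clear": 0, "false_cloud": 0, "both_clear": 0}
    for model, obs in zip(model_cloudy, obs_cloudy):
        if model and obs:
            counts["hit"] += 1
        elif obs:
            counts["false_clear"] += 1  # ceilometer saw cloud, model was clear
        elif model:
            counts["false_cloud"] += 1  # model produced cloud, ceilometer was clear
        else:
            counts["both_clear"] += 1
    return counts


def summarize(counts):
    total = sum(counts.values())
    failures = counts["false_clear"] + counts["false_cloud"]
    observed_cloud = counts["hit"] + counts["false_clear"]
    return {
        "failure_rate": failures / total if total else None,
        "false_clear_share": counts["false_clear"] / failures if failures else None,
        "detection_rate": counts["hit"] / observed_cloud if observed_cloud else None,
    }


# Synthetic flags for eight 15-min times (True means cloud present).
model_flags = [True, False, True, False, False, True, False, False]
obs_flags = [True, True, False, False, True, True, False, True]
counts = cloud_contingency(model_flags, obs_flags)
print(counts, summarize(counts))
```

Here `failure_rate` corresponds to the share of times with incorrect cloud conditions, `false_clear_share` to the split of failures between false clear sky and false cloud, and `detection_rate` to the share of observed clouds that the model also produced.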

Forecast Scoring
Figure 6 depicts histograms of CBA scores in 8 bins. Scores for the full forecast (Figure 6a) indicate that neither RAMS nor WRF provides good comparisons for the entire forecast period. A larger number of good scores close to zero appear when the scores are broken down into 6 h periods. These are likely 6 h periods with clear skies and no clouds. RAMS and WRF both appear to do poorly otherwise in these periods, as identified by the full forecast scores. Besides the lowest scores, the 12-18 h forecast period (8 a.m. to 2 p.m. local time) seems to be more distributed to lower scores for both models. The 6-12 h forecast period indicates that RAMS is performing better during these early morning hours. This is supported by the mean CBA score (Table 3) for this period, which is better (lower) than that of WRF. WRF is observably better in the afternoon (18-24 h) forecast period, which supports Figure 2. Mean scores with large standard deviations (Table 3) indicate limited differences between the two models for all forecast periods. However, because both models miss much of the cloud observed by the ceilometer (50% for RAMS and 56% for WRF, Table 2), these scores are heavily influenced by the error factor of 2500 for model forecast failures.
To diagnose how accurately the models predict CBA when they do generate clouds, we examined only those times when model CBA and ceilometer CBA coexist (Table 2). These scores were normalized as described in Section 2.4 to be from 0 to 1 (where a value of 1 indicated a mismatch between the presence of measured and modeled clouds) and comparable to the full CBA scores. This is accomplished by dividing by 7000 m, which is the maximum error rounded up to the nearest 1000 m.
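The matched-cloud scoring can be sketched as below. The sketch assumes that missing cloud is stored as None and that the normalization scale is obtained once from the largest absolute CBA error, rounded up to the nearest 1000 m (7000 m in this study). Function names and the sample values are hypothetical.

```python
import math


def matched_pairs(model_cba, obs_cba):
    """Keep only times when both the model and the ceilometer report a cloud base."""
    return [(m, o) for m, o in zip(model_cba, obs_cba) if m is not None and o is not None]


def round_up_scale(max_error_m, step_m=1000.0):
    """Round the maximum error up to the nearest step (e.g. 6400 m becomes 7000 m)."""
    return math.ceil(max_error_m / step_m) * step_m


def matched_cba_score(model_cba, obs_cba, scale_m):
    """RMSE of CBA errors normalized by scale_m, using matched times only."""
    pairs = matched_pairs(model_cba, obs_cba)
    if not pairs:
        return None  # no times with cloud in both model and observations
    return math.sqrt(sum(((m - o) / scale_m) ** 2 for m, o in pairs) / len(pairs))


scale = round_up_scale(6400.0)
score = matched_cba_score([None, 1000.0, 4500.0], [800.0, 1700.0, None], scale)
print(scale, score)
```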
Slightly lower mean scores and lower variability over nighttime periods (Table 3; 0-6, 6-12, and 24-30 h) indicate that RAMS performs slightly better during these periods, whereas WRF performs better during the daytime hours (12-18 and 18-24 h) regardless of standard deviations, similar to Figure 2, while for the entire forecasts there is not much difference. However, Figure 7 indicates that WRF mean scores suffer from occasional high (poor) scores where RAMS does not. For the full forecast scores and the first two periods (0-6 and 6-12 h), WRF performs similarly to RAMS but exhibits a few poorer scores. WRF appears to predict CBA during the daytime and early evening better than RAMS, as demonstrated by having a greater number of the lowest scores (good) in the following three forecast periods (12-18, 18-24, and 24-30 h). These daytime clouds are heavily weighted to the daytime convection that became prevalent in the latter half of our study period (Figure 2).
CF scores for both models seem to be better than the CBA scores overall. Figure 8 shows scores to be more evenly distributed and less heavily weighted to poor scores (higher values). As with CBA, RAMS CF scores better for the 6-12 h period and WRF CF scores better for the 18-24 h period. CF scores for other periods and full forecast indicate little difference (Figure 8), although mean CF scores edge slightly towards WRF (Table 3) regardless of the standard deviations. This contrasts with Figures 3 and 5, where RAMS appears to perform much better at predicting CF.
Figure 8. As in Figure 6 but for RAMS and WRF CF forecast scores.
We have limited our analysis to 30 h forecasts to provide a consistent assessment. However, Figure 2 indicates that RAMS seems to continue with better CBA forecasts at night, whereas WRF shows CBA decreasing in May and June. In Figure 3, WRF CF forecasts are also below RAMS for May and June but trend toward RAMS values toward the end of the 30 h forecast period. Including the full 36 h forecast for RAMS does not improve the mean overall scores for CBA or CF, but the mean score for only coexisting CBA during the 30-36 h forecast period is lower (better) than that of the 24-30 h forecast period (0.25 vs. 0.26). From Figure 2, it does not seem that WRF CBA is trending in a way that would improve on its score from the 24-30 h forecast period. This result further indicates that RAMS performs better during the nighttime hours.
In the first example forecast (Figure 9a,c), both WRF and RAMS failed to forecast many of the clouds that were detected by the ceilometer (0-6 h). This is common for many of these forecasts, as indicated by Table 2. WRF erroneously develops low clouds (likely fog) during the night on this day, as on others, giving it a poor nighttime forecast score when compared to RAMS (6-12 h). RAMS CBA is closer to the observations than WRF in the late afternoon (18-30 h), but both models fail to predict the CF and the duration of clouds. The models perform better on 25 May (Figure 9b,d). RAMS develops cloud in the nighttime when WRF does not, but it does not match the ceilometer CBA (0-6 h). WRF develops low clouds too early (6-12 h) but follows the correct CBA well while maintaining CF too low into the daytime (12-18 h). RAMS misses morning clouds and creates CBA well above those of the ceilometer during the afternoon (12-18 h). Thus, for the 25 May example, WRF performs much better during the daytime (12-24 h). However, WRF cuts the clouds off too soon in the evening (after 24 h), whereas RAMS maintains clouds but at a much lower CBA than detected by the ceilometer.
These models have different configurations that affect their ability to forecast clouds. WRF uses a cumulus parameterization, which may improve its convection-derived clouds, whereas RAMS, following the suggested guidelines, does not. Both models initiate the rise of CBA (convection) too quickly after sunrise, regardless of parameterization [23]. Additionally, the WRF microphysics package uses only a single moment for cloud water while RAMS uses a double-moment microphysics package, which may be why RAMS predicts CF slightly better than WRF [40]. Differences in soil initialization and treatment probably had some effect, as did the differences in domain size and grid spacing; the WRF domain was smaller (Figure 1) but with slightly larger horizontal grid spacing than RAMS.

Summary of Typical Forecast Errors
We also explored whether there were tendencies in the errors as a function of low, medium, or high CBA and low, medium, or high CF. To assess whether CF played a role in the CBA errors, we separated the forecast cases into high CF (CF > 0.8), medium CF (CF = 0.4 to 0.8) and low CF (CF > 0 to 0.4) based on ceilometer CF (Figure 10) and analyzed the distribution of CBA for each range. Clear sky (CF = 0) was not considered because a measured ceilometer CBA did not exist for these cases. Figure 10 clearly highlights the failure of both models to forecast clouds consistently, as shown by the band of data where the ceilometer reports a CBA but the models show clear sky. Regardless of CF, Figure 10 indicates that RAMS CBA is more often higher than the measured CBA, while WRF CBA is more often lower. As previously shown (Figure 9), WRF has a bias toward creating low clouds (fog) for all CF. RAMS frequently creates cloud with CBA just below 4 km. WRF predicts CBA better than RAMS for low and mid CF, with correlation coefficients (R) of 0.35 (WRF low CF), 0.30 (WRF mid CF), 0.27 (RAMS low CF) and 0.24 (RAMS mid CF). However, RAMS predicts CBA better than WRF at high CF (0.39 vs. 0.28), which is not clear in Figure 10.
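The stratified comparison above can be sketched in Python as follows. The CF thresholds are those quoted in the text. The treatment of the bin edges (0.4 and 0.8 assigned to the lower bin), the restriction to times when both model and ceilometer CBA exist, the use of `None` for clear sky, and all function names are assumptions made for illustration.

```python
import math


def cf_category(cf):
    """Return 'low', 'medium' or 'high' for a ceilometer cloud fraction.

    Clear sky (CF = 0) returns None because no measured CBA exists then.
    Assigning the edges 0.4 and 0.8 to the lower bin is an assumption.
    """
    if cf <= 0.0:
        return None
    if cf <= 0.4:
        return "low"
    if cf <= 0.8:
        return "medium"
    return "high"


def pearson_r(x, y):
    """Pearson correlation coefficient; NaN if undefined."""
    n = len(x)
    if n < 2:
        return float("nan")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cba_correlation_by_cf(obs_cf, obs_cba, model_cba):
    """Correlate model and ceilometer CBA within each ceilometer CF bin."""
    groups = {"low": ([], []), "medium": ([], []), "high": ([], [])}
    for cf, obs, mod in zip(obs_cf, obs_cba, model_cba):
        category = cf_category(cf)
        # Skip clear sky and times when either CBA is missing (None).
        if category is None or obs is None or mod is None:
            continue
        groups[category][0].append(obs)
        groups[category][1].append(mod)
    return {name: pearson_r(o, m) for name, (o, m) in groups.items()}
```

Swapping the roles of the two variables (binning by measured CBA and correlating CF) gives the opposite analysis with the same structure.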
Performing the opposite analysis, assessing CF based on categorization by measured CBA, gave RAMS higher correlations than WRF for all divisions, indicating that it was better at predicting CF: very low CBA (0.31 vs. 0.18, <500 m), low CBA (0.34 vs. 0.33, 500-2500 m), mid CBA (0.09 vs. 0.05, 2500-5000 m) and high CBA (0.12 vs. 0.00, >5000 m). These results indicate that RAMS performs better at predicting CF and CBA at high CF (stratus), whereas WRF performs better at predicting clouds at lower CF (cumulus).

Conclusions
This study assesses the ability of two operational mesoscale models (RAMS and WRF) to forecast CBA and CF at SRS in the southeastern US during the spring of 2020. We utilize monthly averages, forecast averages and an RMSE scoring method to assess model forecasts using observations from a ceilometer. WRF better predicts daytime CBA, primarily from convection, but it suffers from too low CBA at night and low CF overall, particularly during periods of low clouds or fog. RAMS better predicts nighttime CBA and seems to better maintain convection at night but suffers from high CBA in the daytime due to the incorrect reduction in CBA in the morning. RAMS model improvement in the second evening highlights this diurnal dichotomy over possible trends due to a single initialization time. RAMS CF is also too low overall, but Figure 3 shows it to be better than WRF. Additionally, comparisons indicate that RAMS CBA is generally biased high whereas WRF CBA is biased low. WRF better predicts CBA at lower CF but RAMS better predicts CBA at high CF. However, CF forecast scoring was determined to be less conclusive in identifying the strengths or shortcomings of each model relative to the other.
The ability of cloud droplets to interact with atmospheric constituents allows for modification to dispersion. This characteristic has become more important to dispersion modeling, and this assessment highlights this need. These results indicate that a combination of mesoscale models for predicting clouds that affect dispersion (WRF for daytime and RAMS for nighttime) may be necessary. However, with 50% (RAMS) and 46% (WRF) rates of predicting clouds (Table 2), neither model provides a reliable cloud forecast. While it is possible that using a larger area or probabilistic methods could improve these prediction rates, dispersion modeling is highly dependent on the location and time of clouds for interactions to occur. Thus, future research needs to focus on similar high spatial and temporal resolution comparisons of clouds. Improved configurations, initialization and data assimilation may improve these cloud forecasts. Improvements and adjustments to SRS operational models, such as implementing a double-moment microphysical scheme for WRF and a cumulus parameterization for the outer domain of RAMS, have been identified through this study with the plan to improve onsite dispersion modeling.
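As an illustration only, the simplest form of the suggested combination is a hard switch between the models by forecast hour. The sketch below assumes that the 12-24 h forecast window corresponds to daytime, which follows from the 12-18 h period being 8 a.m. to 2 p.m. local time. The function name and default window are hypothetical, and a weighted blend could be substituted for the hard switch.

```python
def hybrid_cloud_forecast(forecast_hours, rams_values, wrf_values,
                          day_start_h=12.0, day_end_h=24.0):
    """Select WRF values for daytime forecast hours and RAMS values otherwise.

    forecast_hours: hours since model initialization for each time step.
    rams_values, wrf_values: the forecast quantity (CBA or CF) from each
    model at those time steps.
    day_start_h, day_end_h: assumed daytime window in forecast hours
    (start inclusive, end exclusive).
    """
    hybrid = []
    for hour, rams, wrf in zip(forecast_hours, rams_values, wrf_values):
        use_wrf = day_start_h <= hour < day_end_h
        hybrid.append(wrf if use_wrf else rams)
    return hybrid
```

With these defaults, a 30 h forecast would draw on RAMS for 0-12 h and 24-30 h and on WRF for 12-24 h.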