Satellite Observation for Evaluating Cloud Properties of the Microphysical Schemes in Weather Research and Forecasting Simulation: A Case Study of the Mei-Yu Front Precipitation System

: Radiative transfer model can be used to convert the geophysical variables (e.g., atmospheric thermodynamic state) to the radiation ﬁeld. In this study, the Community Radiative Transfer Model (CRTM) is used to connect regional Weather Research and Forecasting (WRF) model outputs and satellite observations. A heavy rainfall event caused by the Mei-Yu front on the June 1, 2017, in the vicinity of Taiwan, was chosen as a case study. The simulated cloud performance of WRF with four microphysics schemes (i.e., Goddard (GCE), WRF single-moment 6 class (WSM), WRF double-moment 6 class (WDM), and Morrison (MOR) schemes) was investigated objectively using multichannel observed satellite radiances from a Japanese geostationary satellite Himawari-8. The results over the East Asia domain (9 km) illustrate that all four microphysics schemes overestimate cloudy pixels, in particular, the high cloud of simulation with MOR when comparing with satellite data. Sensitivity tests reveal that the excess condensation of ice at ≥ 14 km with MOR might be associated with the overestimated high cloud cover. However, GCE displayed an improved performance on water vapor channel in clear skies. When focusing on Taiwan using a higher (3 km) model resolution, each scheme displayed a decent performance on cloudy pixels. In the grid-by-grid skill score analysis, the distribution of high clouds was the most accurate among the three cloud types. The results also suggested that all schemes required a longer simulation time to describe the low cloud horizontal extend.


Introduction
Clouds play a crucial role in the Earth system. For instance, cloud coverage can affect the temperature [1,2] and global radiation balance [3,4]. In the global model, clouds are a crucial element of the Earth system because of their interaction with the longwave and shortwave radiation of the Earth and Sun. Furthermore, cloud coverage and cloud properties have substantial implications on forecasting convection initiation [5,6] and severe weather [7] at the regional level. Therefore, evaluating radiation is vital in numerical models.
The observed rain is a typical parameter to evaluate numerical weather prediction (NWP) model performance. However, the precipitation is initiated by the condensation of water vapor, followed by a sophisticated cloud-to-precipitation process. Therefore, obtaining information regarding cloud formation within NWP model simulation is an essential step for the evaluation of the model performance, because the NWP model combines the macrophysical and microphysical processes of the atmosphere and clouds [8][9][10]. In the numerical model, microphysics parameterization schemes are used to describe the nucleation of cloud droplets from water vapor to different hydrometeors explicitly. However, each parametrization method may have slightly different assumptions and provide different coefficients in the microphysics schemes, and thus the selection of the parametrization schemes will affect the results of model simulations [11][12][13][14]. Therefore, the validation and evaluation of different parameterization schemes in the numerical weather model are essential.
Conventionally, the performance of a numerical model is evaluated by comparison with available observations, such as surface observations, sounding data, or other kinematic or thermodynamic fields. However, the in situ observations of the convective systems, such as atmospheric profiles from radiosondes or direct measurements from aircraft penetrating cumulus clouds, have insufficient temporal and spatial distribution, which hinders comprehensive validation of the performance of the models. Advances in meteorological satellite technology have collected consistent and high spatiotemporal resolution remote sensing data using satellites. Therefore, the simulation performance of the model can be evaluated comprehensively by using meteorological satellite data. However, the satellite sensor measures reflectance and radiance of the visible and infrared bands, respectively. Both represent the reflected and emitted radiation from a particular area and azimuth angle, within a specific wavelength range. The radiative quantities cannot be directly compared with the geophysical variables produced by the NWP model because of unmatched natural properties.
To evaluate the performance of the NWP model output with satellite observations, two typical methods are commonly adopted. One of the evaluation methods uses satellite observed reflectance and radiances to retrieve geophysical variables, such as atmospheric moisture, temperature, cloud characteristics, and surface information. Previous studies have used retrieval data to analyze the performance of numerical models of atmospheric water vapor, cloud properties, precipitation, and atmospheric thermodynamic state, among others [15][16][17][18]. However, the retrieval technique may solve an ill-posed problem, which consumes more computational resources, and may cause uncertainty in the retrieved products. The second method uses a radiative transfer model to perform forward calculation to convert the geophysical variables from model outputs to simulated observations by satellite, also known as "model-to-satellite evaluation" [19][20][21][22][23][24][25][26]. This method has the benefit of using fewer computational resources and avoids the uncertainties caused by ill-posed conditions in the retrieval algorithm. As the radiometry observations are sensitive to the cloud properties, profiles of the atmospheric and surface features, "model-to-satellite evaluation" method, is an economic and efficient tool for evaluating the NWP model in studies on cloud formation, microphysical properties, and interactions with precipitation [27][28][29][30].
Studies on the characteristics of the microphysical parameterization schemes have typically compared with ground-based radar or polar-orbiting satellite (e.g., TRMM) observation to obtain the information on rainfall because of the low resolution of geostationary satellites [31][32][33]. Advances in satellites have enabled higher spatial, temporal, and radiometric resolution observational data. Therefore, high grid resolution NWP model simulations can more readily be evaluated against satellite observations. For example, Grasso et al. [34] used simulated satellite images to evaluate the performance of the Regional Atmospheric Modeling System (RAMS model) and identified coding errors in microphysical routines. Jankov et al. [35] conducted a simulation with five different microphysics schemes for the atmospheric river event and evaluated the influence on radiance for different microphysics schemes. Liu et al. [36] used the simulated satellite observations to evaluate the performance of the cloud distribution of each cloud type from Taiwan's Central Weather Bureau global forecast system using two different cumulus schemes with MTSAT-2 and identified characteristics of these cumulus schemes. Yao et al. [37] used FY-2E to evaluate the performance of the cloud in Weather Research and Forecasting (WRF) using a single microphysical scheme, and they illustrated the characteristics of different cloud types over mainland China.
To evaluate regional NWP model simulation for high-impact weather, the observation of ground-based radar reflectivity or rain gauge was commonly adopted. However, the results reflect the later stage or the final results but not the former stage of the precipitation event, which may cause a "correct answer because of wrong reason" situation. A rainfall event must go through all the stages, such as convection initiation and the formation of clouds. However, the model simulation does not necessarily capture the cloud structures. The uncertainty that occurs before the onset of precipitation has been investigated by Chung et al. [38]. Model evaluations of clouds should not neglect this factor.
The main goal of this study was to evaluate the radiance performance of an NWP model in the East Asian region using multichannel satellite images. A typical frontal system, namely Mei-Yu, was selected for our case study. The results were compared with geosynchronous satellites with high spatial and temporal resolution. The simulated Himawari-8 satellite imagery was produced using the simulation output from the WRF model coupled with the Community Radiative Transfer Model (CRTM) forward model. As the simulation of clouds was affected by the choice of the microphysics schemes in WRF, comparing the brightness temperature (BT) using various microphysics schemes may provide additional information (e.g., the cloud distribution, and cloud top height, etc.) regarding the characteristics of these microphysics schemes.
The remainder of this paper is organized as follows: Section 2 describes the precipitation event selected, the NWP model used, and the observed satellite. Section 3 introduces the radiative transfer model CRTM, the cloud classification method, and the verification methods. Section 4 presents the results, including the performance of clouds over East Asia with coarser resolution simulation, the analysis of cloud pattern for the frontal system near Taiwan with higher resolution simulation, and the moisture information on the clear pixels. Finally, Section 5 summarizes the findings of the present stage and presents future works.

Selected Mei-Yu Case
The Mei-Yu front event, which was accompanied by heavy rainfall on June 1-4, 2017, over Taiwan, was selected in this study. This frontal system began to affect Taiwan with southwestern airflow which brought precipitation to the mountainous regions of Central and Southern Taiwan on June 1 (local time). The front then approached the offshore of northern Taiwan in the early hours of June 2. Notably, the position of the frontal system remained in the north of Taiwan, delaying the actual rainfall time by approximately 6 h from the forecast time. A maximum hourly rainfall of >80 mm was observed at several stations in northern Taiwan. The 4-day (June 1-4) accumulated rainfall in the central and southern mountains exceeded 1000 mm in several places and caused floods, agricultural damage, and economic losses. Studies have analyzed this case and obtained results; for instance, Tu et al. [39] determined that a marine boundary layer jet that caused moisture transport was a crucial factor in this heavy rainfall case. Lupo et al. [40] investigated the sensitivity of WRF ensemble forecasts to the tuning configuration and implementation of Stochastic perturbed parameterization tendencies (SPPT) and individual independent SPPT (iSPPT) schemes with a heavy rainfall event.

Evolution of Convection and Precipitation
The structure of front moving toward Taiwan in the Mei-Yu season may change and decelerate because of enhanced ascending motion close to the terrain [41,42]. Furthermore, the southwest monsoon often forms a strong wind area near Taiwan because of the terrain effects and subtropical high [39,43], and the atmospheric conditions during the Mei-Yu season are often advantageous for the development of convection. Therefore, various mesoscale weather phenomena often accompany the transit of the Mei-Yu front in Taiwan. For instance, the frontal system often accompanies a mesoscale convective system (MCS).
Infrared satellite images can reveal the structure of the front and are used to track the MCS [43,44]. The satellite images during this Mei-Yu event revealed that the front is a complex system, composed of several convection systems (e.g., Chimney cloud with convective hot spots). Observation of metrological radars ( Figure 1) indicated that the rain band of the frontal system moved toward northern Taiwan on June 1 at 1800 universal time coordinated (UTC) and then slowed and stagnated for approximately 6 h. The stratiform rainfall area was parallel to the squall line, similar to a parallel-stratiform type convection system [45]. The disaster report from Taiwan's National Science and Technology Center for Disaster Reduction (NCDR) reported the daily accumulated rainfall from June 1 to 4 local solar time (LST) (May 31 at 1600 UTC to June 3 at 1600 UTC). Before June 1, the rainfall was mainly located in the south-central mountainous region because of the southwesterly moist flow. On June 2, the front stagnated in northern Taiwan before noon, causing intense rainfall at Keelung and the north coast area. The rainfall area gradually moved southward on June 3 and then remained in the central area. Finally, the rain began weakening on June 4.

WRF Model Configurations
The nonhydrostatic WRF model (version 3.9) was used in this study. The simulation was initialized on June 1, 2017 at 1200 UTC. Three nested domains were applied over East Asia for the configuration (Figure 2). The model comprised a total of 52 vertical layers from the surface to 10 hPa, and the horizontal grid resolutions were 27, 9, and 3 km, respectively. The European Centre for Medium-Range Weather Forecasts ERA-Interim (0.75 • × 0.75 • ) analysis was used as the initial conditions and boundary conditions. The physical parameterizations used in this study contained the Rapid Radiative Transfer Model for broadband longwave radiation schemes [46], the Dudhia shortwave scheme [47], the Yonsei University (YSU) planetary boundary layer scheme [48], and the Kain-Fritsch cumulus parameterization [49]. The cumulus scheme was only switched on in domains 1 and 2. Microphysics schemes, which are the main focus for evaluation purposes, were performed by selecting Goddard (GCE; Tao et al. [50]), WRF single-moment in 6-class (WSM; Hong et al. [51]), WRF double-moment in 6-class (WDM; Lim et al. [52]), and Morrison (MOR; Morrison et al. [53]) schemes. Each model was integrated for 24 h with a time step of 90, 30, and 10 s in domains 1, 2, and 3, respectively.

BT from Satellite Observation
The Japanese Himawari-8 satellite data was used as a reference to evaluate the performance of the model simulation. Himawari-8 is a new-generation geostationary meteorological satellite that was launched on October 7, 2014 at 0516 UTC and reached its operational orbit in October 2014, at 35,786 km and 140.7 • E with 16 bands that cover the wavelengths from visible to IR. The spatial resolution at nadir was 0.5 or 1 km for visible and near-infrared bands and 2 km for infrared bands [54]. The onboard Advanced Himawari Imager (AHI) enabled the performance of a full disk scan within a 10-min interval. Compared with the previous generation of geostationary meteorological satellites (e.g., Himawari-7, so called MTSAT-2), the onboard AHI had the advantages of enhanced observation capabilities. The number of observation bands increased from 5 to 16 [55], and the spatial and temporal resolutions improved significantly. Furthermore, the increased observation samples and denser data update rate provided more information on the atmosphere.
In this study, four channels were used to evaluate the cloud features from the model simulations. These four channels were the atmospheric window channel (10.4 µm) along with three water vapor channels (6.2, 6.9, and 7.3 µm). These four channels have several applications; for instance, the window channel can represent surface or cloud-top temperature and water vapor channels are associated with the moisture information at different altitudes. (Table 1). Table 1. Characteristics of the four Himawari-8 infrared channels used in the present study.

CRTM
With the atmospheric thermodynamic and dynamic states that are simulated by the NWP model, we can use a radiative transfer model as an observation operator that considers the emission, scattering, and absorption process of radiation in the atmosphere, to obtain a proxy of the radiance observed by satellite. In this study, the information on the cloud and atmospheric conditions produced by WRF were coupled with a forward calculation model, the CRTM, to compute the simulated infrared (IR) BT as the observation using an AHI onboard the Himawari-8.
The CRTM was developed at the Joint Center for Satellite Data Assimilation, which is a multiagency research center to improve the use of satellite data for analyzing and predicting the weather, the ocean, the climate, and the environment in the United States, to simulate radiances for a given sensor from the input of meteorological variables under all atmospheric and surface conditions [56].
The temperature, pressure, water vapor, surface type, altitude given by the WRF are directly inputting to CRTM to calculate the IR BTs and obtain the corresponding radiance from the top of the atmosphere. However, the WRF provides the cloud properties with the mixing ratio of each hydrometeor (e.g., mixing ratio of cloud, ice, rain, graupel, and snow), which cannot be used directly for CRTM. In CRTM, the effective radius and water path can be used to determine the optical properties for each layer.
Martin et al. [57] suggested that a suitable parameterization for effective radius is formulated as: where q x , µ x , and N ox , respectively, represent the mixing ratio, shape factor, and intercept parameter; of each hydrometeor and ρ a is the density of air. The density of each hydrometeor (ρ x ) is given follow the setting of different MP scheme. Since N 0x = N x λ x , the number concentration simulated by the double scheme can be considered in the calculation. For MOR, the effective radius of ice follows the method in the source code of WRF version 3.9.1. For WSM, WDM, and GCE, the effective radius of cloud ice particles is obtained by the relationship between ice crystal mass and ice crystal radius, following the method used by Hong et al. [58]. In this study, the formulation of cloud ice effective radius follows Yao et al. [37]: , 10 3 , 10 6 (3) where c = 5.38 × 10 7 and d = 0.75 are constants and M i is the ice crystal mass. qi, the mixing ratio of ice cloud, is given by WRF. Before using the CRTM, users must convert the meteorological fields at levels to the layers. Layer temperatures (pressure) were calculated as an average of two levels of temperatures (pressure). The layer water path for each hydrometeor was calculated as the layer water content multiplied by the air density multiplied by the thickness of the layer. The variables used in the CRTM are listed in Table 2.

Match of the Horizontal Resolution
As mentioned above, the spatial resolution of Himaari-8 at nadir was 2 km for infrared bands and the horizontal resolution of simulated BTs was the same as WRF model configuration. The difference in the horizontal resolution should be considered. When comparing the simulated and observed BTs in domain 2 (9-km resolution), the mean of observation points in the model grid was used as the reference. On the other hand, for the comparison in domain 3 (3-km resolution), since the horizontal resolution is close to the observation, the nearest neighbor interpolation was introduced to get the observed BTs on the model grid.

The Classification of Cloud
The brightness temperature differences (BTDs) between different channels are commonly used to obtain information on cloud properties and atmospheric conditions. For example, the BTDs between 6.9 and 10.4 µm channels are used to detect the cloud-top height relative to the tropopause [5]. As the temperature decreases vertically and the 6.9-µm channel has strong water vapor absorption, the BTDs values between the 6.9 and 10.4 µm channels are usually negative. The BTDs over clear sky are often lower, because of the sufficient moisture to absorb the energy emitted by high-temperature surfaces. Therefore, the BTDs between 6.9 and 10.4 µm can be used to classify clouds at different altitudes.
In this study, the definitions of each cloud type show in Figure 3 followed those presented by Yao et al. [37]. BTDs between 6.9 and 10.4 µm of less than −40 K were defined as clear pixels, and the others were defined as a cloudy sky. Furthermore, the cloudy sky was further classified depending on the cloud type. The high-level cloud tops were represented by a BTD between −10 and 0 K; the mid-level cloud tops were associated with a BTD between −30 and −10 K; BTDs between −40 and −30 K were defined as low cloud top.

Statistics and Evaluation Method
To quantitatively evaluate the performance of a model for different cloud-top type, categorical verification methods were employed. Cloud type was defined and used to evaluate the performance of each model configuration. This method is also called binary verification. Table 3 illustrates the combination of the different cloud cases of model simulation and observation. For example, four conditions can be defined for "high cloud" cases, hits (a) represent the high cloud occuring in the simulation and observation; false alarms (b) occur when the high cloud is presented by the simulation but not by the observation; misses (c) are the opposite of false alarms (b); correct negatives (d) are the opposite of hits (a). The Probability of Detection (POD), Threat Score (TS), False Alarm Ratio (FAR), Frequency bias (FBIAS), and Equitable Threat Score (ETS) [59] score were computed. These scores were formulated as follows: The scores of these parameters are between 0and 1 except FBIAS. POD, TS, and ETS equal to 1 represent a perfect forecast. FAR is fraction of the predicted "yes" events that actually did not occur. FAR = 0 means the forecast has no false alarm. If the FBIAS is greater than 1, it means the overestimation of the cloud case, and vice versa.
To quantify the error of each cloud type, the mean absolute error (MAE) and the mean bias error (MBE) were introduced in the present study, formatted as follows: where M i and O i are the simulated and observed BTs, respectively.

Results
A comprehensive analysis of both the traditional method and the "model-to-satellite" technique could qualitatively and quantitatively analyze the systematic bias or random error of NWP and radiative transfer models. However, each evaluation method may focus on the specific element of the model. In this section, the weather field at different levels and the "model-to-satellite" radiance of different channels for cloudy or clear pixels are evaluated.

Conventional Diagnostic of Simulated Meteorological Fields
The performance of a numerical model is traditionally verified by comparing meteorological fields (e.g., 500-hPa geopotential heights, surface wind, surface temperature) to available observations. In this study, the simulated 500-hPa geopotential height field and temperature field, the near-surface 2 m temperature and 10 m wind, and the 24-h accumulated rainfall were compared with observation or reanalysis data.

Meteorological Field
The weather chart is presented for different levels of reanalysis data and simulation (Figure 4). We compared the reanalysis field to the four simulations of the 9-km resolution, and the results revealed no major differences in the fields among models run using different microphysics schemes (not shown). The model simulations captured the 500-hPa trough at the western Pacific and the ridge over China (Figure 4a Figure 5 illustrates the observed and simulated 24-h accumulated rainfall. The charts reveal the distribution of the heavy precipitation around northern Taiwan and the mountainous region in this event. Figure 5b illustrates the model simulation of 24-h accumulated rainfall using four microphysical schemes that have similar distribution (not shown). Overall, the results revealed that the simulation using four schemes qualitatively captured the distribution of the 24-h accumulated rainfall. However, the simulation slightly underestimated the extreme rainfall in northern Taiwan. The underlying reason may be that the simulations were slightly delayed in the position of rain bands. Furthermore, the results of each simulation roughly captured the rainfall in the central and southern mountainous regions.

Comparison Between Simulated and Observed BTs
The "model-to-satellite evaluation" was performed directly by comparing the simulated and observed BTs. The verification bands contained a window channel and three water vapor channels, which have different weighting on the vertical of the atmosphere. Furthermore, a cloud classification method was introduced to verify the performance on different cloud types.  (Figure 6i,j) resulted in low BT covering with too much high cloud in the domain. The remaining simulations resulted in similar distributions, but some small-scale clouds were displayed in the model run using GCE, which caused the images to be a little grainy. Although the error that is caused by either observation or forward model was not considered in the present study (Figure 6), only the relative differences among the forward model calculations and observation are the main concern but not the absolute values in the simulated BTs from forward calculation.
On June 2 at 0000 UTC (12-h simulation), all four model simulations exhibited a cold bias at the East China Sea, which might be correlated to either a larger frontal system coverage or higher altitude in clouds against to the observed BT. Furthermore, the movement of the cloud bands may be slightly delayed. Since the atmospheric factors may bias BTs, the result indicates that the model cannot completely represent the atmospheric conditions. The 24-h simulation, valid on June 2 at 1200 UTC, using the MOR scheme displayed similar characteristics in the 12-h simulation. Approximately half of the MOR D02 displayed a cold bias because of excess cloud cover in the simulation. The simulations with GCE (Figure 6d Figure 7 displays the BT histograms of the probability of occurrence using the four microphysical scheme model simulations and the corresponding satellite observations. The BT histogram focuses on the distribution of BTs to enable the suppression of the error caused by the weather system location. The probability of occurrence, valid on June 2, 2017 at 0000 UTC and 1200 UTC is presented in Figure 7a,b, respectively. The observations revealed that the highest probability of occurrence occurred at approximately 292 K. However, the simulations with four microphysical schemes all displayed the highest probability at higher BT. The BT of the window channel was >285 K, which suggests low cloud or clear sky, and thus the maximum probability of occurrence appears at higher BT, which indicates that each scheme underestimated the cloud-top height or that the surface temperature was positively biased. A BT of <235 K suggests a high cloud case. A higher probability of high cloud was observed in all four schemes comparing with the observation, suggesting an overestimation of high cloud, especially in the model run using the MOR scheme comparing among the four model configurations. Furthermore, results indicated that the performance of WSM and WDM displayed similar BT distributions. When the BT was <270 K, the curve of WSM and WDM were almost overlapping. The design of WSM and WDM are similar in the cold rain processes. The results were valid on June 2, 2017 at 1200 UTC (Figure 7b) which indicates that MOR still has too much high cloud cover over domain 2. The observed primary maximum BT was captured by GCE, WDM, and MOR. However, the model run with MOR underestimates the percentage of the occurrence. The simulated peak of BT with GCE and WDM matches better with observation, compared to WSM. The simulation with WSM and WDM was demonstrated to have a similar distribution of the probability of occurrence when the BT was <270 K.

BTs in Water Vapor Channels
The water vapor channels contain moisture information at various altitudes. Three of the water vapor channels used in this study had distinct vertical weighting functions [60]. The peaks of the weighting functions are at the pressure levels of approximately 340, 440, and 620 hPa, with 6.2, 6.9, and 7.4 µm, respectively. As the weighting at the surface is similar, the verification used the BTD to remove the information close to the surface. Therefore, BTDs between 6.2 and 6.9 and between 6.9 and 7.4 represented the high-level and mid-level water vapor, respectively. The verification was processed when clear pixels were simulated and observed simultaneously.
The high-level water vapor results are illustrated in Figure 8. The results suggest that the numbers of clear pixels are similar in GCE, WSM, and WDM. MAE indicates that the models run using WSM and WDM have very similar performances. The MOR results displayed higher MAE after a 10-h simulation, whereas the GCE results displayed a lower MAE after a 6-h model simulation. The MBE for each simulation revealed a cold bias. The simulation using GCE displayed the lowest MAE and MBE most of the time for both the high-level and mid-level (not shown), which may result in a better water budget.   Figure 9 displays the frequency of each cloud type valid June 2, 2017 at 1200 UTC (24-h simulation). The blue bars represent the results of the observation and the red bars represent the simulations with different microphysical schemes. The results revealed that all four models run overestimated the frequency of cloudy pixels. Excess cloud cover estimates were caused by overestimating high-level and mid-level clouds. An excess of high cloud was produced for the simulation with MOR compared with the observations. GCE, WSM, and WDM adjusted the frequency of low clouds after 24-h simulation compared with the 12-h simulation (not shown).    (Figure 10a) demonstrated that each scheme had a satisfactory performance on cloudy pixels, which indicates that the observed cloudy events were accurately captured by the simulation. When further investigating the cloud types, the high cloud displayed a better performance among the three cloud types. However, overestimation occurred in the cloudy pixel frequency and the POD score ignored false alarms. Therefore, the higher value of POD may not reflect an optimal simulation. The FAR results (Figure 10b) indicated that the value of each cloud type was >0.5, which indicates that at least 50% of simulated events are false alarms. This may be because the exact cloud types were not correct. The ETS score results (Figure 10c) also did not display a high-quality performance on different cloud types. Finally, the FBIASE score (Figure 10d) of the low clouds was near 1, which indicates that the amount of low clouds is closer to the observation at 24-h simulation. Furthermore, the bias of the high cloud for each scheme is >1. In particular, the MOR demonstrated a high cloud bias near 4, which indicated that the presented model estimated approximately four times the cloudy pixels recorded in the corresponding observation.

Probability Distributions of Hydrometeor Particles
A considerable amount of high clouds was displayed in the simulation with the MOR scheme when verifying the performance of BT. The different hydrometeors were examined because the distribution of hydrometeors may affect the simulation of BT. The probability distributions of cloud ice mixing ratio (Qi) (Figure 11) indicated that the WSM and WDM displayed similar performance. The GCE indicated that the highest mixing ratio of cloud ice was at approximately 8-13 km. However, the distribution of 10.4 µm BT did not perform with the most significant cold bias. On the other hand, the MOR tended to generate more cloud ice at higher altitudes which may contribute to this bias.

Sensitivity on the Cloud-Top Altitude
In Figure 11, the simulation with GCE had the highest mixing ratio of Qi but did not cover with more cold bias of BT in the model domain compared with other simulations. However, overestimation of the high cloud and a higher Qi at higher altitudes were demonstrated in the simulation with the MOR scheme. Therefore, we assumed that the overestimation of the high cloud with the MOR was affected by the distribution of Qi being too high. To verify this hypothesis, a sensitivity test experiment was designed to examine the influence of the vertical distribution of hydrometeors. To lower the distribution of the hydrometeors by a number of levels, the effective radius and water content of hydrometeors simulated by WRF with MOR was moved directly downwards by removing the bottom layers of the hydrometeors, so that the height on the number concentration (Ni) corresponded to the level of Qi. To maintain the numbers of vertical layers, the uppermost layers were assumed to have no hydrometeors. The reconstructed atmosphere was used as input to the radiative transfer model to compute the BTs. The experiment was designed to compare two parts: Experiment 1 was "Qi only", in which only the height of the cloud ice effective radius and ice water content was controlled; Experiment 2 was "All", in which the heights of all hydrometeors were simultaneously controlled.
The MAE and MBE were used to verify the improvements in the high cloud produced by MOR. The results of the MAE and MBE (Figure 12) indicated that lowering the height of all hydrometeors by 4-5 layers led to the largest improvements. However, lowering the height of Qi only resulted in similar improvements as lowering the heights of all hydrometeors. One might notice that the total amount of the hydrometeor might not conserve in the experiment, the results of two sensitivity tests ( Figure 12) suggested that Qi was the most crucial factor in the overestimation of the "high cloud" for the MOR scheme.

Evaluation of the Cloud Pattern Evolution
The synoptic-scale cloud features and the water vapor channel revealed the performance of the simulation using each scheme. In this subsection, the model simulations at domain 3 (3-km horizontal resolution) which focus on the main cloud band for the precipitation system event near the Taiwan are compared with high-resolution satellite observation to verify the performance of each scheme on the main cloud band.
Roebber [61] illustrated an alternative approach to demonstrate multiple verification measures called the "performance diagram" which uses the success ratio (i.e., the value of 1-FAR) as the horizontal axis and the POD score as the vertical axis to illustrate the performance. Moreover, because of the mathematical correspondence, the slope of the dotted line represents FBIAS and the shaded background represents TS. Figure 13 demonstrates multiple verification measures of simulation accuracy on the performance diagram. Each figure illustrates the simultaneous performance of various cloud classifications and skill scores. The perfect performance will make the dot display in the top-right corner. The verification results suggest that the improved skill could be identified in the cloudy pixels. In particular, a higher TS and a close to 1 FBIAS value with the simulation using the MOR scheme, indicates that this scheme has less deviation and is closer to the observation. However, this part of the verification only focused on the main cloud area and thus the simulation using the MOR scheme displayed the most favorable performance. The verification of clear pixels revealed that GCE, WSM, and WDM overestimated the clear pixels, whereas MOR underestimated it slightly after the 6-h simulation. Furthermore, the verification results of different cloud types revealed that each model simulation displayed a superior performance on high cloud pixels. The performance of the simulation of low clouds was initially poor, but the verification score rises after 18-h simulation. This might be because the convections in the frontal system were weakening after 18 h from initialization of model. Thus, an improved skill in later stage of low cloud simulation was identified in the bottom row of Figure 13.

Conclusions
The main goal of this study was to assess the performance of model simulation objectively using various microphysical schemes. To this end, new-generation multichannel meteorological geostationary satellite observations were used as the reference. The model evaluation using higher spatiotemporal satellite observations could be considered an optimal and comprehensive method. We combined an NWP model (i.e., WRF) and the CRTM to simulate the multichannel BTs from the AHI onboard the Himawari-8 geostationary satellite. Therefore, the entire NWP model domain could be evaluated in an advanced method by comparing the simulated BTs to observations.
A classical stationary front in the Mei-Yu season in Taiwan on June 1-3, 2017, was simulated by WRF using four different microphysical schemes, which includes two single-moment schemes (GCE and WSM) and two double-moment schemes (WDM and MOR). The results of each model simulation using different microphysical schemes revealed that they captured the dynamic and thermodynamic meteorological field (i.e., the 500-hPa geopotential height synoptic-scale trough on the east of Taiwan, and the ridge over China agrees with the reanalysis data). The model also accurately represented the wind shear and strong temperature gradient over the western Pacific, associated with the frontal system. Compared with the observations, the 24-h model simulation of the total accumulated rainfall predicted the local maximum rainfall patterns accurately across the central and southern mountainous regions. Overall, the results of the synoptic conditions and total accumulated rainfall indicated that the WRF simulation with four MP schemes represented this event accurately and similarly qualitatively.
Comparison between simulated and observed BTs provided the following results: 1.
In the 10.4 µm BT, the results revealed a large cold bias in the simulation using MOR, which was caused by modeling the clouds at higher altitudes. The simulations with GCE, WSM, and WDM captured the distribution of the main cloud band more accurately. The probability of occurrence indicated excessive high cloud occurrences in the simulation with each scheme, and especially in the model run using MOR. The performance of WSM and WDM had similar BTs distributions when the BT was <270 K, which may be because WSM and WDM only differ for warm cloud processes.

2.
The grid-by-grid evaluation revealed that observed cloudy events were accurately captured by the simulation using four schemes. Furthermore, high cloud pixels displayed a higher accuracy than mid and low clouds.

3.
The moisture information from clear pixels indicated that the mean bias was positive in the middle layer (approximately 620-420 hPa) and negative in the upper layer (approximately 420-340 hPa). GCE displayed a lower MAE and MBE in most instances, which may result in a better water budget.
Overall, this study demonstrated the potential of WRF coupled with CRTM in producing multichannel satellite imagery as a model performance evaluation tool. The satellite observations provided wide spatial coverage data, which are useful in comprehensively evaluating the NWP models. As only one case was analyzed, some results might be not robust in the present study. The characteristics of these microphysical schemes in different weather systems (e.g., mesoscale typhoon, afternoon thunderstorm, etc.) or the long-term evaluation (e.g., ≥15-day simulation) will be examined in future studies, since the evaluation system has been constructed. Moreover, the effect of other physical parameterization methods in WRF over East Asia can be evaluated from the perspective of radiance. Furthermore, a recent study (e.g., [62]) revealed that the microphysical properties and optical characteristics of the clouds could be estimated from the geostationary orbit. The WRF simulated hydrometers variables may have direct intercomparison against to satellite retrievals, so that the microphysical scheme might be enhanced through the optimal turning from the synergistic analysis between models and satellite data sets.