1. Introduction
Applying new methods to improve the accuracy of regional non-hydrostatic numerical weather prediction (NWP) has always been one of the most heavily investigated topics in the meteorological research society. To improve the model’s accuracy could be understood as reducing the errors of the model. The origin of the errors of a regional mesoscale model can be dissected into three main categories. First, errors can come from the imperfect integration method applied in maintaining the stability of the discretized model [
1]. Secondly, errors can arise from the underlying physics within the model, as the parameterizations are heavily dependent on the scale of the weather phenomena that it aims to predict. Last but not least, errors can also originate from the data inputs used for generating the forecasts. With erroneous input data, even a perfect model will be prone to generating erroneous simulation results. The first two categories are more related to the internal errors that exist in the model itself; however, the last one is more related to the quality of the input data. Thus producing input data that closely resemble the true state of the atmosphere for the model is very important to generate an accurate forecast, and as essential as improving the underlying dynamics, physical schemes and the structure of model.
In order to reduce the error of the input data used for initializing the model, one of the most popular methods is to assimilate observations onto the initial atmospheric field, and this method is known as data assimilation. Kalnay [
2] provides readers with a fundamental understanding of data assimilation, as well as introducing different state-of-the-art data assimilation methods that were applied and are being applied in most of the meteorological agencies. As we are aiming for an efficient and less resource-demanding data assimilation strategy, the three-dimensional variational (3DVAR) data assimilation method is considered to be the best option to be apply to the forecasting model to improve the initial condition. Numerous studies applied 3DVAR on a non-nested regional mesoscale model or a single domain ([
3,
4,
5,
6], etc.). While these studies show that 3DVAR leads to significant improvements in the initial conditions and has proven to have positive effects on the initial 48 h of forecasts, limited efforts were made to investigate the effect of applying 3DVAR to several nests of a regional mesoscale model and how it impacts the post-48 h time period of the forecasts. Additionally, most 3DVAR applications are event-based and short-term forecasting. Fan [
7] applied 3 h cycling 3DVAR to a nested 27 km/9 km/3 km domain, focusing on the prediction of 0–6 h accumulated rainfall of a specific heavy rain event that occurred in Beijing, China. Jianfeng [
8] targeted a specific typhoon case to evaluate the effectiveness of 3DVAR. They improved the initial condition by optimizing the background error statistics (BES) and tuning the scale-length parameter. Many other 3DVAR-related studies also aimed to investigate specific weather phenomena ([
5,
9,
10,
11], etc.). While these studies make a notable contribution to the short-term forecasting of severe weather events, they are not representative enough to demonstrate the performance of 3DVAR for general day-to-day weather forecasting.
In the Environment and Sustainability department of Hong Kong University of Science and Technology (HKUST), one of our goals is to be able to accurately predict the dispersion of pollutants within the Pearl River Delta (PRD) region, in which the atmospheric input heavily relies on the accuracy of the weather forecasting model. In order to create a more accurate atmospheric condition to drive the air pollution model, numerous methods have been researched, tested, and applied to our weather forecasting model, such as considering the frictional force induced by urban effects in the ACM2 PBL scheme [
12] as well as utilizing the multi-layer urban canopy model (WRF-BEP) model [
13], which provides direct information exchange between the planetary boundary layer (PBL) and the buildings. With more sophisticated building height parameters, more accurate weather forecasts could be achieved [
14] within an urban landscape. However, these are improvements which reduce errors that are within the aspect of the model itself. While efforts were also spent on reducing the errors that lie within the input data, through updating the soil map and the soil moisture to improve the realism of the model regarding the latent heat energy that is stored in the ground [
15], the changes regarding the atmospheric field are affected in a passive way.
With the PRD region situated within the sub-tropical region and along the coastline of South-Eastern China, the weather patterns are season-dependent. In the summer, the South-Westerly monsoon dominate. The maritime air stream brings in moist and warm air mass from the ocean towards the region. In the winter, the north-easterly monsoon, together with the southward movement of Siberian continental air-mass, brings dry and cold weather to the region. Due to the drastic difference in weather patterns, focusing on a specific weather phenomena within a particular season does not show a complete analysis of the performance of 3DVAR. Therefore, in this study, we would like to evaluate and understand how 3DVAR behaves in general, without focusing on any specific weather event, and to focus on the post-48-h impacts of applying a 3DVAR data assimilation method using conventional observations on the Weather Research and Forecasting (WRF) model, which has a two-nested domains setup.
The intention of applying variational data assimilation to any NWP model is to generate the most precise and accurate estimate, close to the true state of the atmosphere at a specific analysis time, using the iterative method to find the solution of a prescribed cost-function, such as Equation (
1) [
16]. In this paper, the term WRF-Var will refer to the data assimilation component, which is responsible for the minimization and iterative process that consists of Equation (
1), while the term WRFDA refers to the entire WRF data assimilation system, including various other components.
The essence of the variational method is to find an analysis state
that could minimize
. This could be done by minimizing Equation (
1) iteratively until a suitable
is found, which contributes to obtaining a minimum of
. With the given priori data and error matrices, such as the background state
(including model error
B) and the observations
(including observation errors
E and
F),
could be shown as the posteriori maximum likelihood estimate of the real atmospheric condition [
17,
18]. The entire data assimilation system consists of several components: observation preprocessing (OBSPROC) unit, the background error unit (gen_be utility), the WRF-Var unit, and the update lateral boundary conditions unit (WRF_BC). These processing units are supplied with various observation data and model analysis, which are then processed to generate data files to input into the WRF-Var to perform the assimilation process.
When data assimilation is applied on a regional domain, WRF-Var extends the observation information and relays it beyond its original location using recursive filters as a diffusion operator [
19]. Updating the existing wrfbdy lateral boundary condition files (
) is an essential step and could be done via the WRF-BC utility [
1]. This update process allows
to remain consistent with
to avoid unexpected gravity waves being generated.
The objective of this paper is to understand the sensitivity of the NWP model towards the application of 3DVAR. The paper focuses on two parts. First, the effectiveness of the application of 3DVAR on different domain setups and evaluate its contribution towards different prediction times. Second, the seasonal effects on the response of the model towards the 3DVAR application. We organized the paper in the following fashion.
Section 2 (Model and data assimilation setup) would focus on the setup of the NWP model and the data assimilation protocol.
Section 3 (Data) would show the data we used for both the model and the data assimilation process. Then, in
Section 4 (Methodologies and Results), the methodologies of the studies would be laid out and the results of those studies would be presented. Lastly,
Section 5 (Conclusions) summarizes the study by highlighting the findings of the experiments carried out in this paper.
2. Model and Data Assimilation Setup
One of the most important inputs for variational data assimilation systems is the background error covariance B. It performs several important roles in data assimilation, such as: (i) determining the weight of the background state , so that the analysis can be defined to either match the observations more or less closely; (ii) projection of the spatial and multivariate characteristics of the model error; (iii) maintaing the physical balance between variables when applying increments.
The background error can be understood as the deviation of the model from the true state of the atmosphere. However, we do not possess the information of the truth, thus refraining us from producing the true background error. Therefore, a best estimate of the true atmospheric condition (
) has to be produced to substitute
, such that we can determine the background error via the forecast difference National Meteorological Center (NMC) method. In 3DVAR systems, the background error covariances are estimated offline and have similar characteristics to climatological statistics, which do not evolve with time, it is also not flow-dependent. This allows for a more computationally efficient NMC method to be applied. The NMC method uses forecast differences (e.g., T + 48 minus T + 24 for global model and T + 24 minus T + 12 for regional model) statistics to generate the background error covariances, with the assumption that both forecasts have the same model BIAS and uncorrelated errors [
20].
The WRF model with the Advanced Research WRF (ARW) dynamical core is used for generating the simulations. A parent domain and two nested domains are applied in this research, namely, domain 1 (D1), domain 2 (D2), and domain 3 (D3). The first domain D1 focuses on converting synoptic scale weather phenomena into mesoscale-
scale events that propagate through or develop within China. The second domain D2 collects information from D1 and acts as a buffer region for the conversion between mesoscale-
and mesoscale-
scales weather events that propagate or develop within the southeastern part of China. D3 focuses on the propagation and development of mesoscale-
scale weather within the Guangdong region of China. The specifications of these three domains are listed in
Table 1 and a spatial representation is shown in
Figure 1. The model applied in this study was being run in parallel and the domains are one-way nested with the static terrestrial data downloaded from the WRF download website [
21]. The geographical data include various geographical information which consists of the characteristics of the terrain on which the model is running. The model does not use restart and we consider the first 24 h as the spinning-up time for the model. We selected the two time periods for the model simulation, June 2015 and December 2015. In general, they represent the summer season and the winter season for our study, respectively, and in some experiments, only the winter season was evaluated (
Section 4).
The WRF Data Assimilation (WRFDA) System is being used for the data assimilation process and the gen_be utility is used to generate the background error covariances (BE) via the NMC method. One month of 12 h initiated WRF simulation data, set prior to the study period, are supplied to the gen_be_wrapper to estimate the forecast error covariance CV5 [
27]. Two sets of BE are generated, one for the summer and the other for the winter. Together with the BE, the observations are assimilated onto the initialization timestep with a window size of 10 min to maximize the usable data. We increased the VAR_SCALING (VS) of all the variables from 1.00 (default) to 3.00, following [
28]. The increase in VS is mainly because the error of the model error is generally underestimated, as shown in [
11,
29,
30]. Since observations that lie above five
of the error covariance are automatically rejected, the increase in VS increases the underestimated model error, thus allowing more observations to be assimilated into the model.
VAR_SCALING1 = 3.00, 3.00 (ψ)
VAR_SCALING2 = 3.00, 3.00 (χu)
VAR_SCALING3 = 3.00, 3.00 (T_u)
VAR_SCALING4 = 3.00, 3.00 (rh)
VAR_SCALING5 = 3.00, 3.00 (Psfc_u)
3. Data
The data used for the WPS unit of the WRF-ARW and WRFDA data assimilation system are described as follows.
Meteorological data: WPS requires an external meteorological source to provide atmospheric information to initialize the simulation. In this research, we use the NCEP operational Global Forecast System (GFS) analysis and forecasts, as the initial meteorological fields for the simulation. The data are 0.25 degree resolution, which has time steps at a 3-h intervals from 0 h to 240 h, initiated at 00, 06, 12, and 18 UTC daily, stored as GRIB2 format. The GFS analyses contain various variables, which include air temperature, albedo, cloud frequency, cloud liquid water/ice, sea surface temperature, skin temperature, soil temperature, upper air temperature, vorticity, etc. More information about the dataset can be found in [
31].
Two sets of observation data are applied in this research and they serve a different purpose during the simulation process. The two sets of observation data will be described as follows:
(1) NCEP ADP Global Upper Air and Surface Weather Observations (ds337.0): These datasets are provided by the National Centers for Environmental Prediction (NCEP), and they compose of a set of global operational surface and upper air reports. The observation data are reported at a time interval ranging from 1 to 12 h. All the data are stored in the PREPBUFR format, while ASCII and NetCDF formats are also available [
32]. Each observation in the dataset comes with a quality marker (qm), and a qm less than 3 are kept to ensure that good-quality observations are assimilated in this study. All the observations which passed the QC and are within the domain of our study are assimilated into the initial condition;
(2) The Environmental Central Facility Atmospheric and Environmental Database: These datasets are provided solely for educational and research purposes, by the Environmental Central Facility (ENVF) within the Institute for the Environment (IENV) of the Hong Kong University of Science and Technology (HKUST). They comprise reports of conventional surface observations gathered from GTS, Meteorological Terminal Aviation Routine Weather Report (METAR), and Surface Synoptic Observations (SYNOP), and the data went through several quality control measures. More information regarding these datasets can refer to [
33]. The purpose of these observation data is to evaluate the result of simulations both before and after 3DVAR which are applied to the model, so that we can understand the changes made by the data assimilation system, with the assumption that the observations used by WRF-VAR for the data assimilation and the observations used to evaluate the results are uncorrelated with each other. We selected four variables—2-m surface temperature, 10-m zonal and meridional wind, and 2-m relative humidity—as the variables for the evaluation process. In this research, since we are mostly interested in the behaviour of the model within the Pearl River Delta region, some of the evaluation on the middle domain (domain 2) would be shown for a more complete study, but only the outermost domain (domain 1) and the innermost domain (domain 3) would be evaluated in detail. All results were evaluated by the observation data located within the Guangdong province (
Figure 2), except the evaluation on the 27 km domain for the seasonal sensitivity study (
Figure 3).
5. Conclusions
Both the general sensitivity study and the seasonal sensitivity study broaden the understandings of the effect of the initial conditions and boundary conditions on the model’s performance. It is clear that 3DVAR has an impact on the forecast quality by generating initial conditions that represent the state of the atmosphere more accurately. The studies also provide insights into other possible factors that could affect the performance of 3DVAR. In this section, we summarize the findings of this study.
In the general sensitivity study, the domain sensitivity experiment shows that applying 3DVAR on both domain 1 (27 km) and domain 2 (9 km) is the optimal setup and yields the best results. The parent domain (domain 1) shows the greatest improvement through the application of 3DVAR, and it is observed that the effect of 3DVAR becomes less significant as the nest grows deeper and area of the domain becomes smaller. This also shows that applying 3DVAR on the parent domain is more essential than applying 3DVAR on the nests, as the 3DVAR effect on the parent domain remains dominant and is capable of spreading the information into the nests. The reduction in the spread and the shift towards zero distribution and the mean of and show that the majority of validation observations are benefited. It can be seen that the major improvements occur in the 3DVAR_DOMAIN_D1 case, and the performance is marginally better with 3DVAR applied on domain 2. This showcases the importance and the benefits of carrying out data assimilation on the parent domain. Using the results from the time period sensitivity experiment, we begin to understand the performance of 3DVAR on four time periods, 0–24 h, 24–48 h, 48–72 h, and 72–96 h, with 3DVAR only applied on domain 1 and domain 2. The results follow the conventional understanding that 3DVAR provides significant improvements in the first 24 h of the model run. However, at the same time, the results also show that there are significant improvements in BIAS and RMSE from 48 to 96 h, with 2 m temperature and 2 m relative humidity being the better performers. The seasonal sensitivity study indicates that 3DVAR performs better in the winter than in the summer for our model configuration. When we compare the winter case and the summer case to understand their discrepancy in the seasonal sensitivity study, it is shown that the location of the increments and the wind field plays a dominant role in the propagation of information within the domain. In this study, 3DVAR assimilates most of the observation data inland regardless of the season. In winter, the general wind pattern is from inland to the ocean (offshore wind) and the evaluation zone of the 27 km domain and the 3 km domain are situated downwind of the wind flow. This allows the information of the increments in the 27 km domain to pass down to the evaluation zone (3 km domain), allowing the nest to obtain a more accurate boundary condition from the parent. The wind speed and direction along the boundary of the parent reduce the influx of new information from the boundary, allowing the 3DVAR increments to be sustained for a longer period of time. However, in the summer, the wind pattern is directly opposite to that of the winter. The wind flow is from the ocean towards inland (onshore wind). Since most of the assimilated observations are inland, there are not a significant amount of observations in the ocean; with the evaluation zone situated downstream of the onshore flow, 3DVAR did provide an equal amount of new information to the evaluation region. Aside from this, the wind direction along the edge of the domain promotes the transfer of new information from the parent’s boundary layer, aggravating the unsustainable effect of the 3DVAR. With the combiniation of effects mentioned above, 3DVAR is more efficient in the winter than in the summer.
In this study, we performed several important sensitivity tests, and the results of the experiments show that, in general day-to-day forecasting, the application of 3DVAR on the 27 km and 9 km domain is shown to be the optimal setup, providing observable and significant improvements in the winter season for the pearl river delta region. The 3DVAR performance excels when initial condition is not hugely affected by the boundary condition and the evaluation zone is situated downstream of the area where the majority of new information is assimilated. The study also shows that 3DVAR is not only suitable for short-term forecasting. In the right conditions, 3DVAR has the ability to increase model performance after a 24 h runtime, especially after 72 forecast hours.