Reconstruction of the Radar Reﬂectivity of Convective Storms Based on Deep Learning and Himawari-8 Observations

: Radar reﬂectivity (RR) greater than 35 dBZ usually indicates the presence of severe convective weather, which affects a variety of human activities, including aviation. However, RR data are scarce, especially in regions with poor radar coverage or substantial terrain obstructions. Fortunately, the radiance data of space-based satellites with universal coverage can be converted into a proxy ﬁeld of RR. In this study, a convolutional neural network-based data-driven model is developed to convert the radiance data (infrared bands 07, 09, 13, 16, and 16–13) of Himawari-8 into the radar combined reﬂectivity factor (CREF). A weighted loss function is designed to solve the data imbalance problem due to the sparse convective pixels in the sample. The developed model demonstrates an overall reconstruction capability and performs well in terms of classiﬁcation scores with 35 dBZ as the threshold. A ﬁve-channel input is more efﬁcient in reconstructing the CREF than the commonly used one-channel input. In a case study of a convective event over North China in the summer using the test dataset, U-Net reproduces the location, shape and strength of the convective storm well. The present RR reconstruction technology based on deep learning and Himawari-8 radiance data is shown to be an efﬁcient tool for producing high-resolution RR products, which are especially needed for regions without or with poor radar coverage.


Introduction
Severe convective weather caused by convective storms poses a significant risk to aircraft flight safety [1]. Weather radar can provide decision-making support for airlines by accurately detecting the occurrence, movement, and disappearance of convective storms. Radar reflectivity greater than 35 dBZ tends to indicate the presence of severe convective weather [2]. However, weather radar equipment is lacking in certain areas [3] such as alpine areas and the ocean.
Space-based satellite observations provide key data to obtain radar reflectivity information. For example, some sensors carried by satellites provide infrared brightness temperature (BT) data that can detect the activity of convective clouds. Satellite sensors also feature high spatial and temporal resolutions, allowing them to track convective events in a timely manner. The threshold characteristics of satellite bands (such as single bands), brightness temperature differences (BTDs), and their temporal trends can be used to identify convective systems [4][5][6][7]. Moreover, compared with radar data, space-based satellite observations cover larger areas, enabling the reconstruction of data such as the radar reflectivity, vertical liquid water content (a type of weather radar data), and precipitation to represent severe convective weather.
Some traditional precipitation reconstruction algorithms use the physical properties of cold and warm clouds to establish the relationship among the cloud-top BT and the rainfall probability and intensity [8]. For instance, Arkin and Meisner [9] developed the Geostationary Operational Environmental Satellite (GOES) Precipitation Index algorithm based on the coverage of cold clouds (with a cloud-top BT below 235 K). More recently, Sun et al. [10] established a rain intensity check table to estimate rainfall by matching Himawari-8 multichannel infrared BT data, infrared BTDs, and global precipitation measurement core observatory rainfall data.
Recently, deep learning (DL) has greatly accelerated the development of artificial intelligence [11], and remarkable progress has been achieved in some industries. For example, DL is known for its strong image processing capabilities in the field of computer vision [12][13][14][15][16] and medical diagnosis [17,18]. Compared to traditional algorithms, DL algorithms can map non-linear processes through the filtering capabilities of a nonlinear activation functions. In particular, with its rapid development, DL has gradually shown substantial potential in the earth sciences. DL can not only fully understand the spatial shape of an event (such as a typhoon or a thunderstorm) but also can determine the dynamic evolution of the event by integrating the time series [19].
Some studies have investigated the reconstruction of data based on DL and infrared BT data. Beusch et al. [20] developed and compared satellite rainfall retrievals based on generalized linear models and artificial neural networks, and showed that the latter further improves prediction skills by additionally reducing false alarms. Veillette et al. [21] used a convolutional neural network (CNN) to reconstruct vertical liquid water content data from GEOS-R and a numerical weather prediction, and demonstrated that CNNs can fuse heterogeneous geospatial data to extrapolate data to regions beyond the coverage of the weather radar. Wang et al. [22] developed a novel DL-based infrared precipitation estimation algorithm using a CNN and verified the effectiveness of a CNN combined with multichannel physical inputs in the retrieval of infrared precipitation information; the results showed that a five-channel input is more efficient at estimating precipitation than the commonly used one-channel input. Hilburn et al. [23] aimed to develop techniques through U-Net to assimilate GOES-R observational radiation data and lightning data in precipitation scenarios to improve the short-term convective-scale prediction of highimpact weather disasters; they found unique values of lightning observations that can pinpoint the locations of strong radar echoes. In particular, U-Net is a good architecture for reconstructing radar reflectivity information from satellite data and has considerable application potential in the United States, which is within the GOES-R coverage area [23].
In this study, we extend the U-Net method to another geostationary satellite, namely Himawari-8, for the reconstruction of radar reflectivity and we take North China as the study area. Unlike GOES-R, Himawari-8 does not acquire lightning data. Hence, this study focuses on the reconstruction and evaluation of the combined reflectivity factor (CREF) with values greater than 35 dBZ, which has extensive value for aviation applications within the coverage area of the Himawari-8 satellite. The model constructed herein can be extended to areas with no radar coverage to reconstruct convective storms.
The second section introduces the data and experimental procedures used. The third section describes the design of the model structure and loss function. The fourth section presents the model results and a case study. The fifth section presents the conclusion. Data with a high spatial resolution and high temporal resolution from the Advanced Himawari Imager (AHI) sensor carried by Himawari-8 were extracted as the input variables of the model. The AHI onboard Himawari-8 was equipped with 16 observation bands, including visible bands (central wavelengths ranging from 0.47 µm to 0.64 µm), nearinfrared bands (central wavelengths ranging from 0.86 µm to 2.3 µm), and infrared bands (central wavelengths ranging from 3.9 µm to 13.3 µm). The AHI performed a full-disk scan in 10 min (plus five subareas) and a regional scan in 2.5 min. The resolution of Band 03 (central wavelength of 0.64 µm) was 0.5 km, Band 01 (central wavelength of 0.47 µm), Band 02 (central wavelength of 0.51 µm), and Band 04 (central wavelength of 0.86 µm) was 1 km, and the resolution of Bands 05~16 (center wavelengths of 3.9 µm~13.3 µm) was 2 km.
We downloaded the full-disk data from June to August for the period 2016-2018 by registering on the Japan Aerospace Exploration Agency (JAXA) website (https://www. eorc.jaxa.jp/ptree/) accessed on 1 June 2020. As the visible light and near-infrared bands collect data only during the day, we used only the infrared bands to obtain a general day and night model. Table 1 shows the central wavelengths and physical meaning of the bands selected in this study. We choose Band 07, Band 09, and Band 13 according to Hilburn et al. [23]. The BTD between Band 16 and Band 13 (denoted as Band  can be used to detect the cloud phase state, which indicates liquid or ice particles on the cloud top [22] and can identify the developmental characteristics of both deep and shallow convective systems [24]. We also added Band 16 to represent the cloud-top height. The label used by the model was the MICAPS3.1 mosaic product CREF with a 6-min time interval and a spatial resolution of 0.01 • . We collected samples from North China (Figure 1), across which weather radar stations are widely deployed. The latitudinal and longitudinal ranges of this region are 32 • N-42 • N and 110 • E-120 • E, respectively. The grid data with equal latitudes and longitudes were arranged from north to south and west to east. CREF is a volume sweep product by projecting the maximum base reflectivity factor found in the constant elevation azimuth scanning onto cartesian grid points. CREF can be used to determine the strength structure and variation trend of convective storms. Considering summer convective weather events occur frequently in North China, convective samples from June to August were selected to form the dataset. Remote Sens. 2021, 13, x FOR PEER REVIEW 4 of 12  Figure 2 illustrates the workflow of this study. First, the spatial and temporal resolutions of the two kinds of data were matched. We sampled the CREF data on a 0.02° grid to make its spatial resolution consistent with that of the infrared data. Similarly, to ensure that the CREF data corresponded to the infrared data in time, we employed an hourly temporal resolution. In total, we obtained 6332 samples during the summer of 2016-2018.

Method
Second, approximately 20% of the samples from July and August 2018 were defined as the test set, while the remaining data were defined as the training set.
Third, we applied maximum and minimum normalization, which scales the input and output to between 0 and 1. We selected the maximum and minimum values for each channel from the sample. We assigned a CREF value of less than 0 dBZ to 0 dBZ, and the maximum normalized value was 80 dBZ.
Fourth, we investigated the effects of different inputs on the model performance. Specifically, we set up two experiments: one with a five-channel input (U-Net_full) and another with a single-channel input (U-Net_single). Only Band 13 was used in U-Net_single because it is the most important band for reconstructing radar reflectivity (Hilburn et al., 2020).
Finally, we evaluated the U-Net_full performance over the whole study period (from June to August 2016-2018) to confirm the generalizability of the model. Additionally, we compared the effects of the two models during the test period (from July to August 2018).  Figure 2 illustrates the workflow of this study. First, the spatial and temporal resolutions of the two kinds of data were matched. We sampled the CREF data on a 0.02 • grid to make its spatial resolution consistent with that of the infrared data. Similarly, to ensure that the CREF data corresponded to the infrared data in time, we employed an hourly temporal resolution. In total, we obtained 6332 samples during the summer of 2016-2018.

Choosing the CNN Architecture
U-Net has been shown to demonstrate good performance in image segmentation [25]. Thus, we selected U-Net to reconstruct CREF data to identify and segment the edges Second, approximately 20% of the samples from July and August 2018 were defined as the test set, while the remaining data were defined as the training set.
Third, we applied maximum and minimum normalization, which scales the input and output to between 0 and 1. We selected the maximum and minimum values for each channel from the sample. We assigned a CREF value of less than 0 dBZ to 0 dBZ, and the maximum normalized value was 80 dBZ.
Fourth, we investigated the effects of different inputs on the model performance. Specifically, we set up two experiments: one with a five-channel input (U-Net_full) and another with a single-channel input (U-Net_single). Only Band 13 was used in U-Net_single because it is the most important band for reconstructing radar reflectivity (Hilburn et al., 2020).
Finally, we evaluated the U-Net_full performance over the whole study period (from June to August 2016-2018) to confirm the generalizability of the model. Additionally, we compared the effects of the two models during the test period (from July to August 2018).

Choosing the CNN Architecture
U-Net has been shown to demonstrate good performance in image segmentation [25]. Thus, we selected U-Net to reconstruct CREF data to identify and segment the edges of convective systems. However, we modified the traditional U-Net architecture according to Hilburn et al. [23]. We employed one convolution operation instead of two in each convolution block to decrease the possibility of overfitting. In addition, we closed the skip connections because the high-resolution spatial information they provide is not necessarily helpful.
The U-Net architecture is illustrated in Figure 3. It consists of a contraction path (left side) and an expansion path (right side). In the contraction path, each convolution block is followed by a 2 × 2 pooling layer. Each convolution block encapsulates a batch normalization layer, a 3 × 3 convolution layer, and a rectified linear unit (ReLU) activation function in sequence. U-Net doubles the number of filters going down the encoding branch and halves the number of filters going up the decoding branch. The expansion path consists of an upsampling layer through nearest interpolation and a convolution block. In the last layer of U-Net, a 1 × 1 convolutional layer maps the tensor from sixteen channels to one channel.

Setting the Loss Function
Upon calculating the probability distribution function (PDF) of the CREF in the sample, we found a significant data imbalance in the dataset. We therefore used a weighted loss function to solve the sample imbalance problem. Considering precipitation is a lowprobability event, there are few pixels with convective radar echoes. Thus, we divided the Ioffe et al. [26] confirmed that adding a batch normalization layer to the network structure improves the training effect of the model. Regarding the other training settings, the optimizer was Adam [27]. The learning rate was initially set as 0.001 and then multiplied by 0.1 every 100 epochs. The model is trained for a total of 300 epochs.

Setting the Loss Function
Upon calculating the probability distribution function (PDF) of the CREF in the sample, we found a significant data imbalance in the dataset. We therefore used a weighted loss function to solve the sample imbalance problem. Considering precipitation is a lowprobability event, there are few pixels with convective radar echoes. Thus, we divided the CREF data into five ranges that correspond approximately to different rainfall levels ( Table 2). We set the weight (w) for the mean square error (MSE) loss function (loss) according to the five ranges of the PDF in Table 2, where n is the sample number, y i is the reconstructed CREF value, and y is the true CREF value, as shown in Functions (1) and (2). The weights of the five intervals are 1. 24, 9.20, 15.21, 62.79, and 743.84. This procedure is equivalent to data augmentation and has a better effect on reconstructing CREF values above 35 dBZ.

Model Evaluation
Classification scores were utilized to evaluate the reconstruction ability of the model for CREF above 35 dBZ, which constitute an aviation concern. The classification scores used in this study were the probability of detection (POD), false alarm rate (FAR), Critical Success Index (CSI) [28], Heidke skill score (HSS) [29], and accuracy (ACC). To identify convective storms, these scores were used to describe the consistency between the occurrence and observation of convective storms. Table 3 lists the meaning of each parameter in the classification score formulas. Events and nonevents were distinguished based on whether the output pixel was greater than a given threshold. For Functions (3)-(8), n, y i , and y have the same meanings as in Functions (1) and (2). The regression score root-mean square error (RMSE) reflects the discrepancy between the reconstructed data and the real observations, and thus reflects the overall accuracy of the reconstructed CREF data.

Model Performance of U-Net_Full
The RMSEs in the training set and test set is 9.442 dBZ and 9.357 dBZ, respectively, which are on the same order of magnitude as that (5.29 dBZ) in a previous study [23]. One reason that their value was lower than those found herein is that their radar reflectivity was between 0 and 60 dBZ, while the CREF used in this study is between 0 and 80 dBZ. The other reason is that Himawari-8 does not acquire lightning data, unlike GOES-R. As shown in Figure 4a, the median RMSE for the training set is 9.336 dBZ and that for the test set is 8.953 dBZ. U-Net_full is not overfitted and exhibits a good generalization effect on the test set. With increasing convection coverage, the RMSE of the sample also increases (Figure 4b). The sample coverage of values exceeding 35 dBZ is less than 15% and is relatively concentrated in less than 5%. We can see that by using satellite multiband radiation data, the U-Net-based radar reflectivity reconstruction algorithm can achieve good results.   Figure 5 shows the performance of the two experiments in terms of RMSE, POD, FAR, CSI, HSS, and ACC. For both RMSE and FAR, the higher the value, the worse the effect; in contrast, for POD, CSI, HSS, and ACC, the higher the value, the better the effect. Regarding the overall error, the RMSE of U-Net_full is smaller than that of U-Net_single. The POD35 and FAR35 results indicate that the hit rate and false alarm rate of U-Net_full  Figure 5 shows the performance of the two experiments in terms of RMSE, POD, FAR, CSI, HSS, and ACC. For both RMSE and FAR, the higher the value, the worse the effect; in contrast, for POD, CSI, HSS, and ACC, the higher the value, the better the effect. Regarding the overall error, the RMSE of U-Net_full is smaller than that of U-Net_single. The POD35 and FAR35 results indicate that the hit rate and false alarm rate of U-Net_full are better than those of U-Net_single. Moreover, due to the increase in POD35 and the decrease in FAR35, the CSI35 of U-Net_full is superior to that of U-Net_single and thus the HSS35 of U-Net_full is also superior to that of U-Net_single. The ACC35 values of the two models are both above 0.97, but that of U-Net_full is slightly better than that of U-Net_single. In conclusion, the comprehensive performance of U-Net_full at a threshold of 35 dBZ with respect to the adopted classification scores and regression scores is better than that of U-Net_single.  Figure 5 shows the performance of the two experiments in terms of RMSE, POD, FAR, CSI, HSS, and ACC. For both RMSE and FAR, the higher the value, the worse the effect; in contrast, for POD, CSI, HSS, and ACC, the higher the value, the better the effect. Regarding the overall error, the RMSE of U-Net_full is smaller than that of U-Net_single. The POD35 and FAR35 results indicate that the hit rate and false alarm rate of U-Net_full are better than those of U-Net_single. Moreover, due to the increase in POD35 and the decrease in FAR35, the CSI35 of U-Net_full is superior to that of U-Net_single and thus the HSS35 of U-Net_full is also superior to that of U-Net_single. The ACC35 values of the two models are both above 0.97, but that of U-Net_full is slightly better than that of U-Net_single. In conclusion, the comprehensive performance of U-Net_full at a threshold of 35 dBZ with respect to the adopted classification scores and regression scores is better than that of U-Net_single.  The performance of the experiment is shown in more detail in Figure 6, in which the model performs better at a given point the closer the point is to the upper right corner. In the range of 20 dBZ~50 dBZ, the performance of U-Net_full is obviously better than that of U-Net_single. We also find that above 20 dBZ, the score worsens, which is related to a decrease in the number of high-value pixels within the sample. In summary, U-Net_full demonstrates superior abilities over U-Net_single to identify the occurrence of convection (greater than 35 dBZ) and reconstruct the overall accuracy of the convection. Hence, a U-Net architecture with multichannel input is more helpful for accurately estimating radar reflectivity.

The Performance of Different Inputs in the CREF Reconstruction
demonstrates superior abilities over U-Net_single to identify the occurrence of convec (greater than 35 dBZ) and reconstruct the overall accuracy of the convection. Hence, Net architecture with multichannel input is more helpful for accurately estimating ra reflectivity.

Case Study
To vividly and intuitively illustrate the model effect, we choose the test set as an ample for an in-depth analysis. As U-Net_full achieves the best effect, we use this m for a case study.
During 15-17 July 2018, heavy rain fell over North China. Affected by the pulsa of the external low-level jets of the subtropical high pressure and the cold air in the no rainfall reaching 25-100 mm occurred extensively across North China, with the precip tion exceeding 100 mm in some areas. The rainfall process was characterized by a l duration with obvious showers, strong local rainfall, and large precipitation [30].
At 7:00, as shown in Figure 7a-c, several convective cells are arranged horizon to form a northeast-southwest squall line. The model reconstructs the location, shape strength of the squall line well. Then, from 8:00 to 9:00, the model results show that system gradually strengthens and expands as it moves eastward, consistent with the servation results. This is in accordance with the scoring performance in Table 4. Ban in the left column shows that the BT value at the center of the convection is below app imately 210 K (marked with color).
The model correctly captures the location of the convective system center, altho it tends to overestimate in magnitude. This study is more concerned with the occurr

Case Study
To vividly and intuitively illustrate the model effect, we choose the test set as an example for an in-depth analysis. As U-Net_full achieves the best effect, we use this model for a case study.
During 15-17 July 2018, heavy rain fell over North China. Affected by the pulsation of the external low-level jets of the subtropical high pressure and the cold air in the north, rainfall reaching 25-100 mm occurred extensively across North China, with the precipitation exceeding 100 mm in some areas. The rainfall process was characterized by a long duration with obvious showers, strong local rainfall, and large precipitation [30].
At 7:00, as shown in Figure 7a-c, several convective cells are arranged horizontally to form a northeast-southwest squall line. The model reconstructs the location, shape and strength of the squall line well. Then, from 8:00 to 9:00, the model results show that the system gradually strengthens and expands as it moves eastward, consistent with the observation results. This is in accordance with the scoring performance in Table 4. Band 13 in the left column shows that the BT value at the center of the convection is below approximately 210 K (marked with color).

Conclusions
In this paper, we trained an image-to-image U-Net for North China that can transform observation data from multiple satellite bands of Himawari-8 into CREF data. A weighted loss function is used to weaken the unbalanced distribution of CREF data acquired during convective weather. In comparative experiments, we show that U-Net with multichannel inputs exhibits better performance than U-Net with a single channel. In the case of severe convective weather in North China, U-Net can reconstruct CREF data well to identify the location, shape, and strength of convective systems. As discussed above, The model correctly captures the location of the convective system center, although it tends to overestimate in magnitude. This study is more concerned with the occurrence of values greater than 35 dBZ based on aviation application. Therefore, one reason for the result to be overestimated is that when we designed the loss function, points greater than 35 dBZ in the sample are assigned a large weight value. In addition, this is also related to the large convection coverage of the case we selected. Figure 4b shows that the larger the sample coverage of values exceeding 35 dBZ, the higher the RMSE value. As can be seen from Figure 4a, the PDF corresponding to the RMSE value in this case is relatively small and the RMSEs of most samples are less than 12 dBZ. In conclusion, the model performs well, especially in locating the convective storm center.

Conclusions
In this paper, we trained an image-to-image U-Net for North China that can transform observation data from multiple satellite bands of Himawari-8 into CREF data. A weighted loss function is used to weaken the unbalanced distribution of CREF data acquired during convective weather. In comparative experiments, we show that U-Net with multichannel inputs exhibits better performance than U-Net with a single channel. In the case of severe convective weather in North China, U-Net can reconstruct CREF data well to identify the location, shape, and strength of convective systems. As discussed above, the RMSE is of the same order of magnitude as that in previous studies, our method can be more easily used to monitor convective storms in real-time during flights over East Asia.
Instead of using U-Net to classify the occurrence of severe convection directly, we regress the radar reflectivity. This approach is taken for two reasons. First, the center of the convective system is visible, making it more user-friendly. Second, the reconstruction results can be used for the nowcasting of severe convective storms. This paper is a preliminary attempt to reconstruct CREF data by utilizing DL and Himawari-8 observations in China. The model may be more generalizable when geographical factors are taken into account. To extend the application of this method to the open sea, due to the differences between the underlying surfaces of the ocean and land, we can acquire samples offshore and apply the model to the open sea to ensure aviation safety.
Our plan for subsequent research is to extend the application of the proposed model to areas lacking deployed radar equipment. Considering these areas without radar have no observations to verify, the application of the model in such ocean areas will require the uncertainty to be assessed using offshore areas with radar.