Multiscale Representation of Radar Echo Data Retrieved through Deep Learning from Numerical Model Simulations and Satellite Images

Abstract: Radar reflectivity data capture snapshots of fine-grained atmospheric variations that cannot be represented well by numerical weather prediction models or satellites, which poses a limit for nowcasts based on model–data fusion techniques. Here, we reveal a multiscale representation (MSR) of the atmosphere by reconstructing the radar echoes from the Weather Research and Forecasting (WRF) model simulations and the Himawari-8 satellite products using U-Net deep networks. Our reconstructions reproduce the echoes well in terms of patterns, locations, and intensities, with a root mean square error (RMSE) of 5.38 dBZ. We find stratified features in this MSR, with small-scale patterns such as echo intensities sensitive to the WRF-simulated dynamic and thermodynamic variables, and with larger-scale information about shapes and locations mainly captured from the satellite images. Such MSRs with physical interpretations may inspire innovative model–data fusion methods that could overcome the conventional limits of nowcasting.


Introduction
Meteorological forecasts at the convective scale are crucial to mitigating environmental hazards such as storms and floods that cause huge socioeconomic damage, but they face fundamental challenges in representing the convective weather regime in numerical weather prediction (NWP) models, which is a less-known "gray zone" compared to the relatively well-resolved synoptic-scale systems [1,2]. Radar is an invaluable instrument for scanning the convective atmosphere in nearly real time. Extrapolations of these sequential radar echo data can provide state-of-the-art nowcasts of precipitation patterns and severe weather events within a few hours based on the persistence principle [3][4][5]. However, these radar echo data are snapshots of the complex atmosphere with fine-grained details that are not readily related to the dynamics of the atmosphere or to the information from remote sensing satellites. This lack of representation of the dynamical and global atmospheric information poses a limit for convective nowcasting. For instance, when combining extrapolation-based methods with the dynamical information from NWP model simulations [6,7], the forecast skill in general decreases rapidly in the first forecast hour and remains low beyond a few forecast hours [8,9], despite advances in convection-permitting modeling [10], radar and satellite data assimilation [9,11,12], and high-resolution observations [13].
The representation gap, particularly when model-data fusion is to be conducted, is difficult to bridge [14,15]. Representing processes related to turbulence, convection, and topography is challenging for numerical models in gray zones [16]. Parameterizing and resolving the atmospheric motions at grid spacings for deep convection (1-10 km) and turbulence (0.1-1 km) requires careful consideration and in-depth exploration [17][18][19]. The precipitation and storms observed using radar and satellites are among the most difficult phenomena to simulate, a consequence of the intertwined, consecutive physical processes in NWP models, through which errors propagate multiplicatively [20]. Convective-scale NWPs also face fundamental theoretical challenges, such as the mathematical characteristics of the underlying partial differential equations as well as the predictability and probability issues related to the nonlinear dynamics of convective systems, whose effective dimensions are much higher than those of the balanced synoptic systems [1]. It has long been recognized that convective atmospheric motions involve multiscale interactions [19,[21][22][23]. Currently, a first-principles multiscale formulation is beyond the traditional NWP modeling paradigm. In addition, a representation gap exists between satellite images and radar data. Geostationary satellites observe cloud evolution from a global perspective [24], but they are limited to detecting cloud tops and hardly probe the internal structure of clouds.
Deep learning (DL) has recently emerged as a general data-driven technology to represent spatiotemporal features that cross multiple scales and are not captured well by geophysical models [25]. DL techniques can explore the rich patterns in radar data with deep networks of neurons and improve the precipitation nowcasting skill [26]. Numerous DL applications for the convective atmosphere have been proposed, ranging from radar- or satellite-based nowcasting [27][28][29][30] to reconstructions of radar data from satellites [31][32][33]. However, there are few applications aiming at bridging the representation gap for the fusion of radar data, satellite images, and NWP simulations. Accordingly, it remains largely unexplored how deep networks represent the convective atmosphere. In addition, these deep networks are usually considered black boxes with limited physical interpretations. Here, we attempt to retrieve the deep network representations by reconstructing the radar reflectivity data from NWP simulations and satellite observations, and then probe the structure of the obtained representations by diagnosing their relations with physical quantities such as NWP variables and satellite images. This attempt aims to reveal the potential of data-driven DL models to bridge the representation gaps between multiscale, multi-source data. We hope that such multiscale representations, together with physical interpretations, can make DL models more transparent and inspire innovative model-data fusion methods that could overcome the conventional limits of nowcasting.

Data and Methods
The study area is the Beijing-Tianjin-Hebei (BTH) region (Figure S1), which is vulnerable to floods caused by heavy summer precipitation. The study period is from June to September of the years 2015 and 2016, when precipitation is more frequent over this region.

Radar Echo Data
Radar reflectivity data are collected from six Doppler radars of the Chinese new generation weather radar network (ChIna New generation doppler weather RADar, CINRAD) that sufficiently cover the BTH region: four radars of type CINRAD/SA located in Beijing, Tanggu, Shijiazhuang, and Qinhuangdao, and two radars of type CINRAD/CB located in Zhangjiakou and Chengde, respectively (Figure S1a). The SA and CB Doppler radars are S-band and C-band radars, respectively. The collected radar echo data are interpolated from plan position indicators (PPIs) to constant altitude PPIs (CAPPIs) with a vertical linear interpolation method [34,35]. Then, the radar data in polar coordinates are mapped onto 0.01° × 0.01° Cartesian grids using the nearest-neighbor interpolation method. After that, the data from different radars around the same time (e.g., within 3 min) are combined, and the maximum values are preserved for overlapping grid cells. The resulting radar echo data have a temporal resolution of 6 min. The combined radar data are then smoothed and filtered using a convolution threshold method [36,37]. We take the maximum of the radar data over the CAPPIs at 1500 m, 2000 m, 2500 m, 3000 m, and 3500 m above sea level, a typical range of elevations around the level of free convection where convection develops actively.
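As an illustration of the compositing steps above, the following is a minimal NumPy sketch, assuming the per-radar CAPPIs have already been remapped onto the common 0.01° grid; the function name and the NaN convention for out-of-coverage cells are ours, not from the paper.

```python
import numpy as np

def composite_radars(cappi_stacks):
    """Combine CAPPI grids from several radars into a single mosaic.

    cappi_stacks: list of float arrays of shape (n_levels, 700, 700),
    one per radar, already remapped onto the common 0.01-degree grid;
    NaN marks grid cells outside a radar's coverage.
    """
    # Keep the maximum reflectivity where radar coverages overlap
    # (np.fmax ignores a NaN when the other operand is valid).
    merged = np.full_like(cappi_stacks[0], np.nan)
    for stack in cappi_stacks:
        merged = np.fmax(merged, stack)
    # Column maximum over the CAPPI levels (1500-3500 m a.s.l.) gives
    # the 2-D echo field used as the learning target.
    return np.nanmax(merged, axis=0)
```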

Numerical Model Simulations
The Weather Research and Forecasting (WRF) model [38] is used as a convection-permitting modeling system over the BTH region with three nested domains at horizontal resolutions of 9, 3, and 1 km, respectively (Figure S1b). By switching off the cumulus parameterization in the inner two nested domains, convection is explicitly resolved in this setting. The detailed configuration is listed in Table 1. We run a 36 h simulation at a temporal resolution of 30 min, beginning at 12:00 UTC each day, with the first 12 h of simulation as spin-up. The remaining 24 h of simulation of the innermost domain are used to provide the meteorological input for the deep networks. The initial and boundary conditions for the simulations are provided by the NCAR/NCEP 1° × 1° reanalysis data. We compare the simulations with weather station observations (Table S1) and verify that the performance on most selected meteorological factors is close to that in other state-of-the-art WRF studies [39,40]. We select 14 daily simulated variables commonly used in convective nowcasting from three categories (i.e., dynamic variables, thermodynamic variables, and moisture-related variables) to build the dataset for learning, such as the three components of wind velocity (U, V, W), the K index (K), the water vapor mixing ratio (WVMR), and the relative humidity (RH). Five of the fourteen variables are three-dimensional and extracted at the pressure levels of 850 hPa, 700 hPa, and 500 hPa, generally around the elevations of the radar echo data. The other nine variables are two-dimensional. Therefore, the input data from the WRF simulations have 24 channels (5 × 3 + 9 = 24). Detailed information about all the selected variables can be found in Table S2. These variables are mapped onto the same grid as the radar data using a linear interpolation method.
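For concreteness, a sketch of how the 24-channel WRF input could be assembled; the exact list of three-dimensional versus two-dimensional variables is given in Table S2, so the selection below is only illustrative.

```python
import numpy as np

def stack_wrf_channels(vars3d, vars2d):
    """Stack WRF fields into the 24-channel input array.

    vars3d: dict of arrays of shape (3, 700, 700), one entry per
    three-dimensional variable (850/700/500 hPa levels).
    vars2d: dict of arrays of shape (700, 700) for the nine
    two-dimensional variables.
    """
    channels = []
    for name in ("U", "V", "W", "WVMR", "RH"):   # assumed 3-D selection
        channels.extend(vars3d[name])            # one channel per level
    for name in sorted(vars2d):                  # 9 two-dimensional variables
        channels.append(vars2d[name])
    x = np.stack(channels)                       # shape (24, 700, 700)
    assert x.shape == (24, 700, 700)
    return x
```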

Geostationary Satellite Images
Data from five infrared bands (the 5th, 8th, 13th, 15th, and 16th) are collected from the Himawari-8 geostationary satellite products. These satellite images provide global information on cloud properties such as phases and heights (Table 2) with high spatiotemporal resolution (2 km and 10 min) [48]. We also extract the deep convective cloud classification (CCC) data from the Himawari-8 cloud type products by assigning 1 to the grids of deep convective cloud and 0 to the grids of other cloud types. Therefore, we obtain 6-channel input data from the Himawari-8 satellite products. All the satellite products are remapped in a way similar to the mapping of the WRF variables. Note: the Himawari-8 satellite images listed above were supplied by the P-Tree System, Japan Aerospace Exploration Agency (JAXA).
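The 6-channel satellite input can be built analogously; a sketch assuming dictionary-style access to the remapped Himawari-8 grids, with the deep convective cloud code treated as a hypothetical parameter that depends on the product's coding.

```python
import numpy as np

def stack_satellite_channels(bands, cloud_type, dcc_code=1):
    """Build the 6-channel satellite input.

    bands: dict mapping band number -> (700, 700) grid for the 5th, 8th,
    13th, 15th, and 16th Himawari-8 bands.
    cloud_type: (700, 700) grid of Himawari-8 cloud-type codes.
    dcc_code: integer code of the deep convective cloud class
    (hypothetical value).
    """
    ccc = (cloud_type == dcc_code).astype(np.float32)  # 1 = deep convective
    layers = [bands[b] for b in (5, 8, 13, 15, 16)] + [ccc]
    return np.stack(layers)                            # shape (6, 700, 700)
```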

Data Preprocessing
We have obtained 30-channel input data from the WRF simulations and the Himawari-8 satellite products, and 1-channel labels from the radar echo data on the common grid of 0.01° × 0.01° over the BTH region (i.e., 700 × 700 horizontally). We first match the input data with the labels at the same times and form a dataset of 2647 samples. We then use min-max normalization to scale each channel of the dataset to the range [0, 1], where the maximum value for the radar echo data is set at 70 dBZ so that the effect of outliers is suppressed. Moreover, we fill in missing or invalid values with 0 in the normalized dataset.
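A minimal sketch of this normalization; the per-channel minima and maxima for the 30 input channels are assumed to come from the training data, while the radar maximum is fixed at 70 dBZ as stated.

```python
import numpy as np

ECHO_MAX_DBZ = 70.0  # maximum radar echo value used for scaling

def normalize_channel(x, vmin, vmax):
    """Min-max scale one channel to [0, 1] and zero-fill bad values."""
    y = (x - vmin) / (vmax - vmin)
    y = np.clip(y, 0.0, 1.0)          # suppress outliers beyond the range
    return np.nan_to_num(y, nan=0.0)  # missing/invalid values become 0

# Radar labels use the fixed 70 dBZ maximum; for the 30 input channels,
# vmin/vmax would be channel-specific (an assumption here).
label = normalize_channel(np.random.uniform(-5.0, 80.0, (700, 700)),
                          0.0, ECHO_MAX_DBZ)
```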

Deep Network Model
We adopt a U-Net for the representation learning of the radar echo data (Figure 1). The U-Net deep network is a convolutional neural network (CNN) variant originating from biomedical image segmentation [49] and is here repurposed for a regression task, as in many previous studies [31,33,50,51]. It preserves the hierarchical convolutional structure of a CNN in its left contracting path and uses upsampling operations in successive layers to form a right expansive path. Consequently, the network has a 'U' shape, hence its name. The U-Net is an encoder-decoder network architecture that allows the end-to-end learning of multiscale features and outputs with desired dimensions (i.e., 700 × 700 in this study). In general, early layers in the contracting path learn small-scale features such as textures and edges, whereas deep layers learn large-scale features such as semantic information. The U-Net is equipped with so-called skip connections that perform identity mappings of low-level features from the contracting path (encoder) to the expansive path (decoder) at corresponding levels. The U-Net combines the large-scale information with the small-scale information brought by the skip connections to reconstruct the data from the learnt multiscale features. Such a network architecture and reconstruction process are appealing for our study on how radar data are represented by deep networks.
Concretely, the U-Net depicted in Figure 1 has eight blocks (Block-As in gray) in the encoder and eight blocks (Block-Bs in blue) in the decoder, followed by an individual convolutional block (Block-C in orange). Each Block-A consists of a convolutional layer followed by a batch normalization layer [52] and a LeakyReLU activation layer. The 1st, 2nd, 4th, 6th, and 7th Block-As convolve the data with 4 × 4 convolution filters with 2 × 2 strides to reduce resolutions, enabling the subsequent layers to detect patterns in expanded areas. The 3rd Block-A employs 3 × 3 convolution filters with 2 × 2 strides to produce an output of a certain dimension (i.e., 88 × 88 horizontally). The remaining Block-As contain 3 × 3 convolution filters with 1 × 1 strides. All convolutional operations are carried out with a zero padding of size 1. The respective numbers of convolution filters in the eight Block-As are 36, 72, 144, 288, 288, 288, 288, and 288. Each Block-B in the decoder consists of a bilinear upsampling layer, a 3 × 3 convolutional layer, a batch normalization layer, and a LeakyReLU activation layer, except for the 8th Block-B, which does not have the LeakyReLU activation layer. The respective numbers of convolution filters in the eight Block-Bs are 288, 288, 288, 288, 144, 72, 36, and 1. Finally, an individual Block-C is added, composed of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, a 1 × 1 convolutional layer, and a ReLU-6 activation layer, which is considered to facilitate the learning of sparse features [53]. Therefore, the 30-channel input data are mapped into one-channel radar echo data reconstructions.
Figure 1. The U-Net model architecture.
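A condensed PyTorch sketch of the three block types described above; the skip-connection wiring and per-block filter counts follow Figure 1 and are omitted here, and details not stated in the text (e.g., the LeakyReLU negative slope, the upsampling factor) are assumptions.

```python
import torch.nn as nn

def block_a(c_in, c_out, k=4, stride=2):
    """Encoder Block-A: conv -> batch norm -> LeakyReLU (zero padding 1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=k, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),  # negative slope is an assumption
    )

def block_b(c_in, c_out, activation=True):
    """Decoder Block-B: bilinear upsampling -> 3x3 conv -> batch norm
    (-> LeakyReLU; the 8th Block-B omits the activation)."""
    layers = [
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
    ]
    if activation:
        layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

# Block-C: 1x1 conv -> 3x3 conv -> 1x1 conv -> ReLU-6 on the one-channel
# output of the decoder.
block_c = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=1),
    nn.Conv2d(1, 1, kernel_size=3, padding=1),
    nn.Conv2d(1, 1, kernel_size=1),
    nn.ReLU6(inplace=True),
)
```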

Training
Since we do not optimize the hyperparameters of the U-Net, we divide the dataset (2647 samples) in time order only, into a training set (the first ~90%, 2387 samples) and a test set (the remaining 260 samples). We employ two types of loss functions for training (that is, the objective function penalizing the discrepancy between the radar echo observations and the reconstructions generated by the U-Net). The first type is the mean square error (MSE), and we denote the resulting network of this type as UNet-MSE. The second type is the echo-weighted mean square error (EWMSE), with larger weights assigned to grid cells of higher echo intensity [28]. The resulting network of this type is denoted as UNet-EW. The calculations of the MSE and the EWMSE can be found in Text S1. The U-Net models are trained using stochastic gradient descent (SGD) with the momentum algorithm [54] (momentum = 0.9) and a batch size of 8. The initial learning rate is set at 1 × 10⁻⁵. Each U-Net model is trained until the loss function shows no reduction on the test set for 100 subsequent epochs. We stop the training of the UNet-MSE and the UNet-EW at the 96th and the 92nd epoch, respectively. Since we obtain satisfactory trained models and the reconstruction accuracy is not our ultimate goal, we do not use data augmentation for potential improvement.
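For concreteness, a sketch of an echo-weighted MSE in PyTorch; the exact weighting scheme is defined in Text S1, so the piecewise weights below (on reflectivity normalized by 70 dBZ) are only a plausible stand-in following intensity-weighted losses such as [28].

```python
import torch

def ewmse(pred, target):
    """Echo-weighted MSE on normalized reflectivity (70 dBZ = 1.0).

    The breakpoints and weights here are illustrative assumptions,
    not the paper's values.
    """
    w = torch.ones_like(target)
    w = torch.where(target >= 20.0 / 70.0, torch.full_like(w, 2.0), w)
    w = torch.where(target >= 35.0 / 70.0, torch.full_like(w, 5.0), w)
    w = torch.where(target >= 45.0 / 70.0, torch.full_like(w, 10.0), w)
    return torch.mean(w * (pred - target) ** 2)

# Optimizer settings as stated in the text:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)
```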

Evaluations and Interpretations
The performance of the U-Net reconstructions is evaluated by five indices, namely the root mean square error (RMSE), the mean error (ME), the critical success index (CSI), the probability of detection (POD), and the false alarm rate (FAR). RMSE and ME account for the difference between the reconstructed and observed echo data. CSI and POD measure the precision of reconstructions, whereas FAR measures the degree of overestimation. CSI, POD, and FAR range from 0% to 100%; ideal CSI and POD values approach 100%, whereas the ideal FAR approaches 0%. Moreover, the structural similarity (SSIM) is calculated in the following analysis, which quantifies the similarity between the visible structures of two images. SSIM takes values between −1 (perfect anti-correlation) and +1 (perfect similarity), with 0 indicating no similarity. The calculations of these indices are detailed in Text S2.
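A sketch of the dBZ-space errors and contingency scores (their formal definitions are in Text S2); the 20 dBZ event threshold is an illustrative assumption.

```python
import numpy as np

def evaluate(pred, obs, thresh_dbz=20.0):
    """RMSE, ME, and contingency scores at a reflectivity threshold."""
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    me = float(np.mean(pred - obs))
    hits = int(np.sum((pred >= thresh_dbz) & (obs >= thresh_dbz)))
    misses = int(np.sum((pred < thresh_dbz) & (obs >= thresh_dbz)))
    false_alarms = int(np.sum((pred >= thresh_dbz) & (obs < thresh_dbz)))
    csi = hits / max(hits + misses + false_alarms, 1)  # critical success index
    pod = hits / max(hits + misses, 1)                 # probability of detection
    far = false_alarms / max(hits + false_alarms, 1)   # false alarm rate
    return rmse, me, csi, pod, far
```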
To investigate the physical interpretations of the learnt deep network models, we propose a sensitivity analysis method inspired by Ankenbrand et al. [55]. We intentionally perturb the input data and then diagnose the relationships between the multiscale features and the input physical quantities by checking the consequences of the perturbations on the reconstructions. Two types of perturbations are considered. First, we flip the input data from left to right and scale them by a multiplicative coefficient. We denote this sensitivity analysis experiment as SA-a. Second, we evaluate the role of each input quantity by nullifying it while keeping the other input quantities unchanged. This sensitivity analysis experiment is denoted as SA-b.
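A sketch of the two perturbation types on a 30-channel input; the channel ordering (WRF first, satellite last) is our assumption.

```python
import numpy as np

WRF_CHANNELS = slice(0, 24)   # assumed ordering: WRF first, satellite last
SAT_CHANNELS = slice(24, 30)

def sa_a(x, channels, flip=True, scale=1.0):
    """SA-a: flip left-right and/or rescale a subset of input channels."""
    y = x.copy()
    if flip:
        y[channels] = y[channels][..., ::-1]  # mirror the longitude axis
    y[channels] = y[channels] * scale
    return y

def sa_b(x, channel):
    """SA-b: nullify one input quantity, keeping the others unchanged."""
    y = x.copy()
    y[channel] = 0.0
    return y
```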

Echo Reconstructions
Figure 2 shows the performance of the reflectivity data reconstructions with the deep networks. The UNet-MSE and UNet-EW networks reconstruct the echoes well, with RMSEs of 4.76 and 5.38 dBZ, respectively. Because most radar echoes are of low intensity, training with the unweighted MSE loss function biases the networks towards echoes of low intensity. The UNet-MSE networks systematically underestimate the echoes, especially those with high intensity. However, echoes of higher intensity are the most valuable radar signals, as they directly observe the precipitation and convective processes. Adding more weight to intense echoes during training is beneficial for reconstructing richer, sparse high-intensity patterns. The reconstructions in this case thus either overestimate or underestimate the intense echoes, which explains the slight increases in the RMSE and FAR values of the UNet-EW networks over the UNet-MSE networks. The richer patterns reconstructed by the UNet-EW networks have better CSI and POD scores, and we will further diagnose their network structures.

The satellite images capture the overall shape and location of the echoes (Figure 3e). By contrast, the WRF-simulated reflectivity data (Figure 3b), calculated according to [56], miss a majority of the observed echo distribution. Indeed, the precipitation and convective processes are among the most difficult to simulate with NWPs. Nevertheless, our simulations are qualified to represent the general atmospheric conditions (Table S1), so they can provide dynamic and thermodynamic information for echo reconstruction. Note that we do not include the WRF-simulated reflectivity data in the input data of the U-Net deep networks. Based on the multi-source information, the reconstructions by the deep networks reproduce the echo patterns, locations, and intensities well (Figure 3c,d). Compared with the UNet-MSE network, the UNet-EW network (Figure 3d) exhibits more precise details such as clearer edges and higher values. Data-driven deep learning techniques effectively reduce the representation gap from the NWP simulations and satellite images to the radar echoes and tend to encode the missing physics in their network weights. Other reconstruction cases show similar performances (Figures S3-S9).

Multiscale Representation
The multiple layers of features encoded in the contracting path of the U-Net are visualized in a naïve way [57]. In this way, a multiscale representation (MSR) of the radar data can be revealed and analyzed. Figure 4 exemplifies such an MSR for the reconstruction case in Figure 3. The multiscale features of the radar echo data appear to be automatically stratified in the multi-layer hierarchical structure of the U-Net, and are especially manifested in the deeper part, indicating the greater capacity of deep networks than shallow networks to represent multiscale high-dimensional relationships among multi-source data (e.g., [58,59]). Apart from the first layer with mostly texture-like information, the locations of echo signals (upper left triangles in Figure 4) are recognized in all subsequent layers and largely correspond to the cloud locations from satellite images. The location-aware ability of the U-Net is one of the key factors for its success in image segmentation. The small-scale features in the shallow layers (first to third) still demonstrate no clear echo patterns. These small-scale features such as the echo intensities (hexagons in Figure 4) as well as larger-scale features like the echo shapes (ellipses in Figure 4) emerge in the middle layers (fourth to sixth). The shapes are more visibly related to the satellite images, whereas the sources of the intensities need further investigation (see Section 3.3). The features in the deepest layers (seventh and eighth) are rather global descriptions of echoes and may have some semantic meanings. MSRs for other reconstruction cases manifest similar stratifications (see Figures S10-S16 for more examples).
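The layer-wise features can be captured with forward hooks; a sketch assuming the eight Block-A modules are accessible on the trained model (how they are exposed depends on the implementation).

```python
import torch

def capture_encoder_features(model, x, encoder_blocks):
    """Record each encoder block's output for visualization.

    encoder_blocks: iterable over the eight Block-A modules
    (illustrative; depends on the model class).
    """
    features, handles = [], []
    for blk in encoder_blocks:
        handles.append(blk.register_forward_hook(
            lambda module, inputs, output: features.append(output.detach())))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return features  # one (B, C, H, W) tensor per layer, coarser with depth
```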

Physical Interpretations of the MSR
We further investigate the physical interpretations of the MSR through sensitivity analysis. The SA-a experiments assess the overall contributions of the WRF simulations and satellite images to reconstructions by flipping or attenuating these different sets of input data. Figure 5 shows the resulting diverse reconstructions, where the CSI and SSIM of reconstructions of this case and over the test set are calculated. When all input data are flipped, the reconstruction flips accordingly (Figure 5a), but does not yield a perfect flipping of the original reconstruction (Figure 5i). This indicates that factors other than the input data (e.g., the topographic conditions) may also play a role in the spatial distribution of the reconstructions. The echo shapes and locations are much more influenced by the satellite images (Figure 5b, SSIM = −0.07) than by the WRF simulations (Figure 5c,g,h with higher SSIM). Moreover, this influence is mainly related to the spatial distribution of the satellite images rather than changes in magnitude (Figure 5e,f). Concerning the echo intensities, the representation is much more sensitive to the WRF simulations (Figure 5g,h, especially in Figure 5h CSI = 0.26 and CSI-Test = 0.17) than to the satellite images (Figure 5e,f). This sensitivity appears to depend more on the value than on the spatial distribution of the simulations (Figure 5c). Other cases of reconstructions have similar interpretations (Figures S17-S23).
Figure 6 shows the reconstructions with CSI for the SA-b experiment (see Figures S24-S30 for more examples). The MSR here is predominantly sensitive to the satellite images from the 8th and 13th bands of Himawari-8 as well as to the WRF-simulated W, K, and RH. These bands of satellite images provide information on middle and upper tropospheric humidity and on cloud-top properties, respectively. The three dominant WRF variables describe the vertical motion of the air, the atmospheric instability, and the water vapor content in the atmosphere. They correspond well with the three key ingredients for deep convection initiation and evolution, namely lift, instability, and moisture [60]. Therefore, for the current study, the MSR may encapsulate these complex atmospheric physics in its learnt multiscale features.

Summary and Discussions
We have addressed the difficulty of model-data fusion for convective nowcasting as a representation problem. Deep learning techniques were applied to represent the fine-grained radar reflectivity data by reconstructing them from the WRF model simulations and the Himawari-8 satellite products. We learnt a multiscale representation (MSR) of the radar data using the U-Net deep networks. The MSR manifested a stratification, with the richest features in the middle layers. Sensitivity analyses on the retrieved representation showed that small-scale patterns like echo intensities were more sensitive to the magnitude of the numerical model simulations, whereas larger-scale information about the shapes and locations came mainly from the spatial distribution of the satellite images.
The retrieved multiscale representation takes advantage of the ability of deep learning techniques (Bengio et al., 2013) to find complex relationships in data that are otherwise difficult to model or formulate with traditional approaches, especially when the underlying physics is complex and multiscale or even unknown. The deep network representation can organize the learnt features at increasing levels of abstraction, from local fine-grained details to global semantic information [61][62][63]. Such a multiscale representation could inspire innovative methods that make use of the features in the mapping from numerical model simulations to radar data as well as in convective nowcasting, where machine learning has been demonstrated to be a useful tool [64]. Note that multiscale representations can also be obtained with traditional methods such as wavelets in a more compact manner [63,65]. However, these traditional methods generally require domain expertise for feature extraction and should be combined with deep learning techniques for automatic end-to-end feature extraction [66,67].
This study has a number of limitations. First, although radar data provide routine verifications for convection-permitting models [10,68], they are not perfect and can suffer from various errors [69,70]. Hence, the retrieved multiscale representation may occasionally fit noise rather than signals. Second, our WRF simulations were configured at a convection-permitting resolution, but we did not tailor and optimize the WRF configurations or examine in-depth model-radar comparisons (e.g., [68]), as this is not the main objective of this representation study. Third, we did not test other deep learning techniques such as CNN variants other than U-Nets or generative deep networks. These deep networks, once successfully trained, are known to have similar performances [71]. Finally, convective processes can vary by region, so the learnt multiscale representation needs evaluation across regions, for example by considering the role of terrain during training or by applying the obtained representation to different regions. Hence, there is still room for qualitative and quantitative improvement in the accuracy of the reconstructed echo reflectivity. Despite these limitations, our findings on the functioning and potential of the deep network representation for the convective atmosphere should not be affected.
For now, the proposed representation of radar echo data may not meet the requirements of practical convective nowcasting. We consider our attempt a step towards a deep network framework of multiscale representation with physical interpretations that relate the features in hidden network layers to the convective atmosphere. Our sensitivity analyses for physical interpretations have been experimental. It is possible to apply more formal methods, notably explainable artificial intelligence techniques [61,72-74], to analyze the cause-and-effect relationships among radar data, satellite images, and convective dynamics. The multiscale features with physical interpretations are seldom investigated in radar data assimilation practices [9,12]. Data assimilation based on multiscale features should merge with artificial intelligence techniques [75,76] and should inevitably be "deep" in some sense, either in models [77,78], in data (this study), or in assimilation algorithms [79]. Such deep assimilation techniques may be essential to overcome the conventional limits of convective nowcasting.
Supplementary Materials: The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs15143466/s1, Figure S1: (a): The selected six radar stations in the Beijing-Tianjin-Hebei region. (b): WRF simulation area with three nested domains; Figure S2: Visualizations of input data of the reconstruction sample at 09:30 UTC on 10 September 2016; Figure S3: A case of reconstruction at 00:30 UTC on 4 September 2016 from the test set; Figure S4: A case of reconstruction at 05:30 UTC on 4 September 2016 from the test set; Figure S5: A case of reconstruction at 08:30 UTC on 4 September 2016 from the test set; Figure S6: A case of reconstruction at 08:00 UTC on 10 September 2016 from the test set; Figure S7: A case of reconstruction at 03:30 UTC on 11 September 2016 from the test set; Figure S8: A case of reconstruction at 06:30 UTC on 13 September 2016 from the test set; Figure S9: A case of reconstruction at 07:30 UTC on 23 September 2016 from the test set; Figure S10: A multiscale representation of the radar echo data at 00:30 UTC on 4 September 2016; Figure S11: A multiscale representation of the radar echo data at 05:30 UTC on 4 September 2016; Figure S12: A multiscale representation of the radar echo data at 08:30 UTC on 4 September 2016; Figure S13: A multiscale representation of the radar echo data at 08:00 UTC on 10 September 2016; Figure S14: A multiscale representation of the radar echo data at 03:30 UTC on 11 September 2016; Figure S15: A multiscale representation of the radar echo data at 06:30 UTC on 13 September 2016; Figure S16: A multiscale representation of the radar echo data at 07:30 UTC on 23 September 2016; Figure S17