Spatial Resolution Enhancement of Satellite Microwave Radiometer Data with Deep Residual Convolutional Neural Network

Satellite microwave radiometer data is affected by many degradation factors during the imaging process, such as the sampling interval, antenna pattern and scan mode, etc., leading to spatial resolution reduction. In this paper, a deep residual convolutional neural network (CNN) is proposed to solve these degradation problems by learning the end-to-end mapping between low-and high-resolution images. Unlike traditional methods that handle each degradation factor separately, our network jointly learns both the sampling interval limitation and the comprehensive degeneration factors, including the antenna pattern, receiver sensitivity and scan mode, during the training process. Moreover, due to the powerful mapping capability of the deep residual CNN, our method achieves better resolution enhancement results both quantitatively and qualitatively than the methods in literature. The microwave radiation imager (MWRI) data from the Fengyun-3C (FY-3C) satellite has been used to demonstrate the validity and the effectiveness of the method.


Introduction
The microwave remote sensing on satellites with the passive radiometers has the advantage over optical and infrared sensing that it can capture images both day and night and even through clouds [1,2].Unique information, including the total water vapor content, the snow depth and the ozone profile, can also be provided in the microwave frequency band [3][4][5][6].Due to the fact of the limited sampling interval and a series of degeneration factors, such as the antenna pattern, the receiver sensitivity and the sensors' scanning mode, the data obtained by satellite microwave radiometers has relatively poor spatial resolution [7][8][9].Thus, the requirement of some small-scale distribution parameters, such as the earth surface soil moisture, is hampered by the low spatial resolution of the data [10].Furthermore, other geophysical parameters like cloud liquid water, total precipitation water and snow depth require the combination of the observed data from different frequency bands and different polarizations [5,6], hence it would be preferable to improve the resolution of data at low frequencies (low spatial resolution) to match up with the data at high frequencies (high spatial resolution).Therefore, the spatial resolution of satellite microwave radiometer data should be improved, especially for the data at low frequencies.
The satellite microwave radiometer data is degenerated by many factors, including the antenna pattern, the receiver sensitivity and the scan mode.Due to the physical limitations on the size of satellite antenna, image data is smoothed by the wide beam-width antenna pattern, leading to relatively poor spatial resolution.Additionally, the image noise is affected by the receiver sensitivity of the radiometer [2].Moreover, the relative geometry of the data changes over the scan because of the conical scanning mode of the radiometer [9], which makes the resolution spatial variable.Many algorithms have been proposed to solve the degeneration problem, such as the reconstruction method in Banach space [8], the Wiener filtering method in the frequency domain [9], the Backus-Gilbert (BG) inversion method [11] and the deconvolution technique using a convolutional neural network(CNN) [12].The reconstruction method in Banach space enhances the spatial resolution by generalizing the gradient method, which leads to the reduction of over-smoothing effects and oscillation due to Gibbs phenomenon.The Wiener filtering method restores the image with space variant filters in the frequency domain in order to reduce the degeneration caused by the antenna pattern and the relative geometry change during the scan.The BG inversion method utilizes redundant overlapping footprint information and the prior knowledge of the antenna pattern to reduce the instantaneous field of view (IFOV) of the antenna and to enhance spatial resolution of the data.The deconvolution technique with CNN is a learning-based method, which directly learns an end-to-end mapping between low-and high-resolution images.During the multiple feature space transformations, the CNN simultaneously considers many kinds of degeneration factors and can achieve better reconstruction results.
Furthermore, the satellite microwave radiometer data is affected by limited sampling interval.Based on the Nyquist theory, the spectrum of the measured image would be aliased if the image contains a component whose spatial frequency is higher than half of the sampling frequency [9].Therefore, high-resolution image with more pixels should be recovered from the low-resolution image with less pixels, which, in computer vision, is called the single image super-resolution (SR) problem.To solve the problem above, original approaches use interpolation methods based on the sampling theories [13,14], however these methods are limited by recovering detailed textures.Subsequent work adopts image statistics [15] to the problem to reconstruct better high-resolution images.Advanced works aim to directly learn the mapping functions between lots of low-and high-resolution example pairs.And based on the powerful capability of the CNNs, dramatic improvements have been made in recent years [16][17][18][19].
To cope with the complicated degeneration factors and the sampling interval limitation, a deep residual CNN is introduced to enhance the spatial resolution of satellite microwave radiometer images in low frequency.Unlike traditional methods that handle each degradation factor separately, during the end-to-end learning-based process, the deep residual CNN simultaneously considers the sampling limitation and many kinds of degeneration factors.Then, reconstructed images with high spatial resolution can be obtained by the trained network.The contributions of this paper are as follows: (i) In order to enhance the spatial resolution of the satellite microwave radiometer images, a deep residual CNN was proposed to cope with both the limited sampling interval problem and the comprehensive degeneration factors.(ii) The deep residual CNN was found to achieve better resolution enhancement results than the methods in literature both quantitatively and qualitatively.The microwave radiation imager (MWRI) data from the Fengyun-3C (FY-3C) satellite was used to demonstrate the validity and the effectiveness of our method.

MWRI Instrument
The FY-3C is one of the FY-3 series of satellites, which are the second generation of Chinese polar-orbiting environmental and meteorological satellites [20].And the on-board MWRI is a ten-channel, five-frequency (10.65, 18.7, 23.8, 36.5 and 89.0 GHz), total power microwave radiometer with horizontal and vertical polarizations, which can measure microwave emissions from lands, ocean Remote Sens. 2019, 11, 771 3 of 20 surfaces and various forms of water in atmosphere [21,22].The emission energy is collected by a single parabolic reflector of 0.9 m and then reflected to different horns which work at different frequencies [23].Thus, due to different electric size of the antenna, the spatial resolution of different channels varies from 51 km × 85 km (at 10.65 GHz) to 9 km × 15 km (at 89.0 GHz) along and cross the satellite orbit track direction.The specific performance of the MWRI is shown in Table 1.And NE∆T is the receiver sensitivity (noise equivalent differential temperature) of each channel [22].The scan geometry of the on-board MWRI is shown in Figure 1.The positive X ins -axis shows the satellite flight direction, and the positive Z ins -axis is pointing downward to the satellite nadir direction.The satellite orbit height is 836 km and the main reflector of MWRI scans the earth conically with a spinning rate of 1.8 s/rotation.The forward cone scanning method is adopted to cover a swath (width) of 1400 km, completing a measurement scene within ±52 • around the nadir direction with a viewing angle of θ = 45 • .The sampling interval for the earth view is 2.08 ms, so in each scanning cycle, a total of 254 observation samples are collected [23,24].
Remote Sens. 2019, 11, x FOR PEER REVIEW 3 of 21 at different frequencies [23].Thus, due to different electric size of the antenna, the spatial resolution of different channels varies from 51 km × 85 km (at 10.65 GHz) to 9 km × 15 km (at 89.0 GHz) along and cross the satellite orbit track direction.The specific performance of the MWRI is shown in Table 1.And NEΔT is the receiver sensitivity (noise equivalent differential temperature) of each channel [22].The scan geometry of the on-board MWRI is shown in Figure 1.The positive Xins-axis shows the satellite flight direction, and the positive Zins-axis is pointing downward to the satellite nadir direction.The satellite orbit height is 836 km and the main reflector of MWRI scans the earth conically with a spinning rate of 1.8 s/rotation.The forward cone scanning method is adopted to cover a swath (width) of 1400 km, completing a measurement scene within ±52° around the nadir direction with a viewing angle of 45 θ = °.The sampling interval for the earth view is 2.08 ms, so in each scanning cycle, a total of 254 observation samples are collected [23,24].The Level-1 (L1) data acquired by the MWRI on the FY-3C satellite are used for the validation in this paper.Each image file of 254*1725 pixels can be obtained by 1725 conical scans.An example of the real scanning positions in the earth spherical coordinate system [24] is shown in Figure 2. The blue dots show the scanning positions of a descending mode scan on 14th March 2017 and the red dots show the following ascending mode scan.The Level-1 (L1) data acquired by the MWRI on the FY-3C satellite are used for the validation in this paper.Each image file of 254 * 1725 pixels can be obtained by 1725 conical scans.An example of the real scanning positions in the earth spherical coordinate system [24] is shown in Figure 2. The blue dots show the scanning positions of a descending mode scan on 14th March 2017 and the red dots show the following ascending mode scan.

Antenna Temperature Image
The antenna collects temperature information by receiving the electromagnetic wave radiated from the earth, and it can be represented by: where ( ) t r is the temperature information collected by the antenna and r shows the direction of the antenna in the antenna viewing axis coordinate system (AVA) [24].( , ) B t θ ϕ is the ideal brightness temperature information, ( , ) F θ ϕ is the antenna pattern and ( , ) θ ϕ represents the spherical coordinates in AVA.It is worth noting that Equation ( 1) is only a scalar version of the antenna temperature.Some minor factors, such as antenna cross polarization and polarization mismatch between antenna pattern and earth's surface [25], are excluded for the simplicity of the problem.However, most of the image processing operations are done in the image domain, where the data pixels lie in the rectangular grids regardless of the radiometers' scan angle and the satellite orbit.Therefore, the coordinates transformation of Equation ( 1) from AVA to image domain should be incorporated [24].The antenna pattern ( , ) F θ ϕ should be transformed to the point-spread-function (PSF) ( ', ') h X Y [9], which is the normalized antenna gain distribution in image domain.And ' X , ' Y are the row and column indexes in the image domain.Meanwhile, considering the limited sampling interval, the decimation process is used to reduce the pixels of the data.Therefore, the temperature image collected by the antenna in image domain can be represented by: where ( , ) A t X Y is the antenna temperature image obtained by the antenna, ( ', ') B t X Y is the ideal brightness temperature image represented by the row and column indexes in the image domain, and ( , ) n X Y corresponds to the receiver sensitivity NE T Δ .[ ] decimation denotes for the decimation process, where the data samples are extracted to match up with the actual sampling interval of the satellite radiometer.Thus, X , Y are the row and column indexes in the same grid as the obtained image data and the range of X and Y in ( , ) A t X Y is determined by the sampling numbers in

Antenna Temperature Image
The antenna collects temperature information by receiving the electromagnetic wave radiated from the earth, and it can be represented by: where t A (r) is the temperature information collected by the antenna and r shows the direction of the antenna in the antenna viewing axis coordinate system (AVA) [24].t B (θ, ϕ) is the ideal brightness temperature information, F(θ, ϕ) is the antenna pattern and (θ, ϕ) represents the spherical coordinates in AVA.It is worth noting that Equation ( 1) is only a scalar version of the antenna temperature.Some minor factors, such as antenna cross polarization and polarization mismatch between antenna pattern and earth's surface [25], are excluded for the simplicity of the problem.However, most of the image processing operations are done in the image domain, where the data pixels lie in the rectangular grids regardless of the radiometers' scan angle and the satellite orbit.Therefore, the coordinates transformation of Equation ( 1) from AVA to image domain should be incorporated [24].The antenna pattern F(θ, ϕ) should be transformed to the point-spread-function (PSF) h(X ,Y ) [9], which is the normalized antenna gain distribution in image domain.And X , Y are the row and column indexes in the image domain.Meanwhile, considering the limited sampling interval, the decimation process is used to reduce the pixels of the data.Therefore, the temperature image collected by the antenna in image domain can be represented by: where t A (X, Y) is the antenna temperature image obtained by the antenna, t B (X , Y ) is the ideal brightness temperature image represented by the row and column indexes in the image domain, and n(X, Y) corresponds to the receiver sensitivity NE∆T.[ ] decimation denotes for the decimation process, where the data samples are extracted to match up with the actual sampling interval of the satellite radiometer.Thus, X, Y are the row and column indexes in the same grid as the obtained image data and the range of X and Y in t A (X, Y) is determined by the sampling numbers in the cross track (254 for FY-3C MWRI) and the number of rotations in the imaging process (1725 for FY-3C MWRI).However, in practice, the radiometer (antenna plus scanning system) works as an integral transducer that operates on the t B distribution to get the antenna temperature image t A .Although the shape of the antenna footprints on earth keeps the same, due to the conical scanning mode and the curvature of the earth, the orientations of the footprints' major axes change during the scan, leading to the change of angles between the along track direction and the footprints' major axes, as shown in Figure 1.Thus, after the coordinate transformation to the image domain, the PSF changes correspondingly with the scan position.Therefore, with the additional consideration of the scan mode, the temperature image collected by the antenna is given by: where h t (X, Y, h(X , Y )) is the space variable PSF considering the relative geometry of the scan.
During the FY-3C MWRI scanning process, the relative geometry changes of the data in the along track direction (data in the same row) almost stay the same.Thus, only the column data in the image domain is considered under the influence of relative geometry.With this simplification, Equation ( 3) is rewritten by: where w i (X) is the weighing function.When using 254 PSFs, the weighing function is the Kronecker function.
In summary, with the considerations of the sampling interval and the degeneration factors, including the antenna pattern, the receiver sensitivity and the scanning mode, the degradation procedure from ideal brightness temperature image to antenna temperature image is discussed.During the degradation procedure in Equation ( 4), the convolutions with different PSFs due to the limitation of antenna size drastically smooth the image, which lowers the spatial resolution of the antenna temperature image to a great extent.Moreover, the spatial resolution differs in different parts of the image because the shape of the PSF changes during the conical scan.Thus, different parts of the image are smoothed by different PSFs, which leads to resolution spatial variable (will be discussed in detail in Section 3.2).Furthermore, the sampling interval limits the total number of pixels in the observation region and the spectrum of the measured image would be aliased if the image has the component whose spatial frequency is higher than half of the sampling frequency.Thus, the spatial resolution is also decreased by the limited sampling interval.Therefore, methods should be taken to cope with these degradation factors so that the spatial resolution of the antenna temperature images can be improved.

Method
CNNs are feed-forward artificial neural networks with strong feature extraction and mapping ability.In recent years, several different CNNs have been proposed to solve the resolution enhancement problem (SR problem) for the optical image due to the limited sampling interval [16][17][18][19], and reach the state-of-art quality.And the CNN has also been used by us to solve the resolution enhancement problem (deconvolution problem) due to some degeneration factors such as the antenna pattern and the receiver sensitivity [12].The deconvolution result is better than the traditional Wiener filter method both quantitatively and qualitatively.
In this paper, we introduced a deep residual CNN to solve the spatial resolution enhancement problem for the MWRI images on FY-3C satellite.Both the limited sampling interval and the comprehensive degeneration factors, such as the antenna pattern, the receiver sensitivity and the scanning mode, were learned jointly during the training process of the network.

Network Architecture
Recent evidence reveals that the depth (number of the layers) of the network is of crucial importance [26].Deep CNNs naturally integrate more low/mid/high level features and classifiers in an end-to-end multilayer fashion, thus they have more powerful feature extraction and mapping ability.However, normal deep plain CNNs usually face the obstacles of degradation problem and gradient vanishing/exploding problem.Thus, the residual learning technique has been introduced in the deep CNNs to solve these problems [27].Instead of mapping H(X) (X is the input) by a few stacked layers, the residual learning explicitly lets these stacked layers map a residual function F(X) = H(X) − X, and outputs function F(X) + X with a shortcut connection from the input.Although both forms should be able to asymptotically map the same function, the ease of learning might be different.With the help of residual learning, deep CNNs are easier to be optimized and can gain better results [19,27,28].
Therefore, we introduce a deep residual CNN to enhance the spatial resolution of FY-3C MWRI images.A similar network architecture, which was proposed in [19], was used to solve the computer vision SR problem of the optical images.With the consideration of its powerful feature extraction and mapping ability, we introduce the network here to cope with some more complicated and intricate degradation factors in the MWRI imaging process, including the sampling interval limitation, antenna pattern, receiver sensitivity and scan mode.The architecture of the network is shown in Figure 3a.In this paper, we introduced a deep residual CNN to solve the spatial resolution enhancement problem for the MWRI images on FY-3C satellite.Both the limited sampling interval and the comprehensive degeneration factors, such as the antenna pattern, the receiver sensitivity and the scanning mode, were learned jointly during the training process of the network.

Network Architecture
Recent evidence reveals that the depth (number of the layers) of the network is of crucial importance [26].Deep CNNs naturally integrate more low/mid/high level features and classifiers in an end-to-end multilayer fashion, thus they have more powerful feature extraction and mapping ability.However, normal deep plain CNNs usually face the obstacles of degradation problem and gradient vanishing/exploding problem.Thus, the residual learning technique has been introduced in the deep CNNs to solve these problems [27].Instead of mapping ( ) H X ( X is the input) by a few stacked layers, the residual learning explicitly lets these stacked layers map a residual function − , and outputs function ( )+ F X X with a shortcut connection from the input.
Although both forms should be able to asymptotically map the same function, the ease of learning might be different.With the help of residual learning, deep CNNs are easier to be optimized and can gain better results [19,27,28].Therefore, we introduce a deep residual CNN to enhance the spatial resolution of FY-3C MWRI images.A similar network architecture, which was proposed in [19], was used to solve the computer vision SR problem of the optical images.With the consideration of its powerful feature extraction and mapping ability, we introduce the network here to cope with some more complicated and intricate degradation factors in the MWRI imaging process, including the sampling interval limitation, antenna pattern, receiver sensitivity and scan mode.The architecture of the network is shown in Figure 3a.The convolutional layer1 (Conv1) firstly implements the feature extraction and representation from the input sub images (m, m).This operation extracts overlapping patches (patch size = (f, f ), stride = 1) from the input antenna temperature sub images and represent each patch as a high-dimension vector (1, 1, n) [16,17].All these vectors constitute a set of feature maps (m, m, n).Formally, the convolutional layer1 is expressed as an operation F 1 : where X are the sub images (m, m), ⊗ denotes the convolution operation, W 1 (f, f, n) and B 1 (1, 1, n) represent the filters (convolution kernels or weights) and biases respectively and F 1 (X) (m, m, n) is the output of convolutional layer1 representing a set of primary feature maps.
After the convolutional layer1, we apply the Rectified Linear Unit (ReLU) [29] to the response, which makes the convergence much faster while still presenting good quality [30].Formally, the ReLU operation is expressed as: where X represents the input feature maps of the ReLU.
Then the ResBlocks implement the non-linear mapping and residual learning.The mapping operation nonlinearly maps and further extracts a set of feature maps (m, m, n) to another set of feature maps (m, m, n).And residual learning adds input feature maps on the feature maps extracted by non-linear mapping, which makes the ResBlock easier to optimize and solves the problem of vanishing/exploding gradients.Due to the fact that the edge changes of the MWRI images are much less complex than the natural optical images, we chose to cut down the layers of the network to 16 in order to reduce the computational cost.Formally, the ResBlock i (i = 1, 2, . . ., 16) is expressed as the operation R i : where represent the filters and biases respectively in the convolutional layer1' of ResBlock i, W 2i (n, f, f, n) and B 2i (1, 1, n) represent the filters and biases respectively in the convolutional layer2'.α is the residual scaling factor which is used to stabilize the training process and R i (X i ) represents the final output feature maps (m, m, n) of the ResBlock i.
The final Upsample block implements the non-linear mapping and shuffle procedure.The non-linear mapping operation nonlinearly maps and eventually extracts a set of feature maps (m, m, n) to another set of feature maps (m, m, c 2 ).It accommodates the total number of pixels when the up-sampling factor is c.Then the shuffle procedure is implemented by the subpixel phase shifting technique [31] in order to rearrange the final feature maps (m, m, c 2 ) to the output high resolution image (c × m, c × m).
It is worth mentioning here that our network enhances the spatial resolution of the input antenna temperature image in two aspects.On the one hand, the pixels of the output image are c 2 times of the input image, which means that the equivalent sampling interval of the output image is 1/c of the input image.Thus, the aliased spectrum can be effectively eliminated and the high-frequency components of the image can be recovered.On the other hand, some degeneration factors such as the antenna pattern, receiver sensitivity and scanning mode can also be learned jointly through the end-to-end training process.And the smoothing effect, the noise and the resolution differences can therefore be reduced.

Dataset
The learning-based methods are often challenged by the difficulties of effectively and compactly creating the dataset.However, utilizing the characteristics of MWRI that multiple frequency band radiometers scan the region in the exactly same way and at the same time, a flexible dataset creation method is proposed [12].
The spatial resolution of the antenna temperature image varies with the frequency, the image in 89 GHz has the highest resolution due to its narrowest beam width.Thus, in our study for degradation factors reduction, the 89 GHz antenna temperature images were used as the 18.7 GHz simulated brightness temperature images I HR .And based on the MWRI imaging process in Equation ( 4), the 18.7 GHz simulated antenna temperature images I LR were obtained from the I HR .
The two-dimensional Gaussian function with the equal full width half maximum (FWHM) with the real 18.7 GHz antenna pattern in AVA was used as the equivalent antenna pattern [2].And this antenna pattern should be transformed to 254 PSFs h t (i, * , h(X , Y )) according to the prior knowledge of the scan geometry.But in order to reduce the calculation time, it is desirable to carry out less convolution operations in Equation ( 4).Therefore, in this paper, 7 PSFs (located at the 37, 67, 97, 127, 157, 187 and 217 pixels along the across track direction) were finally used for the tradeoff between speed and accuracy.The 2nd, 4th, and 6th PSFs in image domain are shown in Figure 4.

I .
The two-dimensional Gaussian function with the equal full width half maximum (FWHM) with the real 18.7 GHz antenna pattern in AVA was used as the equivalent antenna pattern [2].And this antenna pattern should be transformed to 254 PSFs ( , , ( ', ')) * according to the prior knowledge of the scan geometry.But in order to reduce the calculation time, it is desirable to carry out less convolution operations in Equation ( 4).Therefore, in this paper, 7 PSFs (located at the 37, 67, 97, 127, 157, 187 and 217 pixels along the across track direction) were finally used for the tradeoff between speed and accuracy.The 2nd, 4th, and 6th PSFs in image domain are shown in Figure 4.
Then the weighing functions should be changed accordingly with the PSFs' number and locations, as shown in Figure 5.The 7 lines with different colors in Figure 5    Then the weighing functions should be changed accordingly with the PSFs' number and locations, as shown in Figure 5.The 7 lines with different colors in Figure 5  And the image noise ( , ) n X Y was simulated by randomly generating Gaussian white noise with the standard derivation of NEΔT.The NEΔT is 0.5 K for the 18.7 GHz channel, as shown in Table 1.Finally, a decimation process was used to simulate the limited sampling interval in practical applications.Thus, the dataset creating process is given by: Table 1.Finally, a decimation process was used to simulate the limited sampling interval in practical applications.Thus, the dataset creating process is given by: Then, the mapping functions between I LR and I HR image pairs would be learned by the CNN during the training process.

Spatial Resolution Enhancement with Deep Residual CNN
The spatial resolution enhancement task for the satellite microwave radiometer data was implemented by a supervised deep residual CNN.The flowchart of the task is shown in Figure 6.
As can be seen from the flowchart, the process consisted of three major steps: Dataset creation, training and testing, and spatial resolution enhancement.Dataset creation: By utilizing the feature that the multi-frequency bands of the MWRI scan the region in the exactly same way, the 89 GHz antenna temperature images, which have the highest spatial resolution, were used as the simulated brightness temperature images HR I .And with the full consideration of the degeneration factors and the sampling interval (we doubled the sampling interval in our experiment), the simulated antenna temperature images LR I were produced by Equation ( 9) to simulate the degradation process for the 18.7 GHz data.Then, the mapping functions between HR I and LR I image pairs would be learned during the training process.Dataset creation: By utilizing the feature that the multi-frequency bands of the MWRI scan the region in the exactly same way, the 89 GHz antenna temperature images, which have the highest spatial resolution, were used as the simulated brightness temperature images I HR .And with the full consideration of the degeneration factors and the sampling interval (we doubled the sampling interval in our experiment), the simulated antenna temperature images I LR were produced by Equation ( 9) to simulate the degradation process for the 18.7 GHz data.Then, the mapping functions between I HR and I LR image pairs would be learned during the training process.
Training and testing: The training and testing set were made from the I HR and I LR image pairs.Then the datasets were used to train and test for the deep residual CNN.
Spatial resolution enhancement: Finally, the trained network was used to enhance the spatial resolution of the real 18.7 GHz antenna temperature images.

Training Details
For training and testing, 80 pairs of I LR and I HR images were made from the 89 GHz antenna temperature images with horizontal polarization.These 89 GHz antenna temperature images were acquired by the MWRI on the FY-3C satellite from 28 June 2018 to 5 July 2018 containing sufficient features of earth's lands, oceans and other typical geographical objects.The network was trained for the up-sampling factor c = 2, so each I HR image contained 254 × 1724 pixels and each I LR contained 127 × 862 pixels.And with minor modifications of the Upsample layer and the dataset, the network can also cope with other up-sampling factors.10 I LR and I HR image pairs were randomly selected as the testing set and the remaining 70 image pairs served as the training set.Extracted from the training set, 480 pairs of sub-images with 254 × 254 and 127 × 127 (m = 127) pixels were used as the output and input of the network and the batch size was 5.It worth mentioning that although we used a fixed sub-image size in the training process, the network can be applied to an arbitrary image size during the testing and spatial resolution enhancement process.
The filter size f was set to 3 for all the filters in the convolutional layers, ResBlocks and Upsample layer.The same padding technique was used for all the convolution operations.The feature size n was set to 256, and the shortcuts for residual learning were parameter free (equivalent pass) because of the constant dimension (feature size) through the network.The number of the ResBlocks was set to 16 to exploit the mapping ability of the deep CNN.And the residual scaling factor α was set to 0.1.The network was trained with the ADAM optimizer by setting learning rate = 10 −4 , beta1 = 0.9, beta2 = 0.999 and epsilon = 10 −8 .
Instead of using the common L2 loss to maximize the PSNR [32], our network used the L1 loss to get a better general result considering the PSNR, SSIM and the IFOV'.Furthermore, the L1 loss can provide faster convergence.
The network was implemented by the TensorFlow framework and the training process took roughly 8 h and 19 min (7000 backpropagations) using a NVIDIA GeForce GTX 1080Ti GPU.

Evaluation Metrics
Several metrics were introduced to evaluate the image quality from different perspectives.The peak signal-to-noise ratio (PSNR) is the mean-squared error measurement, which is defined as the ratio of peak signal power to the average noise power [33].For image (M, N): where 0 ≤ X ≤ M − 1 and 0 ≤ Y ≤ N − 1, I HR (X, Y) denotes pixel (X, Y) of the simulated brightness temperature image (reference image), I HR output (X, Y) denotes pixel (X, Y) of the network output image and D is the maximum peak-to-peak swing of the signal.
The structural similarity (SSIM) measures the similarity of structural information between two images [34], which is defined as: where µ is the mean, σ is the standard deviation of the image and σ I HR I HR output is the covariance between I HR and I HR output .C 1 and C 2 are constants that are used to stabilize the formula.In this paper, we choose C 1 = (0.01D) 2 and C 2 = (0.03D) 2 .An SSIM closer to 1 indicates that the structure of the output image is more similar to the reference.
For evaluating the performance in terms of spatial resolution, we used the equivalent instantaneous field of view (IFOV') index [2]: where I HR equi = I HR ⊗ PSF f whm are obtained by convolutions between I HR and a group of rotationally symmetric PSFs whose FWHM increase with the step of 0.5 km.The specific FWHM, whose corresponding I HR equi has the highest correlation coefficient with the output of the network I HR output , is the IFOV'.Thus, a smaller IFOV' represents an image with higher spatial resolution.

Results
In this section, the learning-based method 3-layer plain CNN [12,16,17], the traditional method bicubic interpolation and the super-resolution method (bicubic interpolation plus Wiener filtering) [2] were displayed as the comparison of our deep residual CNN method.Apart from the structure, as shown in Figure 3b, and the filter sizes (same as [16,17]) of the network, the other settings of the 3-layer CNN were consistent with our network in order to implement a fair comparison.

Quantitative and Qualitative Evaluation
Ten different scenes (1-5 scenes within the training set and 6-10 outside the training and testing set) with 238 × 300 pixels (16 pixels on the edge of the scene were cropped due to the no padding mode in 3-layer CNN [12,16,17]) were firstly used for the evaluation.The evaluation indexes are listed in Table 2.And the deep residual CNN yielded the highest scores in all the evaluation indexes (the highest scores are labeled as the bold font in the table).The average PSNR of our network achieved an increase of 3.68 dB compared with bicubic interpolation, 3.07 dB compared with super-resolution method and 1.49 dB compared with 3-layer CNN.The average SSIM improved 0.034 than bicubic interpolation, 0.029 than super-resolution method and 0.008 than 3-layer CNN.Our method also achieved the smallest IFOV'.The average IFOV' of our network was reduced to 18 km, far more less than the other methods: 37.5 km for the bicubic interpolation, 31 km for the super-resolution method and 21.55 km for the 3-layer CNN.Scene 2 (within the training set) shows the southeast coast of Russia around the Sea of Okhotsk, as shown in Figure 7. Figure 7a shows the degraded image I LR .The degradation factors, including the limited sampling interval, antenna pattern, radiometer sensitivity and scanning mode, blur the simulated brightness temperature image, so that the edges of the coast and many other details become unclear.Figure 7b shows the result of bicubic interpolation with the up-sample factor 2. Although the size of the image is doubled by the interpolation, no extra information is added.The super-resolution method uses the Wiener filter to deal with the degeneration factors of antenna pattern and scanning mode after the interpolation.And it eliminates the smoothing effect to a certain extent, so that more details can be obtained from the image, as shown in Figure 7c.The 3-layer CNN and the deep residual CNN all learn the end-to-end mapping between I LR and I HR .Due to the more powerful feature extraction and mapping ability, our network achieves a better result, as shown in Figure 7e, and its output is more similar to the I HR image in Figure 7f.The result of 3-layer CNN is shown in Figure 7d.For the sake of comparison and evaluation, the local areas of the Penzhina Bay in Russia (enclosed by the small black squares before enlargement) are also shown in the bottom left of each image in Figure 7 after a 3 times enlargement (enclosed by the big black squares after enlargement).The same bicubic interpolation method was used for all the local areas enlargement.As can be seen from the enlarged images, our method still has the best result.The above results show the spatial resolution enhancement ability of the deep residual CNN in the simulated LR I and HR I dataset.However, the validity of the model to the MWRI real data is still unknown.Real 18.7 GHz data of FY-3C MWRI with horizontal polarization was used to demonstrate the effectiveness of our network.In addition, since the 18.7 GHz ideal brightness temperature images are unavailable, the real 36.5GHz data of FY-3C MWRI with horizontal polarization, which has the similar characteristics but with higher spatial resolution, were shown for comparison.
Figure 10 shows the results around the Lake Michigan in America.Figure 10a shows the sub-sampled 18.9 GHz data.The data is sub-sampled with the factor 2, so the spatial resolution is further decreased.Figure 10b shows the result by the bicubic interpolation method.Figure 10c shows the result by the super-resolution method.Figure 10d and Figure 10e show the result of 3-layer CNN and our model.The 36.5 GHz data is shown in Figure 10f for comparison.And the enlarged local spot shows the Green Bay.As can be seen in Figure 10, the bay is still blurred for these methods due to their limited resolution enhancement abilities, but our model has clearly recovered the boundaries of the Green Bay.
Figure 11 shows the area around the Mexico and the Pacific Ocean.And the enlarged local spot shows a typical section around the Archangel Island and the Valley of the Cirios.As can be seen from these enlarged images, the features of the island, which previously are only visible in the 36.5 GHz data, can be resolved by our method.Figure 12 shows the Gulf of Alaska in the North Pacific Ocean and it is also the same area as scene 8.The enlarged spot shows the Kupreanof Island in the state of Alaska, the same local spot in Figure 8.Among these methods, our method also achieves the best result.Therefore, these results have demonstrated the powerful spatial resolution enhancement ability of our model in practical use.The above results show the spatial resolution enhancement ability of the deep residual CNN in the simulated I LR and I HR dataset.However, the validity of the model to the MWRI real data is still unknown.Real 18.7 GHz data of FY-3C MWRI with horizontal polarization was used to demonstrate the effectiveness of our network.In addition, since the 18.7 GHz ideal brightness temperature images are unavailable, the real 36.5GHz data of FY-3C MWRI with horizontal polarization, which has the similar characteristics but with higher spatial resolution, were shown for comparison.
Figure 10 shows the results around the Lake Michigan in America.Figure 10a shows the sub-sampled 18.9 GHz data.The data is sub-sampled with the factor 2, so the spatial resolution is further decreased.Figure 10b shows the result by the bicubic interpolation method.Figure 10c shows the result by the super-resolution method.Figures 10d and 10e show the result of 3-layer CNN and our model.The 36.5 GHz data is shown in Figure 10f for comparison.And the enlarged local spot shows the Green Bay.As can be seen in Figure 10, the bay is still blurred for these methods due to their limited resolution enhancement abilities, but our model has clearly recovered the boundaries of the Green Bay.
Figure 11 shows the area around the Mexico and the Pacific Ocean.And the enlarged local spot shows a typical section around the Archangel Island and the Valley of the Cirios.As can be seen from these enlarged images, the features of the island, which previously are only visible in the 36.5 GHz data, can be resolved by our method.Figure 12 shows the Gulf of Alaska in the North Pacific Ocean and it is also the same area as scene 8.The enlarged spot shows the Kupreanof Island in the state of Alaska, the same local spot in Figure 8.Among these methods, our method also achieves the best result.Therefore, these results have demonstrated the powerful spatial resolution enhancement ability of our model in practical use.

Running Time and Network Convergence
The running time of the bicubic interpolation is 0.002 s and the super-resolution method costs 3.677 s.After the training process, the 3-layer CNN produces a result in about 2.936 s and our model produce a result in about 19.049 s.We profiled the running time of all the methods in the same computer (Intel CPU I5 2.3 GHz).The interpolation method and the super-resolution method were obtained from the MATLAB implementation, whereas 3-layer CNN and our model were in TensorFlow.However, the speed gap is not only caused by the different MATLAB/TensorFlow implementations.The super-resolution method costs most of the time in 7 deconvolution operations with different PSFs when the scanning mode is considered.Whereas the 3-layer CNN and deep residual CNN produce the result in a completely feed-forward mode and the running time is related to the number of parameters in the networks.The multi-layers and the large feature size of our network increase the parameters to a certain extent, but the running time is still sufficient in the practical use.The time, that is needed for the FY-3C MWRI to scan the same scene (254 × 300 pixels), is about 27 times longer than the processing time of our method.
The training loss curves and the testing PSNR evaluation curves during the training process of our network and 3-layer CNN are shown in Figure 13 and Figure 14.Our network was trained for 7000 backpropagations and the 3-layer CNN was trained for 50,000 backpropagations.Due to the utilization of residual learning technique, our model is easier to converge (training loss ≈ 1.6@7000 backpropagations for our network and training loss ≈ 3.5@7000 backpropagations for 3-layer CNN).And with the help of deep layers and large feature size, our model can achieve smaller loss (training loss ≈ 1.9@50000 backpropagations for 3-layer CNN) and higher PSNR (testing PSNR ≈ 38.3 dB@7000 backpropagations for our network and testing PSNR ≈ 37.7 dB@50000 backpropagations for 3-layer CNN) during the training and testing process.

Running Time and Network Convergence
The running time of the bicubic interpolation is 0.002 s and the super-resolution method costs 3.677 s.After the training process, the 3-layer CNN produces a result in about 2.936 s and our model produce a result in about 19.049 s.We profiled the running time of all the methods in the same computer (Intel CPU I5 2.3 GHz).The interpolation method and the super-resolution method were obtained from the MATLAB implementation, whereas 3-layer CNN and our model were in TensorFlow.However, the speed gap is not only caused by the different MATLAB/TensorFlow implementations.The super-resolution method costs most of the time in 7 deconvolution operations with different PSFs when the scanning mode is considered.Whereas the 3-layer CNN and deep residual CNN produce the result in a completely feed-forward mode and the running time is related to the number of parameters in the networks.The multi-layers and the large feature size of our network increase the parameters to a certain extent, but the running time is still sufficient in the practical use.The time, that is needed for the FY-3C MWRI to scan the same scene (254 × 300 pixels), is about 27 times longer than the processing time of our method.
The training loss curves and the testing PSNR evaluation curves during the training process of our network and 3-layer CNN are shown in Figures 13 and 14.Our network was trained for 7000 backpropagations and the 3-layer CNN was trained for 50,000 backpropagations.Due to the utilization of residual learning technique, our model is easier to converge (training loss ≈ 1.6@7000 backpropagations for our network and training loss ≈ 3.5@7000 backpropagations for 3-layer CNN).And with the help of deep layers and large feature size, our model can achieve smaller loss (training loss ≈ 1.9@50000 backpropagations for 3-layer CNN) and higher PSNR (testing PSNR ≈ 38.3 dB@7000 backpropagations for our network and testing PSNR ≈ 37.7 dB@50000 backpropagations for 3-layer CNN) during the training and testing process.

Discussion
In this paper, the network's training and testing set were made from the 89 GHz antenna temperature images, because the 18.7 GHz ideal brightness temperature images are hard to obtain.With this approximation, we proposed a flexible approach to create the dataset, so a considerable amount of training data could be easily and conveniently provided to optimize the network's parameters.Using this dataset, our network produced better results for the simulated and real measured data than the traditional bicubic interpolation method, the super-resolution method [2] and the learning-based 3-layer CNN method [12,16,17].However, the 89 GHz channel is easily affected by the atmospheric effects, which tends to smooth the satellite radiometer data.Thus, methods to remove these atmospheric effects before making the dataset need to be explored in future work.

Discussion
In this paper, the network's training and testing set were made from the 89 GHz antenna temperature images, because the 18.7 GHz ideal brightness temperature images are hard to obtain.With this approximation, we proposed a flexible approach to create the dataset, so a considerable amount of training data could be easily and conveniently provided to optimize the network's parameters.Using this dataset, our network produced better results for the simulated and real measured data than the traditional bicubic interpolation method, the super-resolution method [2] and the learning-based 3-layer CNN method [12,16,17].However, the 89 GHz channel is easily affected by the atmospheric effects, which tends to smooth the satellite radiometer data.Thus, methods to remove these atmospheric effects before making the dataset need to be explored in future work.

Discussion
In this paper, the network's training and testing set were made from the 89 GHz antenna temperature images, because the 18.7 GHz ideal brightness temperature images are hard to obtain.With this approximation, we proposed a flexible approach to create the dataset, so a considerable amount of training data could be easily and conveniently provided to optimize the network's parameters.Using this dataset, our network produced better results for the simulated and real measured data than the traditional bicubic interpolation method, the super-resolution method [2] and the learning-based 3-layer CNN method [12,16,17].However, the 89 GHz channel is easily affected by the atmospheric effects, which tends to smooth the satellite radiometer data.Thus, methods to remove these atmospheric effects before making the dataset need to be explored in future work.3.These results suggest that the existing 70 image training set has covered sufficient temperature image features, and further increasing the number of images does not significantly contribute to the training of the network.
Only three dominant degeneration factors, including the antenna pattern, scan mode and receiver sensitivity, were considered in this paper.Other factors, for example the deformation of the antenna, the side lobes of the antenna pattern and the atmospheric effects, etc., could also be added to the model and learned by the network.Additionally, the up-sampling factor could also be changed in the model according to the practical demands.
Furthermore, this method can be perfectly used for the Special Sensor Microwave/Imager (SSM/I) data whose 85 GHz channel's (highest frequency channel's) sampling interval is half of the other low frequency channels' sampling interval [9].Thus, the pixels of low frequency images could be increased to match up with the pixels in high frequency images.
In future work, simulated brightness temperature scenarios with smaller sampling interval than the MWRI data would be used to create the training/testing dataset to further strengthen the method's practical use.Furthermore, the enhanced data would be used in the inversion of geophysical parameters to further validate the effectiveness of our method.

Conclusions
In this paper, to improve the spatial resolution of satellite microwave radiometer data, a deep residual convolutional neural network (CNN) was proposed.The network solves the comprehensive degradation problem, which is induced by sampling interval, antenna pattern, receiver sensitivity and scan mode, by learning the end-to-end mapping between low-and high-resolution images.Different from the traditional methods that separately handle every degradation factor, our learning-based method corporately learns and optimizes all the factors during the training process.And by utilizing the multi-band characteristics of the radiometer, a flexible and convenient dataset creation approach was proposed to produce a considerable amount of training data for the network.Furthermore, due to the powerful mapping capability, the deep residual CNN has been testified to achieve better results than the methods in literature [2,12,16,17] both quantitatively and qualitatively.
Application of this method to FY-3C microwave radiation imager (MWRI) 18.7 GHz data showed that the spatial resolution could be dramatically enhanced and features, which previously were only visible in the higher resolution 36.5 GHz image, could be resolved from the enhanced image.Furthermore, real time application of this method is feasible because the time needed for scan is a factor of 27 longer than the calculation time.

Figure 2 .
Figure 2. The actual scan positions of the FY-3C MWRI in the earth spherical coordinate system.(a) An example of the scanning positions of a descending mode scan (blue) and the following ascending mode scan (red); (b) Enlarged view of the black box in (a).

Figure 2 .
Figure 2. The actual scan positions of the FY-3C MWRI in the earth spherical coordinate system.(a) An example of the scanning positions of a descending mode scan (blue) and the following ascending mode scan (red); (b) Enlarged view of the black box in (a).

Figure 3 .
Figure 3.The architecture of the deep residual CNN and 3-layer CNN.(a) Deep residual CNN; (b) 3-layer CNN.

Figure 3 .
Figure 3.The architecture of the deep residual CNN and 3-layer CNN.(a) Deep residual CNN; (b) 3-layer CNN.
the locations of the peak of each line are shown in the label.The overlapping of neighboring weighing functions serves to reduce the boundary artifacts.Additionally, the triangle shape of the weighing function is designed to activate the specific PSFs in different local regions and suppress the other PSFs.

Figure 5 .
Figure 5. Weighing functions for the 7 PSFs.And the image noise n(X, Y) was simulated by randomly generating Gaussian white noise with the standard derivation of NE∆T.The NE∆T is 0.5 K for the 18.7 GHz channel, as shown in

Figure 6 .
Figure 6.The flowchart of the resolution enhancement task.

Figure 6 .
Figure 6.The flowchart of the resolution enhancement task.
SSIM(I HR , I HR output ) = (2µ I HR µ I HR output + C 1 )(2σ I HR I HR output

21 Figure 7 .
Figure 7. Experiment results of the scene 2 (in training set).(a) The simulated antenna temperature image LR I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image HR I .

Figure 8 .
Figure 8. Experiment results of the scene 8 (outside training/testing set).(a) The simulated antenna temperature image LR I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image HR I .

Figure 7 . 21 Figure 7 .Figure 8 .
Figure 7. Experiment results of the scene 2 (in training set).(a) The simulated antenna temperature image I LR ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image I HR .Scene 8 (outside the training and testing set) shows the area around the Gulf of Alaska in the North Pacific Ocean, as shown in Figure 8.And the enlarged local spot shows the Kupreanof Island in the state of Alaska.Scene 10 (outside the training and testing set) shows the north Ural Federal District in Russia, as shown in Figure 9.The enlarged local spot shows the Gulf of Ob.The images produced by our model also achieve the best visual effect.

Figure 8 .
Figure 8. Experiment results of the scene 8 (outside training/testing set).(a) The simulated antenna temperature image I LR ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image I HR .

Figure 9 .
Figure 9. Experiment results of the scene10 (outside training/testing set).(a) The simulated antenna temperature image LR I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image HR I .

Figure 9 .
Figure 9. Experiment results of the scene 10 (outside training/testing set).(a) The simulated antenna image I LR ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The simulated brightness temperature image I HR .

Figure 10 .
Figure 10.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data 18.7 G I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 11 .
Figure 11.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data 18.7 G I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 10 . 21 Figure 10 .
Figure 10.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data I 18.7G ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 11 .
Figure 11.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data 18.7 G I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 11 .
Figure 11.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data I 18.7G ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 12 .
Figure 12.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data 18.7 G I ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 12 .
Figure 12.Tested results for the 18.7 GHz data.(a) The sub-sampled 18.7 GHz data I 18.7G ; (b) The bicubic interpolation result; (c) The super resolution result; (d) The 3-layer CNN result; (e) The deep residual CNN result; (f) The 36.5 GHz data.

Figure 13 .
Figure 13.Training loss curves for deep residual CNN and 3-layer CNN.

Figure 14 .
Figure 14.Testing PSNR curves for deep residual CNN and 3-layer CNN.
To further validate the effectiveness of the existing training set, 70 images from 25 December 2018 to 5 January 2019 (data in winter), 130 images from 19 June 2018 to 5 July 2018 (data in summer), and 200 images of these images combined (data in summer and winter) were respectively used for the training set of our network, the averaged test results of scene 6-10 (outside

Figure 14 .
Figure 14.Testing PSNR curves for deep residual CNN and 3-layer CNN.
To further validate the effectiveness of the existing training set, 70 images from 25 December 2018 to 5 January 2019 (data in winter), 130 images from 19 June 2018 to 5 July 2018 (data in summer), and 200 images of these images combined (data in summer and winter) were respectively used for the training set of our network, the averaged test results of scene 6-10 (outside

Figure 14 .
Figure 14.Testing PSNR curves for deep residual CNN and 3-layer CNN.
To further validate the effectiveness of the existing training set, 70 images from 25 December 2018 to 5 January 2019 (data in winter), 130 images from 19 June 2018 to 5 July 2018 (data in summer), and 200 images of these images combined (data in summer and winter) were respectively used for the training set of our network, the averaged test results of scene 6-10 (outside the training and testing set) are shown in Table

Table 2 .
The evaluation indexes of the 10 scenes.

Table 3 .
The evaluation of deep residual CNN with different training set.