An Improved Method Combining CNN and 1D-Var for the Retrieval of Atmospheric Humidity Proﬁles from FY-4A/GIIRS Hyperspectral Data

: FY-4A/GIIRS (Geosynchronous Interferometric Infrared Sounder) is the ﬁrst infrared hyperspectral atmospheric vertical detector in geostationary orbit. Compared to other similar instruments, it has the advantages of high temporal resolution and stationary relative to the ground. Based on the characteristics of GIIRS observation data, we proposed a humidity proﬁle retrieval method. We fully utilized the information provided by the observation and forecast data, and used the two-dimensional brightness temperature data with the dimension of time and optical spectrum as the input of the CNN (convolution neural network model). Then, the obtained brightness temperature data were shown to be more suitable as the input for the physical retrieval method for humidity than the conventional correction method, improving the accuracy of humidity proﬁle retrieval. We performed two comparative experiments. The ﬁrst experiment results indicate that, compared to ordinary linear correction and ANN (artiﬁcial neural network algorithm) correction, our revised observed brightness temperature data are much closer to the simulated brightness temperature obtained by inputting ERA5 reanalysis data into RTTOV (Radiative Transfer for TOVS). The results of the second experiment indicate that the accuracy of the humidity proﬁle retrieved by our method is higher than that of conventional ANN and 1D-Var (one-dimensional variational algorithm). With ERA5 reanalysis data as the reference value, the RMSE (Root Mean Squared Error) of the humidity proﬁles by our method is less than 8.2% between 250 and 600 hPa. Our method holds the unique advantage of the high temporal resolution of GIIRS, improves the accuracy of humidity proﬁle retrieval, and proves that the combination of machine learning and the physical method is a compelling idea in the ﬁeld of satellite atmospheric remote sensing worthy of further exploration.


Introduction
Accurate atmospheric temperature and humidity profiles can be obtained based on hyperspectral infrared observations, and are mainly utilized in numerical weather predictions, disaster weather warnings, etc. [1][2][3][4]. The common hyperspectral infrared detection instruments carried on meteorological satellites mainly include AIRS (Atmospheric Infrared Sounder), CrIS (Cross-Track Infrared Sounder), IASI (Infrared Atmospheric Sounding Interferometer), and HIRAS (Hyperspectral Infrared Atmospheric Sounder). Relevant reports [5,6] have revealed that satellite hyperspectral infrared data have a very important impact on the assimilation and prediction results of the NWP (Numerical Weather Prediction) system. Among them, AIRS and IASI have been assimilated by many NWP centers, such as the European Centre for Medium-Range Weather Forecasts and the National Centers for Environmental Prediction in the U.S., which have very positive impacts on improving the accuracy of water, wind and weather forecasts. With the further improvements of the NWP vertical resolution, it is expected that hyperspectral infrared data will play a greater role [7][8][9][10].
The current research methods in atmospheric temperature and humidity detection based on satellite-borne hyperspectral infrared observations are retrieval methods mainly based on the physical model and data statistics [11][12][13]. The physical retrieval methods aim to solve the radiative transfer equation by a mathematical method from the perspective of physical law [14]. Rodgers [15] proposed a new physical retrieval method for ill-conditioned problems, which takes the initial conjecture of the profile to be retrieved as the constraint to prevent the solved value from deviating from the real profile. Lorenc [12] proposed the concept of the objective function, which transformed the solution of the radiative transfer equation into a nonlinear optimization problem in mathematics. Li and Zeng [16,17] obtained the analytical form of atmospheric temperature and water vapor weight function of AIRS simulation data under clear sky conditions via the onedimensional variational principle, and obtained the radiative transfer equation. Moreover, the atmospheric temperature and humidity profiles were obtained through Newton nonlinear iteration, and it was found that the RMSE of the simulation retrieval results of AIRS temperature profiles was less than 1 K and the water vapor mixing ratio was less than 10%. Ma [18] used the eigenvector to solve the initial profiles, and then further improved the retrieval accuracy via the Newton nonlinear iteration method. Zhu [19] utilized the hyperspectral infrared atmospheric sounder HIRAS carried by the polar orbiting meteorological satellite FY-3D to retrieve the atmospheric temperature and humidity profiles based on the one-dimensional variational retrieval algorithm, and the retrieval temperature and humidity results were better than the background profiles as a whole. The RMSE of the temperature profiles was better than 1.5 K below 100 hPa, and that of some pressure layers was better than 1 K, while the RMSE of the humidity profiles was better than 10% as a whole.
The statistical retrieval method aims to establish a statistical regression model between the atmospheric parameters to be retrieved and the satellite-observed brightness temperature, which can be solved by the eigenvector method [20], the empirical orthogonal function expansion method [21], etc. With the development of machine learning theory, traditional statistical regression methods are gradually being replaced by neural networks and other machine learning methods. Compared to traditional regression methods, neural network methods do not require significant analysis of nonlinear mapping itself [22]. Liu and Guan [23] conducted a retrieval experiment of atmospheric humidity profiles using the BP (back propagation) neural network method with one hidden layer combined with the simulated brightness temperature data of AIRS. The retrieval results showed that, compared to the eigenvector method, the neural network method improved the retrieval accuracy of the humidity profiles and better revealed the fine structure of these profiles.
The FY-4A satellite is a new generation of geostationary meteorological satellites for quantitative remote sensing applications in China. FY-4A/GIIRS has two spectral bands to detect the atmosphere, which can achieve the vertical structure observation of atmospheric temperature, humidity, and other atmospheric information, and can improve the resolution of meteorological vertical observation [24][25][26]. Thus, it can provide weather monitoring and atmospheric environmental component detection services for China and the surrounding countries. In recent years, many improved retrieval algorithms related to machine learning have been generated based on the observation data of GIIRS. Cai [27] retrieved the atmospheric temperature and humidity profiles of FY-4A/GIIRS based on the artificial neural network algorithm and verified it with AIRS products. The results Remote Sens. 2021, 13, 4737 3 of 20 showed that the average RMSE of temperature from 200 hPa to 1000 hPa in the troposphere was less than 1 K, and that of humidity was better than 10% (from July to August 2018, 32 • N-40 • N, 114 • E-121 • E). Zheng [28] retrieved a 4D wind field based on the neural network algorithm and the GIIRS observation data during typhoon Maria, and avoided the uncertainty caused by the retrieval of humidity profiles. The final retrieval results were compared to the GDAS (Global Data Assimilation System) and ERA5 datasets, and the RMSE of the U and V components of the tropospheric wind field were both less than 2 m/s. Huang [29] proposed a temperature retrieval method based on the GIIRS observation data which combined a neural network with the traditional physical 1D-Var method. In the experiment, the accuracy of the temperature profiles retrieved by this method was better than that of the GIIRS temperature profile products between 10 and 600 hPa. Compared to other hyperspectral infrared instruments of the same type, the higher time resolution is a great advantage of GIIRS because it is carried on a geostationary satellite. The revisit cycle in conventional time is 2 h, which can reach 15 min in special periods (e.g., typhoons). The retrieval method combined with machine learning often needs to count the laws between different types of data (e.g., the brightness temperature observed by the sounder and the atmospheric profiles) in a very large number of training samples. Therefore, the sounder is required to provide a large amount of observation data for detecting the same area in a limited time. GIIRS with high time resolution meets this requirement, with greater advantages than other similar instruments for the retrieval method combined with machine learning.
However, because the orbit height of the GIIRS is higher than other similar instruments, the detection signal is weak, and there are some difficulties in the detection with this instrument. For humidity retrieval, GIIRS mainly has the following disadvantages: It lacks 1130-1650 cm −1 wave band, which includes some important water vapor absorption bands; the observation areas of the long-and medium-wave bands do not completely overlap; and some medium-wave channels have large observation errors. Therefore, the effect of humidity profile retrieval using conventional physical methods is not ideal.
In this paper, a humidity retrieval method combining machine learning and physical methods is proposed based on the characteristics of the GIIRS observation data. This method more effectively uses the information provided by observation and prediction data, and reduces the error of satellite observation and radiation transmitted models. Compared to the retrieval results of the 1D-Var and ANN methods, our method showed higher retrieval accuracy. Our method gives full play to the advantages of the high temporal resolution of the GIIRS observation data, improves the accuracy of humidity profile retrieval, and provides a reference basis for strengthening the accuracy of business products.

Datasets
The target area of this experiment was the land within latitude 10-28 • and longitude 67-103 • (Figure 1), and the time range was from 5 to 9 December 2020. The data of the first 3 days were for model training, and those of the last 2 days were for experimental verification. No extreme weather conditions occurred during this period in the selected area, and significant clear sky data were obtained. The selected area is far away from the nadir of the satellite, and the detection accuracy is relatively low, thus making it difficult to retrieve humidity profiles by conventional methods.
The infrared hyperspectral detection data in this paper were the Level-1 product of GIIRS (http://satellite.nsmc.org.cn/portalsite/Data/Satellite.aspx, accessed on 11 November 2021), whose data are divided into two wavebands. The long wave is 700-1130 cm −1 , the medium wave is 1650-2250 cm −1 , the spectral resolution is 0.625 cm −1 , and the spatial resolution is 16 km. Within the time and place selected in the experiment, the revisit period of detection was approximately 2 h. GIIRS is relatively geostationary, and its detection angle changes very little at different times during the same place. Table 1 shows the FY-4A/GIIRS instrument performance parameters. The infrared hyperspectral detection data in this paper were the Level-1 product of GIIRS (http://satellite.nsmc.org.cn/portalsite/Data/Satellite.aspx, accessed on 11 November 2021), whose data are divided into two wavebands. The long wave is 700-1130 cm −1 , the medium wave is 1650-2250 cm −1 , the spectral resolution is 0.625 cm −1 , and the spatial resolution is 16 km. Within the time and place selected in the experiment, the revisit period of detection was approximately 2 h. GIIRS is relatively geostationary, and its detection angle changes very little at different times during the same place. Table 1 shows the FY-4A/GIIRS instrument performance parameters.   The forecast data in the experiment were the historical forecast dataset with a valid period of 6 h provided by GFS (Global Forecast System, https://www.ncei.noaa.gov/ products/weather-climate-models/global-forecast, accessed on 11 November 2021), and the forecast time was at 00:00, 06:00, 12:00, and 18:00 every day. The profile information selected from the dataset included the temperature, humidity, and ozone. The ground information included the surface pressure and temperature, temperature at 2 m, and U and V vector wind speed of 10 m. The spatial resolution of longitude and latitude was 0.5 • .
The reference humidity information in the experiment was the reanalysis dataset provided by ERA5 (https://cds.climate.copernicus.eu/#!/search?text=ERA5&type=dataset, accessed on 11 November 2021) and the sounding dataset provided by the University of Wyoming (http://weather.uwyo.edu/upperair/sounding.html, accessed on 11 November 2021). The reanalysis dataset provided by ERA5 divided the profile information into 37 layers from 1 to 1000 hPa. The spatial resolution of longitude and latitude of the profile  [30]. RTTOV is very suitable to provide a large number of sample datasets for this experimental method, so we mainly used RTTOV as the radiative transfer model. The unique fitting coefficient file of FY-4A/GIIRS provided by the RTTOV official website was used in this experiment (https://nwp-saf.eumetsat.int/site/software/rttov/download/ coefficients/coefficient-download/, accessed on 11 November 2021). We provided RTTOV with atmospheric profile information and the Earth's surface information, and obtained the simulated observation brightness temperature and Jacobian matrix of GIIRS in this experiment (https://nwp-saf.eumetsat.int/site/software/rttov/documentation, accessed on 11 November 2021).

CNN
The conventional convolution neural network structure includes an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. The convolution neural network model has been widely applied to a large number of digital image sample fitting problems. CNN also plays an important role in the application of atmosphere remote sensing [31][32][33]. Compared to traditional neural networks, a part of feature learning is added to the convolution neural network model, which makes it easier to find the overall feature information provided by pixels in a region. In addition, it has three advantages [34]: a simple structure design and high accuracy; fully feedforward, fast speed, and no need to consider the optimization problem; and a strong processing capacity of large amounts of data.

ANN
The traditional artificial neural network includes the input layer, hidden layer, and output layer. The input layer and output layer are, respectively, determined by the input data and target output data that need to be fitted. The number of hidden layers, the number of neurons in each layer, and the design of activation function must be appropriately adjusted in combination with data characteristics and the effect of model training. The core structure of the traditional retrieval method of ANN humidity profile is a shallow BP neural network, whose input and output are observation data and humidity profile data, respectively. In addition, the number of hidden layers is no more than 2.

Method
As shown in Figure 2, this chapter introduces our experimental methods from four parts according to the order of the experimental process: data preprocessing, CNN model training, calculation of initial profiles, and 1D-Var. neural network, whose input and output are observation data and humidity profile data, respectively. In addition, the number of hidden layers is no more than 2.

Method
As shown in Figure 2, this chapter introduces our experimental methods from four parts according to the order of the experimental process: data preprocessing, CNN model training, calculation of initial profiles, and 1D-Var.

Data Preprocessing
As introduced in Section 2.1, the experiments mainly used GIIRS observation data, GFS prediction data, ERA5 reanalysis data, and AGRI/CLM data. These datasets have different resolutions of space and time, so interpolation is needed to match them. ERA5 divided the atmospheric profiles into 37 layers from 1 to 1000 hPa, which we used as the standard of profile stratification. The GFS profile data adopted linear interpolation according to this standard of pressure layers. The longitude and latitude used the GFS forecast data as a standard. According to this standard of longitude and latitude, the ERA5 data were directly matched, and the GIIRS data adopted two-dimensional linear interpolation. If CLM data within 0.5 • around a judged point were defined as clear sky, then the point would be considered a clear sky point. The time used ERA5 reanalysis data as a standard. According to the standard of time, the GFS forecast data adopted linear interpolation, and the GIIRS observation data adopted adjacent point matching method. Deviation within 15 min of the data was allowed, but if there was no observation data within 2 h before and after the matched point, the data of this point were considered as a null value.

CNN Training
According to Section 2.1, we conducted model training by the matched data of the first 3 days. Because of the clouds' substantial interference with infrared detection, we strictly selected the data of clear skies on the premise that there is no data loss or null values within 16 h before the current time of the selected data, the sky must be clear within 6 h before, and the sky must be clear for more than 70% of the time within 16-6 h before (the amount and quality of data used for training are considered comprehensively).
The matched GFS forecast data and the ERA5 reanalysis data were input into the RTTOV radiation transfer model, respectively, and the simulated observation brightness temperature of the GIIRS was obtained, named GFS_RTTOV_BT and ERA5_RTTOV_BT, respectively. The observed brightness temperature of GIIRS after linear correction was named GIIRS_BT. As shown in Formula 1, the deviation between the observed brightness temperature of GIIRS (BT 1 ) and ERA5_RTTOV_BT was counted, and the vectors a and b were calculated to make the corrected observed brightness temperature (BT 2 ) closer to ERA5_RTTOV_BT. The GIIRS_BT and GFS_RTTOV_BT data were extracted, and brightness temperature matrixes were created. The first dimension of the matrix was 16 consecutive hours arranged in chronological order from smallest to largest. The second dimension was 256 continuous GIIRS observation channels set in order of wave number from smallest to largest. As shown in Figure 3, the brightness temperature was represented by the different colors. For the convenience of the readers, we appropriately interpolated and blurred the images. In fact, the pictures show two 16 × 256 matrixes. The data of these two matrixes have the same time and channels, which, together with the corresponding ERA5_RTTOV_BT, form one set of data for CNN model training. The latitude and longitude coordinates of the whole set of data are consistent. We collated 3071 sets from training data and 928 sets from testing data using this method for our experiment.
To highlight the characteristics of the brightness temperature matrix, we normalized (restriction from 0 to 1) it according to Formula (2), where BT refers to the brightness temperature before normalization, and BT' refers to the brightness temperature after normalization. All brightness temperatures (including the observed and simulated brightness temperatures) of the current channel in the training set were counted in the experiment, and the maximum (BT max ) and minimum (BT min ) values were obtained after excluding each singular value (the value obviously differs from the mean value of brightness temperature of the current channel). Figure 4 shows the structure of the CNN network for this experiment, with the input layer being two characteristic matrixes, GIIRS_BT and GFS_RTTOV_BT, as introduced previously, and the intermediate layer being convolutional layer-pooling layer-convolutional layer-pooling layer-fully connected layer, where the filter size of the convolutional layer is 5 × 5 and the pooling size of the pooling layer is 2 × 2. To improve the accuracy of the model, the output layer ERA5_RTTOV_BT is only a pixel, whose time corresponds to the 16 th layer of the timeseries (time is arranged in chronological order from smallest to largest, so the 16 th layer is the latest time layer of the set of data). The spectrum of the 256 selected channels of the input layer were the closest to the channel of the output layer pixel. The selected channels of the input layer contain the channel of the output layer pixel and the rest of the 255 adjacent channels. For example, the spectral resolution of GIIRS is 0.625 cm −1 . If the wavenumber of output target channel of the model was 1846.875 cm −1 , the input channel range was 1767. .875 cm −1 . If the wavenumber of the output target channel of the model was 1650.625 cm −1 , the input channel range of the model was 1650-1809.375 cm −1 , since the medium wave range of GIIRS was 1650-2250 cm −1 . The output of CNN model was a pixel, and each group of CNN training experiments was only targeted at one channel. Therefore, we determined how many channels required bias correction and how many CNN models required training. The intercept was set between every two layers in the network structure, and the activation functions of both the convolutional layers and the fully connected layer were sigmoid functions. To highlight the characteristics of the brightness temperature matrix, we normalized (restriction from 0 to 1) it according to Formula (2), where BT refers to the brightness temperature before normalization, and BT' refers to the brightness temperature after normalization. All brightness temperatures (including the observed and simulated brightness temperatures) of the current channel in the training set were counted in the experiment, and the maximum (BTmax) and minimum (BTmin) values were obtained after excluding each singular value (the value obviously differs from the mean value of brightness temperature of the current channel). Figure 4 shows the structure of the CNN network for this experiment, with the input layer being two characteristic matrixes, GIIRS_BT and GFS_RTTOV_BT, as introduced previously, and the intermediate layer being convolutional layer-pooling layer-convolutional layer-pooling layer-fully connected layer, where the filter size of the convolutional Each iteration calculated the MSE (Mean Squared Error) of the model output and the target, and recorded the network corresponding to the minimum value of MSE. The network convergence condition was that the minimum value of the MSE for 500 consecutive iterations no longer changed. This experiment used the information entropy method [35,36] to select 10 water vapor channels (the channels were selected to better fit the physical algorithm described in Section 3.4), so we trained 10 CNN network models for these channels. In the test set, GIIRS_BT and GFS_RTTOV_BT were input into the CNN model to obtain the brightness temperature of the 10 water vapor channels, and the brightness temperature was named CNN_BT for the retrieval experiments introduced in Section 3.4.
Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 21 and how many CNN models required training. The intercept was set between every two layers in the network structure, and the activation functions of both the convolutional layers and the fully connected layer were sigmoid functions. Each iteration calculated the MSE (Mean Squared Error) of the model output and the target, and recorded the network corresponding to the minimum value of MSE. The network convergence condition was that the minimum value of the MSE for 500 consecutive iterations no longer changed. This experiment used the information entropy method [35,36] to select 10 water vapor channels (the channels were selected to better fit the physical algorithm described in Section 3.4), so we trained 10 CNN network models for these channels. In the test set, GIIRS_BT and GFS_RTTOV_BT were input into the CNN model

Initial Profiles
To converge the 1D-Var humidity retrieval method in Section 3.4 more efficiently, more accurate initial profiles were required in the experiment. Typically, the 1D-Var selects forecast profile data as the initial guess. Section 2.1 describes the preprocessing of GFS forecast profile data. However, since GFS and ERA5 profile data differ significantly in temporal resolution, we found that the GFS forecast profile data for some pressure layers were not suitable for the initial guess of 1D-Var.
We trained the network model with the traditional ANN humidity retrieval method, where the input to the network was the GIIRS observed brightness temperature and the target output of the network was the ERA5 humidity profiles (transformed into relative humidity). The model was designed with reference to the met-ANN of the article [29].
We used the ERA5 profiles in the testing set as the "true value," and compared the error RMSE of the humidity profiles output from the ANN retrieval model and that of the GFS forecast humidity profiles (as shown in Figure 5). We finally decided to select the ANN retrieval results between 50 and 250 hPa as the initial profiles of 1D-Var, and the GFS forecast humidity as the initial profiles of 1D-Var for the rest of the pressure layers.
in temporal resolution, we found that the GFS forecast profile data for some pressure layers were not suitable for the initial guess of 1D-Var.
We trained the network model with the traditional ANN humidity retrieval method, where the input to the network was the GIIRS observed brightness temperature and the target output of the network was the ERA5 humidity profiles (transformed into relative humidity). The model was designed with reference to the met-ANN of the article [29].
We used the ERA5 profiles in the testing set as the "true value," and compared the error RMSE of the humidity profiles output from the ANN retrieval model and that of the GFS forecast humidity profiles (as shown in Figure 5). We finally decided to select the ANN retrieval results between 50 and 250 hPa as the initial profiles of 1D-Var, and the GFS forecast humidity as the initial profiles of 1D-Var for the rest of the pressure layers.

1D-Var
1D-Var is a traditional physical method for profiles retrieval. We retained the original structure of the method, replaced the data input, and used RTTOV as the forward radiative transfer model for this experiment.
In Formula 5, J is the objective function constructed for this experiment; Sε represents the observation error covariance of GIIRS, which was counted from the deviation of GIIRS_BT and ERA5_RTTOV_BT in the training set; Sap represents the background error covariance, which was calculated based on the deviation of the GFS forecast humidity

1D-Var
1D-Var is a traditional physical method for profiles retrieval. We retained the original structure of the method, replaced the data input, and used RTTOV as the forward radiative transfer model for this experiment.
In Formula 5, J is the objective function constructed for this experiment; S ε represents the observation error covariance of GIIRS, which was counted from the deviation of GIIRS_BT and ERA5_RTTOV_BT in the training set; S ap represents the background error covariance, which was calculated based on the deviation of the GFS forecast humidity profile data and the ERA5 reanalysis humidity profile data; Y refers to CNN_BT, the output of the GIIRS_BT and GFS_RTTOV_BT brought to the CNN model introduced in Section 3.2 (in the conventional 1D-Var method, Y is the observation data of the satellite); m represents the dimension of Y, i.e., the number of selected channels; F refers to the RTTOV radiative transfer model, and the temperature information needed for inputting into the model in this experiment was derived from GFS forecast data; x represents the humidity profile, where the initial guess for the humidity profile is as described in Section 3.3 (in the conventional 1D-Var method, the initial guess value of x is the forecast profile); x b represents the forecast humidity profile. In this experiment, x b is consistent with the initial guess for the humidity profile; γ is the Lagrange smoothing factor, which was 1 in this experiment.
Using Newton's method, we updated the humidity profile x and calculated the updated objective function J. The x corresponds to the time when J reached the minimal value was the final retrieval result. Each iteration used the equation in Formula 6, where n is the number of iterations, experimentally set not to exceed 40 (the convergence effect and algorithm speed are considered comprehensively), and K is the Jacobian matrix.

CNN Effect Verification
Section 3.2 describes the process of CNN model training. CNN network models of 10 channels were trained using the training set, and this section validated the fitting effect of the CNN models using both the training and test sets.
The target output during model training was the simulated brightness temperature obtained by inputting the ERA5 data into the RTTOV model (see Section 3.2, ERA5_RTTOV_BT). The ERA5_RTTOV_BT of the training dataset was named the Training Target, and that of the test dataset was named Testing Target. During training and testing, the output brightness temperature by the CNN models (see Section 3.2, CNN_BT) were named Training Output and Testing Output, respectively.
As shown in Figure 6, the horizontal and vertical coordinates of the dots are the Output and Target (Kelvin), and the red line is the ideal straight line of Output = Target. The closer the dot is to the red line, the better the training outcome of the CNN model. Furthermore, we counted the fitting determination coefficient R 2 , and the closer R 2 is to 1, the better the fit. We placed the data points of the 10 CNN models representing different observation channels onto the same figure, and each model counted R 2 . The R 2 data shown in the figure are the mean values taken after counting the R 2 for each of the 10 models. As shown in Figure 6, for both the training and test sets, most of the data points are concentrated near the ideal red line. There are a few points in the test set that are far from the red line, which may be caused by overfitting of the model. However, the majority of the test data points had an error less than 2 K, and the R 2 of the test set reached a more ideal 0.9202.

CNN_BT Accuracy Verification
In Section 3.2, we named CNN_BT as the output brightness temperature of the CNN model, GIIRS_BT as the brightness temperature data observed by GIIRS, and ERA5_RTTOV_BT as the output brightness temperature of the ERA5 data input into the RTTOV model. We conducted linear deviation correction for GIIRS_BT using ERA5_RTTOV_BT as the reference in the training set. In this section, using As shown in Figure 6, for both the training and test sets, most of the data points are concentrated near the ideal red line. There are a few points in the test set that are far from the red line, which may be caused by overfitting of the model. However, the majority of the test data points had an error less than 2 K, and the R 2 of the test set reached a more ideal 0.9202.

CNN_BT Accuracy Verification
In Section 3.2, we named CNN_BT as the output brightness temperature of the CNN model, GIIRS_BT as the brightness temperature data observed by GIIRS, and ERA5_RTTOV_BT as the output brightness temperature of the ERA5 data input into the RT-TOV model. We conducted linear deviation correction for GIIRS_BT using ERA5_RTTOV_BT as the reference in the training set. In this section, using ERA5_RTTOV_BT as the reference standard, two comparison experiments were conducted to verify the accuracy of CNN_BT using the test dataset. Figure 7a,c,e show the absolute value deviation of GIIRS_BT and ERA5_RTTOV_BT for the test set, while Figure 7b,d,f show the absolute value deviation of CNN_BT and ERA5_RTTOV_BT for the test set. In Figure 7, the horizontal axis of each box plot represents the wave number in the corresponding channel (10 channels selected in Section 3.2), and the vertical axis represents the absolute value of the deviation. The horizontal line in the middle of the box refers to the median of the absolute value of the statistical data deviation, and the horizontal lines at the top and bottom of the box are the upper and lower quartiles of the absolute value of the statistical data deviation, respectively.

Comparison Experiment with Observation Data
Comparing the deviations of GIIRS_BT and CNN_BT, we can find that the deviations of CNN_BT for all channels were smaller than those of GIIRS_BT, and the median of the absolute values of the deviations of CNN_BT for almost all channels was less than half of that of GIIRS_BT. The GIIRS_BT deviation of the channel corresponding to the wave number of 1728.125 cm −1 was larger. However, the CNN method in this paper can extract the observation and forecast information from all the input channels of the model. Therefore, the CNN_BT deviation corresponding to this channel is still more desirable.

Comparison Experiment with the ANN Deviation Correction Method
We refer to the method Met-I1D-Var in the article [29], which corrects the deviation of the GIIRS observation data based on an ANN model. The input of the network was the GIIRS observation data of all channels, and the target output was ERA5_RTTOV_BT for training. We named the output of this ANN model ANN_BT. In reproducing this method, we used the training and test sets consistent with the experiments in this paper. We used ERA5_RTTOV_BT as a reference and calculated the deviation (RMSEs) of CNN_BT and ANN_BT in the test set, respectively.
The statistical results of test sets are shown in Table 2, where the first column contains the wave numbers for the 10 channels selected for this experiment. From the table, we can find that there are eight channels in which the RMSE of CNN_BT is smaller than that of ANN_BT. We also estimated the mean RMSE of the 10 channels, and the mean RMSE of CNN_BT was smaller than that of ANN_BT by approximately 0.09 K.  Comparing the deviations of GIIRS_BT and CNN_BT, we can find that the deviations of CNN_BT for all channels were smaller than those of GIIRS_BT, and the median of the absolute values of the deviations of CNN_BT for almost all channels was less than half of that of GIIRS_BT. The GIIRS_BT deviation of the channel corresponding to the wave number of 1728.125 cm −1 was larger. However, the CNN method in this paper can extract the

Weight Function
We extracted the profiles and ground information used in the test set for inputting the RTTOV radiation model, and we counted their mean values. We added slight perturbation into the mean value humidity profile to calculate the Jacobian matrix: A 1% relative humid-ity perturbation was added to each pressure level separately, and the profile was input into the RTTOV model to count the brightness temperature change of the selected channels.
We normalized the absolute value of the brightness temperature change of the different pressure layers of the same channel, and we used this value as the response weight of the channel to the humidity profiles. A larger weight represents a more significant influence of the humidity of the current pressure layer on the observation of the channel. When conducting a humidity retrieval experiment, the retrieval results of the humidity profiles in a pressure layer are often determined by the channels with the larger response weight. Figure 8a shows the weight function curves of the channels with wavenumbers from 1700 cm −1 to 1920 cm −1 . This wavenumber range is our reference range for selecting channels. As shown in Figure 8b, the weight function curves of the selected 10 channels were counted. Their weight peak values were between 400 and 500 hPa (if the pressure is less than 20 hPa, the water vapor content is minimal, so this experiment did not consider the pressure less than 20 hPa). The weights of pressure altitudes of 20-250 hPa and 600-1000 hPa were close to 0, indicating that the humidity profiles of these pressure altitudes had less influence on the 10 channels selected for this experiment. Therefore, this experiment focused on humidity profile retrieval for the pressure altitude of 250-600 hPa.
Remote Sens. 2021, 13, x FOR PEER REVIEW 15 of 21 had less influence on the 10 channels selected for this experiment. Therefore, this experiment focused on humidity profile retrieval for the pressure altitude of 250-600 hPa.

Humidity Profile
To improve the accuracy of the verification, we performed the following two attempts: (1) The test datasets used for verification in this experiment lagged in time behind all datasets used for model training (see Section 2.1). (2) We conducted two verification experiments, with ERA5 reanalysis data and sounding data (independent verification sets) as the "true value," respectively.
We named the humidity profile retrieval method described in Chapter 3 as our method and conducted a comparison experiment using the conventional 1D-Var (referencing paper [37][38][39]) and ANN humidity retrieval method (referencing paper [27,40,41]), where the same training and test sets were selected. The experiment was conducted with the humidity profile data provided by ERA5 as the reference true value, and the errors of the retrieval humidity profiles by our method, 1D-Var, and ANN were counted using the test set, respectively, where a larger RMSE of relative humidity represents a larger error. In addition, we counted the RMSE of the matched GFS forecast humidity profiles and the ERA5 data in the test set, and used it as a reference. As shown in Figure 9, the horizontal axis represents the RMSE, and the vertical axis represents the pressure of the profiles. In our experiment, the RMSE of our method was smaller than the GFS forecast data between 20 and 600 hPa, and it was more obvious between 100 and 300 hPa. Figure 9 shows that at 250-500 hPa, our method error was smaller than that of other methods. At 600-1000 hPa, the error curves of our method, 1D-Var, and GFS basically overlapped, indicating that the humidity corresponding to part of the pressure altitude had a smaller response to the se-

Humidity Profile
To improve the accuracy of the verification, we performed the following two attempts: (1) The test datasets used for verification in this experiment lagged in time behind all datasets used for model training (see Section 2.1). (2) We conducted two verification experiments, with ERA5 reanalysis data and sounding data (independent verification sets) as the "true value," respectively.
We named the humidity profile retrieval method described in Chapter 3 as our method and conducted a comparison experiment using the conventional 1D-Var (referencing paper [37][38][39]) and ANN humidity retrieval method (referencing paper [27,40,41]), where the same training and test sets were selected. The experiment was conducted with the humidity profile data provided by ERA5 as the reference true value, and the errors of the retrieval humidity profiles by our method, 1D-Var, and ANN were counted using the test set, respectively, where a larger RMSE of relative humidity represents a larger error. In addition, we counted the RMSE of the matched GFS forecast humidity profiles and the ERA5 data in the test set, and used it as a reference. As shown in Figure 9, the horizontal axis represents the RMSE, and the vertical axis represents the pressure of the profiles. In our experiment, the RMSE of our method was smaller than the GFS forecast data between 20 and 600 hPa, and it was more obvious between 100 and 300 hPa. Figure 9 shows that at 250-500 hPa, our method error was smaller than that of other methods. At 600-1000 hPa, the error curves of our method, 1D-Var, and GFS basically overlapped, indicating that the humidity corresponding to part of the pressure altitude had a smaller response to the selected channels.
OR PEER REVIEW 16 of 21 Figure 9. With ERA5 reanalysis data as the reference true value, the RMSEs of our method, GFS, 1D-Var, and ANN (1-1000 hPa).
Combined with the weight function curve in Section 4.3.1, we focused on the humidity profiles of the retrieval of the three methods between 250 and 600 hPa. As per the RMSEs counted in Figure 10, the RMSE curve for our method did not exceed 8.2% and was smaller than that of the other two methods overall. We calculated the mean value of each error curve and found that the mean value of our method was 0.6875 less than that of 1D-Var and 3.8375 less than that of ANN. Combined with the weight function curve in Section 4.3.1, we focused on the humidity profiles of the retrieval of the three methods between 250 and 600 hPa. As per the RMSEs counted in Figure 10, the RMSE curve for our method did not exceed 8.2% and was smaller than that of the other two methods overall. We calculated the mean value of each error curve and found that the mean value of our method was 0.6875 less than that of 1D-Var and 3.8375 less than that of ANN.
We interpolated and matched the test dataset using the sounding data from the University of Wyoming as the standard. With the sounding data as the reference true value, we counted the RMSEs of the three methods with respect to humidity profile retrieval. Figure 11 shows the retrieval accuracy at 250-600 hPa, indicating that the RMSE of our method is lower than that of the other two methods. We also calculated the mean value of each error curve and found that the mean value of our method was 0.9744 less than that of 1D-Var and 3.7287 less than that of ANN.
Combined with the weight function curve in Section 4.3.1, we focused on the humidity profiles of the retrieval of the three methods between 250 and 600 hPa. As per the RMSEs counted in Figure 10, the RMSE curve for our method did not exceed 8.2% and was smaller than that of the other two methods overall. We calculated the mean value of each error curve and found that the mean value of our method was 0.6875 less than that of 1D-Var and 3.8375 less than that of ANN. We interpolated and matched the test dataset using the sounding data from the University of Wyoming as the standard. With the sounding data as the reference true value, we counted the RMSEs of the three methods with respect to humidity profile retrieval. Figure 11 shows the retrieval accuracy at 250-600 hPa, indicating that the RMSE of our method is lower than that of the other two methods. We also calculated the mean value of each error curve and found that the mean value of our method was 0.9744 less than that of 1D-Var and 3.7287 less than that of ANN. Figure 11. With sounding data from the University of Wyoming as the reference true value, the RMSEs of the three retrieval methods and ERA5 reanalysis data (250-600 hPa).

Discussion of CNN
The verification experiments in Section 4.1 count the R 2 of the CNN model training and testing. Both R 2 values exceeded 0.9, proving that the CNN models trained in this experiment have good generalizability. The comparison experiments in Section 4.2 demonstrate that our method yields simulated brightness temperature data closer to those obtained by the ERA5 input radiation transmission model. Moreover, the revised results of our method had smaller errors compared to the conventional linear correction method and the ANN deviation correction method.
GIIRS is the first infrared hyperspectral atmospheric vertical detector carried in geostationary orbit. Compared to other similar instruments, the orbit height is higher and the angular resolution is better, but the instrument development and data calibration are complicated, and the observation accuracy for some channels is not high. However, GIIRS has two major advantages in detection data: (1) GIRS has temporal resolution and a short revisit period. For the same region, GIIRS can provide a large amount of detection data in a short period of time. (2) There is no need to consider the factor of the detection angle. The detection angle often has a great influence on instrument detection. When using polarorbiting satellites, the change in observed brightness temperature due to the detection angle usually needs to be considered. However, GIIRS and the Earth are relatively static, meaning the detection angle at the same region will not change with time. In addition, the Figure 11. With sounding data from the University of Wyoming as the reference true value, the RMSEs of the three retrieval methods and ERA5 reanalysis data (250-600 hPa).

Discussion of CNN
The verification experiments in Section 4.1 count the R 2 of the CNN model training and testing. Both R 2 values exceeded 0.9, proving that the CNN models trained in this experiment have good generalizability. The comparison experiments in Section 4.2 demonstrate that our method yields simulated brightness temperature data closer to those obtained by the ERA5 input radiation transmission model. Moreover, the revised results of our method had smaller errors compared to the conventional linear correction method and the ANN deviation correction method.
GIIRS is the first infrared hyperspectral atmospheric vertical detector carried in geostationary orbit. Compared to other similar instruments, the orbit height is higher and the angular resolution is better, but the instrument development and data calibration are complicated, and the observation accuracy for some channels is not high. However, GIIRS has two major advantages in detection data: (1) GIRS has temporal resolution and a short revisit period. For the same region, GIIRS can provide a large amount of detection data in a short period of time. (2) There is no need to consider the factor of the detection angle.
The detection angle often has a great influence on instrument detection. When using polarorbiting satellites, the change in observed brightness temperature due to the detection angle usually needs to be considered. However, GIIRS and the Earth are relatively static, meaning the detection angle at the same region will not change with time. In addition, the detection angle changes little in neighboring regions due to the high orbit.
This method adds the time dimension to the input data of the model and uses the CNN to find features in large sample datasets, which is better at extracting the overall information of the adjacent data points than ANN. Moreover, the longitude and latitude coordinates of the same data group used for training are consistent. Therefore, we did not need to consider the influence of the detection angle on the instrument observation and model transmission. Therefore, this method gives full play to the two major advantages of the GIIRS observation data proposed above.

Discussion of the Humidity Retrieval Method
There are many error sources in the results of the retrieval experiment, such as the instrument detection [42], calibration [6], radiative transfer model (https://nwp-saf.eumetsat. int/downloads/rtcoef_rttov13/visir_lbl_comp/lbl_comp_rtcoef_fy4_1_giirs_o3_v13pred_ 101L.html, https://nwp-saf.eumetsat.int/site/software/rttov/download/coefficients/ comparison-with-lbl-simulations/, accessed on 11 November 2021), retrieval algorithm, and data error for verification [43]. In this paper, the error of the verification experiment results is the overall error due to the abovementioned reasons. It is very difficult to analyze the specific error caused by the retrieval algorithm. The comparison experiment we conducted neither changed the training and testing dataset, nor changed the data preprocessing and verification methods. Instead, it only changed the retrieval algorithm. Therefore, we roughly estimate that the smaller the error of the statistical retrieval results, the higher the accuracy of the retrieval method.
As described in Section 3.2, CNN input data has 16 levels in the time dimension, while output data has only 1 level in the time dimension, and the level is the 16th level (the lastest time) of the input data. Meanwhile, our test datasets used for verification in this experiment lagged in time behind all the datasets used for model training. Therefore, all our retrieval experiments only use the prior information and satellite observation information of the test data time.
It can be seen from Section 4.3 that the observation data of the selected channels in the experiment were greatly affected by the humidity of 250-600 hPa. Thus, we placed the focus on the humidity profiles of this part of the pressure altitude, and found that the error of our method is lower than that of the 1D-Var and ANN retrieval methods, proving that our method can effectively improve the accuracy of the retrieved humidity profiles.
In 1D-Var Newton's method, as long as the objective function does not immerge in the local minimum, the closer the input observed brightness temperature data to the true value, the higher the accuracy of the retrieved humidity profiles. At the same time, if the initial profiles are closer to the ideal true values, it is less possible to use iteration to immerge the objective function in the local minimum far from the true values, which reduces the difficulty of searching the optimal solution. In this paper, machine learning provided more suitable brightness temperature data and initial profiles for the physical retrieval method. Therefore, the retrieval accuracy of this method is better than that of the conventional 1D-Var retrieval method as a whole.
Both the conventional ANN retrieval method and our method require a large number of samples for model training, and the input of the network is brightness temperature data. The difference is that the network target output of conventional ANN is the profile data, so the input and output of the network are not the same types of data in ANN method, which increases the training difficulty. However, both network input and target output are the brightness temperature data in our method, which simplifies the network structure and avoids adverse effects of detection angle and surface information on the network model. Moreover, our method combines the advantages of physical methods: It makes fully utilizes prior information and observation information, and has a stronger generalization ability than ANN. Thus, the retrieval accuracy of our method is obviously superior to that of the conventional ANN retrieval method.

Weaknesses of Our Method
(1) The water vapor absorption infrared band is mainly concentrated between 1200 and 1900 cm −1 , and the band covered by GIIRS lacks some of them. (2) The observation error of some water vapor channels is large [44]. (3) Some channels are not only sensitive to changes in water vapor, but also to the other meteorological variables (such as temperature, ozone, and CH 4 ), so these channels are not suitable for the retrieved humidity profiles. These three reasons cause the wave number of the channels selected in this experiment to fall in the range of 1700-1920 cm −1 . The selected channels are insensitive to humidity at the pressure altitudes of 20-250 hPa and 600-1000 hPa, resulting in almost no improvement in the accuracy of the humidity profile retrieval results in this part of pressure. FY-4B/GIIRS, which is newly developed, has more accurate detection. In the future, it will be necessary to use FY-4B/GIIRS observation data to find more humidity retrieval channels and to discuss whether our method is suitable for these channels.
Our method relies on the training of the CNN model. It the disadvantages of most machine learning methods: The model needs to be updated frequently, and the training process is complicated. The suitability of our method for practical application remains to be verified.
Clouds have strong interference with the detection of the infrared band, and the accuracy of the radiative transfer model will be greatly reduced in areas with clouds, which greatly affects the accuracy of model training in this experiment. Under the cloudy conditions, determining how to adjust the model according to the attributes of cloud remains to be studied.

Conclusions
Our method combines the features of CNN, including its ability to discover the overall characteristics of adjacent data and the laws from a large amount of data samples. Our method also combines the strong generalization ability of 1D-Var. In addition, our method effectively gives play to the two unique advantages of GIIRS (high temporal resolution and stationary relative to the ground). Therefore, it improves the accuracy of retrieved humidity profiles.
In this paper, GIIRS observation data and GFS prediction data were used as inputs of the CNN and ANN models, obtaining input data (brightness temperature and initial profiles) more suitable for 1D-Var, and realizing the combination of machine learning and the physical retrieval method. Comparative experiments show that the accuracy of this combined method is higher than that of the conventional retrieval method of humidity profiles. Therefore, it was proven that the combination of machine learning and the physical approach is highly effective. According to the characteristics of hyperspectral data in geostationary orbit, this idea can be a further explored direction.
The detection data of geostationary meteorological satellites generally have the characteristics of high temporal resolution. These types of data, which can provide a much richer sample of observation information in a short period of time, are more suitable for the training of machine learning. With the rapid development of machine learning, more related methods are applied to the satellite remote sensing field at the moment. The use of geostationary satellites for meteorological data detection will have broad prospects.  Data Availability Statement: All data generated or analyzed during this study are included in this article.