Numerical Investigations on Wave Remote Sensing from Synthetic X-Band Radar Sea Clutter Images by Using Deep Convolutional Neural Networks

X-band marine radar is an effective tool for sea wave remote sensing. Conventional physics-based methods for acquiring wave parameters from radar sea clutter images use a three-dimensional Fourier transform and spectral analysis. They are constrained by assumptions, empirical formulas and a calibration process when obtaining the modulation transfer function (MTF) and signal-to-noise ratio (SNR). Further improvement of wave inversion accuracy with physics-based methods is therefore challenging. Inspired by the capability of convolutional neural networks (CNN) in image processing, a deep-learning inversion method based on deep CNNs is proposed. No intermediate step or parameter is needed in the CNN-based method, so fewer errors are introduced. Wave parameter inversion models were constructed based on CNNs to invert the wave's spectral peak period and significant wave height. In the present paper, numerically simulated X-band radar image data were used for a numerical investigation of wave parameters. Results of the conventional spectral analysis and CNN-based methods were compared, and the CNN-based method had higher accuracy on the same data set. The influence of the training strategy on the CNN-based inversion models was studied to analyze the dependence of a deep-learning inversion model on its training data. Additionally, the effects of the target parameters on the inversion accuracy of the CNN-based models were also studied.


Introduction
Sea wave remote sensing has important scientific significance and practical value [1]. Wave remote sensing is a fundamental part of ocean monitoring and helps to better understand the regular pattern of marine changes. It is also a key factor for safety assessment of offshore operations and real-time prediction of ship and platform motion attitude.
The X-band marine radar is active microwave imaging radar equipment. It has been widely used for wave remote sensing since 1965, when it was used by Wright to obtain the wavelength and direction of wave propagation [2]. X-band radar produces Bragg-scattered signals associated with short surface waves [3]. The echo signal is mainly modulated by hydrodynamic, tilt and shadowing effects [4,5].
The radar system generates sea clutter images according to the signal intensity of the received echo. The sea clutter image contains temporal and spatial information on the wave field that can be used to invert the wave parameters.
The conventional spectral analysis method based on a three-dimensional Fourier transform is an effective approach for extracting wave parameters from radar images [6][7][8][9]. Its basic idea is to use a three-dimensional Fourier transform to obtain the image spectrum and then estimate wave parameters through spectral analysis. To address the deficiencies of the conventional method, and in view of the advantages of CNNs in image processing, we developed a CNN-based technique for wave parameter inversion from radar sea clutter images. In this problem, the synthetic radar images are the inputs and the outputs are the values of the spectral peak period and significant wave height. In essence, it is a two-parameter regression problem because the outputs are two continuous parameters. In the CNN-based method, convolutional and pooling layers are used to extract the characteristics of the images, and fully connected layers are used to relate these characteristics to the target parameters. Moreover, a training process replaces the calibration process and avoids the errors introduced by unknown empirical parameters and the shape of the empirical formulas. The proposed method is compared with the conventional spectral analysis method. The training data dependence and the general performance on different test samples are also discussed.
The rest of the paper is organized as follows. In Section 2, the principle and procedure of both wave spectral peak period and significant wave height inversion by conventional spectral analysis method are introduced. Section 3 gives a theoretical introduction into the CNN method. In Section 4, the numerical simulation of radar images is presented. Results and discussion are described in Sections 5 and 6. Finally, Section 7 gives the conclusions of the present study.

The Spectral Analysis Method for Radar Images Inversion
Presently, the most popular approach for inverting a wave's spectral peak period and significant wave height from radar sea clutter images is based on the theory proposed by Young et al. and Nieto Borge et al. [5,6]. The theoretical process and formulas are summarized as follows.
Firstly, as shown in Equation (1), a three-dimensional fast Fourier transform is applied to a radar image sequence I(x, y, t) to obtain its image spectrum F(k_x, k_y, ω):

F(k_x, k_y, ω) = FFT_3D{ I(x, y, t) }    (1)

where Δk_x = 2π/L_x, Δk_y = 2π/L_y and Δω = 2π/T are the spectral resolutions, L_x and L_y are the lengths of the analyzed image area in the X and Y directions, respectively, and T is the time duration of the radar image data.
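A minimal numerical sketch of Equation (1) (our own toy, not the authors' program): a single travelling wave component is synthesized on a space-time grid, and its wavenumber and frequency bins are recovered from the peak of the 3D FFT. The grid sizes and wave mode numbers below are assumptions made for the illustration.

```python
import numpy as np

# Toy image sequence: one travelling wave cos(kx*x - w*t) sampled on a grid.
Nx, Ny, Nt = 64, 64, 16
Lx, T = 256.0, 64.0                 # domain length (m) and duration (s), assumed
x = np.arange(Nx) * (Lx / Nx)
t = np.arange(Nt) * (T / Nt)
kx = 2 * np.pi * 5 / Lx             # 5th spatial mode in x
w = 2 * np.pi * 2 / T               # 2nd temporal mode
X, Y, Tm = np.meshgrid(x, x, t, indexing="ij")
I = np.cos(kx * X - w * Tm)

# Equation (1): three-dimensional FFT of the image sequence.
F = np.fft.fftn(I)

# The spectral peak sits at the (kx, 0, -w) bin (or its complex conjugate).
ix, iy, it = np.unravel_index(np.argmax(np.abs(F)), F.shape)
```

Locating the dominant peak of |F| in this way is the discrete analogue of reading the wave component off the image spectrum.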
Secondly, filters are used to extract the wave-related signal from the image spectrum, eliminating the low-frequency energy induced by the long-range dependence of the radar imaging modulation effects. The filtered three-dimensional image spectrum I^(3)(k_xn, k_ym, ω_p) and the energy of the noise can then be obtained.
Thirdly, the two-dimensional image spectrum is calculated by integrating the three-dimensional image spectrum over the range ω > 0, as formulated in Equation (2). The image spectrum can then be converted into a wave spectrum through an MTF M(k) by using Equation (3):

I^(2)(k) = ∫_{ω>0} I^(3)(k, ω) dω    (2)

F(k) = M(k) I(k)    (3)

where F(k) is the wavenumber spectrum and I(k) is the image spectrum. The MTF is usually obtained by comparing the filtered radar spectrum with an in situ measured buoy spectrum. Nieto Borge et al. offered an empirical formula [5]:

M(k) = k^q    (4)

where q is a constant parameter related to the sea state. Fourthly, the wave frequency-direction spectrum can be derived from the wavenumber spectrum by coordinate transformation using the deep-water dispersion relation ω² = gk:

S^(2)(ω, θ) = (2ω³/g²) F^(2)((ω²/g) cos θ, (ω²/g) sin θ)    (5)

and the frequency spectrum follows by integrating over direction:

S(ω) = ∫ S^(2)(ω, θ) dθ    (6)

The wave's spectral peak period can be estimated directly from the wave frequency spectrum S(ω). Finally, the SNR is determined based on Equation (7):

SNR = SIG/BGN    (7)

where SIG is the sum of the corrected wave spectrum energy and BGN is the sum of the filtered noise. The SNR is subsequently used to invert the significant wave height as shown in Equation (8):

H_S = A + B √SNR    (8)

where A and B are constants that can be determined by calibration.
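The ω-integration of Equation (2) and the MTF correction of Equation (3) can be sketched as follows. The 3D spectrum here is a random stand-in, the grid spacing is assumed, and the exponent q = 1.2 is only a placeholder for the sea-state-dependent value the paper leaves to calibration.

```python
import numpy as np

rng = np.random.default_rng(1)
nk, nw = 64, 32
dx = 4.0                                   # grid spacing (m), assumed
kx = np.fft.fftshift(np.fft.fftfreq(nk, d=dx)) * 2 * np.pi
omega = np.linspace(-2.0, 2.0, nw)         # angular frequency axis (rad/s)
I3 = rng.random((nk, nk, nw))              # stand-in filtered 3D image spectrum

# Equation (2): integrate the 3D image spectrum over positive frequencies.
dw = omega[1] - omega[0]
I2 = I3[:, :, omega > 0].sum(axis=2) * dw

# Equation (3) with the empirical MTF M(k) = k**q (q is a placeholder here).
q = 1.2
KX, KY = np.meshgrid(kx, kx, indexing="ij")
k = np.hypot(KX, KY)
M = np.where(k > 0, k ** q, 0.0)
F2 = M * I2                                # wavenumber spectrum estimate
```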
In the present work, background noise does not exist in the numerical program that produces the radar images, so it is not appropriate to calculate H_S from the SNR. Instead, we calculate H_S from the zeroth moment m_0 of the wave frequency spectrum S(ω), as shown in Equation (9):

H_S = 4 √m_0,  m_0 = ∫ S(ω) dω    (9)

In the above inversion method, the observed wave field is assumed to be spatially uniform and stationary in time. Furthermore, it is necessary to calibrate the radar before measuring; buoys or other in situ sensors are required to provide measured data. Moreover, the approach rests on the assumption of linearity and on complex physical equations. The calculation process is complicated and involves many uncertain and empirical parameters. Therefore, further improving the inversion accuracy is challenging.
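As a small illustration of this final step, the sketch below estimates the spectral peak period and H_S (Equation (9)) from a synthetic frequency spectrum; the narrow Gaussian spectrum shape is purely an assumption for the demo, chosen so that m_0 is known in advance.

```python
import numpy as np

# Synthetic frequency spectrum: a narrow Gaussian bump centred at wp.
w = np.linspace(0.01, 3.0, 3000)           # angular frequency (rad/s)
wp, m0_true, sig = 0.74, 0.25, 0.05        # peak frequency, target m0, width
S = m0_true / (sig * np.sqrt(2 * np.pi)) * np.exp(-((w - wp) ** 2) / (2 * sig ** 2))

# Spectral peak period: period of the spectral maximum.
Tp = 2 * np.pi / w[np.argmax(S)]

# Equation (9): significant wave height from the zeroth moment (trapezoid rule).
m0 = np.sum(0.5 * (S[1:] + S[:-1]) * np.diff(w))
Hs = 4 * np.sqrt(m0)
```

With m_0 = 0.25 by construction, the recovered H_S should be close to 4 × √0.25 = 2 m.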

Convolutional Neural Networks
The CNN technique is an important branch of neural networks. It is a feed-forward neural network usually used to solve problems involving images. The implementation process of the CNN model is shown in Figure 1. The network architecture of a CNN model mainly consists of the following key elements:
(a) The input layer-Reading images and passing the data onward.
(b) The convolutional layers-Convolutional layers extract the characteristics of images through convolution. They are the core of a CNN; convolution makes the characteristics of the images more prominent.
(c) Activation function-The activation function adds nonlinear terms so that the network can effectively deal with nonlinear problems.
(d) Pooling layers-The number of characteristics output by the convolutional layers is large. Pooling layers are used to cut this number down and highlight the main characteristics, which reduces computation.
(e) Fully connected layers-Fully connected layers finish the classification or regression and output the final results. The characteristics of the images extracted by the convolutional and pooling layers are gathered and combined in the fully connected layers.
In fact, there are several methods for extracting geometric features and dissimilarity from remotely sensed imagery [22]. In CNN models, the geometric features and dissimilarity of the synthetic radar images are extracted by the convolutional and pooling layers. In short, a CNN can be regarded as an input-output mapping y = f(x), where x represents the input image and y represents the output; y can be a class or a numerical value. Here f represents the whole network, which is trained on many images so that its outputs approach the actual values. The training process can be loosely understood as "function fitting".

Inversion Models of Wave Parameters Based on CNN
Several kinds of network architecture have been proposed, such as AlexNet, GoogLeNet, VGGNet and ResNet. The differences among them mainly lie in the depth of the network and the configuration of the convolutional layers. In this paper, we chose AlexNet and VGGNet as basic structures to build inversion models under the TensorFlow framework. AlexNet includes five convolutional layers and three fully connected layers, with three pooling layers following three of the convolutional layers; ReLU is used as the activation function. VGGNet also includes five convolutional parts and three fully connected layers, but each convolutional part contains several convolutional layers, giving 13 or 16 convolutional layers plus three fully connected layers in total; as in AlexNet, ReLU is used as the activation function. Further, to adapt the networks to this two-parameter regression problem, the input layers were modified to receive RGB images of 256 × 256 pixels, and the output layers and their activation function were modified to output two target parameters with continuous values.
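For intuition, a drastically simplified numpy forward pass — one convolutional layer, ReLU, one 2 × 2 max-pooling and a two-output linear head — is sketched below. This is a toy illustration of the two-parameter regression-head idea only, not the AlexNet/VGGNet models actually trained in TensorFlow; all sizes and the random weights are assumptions.

```python
import numpy as np

def conv2d(x, w):
    # Valid cross-correlation of image x (H, W) with kernel w (kh, kw).
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    # 2x2 max-pooling with stride 2 (truncates odd edges).
    H2, W2 = x.shape[0] // 2, x.shape[1] // 2
    return x[:H2 * 2, :W2 * 2].reshape(H2, 2, W2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))         # toy "radar image"
kernel = rng.standard_normal((3, 3)) * 0.1

feat = maxpool2(relu(conv2d(img, kernel)))  # conv -> ReLU -> pool
v = feat.ravel()                            # flatten for the dense head
W_fc = rng.standard_normal((2, v.size)) * 0.01
b = np.zeros(2)
y = W_fc @ v + b                            # two continuous outputs (Ts, Hs)
```

The final layer is linear (no softmax), which is the essential modification for regressing two continuous wave parameters instead of predicting a class.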
Besides the network structure, the quality and quantity of the image data are also key factors in building the models. The parameter settings of the data set are introduced in detail in Section 5.

Numerical Simulation of Radar Sea Clutter Images
In this paper, a number of radar sea clutter images were generated by a numerical simulation program to build the testing and training data sets required by the methods described in Sections 2 and 3.

Imaging Principle of X-Band Marine Radar
X-band maritime radar is an active microwave imaging radar with high resolution, which transmits electromagnetic waves to the observed sea surface and receives the backscattered echo signals to realize the observation of sea waves.
The process of X-band radar receiving echo signals can be explained by the Bragg model and the two-scale model [3,23]. Moreover, the process involves modulation effects, mainly hydrodynamic modulation, tilt modulation and shadowing modulation [4,5]. Our numerical simulation program is based on the Bragg model and the two-scale model, and incorporates tilt modulation and shadowing modulation; background noise is not considered. Bragg scattering is a phenomenon of superposition and interference of backscattered signals. When the wavelength of the sea wave satisfies the condition

λ_s = n λ_r / (2 sin θ cos φ)    (10)

then the radar echo signals reflected by successive wave crests superpose in the same phase. In Equation (10), λ_s is the wavelength of the sea surface wave, λ_r is the wavelength of the radar signal, θ is the incident angle of the radar signal, φ is the angle between the direction of sea wave propagation and the direction of the radar signal, and n is a natural number greater than 0.
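Taking Equation (10) at face value, the resonant sea-wave wavelength can be computed directly; the X-band radar wavelength (~3 cm) and the grazing geometry below are assumed example values.

```python
import numpy as np

def bragg_wavelength(lambda_r, theta_deg, phi_deg, n=1):
    """Sea-wave wavelength satisfying the Bragg condition of Equation (10)."""
    theta = np.radians(theta_deg)
    phi = np.radians(phi_deg)
    return n * lambda_r / (2 * np.sin(theta) * np.cos(phi))

# X-band radar (lambda_r ~ 3 cm), near-grazing incidence, waves along the look direction.
lam_s = bragg_wavelength(0.03, 80.0, 0.0)
```

For these values the resonant wavelength is about 1.5 cm, i.e. the short ripples riding on the long waves, consistent with the two-scale picture below.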

Two-Scale Model
In two-scale model theory, it is assumed that the sea surface is composed of two scales of waves; microscale waves are superimposed on the long waves. Due to the modulation effect of the tilted wavefront on the microscale wave, the long wave changes the local incident angle of the radar, which affects the backscattering cross section, and finally affects the backscattering signal.
In the calculation, the local scattering cross section of a small facet is calculated first, and then the probability density function of the long-wave surface slope is used to integrate over the whole area. The local scattering cross section is given by

σ⁰(θ_i) = 4π k_r⁴ cos⁴(θ_i) |g(θ_i)|² ψ(2 k_r sin θ_i)    (11)

where g is the polarization function, ψ is the microscale wave spectrum, k_r is the radar wavenumber and θ_i is the local incident angle, which satisfies cos θ_i = cos(θ + δ_1) cos δ_2. Here δ_1 is the angle by which the normal of the local scattering facet deviates from the vertical within the incidence plane, caused by the long wave, and δ_2 is the angle of the normal deviation in the plane perpendicular to the incidence plane. Therefore, the backscattering cross section of each small facet can be calculated as

σ = ∫∫ σ⁰(θ_i) p(tan δ_1, tan δ_2) d(tan δ_1) d(tan δ_2)    (12)

where p(tan δ_1, tan δ_2) is the joint probability density function of the long-wave slope.

Tilt Modulation
The Bragg scattering of the radar signal from the small-amplitude waves is affected by the existence of the long wave. In particular, the slope of the long wave surface changes the normal direction of the backscattering surface, leading to a change of the local incident angle and hence of the backscattering cross section.

Shadowing Modulation
When the incident angle of the radar signal is large and almost parallel to the sea level, then due to the fluctuation of the sea surface, the higher wave will block the sea surface behind it, resulting in a "blind area" that the radar incident signal cannot illuminate. This blind area produces almost no backscattered echo signal.
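Geometric shadowing along one radial can be sketched with a running-maximum test on elevation angles: a point is in the "blind area" if some closer point subtends a higher elevation angle as seen from the antenna. The antenna height and the surface profile below are assumed values for illustration.

```python
import numpy as np

def shadow_mask(r, eta, h_ant):
    """Mark ranges hidden from a radar antenna at height h_ant.

    r   -- ranges along one radial, strictly increasing (m)
    eta -- sea surface elevation at each range (m)
    A point is shadowed if its elevation angle, seen from the antenna,
    lies below that of some closer point.
    """
    tan_ang = (eta - h_ant) / r                       # tangent of elevation angle
    shadowed = np.zeros(r.size, dtype=bool)
    shadowed[1:] = tan_ang[1:] < np.maximum.accumulate(tan_ang)[:-1]
    return shadowed

r = np.array([100.0, 110.0, 120.0, 130.0])
eta = np.array([0.0, 5.0, 0.0, 0.0])                  # one high crest at 110 m
mask = shadow_mask(r, eta, h_ant=10.0)                # crest blocks the points behind it
```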

Numerical Simulation of Radar Images Data
We developed a numerical program to simulate radar sea clutter images under different sea states based on the above principles. With this program we can set, according to our needs, the significant wave height and the wave's spectral peak period of the sea area shown in the radar image. The wave spectrum we chose was the ITTC (International Towing Tank Conference) two-parameter wave spectrum:

S(ω) = (A/ω⁵) exp(−B/ω⁴)    (13)

A = 173 H_S²/T_S⁴,  B = 691/T_S⁴    (14)

where H_S is the significant wave height and T_S is the wave's spectral peak period. The relation of wave number, wave direction and corresponding wave height is shown in Figure 2.
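Assuming the common A/ω⁵ · exp(−B/ω⁴) form of the ITTC two-parameter spectrum with A = 173 H_S²/T_S⁴ and B = 691/T_S⁴, a quick numerical self-check is that the spectrum integrates back to the prescribed H_S via Equation (9):

```python
import numpy as np

def ittc_spectrum(w, hs, ts):
    """ITTC two-parameter spectrum S(w) = A/w**5 * exp(-B/w**4)
    with A = 173*hs**2/ts**4 and B = 691/ts**4 (form assumed above)."""
    A = 173.0 * hs ** 2 / ts ** 4
    B = 691.0 / ts ** 4
    return A / w ** 5 * np.exp(-B / w ** 4)

w = np.linspace(0.2, 4.0, 4000)        # angular frequency grid (rad/s)
hs, ts = 4.0, 8.5                      # the sea state shown in Figure 3
S = ittc_spectrum(w, hs, ts)

# Consistency check: Hs recovered from the zeroth moment (Equation (9)).
m0 = np.sum(0.5 * (S[1:] + S[:-1]) * np.diff(w))
hs_back = 4.0 * np.sqrt(m0)
```

Analytically m_0 = A/(4B) = (173/2764) H_S², so 4√m_0 reproduces H_S to within a fraction of a percent, which the numerical integral confirms.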

The left part of Figure 3 shows one synthetic radar image in which the significant wave height H_S is 4 m and the wave's spectral peak period T_S is 8.5 s. The radius of the sea area shown in the image is 3 km. The spatial resolution is 2 × 2 m, and the temporal resolution is 3 s. Figure 4 shows the spatial distributions of water elevation and echo intensity along a radius (Y = 0 m, 300 m < X < 3000 m) of the synthetic image.
Since the size of the simulated image was large, we cut out a square area of 256 × 256 pixels from each image to build the training and testing data sets, so as to reduce the computational load. Figure 4 also shows the cropping process applied to the original radar image.
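The cropping step might look like the following; the rendered image size and the crop corner are assumed values for illustration only.

```python
import numpy as np

# Stand-in for one rendered radar image (the grid size here is illustrative).
full_image = np.random.default_rng(0).random((1500, 1500, 3))

# Cut out a fixed 256 x 256-pixel square to use as a training/testing sample.
r0, c0 = 100, 100                            # top-left corner of the crop (assumed)
patch = full_image[r0:r0 + 256, c0:c0 + 256, :]
```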

Definitions of the Accuracy Measures
In this paper, accuracy measures including the relative error (RE), mean relative error (MRE), absolute error (AE) and root mean squared error (RMSE) (Equations (15)-(18), respectively) were used to evaluate the performance of both the spectral analysis method and the CNN-based method on wave parameter inversion:

RE = |X_I − X_A| / X_A × 100%    (15)

MRE = (1/N) Σ_{i=1}^{N} |X_{I,i} − X_{A,i}| / X_{A,i} × 100%    (16)

AE = |X_I − X_A|    (17)

RMSE = √((1/N) Σ_{i=1}^{N} (X_{I,i} − X_{A,i})²)    (18)

where X_I is the inverted value of H_S or T_S by one of the two methods, X_A is the actual value and N is the number of samples tested.
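The four accuracy measures might be implemented as follows (a plain numpy sketch; X_I and X_A follow the notation above, and the toy values at the end are only for demonstration):

```python
import numpy as np

def re(x_i, x_a):
    """Relative error (Equation (15)), in percent."""
    return np.abs(x_i - x_a) / x_a * 100.0

def mre(x_i, x_a):
    """Mean relative error (Equation (16)), in percent."""
    return np.mean(re(np.asarray(x_i), np.asarray(x_a)))

def ae(x_i, x_a):
    """Absolute error (Equation (17))."""
    return np.abs(x_i - x_a)

def rmse(x_i, x_a):
    """Root mean squared error (Equation (18))."""
    d = np.asarray(x_i) - np.asarray(x_a)
    return np.sqrt(np.mean(d ** 2))

inv = np.array([8.6, 8.4])   # toy inverted values
act = np.array([8.0, 8.0])   # toy actual values
```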

Comparisons of the CNN-Based and Spectral Analysis Methods
Radar sea clutter image sequences under 15 different sea states were generated by the numerical program introduced in Section 4; each sequence included 16 images continuous in time. The H_S and T_S of these images were set as shown in Table 1. These images were used to test the conventional spectral analysis method. The CNN-based method requires a large amount of training data, so we used the numerical program to generate 2801 radar images. The H_S of these images ranged from 0.5 to 7.5 m, with one image generated every 0.0025 m. Correspondingly, T_S ranged from 6.5 to 10.5 s and was distributed linearly over this range, corresponding to H_S one-by-one. That is:

T_S = 6.5 + (4/7)(H_S − 0.5)    (19)

In reality there is no such one-to-one relation between T_S and H_S; we set up this relationship only to make it easier to generate the images.
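The linear T_S-H_S tie used to generate the 2801 training images can be reproduced as follows (the step size and endpoints are those stated above; the linear coefficient follows from mapping H_S ∈ [0.5, 7.5] onto T_S ∈ [6.5, 10.5]):

```python
import numpy as np

# Hs grid: 0.5 m to 7.5 m in steps of 0.0025 m -> 2801 values.
h_s = np.arange(0.5, 7.5 + 1e-9, 0.0025)

# Linear one-to-one tie between Ts and Hs used only for data generation.
t_s = 6.5 + (4.0 / 7.0) * (h_s - 0.5)
```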
After being cropped in the way introduced in Section 4, 2403 of the 2801 images were used to build the training data set, and the remaining 398 were used for validation during training. After training, the two CNN-based models, AlexNet-based and VGGNet-based, were tested with samples selected from the testing data set of the conventional spectral analysis method: fifteen test samples were taken from the 15 image sequences, respectively. Using the same test samples for both methods makes the comparison more convincing.
The inversion results of the two methods are shown in Tables 2 and 3. In Figure 5 we can see intuitively the difference between the results of the two methods; the distance between the inversion points and the baseline indicates the inversion errors (Figure 5a,b). The statistical characteristics of the results of the two methods are shown in Table 4.

Training Dependence of the CNN-Based Inversion Models
We developed numerical experiments to study the dependence of the CNN-based inversion models on the training data set in two aspects: the parameter setting range of the training images and the position where the training images are cut out of the original radar image.

Dependence of CNN-Based Inversion Models on the Parameter Setting Range of the Training Images
The relationship between the T_S and H_S of the images in the training set for the CNN-based inversion models introduced in Section 5.2 is represented by the line in Figure 6.

However, T_S and H_S do not correspond one-to-one in the actual ocean; the relationship shown by the line in Figure 6 only marks the most probable combinations. We therefore simulated radar images beyond the coverage of this line, according to the relationship between the wave's spectral peak period and the significant wave height [24], to test the CNN models trained by the initial training set. The parameters of the test samples are shown in Table 5 and their distribution is shown as the blue points in Figure 6; some of the test sample images are shown in Figure 7. The results of the 14 test samples inverted by the CNN-based models trained by the initial training data set are shown in Figure 8.
We can see that there were obvious errors because of the difference between the test samples and the training data. To address this problem, we expanded the training data set as shown in Figure 9: one radar image was generated every 0.008 s in T_S at the same H_S, and one radar image was generated every 0.014 m in H_S at the same T_S.
We trained the CNN-based models with the expanded training data set. Afterward, the two models were used to invert the H_S and T_S of the 14 test samples; the results are shown in Figure 10.
From Figure 10, we can see that after being trained by the expanded training set, the inversion accuracy of the CNN-based models increased greatly. Table 6 gives the errors in detail.
According to Table 6, the MRE of T_S and H_S inverted by the two CNN-based models trained by the initial training data set were 31.49% and 26.40%, and 32.46% and 22.51%, respectively. By comparison, the MRE of T_S and H_S inverted by the CNN-based models trained by the expanded training data set were 6.14% and 2.82%, and 16.24% and 10.73%, respectively.
Figure 9. The range of parameter settings for the expanded training data set. H_S means the significant wave height, and T_S means the spectral peak period.

Dependence of CNN-Based Inversion Models on the Cropping Position of Images
In Section 4.2, we introduced a method of cropping the original image to reduce the computation effort. However, in the original radar image the characteristics information contained in various positions may be different. The training images cut out from the original radar images may only contain local information of the wave. In this section we describe a numerical experiment developed to study whether CNN-based models trained by local area images could correctly inverse the information of other area images that were not involved in training.
To begin, we named the area cut out in Section 4.2 Area A. Then, three images with the same size were cut out from the other three areas: B, C and D in the original image, as shown in Figure 11.
The images of Area B, Area C and Area D were used for training, and the images of Area A were used for testing, to validate whether the CNN-based models could extract global characteristics information from local area images. Test results of the two CNN-based models are shown in Figure 12 and Table 7.
Table 7. Inversion errors of the wave's spectral peak period (T_S) and significant wave height (H_S) by the CNN-based models.



Effects of Wave Parameters on Inversion Accuracy of the CNN-Based Models
CNN-based inversion models showed different inversion accuracy for T_S and H_S. In this section, we explore the influence of changes in the two inversion target parameters on the inversion accuracy of the CNN-based models. In order to eliminate the influence of the training set on the results, the CNN models used in this section were trained on the expanded training data set.
We first considered the influence of the change of T_S on the CNN-based models with the same H_S. In order to avoid any influence from the particularity of a selected H_S, we used the numerical simulation program to generate radar images under three values of H_S: 0.5, 4 and 7.5 m. Eleven images with different T_S were generated under each H_S; T_S was set to 6.5, 6.9, 7.3, 7.7, 8.1, 8.5, 8.9, 9.3, 9.7, 10.1 and 10.5 s, respectively. After being cropped, the 33 images were input to the CNN-based models as test samples. Figure 13 shows the changes of RE and AE. For example, Figure 13a shows the changing trend of RE(T_S) and RE(H_S) with changing T_S when H_S = 0.5 m. The red dashed lines with cross and circle symbols represent the RE(T_S) and RE(H_S) of the AlexNet-based model, while the blue dotted lines with cross and circle symbols represent the RE(T_S) and RE(H_S) of the VGGNet-based model. Figure 13b,c is analogous but under different H_S. According to Figure 13a-c, for both the red and the blue lines there was no uniform regularity in these changes: the changes of RE for H_S and T_S were generally irregular when T_S changed and H_S was constant. Figure 13d,e shows the changes of AE(H_S) for the AlexNet-based and VGGNet-based models. Here the red lines with squares represent AE(H_S) when H_S = 0.5 m, the black lines with circles when H_S = 4 m, and the blue lines with crosses when H_S = 7.5 m. Analogously, Figure 13f,g shows the changes of AE(T_S). The changing trend of AE was also generally irregular.
Next, T_S remained constant and the influence of the change of H_S on the CNN-based models was studied. T_S was set to 6.5, 8.5 and 10.5 s, and 11 images with different H_S were generated under each T_S; H_S was 0.5, 1.2, 1.9, 2.6, 3.3, 4, 4.7, 5.4, 6.1, 6.8 and 7.5 m, respectively. These images were input into the CNN-based inversion models after being cropped. The changes of the inversion RE are shown in Figure 14, where the blue and red lines with circles represent RE(T_S) of the two CNN-based models. Generally, these lines change irregularly. The inversion RE(H_S), however, represented by the red and blue lines with crosses, decreased with the increase of H_S; RE(H_S) was large when H_S was small. According to Figure 15, the change of AE(H_S) did not show the same regularity as RE(H_S); it was generally irregular.


Discussion
In Section 5.2, the inversion accuracy of the CNN-based method was compared with that of the conventional spectral analysis method using the same data set. From Figure 5, we can see that, overall, the results of HS and TS inversed by the CNN-based models were in good agreement with the actual values. According to Table 4, the MRE of TS inversed by the two CNN-based models were 1.29% and 1.63%, and the RMSE were 0.18 s and 0.21 s. At the same time, the MRE of HS were 5.20% and 5.49%, respectively, and the RMSE were 0.27 m and 0.28 m. These errors are acceptable, so the CNN-based method is a feasible way to inverse wave parameters from our synthetic radar images. In comparison, for the spectral analysis method, the MRE of TS and HS were 14.25% and 20.59%, while the RMSE were 1.31 s and 0.97 m, respectively. Both CNN-based models had higher accuracy than the spectral analysis method in this inversion problem. For these test samples, the errors of the conventional method, whether relative or absolute, were greater than those of the CNN-based method, which shows the advantage of the CNN-based method over the conventional spectral analysis method, at least for this data set.
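For clarity, the two error metrics quoted above can be computed as follows; this is a minimal sketch assuming the standard definitions of mean relative error and root-mean-square error, since the paper's exact formulas are not reproduced in this section.

```python
import math

def mre(pred, true):
    """Mean relative error in %, assuming the standard definition."""
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root-mean-square error, in the units of the quantity itself."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Toy example with hypothetical TS values (s), not data from the paper:
true_ts = [8.0, 9.0, 10.0]
pred_ts = [8.1, 8.9, 10.2]
print(round(mre(pred_ts, true_ts), 2), "%")
print(round(rmse(pred_ts, true_ts), 3), "s")
```

With these definitions, MRE is dimensionless (a percentage) while RMSE retains the physical unit, which is why the paper reports MRE in % and RMSE in seconds or meters.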
In Section 5.3, the dependence of CNN-based inversion models on the training data set was studied in two aspects: the range of parameter settings for the training images, and the position where the training images are cropped from the original radar image. In the first aspect, the accuracy of the CNN-based model trained on the expanded training data set was much higher than that of the model trained on the initial training data set. The initial training data set did not cover the test samples, while the expanded training data set covered them (although they were not part of the expanded training data). This shows that the CNN method is ineffective when dealing with data not covered by the training set, and reveals that CNN is a data-driven method. Thus, the quantity and quality of the training data are key factors affecting the performance of the CNN-based models. In the second aspect, the inversion errors of the AlexNet-based model were clearly large; in particular, the MRE of HS reached 64.18%. That is, the AlexNet-based model could not correctly inverse HS and TS of the Area A images from the characteristic information it learned from the images of Areas B, C and D. Therefore, we conclude that the AlexNet-based model was unable to learn global characteristic information from local images. In contrast, the VGGNet-based model had high accuracy, and its inversion errors were all within an acceptable range. We therefore believe that the VGGNet-based model extracted information about HS and TS from the images of Areas B, C and D and applied it when inversing the parameters of the Area A images; that is, it could obtain global characteristic information from local images. As for the reason for this difference, we infer that because the architecture of VGGNet is deeper, it is more effective than AlexNet at extracting complex characteristic information.
Hence, the VGGNet-based model had better performance in this problem. The more detailed reasons will be studied in our follow-up work.
Finally, in Section 5.4, the influence of changes in the two inversion target parameters on the inversion accuracy of the CNN-based models was studied. When TS changed and HS was constant, the changes of RE(TS), AE(TS), RE(HS) and AE(HS) were all generally irregular. When HS changed and TS was constant, RE(HS) was larger when the value of HS was smaller. After the changes of AE(HS) were plotted, we concluded that the reason why RE(HS) decreases with increasing HS lies in the definition of relative error: for the same AE(HS), a smaller HS yields a larger RE(HS). In summary, therefore, the changes of TS and HS did not obviously affect the inversion AE of the CNN-based models, although RE(HS) was affected by the value of HS, being larger when HS was smaller.
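The argument above follows directly from the definition of relative error, RE = AE / (true value) x 100%. A minimal numerical illustration (with a hypothetical absolute error, not a value from the paper):

```python
def relative_error(abs_error, true_value):
    """Relative error (%) from absolute error, per the standard definition."""
    return 100.0 * abs_error / true_value

ae = 0.2  # the same hypothetical absolute error in HS (m) in every case
for hs in (0.5, 2.0, 7.5):
    print(f"HS = {hs} m  ->  RE = {relative_error(ae, hs):.1f}%")
# An identical AE of 0.2 m gives RE = 40% at HS = 0.5 m but only ~2.7% at HS = 7.5 m.
```

This is why a roughly constant AE(HS) across the test range appears as a decreasing RE(HS) curve in Figure 14.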

Conclusions
Conventional wave inversion methods are limited by assumptions involved in the calibration process. It is challenging to further improve wave inversion accuracy by using these methods. Inspired by the capability of CNN techniques in handling image problems, a machine learning inversion method based on CNN was proposed. Comparison studies, training strategy and training data dependency were investigated. Some concluding remarks are summarized as follows.
The inversed results of both spectral peak periods and significant wave heights by the AlexNet-based and VGGNet-based models were highly correlated with the target values, and the mean relative errors were within an acceptable range. This demonstrates that CNN models can effectively extract the spectral peak period and significant wave height from radar images, and verifies the feasibility of using CNN to extract information from radar sea clutter images. Furthermore, compared to the conventional spectral analysis method, the CNN-based method produced higher accuracy. The method therefore provides a potential way for accurate wave parameter inversion from radar images.
Results on the dependence of the CNN-based inversion models on the training data set show that the CNN models performed well only when the test image data had the same wave parameter ranges as the training data set. There were obvious inversion errors when test samples resulted from wave parameters outside the training data range. These results define the scope of applicability of the CNN models in wave inversion. Comparatively, the VGGNet-based model could obtain the overall wave characteristics of radar images from locally cropped pictures, while the AlexNet-based model failed to do so.
Finally, as for the effects of the target wave parameters on the inversion results, it was indicated that the changes of spectral peak period and significant wave height had little effect on inversion accuracy of the CNN-based method.
However, in this paper, the validation of the method was based on a synthetic image data set. The CNN-based models were effective on the synthetic image data set, but further testing is necessary to establish whether the method is suitable for a real radar image data set. Moreover, the CNN-based