Direction-of-Arrival Estimation over Sea Surface from Radar Scattering Based on Convolutional Neural Network

Abstract: Conventional direction-of-arrival (DOA) estimation methods are primarily used in point source scenarios and are based on array signal processing. However, due to the local scattering caused by the sea surface, signals observed by a radar antenna cannot be regarded as coming from a point source but rather from a spatially dispersed source. Besides, with the advantages of flexibility and comparably low cost, synthetic aperture radar (SAR) is the present and future trend of space-based systems. This paper proposes a novel DOA estimation approach for SAR systems using simulated radar measurements of the sea surface at different operating frequencies and wind speeds. The forward model is the advanced integral equation model (AIEM), which is used to calculate the electromagnetic scattering from the sea surface. To solve the DOA estimation problem, we introduce a convolutional neural network (CNN) framework to estimate the transmitter's incident angle and incident azimuth angle. Results demonstrate that the CNN achieves good performance in DOA estimation over a wide range of frequencies and sea wind speeds.


Introduction
Determining the direction of arrival (DOA) of a radar signal is a fundamental problem for sea surveillance. The task of DOA estimation is to identify the directions from which the source signals are transmitted. Conventional DOA estimation methods, including beamforming techniques [1-3] and subspace-based methods [4][5][6][7][8], are primarily used in point source scenarios. With the development of machine learning and artificial intelligence, neural networks (NNs) have been applied in the DOA estimation domain [9][10][11][12][13][14]. This approach first establishes training datasets with DOA labels and then derives a mapping from antenna outputs to signal directions with existing methods. The derived mapping is then used on test datasets to estimate signal directions. NN-based DOA estimation is data driven and does not rely on prior assumptions about the array geometry or on whether the antenna outputs are calibrated. Such methods have been demonstrated to be computationally more efficient than subspace-based methods in simulations [14].
However, sea-surface-induced local scattering causes signals from a single source to arrive via different paths and at different angles. Therefore, the signals observed by the radar antenna can be regarded as a superposition of the contributions from all propagation paths. In this case, the source is no longer perceived as a point source but instead as a spatially dispersed source with a mean DOA and a spatial extent. Previous works have shown that the performance of point source DOA estimation methods is degraded for such sources [15][16][17]. Most approaches for DOA estimation of scattered sources utilize a model of the data covariance matrix, for example, maximum likelihood estimators [18][19][20][21][22].
The price to pay for this increased efficiency and accuracy is that those algorithms require a multidimensional search to find the estimates and may converge to a local minimum. Another commonly used approach, the spatial smoothing algorithm [23,24], obtains robustness at the expense of a reduced effective aperture.
Additionally, the methods mentioned above require many antenna elements to achieve a high degree of accuracy. As the number of antenna elements increases, the power consumption, size, and cost of the system increase. In [25], artificial neural networks are used to decrease the size and weight and to reduce the production costs of components and whole application systems when forming a beam from a complex radiating microstrip antenna. Previous studies on DOA estimation for SAR systems were realized by filtering the rough-surface-induced clutter and then retrieving the clutter-suppressed target signals [26,27]. In this paper, we attempt to develop a new DOA estimation technique with low computational complexity and high estimation accuracy from a radar observation.
This paper proposes a two-dimensional DOA estimation method for a radar system in the special case where the incident signal is fully scattered from the sea surface. In this case, the received signal contains sea clutter. We utilize a convolutional neural network (CNN) to establish the relationship between radar measurements and their incident directions. The advanced integral equation model (AIEM) is applied as the forward model to simulate the sea surface electromagnetic scattering under different operating frequencies, observation geometries, and winds. The inputs of the CNN are model-generated radar measurements with added speckle noise, and the output is the incidence location associated with the input scattering pattern.
The contributions of this article are summarized as follows:
(1) We explore the DOA estimation problem in the special case where the incident signal is randomly scattered by the sea surface. The effects of radar frequency and wind speed are taken into consideration as well.
(2) We propose a CNN framework that makes use of the underlying statistical characteristics of speckle-noise-contaminated radar measurements to predict their associated incidence location. In our experiments, the CNN shows good performance in the DOA estimation problem, and its performance is largely insensitive to the radar frequencies and wind speeds tested.
This paper is organized as follows: Section 2 gives the relevant background on random rough surface scattering models, especially the AIEM model, and illustrates the sensitivity of the received signal to influential parameters. Section 3 formulates DOA estimation from sea surface scattering measurements and presents an algorithm to simulate the sea surface measurement dataset, together with the basic notation used. Section 3 also presents the CNN structure and explains how it fits our requirements. In Section 4, implementation details and experimental results are presented. Section 5 compares the DOA estimation results with state-of-the-art algorithms and discusses limitations of the current work. Finally, the conclusions are summarized to close the paper.

Sea Surface Scattering Datasets Generated by AIEM
We first give the AIEM bistatic scattering model, followed by a sensitivity analysis of radar measurement to incidence location and receiver location under certain sea states.

Bistatic Scattering Model
Figure 1 depicts the geometry of bistatic radar scattering from the sea surface. The sea surface is assumed to be a randomly rough surface with known statistical properties; $\theta_i$ and $\phi_i$ are the incident angle and incident azimuth angle, respectively; $\theta_s$ and $\phi_s$ are the scattering angle and scattering azimuth angle of the receiver, respectively; $\varepsilon_0$ and $\mu_0$ are the permittivity and permeability of the free half-space; $\varepsilon_r$ and $\mu_r$ are the relative permittivity and relative permeability of the sea; $\mathbf{k}_i$ is the incident wave vector and $\mathbf{k}_s$ is the scattered wave vector, both defined in Equation (1).
For the purpose of this paper, we utilize the AIEM as a forward model to simulate the scattering process of the sea surface under different observation configurations and sea states. The AIEM model is an analytical model based on the integral equation method (IEM) [28][29][30]. Both the IEM and AIEM models have been used for sea surface microwave scattering and are in excellent agreement with radar measurements [31][32][33][34].
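For orientation, the single-scattering bistatic coefficient is commonly written in the AIEM literature in the form sketched below. This is the widely used textbook form, stated here as an assumption; the authors' Equations (1) and (2) may differ in notation or sign conventions. The symbols $\sigma$, $I^n_{pq}$, and $W^{(n)}$ are those defined in the text that follows.

```latex
% Wave-vector projections (magnitudes; assumed standard spherical convention)
\begin{aligned}
k_{ix} &= k\sin\theta_i\cos\phi_i, & k_{iy} &= k\sin\theta_i\sin\phi_i, & k_{iz} &= k\cos\theta_i,\\
k_{sx} &= k\sin\theta_s\cos\phi_s, & k_{sy} &= k\sin\theta_s\sin\phi_s, & k_{sz} &= k\cos\theta_s.
\end{aligned}

% Single-scattering bistatic coefficient (commonly cited AIEM form)
\sigma^0_{pq} = \frac{k^2}{2}
  \exp\!\left[-\sigma^2\left(k_{iz}^2 + k_{sz}^2\right)\right]
  \sum_{n=1}^{\infty} \frac{\sigma^{2n}}{n!}
  \left|I^n_{pq}\right|^2
  W^{(n)}\!\left(k_{sx}-k_{ix},\; k_{sy}-k_{iy}\right)
```

In this form, the exponential factor depends only on the vertical projections, while the roughness spectrum is evaluated at the difference of the horizontal projections, which is why both the incidence and the observation directions shape the scattering pattern.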
In the original AIEM model [29], $\sigma^0_{pq}$ is given by Equation (2), where the polarization indices $p$ and $q$ represent the $p$-polarized ($v$ or $h$) transmitted power and the $q$-polarized ($v$ or $h$) received power; $k$ is the wave number of the incident wave in free space; $k_{ix}$, $k_{iy}$, $k_{iz}$, $k_{sx}$, $k_{sy}$, and $k_{sz}$ are the coordinate projections of $\mathbf{k}_i$ and $\mathbf{k}_s$; $\sigma$ is the root-mean-squared (RMS) height of the rough surface; the explicit expression of $I^n_{pq}$ is given in [30]; and $W^{(n)}$ is the Fourier transform of the $n$-th power of the surface correlation function $\rho_s$. For sea surface scattering, $W^{(n)}$ depends on the relative azimuth angle $\chi$ between the sea wind direction $\phi_w$ and $\phi_i$ [35]. An approximate representation of the sea surface correlation function is given in [36,37] (Equation (4)), where $L_t = L_u \cos^2\phi_w + L_c \sin^2\phi_w$ is the correlation length along $\phi_w$; $L_u$ is the correlation length in the upwind direction (along $\phi_w = 0^\circ$); and $L_c$ is the correlation length in the crosswind direction (along $\phi_w = 90^\circ$). An approximate form for $W^{(n)}(k, \chi)$ is given in [37] (Equation (5)); it has been successfully used in sea surface scattering computations.
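The direction-dependent correlation length stated above, $L_t = L_u \cos^2\phi_w + L_c \sin^2\phi_w$, can be evaluated with a few lines of Python. This is a minimal sketch: the function name and the numerical values of `L_U` and `L_C` are hypothetical and are not taken from [37].

```python
import math


def correlation_length(l_up: float, l_cross: float, phi_w_deg: float) -> float:
    """Correlation length along the wind direction phi_w (in degrees).

    Implements L_t = L_u * cos^2(phi_w) + L_c * sin^2(phi_w).
    """
    phi = math.radians(phi_w_deg)
    return l_up * math.cos(phi) ** 2 + l_cross * math.sin(phi) ** 2


# Hypothetical upwind and crosswind correlation lengths (illustrative only).
L_U, L_C = 0.30, 0.20

for phi_w in (0.0, 30.0, 90.0):
    print(phi_w, round(correlation_length(L_U, L_C, phi_w), 4))
```

At $\phi_w = 0^\circ$ the function returns the upwind value and at $\phi_w = 90^\circ$ the crosswind value, with a smooth interpolation in between.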

Sensitivity of Received Signals to Incidence Location
Before devising an effective approach to DOA estimation, it is essential to conduct a sensitivity analysis of the scattering responses to incidence location under specific surface parameters. According to Equations (1), (2), and (5), besides the incidence and observation locations, three rough surface parameters, namely the RMS height, the correlation length, and the dielectric constant, determine the bistatic scattering coefficients. For sea surface scattering, these parameters are influenced by the sea wind speed and the radar operating frequency. The values of $L_u$, $L_c$, and $\sigma$ under different wind speeds and frequencies can be found in [37]; they are empirical values derived by fitting the rough surface scattering model. In what follows, we use bistatic hemispherical plots to illustrate the dependence on incidence location, wind direction, and wind speed.
In Figure 2a,b, we examine the angular dependence of the polarized bistatic scattering coefficients. Figure 2a shows the effect of the incident angle with $f = 5.3$ GHz, $U = 25$ m/s, $\phi_i = 0^\circ$, $\theta_i = 20^\circ$ (the left column), and $\theta_i = 60^\circ$ (the right column). Similarly, Figure 2b shows the effect of the incident azimuth angle with $f = 5.3$ GHz, $U = 25$ m/s, $\theta_i = 40^\circ$, $\phi_i = 30^\circ$ (the left column), $\phi_i = 90^\circ$ (the middle column), and $\phi_i = 120^\circ$ (the right column). For the same set of sea surface parameters, both the incident angle and the incident azimuth angle have a strong influence on the scattering patterns and strength for all polarizations. In Figure 2a, the left-hand and right-hand sides of the half-sphere represent the backward and forward scattering regions, respectively. The horizontal and vertical axes represent the incident plane and the cross-incident plane, respectively. In the forward scattering region of Figure 2a, the strong scattering is due to specular scattering, and it moves toward the right hemisphere as the incident angle increases. For a smaller incident angle, the small azimuth angular region shows a stronger scattering strength at HH polarization. As the incident angle increases, the region with strong scattering starts to narrow. In the backward region, the intensity of HH polarization decreases as the incident angle increases, but a subtle increase appears at large scattering angles and scattering azimuth angles. Similar trends can be observed in VV polarization. Meanwhile, the strength of the cross-polarizations, HV and VH, weakens quickly over the whole upper hemisphere with increasing incident angle. As for Figure 2b, the forward and backward scattering regions change with the incident azimuth angle. For all polarizations, the strong scattering area moves from the right-hand hemisphere to the left-hand hemisphere as the incident azimuth angle increases.
Figure 3a shows the hemispherical plot of co- and cross-polarized bistatic scattering coefficients with $f = 5.3$ GHz, $U = 25$ m/s, $\theta_i = 30^\circ$, and $\phi_i = 0^\circ$, with the relative wind direction varying from up/downwind (0°/180°) to crosswind (90°). For co-polarization, we see that the strong scattering area in the backward region at up/downwind is more pronounced than at crosswind. In addition, the cross-polarization is impacted by the relative wind direction to a lesser extent than the co-polarization is.

Finally, we examine the sea surface roughness dependence in Figure 3b, which is the hemispherical plot of co- and cross-polarized bistatic scattering coefficients with $f = 13.9$ GHz, $\theta_i = 30^\circ$, $\phi_i = 0^\circ$, and $\phi_w = 30^\circ$, with the sea wind speed varying from 7.5 m/s to 19.4 m/s. As the sea wind speed increases, the sea surface roughness increases as well. However, the change in the scattering coefficients is coupled with the three roughness parameters. In general, the scattering strength under higher wind speed is greater than that under lower wind speed in both the forward and backward regions. For cross-polarization under lower wind speed in particular, the backward scattering region shows fewer distinct features than the forward region.

Problem Formulation
To better estimate the angles of the incident source, a proper data model is essential to enable a mapping from the measurement domain to the feature domain of interest. Here we describe the scheme we use to improve DOA estimation accuracy and efficiency. The radar scattering from a rough surface can be modeled as [30]: $b = Wx + u$ (6), where $x$ is a vector containing the surface parameters and radar parameters; the matrix $W$ relates the parameter vector $x$ to the radar scattering coefficients $b$; and $u$ represents the error vector induced by system and calibration errors and speckle noise, among other factors. In this paper, only the effect of speckle noise is considered.
In a statistical sense, $x$ constitutes a random variable due to spatially and temporally varying properties, such that $x = x_t + x_n$, where $x_t$ is the true value and $x_n$ is the noise term. In practice, the "truth" is never obtainable and always vague. Statistically, $x_t$ and $x_n$ may be assumed to be uncorrelated, such that $x$ is an unbiased estimate of $x_t$, i.e., $E[x] = x_t$, where $E$ denotes the statistical mean and $\sigma^2_{x_n}$ is the variance of $x_n$. In general, the radar response is formed by the scattering matrix. For the purpose of this paper, we assume that the radar measurement vector $b$ is formed by the multi-polarized scattering coefficients and the scattering direction: $b = [\sigma^0_{hh}, \sigma^0_{hv}, \sigma^0_{vh}, \sigma^0_{vv}, \theta_s, \phi_s]^T$. The location of the incident source is determined by the incident angle $\theta_i$ and the incident azimuth angle $\phi_i$, denoted by $x = [\theta_i, \phi_i]^T$. Thus, the cost function of the DOA estimation problem can be written as $x_{est} = \arg\min_x \lVert b - Wx \rVert_2$, where $\lVert\cdot\rVert_2$ denotes the $l_2$-norm and $x_{est}$ is the vector of estimated incident angles. As shown in the previous section, the scattering behavior is highly intricate. Therefore, it is hard to obtain an analytical yet practical solution of Equation (6). In this paper, we adopt a convolutional neural network approach to search for the minimum of the cost function associated with Equation (6).

Data Input-Output and Preprocessing
In this part, we introduce the method used to simulate sea surface scattering with the AIEM model. Two sets of radar operating frequencies are considered in this paper. The frequencies in set one are 5. [...] The ranges of the incidence and received angles, operating frequencies, and sea wind speeds in the DOA estimation experiment are listed in Table 1.

Table 1. The ranges of the incidence and received angles, operating frequencies, and sea wind speeds in the DOA estimation experiment. Columns: Parameters, Description, Range, Step Size.

In regard to the DOA estimation problem, as illustrated in the previous section, the input of the CNN consists of the bistatic scattering coefficients ($\sigma^0_{hh}$, $\sigma^0_{hv}$, $\sigma^0_{vh}$, $\sigma^0_{vv}$) and the related scattering angles ($\theta_s$ and $\phi_s$). The output of the CNN is the corresponding incident source direction ($\theta_i$ and $\phi_i$). For a given frequency and sea wind speed, Algorithm 1, which generates the datasets of sea surface scattering patterns, is a quintuple iteration over $\theta_i$, $\phi_i$, $\theta_s$, $\phi_w$, and $\phi_s$.

Algorithm 1 Generation of Training Data by the AIEM Model
The iterations over $\phi_w$ and $\phi_s$ generate one single input sample. Hence, the length of a single input is the product of $N_{\phi_w}$ (the number of $\phi_w$ values) and $N_{\phi_s}$ (the number of $\phi_s$ values). In this paper, this value is 1221. The width of the inputs is six because there are six attributes: $\sigma^0_{hh}$, $\sigma^0_{hv}$, $\sigma^0_{vh}$, $\sigma^0_{vv}$, $\theta_s$, and $\phi_s$. The iterations over $\theta_i$, $\phi_i$, and $\theta_s$ determine the number of inputs, which is the product of $N_{\theta_i}$, $N_{\phi_i}$, and $N_{\theta_s}$. This value is equal to 4719 in this study.
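The quintuple iteration described above can be sketched as follows. This is only an illustration of the loop structure and the resulting array shapes: `toy_forward` is a hypothetical stand-in for the AIEM forward model (it is not the AIEM), and the grids used in the example are arbitrary small grids rather than the ranges of Table 1.

```python
import math


def toy_forward(theta_i, phi_i, theta_s, phi_s, phi_w):
    """Hypothetical placeholder for the AIEM forward model (NOT the real AIEM).

    Returns four values standing in for (hh, hv, vh, vv) in dB.
    """
    base = -20.0 + 5.0 * math.cos(math.radians(theta_i - theta_s))
    azim = math.cos(math.radians(phi_s - phi_i))
    wind = 0.5 * math.cos(math.radians(2.0 * (phi_w - phi_i)))
    co = base + azim + wind
    return co, co - 15.0, co - 15.0, co + 1.0


def generate_dataset(theta_i_grid, phi_i_grid, theta_s_grid,
                     phi_w_grid, phi_s_grid, forward=toy_forward):
    """Build (inputs, labels) with one sample per (theta_i, phi_i, theta_s)."""
    inputs, labels = [], []
    # Outer three loops: one sample per combination.
    for theta_i in theta_i_grid:
        for phi_i in phi_i_grid:
            for theta_s in theta_s_grid:
                sample = []
                # Inner two loops: the rows of a single input sample.
                for phi_w in phi_w_grid:
                    for phi_s in phi_s_grid:
                        hh, hv, vh, vv = forward(theta_i, phi_i,
                                                 theta_s, phi_s, phi_w)
                        sample.append([hh, hv, vh, vv, theta_s, phi_s])
                inputs.append(sample)
                labels.append((theta_i, phi_i))
    return inputs, labels


# Arbitrary small grids (degrees), for illustration only.
inputs, labels = generate_dataset(
    theta_i_grid=[20.0, 40.0],
    phi_i_grid=[0.0, 30.0, 60.0],
    theta_s_grid=[10.0, 50.0],
    phi_w_grid=[0.0, 30.0, 60.0, 90.0],
    phi_s_grid=[0.0, 45.0, 90.0, 135.0, 180.0],
)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))
```

With the paper's grids, the same structure would give 4719 samples, each of length 1221 and width six.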
However, an actual radar image always exhibits large pixel-to-pixel intensity variations, referred to as speckle. To better characterize realistic scattering coefficients in SAR images, we assume that the actual radar measured data follow the noise model $\sigma^n_{pq} = \sigma^0_{pq} + 10\log_{10} r$, where $\sigma^n_{pq}$ and $\sigma^0_{pq}$ represent the measured and noise-free radar signals (in decibels), respectively, and $r$ is a random number drawn from the K-distribution, which models the speckle noise in radar measurements. In this paper, we use a $\nu$-model to determine the values of the K-distribution shape parameter under different wind speeds. The shape parameter $\nu$ is expressed as in [38] (Equation (13)), where $\nu_0$ and $U_0$ are parameters that depend on the radar incident angle and wind direction. The scale parameter of the K-distribution is set to 1. After adding speckle noise to the model-generated bistatic scattering coefficients, the CNN input attributes become $\sigma^n_{hh}$, $\sigma^n_{vv}$, $\sigma^n_{hv}$, $\sigma^n_{vh}$, $\theta_s$, and $\phi_s$. This completes the process of generating the sea surface scattering datasets.
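The noise model above can be sketched in Python under one stated assumption: the K-distributed intensity is drawn with the common compound representation, a unit-mean gamma texture with shape $\nu$ multiplied by unit-mean exponential speckle. The parametrization used in the paper (scale parameter set to 1, $\nu$ from the model in [38]) may differ, and the function names and the value of `nu` below are hypothetical.

```python
import math
import random


def k_speckle(nu: float, rng: random.Random) -> float:
    """Draw one unit-mean K-distributed intensity (compound model, assumed)."""
    texture = rng.gammavariate(nu, 1.0 / nu)  # gamma texture: shape nu, mean 1
    speckle = rng.expovariate(1.0)            # exponential speckle: mean 1
    return texture * speckle


def add_speckle_db(sigma0_db: float, nu: float, rng: random.Random) -> float:
    """Apply sigma_n = sigma_0 + 10*log10(r) to a noise-free value in dB."""
    r = max(k_speckle(nu, rng), 1e-300)  # guard against log10(0)
    return sigma0_db + 10.0 * math.log10(r)


rng = random.Random(0)
noisy = [add_speckle_db(-20.0, nu=5.0, rng=rng) for _ in range(5)]
print([round(v, 2) for v in noisy])
```

Because the noise is multiplicative in linear units, it becomes additive in decibels, which is why the model adds $10\log_{10} r$ to the noise-free coefficient.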

CNN Configurations for DOA Estimation
We aim to find a suitable CNN structure that can be trained to estimate the DOA from sea surface scattering datasets under different wind speeds and radar operating frequencies. A CNN topology typically consists of multiple convolutional layers followed by fully connected layers. In CNN architectures, the convolutional layers are pairs of convolution and pooling operations. For DOA estimation in particular, we need to perform multi-output regression. We share the same convolutional layer structure to predict the incident angle and the incident azimuth angle, using one fully connected layer at the final stage.
The main focus is to achieve a trade-off between the training speed and the accuracy of the CNN. Therefore, as outlined in Table 2, we design five different CNN models inspired by VGG-16 [39] to be evaluated. In Table 2, the convolutional layer is abbreviated as "Conv" and the layer parameters are denoted as "length×width×number of filters, stride". The average-pooling layer is abbreviated as "Avg-p" and the pooling size is denoted as "length×width, stride". "FC" and "Regrs" represent the fully connected layer and the regression layer, respectively. In Model 1, the CNN comprises eight convolutional layers, three average-pooling layers, and a fully connected layer. A pooling layer follows the fourth, sixth, and eighth convolutional layers. A stride of two is fixed for the average-pooling layers of this CNN framework. When an input-output pair is fed into this CNN, two transverse 1-D convolutional layers are applied first to extract features across the width of the input sample. Then, three or four groups of vertical 1-D convolutional layers and vertical average-pooling layers exploit the lengthwise information, which is related to the location of the transmitter and the wind direction. In this way, the model can exploit the scattering pattern and strength under different sea wind directions and observation geometries.
Previous studies have shown that shallow networks require exponentially more neurons than deep networks to achieve a given accuracy for function approximation [40]. Hence, we enlarge the length of the convolutional kernels rather than adopting a typical short kernel as in the VGG-16 model (i.e., of size 3 × 3). The kernel size of both the convolutional and pooling layers is sequentially increased from 5 × 1 to 10 × 1. At the end of Model 1, a fully connected layer with 1024 nodes followed by a regression layer is applied to generate the final outputs for the $\theta_i$ and $\phi_i$ predictions.
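To see how vertical kernels and stride-two pooling shrink the lengthwise dimension, the usual output-length formula for a 1-D convolution or pooling window can be applied to the input length of 1221. This is a sketch under assumptions: the padding used by the authors is not stated in this section, so zero padding is assumed here, and the layer sequence in the example is hypothetical rather than a column of Table 2.

```python
def out_length(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution or pooling window.

    Uses floor((n + 2*padding - kernel) / stride) + 1.
    """
    return (n + 2 * padding - kernel) // stride + 1


length = 1221  # lengthwise size of one input sample
# Hypothetical stack: a 5x1 convolution (stride 1), then 5x1 pooling (stride 2).
after_conv = out_length(length, kernel=5, stride=1)
after_pool = out_length(after_conv, kernel=5, stride=2)
print(after_conv, after_pool)
```

Each stride-two pooling stage roughly halves the length, so three pooling stages reduce the lengthwise dimension by about a factor of eight before the fully connected layer.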
The depth of the configurations increases from the left (Model 1) to the right (Model 2) as more layers are added in front of the fully connected layer. The number of filters in each convolutional layer is decreased from Model 2 to Model 3. Besides, we add the models "FC-512" and "FC-2048" to compare the performance of models with different numbers of hidden units in the fully connected layer; "FC-512" denotes 512 hidden units, and "FC-2048" denotes 2048 hidden units. The numbers of hidden units that vary between the models in Table 2 are shown in bold.

Table 2. CNN configurations (shown in columns). The depth of the configurations increases from the left (Model 1) to the right (Model 2), as more layers are added. Besides, the number of filters decreases from Model 2 to Model 3. Changes are shown in bold. The aim of models "FC-512" and "FC-2048" is to compare the performance of models with different numbers of hidden units in the fully connected layer. The convolutional layer is abbreviated as "Conv" and layer parameters are denoted as "length × width × number of filters, stride". The average-pooling layer is abbreviated as "Avg-p" and layer size is denoted as "length × width, stride". "FC" and "Regrs" represent a fully connected layer and a regression layer, respectively.

Comparison of CNN Configurations
In the previous section, we presented the details of the five CNN configurations to be evaluated. We aimed to find the CNN structure with the best performance in estimating the direction of the incident source from simulated radar measurement data. We implemented the CNN models described earlier with the MATLAB® Deep Learning Toolbox. We used a Dell T7810 Series desktop with two Intel Xeon processors and an NVIDIA Quadro K620 GPU for training and testing. The dataset used in the comparison experiment was the simulated sea surface scattering dataset at 13.9 GHz and 5.5 m/s wind speed.
We trained and tested the proposed CNN models with each dataset 100 times to obtain a robust and stable result. In every realization, the dataset was randomly divided into training, validation, and testing sets with a 0.7:0.15:0.15 ratio. The dropout strategy was used in the neural network, with a dropout rate of 0.05. The learning rate was initially set to 0.01 and decayed by a factor of 0.1 every 10 epochs. Besides, we applied the batch normalization technique to the training set to reduce time consumption. The size of the mini-batch was set to 50. Validation of the current CNN occurs at the end of each training mini-batch. The early stopping strategy was also adopted: network training stops when the validation loss has been greater than or equal to the previous lowest loss for five epochs.
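The learning-rate schedule and the early-stopping rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the MATLAB training options used by the authors; the function names are hypothetical, and the early-stopping counter (reset whenever a new lowest loss is found) is one plausible reading of the rule.

```python
def learning_rate(epoch: int, initial: float = 0.01,
                  drop_factor: float = 0.1, drop_period: int = 10) -> float:
    """Piecewise-constant schedule: multiply by drop_factor every drop_period epochs."""
    return initial * drop_factor ** (epoch // drop_period)


def should_stop(val_losses, patience: int = 5) -> bool:
    """Return True once the validation loss has failed to improve on the
    lowest loss seen so far `patience` times in a row."""
    best = float("inf")
    strikes = 0
    for loss in val_losses:
        if loss < best:
            best = loss
            strikes = 0
        else:
            strikes += 1
            if strikes >= patience:
                return True
    return False


print([learning_rate(e) for e in (0, 9, 10, 25)])
print(should_stop([1.0, 0.9, 0.95, 0.95, 0.96, 0.97, 0.99]))
```

A stepped schedule of this kind lets the optimizer take large steps early and refine the weights later, while early stopping limits overfitting to the training split.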

Training
In deep learning, the training algorithm is based on backpropagation using gradient descent algorithms. Therefore, a loss function that computes the difference between the network output and the true value must be defined. In this work, the half-mean-squared error was used as the loss function $L$ to measure the difference between the true values of the incident angles and the model predictions. $L$ is expressed as follows: $L = \frac{1}{2N}\sum_{n=1}^{N}\left[(y_{1,n} - \hat{y}_{1,n})^2 + (y_{2,n} - \hat{y}_{2,n})^2\right]$ (14), where $y_1$ and $y_2$ denote the true values of $\theta_i$ and $\phi_i$; $\hat{y}_1$ and $\hat{y}_2$ are the predicted values of $\theta_i$ and $\phi_i$; and $N$ represents the size of the mini-batch.
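As a small worked example, the half-mean-squared error over a mini-batch of two-output predictions can be computed as below. The sketch assumes the usual definition, the sum of squared errors over both outputs divided by twice the mini-batch size; the function name and the sample values are hypothetical.

```python
def half_mse(targets, predictions) -> float:
    """Half-mean-squared error for two-output regression.

    targets, predictions: sequences of (theta_i, phi_i) pairs of equal length N.
    Returns sum of squared errors over both outputs, divided by 2N.
    """
    n = len(targets)
    total = 0.0
    for (y1, y2), (p1, p2) in zip(targets, predictions):
        total += (y1 - p1) ** 2 + (y2 - p2) ** 2
    return total / (2.0 * n)


# Two samples: squared errors are (0 + 1) and (1 + 0), so the loss is 2 / 4.
print(half_mse([(1.0, 2.0), (3.0, 4.0)], [(1.0, 1.0), (2.0, 4.0)]))  # 0.5
```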
To minimize the loss function, we used stochastic gradient descent with momentum (SGDM) as the optimizer of the CNN. The standard stochastic gradient descent algorithm can oscillate along the path of steepest descent towards the optimum. Adding a momentum term to the update of the network parameters (the weights in the convolutional kernels) is one way to reduce this oscillation. The SGDM update is: $w_{l+1} = w_l - \alpha\nabla E(w_l) + \gamma(w_l - w_{l-1})$, where $w$ represents the network parameter vector, $l$ is the iteration number, $\alpha$ is the learning rate, $E(w)$ is the loss function, and $\gamma$ determines the contribution of the previous gradient step to the current iteration. In this experiment, we set $\gamma$ to 0.9. Figure 4 illustrates the loss function minimization process of the CNN configurations to be evaluated. The solid line represents the smoothed training loss, and the red dots represent the loss on the validation set. The training process stops if the validation loss satisfies the early stopping condition or the number of epochs reaches the maximum. The model achieves better performance if the optimization finds a good local minimum or a global minimum. From Figure 4, we can see that the training losses of all models converge to a minimal loss. Similar results were obtained for every realization but are not included here for brevity.
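The momentum update described above can be illustrated on a toy problem. The sketch below applies the update $w_{l+1} = w_l - \alpha\nabla E(w_l) + \gamma(w_l - w_{l-1})$ to the quadratic $E(w) = \tfrac{1}{2}\lVert w\rVert^2$, whose gradient is simply $w$; the toy loss and the function names are assumptions made for illustration and have nothing to do with the CNN loss itself.

```python
def sgdm_step(w, w_prev, grad, alpha=0.01, gamma=0.9):
    """One momentum update: w - alpha * grad + gamma * (w - w_prev)."""
    return [wi - alpha * gi + gamma * (wi - pi)
            for wi, pi, gi in zip(w, w_prev, grad)]


def minimize_quadratic(w0, steps=500, alpha=0.01, gamma=0.9):
    """Minimize E(w) = 0.5 * ||w||^2 (toy loss) with the momentum update."""
    w_prev = list(w0)
    w = list(w0)
    for _ in range(steps):
        grad = list(w)  # gradient of 0.5 * ||w||^2 is w itself
        w, w_prev = sgdm_step(w, w_prev, grad, alpha, gamma), w
    return w


print(minimize_quadratic([1.0, -2.0]))
```

With $\gamma = 0.9$, each step reuses most of the previous displacement, which damps the zig-zag behavior of plain gradient descent while still converging to the minimizer at the origin.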

Testing
After the training process, we tested the CNNs with the corresponding test sets. In this paper, the root-mean-square error (RMSE) was chosen to measure the error of the CNN in DOA estimation. The results are shown in Table 3. Note that the CNN configurations with fewer stacked convolutional layers (Model 1) and with fewer filters in the convolutional layers (Model 3) consume less time but perform worse than Model 2. By comparing FC-512, Model 2, and FC-2048, we found that the number of hidden units in the fully connected layer affects CNN performance. According to our experiment, more hidden units in the fully connected layer improve the prediction accuracy, but the computation time increases. Overall, Model 2 balances estimation accuracy and computation time in the DOA estimation task. Therefore, the CNN architecture of Model 2 was employed in the following section.
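For reference, the RMSE used as the test metric can be computed per output angle as in the short sketch below; the function name and the sample angles are hypothetical.

```python
import math


def rmse(truth, predicted) -> float:
    """Root-mean-square error between two equal-length sequences (degrees)."""
    n = len(truth)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / n)


# Hypothetical true and predicted incident angles, in degrees.
theta_true = [20.0, 40.0, 60.0]
theta_pred = [21.0, 40.0, 59.0]
print(round(rmse(theta_true, theta_pred), 4))
```

Computing the metric separately for $\theta_i$ and $\phi_i$ keeps the two error levels distinguishable, which matters here because the azimuth error is reported to be several times larger than the incident-angle error.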


Model Training
After deciding on a CNN model, we utilized the simulated sea surface scattering datasets described in the previous section to estimate the DOA at different frequencies and wind speeds. Figure 5 shows the CNN configuration used for DOA estimation. Feature maps in Figure 5 represent the output of each average-pooling layer. The operating frequencies of the datasets are 5.3 GHz (at wind speeds of 25 m/s, 35 m/s, 45 m/s, and 55 m/s), 9.4 GHz (at wind speeds of 5.5 m/s, 7.5 m/s, 12 m/s, 15 m/s, and 19.4 m/s), 13.9 GHz (same wind speeds as 9.4 GHz), and 13.99 GHz (at wind speeds of 5 m/s, 10 m/s, 15 m/s, 20 m/s, 25 m/s, and 30 m/s). The implementation details are given at the start of Section 4.1 and are not repeated here. The loss function is defined in Equation (14), and the optimizer used to minimize it is SGDM. Other training options were the same as described in Section 4.1. Similarly, we trained and tested the proposed CNN model with each dataset 100 times for a robust and stable result.

DOA Estimation Result
In this section, we evaluated the performance of Model 2 as configured in Figure 5. Table 4 summarizes the testing accuracy of DOA estimation based on the sea surface scattering datasets at 5.3 GHz and 13.99 GHz. The values reported in this table are the average of 100 realizations. For incident angle estimation, the average RMSE of all datasets is about 1°, and for incident azimuth angle estimation, the average RMSE of all datasets is between 3° and 3.5°. For datasets at the 5.3 GHz operating frequency, we found that the RMSE values of both the incident angle and the incident azimuth angle decrease slightly with increasing sea wind speed. However, for datasets at the 13.99 GHz operating frequency, there is no distinct relationship between RMSE and wind speed. Figure 6 shows the distribution of the RMSE between the true and predicted incident source directions over 100 realizations.

Table 4. Results of 100 realizations of the 5.3 GHz and 13.99 GHz datasets using the CNN method illustrated in Figure 5.

Furthermore, we explored the CNN performance for datasets at the same wind speeds but different radar frequencies. Table 5 shows the average results of 100 realizations of the 9.4 GHz, 13.9 GHz, and 14.6 GHz datasets. For incident angle estimation, the average RMSE over all frequencies and wind speeds is approximately 1°. For the incident azimuth angle, the average RMSE ranges from 3.24° to 3.52°.

Validation
In this paper, four types of validation data at the L-, C-, X-, and Ku-bands were adopted to examine the generalization ability of the proposed CNN structure. The L-band validation data for different wind directions come from the Aquarius scatterometer operating at 1.26 GHz, which has three incident angles from 28.7° to 45.6°. The backscattering coefficients from the sea surface correspond to wind speeds of 3, 5, 8, 10, 12, and 15 m/s [33,34]. The validation data for the C-band (5.3 GHz), X-band (9.6 GHz), and Ku-band (14 GHz) were generated by the empirical CMOD7 [41], XMOD2 [42], and NSCAT-4 [43] models, respectively. Besides, other C-(5.3 GHz) and  GHz) data are radar measurements from [37] and [44], respectively. Due to the lack of bistatic scattering data, the scattering coefficients in a single validation sample are backscattering coefficients mixed across different frequencies and wind speeds. The incident angle and incident azimuth angle of the validation sample are

Results Interpretation
By comparing the average RMSE of all datasets listed in Tables 4 and 5, we found that the performance of the CNN-based DOA estimation does not depend noticeably on radar frequency or sea wind speed. For almost all datasets, the average RMSE of the incident angle is close to 1°, and the average RMSE of the incident azimuth angle is between 3.0° and 3.5°. One possible reason is that the chosen CNN configuration is robust to the speckle noise that appears in coherent SAR measurements. Further investigation is required to generalize this claim.

Comparison with Other Algorithms
In this subsection, we compared the CNN-based method with state-of-the-art DOA estimation algorithms, including machine learning or deep learning methods [13,45] and conventional parametric methods [46][47][48]. Those methods were proposed for locally scattered signal sources, a setting quite similar to sea surface scattering. The algorithms cited in this subsection process the signal sequence from the scattered sources, whereas in our case the intensity of the received signals was taken as the model input. The accuracy of the methods is summarized in Table 6. One-dimensional (1-D) DOA estimation means that only the incident angle was considered, while two-dimensional (2-D) DOA estimation means that both the incident angle and the incident azimuth angle of the signal sources were predicted. For 1-D DOA estimation, the machine learning-based approach in [45], which uses a support vector machine (SVM), achieved an incident angle RMSE between 1° and 1.5° at a signal-to-noise ratio (SNR) of 7 dB. The authors of [46] used a multilayer perceptron neural network, and the RMSE of the DOA classifier was between 0.8° and 3° when 12.5% of the simulated training records were used (about 5895 samples in the dataset). In [13], the authors proposed a deep neural network configuration for two signals in the presence of array imperfections, and the RMSEs were between 0.2° and 0.4°. Note that the local scattering angle was not considered in [13]. For 2-D DOA estimation, [47] used an ESPRIT-based method. The RMSE in [47] was between 0.4° and 0.6° for the incident angle and between 3° and 4° for the incident azimuth angle when the angular spread of the signal was larger than 8°. In [48], the authors used the beamspace transformation method to estimate the 2-D DOA. The RMSEs of the incident angle and incident azimuth angle were 0.7° and 1°, respectively, at an SNR of 7 dB.
The accuracy achieved in this paper is close to that of the state-of-the-art signal processing-based works, so we consider these results satisfactory. We expect that other deep learning techniques could lead to better results. For example, in [49], a Kalman filter-trained dynamic learning neural network (DLNN) was used to retrieve surface parameters from bistatic scattering data. Unlike the CNN, the DLNN is a modification of the multilayer perceptron, and each updated estimate of the DLNN weights is computed from the previous estimate and the new input data. The objective of [49] was to retrieve the surface parameters, whereas this paper is concerned with estimating the direction of the incident sources. Computationally, the sizes of the input samples in [49] and in this work are also quite different.
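To make the phrase "computed from the previous estimate and the new input data" concrete, the following is a minimal sketch of a Kalman-style recursive weight update for a single linear output unit. It is not the DLNN of [49]; the function name `kalman_weight_update`, the scalar measurement-noise variance `r`, and the linear model are all assumptions chosen for illustration only.

```python
import numpy as np


def kalman_weight_update(w, P, x, d, r=1.0):
    """One recursive update of linear weights w given input x and target d.

    w : (n,) current weight estimate
    P : (n, n) current weight-error covariance
    x : (n,) new input sample
    d : scalar target for this sample
    r : assumed measurement-noise variance (hypothetical parameter)
    """
    x = np.asarray(x, dtype=float)
    Px = P @ x
    gain = Px / (r + x @ Px)           # Kalman gain for a scalar observation
    innovation = d - x @ w             # prediction error on the new sample
    w_new = w + gain * innovation      # previous estimate plus correction
    P_new = P - np.outer(gain, x) @ P  # covariance shrinks as data arrive
    return w_new, P_new


if __name__ == "__main__":
    # Toy example: recover the weights of d = 2*x0 - 3*x1 sample by sample.
    rng = np.random.default_rng(0)
    w, P = np.zeros(2), np.eye(2) * 1e3
    for _ in range(50):
        x = rng.normal(size=2)
        w, P = kalman_weight_update(w, P, x, 2.0 * x[0] - 3.0 * x[1])
    print(w)
```

Each call uses only the current sample and the stored estimate, which is the sequential character that distinguishes this kind of training from batch gradient descent over a full dataset.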

Limitation
Perhaps the biggest limitation of deep learning is its difficulty in generalizing. Training a single CNN to suit a variety of sea conditions and radar observations is expected to lower accuracy. Even though the observation in this article is omnidirectional, each dataset still corresponds to a specific frequency and wind speed. Further investigation should ascertain the details of this trade-off between accuracy and generalization.
Another limitation is uncertainty. The uncertainty in these results is associated with inaccuracies in modeling speckle-affected radar measurements, with the sea surface roughness assumptions, and with retrieval errors. In future work, we plan to compare the simulated data with real radar measurement data and to use other deep learning models as baseline algorithms in order to reduce this uncertainty.

Conclusions
In this study, we presented a novel method of two-dimensional DOA estimation for SAR systems based on sea surface scattering and convolutional neural networks. To simulate the bistatic radar measurement data of sea surface scattering at a given operating frequency and sea wind speed, we used the AIEM to calculate the bistatic scattering coefficients and then added a K-distributed bias to represent the speckle noise present in real radar measurements. The radar operating frequencies used in this paper covered the C-band, X-band, and Ku-band, and the sea wind speed ranged from 5 m/s to 55 m/s. We proposed a CNN structure that learns the characteristics of the scattering pattern and strength both lengthwise and widthwise. In this way, the CNN can make full use of the radar measurement data under different sea wind conditions to improve the estimation accuracy of the incident source. Each dataset was used for training and testing 100 times. The experimental results showed that the CNN structure in Figure 5 can be applied to DOA estimation with satisfactory accuracy over a wide range of frequencies and sea wind speeds. The average RMSE of the incident angle is about 1°, and the average RMSE of the incident azimuth angle is between 3° and 3.5°.
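The speckle step summarized above can be sketched as follows. This section does not specify how the K-distributed term was injected, so the sketch assumes the common compound (product) model, in which a unit-mean gamma-distributed texture multiplies unit-mean gamma-distributed speckle to give K-distributed intensity. The function name `add_k_speckle`, the shape parameter `shape_nu`, the number of looks, and the grid of placeholder coefficients are all hypothetical and do not come from the paper.

```python
import numpy as np


def add_k_speckle(sigma0, shape_nu=4.0, looks=1, rng=None):
    """Apply K-distributed multiplicative noise to scattering coefficients.

    sigma0   : array of noise-free scattering coefficients (linear units)
    shape_nu : assumed shape parameter of the gamma texture
    looks    : assumed number of looks (1 gives exponential speckle)
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma0 = np.asarray(sigma0, dtype=float)
    # Unit-mean gamma texture: slow variation of the underlying sea clutter.
    texture = rng.gamma(shape=shape_nu, scale=1.0 / shape_nu, size=sigma0.shape)
    # Unit-mean gamma speckle: coherent-imaging noise averaged over `looks`.
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=sigma0.shape)
    # The product of the two is K-distributed in intensity.
    return sigma0 * texture * speckle


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical grid of scattering angle x scattering azimuth samples.
    clean = np.full((90, 360), 0.01)
    noisy = add_k_speckle(clean, shape_nu=4.0, looks=1, rng=rng)
    print(noisy.shape, noisy.mean())
```

Because both factors have unit mean, the noisy coefficients keep the same expected value as the noise-free ones, while a smaller `shape_nu` produces spikier, heavier-tailed clutter.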