Research on Underdetermined DOA Estimation Method with Unknown Number of Sources Based on Improved CNN

This paper proposes a joint estimation method for the source number and DOA, based on an improved convolutional neural network, for underdetermined DOA estimation with an unknown number of sources. By analyzing the signal model, the paper designs a convolutional neural network model based on the mapping relationship that exists between the covariance matrix and both the source number and the DOAs. The model discards the pooling layer to avoid data loss and introduces the dropout method to improve generalization; it takes the signal covariance matrix as input and has two output branches, source number estimation and DOA estimation, achieving DOA estimation for a variable number of sources by padding with invalid values. Simulation experiments and analysis of the results show that the algorithm can effectively achieve joint estimation of the source number and DOA. Under high SNR and large snapshot conditions, both the proposed algorithm and traditional algorithms have high estimation accuracy; under low SNR and small snapshot conditions, the proposed algorithm outperforms traditional algorithms; and under underdetermined conditions, where traditional algorithms often fail, the proposed algorithm can still achieve the joint estimation.


Introduction
Array signal processing uses sensor arrays to extract signal features for signal parameter estimation, signal enhancement, signal separation, and other signal processing operations; it is widely used in civil and military fields such as radar, exploration, and medical diagnosis [1]. Signal parameter estimation mainly includes source number estimation and direction-of-arrival (DOA) estimation. Generally speaking, source number estimation is a prerequisite for signal-related parameter estimation and occupies an important position in array signal processing. The earliest proposed method of source number estimation is the hypothesis testing method [2], which performs an eigendecomposition of the received signal covariance matrix. Theoretically, the first few larger eigenvalues are equal in number to the sources and correspond to the signal eigenvalues, while the remaining eigenvalues correspond to the noise eigenvalues; the two groups are usually divided according to an empirically set threshold. Because this method determines the source number from the set threshold, it is susceptible to subjective factors, and it is difficult to accurately distinguish signal eigenvalues from noise eigenvalues under low signal-to-noise ratio (SNR) or small snapshot conditions. The Akaike Information Criterion (AIC) and the Minimum Description Length criterion (MDL) [3], regarded as the classical information-theoretic source number estimation methods, were successively proposed by Akaike and Rissanen with the development of information theory. They calculate the maximum likelihood estimate of the unknown parameters of the array signal and introduce a penalty function for correction, forming a discriminant function from which the number of sources can be estimated [4].
To address the inconsistent estimation and underestimation of the AIC and MDL criteria under low SNR and small snapshot conditions, researchers subsequently proposed a number of modifications to both criteria; to address the failure of the above algorithms under colored noise, the Gerschgorin circle algorithm was proposed [5], which uses the orthogonality between the signal eigenvectors and the noise eigenvectors to separate the Gerschgorin circles corresponding to the signal [6]. However, the algorithm operates on the eigenspace of a covariance matrix whose dimension is reduced by one, losing one degree of freedom of the system, and its estimation performance is unstable under high SNR conditions. All the above algorithms are applicable only to a uniform linear array (ULA) in which the number of sources is smaller than the number of physical array elements. For scenarios where the number of sources exceeds the number of physical array elements, sparse arrays are typically utilized to enable multi-source parameter estimation. In sparse arrays the elements are separated by more than half a wavelength; the larger element spacing produces a wider array aperture and more degrees of freedom. Co-prime arrays, uniformly sparse arrays, minimum redundancy arrays, and nested arrays are common sparse array types. Dong et al. designed a spatial covariance model via spatial smoothing of the co-prime array output signal to estimate the source number [7]. However, the method relies too heavily on the predefined spatial grid for sparse reconstruction, which deviates somewhat from reality. To achieve joint estimation of the source number and DOA, Izedi et al. proposed an improved hypothesis testing method applicable to arbitrary array geometries [8]. However, because the DOA estimates are unstable, the estimation probability of the source number fluctuates.
To achieve underdetermined source number estimation, Zhang et al. used the spatial smoothing method to build a virtual array [9], but this method results in a loss of degrees of freedom. Overall, sparse arrays can improve the degrees of freedom of the array, but the corresponding source number and DOA estimation methods are also more complex [10]. Compared to a ULA, sparse arrays suffer from angle ambiguity, and the ambiguous angles can seriously interfere with accurate discrimination of source DOAs. Matter et al. removed the ambiguous angles in a 1-D linear array by using particle swarm optimization [11]; Yu et al. filtered out the spurious peaks by using a double V-shaped array with a common point [12]. The deblurring techniques mentioned above can increase DOA estimation resolution without increasing the array's virtual elements. New DOA estimation techniques have emerged alongside sparse arrays with particular structures, such as minimum redundancy arrays, co-prime arrays, and nested arrays. The covariance matrix reconstruction method extends the array aperture by reconstructing an extended covariance matrix from the array properties, which is equivalent to a uniform linear array with more physical elements; this allows high-resolution angle estimation using DOA estimation methods designed for the ULA. For example, Pal and Vaidyanathan [13,14] extended the signal and noise subspaces of a co-prime array by rewriting the array manifold, which increased the resolution and the number of detectable sources. However, this type of method is often restricted to specific array types and is difficult to apply widely. In recent years, with the application of deep learning, DOA estimation methods based on deep learning have emerged [15]. Fast matrix decomposition (eigenvalue decomposition and unitary decomposition) was achieved by Luo and Gohil et al. using deep learning, and DOA estimation was then accomplished using conventional algorithms [16,17].
A convolutional autoencoder was used by Liu et al. to classify signals coarsely, and multiple DNNs were then used to achieve high accuracy within each sector [18]. Ge et al. discretized the spatial angle with the aid of a convolutional neural network, turning the angle estimation problem into a classification problem [19]. The above algorithms often require the number of sources as known data, and their estimation accuracy is limited by the angle discretization.
To address the above problems, this paper proposes a joint estimation method of the source number and DOA based on an improved convolutional neural network (CNN), which does not need to estimate the number of sources before performing angle estimation. The two output branches of the CNN are the number of sources and the angle values, respectively, and are not constrained by the array manifold. The algorithm remains valid when the number of sources is greater than the number of physical and virtual array elements.

Signal Model
Consider a linear array model with M elements and element spacing d_i = Z_i·(λ/2), where λ is the signal wavelength and Z_i is a positive integer; when all Z_i equal 1, the array is a ULA; otherwise, the array is a sparse array. Let K uncorrelated far-field narrowband signals be incident on the array from directions θ_1, θ_2, ..., θ_K, with fixed source positions and identical centre frequencies. The steering vector of the k-th signal is

a(θ_k) = [1, e^{-j2π d_1 sin θ_k / λ}, ..., e^{-j2π d_{M−1} sin θ_k / λ}]^T.

Then, the steering matrix is

A(θ) = [a(θ_1), a(θ_2), ..., a(θ_K)].

The array output signal vector is

X(t) = [x_1(t), x_2(t), ..., x_M(t)]^T = A(θ)S(t) + n(t),

where x_i(t) (i = 1, 2, ..., M) denotes the output of the i-th element, S(t) = [s_1(t), s_2(t), ..., s_K(t)]^T denotes the signal vector, and n(t) is the additive noise. In general, the source number is used as prior knowledge for DOA estimation and must be determined before DOA estimation. When the array is a ULA, the source number can be determined from the eigenvalues of the covariance matrix R. The covariance matrix of X(t) is

R = E[X(t)X^H(t)] = A(θ)R_S A^H(θ) + σ² I_M,

where R_S = E[S(t)S^H(t)] denotes the signal covariance matrix, σ² denotes the noise power, and I_M denotes the identity matrix of order M. When the signals are uncorrelated, the column vectors a(θ_k), k = 1, 2, ..., K of the steering matrix A(θ) are linearly independent, and R_S is a nonsingular matrix. Under ideal conditions, the rank of the matrix A(θ)R_S A^H(θ) is K, and the eigendecomposition of R yields

R = UΛU^H,

where U is the eigenvector matrix and Λ = diag{λ_1, λ_2, ..., λ_M} is the diagonal matrix of eigenvalues. Sorting the eigenvalues in descending order,

λ_1 ≥ λ_2 ≥ ... ≥ λ_K > λ_{K+1} ≥ ... ≥ λ_M,

the (M − K) smallest eigenvalues are called the noise covariance eigenvalues and have the following characteristic:

λ_{K+1} = λ_{K+2} = ... = λ_M = σ².

Therefore, the number of signal sources can be calculated from the number of noise covariance eigenvalues.
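The covariance structure above can be sketched numerically. The following is a minimal numpy example (the function names are illustrative, not from the paper), building the ideal covariance matrix for element positions given in half-wavelength units:

```python
import numpy as np

def steering_vector(positions, theta_deg):
    """Steering vector a(theta) for element positions in half-wavelength units.
    Since d_i = Z_i*(lambda/2), the phase 2*pi*d_i*sin(theta)/lambda = pi*Z_i*sin(theta)."""
    theta = np.deg2rad(theta_deg)
    return np.exp(-1j * np.pi * np.asarray(positions) * np.sin(theta))

def covariance(positions, thetas_deg, powers, noise_var):
    """Ideal covariance R = A R_S A^H + sigma^2 I for uncorrelated sources."""
    A = np.stack([steering_vector(positions, t) for t in thetas_deg], axis=1)  # M x K
    Rs = np.diag(powers).astype(complex)
    M = len(positions)
    return A @ Rs @ A.conj().T + noise_var * np.eye(M)

R = covariance([0, 1, 2, 3, 4, 5], [-30, 10, 30], [1, 1, 1], 0.1)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order
# In this ideal (asymptotic) model, the M - K = 3 smallest eigenvalues
# equal the noise power sigma^2 = 0.1 exactly.
```

In the ideal model the noise covariance eigenvalues coincide exactly; the sampling effects discussed next break this equality.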
In fact, there is an error between the covariance matrix computed from the sampled signal and its true value; the noise covariance eigenvalues are then no longer identical [20], i.e.,

λ_{K+1} ≥ λ_{K+2} ≥ ... ≥ λ_M > 0.

For example, consider an ideal ULA with 6 elements and 3 incident signals at θ_1 = −30°, θ_2 = 10°, and θ_3 = 30°; eigendecomposition is performed, and the eigenvalues of the signal covariance matrix at different SNRs and snapshot numbers are calculated, as shown in Table 1. Each group's eigenvalues (λ_1 to λ_6) in Table 1 are normalized as shown in Figure 1. As can be seen from Table 1 and Figure 1, both a lower SNR and a smaller number of snapshots have a notable impact on the eigenvalues, with the impact of low SNR being the most obvious. When the SNR is below −10 dB, the eigenvalue distributions in Figure 1 are almost linear, making it difficult to distinguish between larger and smaller eigenvalues; this makes it challenging to derive an accurate source number from the eigenvalue distributions.
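The blurring of the eigenvalue gap at low SNR can be reproduced with a small simulation of the 6-element example above (a hedged sketch; the snapshot count and random seed are arbitrary choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
pos = np.arange(6)                        # 6-element ULA, half-wavelength spacing
thetas = np.deg2rad([-30.0, 10.0, 30.0])  # 3 incident sources
A = np.exp(-1j * np.pi * pos[:, None] * np.sin(thetas)[None, :])  # 6 x 3 steering matrix

def sample_eigvals(snr_db, snapshots):
    """Normalized, descending eigenvalues of the sample covariance matrix."""
    S = (rng.standard_normal((3, snapshots)) + 1j * rng.standard_normal((3, snapshots))) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)          # noise amplitude for unit-power sources
    N = sigma * (rng.standard_normal((6, snapshots)) + 1j * rng.standard_normal((6, snapshots))) / np.sqrt(2)
    X = A @ S + N
    R = X @ X.conj().T / snapshots        # sample covariance
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]
    return lam / lam[0]

high = sample_eigvals(20, 200)    # clear gap after the 3rd eigenvalue
low = sample_eigvals(-10, 200)    # gap almost disappears
```

At 20 dB the ratio between the third and fourth eigenvalues is large; at −10 dB the spectrum is nearly flat, matching the behaviour described for Figure 1.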
In addition, from the above derivation it can be seen that when the number of sources is greater than or equal to the number of array elements, the source number cannot be judged from the eigenvalue characteristics, and the method fails. For situations where the number of sources exceeds the actual number of array elements, sparse arrays are mostly used to increase the array degrees of freedom. The source number estimation method for sparse arrays first vectorizes the received signal covariance matrix to construct a virtual array model [8],

z = vec(R) = B(θ_1, θ_2, ..., θ_K)P + σ² vec(I_M),

where P = [σ_1², σ_2², ..., σ_K²]^T, and σ_i², i = 1, 2, ..., K, denotes the signal power of the i-th source. When the signals are independent of each other, B(θ_1, θ_2, ..., θ_K) can be further expressed as

B(θ_1, θ_2, ..., θ_K) = [a*(θ_1) ⊗ a(θ_1), a*(θ_2) ⊗ a(θ_2), ..., a*(θ_K) ⊗ a(θ_K)],

where ⊗ denotes the Kronecker product. B(θ_1, θ_2, ..., θ_K) is the extended virtual array; note that the length of the maximum continuous virtual-element response part in B(θ_1, θ_2, ..., θ_K) is L. Then, the L × K dimensional uniform array in B(θ_1, θ_2, ..., θ_K) is selected as the new array steering matrix, and P can be regarded as the new single-snapshot signal matrix; the new covariance matrix R_v is obtained by construction. R_v is spatially smoothed, i.e., averaged over all smoothed sub-array covariance matrices, to obtain a full-rank covariance matrix R̄. The eigendecomposition is performed on R̄ according to Equation (5), and the eigenvalues are ranked from largest to smallest, with the number of the first few larger eigenvalues giving the number of sources. When the source number is smaller than the actual number of elements, or than the maximum number of consecutive virtual elements of the sparse array, DOA estimation can be performed with subspace algorithms such as MUSIC and ESPRIT.
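The vectorization step can be checked numerically. The sketch below (illustrative names; the angles and powers are arbitrary) verifies that vec(R) equals B·P + σ²·vec(I_M) when the columns of B are a*(θ_k) ⊗ a(θ_k):

```python
import numpy as np

pos = np.array([0, 4, 5, 8, 10, 12, 15, 16])   # co-prime array, units of lambda/2
thetas = np.deg2rad([-20.0, 15.0])
powers = np.array([1.0, 2.0])
sigma2 = 0.5
M, K = len(pos), len(thetas)

A = np.exp(-1j * np.pi * pos[:, None] * np.sin(thetas)[None, :])   # M x K steering matrix
R = A @ np.diag(powers).astype(complex) @ A.conj().T + sigma2 * np.eye(M)

# Column k of B is vec(a_k a_k^H) = conj(a_k) kron a_k
B = np.stack([np.kron(A[:, k].conj(), A[:, k]) for k in range(K)], axis=1)

z = R.flatten(order='F')                        # column-stacking vec(.)
z_model = B @ powers + sigma2 * np.eye(M).flatten(order='F')
# z and z_model agree: the vectorized covariance behaves like a single
# snapshot received on the virtual (difference) array.
```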


Convolutional Neural Network Model
In the estimation of both the source number and DOA, the calculation of the received signal covariance matrix is an important step, and there is a non-linear mapping relationship between the received signal covariance matrix and the source number and DOA [21]. The convolutional neural network (CNN) model proposed in this paper is shown in Figure 2. As can be seen from Figure 2, the proposed CNN model is a single-input, multi-branch-output neural network. The received signal covariance matrix serves as the model's input, and its output is divided into two branches: one produces the number of sources, the other the DOAs of all the sources, and the two outputs can both influence and verify one another. Since the source number is unknown, the number of output angles is uncertain; however, the model training process requires the input and output to be fixed-length data, i.e., the data size of the input and output is fixed. Under this premise, Output2 in this paper's model is of fixed length, its length being the maximum possible number of sources received by the array.
When the actual number of received sources is smaller than the length of Output2, the remaining entries are padded with the value N, a value in the non-target region. To avoid indistinguishability, N should differ substantially from the boundary of the target region.
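A minimal sketch of the fixed-length Output2 labels follows; the fill value −90° and the helper names are assumptions for illustration (the paper only requires N to lie far outside the target region [−60°, 60°]):

```python
import numpy as np

K_MAX = 18      # length of Output2: maximum possible number of sources
FILL = -90.0    # assumed invalid value N, far outside the target region

def make_doa_label(angles):
    """Pad a variable-length DOA list to a fixed-length training label."""
    label = np.full(K_MAX, FILL)
    label[:len(angles)] = np.sort(angles)
    return label

def count_valid(label, lo=-60.0, hi=60.0):
    """Recover the source number implied by Output2: entries inside the target region."""
    return int(np.sum((label >= lo) & (label <= hi)))

lab = make_doa_label([-30.0, 10.0, 30.0])
# count_valid(lab) recovers the source number 3 from the padded label
```

This is also how Output2 can cross-check Output1: counting the non-fill entries yields a second estimate of the source number.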

Convolutional Layers
In convolutional layers, a number of square arrays of fixed size, called filters or convolution kernels, are given. The size of a convolution kernel corresponds to the so-called receptive field on the input matrix, and the kernel moves from left to right and from top to bottom over the input matrix according to the given stride; the receptive field is convolved with the kernel to obtain the output matrix, which can serve as the input matrix for subsequent convolutional layers [22]. When performing a convolution operation, the boundary elements of the input matrix enter the computation less frequently than the interior elements, and the output matrix becomes smaller than the input. To preserve the boundary features of the matrix as much as possible [23] and to avoid obtaining feature vectors that are too small in subsequent convolutional layers, the input data are padded before the convolution operation, usually with a padding value of 0. The blue arrows in the convolutional-layer section of the diagram indicate convolution operations, and "padding" indicates boundary padding. The model contains three convolutional layers; the input data or feature vectors are padded before the first two convolution operations, but not before the third. The model uses the "SAME" padding mode, i.e., the size of the input does not change before and after the convolution. Since a single element of the received-data covariance matrix carries no special meaning, padding the boundary before every convolution may retain excessive boundary information and cause data redundancy; therefore, no boundary padding is performed before the third convolution.
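The effect of "SAME" versus no padding on the output size can be illustrated with a toy 2-D convolution (implemented as cross-correlation, as is conventional in CNNs; this is a sketch, not the paper's implementation):

```python
import numpy as np

def conv2d(x, k, padding='same'):
    """Minimal stride-1 2-D convolution (cross-correlation) illustrating
    'same' (zero padding, size preserved) vs 'valid' (no padding, shrinks)."""
    kh, kw = k.shape
    if padding == 'same':
        ph, pw = kh // 2, kw // 2
        x = np.pad(x, ((ph, ph), (pw, pw)))   # zero padding preserves output size
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

x = np.ones((8, 16))           # e.g. an M x 2M input for M = 8 elements
k = np.ones((3, 3))
same_out = conv2d(x, k, 'same')     # shape (8, 16): boundary preserved
valid_out = conv2d(x, k, 'valid')   # shape (6, 14): shrinks, as in the third layer
```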
To increase the non-linear capability of the model, a bias term is added after the convolution operation, followed by a non-linear function called the activation function [24]. Common activation functions include the sigmoid function, the Tanh function, and the ReLU function [25]. The first two suffer from gradient saturation and slow convergence, while the ReLU function converges quickly and has no gradient saturation problem. However, the ReLU function outputs zero for inputs less than zero, so the corresponding neuron can no longer be activated. For these reasons, this paper chooses the Leaky ReLU function [26], an improvement on the ReLU function, whose expression is

f(x) = x, x > 0;  f(x) = ax, x ≤ 0.

The value of a is generally small, usually a = 0.01. The Leaky ReLU function avoids the problem of neurons not being re-activated and gives better performance.
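The chosen activation can be written in a few lines (a generic numpy sketch):

```python
import numpy as np

def leaky_relu(x, a=0.01):
    """Leaky ReLU: x for x > 0, a*x otherwise; keeps a small gradient
    for negative inputs so neurons can still be re-activated."""
    return np.where(x > 0, x, a * x)

leaky_relu(np.array([-2.0, 0.0, 3.0]))   # -> [-0.02, 0., 3.]
```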
In traditional convolutional neural network models, a pooling layer follows the convolutional layer to achieve feature dimensionality reduction and shrink the data. In practice, however, the number of array elements is finite and usually not particularly large, so the covariance matrix is not very large either. Introducing a pooling layer, whether maximum pooling or average pooling, would discard some of this limited data, making it difficult to fully extract the features. Meanwhile, omitting boundary padding before the third convolution operation already reduces the size of the feature vector after convolution and, to a certain extent, reduces complexity.

Fully Connected Layers
Fully connected layers are tiled structures of many neurons that convert all the feature vectors obtained from the convolutional layers into a one-dimensional feature vector for classification or regression. Increasing the number of fully connected layers (i.e., the depth) and the number of neurons per layer can effectively improve the non-linear capability and complexity of the model, thus enhancing its learning ability; however, too much depth and too many neurons easily cause overfitting. While limiting the depth and the number of neurons as much as possible, this paper uses the dropout method [18], shown schematically in Figure 3 below.
During the training process, the neurons in each layer are discarded with a certain probability at each iteration. The probability of each layer can be set individually, and the neurons in the output layer are not discarded. The dropout method is used to reduce the constraints of neurons on the training data and to enhance the generalization of the model. In the model of this paper, the dropout method is used for all fully connected layers except the output layers of the two branches.
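Inverted dropout, the common implementation of this idea, can be sketched as follows (an illustrative version, not the paper's code):

```python
import numpy as np

def dropout(x, p, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale the rest
    by 1/(1-p) so the expected activation is unchanged; a no-op at inference."""
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```

At inference time the layer simply passes activations through, which is why the output layers of the two branches are left untouched.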
In the first two fully connected layers, the activation functions are Leaky ReLU functions [26]. The first branch, indicated by the blue arrow, implements the estimation of the number of sources and is a multi-classification problem. The activation function of the first layer of this branch is the Leaky ReLU function, and the second layer, the output layer, is a Softmax layer; the Softmax function is used for multi-classification problems in which the classes are mutually exclusive:

S_i = e^{z_i} / Σ_{j=1}^{N} e^{z_j}.

The loss function is the cross-entropy function, whose expression is

Loss_1 = −(1/M) Σ_{i=1}^{M} Σ_{j=1}^{N} q(x_ij) log p(x_ij),

where M denotes the number of samples, N denotes the number of categories, p(x_ij) denotes the predicted probability that sample i belongs to category j, and q(x_ij) denotes the true label: q(x_ij) = 1 if sample i belongs to category j, and q(x_ij) = 0 otherwise. The second branch, indicated by the orange arrow, implements the estimation of the DOA for each source; the activation function of this branch is the Leaky ReLU function. This branch is a regression problem; its loss function is the mean squared error loss function, whose expression is

Loss_2 = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} (y_ij − ŷ_ij)²,

where M denotes the number of samples, N denotes the number of outputs, y_ij denotes the true value, and ŷ_ij denotes the output value, i.e., the DOA estimate in this paper. The two branches are independent of each other, but the neurons of the first two fully connected layers are jointly influenced by both branches.
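The Softmax output and the two loss functions can be sketched directly from their expressions (generic numpy versions; the small epsilon guards against log of zero):

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max improves numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q):
    """Mean cross-entropy: q are one-hot labels, p are predicted probabilities."""
    return -np.mean(np.sum(q * np.log(p + 1e-12), axis=-1))

def mse(y_true, y_pred):
    """Mean squared error, used for the DOA regression branch."""
    return np.mean((y_true - y_pred) ** 2)
```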

Simulation Experiments and Analysis of Results
In the simulation experiments, an ideal ULA and a sparse array, each with eight elements, are designed. The elements of the ULA are arranged at [0, 1, 2, 3, 4, 5, 6, 7]·(λ/2); the sparse array is a co-prime array with co-prime numbers 4 and 5, whose elements are arranged at [0, 4, 5, 8, 10, 12, 15, 16]·(λ/2), where λ is the signal wavelength. The source number increases from 1 to 18, and the target region is [−60°, 60°]. In the proposed model, there are three convolutional layers; the size of the convolution kernels is 3 × 3, and the numbers of convolution kernels are 64, 64, and 32. The fully connected layers are designed as shown in Figure 2; the numbers of neurons in the first two layers are 1000 and 800, with a dropout probability of 0.2. The dropout probabilities are 0.2 and 0.1 for Output1, and 0.3 and 0.2 for Output2, which has two fully connected layers with 800 and 600 neurons. Since the operations in the forward and backward propagation of the CNN, such as convolutions and activation functions, are all real-valued, and the model outputs are also real numbers, complex values cannot participate in them directly; however, the complex components remain significant for angle estimation and cannot be discarded. In the simulation, the real and imaginary components of the covariance matrix are therefore split apart and combined into an M × 2M matrix (M denotes the actual number of elements), which is used as the convolutional neural network's input. The total number of training samples is 36,000, and the total number of test samples is 5400. The simulation experiments implement source number estimation and DOA estimation when the number of sources is unknown.
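The input construction described here, splitting the complex covariance matrix into real and imaginary parts, can be sketched as follows (the helper name is illustrative):

```python
import numpy as np

def covariance_to_input(R):
    """Concatenate the real and imaginary parts of the complex M x M covariance
    matrix into a real M x 2M matrix, the CNN input described in the paper."""
    return np.concatenate([R.real, R.imag], axis=1)

rng = np.random.default_rng(1)
M = 8
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = G @ G.conj().T                 # Hermitian, like a covariance matrix
X = covariance_to_input(R)         # shape (8, 16)
```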

Performance of Source Number Estimation
The accuracy rate and the consistency rate are used as indicators for evaluating the performance of source number estimation. The accuracy rate indicates the proportion of source number estimates that equal the actual value and is used to evaluate the model's source number estimation performance. Its expression is

P_a = N_a / N_t,

where N_a denotes the number of accurate estimates achieved in the test set, and N_t denotes the test set size. Although Output2 outputs the targets' DOAs, the number of sources can also be inferred from it. The consistency rate represents the proportion of the test set in which the estimated number of sources output by Output1 and the number of sources obtained from Output2 are the same; it measures the degree of consistency between the two output branches and is expressed as

P_c = N_c / N_t,

where N_c denotes the amount of data for which Output1 and Output2 give the same number of sources. Statistically, the accuracies of source number estimation for different numbers of sources received by the model, for the ULA and the sparse array at a snapshot number of 200 and different SNRs, are shown in Tables 2 and 3. From Tables 2 and 3, it can be seen that the estimation accuracy for different numbers of sources for both the sparse array and the ULA is above 90% at all SNRs tested. With no more than 12 sources for the sparse array and no more than 6 sources for the ULA, both achieve an accuracy of 100%. However, as the number of sources increases, the estimation accuracy under high SNR conditions is clearly higher. Nearly all estimation errors are +1 or −1 in the cases where the estimated number of sources differs from the actual number. The consistency of source number estimation for different numbers of sources, for the ULA and the sparse array at a snapshot number of 200 and different SNRs, is shown in Tables 4 and 5.
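Both rates reduce to simple counting; a sketch with hypothetical helper names:

```python
import numpy as np

def accuracy_rate(k_true, k_out1):
    """P_a = N_a / N_t: fraction of test samples where Output1 matches the truth."""
    return float(np.mean(np.asarray(k_true) == np.asarray(k_out1)))

def consistency_rate(k_out1, k_out2):
    """P_c = N_c / N_t: fraction where Output1 agrees with the source number
    implied by the valid entries of Output2."""
    return float(np.mean(np.asarray(k_out1) == np.asarray(k_out2)))

accuracy_rate([3, 2, 4, 1], [3, 2, 3, 1])   # -> 0.75
```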
Comparing Tables 4 and 5, it can be seen that the model estimates the number of sources slightly more accurately for the sparse array than for the ULA. Comparing Tables 2-5, it can be seen that the accuracy and consistency rates of source number estimation are essentially the same, with the consistency rate slightly lower when the source number increases and the SNR is small. As a result, the source number implied by Output2 is marginally less accurate than that of Output1. In particular, some angles may not be estimated when two or more sources have similar angles of incidence, which can result in underestimation of the source number.

Performance of DOA Estimation
When a DOA estimate from Output2 deviates greatly from the target region, it is regarded as an invalid estimate. In addition, all the filler values N described above are invalid estimates. Usually, the root mean square error (RMSE) is used as an index to evaluate the accuracy of DOA estimation; its expression is

RMSE = sqrt( (1/N_t) Σ_{i=1}^{N_t} (θ̂_i − θ_i)² ),

where N_t denotes the number of all valid sources, θ̂_i denotes the estimated angle, and θ_i denotes the actual angle.
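A sketch of this RMSE computation, excluding invalid (filled) estimates; the region bounds and helper name are illustrative:

```python
import numpy as np

def doa_rmse(theta_true, theta_est, lo=-60.0, hi=60.0):
    """RMSE over valid estimates only; entries outside [lo, hi] (e.g. the fill
    value N, or estimates far outside the target region) are excluded."""
    theta_true = np.asarray(theta_true)
    theta_est = np.asarray(theta_est)
    valid = (theta_est >= lo) & (theta_est <= hi)
    return float(np.sqrt(np.mean((theta_est[valid] - theta_true[valid]) ** 2)))

# Fourth entry is a fill value and does not count toward the error
doa_rmse([-30.0, 10.0, 30.0, 0.0], [-29.0, 11.0, 30.5, -90.0])
```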

DOA Estimation Performance at Different SNRs
With a snapshot number of 200 and SNRs of −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB, DOA estimation is conducted for the ULA and the sparse array with an unknown source number to obtain Output2; the RMSE of DOA estimation at different SNRs with the number of sources from 1 to 18 is shown in Tables 6 and 7. Tables 6 and 7 show that for both the ULA and the sparse array, estimation accuracy increases with increasing SNR and decreases with an increasing number of sources. In particular, the estimation accuracy decreases more noticeably for the sparse array when the number of sources is greater than 16 and for the ULA when the number of sources is greater than 6. For the same number of sources and SNR, as shown in Tables 6 and 7, the estimation accuracy of the sparse array is marginally better than that of the ULA; regardless of the number of sources, the sparse array performs better when the SNR is higher than 10 dB. The RMSE of DOA estimation for the ULA and the sparse array over all valid sources (1 to 18) at the same SNR is then compared with that of the differential co-array joint MUSIC algorithm (DCAM) [27], the L1SVD algorithm [28], and the L1CMSR algorithm [29] for sparse arrays of the same array type; for the ULA, the MUSIC algorithm [30] is used for multi-target DOA estimation with the same array type. Only consecutive differential co-array elements can be used in the simulation experiments when using the differential co-array algorithm. For the co-prime array used in this paper, the maximum number of consecutive virtual elements is 13, so the maximum number of sources that can be estimated by this algorithm is 12. The L1SVD algorithm and the L1CMSR algorithm can each estimate a maximum of 16 sources.
Therefore, the numbers of sources for the aforementioned three algorithms are set to 1 to 12, 1 to 16, and 1 to 16 in the simulation experiments, while the maximum number of sources estimable by the MUSIC algorithm must be less than the number of array elements, i.e., 7 in this case. As a result, 1 to 7 sources are used in the comparison experiments in which MUSIC performs DOA estimation on the ULA. The results are shown in Figure 4. As can be seen from Figure 4, each algorithm's RMSE decreases and its estimation accuracy increases as the SNR rises. When the array is a sparse array, the algorithm proposed in this paper performs better than DCAM, L1SVD, and L1CMSR. When the array is a ULA, the proposed algorithm outperforms MUSIC when the SNR is less than 5 dB, and the MUSIC algorithm performs better when the SNR is greater than 5 dB.
It should be noted that the other four algorithms require a known number of sources to achieve DOA estimation, and the MUSIC algorithm is prone to generating false spectral peaks or no spectral peaks when the source number is large and the angles of incidence of the sources are similar, i.e., when the angular difference is small, which affects the estimation performance.
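For reference, the MUSIC baseline discussed above can be sketched as follows. The half-wavelength ULA geometry, the 0.1° search grid, and the toy two-source scenario are assumptions for illustration, not the paper's exact simulation setup.

```python
import numpy as np


def music_doa(X, n_sources, n_grid=1801):
    """Estimate DOAs (degrees) from snapshots X (M x T) with MUSIC,
    assuming a half-wavelength uniform linear array."""
    M, T = X.shape
    R = X @ X.conj().T / T                       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    En = eigvecs[:, :M - n_sources]              # noise-subspace eigenvectors
    grid = np.linspace(-90.0, 90.0, n_grid)      # 0.1 degree scan grid
    m = np.arange(M)[:, None]
    A = np.exp(-1j * np.pi * m * np.sin(np.deg2rad(grid)))   # steering matrix
    spectrum = 1.0 / np.linalg.norm(En.conj().T @ A, axis=0) ** 2
    # keep the n_sources largest local maxima of the pseudo-spectrum
    peaks = [i for i in range(1, n_grid - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return np.sort(grid[peaks[:n_sources]])


# toy scenario (assumed): 8-element ULA, 2 sources, 200 snapshots, high SNR
rng = np.random.default_rng(0)
M, T = 8, 200
true_doas = np.array([-20.0, 30.0])
m = np.arange(M)[:, None]
A = np.exp(-1j * np.pi * m * np.sin(np.deg2rad(true_doas)))
S = (rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))) / np.sqrt(2)
noise = 0.03 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
est = music_doa(A @ S + noise, 2)
```

Note that `music_doa` requires `n_sources` as an argument, which illustrates the point made above: MUSIC and the other comparison algorithms need the source number known in advance, whereas the proposed network estimates it jointly.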

DOA Estimation Performance at Different Snapshots
When the number of sources is unknown, the SNR is 10 dB, and the number of snapshots is 50, 100, 150, 200, 300, and 400, respectively, the ULA and the sparse array perform DOA estimation to obtain Output2. Tables 8 and 9 below show the RMSE of DOA estimation at the various snapshot numbers, with the source number ranging from 1 to 18. From Tables 8 and 9, it can be seen that estimation accuracy rises with the number of snapshots and that the estimation error is higher when the number of sources is large and the number of snapshots is small. When there are many sources, there is a greater chance that two or more sources will have close angles of incidence; when this occurs, the angle estimation error, and hence the RMSE, rises along with the number of sources.
Under the above conditions, the RMSEs of the proposed algorithm for DOA estimation of the sparse array and the ULA over all valid sources (source numbers from 1 to 18) at the same snapshot number are calculated. For the sparse array, the algorithm is compared with the DCAM [27], L1SVD [28], and L1CMSR [29] algorithms in terms of the RMSE of DOA estimation under the same conditions, and with the MUSIC [30] algorithm for the ULA. The maximum number of estimable sources for each algorithm is consistent with the above. The results are shown in Figure 5.
From Figure 5, it is clear that as the number of snapshots rises, each algorithm's estimation accuracy improves. When the array is a ULA, it can be seen that when the number of snapshots is less than 200, the RMSE of the proposed algorithm is significantly lower than that of the MUSIC algorithm and its estimation accuracy is higher, while when the number of snapshots is greater than 200, the estimation accuracy of the two algorithms is not significantly different. The estimation accuracy of the proposed algorithm is superior to the other three algorithms when the array is sparse, and its RMSE is also lower. For both the sparse array and the ULA, it is clear from comparing Figures 4 and 5 that the SNR has a greater impact on the algorithm than the number of snapshots.

Performance at Small Snapshots and Low SNR
From Tables 6-8, as well as Figures 3 and 4, it can be seen that both the number of snapshots and the SNR affect the accuracy of source number estimation and the RMSE of DOA estimation. To verify the estimation performance of the algorithm proposed in this paper under low-SNR and small-snapshot conditions, the ULA and the sparse array are designed to perform source number and DOA estimation with a snapshot number of 50 and an SNR of −5 dB; the accuracy of source number estimation and the RMSE of DOA estimation for each source number are shown in Figure 6. It can be seen from Figure 6a that the accuracy of source number estimation decreases as the source number increases at a low snapshot number and low SNR. When the source number does not exceed 8, the accuracy is close to 100% for both the sparse array and the ULA; when it is greater than 8, the accuracy of source number estimation for the sparse array is significantly better than that for the ULA. After the number of sources exceeds 12, the accuracy of source number estimation for the ULA decreases significantly, although it is still greater than 80% when the number of sources reaches 18; for the sparse array, the accuracy decreases significantly only when the source number is greater than 15 and is still more than 85% when the source number reaches 18. As can be seen from Figure 6b, the RMSE of both the sparse array and the ULA increases with the number of sources.
The DOA estimation performance of the sparse array is significantly better than that of the ULA. The estimated RMSE of the ULA increases significantly when the number of sources is greater than 6 and exceeds 1° when the source number is greater than 15, while the RMSE of the sparse array is less than 0.6° when the number of sources is less than 9 and around 0.7° when the number of sources is greater than 13. Combined with Figure 6a,b, it can be seen that the proposed algorithm still performs well in source number estimation and DOA estimation at a low snapshot number and low SNR and can meet practical requirements.

Discussion
In the field of array signal processing, the source number is usually a prerequisite for estimating signal-related parameters. In order to simplify the joint source number and DOA estimation problem and improve estimation accuracy, a joint source number and DOA estimation method based on an improved convolutional neural network is proposed in this paper. The model discards the pooling layer of the traditional convolutional neural network to fully retain and extract the features of the covariance matrix and introduces the dropout method in the fully connected layer to improve the generalization ability of the model. Simulation experiments show that the algorithm is not restricted by the type of array and can achieve joint estimation of the source number and DOA for both sparse arrays and ULAs with high estimation accuracy. When the number of sources is large, the classical algorithms are difficult to apply, but the proposed algorithm still achieves high estimation accuracy, and it does so even under low-SNR and small-snapshot conditions. The analysis of the model and the results shows that, in the signal model, the larger the number of snapshots, the closer the sampled signal is to the real signal and the more accurate the feature extraction, while the higher the SNR, the less the signal is disturbed by noise and the easier it is to extract effective features. Thus, in both cases the angle estimation accuracy of the proposed algorithm is higher, and as the SNR and the number of snapshots further increase, the gap between the proposed algorithm and the traditional algorithms gradually narrows.
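The structural choices recapped above (no pooling layer, dropout in the fully connected stage, and two output branches) can be illustrated with a toy forward pass. This is a shape-level sketch with random, untrained weights; the channel counts, kernel size, array size N = 8, and maximum source count K_MAX = 18 are assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)


def conv2d_relu(x, w):
    """Naive 'valid' 2-D cross-correlation followed by ReLU; note the
    absence of any pooling step, as in the model described above."""
    kh, kw = w.shape[1:]
    H, W = x.shape
    out = np.empty((w.shape[0], H - kh + 1, W - kw + 1))
    for c in range(w.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[i:i + kh, j:j + kw] * w[c])
    return np.maximum(out, 0.0)


N, K_MAX = 8, 18                        # array size and max source count (assumed)
x = rng.standard_normal((N, 2 * N))     # N x 2N real/imag covariance features
h = conv2d_relu(x, rng.standard_normal((4, 3, 3))).ravel()  # 4 channels, 3x3 kernel
# dropout in the fully connected stage (training-time inverted dropout, p = 0.5)
keep = rng.random(h.size) > 0.5
h = h * keep / 0.5
out1 = rng.standard_normal((K_MAX, h.size)) @ h   # branch 1: source-number logits
out2 = rng.standard_normal((K_MAX, h.size)) @ h   # branch 2: DOA slots (filler marks invalid)
```

The two branches share the convolutional features and diverge only at the output heads, which is what allows the source number and the (variable-length, filler-padded) DOA vector to be estimated jointly from one input.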
On the one hand, the traditional algorithms are more directly influenced by the SNR and the number of snapshots; on the other hand, the proposed method is limited by the fact that its input is not the complete complex signal covariance matrix but an N × 2N matrix in which the real and imaginary parts are concatenated to facilitate model training. This also affects feature extraction to a certain extent, so there is a limit to the achievable estimation accuracy. In the case of low SNR and small snapshots, the network model estimates by extracting and combining features layer by layer, which is less affected by low SNR and small snapshots than the traditional angle estimation based on eigendecomposition; thus, its estimation accuracy is higher than that of the traditional algorithms under these conditions. However, limited by the degrees of freedom of the array itself, the performance of the proposed algorithm degrades significantly after the number of sources exceeds a certain range. Moreover, when the number of sources is large, the probability of a small angular interval between sources increases, and a small angular interval tends to increase the estimation error. Thus, in applications where the spatial targets are dense or the number of sources is very large, the algorithm proposed in this paper is suitable as an auxiliary reference. In future research, making fuller use of the signal covariance matrix in order to extract features more effectively and achieve higher-accuracy DOA estimation will be one of the research priorities.
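The N × 2N input described above can be built from the snapshots roughly as follows; the function name and the simple max-abs normalization are assumptions added for a runnable sketch, not details given in the paper.

```python
import numpy as np


def cnn_input_from_snapshots(X):
    """Sample covariance of snapshots X (N x T), with its real and
    imaginary parts concatenated side by side into an N x 2N real
    matrix, as described in the text for the network input."""
    N, T = X.shape
    R = X @ X.conj().T / T                              # N x N complex covariance
    feat = np.concatenate([R.real, R.imag], axis=1)     # N x 2N real matrix
    return feat / np.max(np.abs(feat))                  # assumed scale normalization
```

Splitting the complex covariance into two real blocks keeps the input compatible with standard real-valued convolutions, which is the trade-off noted above: training is simpler, but some phase structure is only implicitly available to the network.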

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are not drawn from publicly available datasets but were obtained by simulating the signal models described in the article.