## 1. Introduction

The essence of signal detection is a binary classification problem: deciding whether an input sample contains a target signal. Its central task is to find characteristics of the target signal that distinguish samples containing only interference from samples in which a target is present. Feature-based signal detection is an active topic in signal processing. Many classic approaches have driven the field, but their shortcomings are also clear. The instantaneous method [1], which uses the instantaneous frequency and amplitude of the signal as detection features, is simple in principle but is sensitive to the signal-to-noise or signal-to-clutter ratio (SNR/SCR). The zero-crossing discrimination method [2] performs well at high SNR/SCR but requires a high sampling rate. The maximum likelihood criterion method [3] can in theory achieve optimal performance under the Bayesian criterion; however, it depends on prior information. The spectral line feature method [4] requires no prior knowledge and is robust, but its applicability is limited by the spectral characteristics of the signal. All of these methods exploit one or more spectral characteristics of the signal, yet they still operate in Euclidean space. Under low SNR or SCR, insufficient data, or a lack of prior information, their detection performance remains unsatisfactory.

In recent years, the manifold of symmetric positive definite (SPD) matrices has attracted considerable attention because of its powerful statistical representations. In medical imaging, such manifolds are used in diffusion tensor magnetic resonance imaging [5]; in computer vision, they are widely used in face recognition [6,7], object classification [8], transfer learning [9] and action recognition in videos [10,11]; they have also been applied to radar signal processing [12,13,14] and sonar signal processing [15]. The representational power of SPD matrices stems from the fact that they lie on a curved Riemannian manifold rather than in a flat vector space. Consequently, methods that ignore this underlying geometry tend to give less than ideal results in practice.

Figure 1 illustrates the difference between the two metrics for a pair of points on the SPD matrix manifold. The blue surface represents the manifold, and A and B are two SPD matrices on it. The red segment represents the Euclidean distance between the two points, and the green segment represents the Riemannian distance. The Euclidean distance is the length of the straight line joining the two points through the ambient space, which leaves the manifold, whereas the Riemannian distance is the length of the shortest path between them along the manifold, that is, the geodesic distance.
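The contrast drawn in Figure 1 can be made concrete numerically. The following is a minimal NumPy sketch comparing the Frobenius (Euclidean) distance with the standard affine-invariant Riemannian distance $\lVert \log(A^{-1/2} B A^{-1/2}) \rVert_F$ associated with [5]; the helper names are our own illustrative choices, not an API from this paper.

```python
import numpy as np

def _sym_fn(mat, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T  # V diag(fn(vals)) V^T

def euclidean_distance(a, b):
    """Straight-line (Frobenius) distance; ignores the manifold geometry."""
    return np.linalg.norm(a - b, ord="fro")

def affine_invariant_distance(a, b):
    """Geodesic distance ||log(A^{-1/2} B A^{-1/2})||_F on the SPD manifold."""
    a_inv_sqrt = _sym_fn(a, lambda v: 1.0 / np.sqrt(v))
    middle = a_inv_sqrt @ b @ a_inv_sqrt
    return np.linalg.norm(_sym_fn(middle, np.log), ord="fro")

A = np.diag([1.0, 1.0])
B = np.diag([100.0, 1.0])
print(euclidean_distance(A, B))         # 99.0
print(affine_invariant_distance(A, B))  # about 4.605 (= ln 100)
```

Scaling both matrices by the same positive constant multiplies the Euclidean distance by that constant but leaves the affine-invariant distance unchanged, which is one reason the Riemannian metric is better matched to covariance-type data.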

To make full use of the Riemannian geometry of SPD matrix manifolds, a series of studies have been carried out on nonlinear metrics and matrix learning. Because the Euclidean metric is not well suited to SPD matrix manifolds, several non-Euclidean metrics have been introduced, such as the affine-invariant metric [5] and the log-Euclidean metric [16,17]. Building on these Riemannian metrics, researchers have proposed a series of methods that work with the underlying manifold, for example, flattening the SPD matrix manifold by tangent-space approximation [18], mapping it into a reproducing kernel Hilbert space [19], and learning a mapping from the original SPD matrix manifold to another SPD matrix manifold that keeps the SPD structure and so preserves the original geometry [20].
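As a minimal sketch of how the log-Euclidean metric [16,17] and tangent-space flattening [18] work in practice, the code below maps SPD matrices to symmetric matrices with the matrix logarithm and compares them there. The function names and the $\sqrt{2}$ weighting of off-diagonal entries are illustrative choices on our part, not taken from the cited works.

```python
import numpy as np

def spd_log(mat):
    """Matrix logarithm of an SPD matrix (maps it into the space of symmetric matrices)."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_distance(a, b):
    """Log-Euclidean distance d_LE(A, B) = ||log(A) - log(B)||_F."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")

def tangent_vector(mat):
    """Flatten log(mat) into a Euclidean feature vector.

    Only the upper triangle is kept; off-diagonal entries are scaled by sqrt(2)
    so that the vector's 2-norm equals the Frobenius norm of log(mat).
    """
    log_mat = spd_log(mat)
    rows, cols = np.triu_indices(mat.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_mat[rows, cols]

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
d_le = log_euclidean_distance(A, B)
d_vec = np.linalg.norm(tangent_vector(A) - tangent_vector(B))
print(np.isclose(d_le, d_vec))  # True
```

Once flattened this way, ordinary Euclidean tools (linear classifiers, fully connected layers) can be applied to the vectors while still respecting the log-Euclidean geometry.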

Regarding deep learning, ref. [20] proposed a Riemannian network for SPD matrix learning (SPDnet) that combines a neural network architecture, the structure-preserving mapping between SPD matrix manifolds and the log-Euclidean metric. This approach outperformed both traditional neural networks and shallow SPD matrix learning methods. Ref. [21] proposed deep manifold learning of SPD matrices and applied it to face recognition. Ref. [22] proposed several functional improvements to the deep SPD matrix learning network and applied it to action recognition. In ref. [11], SPDnet was shown to be robust and to achieve high accuracy even when training data are insufficient. Building on deep SPD matrix learning and convolutional neural networks, ref. [23] proposed the SPD aggregation network. The combination of deep learning and Riemannian manifolds can therefore be expected to become an important direction in matrix learning.
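To show what an SPD matrix deep learning network computes, the following is a forward-pass sketch of the three layer types described for SPDnet [20]: BiMap ($W X W^{\top}$ with a row-orthonormal $W$), ReEig (eigenvalue rectification) and LogEig (matrix logarithm). It is based on our reading of that design, not on the authors' code; the weights are random, the final classification layer is omitted, and training (which requires optimization on the Stiefel manifold) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def bimap(x, w):
    """BiMap layer: W X W^T, with W (d_out x d_in) having orthonormal rows; output stays SPD."""
    return w @ x @ w.T

def reeig(x, eps=1e-4):
    """ReEig layer: clamp eigenvalues from below at eps (a ReLU-like nonlinearity for SPD matrices)."""
    vals, vecs = np.linalg.eigh(x)
    return (vecs * np.maximum(vals, eps)) @ vecs.T

def logeig(x):
    """LogEig layer: matrix logarithm, mapping the SPD output into a flat (Euclidean) space."""
    vals, vecs = np.linalg.eigh(x)
    return (vecs * np.log(vals)) @ vecs.T

def random_stiefel(d_out, d_in):
    """Random d_out x d_in matrix with orthonormal rows (a point on the Stiefel manifold)."""
    q, _ = np.linalg.qr(rng.standard_normal((d_in, d_out)))  # q: d_in x d_out, orthonormal columns
    return q.T

def spdnet_features(x, weights, eps=1e-4):
    """Stacked BiMap + ReEig blocks, then LogEig and flattening into a feature vector."""
    for w in weights:
        x = reeig(bimap(x, w), eps)
    return logeig(x).ravel()

# Toy 16 x 16 SPD input, reduced 16 -> 8 -> 4 by two BiMap layers.
m = rng.standard_normal((16, 32))
x0 = m @ m.T / 32 + 1e-3 * np.eye(16)
weights = [random_stiefel(8, 16), random_stiefel(4, 8)]
features = spdnet_features(x0, weights)
print(features.shape)  # (16,)
```

In a full network, the flattened LogEig output would feed a fully connected layer and softmax, so the Euclidean machinery is only used after the geometry-aware layers.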

In this work, we propose a spectrum-based SPD matrix signal detection method built on a deep neural network. The method uses an SPD matrix deep learning network together with two matrix models: a spectral covariance SPD matrix and a spectral transformation SPD matrix. It recasts signal detection as a binary classification problem on a nonlinear matrix manifold. Applying nonlinear metrics to signal detection allows the differences between SPD matrices to be captured more faithfully than with the Euclidean metric. Under low SCR conditions, our method outperforms spectral signal detection based on a convolutional neural network.
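The construction of the spectral covariance SPD matrix is given with the method itself in Section 3. Purely as a hypothetical illustration of how a signal can be turned into an SPD matrix, the sketch below treats the windowed magnitude spectrum of each short frame as one observation and takes the sample covariance across frames, adding a small ridge $\varepsilon I$ to guarantee strict positive definiteness. The function name, frame length, hop, number of bins and $\varepsilon$ are all assumptions, and this is not necessarily the authors' construction.

```python
import numpy as np

def spectral_covariance_spd(signal, frame_len=64, hop=32, n_bins=16, eps=1e-6):
    """Hypothetical spectral covariance SPD matrix.

    Each frame's magnitude spectrum (first n_bins rFFT bins) is one observation;
    the n_bins x n_bins sample covariance over frames is regularized with eps * I.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))[:, :n_bins]
    cov = np.cov(spectra, rowvar=False)  # rows are observations (frames), columns are bins
    return cov + eps * np.eye(n_bins)

rng = np.random.default_rng(1)
t = np.arange(1024)
clutter = rng.standard_normal(t.size)                 # stand-in for a clutter-only sample
target = clutter + 0.5 * np.sin(2 * np.pi * 0.1 * t)  # same clutter plus a weak sinusoidal target
c_clutter = spectral_covariance_spd(clutter)
c_target = spectral_covariance_spd(target)
print(c_clutter.shape, np.linalg.eigvalsh(c_target).min() > 0)  # (16, 16) True
```

Matrices built this way could then be compared with a Riemannian metric or passed to an SPD network, which is the setting the proposed method works in.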

The remainder of this paper is organized as follows. In Section 2, we explain why the theory of convolutional neural networks rests on Euclidean space, describe spectral signal detection based on a convolutional neural network and focus on two classical convolutional neural networks. In Section 3, we explain why the Euclidean metric cannot be used for SPD matrices and introduce the spectral SPD matrix signal detection method based on a deep neural network. In Section 4, we first use a simulated dataset based on K-distributed clutter to evaluate the proposed spectrum-based SPD matrix signal detection method and explore the influence of its hyperparameters, and then use a semi-physical simulation dataset built on real sea clutter to compare the performance of the deep-neural-network-based signal detection methods. Finally, Section 5 concludes the paper.

## 2. Spectral Image-Based Signal Detection with a Deep Neural Network

The signal detection problem can be transformed into a binary classification problem on spectrograms, so methods developed for image classification also apply to signal detection. A spectrogram stored as a 2-D image is typical Euclidean-structured data [24]. The strong feature extraction ability of convolutional neural networks gives them an advantage in image classification and recognition. Pixels in image data are statistically correlated, and the convolution operation can extract and strengthen the correlations among global or local pixels. Because Euclidean-structured data are regularly arranged, the difference between two samples can be measured with the Euclidean metric. The Euclidean distance in $n$-dimensional space is

$$d\left(x,y\right)=\sqrt{\sum_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}},$$

where $d\left(x,y\right)$ denotes the Euclidean distance between two $n$-dimensional points $x=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ and $y=\left({y}_{1},{y}_{2},\dots ,{y}_{n}\right)$.

For an image sample with $N$ pixels, flattening it into a vector makes it a point in $N$-dimensional Euclidean space, with each original pixel value serving as one coordinate. The difference between two samples then becomes the distance between two points in $N$-dimensional Euclidean space. The fully connected layer in deep learning is built on this idea.
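The flattening argument can be illustrated with a small sketch: two toy 4 × 4 arrays (stand-ins for spectrogram images, chosen only for illustration) are flattened into points in 16-dimensional space, compared with the Euclidean distance above, and fed to a fully connected layer $Wx+b$. The weight matrix, bias and the two-class output are assumptions for illustration.

```python
import numpy as np

def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for two n-dimensional points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Two toy 4 x 4 "spectrogram images" with N = 16 pixels each.
img_a = np.zeros((4, 4))
img_b = np.zeros((4, 4))
img_b[1, 2] = 3.0
img_b[3, 0] = 4.0

# Flatten each image into a point in 16-dimensional Euclidean space.
vec_a, vec_b = img_a.ravel(), img_b.ravel()
print(euclidean_distance(vec_a, vec_b))  # 5.0

# A fully connected layer operates on the same flattened vector: y = W x + b.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, vec_b.size))  # 2 outputs, e.g. "interference only" vs "target present"
b = np.zeros(2)
logits = W @ vec_b + b
```

Note that every pixel contributes to the distance and to the layer output on an equal, coordinate-wise footing, which is exactly the Euclidean assumption discussed in the text.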

Building on these properties, convolutional neural networks have become well established. Since AlexNet [25] first appeared, researchers have developed a series of convolutional neural networks, among which GoogLeNet [26] and ResNet [27] are classic models.

#### 2.1. GoogLeNet

GoogLeNet is a type of convolutional neural network proposed by Google in 2014 that won first place in the ILSVRC2014 competition. GoogLeNet has an improved sparse network design, called Inception. Inception has steadily been improved across multiple versions (V1, V2, V3 and V4). Here, we introduce the Inception V1 version used by GoogLeNet.

The Inception module of GoogLeNet was inspired by a simplified model of neurons in the human brain. Its structure is shown in Figure 2. Three parallel convolutional layers (blue blocks) extract features from the input image, and a pooling layer (purple blocks) is used to reduce overfitting. Each convolutional layer is followed by a nonlinear activation that increases the nonlinearity of the network. 1 × 1 convolutional layers (orange blocks) reduce the number of feature-map channels and thus the number of parameters. The outputs of all branches are then concatenated. GoogLeNet contains nine Inception modules. During training, it uses two auxiliary classifiers to mitigate the vanishing gradient problem.
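The saving from the 1 × 1 reduction layers is simple arithmetic. The sketch below counts convolution weights (ignoring biases) for a 5 × 5 branch with and without a 1 × 1 reduction, using channel counts that follow the inception (3a) entry of the original GoogLeNet paper; treat the specific numbers as illustrative.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution from c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out

c_in = 192  # input channels to the block

# 5 x 5 branch applied directly vs. preceded by a 1 x 1 reduction to 16 channels.
direct = conv_params(5, c_in, 32)
reduced = conv_params(1, c_in, 16) + conv_params(5, 16, 32)
print(direct, reduced)  # 153600 15872

# Branch outputs are concatenated along the channel axis.
branch_channels = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
print(sum(branch_channels.values()))  # 256
```

In this configuration the reduction cuts the 5 × 5 branch's weights by roughly a factor of ten, which is what lets GoogLeNet stack nine such modules.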

#### 2.2. ResNet

ResNet was proposed in 2015 and won first place at ILSVRC2015. Its key innovation is the residual block, which alleviates the vanishing gradient problem that arises as the number of network layers increases.

ResNet is commonly used in five standard depths: ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. In this study, we use only ResNet50.

The residual block structure of ResNet50 is shown in Figure 3, where $x$ is the output of the previous layer. On entering the residual block, $x$ follows two paths. The first path applies convolution operations and produces $F\left(x\right)$. The second is a shortcut connection that passes the input through unchanged. The output of the residual block is therefore $H\left(x\right)=F\left(x\right)+x$. The bulk of ResNet is made up of many such residual blocks. Rather than learning $H\left(x\right)$ directly, each block only needs to learn the residual $F\left(x\right)=H\left(x\right)-x$; when the identity mapping is close to optimal, training simply drives $F\left(x\right)$ toward 0. As a result, adding layers does not reduce accuracy, which helps the deep network extract target characteristics.
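A minimal NumPy sketch of $H(x)=F(x)+x$ follows. For brevity, the convolutional path is replaced by a hypothetical two-layer fully connected stand-in $F(x)=W_2\,\mathrm{ReLU}(W_1 x)$; the point is only that the shortcut makes the identity map trivially reachable when $F(x)=0$.

```python
import numpy as np

def relu(z):
    """Elementwise rectified linear unit."""
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Residual block H(x) = F(x) + x.

    F(x) = W2 relu(W1 x) is a fully connected stand-in for the convolutional
    path of a real ResNet block. The original ResNet also applies a ReLU after
    the addition; it is omitted here to match H(x) = F(x) + x exactly.
    """
    f = w2 @ relu(w1 @ x)  # residual path F(x)
    return f + x           # shortcut connection adds the input unchanged

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8))
w2 = rng.standard_normal((8, 8))
out = residual_block(x, w1, w2)

# If training drives the residual path to zero, the block reduces to the identity map.
identity_out = residual_block(x, w1, np.zeros((8, 8)))
print(np.allclose(identity_out, x))  # True
```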

The innovations introduced by GoogLeNet and ResNet changed how neural network layers are connected, strongly suppressing the learning degradation caused by deepening the network.