FFSCN: Frame Fusion Spectrum Center Net for Carrier Signal Detection

Abstract: Carrier signal detection is a complicated and essential task in many domains because it demands a quick response to the existence of several carriers in the wideband while also precisely predicting each carrier signal's frequency center and bandwidth, for both single-carrier and multi-carrier modulation signals. Multi-carrier modulation signals, such as FSK and OFDM, can be incorrectly recognized as several single-carrier signals by the spectrum center net (SCN) or FCN-based methods. This paper designs a deep convolutional neural network (CNN) framework, called frame fusion spectrum center net (FFSCN), for multi-carrier signal detection; it fuses the features of multiple consecutive frames of the broadband power spectra and estimates the parameters of each single-carrier or multi-carrier modulation signal in the wideband, and comprises three variants: FFSCN-R, FFSCN-MN, and FFSCN-FMN. FFSCN consists of three base parts: a deep-CNN-based backbone, a feature pyramid network (FPN) neck, and a regression network (RegNet) head. FFSCN-R and FFSCN-MN fuse the FPN output features and use the Residual and MobileNetV3 backbones, respectively, with FFSCN-MN costing less inference time. To further reduce the complexity of FFSCN-MN, the designed FFSCN-FMN modifies the MobileNet blocks and fuses the features at each block of the backbone. The multiple consecutive frames of broadband power spectra not only preserve the high frequency resolution of the wideband, but also add features of the signal changes in the time dimension. Extensive experimental results demonstrate that the proposed FFSCN can effectively detect multi-carrier and single-carrier modulation signals in the broadband power spectrum and outperforms SCN in accuracy and efficiency.


Introduction
Carrier signal detection in the wideband is usually the first and most vital step of blind communication signal processing. Accurate carrier signal detection in the wideband is a prerequisite for subsequent analyses such as sub-carrier signal demodulation and channel decoding.
Similar to primary signal detection in cognitive radio (CR) [1], carrier signal detection often requires the timely and precise detection of all sub-carrier signals in a non-cooperative wideband signal, which can be formulated as the following hypothesis test [2]:

H_0: Y(n) = W(n)
H_1: Y(n) = ∑_{i=1}^{M} S_i(n) + W(n)

where Y(n) denotes the received non-cooperative wideband signal, S_i(n) is the i-th sub-carrier signal, M denotes the number of sub-carrier signals in the received wideband signal, W(n) denotes the received noise, which can be modeled as zero-mean additive white Gaussian noise (AWGN), and H_0 and H_1 denote the hypotheses of the absence and the presence, respectively, of sub-carrier signals in the received wideband signal.
The remainder of this paper is organized as follows. We start with a discussion of related work in Section 2. Section 3 introduces the details of the proposed method. In Section 4, the experimental dataset, training setup, evaluation metrics, results, and some ablation studies are given. Finally, Section 5 concludes the paper.
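The hypothesis model above can be sketched numerically. The following NumPy snippet (the function name `received_wideband` is ours, not from the paper) builds Y(n) as a sum of sub-carrier signals plus complex AWGN; passing an empty list of sub-carriers corresponds to H_0:

```python
import numpy as np

def received_wideband(subcarriers, noise_power=1.0, seed=0):
    """Form Y(n) under H1: the sum of M sub-carrier signals S_i(n) plus
    zero-mean complex AWGN W(n).  An empty list gives H0 (noise only)."""
    rng = np.random.default_rng(seed)
    n = len(subcarriers[0]) if subcarriers else 1024
    # Complex zero-mean AWGN with total power `noise_power`.
    w = np.sqrt(noise_power / 2) * (rng.standard_normal(n)
                                    + 1j * rng.standard_normal(n))
    y = w.copy()
    for s in subcarriers:          # H1: add each sub-carrier S_i(n)
        y += s
    return y

# Example: two complex-exponential "carriers" at different frequencies.
n = np.arange(1024)
s1 = np.exp(2j * np.pi * 0.10 * n)
s2 = np.exp(2j * np.pi * 0.25 * n)
y = received_wideband([s1, s2], noise_power=0.1)
```

Real sub-carriers would of course be modulated (BPSK, 2FSK, OFDM, etc.); the complex exponentials here only stand in for the signal terms of the model.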

Related Work
Artificial intelligence (AI) technologies, especially deep learning techniques, have now been applied in many areas, such as computer vision (CV), speech recognition, and natural language processing (NLP) [16]. Furthermore, in the wireless communication field, many researchers have performed considerable exploration of deep learning and its application [17][18][19], as well as communicational signal detection problems [20][21][22].
Inspired by fully convolutional networks (FCNs) [23,24] applied in two-dimensional (2D) semantic segmentation, References [12,13] used an FCN-based model consisting of an encoder and a decoder for carrier signal detection in the broadband power spectrum. The FCN-based methods cannot correctly distinguish the demarcation points when two or more neighboring sub-carriers are very close. Moreover, they require substantial post-processing, and their performance degrades severely as the signal-to-noise ratio (SNR) decreases. Reference [14] proposed SCN, an end-to-end deep-learning-based CNN model for carrier signal detection in the broadband power spectrum. SCN treats carrier signal detection as a 1D object localization problem and regresses each sub-carrier's frequency center (FC) and bandwidth (BW) in the broadband power spectrum. It achieved better performance than the FCN-based methods, but cost much more inference time due to its complex computation.
In the past few years, many researchers have engaged in designing small deep neural network architectures for an optimal trade-off between accuracy and efficiency, such as the Xception network [25], SqueezeNet [26], ShuffleNet [27], CondenseNet [28], ShiftNet [29], and the MobileNet series [30][31][32]. Among these, MobileNetV1 [30] employs depthwise separable convolution to substantially improve computational efficiency. MobileNetV2 [31] expands on this by introducing a resource-efficient block with inverted Residuals and linear bottlenecks. Moreover, MobileNetV3 [32] uses a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, subsequently improved through novel architecture advances.
In this study, we propose the FFSCN models. As an upgrade to SCN, we replaced the ResNet backbone with a MobileNetV3 base backbone to reduce network computation; moreover, we incorporated a frequency center (FC) shift regression in RegNet to correct the FC prediction. In particular, we created a fusion block based on the MobileNetV3 block, which is utilized in the MobileNet backbone. Extensive experimental results demonstrated that the proposed FFSCN can effectively detect multi-carrier and single-carrier modulation signals in the broadband and outperform current deep-learning-based approaches in accuracy and efficiency.

Data Preprocessing
In this study, we employed multiple consecutive frames of the broadband power spectra as the input of the proposed FFSCN for carrier signal detection. The Welch method [33,34] was used to obtain the broadband power spectrum.
First, the N-point frame of the received signal sequence Y(n) is subdivided into K overlapping segments, each of length M. The l-th data segment can be represented as

Y_l(n) = Y(n + lD),  n = 0, 1, ..., M − 1,  l = 0, 1, ..., K − 1

where lD is the starting point of the l-th data segment and M − D is the overlap between two neighboring segments. Then, each segment is multiplied by a window function w(n) of length M before computing the periodogram:

P_l(ω) = (1 / (M·U)) | ∑_{n=0}^{M−1} Y_l(n) w(n) e^{−jωn} |^2

where ω is the frequency of the received signal and U is a normalization factor for the power in the window function, selected as

U = (1 / M) ∑_{n=0}^{M−1} w^2(n)

The Welch power spectrum estimate is the average of the K modified periodograms, namely

P̂(ω) = (1 / K) ∑_{l=0}^{K−1} P_l(ω)

Next, the network input is a matrix formed by stacking r consecutive frames of the broadband power spectra; furthermore, a logarithmic transformation converts power to decibels, which scales the numerical range of the spectra. Finally, we adopted zero-mean normalization to normalize the network input matrix:

P_norm = (P − P̄) / σ(P)

where P̄ and σ(P) are the mean and standard deviation of all elements of the matrix P, respectively.
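The preprocessing chain above (Welch averaging, dB conversion, zero-mean normalization) can be sketched in NumPy. The helper names and the default segment sizes below are illustrative, not the paper's exact configuration:

```python
import numpy as np

def welch_psd(y, seg_len=256, overlap=128, nfft=512):
    """Welch power spectrum: average of windowed, overlapped periodograms."""
    w = np.hanning(seg_len)
    U = np.sum(w ** 2) / seg_len             # window power normalization factor
    hop = seg_len - overlap                  # D = M - overlap
    n_seg = (len(y) - overlap) // hop        # K segments
    periodograms = []
    for l in range(n_seg):
        seg = y[l * hop: l * hop + seg_len] * w
        spec = np.fft.fft(seg, nfft)
        periodograms.append(np.abs(spec) ** 2 / (seg_len * U))
    return np.mean(periodograms, axis=0)

def make_network_input(y, r=10, frame_len=3200, **kw):
    """Stack r consecutive frames of log-power spectra, then zero-mean normalize."""
    frames = [welch_psd(y[i * frame_len:(i + 1) * frame_len], **kw)
              for i in range(r)]
    P = 10.0 * np.log10(np.stack(frames) + 1e-12)    # power -> dB
    return (P - P.mean()) / P.std()                  # zero-mean normalization

rng = np.random.default_rng(1)
y = rng.standard_normal(32000) + 1j * rng.standard_normal(32000)
X = make_network_input(y, r=10, frame_len=3200, seg_len=256, overlap=128, nfft=512)
```

After normalization, the stacked input matrix has zero mean and unit standard deviation by construction, which stabilizes training regardless of the absolute signal power.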

FFSCN Architecture
FFSCN was built on the SCN model and consists of three main parts: the deep-CNN-based backbone, the FPN neck, and the RegNet head, as Figure 1 shows. In this work, to accurately detect single-carrier and multi-carrier signals, we used multiple consecutive frames of the broadband power spectra in the wideband to replace the single-frame input of SCN. The backbone and FPN neck therefore extract more valuable features of the sub-carriers, and our aim was to fuse these features, which is the key difference between the FFSCN and SCN methods.
In FFSCN-R, we added an adaptive average pooling layer between the FPN neck and the RegNet head to fuse the output features of the FPN neck. However, compared to SCN, the multi-frame input requires significantly more processing in the Residual backbone and the FPN neck, which reduces the network's inference speed and prevents it from responding quickly to burst signals in the wideband. Therefore, FFSCN-MN replaces the Residual backbone with the MobileNetV3 backbone to reduce the computational load of the network. FFSCN-R and FFSCN-MN both fuse the features before RegNet. Although this is an effective solution for multi-carrier signal detection, it still spends too much time in the backbone and the FPN neck. To further improve the network's performance, we modified the MobileNetV3 backbone and propose FFSCN-FMN, which fuses the multi-frame input features in all the blocks of the Fusion-MN backbone. Moreover, FFSCN-FMN reduces the network complexity and improves the detection performance.

Network Backbones
The backbone network is the foundation of a deep learning object detection model. Figure 2 shows the basic block architectures of the three deep-CNN-based backbone networks used in this work.
First, FFSCN-R and SCN use the same Residual backbone network, which is adapted from the deep residual network (ResNet) [35]; we added a simplified channel attention module (S-CAM) [36] before the last nonlinear activation of the Residual block. The specification of the Residual backbone and block is elaborated in SCN [14].
Then, by replacing the Residual backbone with the MobileNetV3 backbone, we propose the FFSCN-MN model; Figure 2b shows the MobileNetV3 block structure. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. MobileNetV3 adds Squeeze-and-Excite [37] to the inverted Residual with linear bottleneck of the MobileNetV2 block, and uses the hard-swish nonlinear activation to speed up inference [32,38]. The hard-swish function is as follows:

h-swish(x) = x · ReLU6(x + 3) / 6

Next, in FFSCN-FMN, we added two adaptive average pooling layers at the beginning and the end of the MobileNetV3 block, forming the Fusion block, as illustrated in Figure 2c. After adding the two adaptive average pooling layers, the features of the consecutive frames of inputs are no longer kept separate, but are fused into a whole. Meanwhile, because the first adaptive average pooling layer fuses the multiple input frames into one frame, the Fusion block also requires less computation than the MobileNetV3 block.

However, the original MobileNet only downsamples the input five times, which is not enough to extract useful features for the carrier detection task, according to the experiments in SCN [14]. Therefore, we added several Fusion blocks to increase the number of downsampling stages; all added Fusion blocks use a stride of 2 and the hard-swish activation. The specification of the FFSCN-FMN backbone network is shown in Table 1; FFSCN-MN uses the same structure, but with MobileNetV3 blocks as the operator.
In SCN, we found that the performance of SCN-11× is almost comparable to that of SCN-13× while its inference time is shorter [14]. Therefore, the downsampling times of our proposed FFSCN models were set to 11.

The FPN Neck
In this work, as Figure 3a shows, an adaptive average pooling layer was added at the end of the original FPN neck of [14]; it fuses the features of the multiple consecutive broadband power spectra, and we still refer to this part as the FPN neck.

The RegNet Head
Figure 4 shows the RegNet head of this work; compared with the original SCN, we added a frequency center (FC) shift regression branch. Because the FC prediction is regressed at 1/4 of the input length, a shift bias exists when the target FC point is set as an integer. The FC shift regression corrects this bias; it has the same structure as the FC regression and BW regression branches: a depthwise separable convolutional layer [25] with 256 channels, a rectified linear unit (ReLU) [39], and a 1 × 1 Conv with one channel.
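The depthwise separable convolution that each regression branch begins with factors a standard convolution into a per-channel depthwise filter followed by a 1 × 1 pointwise mix. A minimal 1D NumPy sketch (names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def depthwise_separable_conv1d(x, dw_kernels, pw_weights):
    """1D depthwise separable convolution ('same' padding, stride 1):
    a per-channel depthwise conv followed by a 1x1 pointwise conv.
    x: (C_in, L); dw_kernels: (C_in, K); pw_weights: (C_out, C_in)."""
    c_in, length = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # Depthwise: each channel is filtered by its own kernel (true convolution).
    dw = np.stack([np.convolve(xp[c], dw_kernels[c], mode="valid")[:length]
                   for c in range(c_in)])
    # Pointwise: the 1x1 conv mixes channels.
    return pw_weights @ dw

x = np.random.default_rng(0).standard_normal((8, 64))
dw = np.random.default_rng(1).standard_normal((8, 3))
pw = np.random.default_rng(2).standard_normal((256, 8))
out = depthwise_separable_conv1d(x, dw, pw)   # -> shape (256, 64)
```

For C_in = 8, C_out = 256, k = 3, this factorization costs 8·3 + 256·8 = 2072 weights versus 256·8·3 = 6144 for a standard convolution, which is why the MobileNet-style heads and backbones are cheaper.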


FFSCN Targets and Loss Function
In this work, the proposed FFSCN regresses three sets of predictions: the power spectrum distribution (PSD) prediction for all sub-carrier FC positions and the corresponding BW and FC shift bias predictions. Here, the PSD and BW predictions are the same as those in the original SCN [14], and the loss functions are formulated as follows:

L_psd = −(1/N) ∑_{i=1}^{L} { (1 − P_i)^α log(P_i),  if Y_i = 1;  (1 − Y_i)^β (P_i)^α log(1 − P_i),  otherwise }

Ŵ_k = BW_k / BSW,  L_bw = (1/N) ∑_{k=1}^{N} |Ŵ_k − W_k|

where L_psd and L_bw denote the PSD loss and BW loss, N denotes the number of sub-carriers in the power spectra input, L denotes the input spectrum length, α and β are hyper-parameters set to 2 and 4, respectively, P_i denotes the score at the i-th point in the predicted PSD, and Y_i denotes the ground-truth PSD. BSW denotes the broadband power spectrum bandwidth; BW_k denotes the k-th sub-carrier bandwidth; Ŵ_k and W_k are the BW ground-truth and prediction, respectively. Let Pos_k be the k-th sub-carrier FC in the input broadband spectrum; the corresponding FC shift ground-truth and the FC shift loss are formulated as

Ŝ_k = Pos_k/4 − ⌊Pos_k/4⌋,  L_shift = (1/N) ∑_{k=1}^{N} |Ŝ_k − S_k|

where Ŝ_k and S_k denote the FC shift ground-truth and prediction, ⌊·⌋ represents rounding down, and L_shift denotes the FC shift loss. Like the BW loss, we applied the L1 loss and only focused on the sub-carriers' center points.
To balance the three losses, we used two constants λ_bw and λ_shift to scale the BW and FC shift losses, respectively. The overall training loss is

L = L_psd + λ_bw L_bw + λ_shift L_shift

where we set λ_bw = 0.01 and λ_shift = 0.1 in all our experiments.
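Assuming the CenterNet-style penalty-reduced focal loss implied by the α = 2, β = 4 setting, the three losses and their weighted sum can be sketched in NumPy as follows (function names are ours, and the helpers operate on already-gathered center-point values rather than full feature maps):

```python
import numpy as np

def psd_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Penalty-reduced focal loss on the predicted PSD heatmap
    (CenterNet-style, matching the alpha=2, beta=4 hyper-parameters)."""
    pred = np.clip(pred, eps, 1 - eps)
    pos = gt == 1
    n = max(pos.sum(), 1)                       # N: number of sub-carriers
    pos_loss = ((1 - pred[pos]) ** alpha * np.log(pred[pos])).sum()
    neg_loss = ((1 - gt[~pos]) ** beta * pred[~pos] ** alpha
                * np.log(1 - pred[~pos])).sum()
    return -(pos_loss + neg_loss) / n

def l1_center_loss(pred, target):
    """L1 loss evaluated only at the sub-carrier center points (BW, FC shift)."""
    return np.abs(pred - target).mean()

def total_loss(psd_p, psd_t, bw_p, bw_t, sh_p, sh_t,
               lam_bw=0.01, lam_shift=0.1):
    return (psd_focal_loss(psd_p, psd_t)
            + lam_bw * l1_center_loss(bw_p, bw_t)
            + lam_shift * l1_center_loss(sh_p, sh_t))
```

A perfect prediction drives all three terms toward zero, while the β-exponent down-weights the penalty near (but not at) ground-truth peaks, which tolerates slightly blurred PSD responses around each sub-carrier center.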

Experiments
We describe the dataset and evaluation metrics in detail, report the experimental results, and compare the performance with other methods to demonstrate the effectiveness of the FFSCN models. Moreover, some ablation studies are shown to shed light on the effects of various design decisions. Table 2 shows detailed information on the dataset used in this work. We used Matlab to generate all 1000 time-domain signals, which are all complex. Each signal's sample rate was 3.2 MHz, and the time duration was 200 ms. Because the time-domain signal is complex, the broadband signal bandwidth equals the sample rate. To demonstrate the effectiveness of FFSCN for multi-carrier modulation signals, we used Matlab to generate both multi-carrier and single-carrier modulation signals, where the multi-carrier modulations were 2FSK and OFDM, and the single-carrier modulations were binary phase-shift keying (BPSK), 16 quadrature amplitude modulation (16-QAM), and Gaussian minimum-shift keying (GMSK). Moreover, for each sub-carrier, the narrow signal bandwidth range was 4~117 kHz, the SNR range was −4~14 dB, and the time duration range was 20~200 ms. When the sub-carrier signal time duration is 200 ms, the signal is called a constant signal; otherwise, it is called a burst signal. We used 3200 time-domain samples to calculate each single-frame broadband power spectrum; the FFT length was set to 16,384, and the window function was the Hanning window. We set the number of consecutive frames of broadband power spectra input in the training phase to 10.

Training Setup
The training setup was mostly the same as for SCN [14]. We implemented our models in the PyTorch [40] library on a machine with 2 NVIDIA GeForce RTX 3080Ti graphics processing units (GPUs) and an Intel(R) Bronze 3204 CPU, running the Ubuntu 20.04 operating system. We used a cosine annealing warm restarts [41] learning rate strategy, with an initial value of 1 × 10^−4, T_0 = 10, T_mult = 2, and a batch size of 16. We used the Adam optimization method [42] to optimize the overall training loss and adopted Dropout [43] prior to RegNet to reduce overfitting. All the FFSCN models were trained for 150 epochs, applying the same data preprocessing steps described in Section 3.

Evaluation Metrics
In accordance with [14], we used the intersection-over-union (IoU) on carriers to decide the correctness of each detected sub-carrier on the broadband power spectrum, as shown and defined in Figure 5. During the evaluation, when a detected sub-carrier's IoU is greater than the IoU threshold, it is counted as a true positive (TP); otherwise, it is counted as a false positive (FP). Furthermore, a false negative (FN) is a ground-truth sub-carrier that is not detected. To quantify and compare the performance of the trained models, we calculate the harmonic mean of the average precision rate (AP) and average recall rate (AR), called the F-Score [44], using the following formula:

F-Score = 2 · AP · AR / (AP + AR)
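The IoU-on-carriers criterion and the F-Score can be computed as in the small sketch below, where a carrier is represented as a (frequency center, bandwidth) pair (the representation and function names are ours):

```python
def carrier_iou(pred, gt):
    """1D IoU between two carriers given as (frequency_center, bandwidth)."""
    p_lo, p_hi = pred[0] - pred[1] / 2, pred[0] + pred[1] / 2
    g_lo, g_hi = gt[0] - gt[1] / 2, gt[0] + gt[1] / 2
    inter = max(0.0, min(p_hi, g_hi) - max(p_lo, g_lo))   # overlap length
    union = (p_hi - p_lo) + (g_hi - g_lo) - inter
    return inter / union if union > 0 else 0.0

def f_score(ap, ar):
    """Harmonic mean of average precision (AP) and average recall (AR)."""
    return 2 * ap * ar / (ap + ar) if ap + ar > 0 else 0.0
```

For example, a predicted carrier centered at 5 with bandwidth 10 against a ground truth centered at 10 with bandwidth 10 overlaps over 5 units of a 15-unit union, giving an IoU of 1/3, below the 0.6 threshold used in the experiments.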

Results
Firstly, to demonstrate the effectiveness of the proposed FFSCN models, we compared their performance with other deep-learning-based methods, including SCN [14], FCN [12], and SigdetNet [13]. As can be seen in Table 3, our proposed FFSCN-FMN model outperformed the other models. The performances of SigdetNet and FCN degraded more than the other models. As the IoU threshold increased, all the models' detection performances degraded, but the proposed FFSCN models performed more robustly than our previous SCN model overall. Moreover, from the table, we also conclude that the Residual backbone performed better than the MobileNetV3 backbone. However, the Fusion MobileNetV3 backbone achieved the best performance, which indicates that multiple-time feature fusion is superior to the one-time fusion strategy.
Then, to further demonstrate the multi-carrier modulation signal detection performance of our proposed FFSCN models, Figure 6 shows the performances of multi-carrier and single-carrier modulation signal detection on the validation dataset. The proposed models outperformed the other methods on both the multi-carrier and single-carrier modulation signal validation datasets. Moreover, the performance of all models degraded as the IoU threshold increased. However, SigdetNet and FCN degraded more severely, especially at lower SNR in multi-carrier modulation signal detection. Therefore, the proposed FFSCN models' predictions were more exact than the others'. Moreover, from Table 3 and Figure 6, FFSCN-FMN achieved better performance than FFSCN-R and FFSCN-MN. In Table 3, F-S denotes the F-Score, and the numbers after AP, AR, and F-S denote the IoU threshold; the downsample times of SCN and FCN were also set to 11.
Figure 6. The multi-carrier and single-carrier modulation signal detection performances of our proposed FFSCN models and other deep-learning-based methods: (a1-a4) all modulation samples' detection performance; (b1-b4) multi-carrier modulation samples' detection performance; (c1-c4) single-carrier modulation samples' detection performance. α denotes the IoU threshold and increases from 0.6 to 0.9 from (1) to (4).
Next, Table 4 shows the complexity comparison between the FFSCN models and the other deep-learning-based methods. Compared to FFSCN-R and SCN, by adopting the MobileNet backbone, the floating-point operations (FLOPs) and inference time cost of FFSCN-MN decreased noticeably. Furthermore, by applying the Fusion FPN neck, the FLOPs and inference time of FFSCN-FMN decreased further than those of FFSCN-MN. Even though the FLOPs of FFSCN-FMN were much larger than those of FCN and SigdetNet, it consumed a comparable inference time. Therefore, FFSCN-FMN improved the efficiency of SCN. Finally, from the performance and complexity comparison, our proposed FFSCN models can effectively detect multi-carrier and single-carrier signals in the broadband power spectrum. Compared to the other deep-learning-based methods, the FFSCN models achieved better detection performance. Moreover, FFSCN-FMN not only achieved the best detection performance, but also cost a comparable inference time to that of the FCN-based methods, dramatically improving on the model complexity of SCN. Meanwhile, SCN, FCN, and SigdetNet use a single broadband power spectrum as the model input and can only detect the frequency locations. As Figures 7 and 8 show, since multiple consecutive frames of the broadband power spectra are the model input, FFSCN can distinguish the burst signals from the constant signals and locate both the frequency and time positions. Note that the time location accuracy is correlated with the number of consecutive and overlapping frames. In Figures 7 and 8, we used 10 consecutive frame inputs without overlap. Compared to the ground truth, our frequency location predictions were rather good, but some errors existed in the time location prediction, especially when the SNR was low.

Impact of the Two Adaptive Pooling Layer Types
The effects of the two different adaptive pooling layer types used in the FFSCN models are depicted in Figure 9. When the sub-carrier SNR was higher than 2 dB, the two types of adaptive pooling layers performed comparably well in the three FFSCN models. However, when the sub-carrier SNR was lower than 2 dB, the models using the adaptive average pooling layer outperformed those using the adaptive maximum pooling layer.

Impact of Downsample Times
In Table 5, we show the performance comparison of different downsample times used in the FFSCN-FMN model. FFSCN-FMN_11× achieved the best AR and F-Score, and FFSCN-FMN_13× achieved the best AP, but its gain was marginal compared to FFSCN-FMN_11×. Furthermore, the multi-carrier and single-carrier modulation signal detection performances of different downsample times in the FFSCN-FMN model are shown in Figure 10. We found that the main reason FFSCN-FMN_13× had the best AP was that it performed better on the single-carrier modulation samples. Considering the bigger gap in AR and F-Score between FFSCN-FMN_13× and FFSCN-FMN_11×, along with the additional downsampling stages, complexity, and inference time cost, we consider 11 downsample times the better choice.

In Table 5, F-S denotes the F-Score, and the numbers after AP, AR, and F-S denote the IoU threshold. We compared the performance of the FFSCN-FMN model using 9, 11, and 13 downsample times.
Figure 10. The multi-carrier and single-carrier modulation signal detection performances of different downsample times used in the FFSCN-FMN model: (a) all modulation samples' detection performance; (b) multi-carrier modulation samples' detection performance; (c) single-carrier modulation samples' detection performance. Here, α denotes the IoU threshold, fixed at 0.6.

Conclusions
This paper introduced the FFSCN-FMN, FFSCN-MN, and FFSCN-R models for carrier signal detection. As an upgrade to SCN, by using multiple frames of the broadband power spectra as the model input rather than one, the model can extract features of the broadband power spectra as the frequencies vary with time, so that it can effectively detect both multi-carrier and single-carrier modulation signals. FFSCN-R adds an adaptive average pooling layer between the FPN neck and RegNet head of SCN. FFSCN-MN replaces the FFSCN-R backbone network with the MobileNetV3 backbone to reduce the complexity of the model. FFSCN-FMN further modifies the MobileNetV3 backbone and FPN neck into the Fusion backbone and Fusion FPN neck to obtain a more lightweight model. Extensive experimental results suggested that the proposed FFSCN models outperformed the other deep-learning-based methods, including SCN, in accuracy and efficiency, and that the FFSCN-FMN model performed the best. As timely detection of burst signals remains a problem, we will engage in solving it in future work.