Lightweight CFARNets for Landmine Detection in Ultrawideband SAR

Abstract: The high-resolution image obtained by ultrawideband synthetic aperture radar (UWB SAR) includes rich features such as shape and scattering features, which can be utilized for landmine discrimination and detection. Owing to their high performance and automatic feature learning ability, deep network-based detection methods have been widely employed in SAR target detection. However, existing deep networks do not consider the target characteristics in SAR images, and their structures are too complicated. Therefore, lightweight deep networks with efficient and interpretable blocks are essential. This work investigates how to utilize the SAR characteristics to design a lightweight deep network. The widely employed constant false alarm rate (CFAR) detector is used as a prototype and transformed into trainable multiple-feature network filters. Based on CFAR filters, we propose a new class of networks called CFARNets, which can serve as an alternative to convolutional neural networks (CNNs). Furthermore, a two-stage detection method based on CFARNets is proposed. Compared to prevailing CNNs, the complexity and number of parameters of CFARNets are significantly reduced. The features extracted by CFARNets are interpretable, as CFAR filters have definite physical significance. Experimental results show that the proposed CFARNets achieve detection performance comparable to that of other real-time state-of-the-art detectors, but with faster inference speed.


Introduction
There are many landmines distributed around the world. Landmines are well concealed and highly destructive, greatly reducing the mobility of troops and interfering with civilian activity. Therefore, detecting and eliminating landmines efficiently and quickly is of great significance for both military operations and people's livelihoods. To detect landmines, sensors such as ground-penetrating radars (GPR), which can penetrate the soil, are essential [1,2]. Synthetic aperture radar (SAR) is another powerful sensor for target detection, which is capable of obtaining a high-resolution image of the targets and their surrounding areas. With the development of SAR techniques, including hardware, waveform design, and imaging algorithms, the resolution and imaging quality have been greatly improved in past decades [3][4][5][6]. Ultrawideband synthetic aperture radar (UWB SAR) has become an important means of detecting landmines. Although the UWB SAR image can reach a resolution of centimeters, landmines in the UWB SAR image are small, have weak usable features and a low signal-to-noise ratio (SNR), and are embedded in complicated environments [3,4]. These factors make landmine detection difficult.
The detection of landmines in UWB SAR images shares many traits with the detection of other targets in SAR images, such as vehicles and ships. For example, the processing steps for landmines and other targets are similar [3,4]. The three processing stages proposed by the MIT Lincoln Laboratory, which include detection, discrimination, and classification, can apply to landmines as well [3,4,7,8]. Numerous methods have been proposed for landmine detection in SAR images, which can be classified into traditional detection methods and deep learning-based methods.

Traditional Detection Methods
One of the popular detection methods is the constant false alarm rate (CFAR) detector [3,4,9], which is mainly based on the amplitude divergence between the target and the clutter. The CFAR detector can be viewed as a single-feature method. Although the performance of the CFAR detector is limited in complex clutter scenes, it has two advantages: it is fast to compute and simple to implement. As a result, research on CFAR detectors is still active, and several variants of CFAR detectors have been proposed. One major improvement is to extend the traditional pixel-level CFAR to the superpixel-level CFAR [10][11][12], which can retain the complete information of a target and reduce false alarms. The second improvement is to combine CFAR detectors with deep learning networks [13,14]. In [13], a CFAR detector is used as a preprocessing stage of deep networks. In [14], the CFAR indicative maps are used to guide the classification loss function and feature extraction in deep networks.
As SAR images contain rich target features such as shape and scattering features, features have played an important role in target detection. In addition to the amplitude feature, hand-crafted features are used for landmine detection as well. In [3], the double-hump signature of the landmine in the UWB SAR image was analyzed and extracted, and then used for landmine detection. In [4], the scale-invariant feature transform (SIFT) was applied to the UWB SAR image to extract the landmine's features, and then feature point matching (FPM) was carried out based on the reference image to test whether the region of interest (ROI) is a landmine or clutter. In [15], histogram of oriented gradients (HOG) features were used for landmine detection by GPR. These feature-based methods have been proven to be useful for landmine detection. However, hand-crafted features are susceptible to target and environment variation, leading to performance degradation in complex scenes.

Deep Learning-Based Methods
To overcome the problem of hand-crafted features, researchers have resorted to deep networks that can learn high-level features from the images automatically. Convolutional neural networks (CNNs) have been widely used for SAR target classification due to their high performance and efficient training procedure [7,16,17]. To reduce the overfitting problem, Chen et al. [7] proposed all-convolutional networks (A-ConvNets) without fully connected layers. Kwak et al. [16] investigated the effect of speckle noise on CNN features and introduced regularization in the training procedure to minimize the noise effect. Wang et al. [17] proposed a CNN-based feature fusion method for target discrimination that jointly uses intensity and edge information of SAR images. Technically speaking, deep networks proposed for target classification have no target locating ability and should be combined with external target proposal generation methods such as CFAR or a sliding window. Target proposal generation can be achieved by network modules as well, i.e., networks can be used to jointly locate and identify the targets. Representative methods include SSD, Faster-RCNN, YOLO, and other deep network-based detectors [18,19]. With the development of deep learning techniques, novel network modules and learning paradigms have been introduced in the field of SAR as well, such as the attention mechanism [20,21], transfer learning [22], and semi-supervised learning [23]. It should be pointed out that the above deep networks are designed for other targets, and there is currently no deep learning-based detection method customized for landmines. By carefully choosing the parameters, these methods can be applied to landmine detection with high performance. However, the above deep learning-based methods do not consider the special characteristics of SAR images, and the employed networks are too sophisticated in terms of both network architectures and parameters. Furthermore, the behavior and extracted features of these methods are difficult to explain.

Contribution of This Work
The deficiency of deep networks in SAR target detection encourages the authors to develop novel networks that can utilize the SAR characteristics for landmine detection. We propose a new class of filters, blocks, and networks that can serve as an alternative to those in convolutional neural networks (CNNs) for landmine detection in SAR images. Inspired by the recent development of the CFAR technique, which is implemented via GPU tensor operations [9], the authors re-investigated the possibility of incorporating CFAR in deep networks. It was found that the CFAR detector can be implemented by using tensor operations such as convolution and mean pooling, which not only accelerates the computation but also makes it possible to incorporate CFAR into the network as an automatic differentiation operation. Based on this, different CFAR blocks and deep CFAR networks, which are coined CFARNets, are proposed for landmine detection. The main contributions of this work are as follows: (1) It is found that cell averaging-CFAR (CA-CFAR) can be implemented via mean pooling, which not only accelerates the computation but also makes it possible to use CA-CFAR as an interpretable network module. The rest of this work is organized according to the pipeline of developing novel deep learning-based methods, which follows the filter-block-network-detector order. First, the background and implementation of the CFAR filter are presented in Section 2. Following that, details of the proposed CFAR blocks and deep CFAR networks for target discrimination are provided in Section 3. Section 4 illustrates the two-stage target detection method based on deep CFAR networks, and Section 5 provides the experimental results.

CFAR Filter Based on CA-CFAR
Filters are the basic modules of deep networks. Deep networks are trained by back-propagation. Most deep learning frameworks, such as TensorFlow and PyTorch, use automatic differentiation operations to construct the networks. Newly developed filters should therefore be composed of the tensor operations that these frameworks support. In this work, we would like to use the most widely used detection method, i.e., the CFAR detector, to construct a deep network. Thus, a bridge between the CFAR detector and common tensor operations should be built. In particular, the CA-CFAR detector is used as a prototype and transformed into a trainable network filter.

Review of CA-CFAR
CA-CFAR is widely used for its computational efficiency and easy implementation. The CA-CFAR algorithm uses the samples in the sliding window to estimate the clutter level. The test rule and threshold depend on the clutter model. For simplicity, assume that the clutter follows an exponential distribution. Denoting the cell under test (CUT) by $z(i, j)$, the following rule is used to determine whether it is a target pixel [9]:

$$z(i,j) \underset{H_0}{\overset{H_1}{\gtrless}} \alpha Z(i,j) \tag{1}$$

where $H_0$ and $H_1$ stand for the absence and presence of a target pixel, respectively, and $\alpha = -\ln(P_{FA})$ is the multiplier controlling the probability of false alarms. $Z(i, j)$ is the mean amplitude of the clutter in the sliding window, which is given by

$$Z(i,j) = \frac{1}{N_s}\left[\sum_{m=-\frac{k-1}{2}}^{\frac{k-1}{2}}\sum_{n=-\frac{k-1}{2}}^{\frac{k-1}{2}} z(i+m,j+n) - \sum_{m=-\frac{g-1}{2}}^{\frac{g-1}{2}}\sum_{n=-\frac{g-1}{2}}^{\frac{g-1}{2}} z(i+m,j+n)\right] \tag{2}$$

where $k$ and $g$ are the sizes of the square sliding window and guard area, respectively (for simplicity, $k$ and $g$ are odd numbers), and $N_s = k^2 - g^2$ is the number of samples used in the sliding window. Equation (2) is referred to as the CA operation. A demonstration of the sliding window mode in CA-CFAR is shown in Figure 1.
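To make the test rule and the CA operation concrete, the following is a minimal loop-based sketch in NumPy. It is an illustration under stated assumptions rather than the implementation used in this work: the function name `ca_cfar`, the default sizes, and the zero padding at the image border are all hypothetical choices, and the multiplier is taken as the negative logarithm of the false alarm probability, as appropriate for exponentially distributed clutter.

```python
import numpy as np

def ca_cfar(image, k=9, g=5, pfa=1e-3):
    """Direct (loop-based) CA-CFAR on a 2-D intensity image.

    k, g: odd sizes of the square sliding window and guard area (g < k).
    Returns a boolean map, True where the CUT is declared a target (H1).
    """
    assert k % 2 == 1 and g % 2 == 1 and g < k
    alpha = -np.log(pfa)            # multiplier for exponential clutter
    n_s = k * k - g * g             # number of clutter samples used
    rk, rg = (k - 1) // 2, (g - 1) // 2
    # Assumption: zero padding so that border pixels also get a full window.
    padded = np.pad(image.astype(float), rk)
    rows, cols = image.shape
    detections = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            ci, cj = i + rk, j + rk  # CUT position in the padded image
            window = padded[ci - rk:ci + rk + 1, cj - rk:cj + rk + 1]
            guard = padded[ci - rg:ci + rg + 1, cj - rg:cj + rg + 1]
            # CA operation: window sum minus guard sum, divided by N_s.
            clutter_mean = (window.sum() - guard.sum()) / n_s
            detections[i, j] = image[i, j] > alpha * clutter_mean
    return detections
```

For a flat unit-amplitude image with a single bright pixel, only that pixel exceeds the threshold, because the guard area keeps the bright pixel out of its own clutter estimate.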

CFAR Filter
To speed up CFAR detectors, Ref. [9] suggested that CFAR detectors can be implemented via graphics processing unit (GPU) tensor operations, including tensor convolution, shift, and Boolean operations. Inspired by this idea, we use tensor operations for the CFAR implementation, which not only accelerates the computation but also makes it possible to incorporate CFAR into the network as an automatic differentiation operation.
Equation (2) can be transformed into different equivalent formulas. The first one is

$$Z(i,j) = (W_k * z)(i,j) - (W_g * z)(i,j) \tag{3}$$

where $*$ denotes two-dimensional convolution, and $W_k = \frac{1}{N_s} \cdot \mathbf{1}_{k\times k}$ and $W_g = \frac{1}{N_s} \cdot \mathbf{1}_{g\times g}$ are the kernel matrices of the sliding window and the guard area, respectively. Equation (3) can be implemented by the convolution operation [9]. Another equivalent formula for Equation (2) is

$$Z(i,j) = \frac{k^2}{N_s}\,\mathrm{MeanPool}_k(z)(i,j) - \frac{g^2}{N_s}\,\mathrm{MeanPool}_g(z)(i,j) \tag{4}$$

The two terms in (4) can be computed by averaging over a defined window followed by a multiplier, where the averaging over a defined window can be implemented by the mean pooling operation. The kernel size of the mean pooling for the first term is $k$, while the kernel size for the second term is $g$. Note that it is essential to align the values of the two terms in (3) and (4): the input image should be padded with $\frac{k-1}{2}$ and $\frac{g-1}{2}$ zeros on each side before applying the operations of size $k$ and $g$, respectively. Figure 2 shows a schematic view of these two implementations. The mean pooling-based implementation merely requires additions, which makes it cheaper than the convolution-based implementation. It is adopted to implement the CA operation and used as a basic filter of deep networks. Mean pooling is a standard operation in deep learning frameworks such as PyTorch and TensorFlow. Thus, the mean pooling-based implementation enables the network to incorporate CFAR as an automatic differentiation operation.
Equation (1) can be rewritten as

$$z(i,j) - \alpha Z(i,j) \underset{H_0}{\overset{H_1}{\gtrless}} 0 \tag{5}$$

In the above equation, the left term $z(i, j) - \alpha Z(i, j)$ is the amplitude divergence between the CUT and the clutter level, while the residual term is just a threshold decision that can be replaced by other, more sophisticated operations in the network. Thus, $z(i, j) - \alpha Z(i, j)$ is considered a basic filter for networks, which is referred to as the CFAR filter in the following. Figure 3 shows the computation process of $z(i, j) - \alpha Z(i, j)$, where $\alpha$ is a trainable parameter. To distinguish the CFAR filter from the convolution block, the sizes of the sliding window and the guard area in the CA operation are written together as $k\ g$ (sliding window size followed by guard area size), a notation that reflects the two windows involved in CFAR processing. For deep networks, a tensor $F \in \mathbb{R}^{M\times N\times C}$ is usually used to represent the extracted feature map. If we apply the CFAR filter to each channel separately, the $p$th channel of the output feature map can be written as

$$\tilde{F}_{:,:,p} = F_{:,:,p} - \alpha \bar{F}_{:,:,p} \tag{6}$$

where $0 \le p \le C - 1$ and $\bar{F}_{:,:,p} \in \mathbb{R}^{M\times N}$ is the sliding mean value of $F_{:,:,p} \in \mathbb{R}^{M\times N}$ estimated by CA (the index $(i, j)$ in Equation (5) is omitted). For simplicity, Equation (6) is also denoted as $\tilde{F} = f_{\mathrm{CFAR}}^{k\,g}(F)$.
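The equivalence of the convolution-based and mean pooling-based forms of the CA operation can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, NumPy's `sliding_window_view` stands in for a framework pooling layer, and padded zeros are counted in each mean, which is an assumption about the pooling configuration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_pool(image, size):
    """Stride-1 mean pooling with (size - 1) / 2 zeros padded on each side."""
    pad = (size - 1) // 2
    padded = np.pad(image.astype(float), pad)
    return sliding_window_view(padded, (size, size)).mean(axis=(2, 3))

def ca_mean_pooling(image, k, g):
    """CA operation as two scaled mean-pooling outputs."""
    n_s = k * k - g * g
    return (k * k / n_s) * mean_pool(image, k) - (g * g / n_s) * mean_pool(image, g)

def ca_convolution(image, k, g):
    """CA operation as one correlation with the ring-shaped kernel
    obtained by subtracting the centred guard kernel from the window kernel."""
    n_s = k * k - g * g
    kernel = np.full((k, k), 1.0 / n_s)
    off = (k - g) // 2
    kernel[off:off + g, off:off + g] = 0.0   # guard cells contribute nothing
    pad = (k - 1) // 2
    padded = np.pad(image.astype(float), pad)
    windows = sliding_window_view(padded, (k, k))
    # The kernel is symmetric, so correlation equals convolution here.
    return np.einsum("ijmn,mn->ij", windows, kernel)
```

Both functions return an array of the same size as the input, and they agree to floating-point precision on arbitrary inputs because each evaluates the same window-minus-guard sum.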
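The channel-wise CFAR filter can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: `cfar_filter` and `mean_pool` are hypothetical names, the multiplier is passed as a plain scalar instead of a trainable parameter, and zero padding with padded zeros counted in the mean is assumed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_pool(features, size):
    """Stride-1 mean pooling over the spatial axes of an (M, N, C) tensor."""
    pad = (size - 1) // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)))
    # Windows are appended as the last two axes: (M, N, C, size, size).
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    return windows.mean(axis=(-2, -1))

def cfar_filter(features, k, g, alpha):
    """Channel-wise CFAR filter: input minus alpha times the CA clutter mean."""
    features = np.asarray(features, dtype=float)
    n_s = k * k - g * g
    clutter = (k * k * mean_pool(features, k) - g * g * mean_pool(features, g)) / n_s
    return features - alpha * clutter
```

Because the output is linear in both the input and the multiplier, an automatic differentiation framework can propagate gradients through this operation without any special treatment.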

CFAR Blocks and Deep CFAR Networks
By combining widely used network architectures and filters, one can design blocks and deep networks for target recognition and detection based on the CFAR filter. Two tasks are essential to achieve this. First, network blocks with multiple-feature extraction ability should be designed based on the single-feature CFAR filter. Second, other nonlinear filters should be added to improve the nonlinearity of the network blocks. Deep CFAR networks are composed of several CFAR blocks and classifiers, which are trained on clutter patches and target patches. After training, the deep networks can be used to discriminate between clutter and targets.

CFAR Blocks
Deep networks can be viewed as a combination of multiple nonlinear functions. Nonlinearity is an important reason why deep networks work. However, the CFAR filter z(i, j) − αZ(i, j) is mainly based on sample averaging and subtraction, which is a linear function. To improve the representation ability of the CFAR filter, it is combined with other nonlinear operations. A single-branch CFAR block was first designed based on the CFAR filter and 1 × 1 convolution [24]. Following that, two variants were proposed according to the prevailing micro-architectures in deep networks.
(1) Single-branch CFAR (S-CFAR) Block The incorporation of 1 × 1 convolution and a rectification function can increase the nonlinearity of the network at low computational cost [24]. It is a common choice in network design. Given this, the S-CFAR block is designed by stacking two 1 × 1 convolution layers and the CFAR filter together, as shown in Figure 4a. The 1 × 1 convolution layers are followed by batch normalization and the rectified linear unit (ReLU) activation function. In the S-CFAR block, the 1 × 1 convolution layers and the output features use the same number of channels, which is denoted by $C_o$. Denote the input of the S-CFAR block by the tensor $X \in \mathbb{R}^{M\times N\times C}$, and the kernels of the first and second 1 × 1 convolution layers by the tensors $W^1 \in \mathbb{R}^{C\times C_o}$ and $W^3 \in \mathbb{R}^{C_o\times C_o}$, respectively. The output feature maps of the first 1 × 1 convolution layer, the CFAR filter, and the second 1 × 1 convolution layer are denoted by $F^1$, $F^2$, and $F^3 \in \mathbb{R}^{M\times N\times C_o}$, respectively. The processing step of the S-CFAR block is denoted as

$$F^1 = \max\left(f_{1\times1}(X, C_o), 0\right),\quad F^2 = f_{\mathrm{CFAR}}^{k\,g}(F^1),\quad F^3 = \max\left(f_{1\times1}(F^2, C_o), 0\right) \tag{7}$$

where $f_{1\times1}(\cdot, C_o)$ represents 1 × 1 convolution and batch normalization with $C_o$ channels. According to [24,25], the $p$-th channel of $F^1$ is

$$F^1_{:,:,p} = \max\left(\gamma_p \frac{\sum_{c=0}^{C-1} W^1_{c,p} X_{:,:,c} - \mu_p}{\sigma_p} + \beta_p,\ 0\right) \tag{8}$$

where $0 \le p \le C_o - 1$, $\mu_p$ and $\sigma_p$ are the channel-wise mean and standard deviation of batch normalization, and $\gamma_p$ and $\beta_p$ are the learned scaling factor and bias term, respectively. The CFAR filter is applied to $F^1$ in a channel-by-channel manner. According to Equation (6), the output feature map is

$$F^2_{:,:,p} = F^1_{:,:,p} - \alpha \bar{F}^1_{:,:,p} \tag{9}$$

where $\bar{F}^1_{:,:,p}$ is the mean value of $F^1_{:,:,p}$ estimated by CA. Similar to (8), the $d$-th channel of the final output feature map is

$$F^3_{:,:,d} = \max\left(\gamma_d \frac{\sum_{p=0}^{C_o-1} W^3_{p,d} F^2_{:,:,p} - \mu_d}{\sigma_d} + \beta_d,\ 0\right) \tag{10}$$

where $0 \le d \le C_o - 1$ and the batch normalization parameters are those of the second 1 × 1 convolution layer. From Equations (7)-(10), one can see that different channels of the input are first weighted and combined. The CFAR filter is then applied to obtain the divergence features. Finally, the divergence features of different channels are weighted and combined. The nonlinearity of the S-CFAR block is provided by the ReLU activation function max(•, 0). Compared to the single-feature CFAR filter, the S-CFAR block can extract multiple nonlinear features.
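The forward pass of the S-CFAR block described above can be sketched as follows. This is a minimal inference-time illustration in NumPy under several assumptions: batch normalization uses fixed per-channel statistics, the parameter container `params` and all function names are hypothetical, and zero padding is used in the CA operation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_pool(x, size):
    """Stride-1 mean pooling over the spatial axes of an (M, N, C) tensor."""
    pad = (size - 1) // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    return sliding_window_view(padded, (size, size), axis=(0, 1)).mean(axis=(-2, -1))

def cfar_filter(x, k, g, alpha):
    """Channel-wise CFAR filter: x minus alpha times the CA clutter mean."""
    n_s = k * k - g * g
    clutter = (k * k * mean_pool(x, k) - g * g * mean_pool(x, g)) / n_s
    return x - alpha * clutter

def conv1x1_bn_relu(x, weight, mu, sigma, gamma, beta):
    """1x1 convolution (channel mixing), batch normalization, then ReLU."""
    y = np.einsum("mnc,cp->mnp", x, weight)
    y = gamma * (y - mu) / sigma + beta
    return np.maximum(y, 0.0)

def s_cfar_block(x, params, k, g):
    """Stack: 1x1 conv -> CFAR filter -> 1x1 conv, as in the S-CFAR block."""
    f1 = conv1x1_bn_relu(np.asarray(x, dtype=float), *params["layer1"])
    f2 = cfar_filter(f1, k, g, params["alpha"])
    return conv1x1_bn_relu(f2, *params["layer2"])
```

With all-ones inputs and weights, identity batch normalization, and a multiplier of 0.5, an interior pixel evaluates to 2 after the first layer, 1 after the CFAR filter, and 3 after the second layer, which gives a quick sanity check of the data flow.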
(2) Multiple-branch CFAR Block As shown in Figure 4b,c, the IN-CFAR-I block and the IN-CFAR-II block are designed based on the idea of the Inception module [26]. Each block consists of three branches with $C_o/2$, $C_o/4$, and $C_o/4$ channels. In the IN-CFAR-I block, the first branch is the 1 × 1 convolution, and the second and third branches are S-CFAR blocks with different sliding windows and guard areas. The output features of these three branches are concatenated and fed into the next layer. In the IN-CFAR-II block, the CFAR filter is replaced by the CA operation. The output features of the 1 × 1 convolution and the CA operations are concatenated. Since the 1 × 1 convolution in the next layer weights and combines the features, the divergence features can be viewed as being implicitly modeled.
The output feature map of the IN-CFAR-I block is given by

$$F = \mathrm{Concat}\left(f_{1\times1}(X, C_o/2),\ f_{\text{S-CFAR}}^{k_1\,g_1}(X),\ f_{\text{S-CFAR}}^{k_2\,g_2}(X)\right) \tag{11}$$

where $f_{\text{S-CFAR}}^{\bullet}(\cdot)$ denotes the S-CFAR block, which is implemented via Equations (7)-(10), and the superscripts give the sliding window and guard area sizes of the two S-CFAR branches. The output feature map of the IN-CFAR-II block is given by

$$F = \mathrm{Concat}\left(f_{1\times1}(X, C_o/2),\ f_{\text{S-CA}}^{k_1\,g_1}(X),\ f_{\text{S-CA}}^{k_2\,g_2}(X)\right) \tag{12}$$

where $f_{\text{S-CA}}^{\bullet}(\cdot)$ is similar to the S-CFAR block but with the CFAR filter replaced by the CA operation.

Deep CFAR Networks
Similar to classical deep networks such as A-ConvNets [7] and ResNet [24], the proposed deep CFAR networks are composed of a feature extraction part and a classifier. The feature extraction part contains four stages, as shown in Figure 5. In each stage, a CFAR block and a max pooling layer with stride 2 are used. An average pooling layer and a fully connected layer are used as the classifier, where the number of output channels is 2. The class probabilities of target and clutter are given by the Softmax function. Based on the CFAR blocks, three deep CFAR networks named A-CFARNet, B-CFARNet, and C-CFARNet are proposed. The network specifications are shown in Table 1. In the first stage, these three networks use a large sliding window and a large guard area. In the last three stages, the sizes of the sliding window and guard area are decreased. To keep the network simple, we did not exhaust all the possible sizes. Instead, we mainly use two scales, i.e., one with a large receptive field and one with a small receptive field. In A-CFARNet, 17 9 and 5 3 are used. In B-CFARNet and C-CFARNet, 11 7 and 7 5 are added.
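The spatial bookkeeping of the four stages and the classifier head can be illustrated with a short sketch. It rests on assumptions that the text does not fix: each max pooling layer is taken to halve the spatial size exactly, the average pooling is taken to be global, and the names `stage_sizes`, `softmax`, and `classify` as well as the ordering of the two output classes are hypothetical.

```python
import numpy as np

def stage_sizes(input_size=48, num_stages=4, stride=2):
    """Spatial size after each stage, assuming pooling divides it by the stride."""
    sizes = [input_size]
    for _ in range(num_stages):
        sizes.append(sizes[-1] // stride)
    return sizes

def softmax(logits):
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def classify(features, weights, bias):
    """Classifier head: global average pooling, linear layer, softmax.

    features: (H, W, C) map from the last stage; weights: (C, 2); bias: (2,).
    """
    pooled = features.mean(axis=(0, 1))     # one value per channel
    return softmax(pooled @ weights + bias)
```

For a 48 × 48 input chip, this gives feature maps of size 24, 12, 6, and 3 after the four stages, so the classifier sees a 3 × 3 map per channel before averaging.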

Two-Stage Target Detection Based on Deep CFAR Networks
Target detection mainly includes two stages. First, possible target regions are generated by methods such as sliding window, traditional detection methods such as CFAR, and networks. Second, statistics or classifiers are applied to determine whether the candidates are targets or not. In this work, a two-stage target detection method is proposed by combining CA-CFAR and the proposed CFARNets.

Two-Stage Detection Framework
Figure 6 shows the proposed two-stage target detection framework. For an input SAR image, the CA-CFAR is first applied. Region proposals are generated based on the CFAR results. Next, region proposals are preprocessed and fed into CFARNets to determine whether they are target or clutter. Finally, results fusion is carried out to improve the detection results.

CFAR-Guided Region Proposals
Here, the two-parameter CA-CFAR is used [27]. It is derived based on the assumption that the clutter is Gaussian distributed. The detection rule is given by

$$\frac{z(i,j) - Z(i,j)}{\sigma(i,j)} \underset{H_0}{\overset{H_1}{\gtrless}} \alpha \tag{13}$$

where $Z(i, j)$ and $\sigma(i, j)$ are the estimated mean value and standard deviation of the clutter patch in the sliding window, respectively, and $\alpha$ is the detection threshold, which is determined by the probability of false alarms. The probability of false alarms for a given $\alpha$ is

$$P_{FA} = \frac{1}{\sqrt{2\pi}}\int_{\alpha}^{\infty} e^{-\frac{t^2}{2}}\,dt \tag{14}$$

After CFAR detection, clustering of the detected pixels is carried out based on their distances. Each cluster is represented by the pixel with maximum amplitude, which is denoted by $(x_i^c, y_i^c)$, where $i \in [1, N_0]$ is the cluster index and $N_0$ is the number of clusters (targets). Region proposals with a size of 48 × 48 are generated around the cluster centers.
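The relation between the detection threshold and the false alarm probability under the Gaussian assumption, together with the proposal generation step, can be sketched with the Python standard library. The following is an illustration under stated assumptions: the clustering rule is not specified in the text, so a simple greedy strongest-first grouping with a hypothetical distance parameter `max_dist` is swapped in, and all function names are hypothetical.

```python
import math

def pfa_from_alpha(alpha):
    """Upper tail probability of a standard Gaussian beyond alpha."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

def alpha_from_pfa(pfa, lo=0.0, hi=10.0, iters=80):
    """Invert pfa_from_alpha by bisection (the tail is decreasing in alpha)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pfa_from_alpha(mid) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cluster_peaks(pixels, max_dist=10.0):
    """Greedy stand-in for distance-based clustering.

    pixels: iterable of (x, y, amplitude) for detected pixels.
    Visits pixels from strongest to weakest and keeps one as a new cluster
    representative only if it is farther than max_dist from all kept ones.
    """
    peaks = []
    for x, y, amp in sorted(pixels, key=lambda p: -p[2]):
        if all(math.hypot(x - px, y - py) > max_dist for px, py, _ in peaks):
            peaks.append((x, y, amp))
    return peaks

def proposal_box(x, y, size=48):
    """Square region proposal (x0, y0, x1, y1) centred on a cluster peak."""
    half = size // 2
    return (x - half, y - half, x - half + size, y - half + size)
```

For a false alarm probability of 10^-3, the bisection returns a threshold of about 3.09 standard deviations above the local clutter mean.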

Multi-Crop Classification
Region proposals were resized to 55 × 55, and their center crops with a size of 48 × 48 were fed into the networks. This is referred to as the standard mode. In addition to the standard mode, multi-crop classification, which provides flexible control of detection quality, was employed as well. As shown in Figure 7, for each region proposal, the center region and two random regions of the resized image are cropped out. These three crops are fed into CFARNets, and the classification results are fused. Denote the target probabilities of the three crops by $y_1$, $y_2$, and $y_3$. Two fusion strategies are used.
(1) Eager mode.A region proposal is judged as a target if one of the three crops is a target, i.e., the maximum target probability is larger than 0.5.
(2) Steady mode.A region proposal is judged as a target if the mean target probability of the three crops is larger than 0.5.
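The three decision modes can be written as a small fusion function. This is a minimal sketch in which the function name `fuse` is hypothetical and the first probability in the list is assumed to belong to the center crop.

```python
def fuse(probs, mode="steady", threshold=0.5):
    """Decide target vs. clutter from the crop-wise target probabilities.

    probs: target probabilities of the crops; probs[0] is assumed to be
    the center crop.
    """
    if mode == "standard":
        score = probs[0]                 # center crop only
    elif mode == "eager":
        score = max(probs)               # any crop above the threshold
    elif mode == "steady":
        score = sum(probs) / len(probs)  # mean over the crops
    else:
        raise ValueError(f"unknown mode: {mode}")
    return score > threshold
```

For crop probabilities of 0.3, 0.7, and 0.2, the eager mode declares a target, while the standard and steady modes do not.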

Dataset
The detection performance was evaluated on measured UWB SAR images. Figure 8 shows the four SAR images collected by the airship-mounted UWB SAR (AMUSAR) system [4]. The range resolution and azimuth resolution of the SAR images are 0.15 m × 0.15 m. The illuminated areas mainly contain mountains, farmland, bare soil, and roads. The first two SAR images have almost the same illuminated area but different landmine settings. These images were collected on different flights. The imaging conditions are not identical, nor are the imaging qualities of the landmines.
These four SAR images are divided into image patches with a size of 512 × 512. There are 61 image patches with landmines. These image patches are used to construct the detection dataset, which is denoted as AMUDet. The training set contains 55 image patches, while the test set contains six image patches. Figure 9 shows examples of image patches extracted from these four SAR images. The bright point-shaped targets are landmines. It can be observed that the appearance of these landmines varies as the imaging conditions change. The landmines in the collected SAR images are annotated manually. A summary of AMUDet is provided in Table 2.

Experimental Setup
The performance of the proposed two-stage detection method was evaluated based on AMUDet. The two-parameter CA-CFAR detector was used for region proposal generation.
The sizes of the sliding window and guard area are 63 × 63 and 55 × 55, respectively. The probability of false alarms is set as $10^{-3}$. The deep networks are trained based on image chips of AMUDet. Landmine chips are cropped from images based on the annotated rectangles. For each image patch, 20 randomly extracted regions with a size of 48 × 48 were taken as clutter chips. The processing flow of the experiment is given in Figure 10. For comparison, TinyResNet-18, A-ConvNets48, and Conv1x1Net were used in the proposed two-stage detection method as well. The TinyResNet-18 and A-ConvNets48 are customized to fit the input 48 × 48 image based on ResNet-18 [24] and A-ConvNets [7]. The Conv1x1Net is a stacking of 1 × 1 convolution layers. Details of these three networks are provided in Appendix A. During validation, the shorter edge of the image is resized to 55 with a fixed aspect ratio. Following that, the resized image is cropped and normalized.

Detection Results
(1) The proposed two-stage detection method Detection performances with different modes are shown in Tables 3-5. Since the proposed two-stage detection methods combine CFAR and deep networks, these methods are denoted as CFAR-{networks}. The CFAR detector has a high recall of 98.28%, but its precision and F1 score are low. Equipped with feature learning ability, CFAR-A-ConvNets48, CFAR-TinyResNet-18, CFAR-A-CFARNet, CFAR-B-CFARNet, and CFAR-C-CFARNet have a much larger F1 score due to the increase in precision. Among the three detection modes, the overall F1 score of the steady mode is the highest, while the eager mode has the highest precision, and the standard mode has an intermediate F1 score (recall) and precision. Based on these results, the detection mode can be chosen according to the preference for high recall or high precision. When the standard mode is used, CFAR-C-CFARNet has the highest F1 score, 86.18%, followed by CFAR-A-CFARNet and CFAR-B-CFARNet. When the eager mode is used, CFAR-C-CFARNet has the highest F1 score, 83.46%, followed by CFAR-A-CFARNet, CFAR-TinyResNet-18, and CFAR-B-CFARNet. When the steady mode is used, CFAR-A-CFARNet has the highest F1 score, 87.80%, followed by CFAR-C-CFARNet, CFAR-TinyResNet-18, and CFAR-B-CFARNet. We can also see that the average F1 score of the proposed CFAR-CFARNets is 1.8%, 4.1%, and 2.38% higher than the F1 score of CFAR-A-ConvNets48 in standard mode, eager mode, and steady mode, respectively.
Figure 11 shows the landmine detection results of CFAR-A-CFARNet. There are four missed targets and 11 false alarms. The false alarms are mainly concentrated on the fifth image, which has seven false alarms.
(2) Comparison with other detectors Faster R-CNN [28] and real-time state-of-the-art detectors, including YOLOv3 [29], YOLOX [30], and RTMDet [31], were used for comparison as well. The input sizes of all these detectors are 512 × 512. Their results are shown in Table 6. The F1 scores of CFAR-C-CFARNet in standard mode and steady mode, and of CFAR-A-CFARNet in steady mode, are comparable to that of YOLOv3. All the proposed detectors outperformed YOLOX-S, YOLOX-Tiny, RTMDet-S, and RTMDet-Tiny in terms of F1 score. Although Faster R-CNN had a higher F1 score than the proposed method, its inference speed was the lowest. Figure 12 plots the speed and accuracy of the detectors; only the real-time detectors are included. The speed is measured with FP32 precision and batch = 1 on a single Tesla V100. Among these detectors, the proposed CFAR-guided two-stage detection methods are much faster than the YOLO and RTMDet detectors. Remarkably, CFAR-A-CFARNet has an F1 score comparable to that of YOLOv3 but only half the latency. The number of parameters and FLOPs for these detectors are shown in Table 7, where $N_T$ and $N_C$ are the number of targets reported by the CFAR detector and the number of crops per image chip, respectively. Herein, the average number of detected targets per image of the test set is 17.8. Taking $N_T = 17.8$ and $N_C = 3$ as an example, the computational cost of the proposed CFAR-X-CFARNets detectors is less than 1 GFLOPs.
(3) Influence of receptive field The influence of the receptive field, i.e., the sizes of the sliding window and guard area, was evaluated. The receptive field of A-CFARNet is set as 5 3, 7 5, 9 5, 11 7, 13 7, and 17 9. For each receptive field, the model is trained from scratch 5 times, and the detection metrics are averaged. Figure 13 plots the detection metrics of CFAR-A-CFARNet. The changes in detection metrics with different receptive fields are smaller than 3.8%. The maximum F1 score, 86.27%, is achieved when the receptive field is 13 7. When the receptive field is 7 5, 11 7, or 17 9, the F1 score is about 84.6%.
As CFAR is used to generate target proposals, its parameter setting may affect the detection performance of the proposed method. To evaluate this influence, the probability of false alarms of CFAR was varied from $10^{-6}$ to $10^{-2}$. The detection metrics of CFAR and the proposed method are plotted in Figure 14. It can be seen that the recall of CFAR increases when $P_{FA}$ increases, and the precision has the opposite trend. As the F1 score is the harmonic mean of recall and precision, its value first increases when $P_{FA}$ is small and then decreases when $P_{FA}$ is large. The detection metrics of CFAR-A-CFARNet, CFAR-B-CFARNet, and CFAR-C-CFARNet have trends similar to those of CFAR. The CFAR-A-CFARNet has a maximum F1 score of 85.45% at $P_{FA} = 10^{-4}$, the CFAR-B-CFARNet has a maximum F1 score of 84.03% at $P_{FA} = 10^{-3.75}$, and the CFAR-C-CFARNet has a maximum F1 score of 86.67% at $P_{FA} = 10^{-2.75}$. Note that the F1 scores of CFAR-A-CFARNets are relatively stable when $10^{-4} \le P_{FA} \le 10^{-3}$ and decrease after $P_{FA} = 10^{-3}$. As the recall at $P_{FA} = 10^{-3}$ is larger than that at $P_{FA} = 10^{-4}$, it is suggested to use $P_{FA} = 10^{-3}$ for CFAR to obtain both a high F1 score and a high recall.
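The detection metrics reported in this section follow from the counts of detected targets, false alarms, and missed targets. The sketch below shows the standard definitions; the function name `detection_metrics` and the counts in the usage example are hypothetical and are not taken from the experiments.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts.

    tp: detected targets, fp: false alarms, fn: missed targets.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 50 detected targets, 10 false alarms, 5 misses.
precision, recall, f1 = detection_metrics(50, 10, 5)
```

Because the harmonic mean is dominated by the smaller of its two arguments, a detector with very high recall but low precision, such as CFAR alone, still receives a low F1 score.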

Feature Analysis
To gain further insight into the mechanism of the proposed CFARNets, a landmine image and a clutter image are fed into the proposed CFARNets, and their extracted features at each stage are analyzed. These two example images are plotted in Figure 15.
Figures 16 and 17 show the extracted features of A-CFARNet for these two example images. It can be found that there are two types of features in the first two stages: the first type has shapes similar to the input images, while the second type has shapes that are complementary to the input images. It can also be seen that the resolution of the features decreases from the second stage to the fourth stage while the semantic information increases. According to these results and the mathematical property of CFARNets, we can infer that the CFARNets mainly extract the images' divergence features.

Discussion
The high-resolution image obtained by UWB SAR includes rich features of landmines, which is helpful for landmine discrimination and detection. This work not only proposes lightweight deep networks based on CFAR filters but also studies the detection performances of other deep detectors. The experimental results suggest that: (1) Using only the nonlinearity of the 1 × 1 convolution layer is insufficient to build a good network model. Other filters are essential. (2) By combining the CFAR filters and the 1 × 1 convolution layer, the image's multidimensional divergence features are extracted layer by layer, and we can obtain a high-performance network for SAR landmine detection. (3) Compared to other real-time state-of-the-art detectors, the proposed CFARNets have comparable performance in terms of F1 score, with a significant reduction in the number of parameters and FLOPs.

Conclusions
In this work, we propose a new class of filters, blocks, and networks that can serve as an alternative to those in convolutional neural networks (CNNs) for landmine detection in SAR images. It was first shown that the CA-CFAR detector can be implemented by using tensor operations, which makes it possible to use it as an interpretable network module. Three CFAR blocks and three CFARNets were proposed by integrating classical network microarchitectures and nonlinear filters. Furthermore, a two-stage landmine detection method based on CFARNets was proposed. The features extracted by CFARNets are interpretable, as CFAR filters have definite physical significance. The proposed CFARNets can efficiently utilize the SAR characteristics, and their detection performances are comparable to those of YOLO detectors with a significant reduction in the number of parameters and FLOPs. Although the proposed CFARNets perform well for landmine detection, their performance on other targets remains to be tested. The proposed two-stage detection method depends on the CFAR detector, whose parameters are manually designed. Combining the CFAR detector and CFARNets into a single network would be the key point of future research.

Figure 1. Demonstration of the sliding window mode in CA-CFAR.

Figure 2. A schematic view of the CA operation. (a) Convolution-based implementation; (b) mean pooling-based implementation.

Figure 3. A schematic view of the CFAR filter.

Figure 5. Proposed CFAR block-based network architectures. A 48 × 48 single-channel SAR image is taken as an example.

Figure 9. Examples of image patches: (a-d) Local part of minefield.

Figure 10. Processing flow of the experiment.
These models are trained by stochastic gradient descent (SGD) with a batch size of 128 examples, a momentum of 0.9, and a weight decay of 0.0005. The initial learning rate is 0.01 and is divided by 10 at epochs 30 and 45. Overall, 60 epochs were used for training. During training, the following preprocessing pipeline was used: (1) Randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and whose area is randomly sampled in [80%, 100%], then resize the cropped region into a 48-by-48 square image. (2) Flip horizontally with probability 0.5. (3) Scale brightness with coefficients uniformly drawn from [0.6, 1.4]. (4) Normalize the gray image by subtracting 0.5.
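The learning rate schedule and the four preprocessing steps above can be sketched in NumPy as follows. Several details are assumptions that the text does not specify: epochs are counted from zero, the aspect ratio and area are sampled uniformly, the crop is clipped to fit inside the chip, resizing uses nearest-neighbour interpolation, and the names `learning_rate` and `augment` are hypothetical.

```python
import numpy as np

def learning_rate(epoch, base_lr=0.01, milestones=(30, 45), gamma=0.1):
    """Step schedule: multiply by gamma at each milestone epoch reached."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

def augment(chip, rng, out_size=48):
    """Training-time preprocessing of a 2-D gray chip with values in [0, 1]."""
    h, w = chip.shape
    # (1) Random crop: area in [80%, 100%], aspect ratio (w/h) in [3/4, 4/3].
    area = rng.uniform(0.8, 1.0) * h * w
    ratio = rng.uniform(3 / 4, 4 / 3)
    ch = min(h, max(1, int(round(np.sqrt(area / ratio)))))
    cw = min(w, max(1, int(round(np.sqrt(area * ratio)))))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = chip[top:top + ch, left:left + cw]
    # Resize to out_size x out_size (nearest neighbour, an assumption).
    rows = np.arange(out_size) * ch // out_size
    cols = np.arange(out_size) * cw // out_size
    out = crop[np.ix_(rows, cols)].astype(float)
    # (2) Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # (3) Brightness scaling with a coefficient drawn from [0.6, 1.4].
    out = out * rng.uniform(0.6, 1.4)
    # (4) Normalization by subtracting 0.5.
    return out - 0.5
```

Under this schedule the learning rate is 0.01 for epochs 0 to 29, 0.001 for epochs 30 to 44, and 0.0001 for the remaining epochs.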

Figure 11. Landmine detection results of CFAR-A-CFARNet. (a-f) Local part of minefield. The rectangles indicate the detection results: Green rectangle, detected target; Red rectangle, false alarm; Yellow rectangle, missed target.

Figure 12. Speed-accuracy curve of the detectors.

Figure 13. Detection metrics with different receptive fields.

Table 1. The network specifications of CFARNets.

Table 3. Landmine detection performance with standard mode.

Table 4. Landmine detection performance with eager mode.

Table 5. Landmine detection performance with steady mode.

Table 6. Landmine detection performance of other detectors.

Table 7. Number of parameters and FLOPs.