A Lightweight Military Target Detection Algorithm Based on Improved YOLOv5

Abstract: Military target detection technology is the basis and key for reconnaissance and command decision-making, as well as the premise of target tracking. Current military target detection algorithms involve many parameters and calculations, which prohibits deployment on weapon-equipment platforms with limited hardware resources. Given the above problems, this paper proposes a lightweight military target detection method entitled SMCA-α-YOLOv5. Specifically, first, the Focus module is replaced with the Stem block to improve the feature expression ability of the shallow network. Next, we redesign the backbone network of YOLOv5 by embedding the coordinate attention module in the MobileNetV3 block, reducing the network's parameter count and computations and thus improving the model's average detection accuracy. Finally, we propose a power-parameter loss that combines the optimizations of the EIOU loss and the Focal loss, further improving the detection accuracy and convergence speed. According to the experimental findings, when applied to the self-created military target dataset, the developed method achieves an average precision of 98.4% and a detection speed of 47.6 Frames Per Second (FPS). Compared with the SSD, Faster-RCNN, YOLOv3, YOLOv4, and YOLOv5 algorithms, the mAP value of the improved algorithm surpasses these competitors by 8.3%, 9.9%, 2.1%, 1.6%, and 1.9%, respectively. Compared with the YOLOv5 algorithm, the parameter count and computational burden are decreased by 85.7% and 95.6%, respectively, meeting the military target detection requirements of mobile devices.


Introduction
Military target detection technology is the key to improving battlefield situation generation, reconnaissance, surveillance, and command decision-making and is an essential factor for winning modern warfare. Real-time and accurate detection of battlefield targets helps grasp the battlefield environment faster, search for and track enemy units, and understand the enemy's dynamics so as to seize the opportunity in war and gain a dominant position [1][2][3].
Influenced by the development of artificial intelligence, battlefield targets in modern combat are characterized by large amounts of data, rapid changes, and strong camouflage [4,5]. Most traditional visual target detection technologies rely on hand-designed features, making it challenging to obtain target information comprehensively, quickly, and accurately from the complex battlefield environment.
Computer vision technology has become widely used in various industries, including video surveillance, drone piloting, and military intelligence analysis, due to the rapid growth of deep learning [2]. Currently, target detection algorithms based on deep learning can be divided into candidate-frame-based and regression-based algorithms. The former is represented by the Region-based Convolutional Neural Network (R-CNN) [6], Fast R-CNN [7], and Faster R-CNN [8]. The latter mainly includes the You Only Look Once (YOLO) series [9][10][11][12] and SSD [13][14][15] algorithms. In order to achieve higher detection accuracy, candidate-frame-based algorithms first generate candidate boxes on the feature map and then refine them to obtain the detection result. However, they suffer from drawbacks such as high memory consumption and slow speed. Regression-based algorithms are end-to-end detection methods: the target is obtained by direct regression on the feature map, so the detection speed is significantly improved, but the detection accuracy is slightly lower than that of candidate-frame-based detection algorithms.
Several scholars have successfully applied deep learning-based methods to military target detection recently. For instance, [16] proposed a neural network-based military vehicle detection method, attaining a recognition rate of 97.36%. In [17], the authors proposed an improved Fast R-CNN algorithm for small tank target detection. This algorithm is superior to the Faster R-CNN algorithm in detection speed and accuracy but suffers from missed detections when detecting occluded targets. The work of [18] suggested a tank military robot with target detection and tracking functions, effectively improving combat capability on the battlefield. Reference [19] proposed a remote sensing image selection and searching method to solve the potential hot-spot detection problem in large-scale remote sensing images and improve the detection accuracy of overlapping targets. This method improves the target detection accuracy without considering the model's space complexity. In [20], the authors fully integrated polarization imaging and deep learning to quickly detect camouflaged artificial targets under normal and low illumination conditions. Reference [21] developed a new military target detection algorithm, which introduced the GhostNet module to improve the detection accuracy and speed and then improved the loss function to further enhance detection accuracy. The experimental results show that this model's parameter count is about three times that of the YOLOv5 model. Furthermore, reference [22] solved the DIOU defect arising when the centers of the bounding boxes are aligned at the same point, which is conducive to the efficient deployment of detection algorithms in resource-constrained environments. Reference [23] proposed an armored target detection algorithm named GCD-YOLOv5 that utilizes a LIDAR array in complex environments. This algorithm has strong detection ability, but its network structure is complex and thus challenging to transplant to embedded terminals.
Based on the above research, as the performance of network models continuously improves, the growth in model parameters and computation restricts their embedding in resource-constrained weapons and equipment. In order to meet the requirements of military target detection under the limited resources of weapon hardware platforms, this paper proposes an improved YOLOv5 algorithm (SMCA-α-YOLOv5), which is tested and compared through ablation experiments. The results show that, compared with YOLOv5s, the mean average precision is increased by 1.9%, the number of model parameters is decreased by 85.7%, and the amount of computation is decreased by 95.9%. The main contributions of this paper can be summarized as follows:
1. The Stem block is used to replace the Focus module, and multi-channel information fusion improves the feature expression ability while reducing the model's parameters and computational complexity.
2. The coordinate attention module is embedded in the MobileNetV3 block structure to redesign the backbone network of YOLOv5. This strategy reduces the network's parameters and computational complexity and improves its detection performance.
3. Considering the defects of the CIOU loss, we propose a power-parameter loss optimized by combining the EIOU loss and the Focal loss. The experiments show that the convergence speed is faster and the regression error is lower.
The remainder of this paper is organized as follows: Section 2 introduces the construction of the military target dataset. Section 3 reviews work related to the YOLOv5s structure, the MobileNetV3 block, the coordinate attention mechanism, and loss metrics in object detection. Section 4 introduces the improved YOLOv5 algorithm. Section 5 analyzes and discusses the experimental results. Finally, Section 6 presents the conclusion and future work.

Datasets
With the vigorous development of deep learning, the performance of deep learning-based target detection algorithms depends on the quality of large-scale datasets. Therefore, preparing a large-scale military target dataset is the basis and premise of research on military target detection. The current mainstream target detection datasets mainly include PASCAL VOC [24], MS COCO [25], ImageNet, etc. These datasets mainly cover common objects such as furniture, electronic equipment, vehicles, and people; some contain military targets such as tanks, soldiers, and drones, but the target categories are few, the amount of data is insufficient, and the backgrounds are simple. Due to the particularity of military target types and confidentiality considerations, public dataset resources are relatively scarce, making it difficult to train deep neural networks. Therefore, this paper constructs the Military Image Target Dataset (MITD).

Source of Data
Military objectives can be divided into sea, land, and air targets. Maritime military targets mainly refer to submarines, naval warships, etc.; land military targets mainly include tanks, soldiers, trucks, and other weapons and equipment; air military targets mainly include helicopters, early warning aircraft, missiles, etc. [3]. This article obtained 9369 military target images in JPG format through the Google search engine, mainly covering seven military targets: tank, missile, helicopter, early warning aircraft, ship, submarine, and soldier. All targets in the military target dataset are randomly divided into a training set, validation set, and test set in a 7:2:1 ratio. Figure 1 shows sample images from the MITD.
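The 7:2:1 split described above can be sketched in a few lines (a minimal, hypothetical script; the actual file lists and random seed used for the MITD are not specified in the paper):

```python
import random

def split_dataset(image_ids, ratios=(0.7, 0.2, 0.1), seed=42):
    """Randomly split a list of image identifiers into train/val/test
    subsets according to the given ratios (7:2:1 in the paper)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # remainder goes to the test set
    return train, val, test

# The MITD contains 9369 images in total.
train, val, test = split_dataset(range(9369))
print(len(train), len(val), len(test))    # 6558 1873 938
```

Fixing the seed makes the split reproducible across training runs, which matters when comparing ablation variants on the same data.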

Label Format and Data Size
In this paper, labeling software is used to annotate the image targets in the MITD, and each target's position information is stored in a text document in YOLO format, yielding 9369 text documents in total. A statistical analysis of the military target information is shown in Table 1. There are 9369 images containing 13,199 target boxes, with the number of targets per image ranging from 1 to 16. The image width ranges over [233, 4960] pixels and the height over [167, 2802] pixels.
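For illustration, a YOLO-format label line can be produced from a pixel-space bounding box as follows (a hedged sketch; the class index and the example box are hypothetical, not taken from the MITD annotations):

```python
def to_yolo_label(cls_id, box_xyxy, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a
    YOLO-format line: class x_center y_center width height, all
    normalised to [0, 1] by the image dimensions."""
    x1, y1, x2, y2 = box_xyxy
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A hypothetical box inside a 640x480 image, class 0.
print(to_yolo_label(0, (100, 120, 300, 360), 640, 480))
# 0 0.312500 0.500000 0.312500 0.500000
```

Because the coordinates are normalised, the same label file remains valid when images of different resolutions (here spanning 233 to 4960 pixels in width) are resized for training.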

Related Work
This section introduces the related principles of YOLOv5, the MobileNetV3 block, the coordinate attention mechanism, and loss metrics in object detection.

YOLOv5 Algorithm
The YOLOv5 algorithm [26][27][28][29] is an open-source object detection project with good engineering results. At present, the released YOLOv5 project includes four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Among them, the YOLOv5s structure is the network with the smallest depth and width and has the advantages of high speed and small size. Therefore, this paper adopts the YOLOv5s structure, which consists of four parts, as shown in Figure 2: the Input, the Backbone network, the Neck network layer, and the Head output.


Input: The Input preprocesses the original image data, mainly including Mosaic data enhancement, random cropping, and adaptive image filling. In order to adapt to different target datasets, adaptive anchor frame calculation is integrated into the input.
Backbone: The Backbone network extracts feature information at different levels of the image through the deep residual structure. The main structures are Cross Stage Partial (CSP) [30] and Spatial Pyramid Pooling (SPP) [31]. The former aims to reduce the amount of calculation and improve the inference speed. The latter performs feature extraction at different scales on the same feature map, which helps to improve detection performance.
Neck: The Neck network layer includes Feature Pyramid Networks (FPN) and the Path Aggregation Network (PAN). FPN transmits semantic information from top to bottom in the network, while PAN transmits positioning information from bottom to top. The fused information further improves detection performance.
Head: The Head output uses the feature information extracted by the Neck to filter the best detection frame through non-maximum suppression and generates detection frames to predict the target category.

MobileNetV3 Block
The MobileNet series is representative of lightweight network models. MobileNetV1 [32] introduced depthwise separable convolution in place of standard convolution, reducing the number of parameters and the computation of the model through the combination of channel-wise (depthwise) convolution and point-wise convolution. MobileNetV2 [33] draws on the residual network to design an inverted residual structure that first increases the dimension, performs convolution, and then reduces the dimension; at the same time, it uses a linear bottleneck layer to retain effective features to the greatest extent, making the model easy to deploy on mobile devices. MobileNetV3 [34] introduces the lightweight SENet attention mechanism on top of MobileNetV2, upgrades the nonlinear layer with an improved swish (h-swish) activation function, and finally uses neural architecture search to find the best network model. The unit module is shown in Figure 3.
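The parameter saving of depthwise separable convolution can be verified with a few lines of arithmetic (a sketch; the 128-channel example is illustrative, not a layer from MobileNet itself):

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1 x 1 pointwise conv, as in MobileNetV1."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 128 -> 128 channels.
std = conv_params(3, 128, 128)            # 147456
sep = dw_separable_params(3, 128, 128)    # 1152 + 16384 = 17536
print(std, sep, round(sep / std, 3))      # ratio ~ 1/c_out + 1/k^2
```

For a 3x3 kernel the ratio approaches 1/9 as the channel count grows, which is the roughly order-of-magnitude saving that makes the MobileNet family attractive for embedded platforms.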

Coordinate Attention
The coordinate attention (CA) mechanism [35] embeds location information into channel attention, decomposing the channel attention into two one-dimensional encoding processes that aggregate features along the two spatial directions. Long-range dependencies can be captured in one spatial direction, while precise location information is preserved in the other. The representation of objects of interest can be improved by applying the resulting pair of direction-aware and position-sensitive attention maps to the input feature map. The fundamental purpose of a coordinate attention block is to enhance the expressive ability of features learned by mobile networks. As shown in Figure 4, this module is divided into coordinate information embedding and coordinate attention generation.
First, let X ∈ ℝ^{C×H×W} denote the input intermediate feature tensor and Y the output of the same size. For coordinate information embedding, given the input X, each channel is encoded along the horizontal and vertical directions using two pooling kernels of size (H, 1) and (1, W), respectively.
Therefore, the output of the c-th channel at height h can be expressed as:

z_c^h(h) = (1/W) ∑_{0≤i<W} x_c(h, i)  (1)

Similarly, the output of the c-th channel at width w can be expressed as:

z_c^w(w) = (1/H) ∑_{0≤j<H} x_c(j, w)  (2)

The two aggregated feature maps are then concatenated and transformed by a shared 1×1 convolution F_1:

f = δ(F_1([z^h, z^w]))  (3)

In Equation (3), [·, ·] is the splicing operation along the spatial dimension, δ is a nonlinear activation function, and f ∈ ℝ^{(C/r)×(H+W)} is the intermediate feature map that encodes spatial information in the horizontal and vertical directions, with r the reduction ratio. f is then split along the spatial dimension into two separate tensors f^h ∈ ℝ^{(C/r)×H} and f^w ∈ ℝ^{(C/r)×W}. In addition, two 1×1 convolutional transforms F_h and F_w restore the same number of channels as the input X:

g^h = σ(F_h(f^h)),  g^w = σ(F_w(f^w))  (4)

where σ is the sigmoid function. Finally, expanding the outputs g^h and g^w as attention weights, the output Y can be written as:

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)  (5)
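A minimal PyTorch sketch of a coordinate attention block following the pool-split-reweight steps above might look as follows (the reduction ratio, minimum width of 8, and the Hardswish activation are assumptions based on [35], not necessarily the paper's exact implementation):

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of the coordinate attention block [35]: pool along H and W
    separately, share a 1x1 transform, then split and re-weight."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Direction-aware pooling: (N, C, H, 1) and (N, C, 1, W).
        x_h = x.mean(dim=3, keepdim=True)                       # pool over W
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # pool over H
        y = torch.cat([x_h, x_w], dim=2)                        # (N, C, H+W, 1)
        y = self.act(self.bn(self.conv1(y)))
        f_h, f_w = torch.split(y, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))                       # (N, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * g_h * g_w   # broadcast re-weighting, same shape as x

x = torch.randn(1, 64, 20, 20)
print(CoordinateAttention(64)(x).shape)   # torch.Size([1, 64, 20, 20])
```

The output shape equals the input shape, so the block can be dropped into a backbone anywhere a residual-style module fits.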

Loss Metrics in Object Detection
Bounding box regression (BBR) is a critical step in object detection techniques, and a well-designed loss function is crucial to the success of BBR.Currently, most detection methods use BBR, and the loss function of BBR can be roughly divided into horizontal and rotational detection regression loss.
Researchers have carried out much work on designing horizontal detection regression loss functions. For instance, YOLOv1 [9] regresses the square root of the predicted bounding box size to compensate for differences between scales. Fast R-CNN [7] and Faster R-CNN [8] use the l1 loss function, which is less sensitive to outliers than the l2 loss. However, most ln-norm loss functions assume that the bounding box variables are independent, which is inconsistent with the real situation. In response to the above problems, the IOU [36] loss was proposed, achieving good performance at the time. To address a shortcoming of the IOU loss, namely that it is always zero when the two boxes do not overlap, the Generalized IOU (GIOU) [37] loss was proposed. In order to further overcome GIOU's slow convergence in the horizontal and vertical directions, the Distance IOU (DIOU) and Complete IOU (CIOU) [38] losses were proposed, with experiments demonstrating that these two losses converge faster and perform better. Since the aspect-ratio term of the CIOU loss is relative and does not balance hard and easy samples, the EIOU loss and the Focal-EIOU loss [39] were proposed. In order to achieve more flexible accuracy of the bounding box regression at different levels, the Alpha-IoU loss was proposed [40].
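The motivation for GIOU can be seen in a few lines of code: for two non-overlapping boxes the IoU is stuck at zero, while GIoU still varies with their separation (a self-contained sketch, not any paper's reference implementation):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def giou(a, b):
    """GIoU [37]: subtract the fraction of the minimum enclosing box C
    not covered by the union, so the metric is informative even when
    the boxes do not overlap (where plain IoU is stuck at zero)."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(a) + area(b) - inter
    # Minimum enclosing box C.
    c = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c - union) / c

# Two disjoint unit squares: IoU gives no gradient signal, GIoU does.
a, b = (0, 0, 1, 1), (2, 0, 3, 1)
print(iou(a, b), giou(a, b))   # 0.0 and a negative value (-1/3 here)
```

As the boxes move further apart, the enclosing box grows and GIoU decreases further, which is exactly the signal a regression loss needs for non-overlapping predictions.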
The above IOU losses only apply to the simple axis-aligned case and cannot be directly applied to rotation detection. Thus, [41] studied the IoU computation of two rotated boxes and implemented a unified framework for 2D and 3D object detection tasks. The PIoU method [42] approximates the intersection area by simply counting pixels. Furthermore, to address the convexity uncertainty caused by rotation, [43] proposed a projection operation to estimate the intersection area, and [44] developed a new regression loss based on the Gaussian Wasserstein distance to solve the boundary discontinuity and detection-metric inconsistency problems in the design of rotation detection regression losses.

Approach
This section details the improvement methods of YOLOv5, including the introduction of the Stem block, the design of the MNtV3-CA module, the optimization of the loss function, and the overall structure design of the network.

Introduction of Stem Block
Military target detection not only imposes higher requirements on detection accuracy and speed but is also constrained by the limited memory and computing resources of weapon-equipment platforms. The Focus module of the YOLOv5 algorithm improves the detection speed of the model to a certain extent but greatly increases the amount of calculation and the number of parameters.
Therefore, designing a military target detection algorithm with a small memory footprint and low computation is very important. To meet the above requirements, this paper introduces the Stem block structure, shown in Figure 5. This structure has achieved good results in real-time detection algorithms on mobile devices, such as PELEE [45], PP-LCNet [46], and YOLO5Face [47]. The design of the Stem block is inspired by Inception-v4 and the Deeply Supervised Object Detector. By replacing the large convolution module with smaller ones of lower computational cost and fewer parameters, the Stem block improves the feature expression ability with almost no increase in computation or parameters.
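A self-contained PyTorch sketch of a PeleeNet-style stem is given below; the branch widths, SiLU activation, and channel count are assumptions for illustration and not necessarily the exact layout of the paper's Figure 5:

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, s=1):
    """Convolution + BatchNorm + activation helper."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class StemBlock(nn.Module):
    """Sketch of a PELEE-style stem: a strided 3x3 conv, then two
    branches (1x1 + strided 3x3 conv vs. 2x2 max-pool) concatenated and
    fused by a 1x1 conv. Overall stride 4, like the Focus module it
    replaces, but built from small convolutions."""

    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.stem1 = conv_bn_act(c_in, c_out, k=3, s=2)
        self.branch_conv = nn.Sequential(
            conv_bn_act(c_out, c_out // 2, k=1),
            conv_bn_act(c_out // 2, c_out, k=3, s=2),
        )
        self.branch_pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fuse = conv_bn_act(2 * c_out, c_out, k=1)

    def forward(self, x):
        x = self.stem1(x)
        return self.fuse(torch.cat(
            [self.branch_conv(x), self.branch_pool(x)], dim=1))

x = torch.randn(1, 3, 640, 640)
print(StemBlock()(x).shape)   # torch.Size([1, 32, 160, 160])
```

The concatenation of a learned (convolutional) branch with a parameter-free pooling branch is what fuses multi-channel information cheaply, which is the property the text above attributes to the Stem block.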

MNtV3-CA Block Structure
The backbone network of the YOLOv5 algorithm adopts the traditional residual structure, which solves the network degradation problem caused by increasing the depth of the network structure and converges faster for the same number of network layers [48]. Residual networks have been widely used in deep neural networks, improving performance by increasing network depth. However, this substantially increases the network parameters, making the model difficult to train and to deploy on weapons with limited computing and memory resources. Therefore, this paper designs a lightweight MNtV3-CA structure to rebuild the backbone network of the YOLOv5 algorithm, as shown in Figure 6. This structure is based on the MobileNetV3 block and integrates the lightweight coordinate attention module, enhancing the model's detection performance while keeping the network structure light.

Optimization of Loss Function
The IOU function is the most commonly used evaluation index in the field of target detection. It measures the overlap between the target box A and the predicted box B:

IOU = |A ∩ B| / |A ∪ B|  (6)
The YOLOv5 algorithm uses the CIOU loss [38], which considers three important geometric factors: the overlap between the predicted box and the target box, the distance between their center points, and the aspect ratio. The disadvantage is that the aspect-ratio term v only reflects the difference in aspect ratio rather than the real differences in width and height, which can sometimes hinder optimization, and the loss does not consider the balance of hard and easy samples [39].
To solve the shortcomings of the CIOU loss, this paper introduces the EIOU loss [39], which discards the aspect-ratio penalty term and instead uses the width and height prediction errors directly to guide convergence. The EIOU loss is formulated as:

L_EIOU = L_IOU + L_dis + L_asp = 1 − IOU + ρ²(b, b^gt)/(C_w² + C_h²) + ρ²(w, w^gt)/C_w² + ρ²(h, h^gt)/C_h²  (8)

Equation (8) reveals that the EIOU loss is divided into three parts: the IOU loss L_IOU, the distance loss L_dis, and the aspect loss L_asp. The EIOU loss not only retains the characteristics of the CIOU loss but also directly reduces the width and height differences between the target box and the anchor box, affording faster model convergence and higher accuracy. Inspired by Alpha-IoU [40], this paper generalizes the EIOU loss to a loss function with power terms, defined as the α-EIOU loss and formulated as:

L_{α-EIOU} = 1 − IOU^α + ρ^{2α}(b, b^gt)/(C_w² + C_h²)^α + ρ^{2α}(w, w^gt)/C_w^{2α} + ρ^{2α}(h, h^gt)/C_h^{2α}  (9)

where α is the power parameter. The Focal-EIOU loss cannot flexibly achieve different levels of bounding box regression accuracy, and Alpha-IoU does not consider the balance of hard and easy samples. Therefore, this paper combines the Focal loss idea with the α-EIOU loss by using an IOU-based weight. This scheme is defined as the Focal-α-EIOU loss and is formulated as:

L_{Focal-α-EIOU} = IOU^γ · L_{α-EIOU}  (10)

where γ is a parameter that controls the degree of outlier suppression. When α = 1, Equation (10) reduces to the Focal-EIOU loss. In summary, the proposed Focal-α-EIOU loss has the following advantages: (1) adjusting α gives the detector more flexibility to achieve different levels of box regression accuracy; (2) it considers the balance of hard and easy samples; and (3) the regression loss is lower and the convergence speed is faster.
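The loss can be sketched for a single box pair in plain Python (a non-differentiable illustration only; the exact term-wise placement of the power α is an assumption based on [39,40], not the paper's code, and α = 3, γ = 0.5 mirror the values used in the experiments):

```python
def focal_alpha_eiou(pred, target, alpha=3.0, gamma=0.5):
    """Sketch of a Focal-α-EIOU loss for boxes (x1, y1, x2, y2): raise
    the IoU, distance and aspect terms of EIOU [39] to the power alpha
    (as in Alpha-IoU [40]) and weight the sum by IoU**gamma so that
    low-quality (low-overlap) anchors contribute less."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(pred) + area(target) - inter
    iou = inter / union
    # Minimum enclosing box (C_w, C_h) and squared centre distance rho^2.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    rho2 = dx * dx + dy * dy
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = target[2] - target[0], target[3] - target[1]
    l_iou = 1 - iou ** alpha
    l_dis = (rho2 / (cw * cw + ch * ch)) ** alpha
    l_asp = ((w - wg) ** 2 / (cw * cw)) ** alpha + \
            ((h - hg) ** 2 / (ch * ch)) ** alpha
    return iou ** gamma * (l_iou + l_dis + l_asp)

# A perfectly aligned prediction incurs zero loss; a shifted one does not.
print(focal_alpha_eiou((0, 0, 10, 10), (0, 0, 10, 10)))      # 0.0
print(focal_alpha_eiou((2, 2, 12, 12), (0, 0, 10, 10)) > 0)  # True
```

Note the IoU^γ weight also pushes the loss of a zero-overlap anchor toward zero, which is how the focal weighting suppresses the gradient contribution of very poor matches.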

Network Structure of SMCA-α-YOLOv5
Regarding the SMCA-α-YOLOv5 network structure, Section 4.1 introduced the Stem block, and Section 4.2 presented the MNtV3-CA block, which is used to build the backbone network of YOLOv5. Additionally, the second to twelfth layers of the MobileNetV3-Small [34] specification are used for reference. Finally, the loss function is optimized, and the improved structure is illustrated in Figure 7.

Experiment Platform
The experiments in this paper are carried out on the Google Colab development platform, and the experimental environment is Python 3.6, PyTorch 1.11.0, CUDA 11.2, and a Tesla V100-SXM2-16G GPU. Data training, validation, and testing are performed with the same hyperparameters: the number of iterations is set to 100, the learning rate to 0.01, the initial learning rate momentum to 0.937, the weight decay coefficient to 0.0005, and the batch size to 64.

Evaluation Indicators
In order to verify the validity of the proposed model, a comprehensive evaluation is carried out using four indicators: mean average precision (mAP), model parameters (Parameters), computational cost (GFLOPs), and detection speed (FPS). The average precision (AP) is the detection accuracy for a single target class, computed as the area under the precision-recall curve. The mAP is the average of the AP values over all categories [24] and is used to evaluate the comprehensive detection performance of the model. The number of model parameters obtained during training directly determines the size of the model file and the memory resources the model consumes. The number of computations required throughout training is referred to as the model computation volume, which directly represents the model's requirement for the hardware platform's computing capacity. The number of images the model can process per second is referred to as the detection speed and is used to measure the model's real-time performance.
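The AP metric described above can be computed from a ranked detection list as follows (a sketch using all-point interpolation of the precision-recall curve; the toy detections are hypothetical):

```python
def average_precision(detections, num_gt):
    """AP for one class. `detections` is a list of (confidence,
    is_true_positive) pairs; `num_gt` is the number of ground-truth
    boxes. Sorts by confidence, sweeps the PR curve, makes precision
    monotonically decreasing, then integrates over recall."""
    dets = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in dets:
        tp += is_tp
        fp += not is_tp
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: each precision becomes the max of itself and all
    # precisions at higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Three detections for a class with two ground-truth boxes.
dets = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(dets, num_gt=2))   # ~0.833
```

mAP is then simply the mean of this quantity over the seven MITD classes, each evaluated at the chosen IoU threshold.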

Ablation Experiment of Backbone Network
To verify the effectiveness of the developed algorithm, we conduct six groups of ablation experiments on the MITD dataset, with YOLOv5s from Ultralytics 5.0 as the benchmark algorithm. The input image size is set to 640×640 pixels, and the number of training iterations to 100.
The ablation results for each component are reported in Table 2, revealing that introducing the Stem block and the MobileNetV3 block (MNtV3) increases inference time; however, the network structure becomes more lightweight. The SENet attention mechanism in No. 3 is replaced with a CBAM and a CA attention mechanism in Nos. 4 and 5, respectively. Among them, the configuration combining the Stem block with the MobileNetV3 block and the embedded CA attention mechanism presents the best detection performance. Compared with YOLOv5, the mAP value of this model increased by 1.3%, the number of parameters and the amount of computation decreased by 85.52% and 95.8%, respectively, and the average inference time increased by 0.003 s. Although the detection speed is slightly reduced, the other performance indicators are greatly improved, so the overall effect is the best. The best results for every metric are bolded.

Ablation Experiment of Loss Function
To demonstrate the effectiveness of the proposed loss function, we perform five sets of ablation experiments on YOLOv5s and SMCA-YOLOv5, with the corresponding experimental results reported in Table 3. For fairness, after extensive experiments, we set the parameters to α = 3 and γ = 0.5, which afford the best performance. Figure 9 illustrates the effect of the five loss functions on the SMCA-YOLOv5 algorithm. The Focal-α-EIOU loss has a better convergence speed and regression accuracy than the other four losses. The dataset in this paper contains many high-quality samples, and the Focal-α-EIOU loss uses the IOU-based weight to focus on high-quality samples; therefore, when there are more high-quality samples, the convergence speed is faster and the regression error is lower, as reported in Table 4. Moreover, we use the training and validation sets of the CGMU dataset as a training set (containing 8007 images) and its test set (containing 1000 images) for testing. The results on the CGMU dataset are reported in Table 5. The experimental results in Tables 4 and 5 show that the proposed Focal-α-EIOU loss attains the best performance on the AP55, AP60, AP95, and mAP metrics. Overall, the proposed loss outperforms the competing horizontal box regression losses. * Indicates cited reference [22]. The best results for every metric are bolded.

Compare with Other Algorithms
The detection performance of the proposed algorithm on military targets is further analyzed, as shown in Table 6. Compared with SSD, Faster R-CNN, the YOLOv3 algorithm of the Ultralytics v9.5.0 release, WongKinYiu's Pytorch_YOLOv4, and the YOLOv5 of the Ultralytics 5.0 release, the SSD algorithm achieves the fastest average inference time, while all other best indicators are achieved by the method proposed in this paper. The best results for every metric are bolded.
According to Table 6, the proposed SMCA-α-YOLOv5 has the highest mAP value, and its average detection speed is 19.1 Frames Per Second (FPS), 5.0 FPS lower than those of the SSD and YOLOv5 algorithms. Additionally, the proposed model has significant advantages in terms of network parameters and computational complexity. Overall, the improved model not only improves the detection accuracy but also effectively realizes a lightweight network structure, meeting the needs of military target detection with limited platform resources.

Analysis of Detection Results
In order to reflect the performance of the proposed algorithm more intuitively, representative images are selected from the MITD test set, and the military target detection results of the SMCA-α-YOLOv5 and YOLOv5 algorithms in different scenarios are analyzed. Figure 10 illustrates the detection results for scene 1: Figure 10a presents one helicopter target and nine soldier targets; Figure 10b presents the detection result of the YOLOv5 algorithm, where two soldier targets are missed (marked by the yellow ellipses in Figure 10b); and Figure 10c is the detection result of the SMCA-α-YOLOv5 algorithm, where one soldier target is missed (marked by a yellow ellipse in Figure 10c). Figure 11 shows the detection results for scene 2: Figure 11b presents the detection result of the YOLOv5 algorithm, where a tank target and a soldier target are missed (marked by the yellow ellipses in Figure 11b), and Figure 11c depicts the detection result of the SMCA-α-YOLOv5 algorithm, where one soldier target is missed (marked by a yellow ellipse in Figure 11c). Although the improved algorithm still misses some detections, it retains an advantage over the YOLOv5 algorithm. In conclusion, introducing the Stem block and the MobileNetV3 block into the backbone network reduces the network's parameters and computational complexity while increasing the depth of the network structure, thereby enlarging the receptive field. At the same time, the lightweight coordinate attention mechanism further enhances the network's feature extraction ability for occluded and small targets. The improved loss function allows the regression process to focus on high-quality anchor boxes, giving the improved algorithm strong robustness.

Conclusions
Aiming at the difficulty of deploying military target detection algorithms on embedded platforms with limited resources, a lightweight military target detection method based on an improved YOLOv5 is proposed. This method redesigns the backbone network of YOLOv5 by introducing the Stem block and the MobileNetV3 block to reduce the number of parameters and the computation of the model. In order to further improve the feature expression ability of the network, a coordinate attention module is embedded in the MobileNetV3 block structure, which improves the model's detection performance for military targets. Based on the EIOU loss and the Focal loss, a loss with power parameter α is designed to optimize the CIOU loss, providing the detector with more flexibility and achieving different levels of bounding box regression accuracy. The experimental results show that the proposed algorithm ensures real-time performance and detection accuracy and meets the needs of military target detection under the limited resources of weapon-equipment platforms.
The experimental results also show that the average inference time of the proposed algorithm has increased. The next step is to use a pruning algorithm to compress the backbone network composed of the Stem block and MNtV3-CA blocks to improve the average detection speed, and to deploy the algorithm on embedded devices with limited hardware resources to verify its applicability.

Figure 1 .
Figure 1. Sample images in the MITD. (a) A picture containing a single military target; (b) a picture containing multiple military targets.

Figure 3 .
Figure 3. The structure of the MobileNetV3 block. NL denotes the type of nonlinearity used.

Figure 4 .
Figure 4.The structure of the Coordinate Attention block.

Figure 5 .
Figure 5. The structure of the Stem block.
where A represents the area of the target box and B represents the area of the predicted box.

where C_w and C_h are the width and height of the minimum enclosing rectangle of the prediction box and the target box, respectively; ρ(·) denotes the Euclidean distance; b and b^gt are the center points of the prediction box and the target box, respectively; w and h are the width and height of the prediction box; and w^gt and h^gt are the width and height of the target box.

Figure 8
Figure 8 shows the PR curve of the improved YOLOv5s model. It can be seen that the improved model achieves better detection results for the various military targets.

Figure 8 .
Figure 8. PR curve graph of the improved YOLOv5s model.

Table 1 .
Details of the MITD.

Table 2 .
Results of ablation experiments.

Table 3 .
Ablation experiment results under different losses.

Table 6 .
Performance comparison results of different algorithms.