An SAR Imaging and Detection Model of Multiple Maritime Targets Based on the Electromagnetic Approach and the Modified CBAM-YOLOv7 Neural Network

This paper proposes a Synthetic Aperture Radar (SAR) imaging and detection model of multiple targets in the maritime scene. The sea surface sample is generated according to the composite rough surface theory. The SAR imaging model is constructed based on a hybrid EM calculation approach with the fast ray tracing strategy and the modified facet Small Slope Approximation (SSA) solution. Numerical simulations calculate the EM scattering and the SAR imaging of the multiple cone targets above the sea surface, with the scattering mechanisms analyzed and discussed. The SAR imaging datasets are then set up by the SAR image simulations. A modified YOLOv7 neural network with the Spatial Pyramid Pooling Fast Connected Spatial Pyramid Convolution (SPPFCSPC) module, Convolutional Block Attention Module (CBAM), modified Feature Pyramid Network (FPN) structure and extra detection head is developed. In the training process on our constructed SAR datasets, the precision rate, recall rate, mAP@0.5 and mAP@0.5:0.95 are 97.46%, 90.08%, 92.91% and 91.98%, respectively, after 300 rounds of training. The detection results show that the modified YOLOv7 performs well in picking the targets out of the complex sea surface and multipath interference background.


Introduction
Synthetic Aperture Radar (SAR) is an all-weather and all-time remote sensing platform that can provide high-resolution microwave radar imaging features of the target. The SAR imaging and detection of maritime targets have significant applications in sea target monitoring and recognition, sea rescue, sea crisis management, etc. [1][2][3][4][5]. Sea targets usually have different sizes and various scattering mechanisms. These factors make sea target detection challenging. In recent years, Deep Learning (DL) methods have been introduced and have shown excellent performance in the image recognition and detection domain [6][7][8][9][10]. You Only Look Once (YOLO) is one of those methods; it has been developed in various series and shows good performance in image detection [11][12][13][14][15]. YOLOv7 is the latest version at the time of this research. It has an optimized training process and a higher detection accuracy, as indicated in the latest studies [15][16][17]. In maritime SAR detection, the DL method requires plenty of SAR data to serve as training samples. The practical maritime environment varies with time, showing different radar features for various sea states. In addition, it can take a long time to obtain extensive SAR samples by means of practical measurements. Furthermore, SAR image samples during high sea states are not easy to acquire. The physical mechanisms behind the SAR features and their influence on SAR detection performance are also not easy to illuminate by measurement alone.
Electromagnetic models can serve as a close-to-real tool to set up the SAR imaging model and carry out SAR data simulations with higher efficiency, owing to their low cost, high fidelity and convenient implementation. An electromagnetic scattering calculation model needs to be set up for the sea targets. This problem involves complex composite scattering mechanisms, which include not only the individual target and sea scattering but also the target-and-target and target-and-sea interactions. At SAR frequencies, the composite targets and the real sea surface model are electrically very large, with complex scattering mechanisms in the radar line of sight. All these factors make the problem both troublesome and intriguing. The rigorous full-wave models (e.g., MoM, MLFMA, FEM, FDTD) [18][19][20] are accurate, but they are mostly limited to relatively electrically small-scale EM simulations, which are far from practical use. Thus, analytical models are more attractive in practical applications since they have more applicable formulations and better computational efficiency. Some models have emerged in recent years, such as the Four Path Model (FPM) [21] and Half-Space Physical Optics (HSPO) [22]. They are simple and efficient to apply in practice. However, they generally simplify the scattering mechanisms and ignore the local sea surface scattering and the complex high-order interactions in the composite target-sea scattering calculation, which are significant in SAR imaging simulations. A "facet-based" model was developed in some previous works, which can describe the local scattering in the electrically large maritime scene accurately and efficiently. The large-scale sea surface is meshed by facet elements with local scattering configurations. The SAR echo from these facet elements can be easily obtained. Franceschetti et al. [23,24] developed the facet model with the Kirchhoff solution and applied the model in ocean SAR echo simulations. Min Zhang et al. [25,26] modified the facet model with the Two-Scale Method (TSM) by further considering the Bragg phenomena and combining it with the four-path model for simulating the SAR image of the composite ship-ocean scene. However, deficiencies still exist. The two-scale method utilizes an ad hoc "cutoff" wavenumber as the selection criterion to determine the proportion of contributions from the multi-scale sea surface structures, which may affect the correctness of the calculation results. In addition, the facet-based four-path model is too approximate to fully consider the local interactions between the multiple targets and the sea.
This paper aims to provide an SAR imaging and detection model with the EM approach and a modified CBAM-YOLOv7 neural network. The EM approach utilizes a facet-based sea scattering description and a hybrid high-frequency electromagnetic model. The EM approach is applied to build the SAR image simulation datasets. A modified CBAM-YOLOv7 neural network is applied for multiple target detection in the maritime scene. The rest of the paper is organized as follows: The EM approach to calculate the composite target and sea scattering is described in Section 2. Then, the SAR imagery simulator is set up, and the simulations are carried out. Section 3 constructs the modified YOLOv7 neural network model and carries out the SAR image training and detection, with the performance of the model compared and discussed. Finally, Section 4 gives the conclusions.

The Sea Surface Scattering Calculations
The practical sea surface has composite structures, as illustrated in Figure 1. Assume the time-varying sea height map is $\xi(\mathbf{r}, t)$. The large-scale waves, $z(\mathbf{r}, t)$, can be generated at each time instant as a rough surface sample by the Monte Carlo method, according to the rough surface theory with a given sea spectrum [27]. Elfouhaily's sea spectrum and spreading function [28] are used in this paper. In the facet model, the long waves are meshed by large planar facets. The short waves $\zeta_c(\mathbf{r})$ serve as modifications upon the planar facet. As indicated by the Bragg theory, the radar scattering contributions from the short waves come mainly from their Bragg wave components [29,30], which can be represented by the expansion of sinusoidal waves traveling at the Bragg resonant frequency with the form:
where $\kappa_c$ is the Bragg wave vector, $B(\kappa_c) = S(\kappa_c)/\Delta S$ is the amplitude, $\psi$ is the random phase, $S(\kappa_c)$ is the short-wave spectrum, and $\omega_c$ is the angular frequency, which is assumed to follow the deep-water dispersion relation $\omega_c^2 = \kappa g$, with $g$ the gravitational acceleration.
A facet Small Slope Approximation (SSA) method is employed as the EM approach in calculating the sea surface scattering. The SSA method was proposed by Voronovich et al. [31][32][33] to describe the scattering from an electrically large and complex environment. However, it only provides an average scattering coefficient of the sea surface, which is not enough to provide a high-resolution SAR imaging return. In earlier studies, the SSA method was combined with the facet model to study the local scattering of the multi-scale structures of the complex environment, with good agreement with the measured radar scattering of practical environments [34][35][36]. In Figure 1, the global and local coordinates are adopted to illustrate the scattering configurations. The Scattering Amplitude (SA) can be calculated by:
where $B(\mathbf{k}, \mathbf{k}_0)$ is a polarization matrix [31][32][33], $\mathbf{k}_0$ and $q_0$ are the horizontal and vertical projections of the incident vector $\mathbf{k}_i$, and $\mathbf{k}$ and $q$ are the horizontal and vertical projections of the scattering vector $\mathbf{k}_s$. The scattering vector is $\mathbf{q} = k(\mathbf{k}_s - \mathbf{k}_i)$. $T(\mathbf{r}, z(\mathbf{r}))$ is the tapered wave function. $P_{inc}$ is the incident power. The taper factors are set as $g_x = L_x/4$ and $g_y = L_y$, as suggested by [37]. $L_x$ and $L_y$ are the lengths of the sea surface along the x- and y-directions.
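The Bragg wave component and the deep-water dispersion relation described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard first-order backscatter resonance condition $\kappa_c = 2k\sin\theta_i$ (not written out in the text) and $g = 9.81$ m/s²; the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s
G = 9.81           # gravitational acceleration, m/s^2 (assumed value)


def bragg_wavenumber(freq_hz: float, incidence_deg: float) -> float:
    """Bragg-resonant sea wavenumber for monostatic backscatter.

    Assumes the first-order resonance condition kappa_c = 2 k sin(theta_i),
    where k is the radar wavenumber and theta_i is measured from the vertical.
    """
    k = 2.0 * math.pi * freq_hz / C
    return 2.0 * k * math.sin(math.radians(incidence_deg))


def deep_water_angular_frequency(kappa: float) -> float:
    """Angular frequency from the deep-water relation omega^2 = kappa * g."""
    return math.sqrt(G * kappa)


# Radar settings quoted for the validation case: 13.9 GHz, theta_i = 30 degrees.
kappa_c = bragg_wavenumber(13.9e9, 30.0)
omega_c = deep_water_angular_frequency(kappa_c)
print(f"Bragg wavenumber : {kappa_c:.2f} rad/m")
print(f"Bragg wavelength : {2.0 * math.pi / kappa_c * 100:.2f} cm")
print(f"Angular frequency: {omega_c:.2f} rad/s")
```

At a 30° incidence angle, the resonant sea wavelength equals the radar wavelength, which is why centimeter-scale ripples dominate the diffuse return at Ku-band.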
In the classical SSA, the meshed facets must be small enough that the integral over the short-wave structure is solved precisely [31][32][33]. It is therefore hard for the method to handle an electrically very large environment, and the calculation consumes a lot of computational resources. Here, the SSA is employed on the large sea facets with the Bragg wave modifications, which can save a lot of calculation time. The integral kernel upon the facet can be analytically solved as:
where $\Delta S$ is the facet area, $\mathbf{r}_c$ is the center of the large sea facet, $J_n(\cdot)$ is the Bessel function, and only the dominant terms ($n = 0, \pm 1$) are retained in the Bessel series expansion. In the local facet coordinates, $I_0(\kappa_c)$ is given by:
The scattering coefficient is defined as:
Figure 2 presents the simulated backward scattering coefficients. The simulated area of the sea is 128 m × 128 m. The sea facets are meshed into the size of 1 m × 1 m. The dielectric constant of the sea is calculated according to the Klein model [38] at 20 °C and a salinity of 32.5‰. The incident frequency is 13.9 GHz. The incident angles are $\theta_i = 30°$ and $\phi_i = 0°$. The results are obtained by averaging the simulated scattering from 100 sea realizations. Comparisons are made between the simulated results and the measurement data in reference [39], the results from the SSA-2 method proposed by Voronovich [31][32][33], and the two-scale method proposed by Andreas Arnold-Bos and Ali Khenchaf [40]. The wind speed in Elfouhaily's sea spectrum is $U_{10} = 5$ m/s. From the results, one can see that our method and the SSA-2 method agree better with the measurement results in the real scene than the TSM results do, for different incident angles in both polarizations. Additionally, the facet SSA method has a unified formula and better adaptability in describing the specular and diffuse scattering from the local sea surface in different sea conditions. This validates the accuracy of our method.

The Calculations of the Composite Target and Sea Model Scattering
The composite target and sea model involves hybrid scattering mechanisms. At high microwave radar frequencies, the scattering from the target and sea surface model becomes localized and uncorrelated. A facet-based hybrid high-frequency scattering calculation approach is adopted in calculating the composite EM scattering. For the illuminated target facets, the scattered field is calculated by the Physical Optics (PO) method, as given by:
where $\hat{\mathbf{k}}_s$ is the unit scattering vector, $R$ is the radar-to-target distance, and $\mathbf{J}(\mathbf{r}')$ is the equivalent current on the illuminated facets, which is calculated as:
Here, the tapered wave is also used as the incident wave. The integral in Equation (6) can be solved by Gordon's method [41]. The coupling interactions among facets are processed by a fast ray tracing process, as shown in Figure 3.
The incident wave is modeled by ray tubes, which are traced and bounced according to the Geometrical Optics (GO) principles until they no longer intersect any facet. Once a target or sea facet is illuminated by a ray, the scattered field is calculated by the PO or the facet SSA method, respectively. The total scattered field is the vector sum of the scattered fields from all the facet elements. The ray tracing process can be further accelerated by the bi-directional ray tracing technique [42] and the KD-tree acceleration technique [43].
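The GO bounce loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's ray tracer: it traces a single central ray rather than a ray tube, uses a brute-force search over triangular facets in place of the KD-tree and bi-directional acceleration, and relies on the Möller-Trumbore intersection test, which is swapped in here for illustration. All function names are hypothetical.

```python
import math


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def intersect(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the facet, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the facet plane
        return None
    s = sub(origin, v0)
    u = dot(s, p) / det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) / det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) / det
    return t if t > eps else None  # ignore hits at or behind the origin


def reflect(direction, normal):
    """Specular (GO) reflection of a direction about a facet normal."""
    n = normalize(normal)
    return sub(direction, scale(n, 2.0 * dot(direction, n)))


def trace(origin, direction, facets, max_bounces=5):
    """Bounce a ray until it leaves the scene; return hit facet ids and exit direction."""
    d = normalize(direction)
    hits = []
    for _ in range(max_bounces):
        best = None
        for idx, tri in enumerate(facets):  # brute force; a KD-tree would prune this
            t = intersect(origin, d, tri)
            if t is not None and (best is None or t < best[0]):
                best = (t, idx)
        if best is None:
            break
        t, idx = best
        hits.append(idx)
        v0, v1, v2 = facets[idx]
        origin = add(origin, scale(d, t))
        d = reflect(d, cross(sub(v1, v0), sub(v2, v0)))
    return hits, d
```

In a full simulator, each recorded hit would trigger the PO (target facet) or facet SSA (sea facet) field evaluation before the contributions are summed.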
Apart from the surface scattering, the diffractions from the target edges also contribute to the composite scattering. These components are evaluated by the Equivalent Edge Current (EEC) method as follows:
$I_e$ and $I_m$ have the form:
where $\mathbf{E}_{inc}$ and $\mathbf{H}_{inc}$ are the incident fields, $\hat{\mathbf{t}}$ is the unit vector along the edge, and $D^{EEC}_{e}$, $D^{EEC}_{em}$, $D^{EEC}_{m}$ are the EEC diffraction coefficients [44]. The scattering coefficient is defined for the composite scattering, as given by:
The bistatic scattering coefficients are shown in Figure 4.
Figure 4a shows the scattering coefficients in different polarizations for the multiple targets above the sea surface, which consist of a long cone (cone length 5 m, base radius 1 m) and a short cone (cone length 3 m, base radius 1 m). Figure 4b shows the bistatic scattering from a group of two long cones and two short cones. The cones are 5 m above the sea. The incident angles are $\theta_i = 40°$ and $\phi_i = 0°$. One can see that the composite scattering has the strongest value in the specular direction, which is mainly contributed by the sea surface scattering. The reflection from the cone surface, as well as the coupling between the sea and the cone, also causes peak values in the backward directions. One can also see that the coupling effect for the four-cone group is stronger, which disperses the scattering energy into other diffuse directions.

The SAR Imaging Model
The SAR imaging model is set up based on the airborne strip mode in this section, as shown in Figure 5. The SAR carrier platform flies at a velocity of $v$ along the y-axis. $\theta_i$ is the radar squint angle. $R$ is the distance from the radar to the scatterer in the simulated scene. The SAR raw echo is set up in the coordinate system of Figure 5, given by:
where $\Delta R$ and $\Delta y$ are the range and azimuth position offsets of the scatterers, $f$ is the radar center frequency, $\Delta f$ is the frequency bandwidth, and $\tau$ is the pulse duration.
where $\langle\cdot\rangle$ refers to the average of the scattering coefficients over the integration time, $u_r$ refers to the orbital velocity, and $f_r(\cdot)$ refers to the range resolution function. $\rho_{aN} = N\rho_a$ is the azimuth resolution after $N$ incoherent SAR illuminations. $\rho'_{aN}$ is the azimuth resolution degraded by the movements of the target and sea scatterers.
In the following simulations, the SAR image characteristics of the low-flying cone targets in the maritime scene are investigated. The targets fly along the negative x-axis direction with a constant velocity of 100 m/s. The velocity of the SAR platform is $v = 100$ m/s. The radar incident angle is $\theta_i = 30°$. The carrier frequency is 10 GHz. The base-band bandwidth is 1 GHz.
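As a quick sanity check on the simulation settings above, the nominal range resolution implied by the 1 GHz bandwidth can be estimated with the textbook relation $\rho_r = c/(2\Delta f)$. This sketch is an illustration only; it assumes the incident angle is measured from the vertical and a flat reference plane, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def slant_range_resolution(bandwidth_hz: float) -> float:
    """Nominal slant-range resolution c / (2 * bandwidth)."""
    return C / (2.0 * bandwidth_hz)


def ground_range_resolution(bandwidth_hz: float, incidence_deg: float) -> float:
    """Slant-range resolution projected onto a flat surface.

    Assumes the incidence angle is measured from the vertical.
    """
    return slant_range_resolution(bandwidth_hz) / math.sin(math.radians(incidence_deg))


# Settings quoted in the text: 10 GHz carrier, 1 GHz bandwidth, 30 degree incidence.
carrier_hz, bandwidth_hz, incidence_deg = 10e9, 1e9, 30.0
print(f"wavelength        : {C / carrier_hz * 100:.2f} cm")
print(f"slant-range res.  : {slant_range_resolution(bandwidth_hz):.3f} m")
print(f"ground-range res. : {ground_range_resolution(bandwidth_hz, incidence_deg):.3f} m")
```

Under these assumptions, a range cell is a fraction of a meter, much smaller than the 3 m and 5 m cones, which is consistent with the cone shapes being resolvable in the simulated images.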
Figure 6 presents the simulated SAR images for the two-cone group targets at different sea states ($U_{10} = 3$ m/s and $U_{10} = 10$ m/s). The long and short cone targets, as well as the distribution characteristics of the wave propagation along the wind direction, can all be clearly identified. The color bar indicates the radar scattering intensity. The dark areas of the sea images are the targets' shadows under the radar illumination. One can also identify the artificial target images caused by the multipath echo responses, which are clearly observed in Figure 6a. These images are quite similar to the images from the direct target responses, but they have weaker intensities, shifts and distortions compared with the target images in both the range and azimuth directions. They are caused by the phase delay of the multipath interactions. These phenomena are also observed and referred to as "multipath ghosts" in other related studies [46,47]. One can also observe in Figure 6b that the intensities of the multipath response images become even weaker at the higher sea state and are not easily identified. In Figure 7, the SAR characteristics of the four-cone group targets are further investigated. It can be seen that the artificial targets are distributed over a broader area. At the high sea state in Figure 7b, more artificial images exist, but their intensities are weaker. From the above simulations, one can see that the SAR images of targets in maritime scenes are much more complex than those of targets in free space. In this scenario, the target detection is more strongly affected by the sea clutter and the multipath interactions between the target and the sea surface. This issue is also indicated in many references [46][47][48][49].

Modified CBAM-YOLOv7 Neural Network
The YOLOv7 neural network is an effective target detection method in the YOLO series [11][12][13][14][15]. In comparison with the earlier versions of the YOLO series, the YOLOv7 network introduces a multi-scale testing technique that shows good potential in complex SAR image target detection [15][16][17]. To handle the difficulties of SAR detection of targets in the complex maritime scene, a modified YOLOv7 method is adopted, the structure of which is shown in Figure 8. The YOLOv7 structure has three main parts: the backbone network, the Feature Pyramid Network (FPN) and the detection head. The backbone network has four CBS modules. The CBS module extracts the underlying target features by means of the composite convolutional module, the Batch Normalization (BN) module and the SiLU module. The Max Pool (MP) module and the Efficient Layer Aggregation Network (ELAN) are used for feature fusion [15]. Their structures are shown in Figure 9.
The Spatial Pyramid Pooling Connected Spatial Pyramid Convolution (SPPCSPC) module is used in the original YOLOv7 [15], which performs down-sampling through parallel pooling operations, as in Figure 10. The SPPCSPC has three differently sized convolution kernels, which can greatly improve network calculations. A Spatial Pyramid Pooling Fast Connected Spatial Pyramid Convolution (SPPFCSPC) module is used in this study instead of the SPPCSPC module [50]; its structure is also shown in Figure 10. In the SPPFCSPC module, serial max pooling operations are adopted in place of the parallel max pooling operations, which can raise the computational efficiency while maintaining the receptive field.
On the basis of the original YOLOv7 network, the Convolutional Block Attention Module (CBAM) [51][52][53] is employed to highlight the important target features. It can enhance the important features while suppressing the non-important ones. The spatial features are added together through the average pooling and the max pooling. The structure of the CBAM is presented in Figure 11. It contains the Channel Attention Mechanism (CAM) module and the Spatial Attention Mechanism (SAM) module [51]. The channel attention concerns what is meaningful in a given image. The spatial attention concerns where the meaningful area is located in the image. The input feature $F$ is transferred into the feature map $F \in \mathbb{R}^{C \times H \times W}$, where $C$ refers to the channel number and $H$ and $W$ refer to the height and width of the feature map. The channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ is calculated by the CAM module by squeezing the feature maps of the pooling layers with the activation function, given by
$$M_c(F) = \mathrm{Sigmoid}\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{15}$$
where $\mathrm{Sigmoid}(\cdot)$ refers to the sigmoid activation function [53]. MLP is the parameterized multilayer perceptron function, which can integrate the channel information and output the weighted map. The spatial attention is acquired by the average pooling and max pooling operations along the channel. Then, the spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$ is obtained by concatenation and convolution operations, given by:
where $7 \times 7$ denotes the convolution operation with a 7 × 7 filter size. The output feature map can be obtained by:
The target-sea image has multi-scale image features. The target features decrease with the increase in the convolutional layers. The Feature Pyramid Network (FPN) structure is adopted to mix the shallow and high-level features [54][55][56]. The semantic features are transferred from the top to the bottom by up-sampling. The localization feature fusion is conducted by the down-sampling of the feature map. On the basis of the traditional FPN layers, an extra 160 × 160 feature layer is added to fuse the information of the smaller feature layers before down-sampling. Correspondingly, an extra detection head is introduced in the modified YOLOv7, as presented in Figure 8. The added prediction head is produced from the low-level, high-resolution feature maps to improve the capability of target feature extraction in the complex environment. The ELAN-H [57] is employed to further improve the training and learning ability of the network; its structure is given in Figure 12.
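The claim that serial pooling keeps the receptive field of the parallel pooling can be checked with a small pure-Python sketch. It assumes the pooling kernel sizes 5, 9 and 13 used in the public YOLOv7 SPPCSPC configuration (the text above only states that three sizes are used), stride 1, and padding that never wins the max. Only the pooling branches are modeled, not the surrounding convolutions; the function names are hypothetical.

```python
import random


def max_pool(img, k):
    """Stride-1 max pooling with a k x k window clipped at the borders.

    Clipping the window is equivalent to padding with minus infinity.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    return [[max(img[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def parallel_pools(img):
    """SPPCSPC-style branches: three independent pools on the same input."""
    return max_pool(img, 5), max_pool(img, 9), max_pool(img, 13)


def serial_pools(img):
    """SPPFCSPC-style branches: one 5 x 5 pool applied three times in a row."""
    p1 = max_pool(img, 5)
    p2 = max_pool(p1, 5)   # covers the same 9 x 9 neighborhood
    p3 = max_pool(p2, 5)   # covers the same 13 x 13 neighborhood
    return p1, p2, p3


rng = random.Random(0)
feature = [[rng.random() for _ in range(16)] for _ in range(16)]
print("serial == parallel:", serial_pools(feature) == parallel_pools(feature))
```

The serial form reuses each intermediate result, so the two larger windows cost one extra 5 × 5 pass each instead of a full 9 × 9 or 13 × 13 pass, which is where the efficiency gain comes from.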
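For reference, the spatial attention map and the refined output feature of the CBAM can be written in the form used by the original CBAM formulation. This is a sketch that assumes the module is used here without modification; the symbols $f^{7\times 7}$, $F'$ and $F''$ are introduced for illustration.

```latex
% Spatial attention: pool along the channel axis, concatenate, convolve, squash.
M_s(F) = \mathrm{Sigmoid}\!\left( f^{7\times 7}\big( [\,\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)\,] \big) \right)

% Sequential refinement: channel attention first, then spatial attention.
F'  = M_c(F) \otimes F, \qquad
F'' = M_s(F') \otimes F'
```

Here $\otimes$ denotes element-wise multiplication with broadcasting, so $M_c$ rescales each channel and $M_s$ rescales each spatial location.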

SAR Imaging Training and Detection Process
The SAR imaging training and detection process is given in Figure 13. Firstly, the SAR dataset is constructed. The SAR images are obtained by the SAR imaging model with the EM calculations of the scattering from the long and short cone targets, and sea surface samples are generated at different times in different sea states and radar incident conditions. Then, the SAR image data are framed and labeled by well-trained researchers with the help of the image annotation tool labelImg [58]. The SAR images are then preprocessed, resized to 640 × 640 and input into the backbone structure of the YOLOv7 network. The Stochastic Gradient Descent (SGD) optimizer is employed to train YOLOv7. The batch size for the optimization is set to 16. The number of training epochs is 300. There are 1024 SAR image samples used to establish the training dataset; 1/3 of the 1024 training samples are randomly selected to establish the validation dataset. The initial training was conducted using the pre-trained weights for YOLOv7 on the COCO dataset [59]. The best weights are obtained after the whole training process and are used in the final SAR image detection. The numerical simulations are conducted on a workstation with an Intel(R) Core(TM) i9-10885H CPU @ 2.40 GHz. The GPU is an NVIDIA GeForce GTX 1650 Ti with Max-Q Design.
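The training settings and the random validation selection described above can be collected in a short sketch. It is an illustration under assumptions: the text does not say whether the validation images are removed from the training set, so this version holds them out; the seed, the class `TrainConfig` and the function `split_indices` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Settings quoted in the text; field names are illustrative."""
    image_size: int = 640
    batch_size: int = 16
    epochs: int = 300
    optimizer: str = "SGD"
    num_samples: int = 1024
    val_fraction: float = 1.0 / 3.0


def split_indices(cfg: TrainConfig, seed: int = 0):
    """Randomly pick val_fraction of the sample indices for validation.

    Assumption: validation samples are held out of the training list.
    """
    rng = random.Random(seed)
    n_val = int(cfg.num_samples * cfg.val_fraction)
    val = sorted(rng.sample(range(cfg.num_samples), n_val))
    val_set = set(val)
    train = [i for i in range(cfg.num_samples) if i not in val_set]
    return train, val


cfg = TrainConfig()
train_ids, val_ids = split_indices(cfg)
print(len(train_ids), "training samples,", len(val_ids), "validation samples")
```

Fixing the seed keeps the split reproducible across the baseline and modified networks, so that their metrics are compared on the same validation images.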
The SAR imaging training and detection process is given in Figure 13. Firstly, the SAR dataset is constructed. The SAR images are obtained by the SAR imaging model, with EM calculations of the scattering from the long-cone and short-cone targets and with sea surface samples generated at different times, sea states and radar incident conditions. Then, the SAR image data are framed and labeled by well-trained researchers with the help of the image annotation tool labelImg [58]. The SAR images are then preprocessed, resized to 640 × 640 and input into the backbone structure of the YOLOv7 network. The Stochastic Gradient Descent (SGD) optimizer is employed to train YOLOv7. The batch size is set at 16, and the network is trained for 300 epochs. There are 1024 SAR image samples used to establish the training dataset; 1/3 of the 1024 training samples are randomly selected to establish the validation dataset. The initial training was conducted using the weights for YOLOv7 pre-trained on the COCO dataset [59]. The best weights are obtained after the whole training process.

The precision rate, recall rate and mean Average Precision (mAP) are employed to evaluate the training performance of the network, based on the Intersection-over-Union (IoU) between each predicted box and the actual label [60]. True Positive (TP) represents the number of correctly detected targets, i.e., predictions whose IoU with the actual label is no less than 0.5. Otherwise, the prediction is regarded as a False Positive (FP) detection. False Negative (FN) represents the number of missed targets. The precision rate and recall rate refer to the proportion of the correctly detected targets in the total detected targets and in the total number of targets, respectively, as given by

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (18)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (19)$$

The Average Precision (AP) can be calculated by the integration of the average value of the highest precision under different recall conditions. The mean Average Precision (mAP) is the mean value of AP for each category, given by

$$\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R$$

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$

where P(R) is the precision at recall R and N is the number of target categories.

There are two kinds of commonly used mAPs, mAP@0.5 and mAP@0.5:0.95. The mAP@0.5 is the mAP with the IoU threshold set at 0.5; mAP@0.5:0.95 is the mAP averaged over IoU thresholds from 0.5 to 0.95 with a step size of 0.05.

Figure 14 compares the evaluation metrics of the original YOLOv7 and the modified YOLOv7 when training on the SAR image datasets constructed by the numerical simulations. One can see from the comparison results that the precision and recall rates of the modified YOLOv7 are 97.46% and 90.08% after 300 rounds of training, which are 4.04% and 3.71% higher than those of the original YOLOv7 network. In addition, the precision and recall of the modified YOLOv7 stabilize after far fewer training epochs. The values of mAP@0.5 and mAP@0.5:0.95 of the modified YOLOv7 model are 92.91% and 91.98%, which are 3.94% and 14.26% higher than those of the original YOLOv7 model.
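The metric definitions in Equations (18) and (19) and the AP integral can be illustrated with a small sketch. It is not the evaluation code used in this paper: the greedy confidence-ordered matching, the all-point precision envelope, the function names and the toy boxes are all assumptions made for illustration, and only a single target category is handled.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds: Sequence[Tuple[Box, float]],
                     labels: Sequence[Box],
                     iou_thr: float = 0.5) -> List[bool]:
    """Return TP flags in descending-confidence order (assumed greedy matching).

    Each label can be matched by at most one prediction; unmatched
    predictions are counted as False Positives.
    """
    used = [False] * len(labels)
    flags = []
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(labels):
            if not used[j]:
                v = iou(box, gt)
                if v > best_iou:
                    best_j, best_iou = j, v
        hit = best_j >= 0 and best_iou >= iou_thr
        if hit:
            used[best_j] = True
        flags.append(hit)
    return flags


def precision_recall(flags: Sequence[bool], n_labels: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    tp = sum(flags)
    fp = len(flags) - tp
    fn = n_labels - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(flags: Sequence[bool], n_labels: int) -> float:
    """Area under the precision envelope, integrated over recall."""
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(flags, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_labels if n_labels else 0.0)
    # Take the highest precision at or beyond each recall level.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def ap_over_thresholds(preds, labels) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    aps = [average_precision(match_detections(preds, labels, t), len(labels))
           for t in thresholds]
    return sum(aps) / len(aps)


# Toy example: two labeled targets, three predictions (one is a false alarm).
labels = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 29, 30), 0.7)]
flags = match_detections(preds, labels, iou_thr=0.5)
precision, recall = precision_recall(flags, len(labels))
ap50 = average_precision(flags, len(labels))
ap50_95 = ap_over_thresholds(preds, labels)
```

For a multi-category dataset such as the long-cone and short-cone targets here, the same AP routine would be run per category and the results averaged to give the mAP.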
Figure 15 shows the comparisons of the bounding box loss and object loss curves for the two networks. It can be seen that the loss value of the modified YOLOv7 network is lower and decreases more quickly than that of the original one.

In the training process, the hyperparameters of the networks were fine-tuned as well. The default hyperparameters were used in the first training to establish a performance baseline. Then, the random search method was adopted to find hyperparameters with better performance than the baseline. By doing so, the fine-tuned neural network and the desired precision, recall and mAP values are achieved. The above comparisons show that the modified YOLOv7 obtains a clear performance improvement in the training. These results are consistent with the fact that modules such as CBAM and FPN, as in references [61][62][63], are effective in improving the performance of neural network models.
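The baseline-then-random-search procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the search ranges, the baseline values, the uniform sampling and the `toy_score` function are hypothetical and are not taken from this paper. In practice, `evaluate` would train the network with the given hyperparameters and return a validation metric such as mAP@0.5.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, float]

# Hypothetical search ranges (assumption, not from the paper).
SEARCH_SPACE = {
    "lr0": (1e-3, 1e-1),          # initial learning rate
    "momentum": (0.80, 0.98),     # SGD momentum
    "weight_decay": (1e-5, 1e-3)  # L2 regularization strength
}

# Illustrative baseline configuration (assumption).
BASELINE: Config = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005}


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration uniformly from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def random_search(evaluate: Callable[[Config], float],
                  baseline: Config,
                  n_trials: int,
                  seed: int = 0) -> Tuple[Config, float]:
    """Keep the best-scoring configuration, starting from the baseline."""
    rng = random.Random(seed)
    best_cfg, best_score = dict(baseline), evaluate(baseline)
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if score > best_score:  # accept only improvements over the incumbent
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_score(cfg: Config) -> float:
    """Placeholder for 'train, then measure validation mAP' (hypothetical)."""
    return (1.0
            - 5.0 * abs(cfg["lr0"] - 0.02)
            - abs(cfg["momentum"] - 0.90)
            - 100.0 * abs(cfg["weight_decay"] - 3e-4))


baseline_score = toy_score(BASELINE)
best_cfg, best_score = random_search(toy_score, BASELINE, n_trials=50, seed=0)
```

Because the baseline is the initial incumbent, the returned configuration is never worse than the baseline under the chosen score.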
In the following simulations, the two YOLOv7 networks are applied to the SAR images of the multiple cone targets in the maritime environment. Figures 16 and 17 compare the SAR detection results of the original and the modified YOLOv7 networks for a long-cone and a short-cone target above the sea surface when U10 = 3 m/s and U10 = 10 m/s, respectively. In the detected images, the targets need to be separated from the sea surface background and the multipath echo interference. Figures 16a and 17a show the detection results of the original YOLOv7 network. One can see that its detection performance worsens in the higher sea state. According to the aforementioned analysis, the backscattering intensity of the sea surface is stronger at the higher sea state, so the target detection is more severely affected by the sea surface background. Additionally, the detection of the smaller cone target is more easily affected by the sea background, which means that small-target detection in large maritime scenes is more difficult. Figures 16b and 17b show the detection results of the modified YOLOv7 network. In comparison with the original YOLOv7 network, the modified YOLOv7 network noticeably improves the ability to extract multi-scale features from the complex background. Thus, a better detection performance is obtained.
Figures 18 and 19 show the detection results in a more complex environment, where the detections are applied to the SAR images of two long-cone targets and two short-cone targets in the maritime scene. From these results, one can see that the interference from the sea background and from the artificial target images, which are caused by the multipath interactions between the targets as well as between the targets and the sea surface, is more severe. The detection performance of the original YOLOv7 network degrades further. In Figure 18a, the target detection performance is mainly affected by the artificial target image caused by the multipath echo at the low sea state. In this condition, the backscattering intensity of the multipath echo is very strong, and the detection of the nearby targets can be affected. In Figure 19a, the target detection performance is mainly affected by the sea surface when the wind speed is high. The reason is similar to that given in the analysis of Figures 16 and 17. In Figures 18b and 19b, the modified YOLOv7 network shows an apparent improvement in detection performance under the same detection conditions. This further indicates that the modified YOLOv7 network has a stronger ability to extract accurate multi-scale features and detect the targets in the complex maritime environment.

Conclusions
SAR imaging and detection of maritime targets play a significant role in many fields, but a reliable and convenient way to obtain SAR datasets is still lacking. This paper proposes an SAR imaging model of multiple targets in the maritime scene based on a hybrid electromagnetic approach. A facet SSA is derived to evaluate the local scattering with the Bragg phenomenon over the electrically large sea environment with good accuracy and efficiency, and it is validated through a comparison with methods in previous studies and the measurement data [39,40]. The coupling interactions among the target and sea surface scatterers are calculated by a fast ray tracing process with the bidirectional tracing technique and the KD-tree acceleration method. The model is applied to the SAR imagery simulations of multiple targets in the maritime scene with different sea wind speeds. Numerical simulations show the distribution characteristics of the sea wave propagations as well as the multipath artificial images caused by the coupling interactions. The target image has a relatively distinct profile, whereas the images caused by the multipath interference are obscure and weaker. These phenomena are also indicated in references [46][47][48][49].

A modified YOLOv7 neural network with the SPPFCSPC module, CBAM, modified FPN structure and an extra detection head is developed for SAR image target detection. The dataset is constructed from the simulated SAR images of the short-cone and long-cone targets above the sea surface in various sea and target conditions. In the training process, the modified YOLOv7 network shows apparently better performance in the precision rate, recall rate and average precision. The modules used in the modified YOLO network were also used in other research fields and showed good performance [61][62][63], which further supports the improvement effect of the modified YOLOv7 network. In addition, the SAR image simulations provide diverse samples to train the neural network model. Thus, overfitting to the target features of a specific dataset will have a limited influence on the generalization of the target detections in different maritime scenes and conditions. The modified YOLOv7 network also shows a stronger ability to extract the features of the multiple targets in the maritime scene at different sea states.

However, it also needs to be admitted that real sea surface structures and scattering mechanisms are very complex. It is difficult for a model to wholly characterize the geometric features of the various kinds of real sea surfaces. In this article, the non-Bragg scattering mechanisms from the sea surface structures in some complex sea states, such as breaking waves, whitecaps and internal waves, as indicated in reference [33], are not included. In addition, practical measurement experiments in the scenarios described in this article have not been carried out yet. These can be further researched and serve as improvements to make the existing model more robust in future studies.

Figure 1. The geometry of the multi-scale sea surface and the coordinates.
bistatic scattering coefficients are shown in Figure 4.

Figure 4. The scattering from multiple cone targets above the sea surface. (a) Two cone targets. (b) Four cone targets.

Figure 5. Sketch of the synthetic aperture radar imaging model.

λ is the wavelength. rect(·) is the rectangular window function. c is the propagation speed of the electromagnetic waves. ω(·) is the antenna directionality function. Y = λR_0/L is the azimuth width of the antenna beam footprint. σ is the backscattering coefficient of the scatterer in the scene. l is the path length traveled by the radar wave. The sea height maps are generated, and σ for each scatterer is calculated at each slow time moment. Through image focusing [45], the ensemble-averaged SAR image intensity is obtained by

Figure 6. Synthetic aperture radar images of two-cone group targets above the sea at different sea states. (a) Low sea state (U10 = 3 m/s). (b) High sea state (U10 = 10 m/s).

Figure 7. Synthetic aperture radar images of four-cone group targets above the sea at different sea states. (a) Low sea state (U10 = 3 m/s). (b) High sea state (U10 = 10 m/s).

Figure 8. The modified Convolutional Block Attention Module-You Only Look Once version 7 (CBAM-YOLOv7) structure.

The YOLOv7 structure has three main parts: the backbone network, the Feature Pyramid Network (FPN) and the detection head. The backbone network has four CBS modules. The CBS module extracts the underlying target features by means of the composite convolutional module, the batch normalization (BN) module and the SiLU module. The Max Pool (MP) module and the Efficient Layer Aggregation Network (ELAN) are used for feature fusion [15]. Their structures are shown in Figure 9.

Figure 9. The structures of the CBS, Max Pool (MP) and Efficient Layer Aggregation Network (ELAN) in the You Only Look Once version 7 (YOLOv7) backbone.

Figure 10. The structures of the Spatial Pyramid Pooling Connected Spatial Pyramid Convolution (SPPCSPC) and the Spatial Pyramid Pooling Fast Connected Spatial Pyramid Convolution (SPPFCSPC).

Figure 11. The Convolutional Block Attention Module (CBAM) structure. It contains the Channel Attention Mechanism (CAM) module and the Spatial Attention Mechanism (SAM) module [51]. The channel attention concerns the meaningful information in a given image. The spatial attention concerns the location of the meaningful area in the image. The input feature is transferred into the feature map of

Figure 13. The synthetic aperture radar imaging training and detection process.

Figure 14. The comparisons of the evaluation metrics between the original You Only Look Once version 7 (YOLOv7) network and the modified YOLOv7 network. (a) Precision rate. (b) Recall rate. (c) mAP@0.5. (d) mAP@0.5:0.95.

Figure 15. The loss value comparisons between the original and modified You Only Look Once version 7 (YOLOv7) networks. (a) Box loss. (b) Object loss.

Figure 16. The synthetic aperture radar detection results for the two-cone targets at the maritime scene when U10 = 3 m/s. (a) Original You Only Look Once version 7 (YOLOv7) network. (b) Modified YOLOv7 network.

Figure 17. The synthetic aperture radar detection results for the two-cone targets at the maritime scene when U10 = 10 m/s. (a) Original You Only Look Once version 7 (YOLOv7) network. (b) Modified YOLOv7 network.

Figure 18. The synthetic aperture radar detection results for the four-cone targets at the maritime scene when U10 = 3 m/s. (a) Original You Only Look Once version 7 (YOLOv7) network. (b) Modified YOLOv7 network.

Figure 19. The synthetic aperture radar detection results for the four-cone targets at the maritime scene when U10 = 10 m/s. (a) Original You Only Look Once version 7 (YOLOv7) network. (b) Modified YOLOv7 network.