3.2. Breadth Search Compensation Module (BSCM)
In inshore scenes, complex background information introduces scattering noise through the SAR imaging mechanism, which interferes with the network and leads to false detections. To tackle this problem, we propose the BSCM, which consists of two main parts: MLKA and NDCL. The BSCM performs an extensive information search that leverages the contextual cues surrounding the targets to improve recognition and to provide richer shape and position information.
First, we process the input feature $X \in \mathbb{R}^{C \times H \times W}$ with a convolutional layer, a Batch Normalization (BN) layer, and an activation function, where $C$ is the number of channels and $H$ and $W$ are the height and width of the input. This yields $X_{\text{in}}$, which serves as the BSCM input.
Multi-scale Large Kernel Attention (MLKA): We employed MLKA to achieve an extensive information search. The pivotal component of MLKA is Multi-scale Large Kernel Convolution (MLKC), which uses Large Kernel Convolutions (LKCs) of various sizes to create multi-scale search windows. This allows an appropriate search window to be selected for ship targets of different sizes, thereby enhancing target recognition. Specifically, an LKC is obtained by decomposing a large convolution kernel into three consecutive convolutional layers, namely, an $a \times a$ depthwise convolution $\mathrm{DW}_{a}$, a $b \times b$ depthwise dilated convolution $\mathrm{DWD}_{b,d}$ ($d$ is the dilation rate), and a $1 \times 1$ pointwise convolution $\mathrm{PW}$, formulated as
$$\mathrm{LKC}_{a\text{-}b\text{-}1}(X) = \mathrm{PW}\left(\mathrm{DWD}_{b,d}\left(\mathrm{DW}_{a}(X)\right)\right).$$
MLKC constructs four LKCs with different kernel sizes: 3-5-1, 5-7-1, 7-9-1, and 9-11-1, where $a$-$b$-1 denotes a cascade of an $a \times a$ depthwise convolution, a $b \times b$ depthwise dilated convolution, and a pointwise convolution. Related work [28] uses different dilation rates to obtain receptive fields of different scales. In contrast, this study uniformly set the dilation rate to 3, which reduces the number of hyperparameters and makes the network easier to understand and tune.

Specifically, we first applied a $1 \times 1$ convolution and the GELU activation function to $X_{\text{in}}$ to obtain $X'$, preserving both the spatial and channel dimensions. Subsequently, we evenly divided $X'$ into $n$ parts along the channel dimension, $[X_1, X_2, \dots, X_n]$, with $n = 4$ (one part per LKC). Each $X_i$ was processed by $\mathrm{LKC}_i$. The outcomes were concatenated along the channel dimension to construct feature information $F$ with different receptive fields, formulated as follows:
$$X' = \mathrm{GELU}\left(\mathrm{Conv}_{1\times 1}(X_{\text{in}})\right), \qquad F_i = \mathrm{LKC}_i(X_i), \qquad F = \mathrm{Concat}\left(F_1, F_2, \dots, F_n\right),$$
where $\mathrm{Concat}(\cdot)$ denotes feature map concatenation along the channel dimension.
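As a quick check on the four configurations, the sketch below computes the receptive field of each $a$-$b$-1 cascade with the dilation rate fixed at 3. It uses the standard rule that each stride-1 layer enlarges the receptive field by its effective kernel span minus one. It also shows an even split of $X'$ into one channel group per LKC. This is an illustrative calculation, not the authors' code; the helper names and the 64-channel example are hypothetical.

```python
# Receptive field of an a-b-1 LKC cascade (all layers stride 1):
# a x a depthwise conv -> b x b depthwise conv with dilation d -> 1 x 1 conv.
DILATION = 3  # uniform dilation rate shared by all four branches
CONFIGS = [(3, 5), (5, 7), (7, 9), (9, 11)]  # (a, b) for 3-5-1 ... 9-11-1


def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation d along one axis."""
    return (k - 1) * d + 1


def lkc_receptive_field(a: int, b: int, d: int = DILATION) -> int:
    """Each stride-1 layer enlarges the receptive field by (span - 1)."""
    rf = 1
    for span in (effective_kernel(a, 1), effective_kernel(b, d), effective_kernel(1, 1)):
        rf += span - 1
    return rf


def split_channels(channels: int, n: int) -> list[int]:
    """Channel count of each part when X' is divided evenly into n parts."""
    if channels % n != 0:
        raise ValueError("channel count must be divisible by n")
    return [channels // n] * n


for a, b in CONFIGS:
    print(f"{a}-{b}-1: receptive field {lkc_receptive_field(a, b)}")
print(split_channels(64, len(CONFIGS)))  # [16, 16, 16, 16]
```

With a uniform $d = 3$, the four branches span 15, 23, 31, and 39 pixels per axis. The scale difference therefore comes entirely from the kernel sizes $a$ and $b$.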
To enhance the connection between different receptive fields, we applied max pooling and average pooling to $F$, which extracts spatial relationships across the different receptive fields:
$$F_{\max} = P_{\max}(F), \qquad F_{\text{avg}} = P_{\text{avg}}(F),$$
where $P_{\max}(\cdot)$ and $P_{\text{avg}}(\cdot)$ are the max pooling and average pooling operators. Both reduce the channel dimension to 1.

Concatenating these outcomes yields a spatial attention map $SA$ with 2 channels. We then used a convolution to expand the channel size to 4, matching the four distinct receptive fields, and obtained $\widehat{SA}$. The sigmoid function processes $\widehat{SA}$ to capture crucial information. Multiplying the processed $\widehat{SA}$ with each $F_i$ and summing the results achieves effective spatial information fusion, detailed as
$$SA = \mathrm{Concat}\left(F_{\max}, F_{\text{avg}}\right), \qquad \widehat{SA} = \mathrm{Conv}(SA), \qquad MSA = \sum_{i=1}^{4} \sigma\left(\widehat{SA}_i\right) \odot F_i,$$
where $\sigma(\cdot)$ represents the sigmoid function and $\widehat{SA}_i$ is the $i$-th channel of $\widehat{SA}$. We used $\sigma(\widehat{SA}_i)$ to selectively extract feature information from each receptive field and summed the results to derive the multi-head spatial attention ($MSA$). We multiplied $MSA$ with the input feature to obtain the output $X_{\text{MLKC}}$ of the MLKC component.

Finally, we applied a convolution to $X_{\text{MLKC}}$ and a skip connection to obtain the MLKA output $X_{\text{MLKA}}$.
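The spatial selection step can be sketched in plain Python on tiny feature maps. This is a minimal sketch under stated assumptions:

- The kernel size of the convolution that expands $SA$ to four channels is not given above, so a hypothetical $1 \times 1$ convolution (one weight pair and bias per branch) stands in for it.
- Every branch $F_i$ is assumed to have the same channel count.
- The final multiplication with the input feature and the MLKA convolution/skip connection are omitted.

```python
import math

Map = list[list[float]]  # one H x W channel


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_pool(channels: list[Map]) -> tuple[Map, Map]:
    """P_max and P_avg over the channel axis; each output has one channel."""
    h, w = len(channels[0]), len(channels[0][0])
    f_max = [[max(ch[y][x] for ch in channels) for x in range(w)] for y in range(h)]
    f_avg = [[sum(ch[y][x] for ch in channels) / len(channels) for x in range(w)]
             for y in range(h)]
    return f_max, f_avg


def multi_head_spatial_attention(branches: list[list[Map]],
                                 weights: list[tuple[float, float]],
                                 biases: list[float]) -> list[Map]:
    """MSA = sum_i sigmoid(SA_hat_i) * F_i.

    branches: F_1..F_n, each a list of channel maps of the same shape.
    weights/biases: hypothetical 1x1 conv mapping [F_max, F_avg] to n maps.
    """
    all_channels = [ch for branch in branches for ch in branch]  # F = Concat(F_1..F_n)
    f_max, f_avg = channel_pool(all_channels)
    h, w = len(f_max), len(f_max[0])
    out = [[[0.0] * w for _ in range(h)] for _ in branches[0]]
    for branch, (w_max, w_avg), bias in zip(branches, weights, biases):
        for y in range(h):
            for x in range(w):
                # 1x1 conv on the 2-channel SA, then sigmoid -> selection weight for this branch
                g = sigmoid(w_max * f_max[y][x] + w_avg * f_avg[y][x] + bias)
                for c, ch in enumerate(branch):
                    out[c][y][x] += g * ch[y][x]
    return out


feature = [[1.0, 2.0], [3.0, 4.0]]
branches = [[feature] for _ in range(4)]  # four branches, one channel each
msa = multi_head_spatial_attention(branches, [(0.0, 0.0)] * 4, [0.0] * 4)
print(msa)  # [[[2.0, 4.0], [6.0, 8.0]]]: zero weights give sigmoid(0) = 0.5 per branch
```

With learned weights, the per-pixel gates differ across branches. Each location can then favor the receptive field that suits the local target size.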
Neural Discrete Codebook Learning (NDCL): MLKA relies on dilated convolutions to achieve extensive information exploration. However, because dilated convolutions leave holes between their sampling positions, some information may be lost. To mitigate this loss, we introduced NDCL, which learns discrete information through a codebook and thereby compensates for the potential information deficiency of MLKA. As shown in Figure 3, the feature heatmap of MLKA has a large receptive field but lacks attention to local details. NDCL makes up for this shortcoming, making the heatmap more sensitive to detailed features in local areas. Finally, the BSCM combines the outputs of MLKA and NDCL to accurately capture global information.
For the input feature $X$, we first obtained $Z$ through the stem block and then fed it into the NDCL module. We used a learnable codebook $B = \{b_1, b_2, \dots, b_K\}$ to represent the dimensional information in $Z$, where $K$ is the number of codewords $b_k$ and $N$ is the dimension of each codeword. By employing $N$-dimensional codewords, we represented $Z$ discretely, which compensates for fine-grained information. Previous dictionary-learning methods [35, 36] only establish codewords in the channel dimension. We extended this concept to include codewords in the spatial dimensions ($H \times W$) to achieve a three-dimensional representation of local information. This was accomplished as follows:
$$B^{c} = \{b_1^{c}, b_2^{c}, \dots, b_K^{c}\}, \qquad B^{s} = \{b_1^{s}, b_2^{s}, \dots, b_K^{s}\},$$
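where $B^{c}$ and $B^{s}$ are the codebooks in the channel and spatial dimensions, respectively, and $b_k$ represents the $k$-th codeword. We replaced the corresponding dimensions of $Z$ with the codewords to obtain the quantized feature $v$. Additionally, we employed a learnable scale factor $s = \{s_1, s_2, \dots, s_K\}$ to adjust the similarity between the codewords and the dimensional information, in both the channel and spatial dimensions:
$$e_k^{c} = \sum_{i} \mathrm{softmax}\left(-s_k \lVert z_i^{c} - b_k^{c} \rVert_2^{2}\right)\left(z_i^{c} - b_k^{c}\right), \qquad e_k^{s} = \sum_{i} \mathrm{softmax}\left(-s_k \lVert z_i^{s} - b_k^{s} \rVert_2^{2}\right)\left(z_i^{s} - b_k^{s}\right),$$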
where:

- $z_i^{c}$ is the information in the channel dimension.
- $z_i^{s}$ represents the feature information of each pixel in the spatial dimension.
- $s_k$ is the $k$-th scaling factor.
- $\mathrm{softmax}(\cdot)$ denotes the softmax function, normalized over the $K$ codewords.
- $e_k^{c}$ and $e_k^{s}$ are the $k$-th quantized channel and spatial information, respectively.

We computed the $\ell_2$ distance between $Z$ and the codewords, scaled it by $s_k$, and then applied the softmax function to obtain smoothed features. Following this, we employed $\phi$ to combine all $e_k^{c}$ and $e_k^{s}$, where $\phi$ consists of a BN layer, a ReLU activation layer, and a mean layer. Based on this, we calculated $e$, the full information of the whole image with respect to the $K$ codewords.
We performed element-wise multiplication of the codewords and the input feature $Z$ along the channel dimension and summed the products. The output value $a$ was obtained by applying the sigmoid function $\sigma(\cdot)$ to this sum. The output of NDCL is then
$$X_{\text{NDCL}} = a \odot Z,$$
where $a$ aggregates the codeword information of the channel and spatial dimensions. Multiplying $a$ with the feature $Z$ adjusts the required information.
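The codeword-assignment step resembles the residual soft-assignment encoding of deep dictionary-learning layers such as the Encoding layer of EncNet. The sketch below implements that generic technique for a set of descriptors (for example, per-pixel feature vectors). It is an assumption-laden illustration rather than the NDCL implementation:

- The descriptors, codebook, and scales are hypothetical inputs.
- The BN layer in $\phi$ is omitted.
- How the channel and spatial branches are formed and combined, and the subsequent sigmoid gating, are not reproduced.

```python
import math

Vector = list[float]


def squared_l2(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def soft_assign(z: Vector, codebook: list[Vector], scales: list[float]) -> list[float]:
    """Softmax over codewords of -s_k * ||z - b_k||^2 (max-shifted for stability)."""
    logits = [-s * squared_l2(z, b) for b, s in zip(codebook, scales)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode(descriptors: list[Vector], codebook: list[Vector], scales: list[float]) -> list[Vector]:
    """e_k = sum_i w_ik * (z_i - b_k): one aggregated residual per codeword."""
    dim = len(codebook[0])
    encoded = [[0.0] * dim for _ in codebook]
    for z in descriptors:
        for k, (b, w) in enumerate(zip(codebook, soft_assign(z, codebook, scales))):
            for d in range(dim):
                encoded[k][d] += w * (z[d] - b[d])
    return encoded


def aggregate(encoded: list[Vector]) -> Vector:
    """Simplified phi: ReLU, then mean over the K codewords (BN omitted)."""
    k = len(encoded)
    return [sum(max(0.0, e[d]) for e in encoded) / k for d in range(len(encoded[0]))]


descriptors = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical per-pixel features
codebook = [[0.0, 0.0], [3.0, 4.0]]     # K = 2 hypothetical codewords
scales = [1.0, 1.0]                      # hypothetical learnable scales s_k
print(aggregate(encode(descriptors, codebook, scales)))
```

A larger $s_k$ sharpens the softmax, so each descriptor contributes almost exclusively to its nearest codeword.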
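Finally, we fused $X_{\text{MLKA}}$ and $X_{\text{NDCL}}$ along the channel dimension to obtain the output of the BSCM:
$$X_{\text{out}} = \mathrm{Concat}\left(X_{\text{MLKA}}, X_{\text{NDCL}}\right).$$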
  3.3. Sine Fourier Transform Coding (SFTC)
To address the boundary discontinuity caused by the periodicity of the rotation angle, this section introduces the encoding and decoding of the box angle predicted by the detection head. As shown in Figure 4, we sine-encoded the predicted angle $\theta$. The angle was encoded using a four-step phase-shift method [37], with initial phases of 0°, 90°, 180°, and 270°. This angle representation complies with the sampling theorem and its encoding is fault-tolerant. The angle is represented by sine functions at two different frequencies:
$$\varphi_1 = 2\theta, \qquad \varphi_2 = 4\theta,$$
where $\varphi_1$ and $\varphi_2$ are the phases at the two frequencies, converted from the predicted angle $\theta$.
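Sine encoding: $\varphi_1$ and $\varphi_2$ are encoded as $x_1$ and $x_2$ using sine functions as follows:
$$x_{1,m} = \sin\left(\varphi_1 + \frac{2\pi m}{M}\right), \qquad x_{2,m} = \sin\left(\varphi_2 + \frac{2\pi m}{M}\right),$$
where $m = 0, 1, \dots, M-1$ and $M$ is the number of sine components ($M = 4$ for the four-step phase shift). It follows that $\varphi_1$ corresponds to a rotation period of $\pi$ and $\varphi_2$ to a rotation period of $\pi/2$. These match the periods of rectangular and square OBBs, respectively.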
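The periodicity argument can be checked numerically. The sketch below implements the four-step sine encoding as written above ($M = 4$, $\varphi_1 = 2\theta$, $\varphi_2 = 4\theta$). It confirms that $x_1$ is unchanged by a rotation of $\pi$ and $x_2$ by a rotation of $\pi/2$. The helper names and the test angle are illustrative assumptions.

```python
import math

M = 4  # four-step phase shift: initial phases 0, 90, 180, 270 degrees


def sine_encode(phi: float, m_steps: int = M) -> list[float]:
    """x_m = sin(phi + 2*pi*m/M), m = 0..M-1."""
    return [math.sin(phi + 2 * math.pi * m / m_steps) for m in range(m_steps)]


def encode_angle(theta: float) -> tuple[list[float], list[float]]:
    """phi1 = 2*theta (period pi in theta), phi2 = 4*theta (period pi/2 in theta)."""
    return sine_encode(2 * theta), sine_encode(4 * theta)


def close(u: list[float], v: list[float], tol: float = 1e-9) -> bool:
    return all(abs(a - b) < tol for a, b in zip(u, v))


theta = 0.3
x1, x2 = encode_angle(theta)
x1_rot_pi, _ = encode_angle(theta + math.pi)           # rectangle: same box after 180 degrees
_, x2_rot_half_pi = encode_angle(theta + math.pi / 2)  # square: same box after 90 degrees
print(close(x1, x1_rot_pi), close(x2, x2_rot_half_pi))  # True True
```

With $M = 4$ the code is simply $[\sin\varphi, \cos\varphi, -\sin\varphi, -\cos\varphi]$. The four samples are spaced a quarter period apart.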
Discrete Fourier transform (DFT): Directly regressing these sinusoidal components loses their phase information. We draw on wave–particle duality in quantum mechanics: a wave has both amplitude and phase, and a particle's wave function fully describes its state through them. Accordingly, we regard the sinusoidal components as the wave functions of free particles.

The wave function of free particles is connected to the DFT. In the wave function, frequency is related to the oscillation of the wave. In the Fourier transform, the magnitude at each frequency represents the strength of that frequency component. The superposition of free-particle wave functions is analogous to spatial superposition, whereas superposition in the Fourier transform occurs in the frequency domain. Owing to interference, the superposition state of wave functions reflects the phase relationship between particles.

With the DFT, we superpose the particles (sine components). This makes the wave behavior observable in the results and allows both amplitude and phase information to be used to describe angles. The superposition of the wave functions is given by
$$X_i(k) = \sum_{n=0}^{N-1} x_i(n)\, e^{-j\frac{2\pi}{N}kn} = A_i(k)\, e^{j\Phi_i(k)}, \qquad i \in \{1, 2\},$$
where:

- $X_i(k)$ is the frequency-domain representation.
- $x_1(n)$ and $x_2(n)$ represent the discrete values of the particles at the two frequencies.
- $k$ denotes the wave number.
- $N$ is the number of particles, with $n = 0, 1, \dots, N-1$.
- $e$ is the base of the natural logarithm, and $j$ is the imaginary unit.
- $A_i(k)$ and $\Phi_i(k)$ are the amplitude and phase, respectively.
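Decoding function: The phase $\varphi_i$ is decoded from $x_i$ as
$$\varphi_i = \operatorname{atan2}\left(S_i, C_i\right), \qquad i \in \{1, 2\},$$
where $S_i$ and $C_i$ are calculated as follows:
$$S_i = \sum_{m=0}^{M-1} x_{i,m} \cos\frac{2\pi m}{M}, \qquad C_i = \sum_{m=0}^{M-1} x_{i,m} \sin\frac{2\pi m}{M}.$$
Because $\varphi_2$ has a twofold frequency relationship with respect to $\varphi_1$, $\varphi_1$ is either $\varphi_2/2$ or $\varphi_2/2 + \pi$ (modulo $2\pi$). We calculated the cosine of the difference between $\varphi_2/2$ and $\varphi_1$ to help restore the predicted angle:
$$c = \cos\left(\frac{\varphi_2}{2} - \varphi_1\right),$$
where $c$ denotes the cosine of the angular difference between the two phases and is used to recover the final predicted angle. The formula for restoring the predicted angle using $c$ is
$$\theta = \begin{cases} \dfrac{\varphi_2}{4}, & c \ge 0, \\[4pt] \dfrac{\varphi_2}{4} + \dfrac{\pi}{2}, & c < 0, \end{cases}$$
where $\theta$ is the predicted angle output by the network.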