2.1. Locally Oriented Scene Complexity Analysis
Since ocean ship detection is time-sensitive, real-time processing is very important in both military and civil applications. In real-time detection frameworks, a full-scene ocean optical remote sensing image is usually divided into many blocks before ship detection is performed. However, these sub-blocks cover many different situations: quiet sea, strong waves, heavy cloud cover, broken clouds, and so on. Detection methods vary in performance under such local conditions and can fail in complex local scenes, producing missed targets or false alarms. If the complex blocks can be distinguished from the simple ones, each can be given appropriate treatment to ensure high overall detection performance. Therefore, for full-scene ocean optical remote sensing images, we aim to improve both the speed of detection in simple local scenes and the accuracy of detection in complex local scenes, so as to satisfy the requirements of a real-time ship detection system.
In this paper, a fast scene partitioning strategy is proposed to classify the local scenes of the sub-blocks. Because optical remote sensing images suffer from illumination variation, we chose a texture descriptor to analyze the characteristics of the local scenes. Each block is then labeled as either a simple or a complex local scene. For the texture description, we first calculate the gradient feature map from the gray-level local scene, as expressed in Equations (1)–(3):
In Equations (1) and (2), $w$ and $h$ are the length and width of the local scene, respectively, and $BL$ is the quantization level of the input data. $G_h$ and $G_v$ are the horizontal and vertical gradients, respectively, $x$ and $y$ are the index coordinates in the local scene, and $I_{x,y}$ is the gray value at coordinate $(x, y)$. The gradient feature map $G(i, j)$ is then calculated as in Equation (3). The texture feature is expressed by summing the local gradient features, as shown in Equation (4).
In Equation (4), $w$ and $h$ are the size of the local scene, and $G$ is the gradient feature map generated by Equation (3). Here, $n$ is the size of the sliding window over which the gradient feature values of the gradient feature map are summed. $I_r$ is the constructed texture feature map; it is insensitive to the illumination problem and can enhance optical remote sensing images even when the contrast ratio is low. If ships, broken clouds, islands, or strong waves appear in a local scene, these objects are highlighted in the constructed texture feature map because they have rich texture features. Therefore, the proposed locally oriented scene analysis method applies the OTSU thresholding algorithm [21] to the texture feature map to generate a binary image for analyzing the local scene's characteristics. The texture feature map construction process [22,23,24] is shown in Figure 3, and the binary texture feature map analysis method is shown in Figure 4.
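As a minimal sketch of this construction, the following Python fragment builds a gradient feature map, accumulates it with a sliding window into a texture feature map, and binarizes the result with OTSU. The forward-difference gradients, the magnitude combination, the window size `n`, and the min-max rescaling before thresholding are illustrative assumptions, since the exact forms are defined by Equations (1)–(4), which are not reproduced here.

```python
import numpy as np
import cv2

def texture_feature_map(gray: np.ndarray, n: int = 5) -> np.ndarray:
    """Sketch of Equations (1)-(4): gradient map G, then windowed sum I_r."""
    img = gray.astype(np.float32)
    # Horizontal and vertical gradients (forward differences; assumed forms
    # of Equations (1) and (2)).
    Gh = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    Gv = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    # Gradient feature map, Equation (3) (magnitude combination assumed).
    G = np.sqrt(Gh ** 2 + Gv ** 2)
    # Texture feature map, Equation (4): sum of G over an n x n sliding window.
    Ir = cv2.boxFilter(G, -1, (n, n), normalize=False)
    return Ir

def binarize_texture(Ir: np.ndarray) -> np.ndarray:
    """OTSU thresholding [21]: strong textures -> 1, weak textures -> 0."""
    Ir8 = cv2.normalize(Ir, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(Ir8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return (binary > 0).astype(np.uint8)
```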
In Figure 3c, we can see that the texture feature map clearly shows the texture information of the local scene. After applying the OTSU algorithm, the strong textures are set to “1” and the weak textures to “0”. Figure 3d shows the resulting binary image, prepared for the local scene characteristic analysis.
Figure 4 shows an example of local scene characteristic analysis. A 3 × 3 partitioning strategy is used to analyze the texture information of the nine cells in part I. Then, the pixels of each $cell(N)$ are summed to evaluate the texture quality, where $N$ ranges from 1 to 9. The evaluation process can be expressed as Equations (5)–(7):
In Equation (5), $N$ is the index of the cell, and $(i, j)$ is a coordinate in the binary image $I_{binary}$. Equation (5) thus accumulates the texture values within each spatial partition cell of the local scene. The accumulation result in Equation (6) is used to evaluate whether a cell carries significant textural information: when the accumulation exceeds one third of the cell area, the cell is set to “1”. This statistical step yields a “1” or “0” textural distribution over the 3 × 3 partition of the local scene. In Equation (7), $M$ is the total number of partition cells, and $D$ is the distributional statistic of the textural information over the cells, which indicates whether a large variation of texture occurs in the current local scene. If a local scene has large-scale texture, it is classified as a complex local scene, and $D$ takes a large value according to the definitions of Equations (5)–(7). The local scene structure also affects the complexity judgment; examples of structural complexity analysis are shown in Figure 5.
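The cell statistics of Equations (5)–(7) can be sketched as below. The one-third threshold comes from the text, while normalizing $D$ as the fraction of textured cells is an assumption for illustration.

```python
import numpy as np

def texture_distribution(binary: np.ndarray, grid: int = 3) -> float:
    """Sketch of Equations (5)-(7): 3 x 3 cell accumulation and statistic D."""
    h, w = binary.shape
    ch, cw = h // grid, w // grid
    flags = []
    for r in range(grid):
        for c in range(grid):
            cell = binary[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            # Equation (5): accumulate the binary texture values of cell(N).
            acc = cell.sum()
            # Equation (6): mark the cell as textured if the accumulation
            # exceeds one third of the cell area.
            flags.append(1 if acc > cell.size / 3.0 else 0)
    # Equation (7): D over the M = grid * grid cells; the normalization
    # by M is an assumption.
    return sum(flags) / float(len(flags))
```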
From Figure 5a,b, we can see that these local scenes have the same $D$ ratio but significantly different structures; the same phenomenon appears in Figure 5c,d. Whether an object's structure is complete or dispersed is therefore also an important element in evaluating the complexity of a local scene. In this paper, Run-Length coding [24] is employed to evaluate the object's structural complexity index $R$, which is given in Equation (8).
In Equation (8), $L_{RL}$ is the length of the Run-Length coding [24], which captures the structural complexity of the object's binary image, and $A$ is the area of the object's texture in the binary image. $W$ and $H$ are the size of the local scene, and $D$ is the textural distribution ratio defined above. The index $R$ in Equation (8) can rapidly separate a full ocean optical remote sensing scene into local scenes, since it considers both the textural spatial distribution and the structural complexity of each local scene. To achieve fast scene partitioning, $R$ is compared against a flexible threshold chosen to meet the real-time processing requirement. A local scene with a smaller $R$ is defined as a simple local scene, and one with a larger $R$ as a complex local scene. Simple local scenes contain few targets and/or a quiet sea, while complex local scenes include large cloud cover, many broken clouds, and/or a fleet of ships. Due to the real-time processing requirement, the rapid saliency model (RSM) is applied to simple local scenes, and the accurate ship feature clustering model (SFCM) is applied to complex local scenes. Through the RSM and SFCM, the proposed ship detection framework achieves rapid and accurate extraction of suspected ship candidates.
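A run-length based complexity score can be sketched as follows. Counting horizontal runs of “1” pixels is one common realization of $L_{RL}$; the way $L_{RL}$, $A$, and $D$ are combined into the score below is purely illustrative, since the exact combination (which also involves $W$ and $H$) is defined by Equation (8).

```python
import numpy as np

def run_length_count(binary: np.ndarray) -> int:
    """Number of horizontal runs of '1' pixels (one form of Run-Length coding [24])."""
    padded = np.pad(binary, ((0, 0), (1, 0)))            # prepend a zero column
    starts = (padded[:, 1:] == 1) & (padded[:, :-1] == 0)
    return int(starts.sum())                             # each 0->1 step starts a run

def complexity_index(binary: np.ndarray, D: float) -> float:
    """Illustrative stand-in for Equation (8)."""
    A = max(int(binary.sum()), 1)      # area of the object's texture (avoid /0)
    L_RL = run_length_count(binary)    # length of the Run-Length coding
    # A dispersed structure produces many short runs relative to its area,
    # so L_RL / A grows with structural complexity; D adds the textural
    # distribution term.
    return D * L_RL / A

# A scene whose score falls below a chosen threshold is treated as simple
# (routed to the RSM), otherwise as complex (routed to the SFCM).
```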
2.2. RSM for Simple Local Scene Ship Candidate Extraction
We focus on improving the ship detection speed in simple local scenes to meet the real-time processing constraint. In simple local scenes, the ship target differs obviously from the background. However, most methods use a complex detection algorithm model, which incurs significant processing time and is therefore not suitable for a real-time processing system [18,19,20]. In this section, a novel saliency model, the RSM, is proposed. The RSM comprehensively utilizes spatial and frequency domain information to generate the saliency map, owing to the good performance of such information in simple local scenes [25,26,27,28]. The saliency map can be calculated using Equations (9)–(11).
In Equation (9), $I$ is the defined simple local scene, and $f_b(\cdot)$ and $f_t(\cdot)$ are the bottom-hat and top-hat operations, respectively. These operations preserve both bright and dark objects in the transformed image and enhance the difference between ships and the background, which prevents target ships with weak gray levels from being missed. $M_I$ is the fusion image that combines the bottom-hat and top-hat features. We employ the Fourier transform $F(\cdot)$ and $sign(\cdot)$ to obtain the responses in the frequency domain. Next, $abs(\cdot)$ and $F^{-1}(\cdot)$ are used to obtain the pulse responses, and the square operation enhances them. Finally, a two-dimensional Gaussian low-pass filter $G$ is employed to generate the saliency map, from which the ship candidates are produced using the OTSU algorithm [23]. An example of RSM performance is shown in Figure 6.
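The RSM pipeline of Equations (9)–(11) can be sketched as follows. The additive fusion of the top-hat and bottom-hat images, the structuring-element size, the Gaussian width, and the interpretation of $sign(\cdot)$ on a complex spectrum as $z/|z|$ are assumptions for illustration.

```python
import numpy as np
import cv2

def rsm_saliency(gray: np.ndarray, se_size: int = 15, sigma: float = 2.5) -> np.ndarray:
    """Sketch of the RSM saliency map, Equations (9)-(11)."""
    img = gray.astype(np.float32)
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    top = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, se)       # f_t: keeps bright objects
    bottom = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, se)  # f_b: keeps dark objects
    M_I = top + bottom                                      # fusion image (additive form assumed)
    # sign(F(M_I)): for a complex spectrum, sign is taken here as z / |z|.
    F = np.fft.fft2(M_I)
    signF = F / (np.abs(F) + 1e-12)
    # abs(F^-1(.)) gives the pulse responses; squaring enhances them.
    pulse = np.abs(np.fft.ifft2(signF)) ** 2
    # Two-dimensional Gaussian low-pass filter G yields the saliency map.
    return cv2.GaussianBlur(pulse.astype(np.float32), (0, 0), sigma)

def ship_candidates(saliency: np.ndarray) -> np.ndarray:
    """Binarize the saliency map with the OTSU algorithm [23]."""
    s8 = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(s8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```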
In Figure 6, we can see that the proposed RSM performs better on low-contrast ships in simple local scenes. Furthermore, it remains valid when one ship target is divided across several non-overlapping local scenes. The RSM for simple local scene ship detection is shown in Figure 7, where the ship targets and parts of ships are highlighted. The OTSU algorithm is applied to the saliency map for ship candidate extraction. All of the simple local scenes in Figure 7 show that the RSM performs well for ships of any size, and for parts of ships, in different local scenes. Therefore, the proposed RSM can rapidly and accurately extract ship candidates. A quantitative analysis of the RSM is presented with detailed examples in the discussion section.
2.3. SFCM for Complex Local Scene Ship Candidate Extraction
Ship detection in complex local scenes is the most challenging case, and many methods perform poorly there because of scene information interference. The major problem is obtaining a discriminative feature description of ocean ships [29,30,31,32,33,34,35,36]. For the SFCM, we build a synthetic information image with three channels, similar to an RGB image, to increase the descriptive dimension of each pixel's feature vector. Each pixel of the synthetic information image then carries gray, gradient, and textural feature information, as shown in Figure 8.
The gradient feature map generated by Equation (3) is added to the synthetic image, and the textural feature map produced by Equation (4) is also added to enhance the ship feature description. The original one-dimensional feature is thus projected into a three-dimensional feature space, which better distinguishes the ship target from background interference. Although higher-dimensional features could describe ships more richly, they would also increase the time taken for the ship feature clustering calculation. Therefore, considering the real-time processing requirement, we use three-dimensional features to build the ship clustering model. The proposed SFCM can be expressed as Equation (12).
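Building the synthetic information image amounts to stacking the three maps channel-wise, as sketched below. The per-channel min-max normalization is an assumption, since the paper does not state how the channels are scaled; the gradient and texture maps reuse the assumed forms from the earlier sketch.

```python
import numpy as np
import cv2

def synthetic_image(gray: np.ndarray, n: int = 5) -> np.ndarray:
    """Three-channel synthetic image: gray, gradient (Eq. 3), texture (Eq. 4)."""
    img = gray.astype(np.float32)
    Gh = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    Gv = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    G = np.sqrt(Gh ** 2 + Gv ** 2)                       # gradient feature map
    Ir = cv2.boxFilter(G, -1, (n, n), normalize=False)   # texture feature map

    def norm(c):  # per-channel min-max scaling to [0, 1] (assumed)
        return cv2.normalize(c, None, 0.0, 1.0, cv2.NORM_MINMAX)

    # Each pixel now carries a (gray, gradient, texture) feature vector.
    return np.dstack([norm(img), norm(G), norm(Ir)])
```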
In Equation (12), $a$, $b$, and $c$ are the coordinates of the feature clustering center, and $G$, $S$, and $T$ are the values of the feature vector in the three-dimensional feature space. $d$ is the controlling distance between the input feature vector and the clustering center. To automatically obtain the clustering center coordinates $(a, b, c)$ and the controlling distance $d$, we use the winner-take-all competitive learning rule to generate a one-dimensional topological map with a two-layer neural network, consisting of one input layer and one output layer. The input layer receives a six-dimensional feature vector, which consists of the three synthetic-information values and three labeling values.
Figure 9 shows the process of one-dimensional topological map generation.
We then collected 36,000 pixel vectors labeled as ship, cloud, or sea surface, and used these pixel feature vectors to generate a one-dimensional topological map following the loss function in Equation (13).
where $j$ is the index of the initialized neurons, which correspond to ships, clouds, and the sea surface; $x$ is the input pixel feature vector with labeled information; and $f(x)$ is the winner neuron, i.e., the one of the three competitive neurons with the minimum Euclidean distance to the input. Once the winner neuron is obtained, its weight is updated by Equation (14).
Here, $t$ is the iteration number of the whole training process, $\eta(t)$ is the learning rate, which changes iteratively, and $h_{j,f(x)}(t)$ is the topological neighborhood function for the weight updates. The learning rate and the topological neighborhood function in Equation (14) are defined by Equations (15) and (16), respectively.
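The training loop can be sketched as a one-dimensional self-organizing map over three neurons. The exponential decays of the learning rate and neighborhood width are standard forms assumed here in place of the exact Equations (15) and (16), and the `tau_eta`/`tau_sigma` decay parameters (and their scaling by the iteration count) are hypothetical.

```python
import numpy as np

def train_topological_map(X: np.ndarray, n_iter: int = 10000,
                          eta0: float = 0.12, sigma0: float = 0.3,
                          tau_eta: float = 0.4, tau_sigma: float = 0.6) -> np.ndarray:
    """One-dimensional topological map over 3 neurons (ship, cloud, sea).

    X: (n_samples, 6) labeled pixel feature vectors.
    Returns the (3, 6) weight matrix after winner-take-all training.
    """
    rng = np.random.default_rng(0)
    W = rng.uniform(0.0, 1.0, size=(3, X.shape[1]))    # small random init in [0, 1]
    positions = np.arange(3, dtype=np.float32)          # 1-D topology of the output layer
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # Winner f(x): neuron with minimum Euclidean distance (Eq. 13).
        winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        # Iteratively decaying learning rate and neighborhood width
        # (assumed exponential forms of Equations (15) and (16)).
        eta = eta0 * np.exp(-t / (tau_eta * n_iter))
        sigma = sigma0 * np.exp(-t / (tau_sigma * n_iter))
        h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
        # Weight update, Equation (14).
        W += eta * h[:, None] * (x - W)
    return W
```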
In Equations (15) and (16), we used the k-fold cross-validation method [37,38] to determine and verify the learning parameters, with k set to 10 (i.e., 10-fold cross-validation). In the first step, the learning parameters of Equations (15) and (16), including the initial learning rate $\eta_0$ and the initial variance $\sigma_0$, were initialized as small numbers in the range 0 to 1. The 36,000 pixel-level feature vectors were then separated into 10 sub-datasets: nine sub-datasets were used to train the SFCM (i.e., to automatically obtain the clustering center and the controlling distance from the one-dimensional topological map), and the remaining sub-dataset was used to test the performance of the trained SFCM. If the performance was not good, the parameters were adjusted. The second step involves exchanging the testing sub-dataset for another sub-dataset and repeating the first step, 10 times in all. One of these parameters strongly impacts the final SFCM performance and must be determined carefully, whereas the other parameters only affect the speed of convergence. $t$ is the iteration number. In Equation (15), the decay constant is set to 0.4, and the initial learning rate $\eta_0$ is set to 0.12. In Equation (16), the initial variance of the Gaussian function $\sigma_0$ is set to 0.3, and its decay constant is set to 0.6. Through the processes mentioned above, we can automatically obtain the one-dimensional topological map. Next, we chose the updated ship competitive neuron, without its labeled information, as the clustering center coordinates in Equation (12). Then, we chose the minimum distance between the cloud or sea-surface neuron vectors and the ship neuron vector in the one-dimensional topological map as the controlling distance $d$ in Equation (12). Finally, the SFCM of Equation (12) is set up, which analyzes and extracts ship candidates pixel by pixel from the defined complex local scene.
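Deriving the clustering center and controlling distance from the trained map, and then classifying pixels per Equation (12), can be sketched as follows. Treating Equation (12) as a Euclidean-ball membership test in the $(G, S, T)$ space, and the assumed (ship, cloud, sea) row order of the weight matrix, are illustrative assumptions consistent with the definitions of $a$, $b$, $c$, and $d$ above.

```python
import numpy as np

def build_sfcm(W: np.ndarray):
    """Extract the SFCM parameters from the trained (3, 6) weight matrix W.

    Row order is assumed to be (ship, cloud, sea); the first three columns
    are the synthetic-information part (G, S, T) of each neuron, the label
    dimensions are dropped.
    """
    ship, cloud, sea = W[0, :3], W[1, :3], W[2, :3]
    center = ship                                       # (a, b, c) in Eq. (12)
    d = min(np.linalg.norm(cloud - center),             # controlling distance:
            np.linalg.norm(sea - center))               # nearest rival neuron
    return center, d

def extract_ship_candidates(synthetic: np.ndarray, center: np.ndarray, d: float) -> np.ndarray:
    """Pixel-by-pixel SFCM test (Eq. 12): a pixel whose (G, S, T) vector lies
    within distance d of the ship center is marked as a ship candidate."""
    dist = np.linalg.norm(synthetic - center.reshape(1, 1, 3), axis=2)
    return (dist < d).astype(np.uint8)
```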