Visual Detection and Association Tracking of Dim Small Ship Targets from Optical Image Sequences of Geostationary Satellite Using Multispectral Radiation Characteristics

Abstract: By virtue of wide swath, persistent observation, and rapid operational response, geostationary remote sensing satellites (e.g., GF-4) show tremendous potential for sea target system surveillance and situational awareness. However, ships in such images appear as dim small targets and may be affected by clutter, reef islands, clouds, and other interference, which makes ship detection and tracking difficult. Considering the differences in visual saliency characteristics across multispectral bands between ships and jamming targets, this paper proposes a novel approach to visual detection and association tracking of dense ships based on GF-4 image sequences. First, candidate ship blobs are segmented in each single-spectral image of each frame through a multi-vision salient features fusion strategy, to obtain the centroid position, size, and corresponding spectral grayscale information of suspected ships. Due to the displacement of moving ships across multispectral images of each frame, multispectral association with regard to the positions of ship blobs is then performed to determine the final ship detections. Afterwards, precise position correction of detected ships is implemented for each frame in image sequences via multimodal data association between GF-4 detections and automatic identification system data. Last, an improved multiple hypothesis tracking algorithm with multispectral radiation and size characteristics is put forward to track ships across multi-frame corrected detections and estimate ships' motion states. Experimental results demonstrate that our method can effectively detect and track ships in GF-4 remote sensing image sequences with high precision and recall rate, yielding state-of-the-art performance.


Introduction
Sea target system awareness (especially ship detection and tracking) is a highly significant task for marine information surveillance, with a wide range of applications in areas such as maritime piracy, marine traffic, sea pollution, illegal fishing, irregular migration, defense and maritime security, and border control [1]. With the advancement of space-based remote sensing technology, satellite data of various modalities, including satellite automatic identification system (AIS), synthetic aperture radar (SAR), multispectral and hyperspectral optical sensors, and global navigation satellite system reflectometry (GNSS-R) [2], have become an important means of monitoring ship targets. Among them, the geostationary orbit (GEO) remote sensing satellite has attracted increasing attention because of its wide-swath scanning, near-real-time persistent observation, and high response sensitivity, and consequently has tremendous application potential in marine surveillance. The Chinese GF-4 satellite, launched in 2015, is equipped with a staring-imaging optical sensor, providing images with a spatial resolution of 50 m and wide coverage of 400 km × 400 km in the panchromatic and multispectral (PMS) bands. This video-like satellite has high-revisit observation capability with temporal resolutions of up to 20 s. These advantages enable detecting and tracking ships to obtain motion parameters such as geographic position, speed, course, and moving trajectory [3,4], which can assist decision making and guide low Earth orbit (LEO) satellites with a higher spatial resolution for further recognition and identification [5,6].
However, ships in GEO satellite images appear as dim small targets that possess fewer pixels, fewer features, and lower discrimination. Moreover, optical remote sensing images are usually affected by sea clutter, reef islands, clouds, and other interference, which makes it difficult to detect and track ships. The current GF-4-image-based ship tracking strategy mainly involves two steps: (1) ships are detected in each single frame of GF-4 image sequences, and then (2) ship tracking is performed via data association across the detected results of multi-frame images [7]. As for ship detection, the mainstream approaches developed for optical remote sensing images comprise methods based on grayscale-statistical feature analysis, deep-learning-based methods, and methods utilizing the human vision system (HVS) mechanism [3]. The first category of methods [6,8–10] detects ships aided by the grayscale statistics of their hulls or wakes, and fails when reef islands, broken clouds, clutter, or target-intensive scenarios occur. Deep-learning-based methods [7,11–14] can extract the middle-level and high-level features of targets through training and improve the robustness of target detection. Nevertheless, the limitation of deep learning for dim small target detection is the lack of texture, structure, and shape information, resulting in performance degradation of the trained detector. By contrast, HVS-based methods [15–21] have received great interest because of their sensitivity to local singularity and their selective attention mechanism, which fully utilize the intensity, contrast, color, shape, multiscale representation, and other characteristics to yield the visual saliency map and extract regions of interest from the whole scene. Yao et al. [5] designed a local saliency map algorithm based on the peak signal-to-noise ratio (PSNR) to detect the weak and small ships in GF-4 images. Likewise, a multiscale dual-neighbor difference contrast measure method was proposed in [3] to determine the positions of candidate ships, followed by false alarm removal via shape characteristics analysis. The deficiencies of these methods when applied to GF-4 images are that (1) they do not exploit the multispectral radiation characteristics of ships, which are distinguishable from those of other jamming targets, and more importantly, (2) they ignore the displacement of moving ships across multiple bands. Both these issues cause numerous false alarms.
As regards another essential procedure for ship tracking, data association aims to correlate the detected results across multi-frame images, which can assist in eliminating the false alarms and simultaneously reducing the missed alarms generated in the single-frame detection stage. Classical data association methods mainly include nearest neighbor (NN) and its variants [22], probability hypothesis density (PHD) [23], joint probabilistic data association (JPDA) [24], and multiple hypothesis tracking (MHT) [25–27] methods. The MHT method was introduced in [6] to track ships from GF-4 sequential images. By embedding the amplitude information of the near-infrared (NIR) band into the ships' motion model, an improved MHT approach was proposed by Yao et al. [5], which can further suppress the false alarms and achieve better tracking performance. In [28], the discriminative correlation filter with channel and spatial reliability [29] was adopted to track ships and correlate the results of multi-frame detections. Yu et al. [3] utilized the JPDA method to associate the detected ships from GF-4 time-series images, while [7] correlated the detected ships via the intersection over union (IoU) and then computed the similarity measurement to estimate the appearance stability of the ships in the detected trails. Nonetheless, these existing methods do not fully utilize the unique vision information (such as multispectral radiation and size characteristics) of each ship itself. This can lead to erroneous associations when dealing with scenes of dense ship targets and severely interfering targets (such as reef islands and broken clouds).
The motivation of our work is to develop a robust tracking method for dim small ship targets in GEO optical remote sensing image sequences, with higher precision and recall rate, to tackle the dilemma of existing methods. By analyzing the visual saliency and motion features of ships across multispectral images, we propose a visual detection and association tracking method embedded with multispectral radiation characteristics, to track ships' trajectories from GF-4 PMS image sequences. First, for each single-spectral image of each frame, preliminary ship detection is performed through an image segmentation strategy that incorporates multi-vision salient features, to extract the centroid position, size (number of pixels), and corresponding intensity (mean grayscale) information of candidate ship blobs. Subsequently, considering the displacement of moving ships across multispectral images, multispectral association with regard to the positions of ship blobs is carried out to determine the final ship detection result in each frame. Meanwhile, because the geometric positioning error of the GF-4 satellite will affect the following multi-frame association and tracking accuracy, we utilize multimodal data association between GF-4 detected positions and AIS positions of ships to register each frame of the GF-4 image and correct the detected positions. Last, an improved multiple hypothesis tracking model embedded with multispectral radiation and size characteristics is put forward for multi-frame association and trajectory tracking.
The rest of this paper is organized as follows: the methodological framework is described in detail in Section 2. In Section 3, we demonstrate the experiments and analysis of our method for ship tracking in GF-4 time-series images. Finally, discussions and conclusions are presented in Section 4.

Methodological Framework
The whole flowchart of the proposed visual detection and association tracking method for dim small ships is illustrated in Figure 1. At the preliminary stage of multi-source data preprocessing, sea-land separation and cloud removal are performed based on the normalized difference water index (NDWI) and thresholding segmentation [6], to extract the sea region of interest (ROI) from GF-4 multispectral image sequences. The spatiotemporal interpolation of ships' AIS data is also implemented to derive ships' motion information (e.g., geographic positions) at the acquisition time of GF-4 sequential images. The motion information is used partly as ground control points (GCPs) for geometric correction and partly for performance verification of the proposed method. In order to facilitate subsequent processing and display, we adjust each single-band image of the sea ROI to an 8-bit grayscale after data preprocessing. To sum up, the workflow mainly consists of four steps: (1) preliminary ship blobs segmentation, (2) multispectral association for final ship detection in each frame, (3) multimodal data association for position correction of detected ships, and (4) MHT tracking across multi-frame detections. The core techniques will be introduced below in detail.

Preliminary Ship Detection in Single-Spectral Image of Each Frame
As mentioned above, ships in the GF-4 geostationary satellite images occupy only a few pixels and have low distinguishability (see Figure 2), so they can be regarded as dim small targets. As shown in Figure 2, the visual saliency of a ship generally depends on its multispectral radiation (i.e., intensity and color), contrast, size, and other characteristics. It should be noted that object detection methods based on the HVS mechanism have recently attracted increasing attention and demonstrated highly robust performance for dim small target detection in complex scenarios. Herein, we design a multi-vision salient features fusion strategy to segment candidate ship blobs from the single-spectral image of each frame, which mainly includes: (1) adaptive mean shift based on a space-gray joint domain to generate a saliency map of grayscale features and simultaneously filter sea clutter and noise, (2) a lateral inhibition network with a protection domain to generate a saliency map of contrast features, (3) region-growth-based image segmentation of the grayscale saliency map through seed points derived from the contrast saliency map, and finally (4) candidate ship blobs extraction aided by prior size features of ships.


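Adaptive Mean Shift with Space-Gray Joint Domain
The basic idea of the mean shift algorithm is as follows: given an initial sample space, the density function can be calculated according to the sample points in the feature space, and the position where the function value is the largest is the solution [30]. Let $\{x_i\}$, $i = 1, 2, \ldots, n$, denote the set of feature vectors in a d-dimensional feature space. The mean shift vector is given as
$$m_h(x) = \frac{\sum_{i=1}^{n} x_i \, g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)} - x$$
where $h$ is the kernel bandwidth and $g(\cdot) = -K'(\cdot)$.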
In this paper, we utilize the Gaussian kernel function with a space-gray joint domain to process the GF-4 images. The spatial and gray feature vectors are adopted to estimate the corresponding feature bandwidths $h_s$ and $h_g$. Based on such a kernel function, the distribution of point $x = (x^s, x^g)$ on the GF-4 image can be estimated as
$$\hat{f}(x) = \frac{C}{h_s^2 h_g} \sum_{i=1}^{n} k\left(\left\| \frac{x^s - x_i^s}{h_s} \right\|^2\right) k\left(\left\| \frac{x^g - x_i^g}{h_g} \right\|^2\right)$$
where $x^s$ and $x^g$ denote the spatial part and the gray part of the feature vector, respectively; $C$ is the normalization constant; and $k(\cdot)$ is the profile function of the kernel $K(\cdot)$. Correspondingly, the mean shift vector is formulated as
$$m(x) = \frac{\sum_{i=1}^{n} x_i \, g\left(\left\| \frac{x^s - x_i^s}{h_s} \right\|^2\right) g\left(\left\| \frac{x^g - x_i^g}{h_g} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{x^s - x_i^s}{h_s} \right\|^2\right) g\left(\left\| \frac{x^g - x_i^g}{h_g} \right\|^2\right)} - x$$
When utilizing the mean shift algorithm to obtain grayscale salient features, the spatial bandwidth determines the region range of the density gradient estimation of the current point. A too-small value may lead to ship target splitting or poor clutter/noise removal performance, while a too-large bandwidth may cause over-smoothing and loss of target features. In addition, the selection of spatial bandwidth also affects the iteration speed of the algorithm. Therefore, an adaptive spatial bandwidth is adopted; that is, the spatial bandwidth is initialized as 2 according to the statistics of ships' size features, and the increment step is set to 1. Experiments show that, when the number of sample points whose grayscale values are similar to the currently smoothed point is less than half the number of all sample points in the spatial bandwidth, the corresponding spatial bandwidth is the optimal solution. The grayscale value of a sample point is considered to be similar to that of the currently smoothed point provided that the grayscale difference is not more than 4. When processing a scenario with dense ship targets, a smaller spatial bandwidth can preserve details and grayscale features. Otherwise, the larger the bandwidth, the better the smoothing effect.
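The adaptive spatial bandwidth rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the square window shape, the upper cap `h_max`, and the helper `make_scene` are assumptions introduced only for the example, while the initial value 2, the step 1, the grayscale tolerance 4, and the "less than half" stopping rule come from the text.

```python
def adaptive_spatial_bandwidth(image, row, col, h_init=2, step=1, h_max=10, gray_tol=4):
    """Grow a square window around (row, col) until the pixels whose gray value
    is within gray_tol of the center make up less than half of the window."""
    n_rows, n_cols = len(image), len(image[0])
    center = image[row][col]
    h = h_init
    while h <= h_max:
        total = similar = 0
        for r in range(max(0, row - h), min(n_rows, row + h + 1)):
            for c in range(max(0, col - h), min(n_cols, col + h + 1)):
                total += 1
                if abs(image[r][c] - center) <= gray_tol:
                    similar += 1
        if similar < total / 2:
            return h
        h += step
    return h_max  # assumed cap (not from the paper) so the search always ends


def make_scene(size, ship_half_width, sea=10, ship=100):
    """Hypothetical test scene: a bright square ship centered on a flat sea."""
    mid = size // 2
    return [[ship if abs(r - mid) <= ship_half_width and abs(c - mid) <= ship_half_width
             else sea for c in range(size)] for r in range(size)]


small_ship = make_scene(21, 1)  # 3 x 3 pixel ship
large_ship = make_scene(21, 2)  # 5 x 5 pixel ship
print(adaptive_spatial_bandwidth(small_ship, 10, 10))
print(adaptive_spatial_bandwidth(large_ship, 10, 10))
```

In this sketch a larger target needs a wider window before the background dominates, which mirrors the trade-off between preserving target detail and smoothing the sea surface.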
Similarly, a larger grayscale bandwidth may cause over-smoothing, while a smaller bandwidth cannot adequately filter the noise and sea clutter. Generally, global optimal fixed-range bandwidth and adaptive-range bandwidth are two common bandwidth selection methods. Herein, we introduce the asymptotic mean integrated square error (AMISE) criterion commonly used by the plug-in rule to estimate the adaptive grayscale bandwidth [31],
$$h_g = \left(\frac{4}{d+2}\right)^{\frac{1}{d+4}} n^{-\frac{1}{d+4}} \sigma_j$$
where $d$ is the dimension of the feature space, $n$ is the number of pixels, and $\sigma_j$ is the standard deviation computed from the pixel grayscale $g_i$ and the mean grayscale $\bar{g}$.
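As a hedged numerical illustration of a plug-in bandwidth, the sketch below uses the widely used normal-reference rule $h = (4/(d+2))^{1/(d+4)} n^{-1/(d+4)} \sigma$. Whether this matches the exact expression in [31] is an assumption, as are the function name `plug_in_bandwidth`, the default `d=1`, and the use of the sample standard deviation.

```python
import math
import statistics


def plug_in_bandwidth(grays, d=1):
    """Normal-reference plug-in bandwidth for a list of pixel gray values.

    d is the feature-space dimension (assumed 1 for the gray channel).
    """
    n = len(grays)
    sigma = statistics.stdev(grays)  # sample standard deviation (an assumption)
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4)) * sigma


window = [0, 2, 4, 6, 8]            # hypothetical gray values in a local window
print(plug_in_bandwidth(window))
```

The bandwidth shrinks as more pixels are used and grows with the gray-level spread, so noisy sea regions are smoothed more strongly than homogeneous ones.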
The adaptive mean shift algorithm with a space-gray joint domain considers the space and grayscale information simultaneously, which can well preserve the intensity feature of one ship, filter sea clutter and noise, and yield a better grayscale saliency map without changing the size of the target.

Lateral Inhibition Network with Protection Domain
Due to its advantages of highlighting object edges and enhancing contrast, a lateral inhibition network [32] is employed to improve the contrast of dim small ships. The model used in our work is formulated as
$$r_{i,j} = e_{i,j} + \sum_{p=i-R}^{i+R} \sum_{q=j-R}^{j+R} k_{ij,pq} \, e_{p,q}$$
where $e_{i,j}$ and $r_{i,j}$ denote the input and output of the receptor cell $(i, j)$, respectively; $k_{ij,pq}$ is the inhibition coefficient of the receptor cell $(i, j)$ regarding the cell $(p, q)$; and $R$ is the radius of the inhibition domain. For the grayscale image, if the radius is 2 pixels, the model can be expressed as
$$r(i,j) = g(i,j) + k_1 \sum_{\max(|p|,|q|)=1} g(i+p, j+q) + k_2 \sum_{\max(|p|,|q|)=2} g(i+p, j+q)$$
where $g(i, j)$ denotes the grayscale of pixel $(i, j)$, and $k_i$ $(i = 1, 2)$ are inhibition coefficients.
Considering the intensity and size characteristics of dim small ships, we design a protection domain surrounding the current pixel to prevent target pixels from contributing to the output. In this study, we set the radius of the protection domain and the width of the inhibition domain as 1 and 2, respectively, and the formula is as follows:
$$r(i,j) = g(i,j) + k_1 \sum_{\max(|p|,|q|)=2} g(i+p, j+q) + k_2 \sum_{\max(|p|,|q|)=3} g(i+p, j+q)$$
As for the coefficients, let $k_i = \alpha / R_i$, where $R_1$ and $R_2$ are set as 2 and 3, respectively, and $\alpha$ is a constant subject to the condition $1 + 16k_1 + 24k_2 = 0$. The matrix of inhibition coefficients is a template with the size of 7 × 7, where 24 coefficients with value $k_2$ are distributed in the outermost circle, 16 coefficients with value $k_1$ are distributed in the secondary outer circle, and the central coefficient is 1.
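The 7 × 7 coefficient template described above can be built directly from the stated constraints. The following sketch solves $1 + 16k_1 + 24k_2 = 0$ with $k_i = \alpha / R_i$ (giving $\alpha = -1/16$ for $R_1 = 2$, $R_2 = 3$) and applies the template to an image. The function names, the zero coefficients inside the protection ring, and the replicate-edge border handling are assumptions made for illustration.

```python
def protected_inhibition_kernel(r1=2.0, r2=3.0):
    """7 x 7 template: center 1, protection ring 0, then k1 (16 cells) and k2 (24 cells)."""
    alpha = -1.0 / (16.0 / r1 + 24.0 / r2)   # from 1 + 16*k1 + 24*k2 = 0
    coeff = {0: 1.0, 1: 0.0, 2: alpha / r1, 3: alpha / r2}
    return [[coeff[max(abs(p), abs(q))] for q in range(-3, 4)] for p in range(-3, 4)]


def contrast_saliency(image, kernel):
    """Apply the (symmetric) template to every pixel, replicating border pixels."""
    n_rows, n_cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for p in range(-half, half + 1):
                for q in range(-half, half + 1):
                    r = min(max(i + p, 0), n_rows - 1)
                    c = min(max(j + q, 0), n_cols - 1)
                    acc += kernel[p + half][q + half] * image[r][c]
            out[i][j] = acc
    return out


kernel = protected_inhibition_kernel()
# Hypothetical scene: flat sea (gray 10) with a 3 x 3 ship (gray 100) at the center.
scene = [[100 if 9 <= r <= 11 and 9 <= c <= 11 else 10 for c in range(21)] for r in range(21)]
saliency = contrast_saliency(scene, kernel)
print(round(saliency[10][10], 6), round(saliency[0][0], 6))
```

Because the coefficients sum to zero, a flat sea background maps to zero, while the protection ring keeps the ship's own neighboring pixels from suppressing its center response.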
Through convolution operation on the image using the coefficient matrix, the contrast saliency map is obtained. The position distribution of potential ships is then derived by thresholding the contrast saliency map. Based on such a coefficient template, the contrast features of small weak ships can be further enhanced, and meanwhile, the integrity of the target body can also be protected. As shown in Figure 3, compared with the conventional lateral inhibition network, the contrast features of most ships have been amplified, and almost the entire sea background information has been suppressed based on our lateral inhibition network with a protection domain.
In summary, the combination of grayscale and contrast saliency can utilize the complementary advantages to weaken sea clutter and background interference, and diminish false alarms and missed alarms simultaneously. By introducing adaptive parameter selection and a protection domain, the features of dim small ship targets can be well preserved or even enhanced. Under the guidance of the distribution of potential ships, we perform image segmentation on the grayscale saliency map. By virtue of the size characteristics of ships in the GF-4 satellite image, we can then extract the candidate ship blobs in each single-spectral image of each frame. Finally, the centroid position, mean grayscale, and size features of ship blobs in each band can be obtained. Herein, the image blobs whose sizes are between 2 and 75 pixels will be kept because of the possible contribution of ship wakes. The whole workflow of the proposed dim small ship target detection method is given in Figure 4.
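The final blob extraction step can be illustrated with a simplified sketch. Instead of seeded region growing on the grayscale saliency map, the example below labels connected components of a binary mask (8-connectivity is an assumption) and keeps blobs of 2 to 75 pixels, returning the centroid, mean grayscale, and size of each. The function name and the toy mask are hypothetical.

```python
from collections import deque


def extract_blobs(mask, gray, min_size=2, max_size=75):
    """Label 8-connected blobs in a binary mask and keep those within the size range."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    blobs = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one blob
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            size = len(pixels)
            if min_size <= size <= max_size:
                blobs.append({
                    "centroid": (sum(p[0] for p in pixels) / size,
                                 sum(p[1] for p in pixels) / size),
                    "mean_gray": sum(gray[r][c] for r, c in pixels) / size,
                    "size": size,
                })
    return blobs


mask = [[1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0]]
gray = [[50] * 6 for _ in range(5)]
gray[2][3], gray[2][4], gray[3][4] = 90, 120, 90
blobs = extract_blobs(mask, gray)
print(blobs)
```

The isolated single pixel is rejected by the lower size bound, which is how the size prior removes point-like noise in this sketch.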


Ship Positions Association across Multispectral Detections and across Multimodal Data

Multispectral Association for Final Ship Detection
Almost all existing ship detection approaches based on GF-4 images utilize only single-spectral information, such as the NIR band, where the contrast between ships and sea background is the largest among the five bands. This generates a large number of false alarms due to the presence of jamming targets (e.g., reef islands, broken clouds, and sea clutter). Furthermore, the accuracy and efficiency of the subsequent ship tracking will also be greatly decreased. Indeed, multispectral radiation characteristics are inherent attributes of the target itself, which are the salient features discriminable from those of other targets. As can be seen from Figure 5, the reefs and islands, which usually have larger contrast features in the NIR band, may tremendously affect ship detection, and there are some static islands that are similar to ships in appearance. By contrast, the reef islands disappear in the red and green bands, while the contrast of some weak ships becomes weaker in the red band until it disappears in the green band. So, ship detection in the single NIR band will bring many false alarms, while detection in other bands may cause missed alarms. For these reasons, we should make full use of the multispectral radiation information.
Meanwhile, there is a common phenomenon of time lag between the acquisitions of different spectral images. As for GF-4 PMS images, the band-to-band time interval is as long as several seconds, and the time lag between the panchromatic and NIR bands is about 40 s [4], which results in a displacement of the detected positions across multispectral images (see Figure 6). The larger the ship's velocity, the greater the offset distance. Therefore, in order to determine the final ship detection results in each frame, it is still essential to perform multispectral association on the positions of ship blobs. Specifically, the related criteria in this paper are as follows: (1) the association is founded upon the positions of candidate ships in each single-spectral image of one frame; (2) positions of ship blobs in all the other bands should be associated to those in the NIR band; (3) fix a contrast threshold (e.g., 45), and define a candidate ship in the NIR band whose contrast value in the saliency map is larger than the threshold as a high-contrast target; (4) set a distance threshold (e.g., 15 pixels), and consider that no association can occur if the distance between two detections in two bands exceeds the threshold; (5) a candidate ship is discarded if it is a high-contrast target and is not associated with any detection in other bands; (6) if the detection in a band is associated with one in the NIR band, the mean grayscale of the band is retained, and if no association occurs, the corresponding grayscale of the band is marked as NULL; (7) the attribute features of the ultimately determined ships include their centroid positions in the NIR band, mean-grayscale vectors across multiple bands, and sizes.
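Criteria (2) to (7) can be sketched as a small rule-based routine. For brevity, this example matches each NIR candidate to its nearest detection within the distance gate in every other band, which is a simplification of the global assignment used in the paper; the dictionary layout, band names, and function name are hypothetical.

```python
import math


def associate_bands(nir_candidates, other_bands, contrast_thr=45, dist_thr=15):
    """Apply the multispectral association rules to NIR candidates.

    nir_candidates: list of dicts with keys pos, gray, contrast, size.
    other_bands: dict mapping band name to a list of dicts with keys pos, gray.
    """
    ships = []
    for cand in nir_candidates:
        grays = {"nir": cand["gray"]}
        for band, detections in other_bands.items():
            best, best_d = None, dist_thr
            for det in detections:
                d = math.dist(cand["pos"], det["pos"])
                if d <= best_d:          # inside the gate and closest so far
                    best, best_d = det, d
            grays[band] = best["gray"] if best else None   # None plays the role of NULL
        associated = any(v is not None for b, v in grays.items() if b != "nir")
        if cand["contrast"] > contrast_thr and not associated:
            continue                     # rule (5): high-contrast but seen only in NIR
        ships.append({"pos": cand["pos"], "size": cand["size"], "grays": grays})
    return ships


nir = [
    {"pos": (100, 100), "gray": 80, "contrast": 60, "size": 6},    # ship seen in red
    {"pos": (300, 300), "gray": 120, "contrast": 90, "size": 20},  # reef-like, NIR only
    {"pos": (500, 500), "gray": 30, "contrast": 20, "size": 3},    # weak ship, NIR only
]
others = {"red": [{"pos": (104, 101), "gray": 55}],
          "green": [{"pos": (340, 300), "gray": 40}]}
ships = associate_bands(nir, others)
print(ships)
```

The high-contrast candidate that appears only in the NIR band is removed, while the weak NIR-only candidate is retained with NULL entries, which reflects the observation that reefs vanish in the visible bands whereas weak ships may simply fall below detectability there.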

As one of the most popular methods for data association, the global nearest neighbor (GNN) algorithm is adopted in this paper to associate across multispectral detections and across multimodal data. GNN assignment seeks the best association with the lowest global cost, which allows each element $\theta_j$ in set $\Theta$ to be assigned to only one element $\psi_i$ in set $\Psi$. The optimization model minimizes the global cost $\sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} M_{ij}$ over one-to-one associations whose distances do not exceed the threshold, where $d_{ij} = \|\theta_j - \psi_i\|$, $n$ and $m$ are the numbers of elements in the sets $\Theta$ and $\Psi$, respectively, $\delta$ is the distance threshold, and $M_{ij}$ denotes the association variable. $M_{ij} = 1$ denotes one-to-one association, while $M_{ij} = 0$ means no association. The association relationship in GNN assignment can be obtained by the Hungarian algorithm, which is a simple and efficient method for solving the linear assignment problem.
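A gated GNN assignment can be sketched with a compact Hungarian (Kuhn-Munkres) solver. The treatment of gating here, in which out-of-gate pairs receive a large cost and each element may instead take a dummy "unassociated" column at cost $\delta$, is one common formulation and an assumption of this sketch, as are the function names.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for a matrix with rows <= columns.
    Returns, for each row, the index of its assigned column."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            col_of_row[p[j] - 1] = j - 1
    return col_of_row


def gnn_associate(thetas, psis, delta):
    """Return (theta index, psi index) pairs of the gated global nearest neighbor match."""
    n, m = len(thetas), len(psis)
    if n == 0:
        return []
    big = 1e9  # finite stand-in for a forbidden (out-of-gate) pair
    cost = []
    for t in thetas:
        row = [d if d <= delta else big for d in (math.dist(t, s) for s in psis)]
        cost.append(row + [delta] * n)   # dummy columns: leave this element unassociated
    cols = hungarian(cost)
    return [(j, i) for j, i in enumerate(cols) if i < m and cost[j][i] <= delta]


pairs = gnn_associate([(0, 0), (3, 0)], [(2, 0), (-4, 0)], delta=15)
print(pairs)
```

In this toy case a greedy nearest neighbor pass starting from the first element would pair it with the detection at distance 2 and force a total cost of 9, whereas the global solution has a total cost of 5.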

Multimodal Data Association for Ship Positions Correction
After the association across multispectral detections, the final ships in each frame and their corresponding static attribute characteristics can be determined. The combination of motion and static attribute features greatly improves the reliability and accuracy of ship tracking. However, due to the systematic error of rational polynomial coefficient (RPC)-based rectification, the precision of ships' positions is quite low, which may affect the performance of the ship tracker. To precisely correct the geometric positions, the commonly used affine transformation model is introduced, i.e.,
$$\begin{cases} l' = e_0 + e_1 l + e_2 s \\ s' = f_0 + f_1 l + f_2 s \end{cases} \tag{10}$$
where $(l', s')$ and $(l, s)$ are image coordinates obtained from RPC-based GCP projection and matching points in the GF-4 image, respectively, and $(e_i, f_i)$, $i = 0, 1, 2$, are the transformation coefficients. As there are few or even no GCPs on the sea surface, AIS information on ships is an alternative choice. AIS is a type of cooperative self-reporting system that records ships' static information (e.g., identification number, type, length, and width) and motion information (e.g., longitude, latitude, course, and speed). In our work, a portion of AIS data is used for geometric correction.
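Estimating the six affine coefficients from matched point pairs is a small least-squares problem. The sketch below solves the normal equations with a hand-written 3 × 3 elimination so that it has no dependencies; the function names and the synthetic point pairs are assumptions, and degenerate (e.g., collinear) point sets are not handled.

```python
def solve3(a, b):
    """Solve a 3 x 3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(matched, projected):
    """Least-squares fit of l' = e0 + e1*l + e2*s and s' = f0 + f1*l + f2*s.

    matched: (l, s) points in the image; projected: corresponding (l', s') points.
    """
    basis = [(1.0, l, s) for l, s in matched]
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs_l = [sum(b[i] * lp for b, (lp, _) in zip(basis, projected)) for i in range(3)]
    rhs_s = [sum(b[i] * sp for b, (_, sp) in zip(basis, projected)) for i in range(3)]
    return solve3(ata, rhs_l), solve3(ata, rhs_s)


# Synthetic check with hypothetical coefficients.
true_e, true_f = (5.0, 1.01, 0.02), (-3.0, -0.01, 0.99)
matched = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 20)]
projected = [(true_e[0] + true_e[1] * l + true_e[2] * s,
              true_f[0] + true_f[1] * l + true_f[2] * s) for l, s in matched]
e, f = fit_affine(matched, projected)
print(e, f)
```

At least three non-collinear point pairs are needed; using more pairs than that is what allows outlier rejection to be layered on top of the fit.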
Since the positions of detected ships in the NIR band are retained for the following ship tracking, a time lag of approximately 40 s between the acquisition time of the NIR band and the time given in the image files should be considered when performing the linear interpolation of AIS data. After the spatiotemporal unification of GF-4 detections and AIS data, GNN-based data association is also employed to seek the optimal association matching of ship positions between multimodal data. To robustly correct the system error, we further utilize the random sample consensus (RANSAC) algorithm to eliminate the gross errors in point-pair association. Later, the precise geographic coordinates of ships in each frame can be transformed using the RPC-based model and Equation (10).
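The temporal alignment of AIS reports with the NIR acquisition time can be sketched as plain linear interpolation. The sign of the lag (NIR acquired about 40 s after the time stored in the image file) is an assumption of this example, as are the function names and the report layout; timestamps are assumed strictly increasing.

```python
def interpolate_ais(reports, t):
    """Linearly interpolate (lon, lat) at time t from time-sorted AIS reports.

    reports: list of (time_s, lon_deg, lat_deg) tuples.
    """
    for (t0, lon0, lat0), (t1, lon1, lat1) in zip(reports, reports[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return lon0 + w * (lon1 - lon0), lat0 + w * (lat1 - lat0)
    raise ValueError("requested time lies outside the AIS report span")


def ais_at_nir_time(reports, file_time, nir_lag=40.0):
    """Shift the image file time by the assumed NIR lag before interpolating."""
    return interpolate_ais(reports, file_time + nir_lag)


reports = [(0.0, 120.00, 30.00), (100.0, 120.01, 30.02)]
lon, lat = ais_at_nir_time(reports, file_time=10.0)
print(lon, lat)
```

Linear interpolation is adequate only when consecutive AIS reports are close in time relative to the ship's maneuvers, which is the situation the sketch presumes.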

Motion Modeling of Ships
Single-frame detection can only obtain the system distribution and attribute characteristics of ship targets. More importantly, for sea surveillance, it is necessary to track the motion trajectories of ships in order to predict the overall situation. Moreover, multi-frame data association can further reduce false alarms caused by jamming targets, as well as occasional missed alarms in single-frame detection. Based on the attribute features of the ultimately detected ships, we embed multispectral radiation and size information into the tracking framework in this paper.
The motion state of a ship at frame k is depicted as
$$x_k = \left[\lambda_k, \phi_k, v^{\lambda}_k, v^{\phi}_k, g^{nir}_k, g^{r}_k, \cdots, s_k\right]^T$$
where $\lambda_k, \phi_k$ and $v^{\lambda}_k, v^{\phi}_k$ denote the geographic position and velocity components along the longitude and latitude directions, respectively; $(g^{nir}_k, g^{r}_k, \cdots)$ represents the vector of multispectral grayscale values; and $s_k$ is the size. The speed and course over ground (relative to true north) can be calculated from the velocity components in the plane rectangular coordinate system. The measurement at frame k can be described as
$$z_k = \left[lon_k, lat_k, gray^{nir}_k, gray^{r}_k, \cdots, size_k\right]^T$$
where $lon_k$ and $lat_k$ denote the longitude and latitude coordinates of the measurement, $(gray^{nir}_k, gray^{r}_k, \cdots)$ is the mean-grayscale vector across multiple bands, and $size_k$ is the size of the measurement. The units of longitude, latitude, and course are degrees (°), and the units of distance, time, and velocity are the nautical mile (nm), hour (h), and knot (kn), respectively.
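The conversion from velocity components to speed and course over ground can be written in a few lines. The argument names `v_east` and `v_north` (in knots) are hypothetical stand-ins for the components along the longitude and latitude directions, and the course is measured clockwise from true north.

```python
import math


def speed_course(v_east, v_north):
    """Return (speed in knots, course in degrees clockwise from true north)."""
    speed = math.hypot(v_east, v_north)
    course = math.degrees(math.atan2(v_east, v_north)) % 360.0
    return speed, course


print(speed_course(3.0, 4.0))
print(speed_course(-10.0, 0.0))
```

Note the argument order in `atan2`: the east component comes first so that a ship heading due north has a course of 0° and a ship heading due east has a course of 90°.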
The state transition model used in our study can be formulated as

$$\mathbf{x}_k = f_{k-1}(\mathbf{x}_{k-1}) + G_{k-1}\mathbf{v}_{k-1}$$

where $\mathbf{x}_k$ is the state at frame $k$, $f_{k-1}$ is the state transition function, and $G_{k-1}$ and $\mathbf{v}_{k-1}$ are the process noise matrix and process noise, respectively. The process noise follows a Gaussian distribution with zero mean and a covariance determined by $\sigma_v$, the standard deviation of process noise. In the dead reckoning of a rhumb-line track from one point $(\lambda_1, \phi_1)$ to another $(\lambda_2, \phi_2)$, $T$ is the time interval, $a \approx 60$ nm/° is a constant that converts distance (nm) to degrees (°), and $\phi_{mid}$ is the middle latitude. Applying this dead reckoning to the state yields the state transition equation, Equation (13). Because Equation (13) is nonlinear, the first-order extended Kalman filter (EKF) is adopted to predict and update ships' states in the tracking procedure.
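A rhumb-line dead-reckoning step can be sketched as follows. This is a minimal illustration using the common mid-latitude approximation with the conversion constant of about 60 nm per degree; it assumes east and north velocity components in knots and may differ in detail from the paper's own equations. The names `dead_reckon` and `predict_state` are hypothetical, and the static features (grayscale, size) are assumed to be carried over unchanged in the prediction.

```python
import math

A_NM_PER_DEG = 60.0  # approximate nautical miles per degree of latitude


def dead_reckon(lon, lat, v_east_kn, v_north_kn, dt_h):
    """Advance a position by dt_h hours using the mid-latitude approximation."""
    lat2 = lat + v_north_kn * dt_h / A_NM_PER_DEG
    lat_mid = 0.5 * (lat + lat2)
    # A degree of longitude shrinks with cos(latitude).
    lon2 = lon + v_east_kn * dt_h / (A_NM_PER_DEG * math.cos(math.radians(lat_mid)))
    return lon2, lat2


def predict_state(state, dt_h):
    """Noise-free prediction of [lon, lat, v_east, v_north, *features]."""
    lon, lat, v_east, v_north, *features = state
    lon2, lat2 = dead_reckon(lon, lat, v_east, v_north, dt_h)
    return [lon2, lat2, v_east, v_north, *features]


if __name__ == "__main__":
    # One 186 s frame interval for a ship heading north-east at about 14 kn.
    print(predict_state([123.0, 30.0, 10.0, 10.0, 120.0, 110.0, 6.0], 186.0 / 3600.0))
```

The cosine term is what makes the transition nonlinear in the state, which is why a linearized filter such as the EKF is needed.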
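The measurement equation can be modeled as

$$\mathbf{z}_k = H_k \mathbf{x}_k + \mathbf{w}_k$$

where $\mathbf{z}_k$ is the measurement and $\mathbf{x}_k$ the state at frame $k$, $H_k$ is the measurement matrix, $\mathbf{w}_k$ denotes the measurement noise following the Gaussian distribution with zero mean and covariance $R_k$, and $\sigma_p$, $\sigma_g$, and $\sigma_s$ are the standard deviations of the position, grayscale, and size components, respectively.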
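One plausible way to build the measurement matrix and noise covariance is sketched below. It assumes a state ordered as [lon, lat, v_east, v_north, grayscale per band, size], a measurement ordered as [lon, lat, grayscale per band, size], and a diagonal noise covariance; these orderings and the helper name `build_measurement_model` are assumptions of this sketch rather than details confirmed by the paper.

```python
def build_measurement_model(n_bands, sigma_p, sigma_g, sigma_s):
    """Return (H, R) as nested lists for a state with n_bands grayscale slots."""
    n_x = 4 + n_bands + 1  # position (2) + velocity (2) + bands + size
    n_z = 2 + n_bands + 1  # position (2) + bands + size
    H = [[0.0] * n_x for _ in range(n_z)]
    H[0][0] = 1.0  # longitude is observed directly
    H[1][1] = 1.0  # latitude is observed directly
    for row in range(2, n_z):
        H[row][row + 2] = 1.0  # skip the two unobserved velocity slots
    variances = [sigma_p ** 2] * 2 + [sigma_g ** 2] * n_bands + [sigma_s ** 2]
    R = [[variances[i] if i == j else 0.0 for j in range(n_z)] for i in range(n_z)]
    return H, R


def apply_matrix(H, x):
    """Plain matrix-vector product, enough to check the selection pattern."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]


if __name__ == "__main__":
    H, R = build_measurement_model(2, 0.002, 10.0, 4.0)
    print(apply_matrix(H, [120.5, 30.2, 8.0, -3.0, 100.0, 90.0, 5.0]))
```

With this layout the velocity is never measured directly; it is inferred by the filter from successive positions.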

MHT with Multispectral Saliency Characteristics
Known as the theoretically optimal data association algorithm, MHT generates alternative logical hypotheses to delay decision making when measurement-track associations conflict, so that subsequent measurements can resolve the uncertainty. In this work, we propose an improved MHT tracker embedded with multispectral grayscale and size features for dim small ship tracking across multi-frame detections. The cumulative log-likelihood ratio is often used as the track score to evaluate the probability of track hypotheses, and the score of track $j$ at frame $k$ can be expressed in the following recursive form:

$$S^{j}(k) = S^{j}(k-1) + \Delta S^{j}(k)$$

where $\Delta S^{j}(k)$ is the increment of the score. The increment, given by Equation (17), depends on $P_D$, the probability of detection; $\lambda_n$ and $\lambda_f$, the spatial densities of new targets and clutter, respectively; $S^{ij}_k$, the residual covariance matrix; and $d_{ij}$, the Mahalanobis distance. Here $i = 0$ means no association, while $i > 0$ indicates that track $j$ is associated with measurement $i$ at frame $k$.
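For orientation, the sketch below implements the textbook log-likelihood-ratio increment commonly used in the MHT literature: a miss contributes ln(1 − P_D), and an association contributes the log of the detection-weighted Gaussian likelihood over the clutter density. This is an assumed standard form and may differ from the paper's Equation (17), in particular in how the new-target density enters. The function name and argument layout are hypothetical.

```python
import math


def motion_score_increment(d2, det_S, p_d, lambda_f, dim):
    """Textbook LLR increment.

    d2       squared Mahalanobis distance, or None for a missed detection (i = 0)
    det_S    determinant of the residual covariance matrix
    p_d      probability of detection
    lambda_f spatial density of clutter
    dim      dimension of the residual vector
    """
    if d2 is None:
        return math.log(1.0 - p_d)
    gauss_norm = (2.0 * math.pi) ** (dim / 2.0) * math.sqrt(det_S)
    return math.log(p_d / (lambda_f * gauss_norm)) - 0.5 * d2


def update_track_score(prev_score, increment):
    """Recursive track score: S(k) = S(k-1) + delta S(k)."""
    return prev_score + increment


if __name__ == "__main__":
    miss = motion_score_increment(None, 1.0, 0.95, 1e-11, 2)
    hit = motion_score_increment(1.5, 1e-10, 0.95, 1e-11, 2)
    print(update_track_score(0.0, miss), update_track_score(0.0, hit))
```

A very low clutter density makes every gated association strongly positive, so gating and the feature terms carry most of the discrimination between competing hypotheses.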
Generally, the static attribute features of a target are independent of its motion information and also contribute greatly to association tracking. For static attribute features such as multispectral grayscale and size, we utilize the normal distribution function to measure the similarity between measurements and tracks. The score increments produced by target features are given by Equation (18), in which $\Delta S^{j}_{m}(k)$ and $\Delta S^{j}_{s}(k)$ are the multispectral and size increments, respectively; $m$ denotes the spectral band; $gray^{m}_{i}$ and $size_i$ are the features of measurement $i$; $g^{m}_{j}$ and $s_j$ are the predicted features of track $j$; and $c$ is the constant probability for the posterior of the background (null) hypothesis. Ultimately, the overall score increment is the joint increment produced by motion, multispectral radiation, and size features, i.e., the sum of Equations (17) and (18).
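A feature-based increment of this kind might be computed as below. The sketch assumes each increment is the log ratio between a normal density (centered on the track's predicted feature) and the constant background probability c; this is an assumed form consistent with the description, not necessarily the paper's exact Equation (18). All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def feature_score_increment(measured, predicted, sigma, c):
    """Log ratio of the feature likelihood to the background constant c."""
    return math.log(gaussian_pdf(measured, predicted, sigma) / c)


def total_feature_increment(meas_gray, pred_gray, meas_size, pred_size,
                            sigma_g, sigma_s, c):
    """Sum per-band grayscale increments and the size increment."""
    total = sum(feature_score_increment(g_i, g_j, sigma_g, c)
                for g_i, g_j in zip(meas_gray, pred_gray))
    return total + feature_score_increment(meas_size, pred_size, sigma_s, c)


if __name__ == "__main__":
    # Two bands (e.g., NIR and red) with sigma_g = 10, sigma_s = 4, c = 0.1.
    print(total_feature_increment([118.0, 109.0], [120.0, 110.0], 7.0, 6.0,
                                  10.0, 4.0, 0.1))
```

Because the increment is a log ratio, a measurement whose features sit far from the prediction lowers the track score, which helps separate nearby ships with different radiometric signatures.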
At the stage of multi-frame association, since it is not guaranteed that the grayscale features of all bands can be extracted during ship detection, only the spectral components common to the measurement and the track are used for association. Empirically, most ships detected in the NIR band can also be detected in the red band, so using the grayscale information of these two bands in association tracking offers a good balance between performance and efficiency. In addition, the maximum speed threshold for moving ships is essential to avoid infeasible data association, while the minimum speed constraint can further remove jamming targets such as reefs and islands.
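The two practical rules above (use only the shared bands, and gate associations by speed) can be sketched as follows. The threshold values in the usage example are hypothetical placeholders, since the text does not state them, and the helper names are illustrative only. The implied speed uses the same mid-latitude approximation with about 60 nm per degree.

```python
import math


def common_bands(meas_gray, track_gray):
    """Bands whose grayscale is available in both the measurement and the track."""
    return sorted(meas_gray.keys() & track_gray.keys())


def implied_speed_kn(lon1, lat1, lon2, lat2, dt_h, a=60.0):
    """Speed (kn) needed to move between two positions in dt_h hours."""
    lat_mid = 0.5 * (lat1 + lat2)
    d_north = (lat2 - lat1) * a
    d_east = (lon2 - lon1) * a * math.cos(math.radians(lat_mid))
    return math.hypot(d_east, d_north) / dt_h


def speed_gate(speed_kn, v_min_kn, v_max_kn):
    """Accept an association only if the implied speed is plausible for a moving ship."""
    return v_min_kn <= speed_kn <= v_max_kn


if __name__ == "__main__":
    bands = common_bands({"nir": 120.0, "red": 110.0}, {"nir": 118.0, "red": 112.0, "green": 90.0})
    speed = implied_speed_kn(123.0, 30.0, 123.0, 30.01, 186.0 / 3600.0)
    # v_min_kn and v_max_kn below are hypothetical values for illustration.
    print(bands, speed, speed_gate(speed, v_min_kn=1.0, v_max_kn=40.0))
```

A static reef or island yields an implied speed near zero across frames and is rejected by the lower bound, while an association that would require an implausibly fast ship is rejected by the upper bound.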

Research Area and Dataset
The GF-4 PMS sequential images composed of five frames of Level 1A data are selected to demonstrate the detection and tracking experiments of dim small ships. The imaging area is located in the East China Sea, as shown in Figure 7. The time range is from 03:47:24 to 03:59:47 (UTC) on 9 March 2017, and the time interval between adjacent frames is about 186 s. Each frame has five spectral bands (i.e., the NIR, red, green, blue, and panchromatic bands), each with a size of 10,240 × 10,240 pixels and a spatial resolution of 50 m. Two ROIs bounded by red boxes are chosen for the experiments, with each ROI exceeding 100 km × 100 km coverage. Furthermore, the AIS dataset of ships, which covers the entire study area in space and time, is used for geometric position correction and performance verification.

Results and Analysis
At the ship detection stage, the region growth threshold for image segmentation is 2 in this paper, and the other parameters are mostly adaptive or automatically calculated. Figure 8 shows a locally enlarged view of preliminary ship detection in the NIR band. There are many dense ships in Figure 8a, where some ships appear smaller and weaker. From Figure 8b, we can see that all the ships have been well detected. The final ship detection results for the two ROIs in the first frame are presented in Figure 9, where the positions of the detected ships are marked with red circles. The size of a circle represents the size of a ship, while the intensity of the green fill inside the circle indicates the average grayscale of the ship in the NIR band. We can see that ships with stronger radiation characteristics typically have larger size features, and the distribution of ships becomes increasingly sparse as ship size increases. ROI1 is an inhomogeneous region with brighter intensity overall, which lowers the apparent contrast of ships. ROI2 contains some reef islands at the top-left corner of the image and a few broken clouds at the bottom of the image.
The multispectral grayscale and size characteristics extracted after ship detection are the static attributes of ships, which vary only slightly across multi-frame images, as shown in Figures 10 and 11.
Figure 10 presents the grayscale variations of 10 ships in the NIR and red bands over multiple frames, where different marks denote different ships, and the red and green curves represent the grayscale values of the NIR and red bands, respectively. It should be noted that the grayscale variation in the red band is generally much smaller than that in the NIR band, and the variation is not more than 8. Moreover, some ships have larger grayscale values in the red band than in the NIR band. Similarly, the variations of size features are usually not more than 4 pixels in the NIR band (see Figure 11). Further, we can see that the multispectral grayscale and size features of a ship are distinguishable from those of other ships. This is also the reason we embed these characteristics into the ship tracking model. The point pairs in the first frame after multimodal data association and RANSAC processing are shown in Figure 12; they are uniformly distributed in the study area and are used for position correction of the detected ships. The positioning errors differ among sea areas. In the ship tracking procedure, the relevant parameters are set as follows: $\sigma_p = 0.002^{\circ}$, $\sigma_v = 0.01$ nm/h$^2$, $\sigma_g = 10$, $\sigma_s = 4$, $c = 0.1$, $P_D = 0.95$, $\lambda_f = 10^{-11}$, and $\lambda_n = 0$.
Figure 13 presents the results of ship detection and tracking based on the GF-4 image sequences. In Figure 13a,b, we show the ship detection results of the two ROIs in all five frames, where the positions have been geometrically corrected. We note that there are still some false alarms and very few missed alarms in the detection results. Nevertheless, through the subsequent multi-frame association tracking, most of the false alarms have been removed, and a few missed detections have been well estimated and appropriately filled. In Figure 13c,d, a group of five circles with the same color represents one ship's track generated by our tracker, from which we can see that the tracking results maintain high consistency with the interpolated AIS points. Finally, the ships' trajectories in the two ROIs are presented in Figure 13e,f, where a red line denotes one trajectory, and the asterisk and diamond marks represent the start and end of a trajectory, respectively.

Comparison and Evaluation
To validate the performance of our method for ship detection and tracking, quantitative evaluation metrics such as precision and recall are employed, and some recent related methods are also used for comparison. The precision and recall can be calculated by

$$\mathrm{precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}, \qquad \mathrm{recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{19}$$

where $N_{TP}$, $N_{FP}$, and $N_{FN}$ are the numbers of true positives, false positives, and false negatives, respectively. Figure 14 shows a visual comparison of ship detection results, covering scenarios with dense targets (Figure 14a), broken clouds (Figure 14f), and severe sea clutter (Figure 14k). The red string labels on the three GF-4 image chips denote ground-truth ship targets and their numbers. We can intuitively see that our method has fewer false and missed alarms than other methods. Table 1 presents the calculated measures for ship detection and tracking of the two ROIs in five frames. $N_{TP}$, $N_{FP}$, and $N_{FN}$ are determined by AIS data cross reference and manual identification. There are 86 ships in ROI1 and 39 ships in ROI2 whose lifetimes run through the entire image sequence. In total, 49 tracked ships can be verified by AIS data. Again from Table 1, we can see that both false and missed targets can be further reduced effectively by MHT tracking. By comparison, our method outperforms other methods in terms of the total precision and recall, and seems to have a more prominent advantage in ship detection, which further demonstrates the effectiveness of integrating multispectral vision saliency into the methodological framework. The ship trajectories of the two ROIs verified by AIS data are utilized to estimate the motion states. Figure 15 presents the proximity of motion states between the tracking results and AIS data in five frames. Table 2 shows the comparison of estimation error of motion states between our method and others, indicating the high accuracy and reliability of our method. The average estimation errors of our method for location, speed, and course are 83.2 m, 0.26 kn, and 2.24°, respectively.
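The two metrics can be computed directly from the counts, as in the short sketch below. The counts in the usage example are hypothetical and are not taken from Table 1; the function name `precision_recall` is likewise illustrative.

```python
def precision_recall(n_tp, n_fp, n_fn):
    """Return (precision, recall); a metric is 0.0 when its denominator is zero."""
    detected = n_tp + n_fp
    actual = n_tp + n_fn
    precision = n_tp / detected if detected else 0.0
    recall = n_tp / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(precision_recall(n_tp=90, n_fp=10, n_fn=30))
```

Precision penalizes false alarms (for example, clouds or reefs reported as ships), whereas recall penalizes missed ships, so both are needed to compare detectors fairly.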

Conclusions
The GEO remote sensing satellite, which has the capabilities of wide swath, high revisit, and persistent observation, shows good application prospects in sea surveillance and situational awareness. Most existing methods of ship detection and tracking based on such satellite images produce many false alarms and missed alarms, since ships appear as dim small targets that are usually interfered with by sea clutter, reef islands, broken clouds, and so on. Moreover, erroneous associations are prone to occur when tracking dense ship targets. Through analyzing the multispectral radiation and motion characteristics of marine ships, we found that the reason for the current dilemma is that the issue of modeling the time-series characteristics of dim small targets has not been effectively resolved. By introducing visual saliency characteristics across multispectral bands, we can (1) capture or even enhance the visual self-attention of dim small ships, significantly reducing false alarms and missed alarms while suppressing background interferences; and (2) robustly track ship trajectories in scenarios with dense targets and high sea states, further improving the precision and recall rate of ship tracking.
Overall, the contributions of our work are as follows: (1) developing a feature fusion method based on multi-vision saliency to segment candidate blobs of dim small ship targets from GEO remote sensing images; (2) constructing the criteria of multispectral association to determine the final results of ship detection; (3) embedding multispectral visual saliency into the motion model and MHT tracker. Experiments and comparisons demonstrate the high performance and reliability of our approach.
Nevertheless, due to the high mobility of targets on the sea, the tracking performance will be greatly diminished when the course or speed of a ship changes abruptly, which is the biggest limitation of our method and also a key factor that challenges existing methods. Hence, more robust and reliable motion modeling for ship state prediction and updating can improve the accuracy of ship tracking. Motion parameter estimation across multispectral bands in an image frame is also useful for multi-frame association tracking. In addition, the fusion of space-based multimodal remote sensing data (e.g., satellite videos, hyperspectral, and SAR images) can exploit their complementary advantages to better serve marine vessel surveillance. These are the potential study areas we will focus on in the future.
gave advice for the preparation and revision of the paper. All authors have read and agreed to the published version of the manuscript.

Figure 1. The flowchart of the proposed method.

Figure 2. Moving ships in GF-4 sequential image chips (displayed as false-color composites) and corresponding automatic identification system (AIS) information. The lifetimes of the two ships in the two red boxes run through the entire image sequence, while the ship in the yellow box is treated as a new target.

Figure 3. Comparison of the contrast saliency maps of GF-4 image: (a) result generated by conventional lateral inhibition network; (b) result by ours with protection domain.

Figure 4. The whole workflow of the proposed dim small ship target detection method.

Figure 6. The displacement of moving ships' positions across multispectral images. The displacement distances of ships in three red boxes vary depending on their velocities.

Figure 7. Research area and dataset coverage.

Figure 8. Locally enlarged result of preliminary ship detection in the NIR band: (a) local view of NIR band; (b) detection result in the NIR band.

Figure 9. Visual effect of final ship detection in two regions of interest (ROIs): (a) result of ROI1; (b) result of ROI2.

Remote Sens. 2023, 15, x FOR PEER REVIEW

Figure 10. Multispectral grayscale variations of 10 ships over multiple frames. Different marks denote different ships.

Figure 11. The size variations of 10 ships over multiple frames. Lines with different colors and marks denote different ships.

Figure 12. The point pairs in the first frame used for geometric correction. The blue lines connecting AIS points and GF-4 points denote the associations between point pairs.

Figure 13. The results of ship detection and tracking in GF-4 image sequences: (a,b) are ship detection results of two ROIs in all five frames; (c,d) are ship tracking results of two ROIs along with AIS data for verification; (e,f) are ships' trajectories in the two ROIs. Different colors of circles denote trajectories of different ship targets in subfigures (c,d).

Figure 14. Comparison of visual effects on ship detection: (a,f,k) are locally enlarged GF-4 image chips; (b,g,l) are corresponding results of Liu et al. [6]; (c,h,m) are results of Yao et al. [5]; (d,i,n) are results of Wang et al. [7]; (e,j,o) are results of ours.

Figure 15. The proximity of motion states between the tracking results and AIS data in five frames: (a) location error between estimated positions and AIS positions; (b) estimated speed with regard to AIS speed; (c) estimated course with regard to AIS course.

Table 1. Metrics comparison of ship detection and tracking.

Table 2. Comparison of estimation error of motion states.
Columns: Error; Average Error of ROI1; Average Error of ROI2; Total Error of Our Method; Total Error of Yao et al. [5]; Total Error of Liu et al. [10]