3.2.1. Differential Histogram of Oriented Gradient Algorithm
After saliency detection, salient regions are enhanced while most redundant background information is suppressed. However, saliency maps may still contain both true targets and noise, particularly in complex backgrounds with strong edges or textured structures. Therefore, an effective discrimination mechanism is required to distinguish targets from background clutter.
To address this issue, we propose a Differential Histogram of Oriented Gradients (DHOG) algorithm, which differentiates targets from background regions by analyzing disparities in HOG features.
First, a sliding window is constructed to scan the entire image from left to right and from top to bottom, as illustrated in Figure 3e. The sliding window consists of 3 × 3 cells of equal size. The central cell represents the candidate target region, while the eight surrounding cells correspond to background regions, numbered sequentially from 1 to 8 in clockwise order starting from the top-left corner. The size of each cell is selected to approximately match the expected target size (e.g., $N_T \times N_T$ pixels). To measure local contrast within the sliding window more precisely, the window is partitioned into block regions. Each block is formed by combining the central cell with one of the surrounding background cells, resulting in eight block regions in total. For example, the central cell and the top-left background cell together form the first block; the remaining blocks are defined analogously.
The sliding window is moved across the image with a predefined step size in both the horizontal and vertical directions. For each cell within the sliding window, including the central cell and the eight surrounding cells, the HOG feature vector is computed. Let $H_0$ denote the HOG feature of the central cell and $H_1$ to $H_8$ denote the HOG features of the eight background cells. The HOG difference of the $i$-th block is defined as

$$D_i = \lVert H_0 - H_i \rVert, \quad i = 1, 2, \ldots, 8,$$

where $\lVert \cdot \rVert$ denotes the vector norm used to compare the two histograms.
To further enhance target responses while suppressing background clutter and noise, we introduce the Differential Histogram of Oriented Gradients (DHOG) operator. When the sliding window is positioned at a given location $(x, y)$, the DHOG response of the central cell is defined in Equation (11) in terms of two statistics of the block differences. The first is their mean $\mu_D$ across the eight block regions:

$$\mu_D = \frac{1}{8} \sum_{i=1}^{8} D_i, \tag{12}$$

and the second is their variance $\sigma_D^2$, as shown in Equation (13):

$$\sigma_D^2 = \frac{1}{8} \sum_{i=1}^{8} \left( D_i - \mu_D \right)^2. \tag{13}$$
By jointly considering the mean and variance of local HOG differences, the DHOG operator effectively emphasizes structural discrepancies between targets and their surrounding background, thereby improving candidate object extraction performance.
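For illustration, the per-window computation of the block differences and their mean and variance can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: gradients are taken with `np.gradient` over the whole window, orientations are unsigned and split into nine 20° bins, the block difference uses the L2 norm by default (the norm order `p` is a hypothetical parameter), and the variance is the population variance. The names `window_hogs` and `hog_difference_stats` are illustrative, and the way Equation (11) combines the mean and variance is deliberately not reproduced here.

```python
import numpy as np

N_BINS = 9  # unsigned orientations in [0, 180) degrees, 20-degree bins

# Cell order: centre first, then the 8 background cells clockwise from the top-left.
CELL_ORDER = [(1, 1), (0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def window_hogs(window: np.ndarray, n_t: int) -> np.ndarray:
    """Return a (9, N_BINS) array of magnitude-weighted orientation histograms
    for the 3 x 3 cells of a (3*n_t) x (3*n_t) window."""
    gy, gx = np.gradient(window.astype(float))   # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)     # fold signed angle into [0, pi)
    bins = np.minimum((theta / (np.pi / N_BINS)).astype(int), N_BINS - 1)
    hogs = []
    for r, c in CELL_ORDER:
        rows = slice(r * n_t, (r + 1) * n_t)
        cols = slice(c * n_t, (c + 1) * n_t)
        hogs.append(np.bincount(bins[rows, cols].ravel(),
                                weights=mag[rows, cols].ravel(),
                                minlength=N_BINS))
    return np.array(hogs)


def hog_difference_stats(window: np.ndarray, n_t: int, p: float = 2):
    """Block differences D_1..D_8, their mean (Eq. 12) and variance (Eq. 13)."""
    hogs = window_hogs(window, n_t)
    d = np.linalg.norm(hogs[0] - hogs[1:], ord=p, axis=1)  # one value per block
    return d, d.mean(), d.var()


# Example: a bright Gaussian blob centred in a 27 x 27 window (n_t = 9).
n_t = 9
yy, xx = np.mgrid[0:3 * n_t, 0:3 * n_t]
blob = 100.0 * np.exp(-((xx - 13) ** 2 + (yy - 13) ** 2) / (2 * 1.5 ** 2))
d, mu_d, var_d = hog_difference_stats(blob, n_t)
```

Scanning such a function over the image with the chosen step yields, at every window position, the two statistics from which Equation (11) forms the DHOG response.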
The idea of using traditional hand-crafted features to assist detection has also been studied in ship detection. Ke et al. [44] proposed a Laplace- and LBP-feature-guided SAR ship detection method with an adaptive feature enhancement block. The proposed DHOG approach differs from this method in three key respects: (1) our method operates on infrared rather than SAR or optical imagery; (2) we exploit HOG orientation histograms rather than LBP or Laplacian responses, which makes the method insensitive to absolute intensity levels but sensitive to local gradient structure; and (3) DHOG is applied at the candidate extraction stage as a lightweight filter, whereas the cited method uses such features as auxiliary inputs to a deep network. Our method further introduces spatiotemporal features to better exploit the information in image sequences for extracting moving ship targets.
3.2.2. Analysis of DHOG
- (1)
Analysis of Target’s DHOG Characteristics
The experimental dataset used in this study consists exclusively of infrared (IR) images; therefore, the following analysis focuses on infrared imaging mechanisms and target characteristics. All objects with temperatures above absolute zero (−273.15 °C) emit infrared radiation, and the emitted energy increases with temperature. Infrared radiation is a form of electromagnetic radiation, typically with wavelengths ranging from 0.78 μm to 1000 μm, corresponding to a frequency range of approximately 300 GHz to 400 THz. In infrared imaging systems, temperature differences between objects and their surroundings are converted into grayscale variations in the captured image.
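The frequency bounds follow directly from $f = c / \lambda$. A quick numerical check in Python (the helper name is illustrative):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact SI value


def wavelength_to_frequency(wavelength_m: float) -> float:
    """Frequency in Hz of an electromagnetic wave with the given vacuum wavelength."""
    return SPEED_OF_LIGHT / wavelength_m


f_short = wavelength_to_frequency(0.78e-6)   # shortest IR wavelength, 0.78 um
f_long = wavelength_to_frequency(1000e-6)    # longest IR wavelength, 1000 um
print(f"{f_short / 1e12:.0f} THz, {f_long / 1e9:.0f} GHz")  # 384 THz, 300 GHz
```

The short-wavelength end therefore lies near 384 THz and the long-wavelength end near 300 GHz.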
As man-made objects, ship targets generally exhibit higher temperatures than their natural surroundings. This observation also applies to other artificial targets, such as vehicles, aircraft, and missiles, which often contain high-temperature components (e.g., engines or heat sources). These components produce stronger infrared radiation, resulting in higher grayscale values compared with the background. However, under certain conditions, such as the presence of cooling systems, long imaging distances, or low ambient temperatures, the target may appear darker than the surrounding background. Such targets are referred to as dark targets. The thermal contrast between target and background leads to distinctive grayscale distributions in infrared images. Target grayscale values tend to concentrate within a relatively narrow range, whereas background grayscale values are typically more dispersed. Furthermore, atmospheric refraction, optical defocusing, and lens aberrations often cause small infrared targets to appear as approximately circular Gaussian-like point sources. Numerous studies have shown that small targets can be effectively modeled using a two-dimensional Gaussian point spread function (PSF), expressed as Equation (14):

$$I(x, y) = A \exp\left\{ -\left[ \frac{(x - x_0)^2}{2\sigma_x^2} + \frac{(y - y_0)^2}{2\sigma_y^2} \right] \right\}, \tag{14}$$

where $A$ denotes the peak amplitude of the target; $\sigma_x$ and $\sigma_y$ represent the dispersion in the horizontal $x$ and vertical $y$ directions, respectively; and $(x_0, y_0)$ denotes the target center. Smaller values of $\sigma_x$ and $\sigma_y$ indicate a more compact energy distribution.
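To make the model concrete, the following NumPy sketch samples the anisotropic Gaussian of Equation (14) on a pixel grid. The parameter values are hypothetical and chosen only to contrast a head-on (near-isotropic) target with a side-view (elongated) one; the function name `gaussian_psf` is illustrative, and no background offset or noise is modeled.

```python
import numpy as np


def gaussian_psf(shape, amplitude, sigma_x, sigma_y, x0, y0):
    """Sample the anisotropic 2-D Gaussian of Equation (14) on an integer pixel grid.

    shape is (rows, cols); x runs along columns and y along rows.
    """
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    return amplitude * np.exp(-((x - x0) ** 2 / (2.0 * sigma_x ** 2)
                                + (y - y0) ** 2 / (2.0 * sigma_y ** 2)))


# Head-on view: near-isotropic point target.
head_on = gaussian_psf((15, 15), amplitude=100.0, sigma_x=1.5, sigma_y=1.5, x0=7, y0=7)
# Side view: energy spread further along the horizontal (ship length) axis.
side_view = gaussian_psf((15, 15), amplitude=100.0, sigma_x=3.0, sigma_y=1.0, x0=7, y0=7)
```

Three pixels from the center, `side_view` retains about 61% of the peak along x but only about 1% along y, which is the anisotropy discussed next.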
For small ship targets, geometric and observational factors introduce anisotropy in the energy distribution. In particular, variations in viewing angle lead to differences in dispersion in the horizontal and vertical directions, which constitute a distinctive characteristic of ship targets. As illustrated in Figure 4, the point spread effect, digital discretization, and approximate three-dimensional energy distributions differ depending on whether the ship is observed from a head-on or side-view perspective.
After digital discretization, a ship observed from a head-on perspective typically appears as an approximately isotropic point target. The Gaussian profiles projected along the x- and y-axes exhibit similar amplitudes and dispersions, which is consistent with the isotropic Gaussian model commonly adopted in existing algorithms. In contrast, when the ship is observed from a side view, the target appears elongated and approximately elliptical. Although the peak amplitude remains similar in both directions, the dispersions $\sigma_x$ and $\sigma_y$ differ, reflecting the anisotropic structure of the ship.
Figure 5 presents several ship targets and their corresponding three-dimensional energy distributions, in which redder regions indicate higher energy closer to the peak. The target energy profiles are fitted using the Gaussian model described above, and the estimated parameters $A$, $\sigma_x$, and $\sigma_y$ are summarized in Table 1. For example, the first row of Table 1 lists the fitted Gaussian parameters for the target shown in Image 1 of Figure 5.
Based on the fitting results and the measured target sizes in the images, the peak amplitude $A$ is primarily associated with the grayscale contrast between the ship target and the background. The dispersions $\sigma_x$ and $\sigma_y$ characterize the spatial energy distribution along the horizontal and vertical directions, respectively, and are influenced by the target orientation and the angle between the ship and the camera's optical axis. Specifically, under side-view observation, $\sigma_x$ is approximately one-sixth of the ship length, while $\sigma_y$ is approximately one-sixth of the ship height, so their ratio reflects the length-to-height ratio of the ship. Under head-on observation, $\sigma_x$ is approximately one-sixth of the ship width and $\sigma_y$ is approximately one-sixth of the ship height; their ratio corresponds to the width-to-height ratio. For intermediate viewing angles, $\sigma_x$ typically lies between one-sixth of the ship width and one-sixth of the ship length, whereas $\sigma_y$ remains close to one-sixth of the ship height. The factor of one-sixth is consistent with the Gaussian model: about 99.7% of a Gaussian profile lies within $\pm 3\sigma$ of its center, so the visible extent of the target spans roughly $6\sigma$.
These observations are consistent with the energy distribution patterns shown in Figure 4. Therefore, for head-on views, a circular point template is appropriate for target modeling, whereas for side-view observations, an elliptical template provides a more accurate representation. In both cases, the DHOG features extracted from these target templates differ significantly from those of the surrounding background.
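The one-sixth rule of thumb can be checked numerically. The sketch below computes the fraction of a 1-D Gaussian lying within $\pm 3\sigma$ and converts a hypothetical 18 × 6 pixel side-view ship extent into expected dispersions; the helper names and the example size are illustrative only.

```python
import math


def gaussian_mass_within(k: float) -> float:
    """Fraction of a 1-D Gaussian profile lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


def sigmas_from_extent(extent_x_px: float, extent_y_px: float):
    """Rule of thumb from the fits: visible extent along each axis is about 6 sigma."""
    return extent_x_px / 6.0, extent_y_px / 6.0


print(round(gaussian_mass_within(3.0), 4))  # 0.9973
# Hypothetical side-view ship occupying 18 x 6 pixels (length x height).
sigma_x, sigma_y = sigmas_from_extent(18, 6)  # (3.0, 1.0); ratio 3 = length/height
```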
Based on this analysis, the following section examines the DHOG characteristics of circular bright and dark targets under front-view conditions, as well as elliptical targets observed from side-view perspectives.
- (a)
Circular Bright Target
Figure 6a presents an infrared image with sea and sky as the background, in which small moving ship targets appear as bright point-like objects. Owing to their relatively strong thermal radiation, ship targets typically manifest as bright regions in infrared imagery. At long imaging distances or under head-on viewing conditions, a ship target may appear approximately circular and point-like. A local pixel region centered at the target centroid is magnified in Figure 6b, and its three-dimensional intensity distribution is shown in Figure 6d. The target can be well approximated by a two-dimensional Gaussian surface, in which the maximum grayscale value occurs at the center and gradually attenuates toward the periphery at approximately isotropic rates.
Figure 6c illustrates the gradient vector field within this local region. The arrow length represents the gradient magnitude at the corresponding pixel (located at the arrow tail), while the arrow direction indicates the orientation of the image gradient. As shown in Figure 3, the region is divided into 3 × 3 cells of equal size, and the HOG descriptor is computed for each cell. With unsigned gradient orientations (0–180°) quantized at 20° intervals, each cell is represented by a 9-dimensional HOG vector. The corresponding HOG distributions are plotted in Figure 6e, where the thick blue solid line denotes the HOG of the central cell and the thinner colored lines represent the HOG curves of the surrounding background cells.
It can be observed that the HOG response of the central cell is significantly higher than that of the background cells. This is because the target region exhibits a pronounced grayscale gradient from the center toward the surrounding area, as described by the quasi-two-dimensional Gaussian model. The resulting gradient magnitudes are relatively large and distributed in a nearly uniform manner across all directions, leading to a strong and evenly distributed HOG response in the central cell.
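The dominance of the central cell can be reproduced on a synthetic bright point target. In the sketch below, the 9 × 9 cell size, σ = 1.5 pixels, and background level of 20 are hypothetical values chosen so that the central cell spans roughly ±3σ; gradients come from `np.gradient`, and `window_hogs` is an illustrative helper rather than the authors' code.

```python
import numpy as np

N_BINS = 9
CELL_ORDER = [(1, 1), (0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def window_hogs(window, n_t):
    """9-bin unsigned HOG (magnitude-weighted) for each of the 3 x 3 cells; row 0 = centre."""
    gy, gx = np.gradient(window.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((theta / (np.pi / N_BINS)).astype(int), N_BINS - 1)
    out = np.zeros((9, N_BINS))
    for k, (r, c) in enumerate(CELL_ORDER):
        sl = (slice(r * n_t, (r + 1) * n_t), slice(c * n_t, (c + 1) * n_t))
        out[k] = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(), minlength=N_BINS)
    return out


# Hypothetical bright point target: isotropic Gaussian on a flat background,
# centred in a 27 x 27 window so that the 9 x 9 central cell spans roughly +/- 3 sigma.
n_t, sigma = 9, 1.5
yy, xx = np.mgrid[0:3 * n_t, 0:3 * n_t]
centre = (3 * n_t) // 2
bright = 20.0 + 100.0 * np.exp(-((xx - centre) ** 2 + (yy - centre) ** 2) / (2 * sigma ** 2))

hogs = window_hogs(bright, n_t)
cell_energy = hogs.sum(axis=1)  # total gradient magnitude per cell
print(cell_energy[0] / cell_energy[1:].max())  # central cell dominates
```

Every orientation bin of the central cell receives a non-zero response, matching the "strong and evenly distributed" central HOG described above.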
- (b)
Circular Dark Target
Although ship targets are generally bright in infrared images, certain scenarios may lead to lower grayscale values relative to the background. This situation may arise when the background is comparatively brighter (e.g., due to high-altitude clouds) or when the target includes cooling mechanisms. In such cases, the target appears as a dark circular point and is referred to as a dark target. The proposed algorithm is equally applicable to dark targets; therefore, their characteristics are also analyzed.
Figure 7a shows an infrared image with the sky as the background, containing altocumulus clouds with relatively uniform yet bright clustered textures. When a ship target enters such a region, its grayscale becomes lower than that of the background, resulting in a dark target appearance. A local pixel region centered at the target centroid is extracted for analysis. Its grayscale distribution and three-dimensional representation are shown in Figure 7b and Figure 7d, respectively. The target still conforms approximately to a two-dimensional Gaussian model; however, in this case, the grayscale value at the center is lower than that of the surrounding area, and the intensity increases gradually and approximately isotropically from the center toward the periphery. The gradient vector field is shown in Figure 7c. Gradient vectors in the outer background region are shorter and exhibit diverse orientations, whereas vectors within the central target cell are longer and predominantly point toward the center, reflecting stronger and more coherent gradient structures.
After computing the HOG descriptors for the nine cells in Figure 7b, the resulting HOG curves are shown in Figure 7e. Similar to the bright target case, the HOG of the central cell, indicated by the thick blue line, is significantly higher than that of the surrounding cells. Moreover, since the dark target closely approximates an ideal two-dimensional Gaussian distribution, the HOG response in the central cell is relatively uniform across orientation bins.
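One reason the same operator handles bright and dark targets is that HOG here uses unsigned orientations: reversing the contrast rotates every gradient by 180°, which maps to the same 0–180° bin while leaving magnitudes unchanged. The sketch below checks this on a hypothetical 9 × 9 cell; the background levels (20 and 150) are illustrative and `cell_hog` is not the authors' code.

```python
import numpy as np

N_BINS = 9


def cell_hog(patch):
    """9-bin unsigned, magnitude-weighted orientation histogram of a whole patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)   # a 180-degree flip maps to the same angle
    bins = np.minimum((theta / (np.pi / N_BINS)).astype(int), N_BINS - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=N_BINS)


yy, xx = np.mgrid[0:9, 0:9]
profile = np.exp(-((xx - 4) ** 2 + (yy - 4) ** 2) / (2 * 1.5 ** 2))
bright_target = 20.0 + 100.0 * profile    # hot ship on a cooler sea
dark_target = 150.0 - 100.0 * profile     # cooler ship against bright cloud

print(np.allclose(cell_hog(bright_target), cell_hog(dark_target)))  # True
```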
- (c)
Elliptical Bright Target
Figure 8a presents an infrared image of a ship target observed from a side-view perspective. As shown in the magnified local region in Figure 8b, the target exhibits an elliptical shape with high grayscale values. Its three-dimensional intensity distribution is shown in Figure 8d. The centroid corresponds to the maximum grayscale value, which gradually attenuates toward the boundary. Unlike the circular case, the attenuation rates differ with direction: the decay is slower along the longitudinal direction of the ship (that is, along its direction of motion) and faster along the transverse direction, resulting in an anisotropic energy distribution.
Figure 8c illustrates the gradient vector field in the target region, and
Figure 8e shows the HOG curves for the nine cells. It can be observed that orientation bins corresponding to the longitudinal direction of the ship (approximately 0–20° and 160–180°) exhibit relatively high responses, with the maximum typically occurring in bin 1 or bin 9. This phenomenon is consistent with the anisotropic structure of the elliptical target, where dominant gradients align with the principal axes of the ship.
In contrast, the HOG responses in the surrounding background cells are generally weaker and irregularly distributed, making it difficult to characterize them with a consistent structural pattern. A more detailed analysis of background behavior is provided in part (2) of Section 3.2.2.
- (2)
Analysis of Background DHOG Characteristics
Ship target detection in sea–sky scenes is often challenged by various forms of background clutter. When a target appears against a relatively homogeneous background, the grayscale contrast between the target and its surroundings is pronounced, resulting in minimal interference during detection. However, in practical scenarios, infrared images frequently contain complex and heterogeneous backgrounds, which can generate false alarms and significantly reduce the signal-to-clutter ratio (SCR) [45], thereby increasing detection difficulty. Typical background types include flat sea surfaces, tree-lined or mountainous coastal regions, buildings, clouds, and turbulent sea waves. The DHOG characteristics of each background type are analyzed below.
- (a)
Flat Backgrounds
Flat background regions exhibit minimal grayscale variation, as illustrated by the calm sea surface in Figure 9a. Under such conditions, the sea remains relatively smooth with limited wave activity. Due to the absorption and scattering properties of seawater, the emitted infrared radiation is relatively weak. Moreover, the temperature distribution across the sea surface is generally uniform, resulting in the absence of significant thermal gradients and an overall dark appearance in the infrared image. The grayscale distribution is shown in Figure 9b, and the corresponding three-dimensional representation is provided in Figure 9d. Pixel intensities in this region are consistently low, with only slight differences between adjacent pixels. Consequently, the grayscale difference between the central cell and its neighboring cells is negligible.
The gradient vector field in Figure 9c shows short arrows with irregular orientations, indicating weak and incoherent gradient structures. The HOG curves for the nine cells are presented in Figure 9e. The HOG curve of the central cell, marked by the bold blue line, is embedded among the curves of the surrounding cells, and all curves exhibit similar trends. This indicates that the DHOG response in flat background regions is weak and lacks distinctive structural characteristics.
- (b)
Tree-Lined Backgrounds
In nearshore or island environments, backgrounds often include mountains, trees, and vegetation, which can produce strong interference in ship detection tasks. Vegetation typically emits relatively high levels of infrared radiation and may appear bright in the image, whereas mountainous terrain often exhibits lower surface temperatures and appears darker. In addition, trees and vegetation possess complex textures and structural patterns. Combined with shadowing, reflection, and scattering effects, these characteristics produce alternating bright and dark regions in the image. As shown in Figure 10a, mountainous areas may exhibit relatively uniform grayscale distributions, whereas vegetation regions display irregular and chaotic intensity variations, often generating localized bright spots that resemble small targets.
Due to variations in tree size and density, these bright spots may appear elongated or irregularly shaped. The gradient magnitude and orientation vary randomly from the center of such bright regions outward. This randomness leads to similar HOG distributions for both the central and surrounding cells, as shown in Figure 10e. Consequently, tree-lined backgrounds do not produce the structured and concentrated DHOG responses typically associated with true ship targets.
- (c)
Building Backgrounds
In coastal scenes, building structures frequently appear in the background, as illustrated by the region marked in Figure 11a. Building surfaces partially reflect solar radiation, resulting in specific brightness patterns. However, differences in construction materials across building components lead to variations in thermal capacity and conductivity, causing spatially heterogeneous temperature distributions in infrared imagery. Furthermore, due to their structural thickness, buildings exhibit significant thermal inertia, meaning that their temperature changes more gradually than that of vegetation. Nevertheless, sharp temperature gradients may appear along structural edges. As shown in Figure 11b, the edge region of a building displays a step-like grayscale distribution, with lower intensities on one side and higher intensities on the other. The gradient vector field in Figure 11c shows relatively long arrows near the edge, oriented toward regions of lower grayscale values. Cells located along the edge exhibit similar HOG distributions, with gradients concentrated around bin 5, which corresponds to the edge normal direction. In contrast, HOG distributions in non-edge cells are relatively flat.
This directional concentration differs from the isotropic distribution observed in circular ship targets and the anisotropic yet coherent distribution of elliptical targets.
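The contrast between edge clutter and point targets can be quantified with a simple orientation-concentration score, defined here (as a hypothetical measure, not part of the DHOG operator) as the share of total gradient magnitude falling into the strongest bin: 1/9 means uniform, 1 means a single bin. The sketch compares a synthetic vertical step edge with an isotropic Gaussian target in a 9 × 9 cell. For this vertical edge all energy lands in the first bin; which bin dominates in a real scene depends on the edge direction.

```python
import numpy as np

N_BINS = 9


def cell_hog(patch):
    """9-bin unsigned, magnitude-weighted orientation histogram of a patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((theta / (np.pi / N_BINS)).astype(int), N_BINS - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=N_BINS)


def orientation_concentration(hog):
    """Share of total gradient magnitude in the strongest bin (1/9 = uniform, 1 = one bin)."""
    total = hog.sum()
    return 0.0 if total == 0 else float(hog.max() / total)


# Step edge (e.g., a building or cloud boundary): dark left part, bright right part.
edge = np.where(np.arange(9)[None, :] < 5, 20.0, 120.0) * np.ones((9, 1))
# Isotropic Gaussian point target in a cell of the same size.
yy, xx = np.mgrid[0:9, 0:9]
target = 20.0 + 100.0 * np.exp(-((xx - 4) ** 2 + (yy - 4) ** 2) / (2 * 1.5 ** 2))

print(orientation_concentration(cell_hog(edge)))    # 1.0: all energy in one bin
print(orientation_concentration(cell_hog(target)))  # much lower: energy spread over bins
```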
- (d)
Cloud Backgrounds
Clouds and fog exhibit diverse shapes and structural characteristics, resulting in significant variations in morphology, brightness, and texture. Thick and dense cumulus or stratus clouds often appear brighter, whereas thinner cirrus clouds tend to appear darker. In general, cloud tops are warmer and brighter, while cloud bases are cooler and darker. Cirrus clouds frequently exhibit filamentous or banded structures with fine textures.
Figure 12a shows an infrared image with clouds as the dominant background. The central regions of large cloud clusters may exhibit relatively smooth textures and can sometimes be approximated as flat backgrounds. However, significant intensity fluctuations occur near cloud edges or within smaller cloud clusters. As shown in Figure 12b, grayscale values change abruptly in the direction perpendicular to the cloud boundary, and the maximum image gradients are located along these edges. Consequently, the gradient orientations of the cells positioned near the edge are strongly aligned with the normal direction of the cloud boundary, resulting in pronounced peaks in the corresponding HOG bins.
This highly directional gradient distribution differs from the more uniformly distributed gradients of Gaussian-like ship targets.
- (e)
Turbulent Sea Wave Backgrounds
In contrast to the calm sea surface described earlier, variations in wind speed and sea temperature can produce turbulent sea conditions characterized by pronounced fluctuations and complex textures, as shown in Figure 13a. Solar radiation interacting with wave crests generates specular and diffuse reflections, producing alternating bright and dark streaks or spots in the infrared image. Turbulent mixing of seawater also induces localized temperature variations, leading to irregular brightness distributions. In the local region shown in Figure 13b, several bright point-like structures are visible. These structures can easily trigger false alarms, particularly when one of them appears within the central cell and produces a spike-like response in the HOG curve, as shown in Figure 13e. However, these interfering bright spots vary in size and often deviate from an ideal isotropic two-dimensional Gaussian model, and their gradient structures lack the coherent and symmetric properties of true ship targets. As a result, although localized HOG responses may be elevated, the overall HOG patterns are not significantly distinguishable from those of neighboring background cells.
Overall, the above analysis demonstrates that background regions typically exhibit either weak, irregular, or highly directional gradient structures. In contrast, true ship targets produce coherent and structurally consistent HOG patterns. This distinction provides the theoretical foundation for the effectiveness of the proposed DHOG-based discrimination method.
3.2.3. Multi-Scale Differential Histogram of Oriented Gradient Algorithm
In summary, the characteristics of small maritime targets can be described as follows: the target information is concentrated within a very small number of pixels, and the targets typically lack clear contours and distinct shape features. The grayscale values within the target region are relatively uniform, and the contrast between the target and the surrounding background is generally low. In infrared imagery, such targets usually appear as bright or dark point-like structures or small elliptical regions. Therefore, differences in the Histogram of Oriented Gradients (HOG) between the target region and the surrounding background can effectively distinguish the target from most background clutter. The candidate target extraction algorithm based on the Differential Histogram of Oriented Gradients (DHOG) generates a single-scale DHOG response map; the detailed procedure is described in Algorithm A1.
Ideally, the cells illustrated in Figure 3 should be comparable in size to the target, so that detection performance is optimal when the central cell contains the target. However, in practical scenarios, the target size may vary with target distance, motion, and imaging conditions. A multi-scale design is introduced to address this issue: the proposed method uses gradient features derived from multi-scale HOG differences to perform saliency detection while effectively suppressing background clutter.
Specifically, multiple scales are considered. Cells of each size form sliding windows of corresponding dimensions that scan the entire image independently. For each pixel location, the maximum response across scales is selected as the final value of the multi-scale DHOG response map. The detailed procedure is described in Algorithm A2.
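The per-pixel maximum across scales can be sketched in a few lines of NumPy. The sketch assumes that every single-scale DHOG map has already been brought to the full image size (how that resampling is done is not specified here), and `fuse_multiscale` is an illustrative name; it also returns the index of the winning scale, which can be useful for inspection.

```python
import numpy as np


def fuse_multiscale(per_scale_maps):
    """Pixel-wise maximum over single-scale DHOG response maps.

    Assumes every map has already been brought to the full image size (H x W).
    Returns the fused map and, per pixel, the index of the scale that won.
    """
    stack = np.stack(per_scale_maps, axis=0)   # shape (n_scales, H, W)
    return stack.max(axis=0), stack.argmax(axis=0)


small_cells = np.array([[1.0, 5.0], [3.0, 0.0]])   # hypothetical small-cell scale response
large_cells = np.array([[2.0, 1.0], [3.0, 4.0]])   # hypothetical large-cell scale response
fused, best_scale = fuse_multiscale([small_cells, large_cells])
# fused -> [[2, 5], [3, 4]]; ties (value 3) go to the first scale listed
```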
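Subsequently, a fusion feature map $F$ is generated by pixel-wise multiplication of the saliency map $S$ with the multi-scale DHOG response map $D_{\mathrm{MS}}$, which further enhances the target response:

$$F(x, y) = S(x, y) \cdot D_{\mathrm{MS}}(x, y). \tag{15}$$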
Finally, the target location is determined using an adaptive threshold $T$, calculated as in Equation (16):

$$T = \mu_I + k \sigma_I, \tag{16}$$

where $\mu_I$ denotes the mean intensity of the image, $\sigma_I$ represents the standard deviation of the image, and $k$ is a scaling factor. Based on empirical engineering experience, $k$ is typically set to 3.
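A minimal sketch of the fusion and thresholding steps follows. It assumes that the mean and standard deviation in the threshold are computed over the fused map (the text refers to "the image", which may mean either the input or the fused map); `fuse_and_threshold` and the toy inputs are hypothetical.

```python
import numpy as np


def fuse_and_threshold(saliency, dhog_ms, k=3.0):
    """Pixel-wise fusion followed by the adaptive threshold T = mu + k * sigma.

    Assumption: mu and sigma are taken over the fused map.
    """
    fused = saliency * dhog_ms                   # pixel-wise product
    threshold = fused.mean() + k * fused.std()   # population standard deviation
    return fused, threshold, fused > threshold


saliency = np.ones((32, 32))
dhog_ms = np.zeros((32, 32))
dhog_ms[10, 20] = 100.0                          # one hypothetical strong candidate
fused, T, mask = fuse_and_threshold(saliency, dhog_ms)
```

With $k = 3$, only responses more than three standard deviations above the mean survive; in this toy map the threshold is about 9.47 and only the single candidate pixel is retained.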
Since the candidate targets obtained after binarization may vary in size and shape, each target region is further analyzed using a fixed-size neighborhood centered at the centroid of the candidate region. The centroid $(x_c, y_c)$ is computed as in Equation (17):

$$x_c = \frac{\sum_{i=1}^{M} I_i \, x_i}{\sum_{i=1}^{M} I_i}, \qquad y_c = \frac{\sum_{i=1}^{M} I_i \, y_i}{\sum_{i=1}^{M} I_i}, \tag{17}$$

where $(x_i, y_i)$ denotes the coordinates of the $i$-th pixel in the region, $I_i$ represents the grayscale intensity of that pixel, and $M$ is the total number of pixels within the region.
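The intensity-weighted centroid and the subsequent fixed-size crop can be sketched as follows. Both helper names are hypothetical; the sketch assumes the selected intensities are non-negative with a non-zero sum, and it clips the neighborhood at the image border, which is one possible choice among several.

```python
import numpy as np


def weighted_centroid(image, mask):
    """Intensity-weighted centroid of the pixels selected by a boolean mask.

    Returns (x_c, y_c) with x along columns and y along rows; assumes the
    selected intensities are non-negative and not all zero.
    """
    ys, xs = np.nonzero(mask)
    w = image[ys, xs].astype(float)
    return float((w * xs).sum() / w.sum()), float((w * ys).sum() / w.sum())


def fixed_neighbourhood(image, centre, half):
    """Crop a (2*half+1)-pixel square around the rounded centroid, clipped at the borders."""
    cx, cy = int(round(centre[0])), int(round(centre[1]))
    r0, r1 = max(cy - half, 0), min(cy + half + 1, image.shape[0])
    c0, c1 = max(cx - half, 0), min(cx + half + 1, image.shape[1])
    return image[r0:r1, c0:c1]


img = np.zeros((5, 5))
img[1, 1], img[1, 2] = 1.0, 3.0              # a two-pixel candidate region
centre = weighted_centroid(img, img > 0)     # (1.75, 1.0): pulled toward the brighter pixel
patch = fixed_neighbourhood(img, centre, half=1)
```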
Based on the above procedure, the complete candidate target extraction process can be implemented as described in Algorithm A3.
The multi-scale DHOG algorithm is designed to adapt to small ship targets of varying size in sea–sky backgrounds, and its key parameters are selected based on the statistical distribution of target sizes in the experimental dataset and on the trade-off between detection performance and computational complexity. Specifically, the smallest cell size is set to match the minimum target size in the dataset (3 × 3 pixels) to ensure that target gradient features are captured effectively, and the maximum number of scales is set to 5, covering the target size range in the experimental sequences (3 × 3 to 15 × 15 pixels). The candidate cell sizes for the five scales are 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11 pixels, respectively. The sliding step is fixed at $N_T/2$ pixels (half the cell size at each scale) to balance detection accuracy and computational efficiency, and the number of HOG bins is set to 9 (0–180° at 20° intervals), which is effective for capturing the gradient orientation distribution of small ship targets. The sliding window for candidate extraction spans 3 × 3 cells, which covers the largest target size in the dataset and thus avoids missing larger targets. Performance differences across scales are reflected mainly in the fact that small cell sizes (3 × 3, 5 × 5) are more sensitive to small targets (3 × 3 to 7 × 7 pixels), whereas large cell sizes (9 × 9, 11 × 11) perform better on relatively large targets (9 × 9 to 15 × 15 pixels); the 7 × 7 cell size achieves the best overall performance across target scales.
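The scale settings above can be written down as a small configuration, which also makes the cost of each scale visible by counting sliding-window placements. This is a sketch under assumptions: $N_T/2$ is rounded down for odd cell sizes, only windows lying entirely inside the image are counted, and the `Scale` class and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    cell: int    # N_T: cell side in pixels
    step: int    # sliding step in pixels
    window: int  # side of the 3 x 3-cell sliding window in pixels


def build_scales(cell_sizes=(3, 5, 7, 9, 11)):
    # Assumption: N_T / 2 is rounded down for odd cell sizes (at least 1 pixel).
    return [Scale(cell=n, step=max(1, n // 2), window=3 * n) for n in cell_sizes]


def window_positions(height, width, scale):
    """Number of sliding-window placements that fit entirely inside an H x W image."""
    if height < scale.window or width < scale.window:
        return 0
    rows = (height - scale.window) // scale.step + 1
    cols = (width - scale.window) // scale.step + 1
    return rows * cols


scales = build_scales()
for s in scales:
    print(s.cell, s.step, s.window, window_positions(256, 256, s))
```

Under these assumptions the smallest scale dominates the cost, because its one-pixel step produces by far the most window placements, while the largest window (33 × 33 pixels) still covers the largest 15 × 15 pixel target.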
To verify the impact of the scale settings (i.e., the number of scales and the candidate cell sizes) on the detection performance of the DHOG algorithm, a sensitivity analysis is conducted by varying the number of scales and the cell sizes while keeping all other parameters unchanged. In these experiments, the performance indicators (BSF, SCRG, and AUC) were recorded separately for three scales, five scales (cell sizes of 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11 pixels), and seven scales. The computational complexity was highest with seven scales. When the cell size was too small, gradient feature extraction became unstable, resulting in relatively low BSF and AUC. When the cell size was too large, small targets were difficult to detect, leading to lower SCRG and AUC. Therefore, five scales with cell sizes from 3 × 3 to 11 × 11 pixels were ultimately selected, as this setting achieved a good balance between detection performance and computational complexity.