2. Local Prior Weight Based on Multidirectional Uniformity
Detecting infrared targets with principal component analysis methods often requires a prior weight. On the one hand, it speeds up the convergence of the optimization problem; on the other hand, it helps ensure the detection accuracy. The structure tensor [
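45] is a common method to construct the weight. Let the original image matrix be $D$; then, the structure tensor can be obtained by:
$$J_\rho(\nabla D_\sigma) = K_\rho * (\nabla D_\sigma \otimes \nabla D_\sigma) = \begin{pmatrix} K_\rho * I_x^2 & K_\rho * I_x I_y \\ K_\rho * I_x I_y & K_\rho * I_y^2 \end{pmatrix} = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix} \quad (1)$$
where $K_\rho$ represents the Gaussian kernel function with a variance of $\rho$, $*$ is the convolution operator, $D_\sigma$ represents $D$ smoothed by a Gaussian kernel function with a variance of $\sigma$, ⊗ represents the Kronecker product, ∇ represents the gradient operator, and $I_x$ and $I_y$ represent the gradients of $D_\sigma$ along the x and y directions, respectively. Then, two eigenvalue matrices $\lambda_1$ and $\lambda_2$ of $J_\rho$ can be obtained:
$$\lambda_{1,2} = \frac{(J_{11} + J_{22}) \pm \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2}}{2} \quad (2)$$
Let the two eigenvalues of a pixel in $D$ be $\lambda_1$ and $\lambda_2$, respectively, where $\lambda_1 \geq \lambda_2$. When $\lambda_1 \approx \lambda_2 \approx 0$, the grayscale of $D$ around the pixel changes very little, and the pixel belongs to a flat area; when $\lambda_1 \geq \lambda_2 \gg 0$, the grayscale of $D$ around the pixel changes sharply, and the pixel can be regarded as a corner; when $\lambda_1 \gg \lambda_2 \approx 0$, there is a strong grayscale change in one direction near the pixel while the grayscale change in the perpendicular direction is very small, and the pixel can be regarded as an edge. Therefore, the possible corner points in the image can be separated from the flat background area and the edge area by using the eigenvalues. Brown et al. [
46] proposed a “corner strength” function to find the interest points:
$$\frac{\det(J_\rho)}{\operatorname{tr}(J_\rho)} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} \quad (3)$$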
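The structure-tensor eigenvalues and the corner strength can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the corner strength is taken here as the harmonic-mean form det/trace = λ1λ2/(λ1 + λ2), and the function names, the default variances, the truncation radius of the kernel and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_1d(variance):
    """1-D Gaussian kernel with the given variance, normalised to sum to 1."""
    radius = max(1, int(np.ceil(3 * np.sqrt(variance))))  # assumed truncation
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * variance))
    return k / k.sum()

def gaussian_blur(img, variance):
    """Separable Gaussian smoothing with edge replication (output keeps the input shape)."""
    k = gaussian_kernel_1d(variance)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def corner_strength(img, sigma_var=1.0, rho_var=1.0, eps=1e-12):
    """Return (lambda1, lambda2, strength) maps of the smoothed structure tensor."""
    d = gaussian_blur(np.asarray(img, dtype=float), sigma_var)
    iy, ix = np.gradient(d)                    # derivatives along rows (y) and columns (x)
    j11 = gaussian_blur(ix * ix, rho_var)
    j12 = gaussian_blur(ix * iy, rho_var)
    j22 = gaussian_blur(iy * iy, rho_var)
    root = np.sqrt((j22 - j11) ** 2 + 4.0 * j12 ** 2)
    lam1 = 0.5 * (j11 + j22 + root)
    lam2 = np.maximum(0.5 * (j11 + j22 - root), 0.0)  # clip tiny negative round-off
    strength = lam1 * lam2 / (lam1 + lam2 + eps)      # harmonic-mean corner strength
    return lam1, lam2, strength
```

On a synthetic image, a small blob gives two comparable eigenvalues and a positive strength, while a straight step edge gives one large eigenvalue and a strength close to zero, which is the behaviour the eigenvalue analysis above describes.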
We selected four kinds of infrared maritime images with strong edge interference and calculated their corner strength maps by Equation (3). The results are shown in the second row of
Figure 1. The target is circled in green, and the strong edge clutter is circled in red.
Obviously, the corner strength method can effectively extract the corner area. Although most of the edges are suppressed, the partial edge of the islands and the strong waves still cannot be effectively restrained. The prior weight with strong edge residue will seriously affect the detection result and cause many false alarms. Due to the obvious edge characteristic in eigenvalue
, to further suppress these unnecessary interferences, we analyze the relationship between
and the corner strength map
in
Figure 1, as shown in
Figure 2.
We selected a target region “T” and an edge interference region “E” in
with higher intensity in each scene. It can be clearly seen that the prominent target area in
tends to present a relatively uniform annular shape in the position of
, while other edge interference areas prominent in
tend to present irregular shapes in the position of
, with large intensity only in a few directions. Therefore, we use the multidirectional uniformity of the target and the non-uniformity of the edge in
to suppress the strong edge corner interference. Firstly, since the intensity of the target is very obvious in
, a simple threshold segmentation can be used to obtain the region of interest:
where
indicates the mean value of
,
indicates the standard deviation of
,
indicates the segmentation threshold, and
indicates the result of the segmentation. Performing a nonzero pixel operation on
saves a lot of time compared to
. By traversing the nonzero elements in
, a scattering field of the corresponding pixel in
is constructed, and then, the multidirectional uniformity of that pixel is calculated to suppress the background and highlight the target. The schematic diagram is shown in
Figure 3. After obtaining
, the location information is mapped into
by traversing every nonzero element in
. Suppose that the current traversal is at the mapping point
p marked in the red box in
, and let the intensity value of
p be
. Taking
p as the center pixel, the vectors
with length
are extended in the horizontal and vertical directions, and the vectors
with length
are extended in the other four directions in the scattering field. Calculate and obtain the difference vector
between the multidirectional vector
and the center pixel by:
Then, calculate the element of
with the largest absolute value
and its distance
from the center pixel:
Let the vector formed by
calculated in eight directions be called
and the vector formed by
be called
.
c is a small constant, set to 0.001 in this paper, that prevents the denominator from being zero; then, the multidirectional uniformity calculation result of any pixel
in
can be obtained by:
We take the map calculated by Equations (6)–(8) as the prior weight map
. The third columns in
Figure 1 show the effect of the proposed improved method. It can be seen that although the target is shrunk to a certain extent, the interference of high corner strength is obviously suppressed; that is, under the premise of losing certain morphological characteristics of the target, the target is not missed, and the number of false alarms is greatly reduced.
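The first two stages of this procedure, the threshold segmentation and the eight-direction scan around a nonzero pixel, can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold is assumed to take the form mean + k × standard deviation, the difference is assumed to be "neighbour minus centre", and the names `region_of_interest`, `directional_extrema`, `length` and `diag_length` are hypothetical. The final uniformity score of Equation (8) is deliberately not reproduced here.

```python
import numpy as np

# Eight scan directions as (row step, column step): four axial, then four diagonal.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def region_of_interest(strength_map, k):
    """Keep pixels above mean + k * std (assumed form of the threshold); zero elsewhere."""
    thresh = strength_map.mean() + k * strength_map.std()
    return np.where(strength_map > thresh, strength_map, 0.0)

def directional_extrema(value_map, p, length, diag_length):
    """For each direction, return (difference with largest magnitude, its distance from p).

    Rays that leave the image are truncated; an empty ray yields (0.0, 0).
    """
    r0, c0 = p
    h, w = value_map.shape
    centre = value_map[r0, c0]
    results = []
    for dr, dc in DIRECTIONS:
        n = length if dr == 0 or dc == 0 else diag_length
        diffs = []
        for step in range(1, n + 1):
            r, c = r0 + dr * step, c0 + dc * step
            if not (0 <= r < h and 0 <= c < w):
                break
            diffs.append(value_map[r, c] - centre)   # assumed sign convention
        if diffs:
            idx = int(np.argmax(np.abs(diffs)))
            results.append((diffs[idx], idx + 1))
        else:
            results.append((0.0, 0))
    return results
```

Restricting the scan to the nonzero pixels of the segmented map is what keeps the cost low: the eight rays are only built for the few pixels that survive the threshold.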
3. Proposed Method
After determining the prior weight, we must still handle the unknown polarity of the target. We use the efficient PSTNN to estimate the rank of the background and use the sparse-weight similarity to judge the polarity of the target while solving with ADMM, which enables the detection of targets of unknown polarity. To clearly represent the local grayscale difference of the target, the concept of target polarity is defined according to [
47]. In this paper, targets with grayscales higher than the local background are defined as having positive polarity, and targets with grayscales lower than the local background are defined as having negative polarity. Suppose the variable related to the positive polarity target is
X; then, the variable with the opposite polarity of
X is
.
3.1. Infrared Patch-Tensor Model
The infrared patch-tensor (IPT) model was proposed by Dai et al. [
42]. The patch-tensor is constructed by sliding rectangular patches of the same size on the original image and finally stacking the patches into a three-dimensional tensor cube, as shown in
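Figure 4. The original expression of the IPT model is as follows:
$$\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N} \quad (9)$$
where $\mathcal{D}$ denotes the patch-tensor constructed from the original image; $\mathcal{B}$ denotes the low-rank background patch-tensor; $\mathcal{T}$ denotes the sparse target patch-tensor; and $\mathcal{N}$ denotes the random noise patch-tensor. Then, a tensor robust principal component analysis (TRPCA) [
48] problem can be obtained as follows:
$$\min_{\mathcal{B}, \mathcal{T}} \operatorname{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_0 \quad \text{s.t.} \quad \mathcal{D} = \mathcal{B} + \mathcal{T} \quad (10)$$
where $\operatorname{rank}(\cdot)$ indicates the rank of a matrix or tensor, $\lambda$ is a compromising parameter, and $\|\cdot\|_0$ represents the $\ell_0$-norm. Since solving the $\ell_0$-norm of a patch-tensor is an NP-hard problem, the $\ell_1$-norm is used to approximate the $\ell_0$-norm convexly.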
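The construction of the patch-tensor, and the inverse step that maps a patch-tensor back to an image, can be sketched as below. This is an illustrative sketch rather than the reference implementation: the function names are hypothetical, border pixels that no patch reaches are simply left at zero, and overlapping pixels are averaged during reconstruction, which is an assumption (other aggregation rules are possible).

```python
import numpy as np

def build_patch_tensor(img, patch, step):
    """Slide a patch x patch window over img and stack the patches along axis 2."""
    h, w = img.shape
    patches = [img[r:r + patch, c:c + patch]
               for r in range(0, h - patch + 1, step)
               for c in range(0, w - patch + 1, step)]
    return np.stack(patches, axis=2)

def rebuild_image(tensor, shape, patch, step):
    """Inverse of build_patch_tensor; overlapping pixels are averaged (assumption)."""
    h, w = shape
    acc = np.zeros(shape, dtype=float)
    cnt = np.zeros(shape, dtype=float)
    idx = 0
    for r in range(0, h - patch + 1, step):
        for c in range(0, w - patch + 1, step):
            acc[r:r + patch, c:c + patch] += tensor[:, :, idx]
            cnt[r:r + patch, c:c + patch] += 1.0
            idx += 1
    return acc / np.maximum(cnt, 1.0)
```

The same two functions can be applied to the original image, to its polarity-reversed copy and to the prior weight map, so that all three tensors share one patch layout.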
3.2. IPT Model Based on PSTNN
Unlike matrices, the rank of a patch-tensor is not uniquely defined. It is important to select a suitable tensor rank with a tight convex relaxation to ensure the speed and accuracy of the solution. The partial sum of the tubal nuclear norm (PSTNN) [
49] is selected to estimate the rank of patch-tensor. The definition is as follows:
where
represents the estimate of the patch-tensor rank, and
n represents the number of patches shown in
Figure 4.
denotes the matrix obtained by Fourier transformation of the
i-th frontal slice of the tensor
,
denotes the partial sum of singular values (PSSV) [
50]. In order to obtain the ideal
N value,
is decomposed into a matrix along the frontal slice; we then calculate its singular values and set
N to the number of singular values greater than 10% of the maximum singular value [
49].
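The two ingredients described above, the partial sum of singular values over the Fourier-domain frontal slices and the 10% rule for choosing N, can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, no normalisation constant is applied to the sum (definitions in the literature differ by such a factor), and `estimate_n` accepts whichever matrix the tensor has been unfolded into.

```python
import numpy as np

def estimate_n(matrix, ratio=0.1):
    """Number of singular values larger than ratio * (largest singular value)."""
    s = np.linalg.svd(matrix, compute_uv=False)   # sorted in descending order
    return int(np.sum(s > ratio * s[0]))

def pstnn(tensor, n):
    """Sum, over Fourier-domain frontal slices, of all singular values after the n largest."""
    f = np.fft.fft(tensor, axis=2)                # FFT along the third (patch) mode
    total = 0.0
    for i in range(f.shape[2]):
        s = np.linalg.svd(f[:, :, i], compute_uv=False)
        total += s[n:].sum()                      # partial sum of singular values
    return float(total)
```

Because the n largest singular values are excluded from the sum, minimising this quantity penalises only the part of the background that exceeds the target rank, which is what makes it a tighter surrogate than the full nuclear norm.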
Thus, the low-rank and sparse infrared small target detection model based on PSTNN joint
-norm is defined as:
where ⊙ indicates the Hadamard product,
is the final weight tensor, which can be obtained by Equation (13),
is the tensor obtained by inverting the elements in
, sparse weight
is a reweighted scheme [
51], which is used to improve the accuracy and speed of solving the
-norm minimization problem and can be calculated by Equation (14), where
c is generally set to 1,
is a small number to prevent the denominator from being zero, and
k denotes the number of iterations.
3.3. Solution of the Proposed Model
As a convex optimization problem-solving method, the alternating direction method of multipliers (ADMM) [
52] is currently one of the most efficient methods, so it is applied in this paper to solve the problem in Equation (12). The augmented Lagrangian function of Equation (12) can be represented as:
where
is the Lagrange multiplier,
denotes the inner product,
is the Frobenius norm, and
is a penalty factor.
Then, the problem
in Equation (15) can be solved by the following several subproblems,
and
at
step iteration is computed as follows:
The subproblem (16) is solved via the soft thresholding operator [
53]:
The subproblem (17) is solved via a partial singular value thresholding operator (PSVT) [
50] through Fourier fast t-SVD computation, as shown in Algorithm 1 [
54]. Then,
Y and
are updated by:
Algorithm 1: Solve Equation (17) using PSVT
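A partial singular value thresholding step computed through the Fourier domain can be sketched as below. This is an illustrative sketch, not the listing of Algorithm 1: the function names and the parameters `n` (number of leading singular values left untouched) and `tau` (shrinkage amount) are assumptions, and every Fourier slice is processed independently without exploiting conjugate symmetry.

```python
import numpy as np

def psvt_matrix(m, n, tau):
    """Keep the n largest singular values; shrink the remaining ones by tau."""
    u, s, vh = np.linalg.svd(m, full_matrices=False)
    s_new = s.copy()
    s_new[n:] = np.maximum(s[n:] - tau, 0.0)
    return (u * s_new) @ vh                     # scale the columns of u, then recombine

def psvt_tensor(tensor, n, tau):
    """Apply psvt_matrix to each frontal slice in the Fourier domain, then transform back."""
    f = np.fft.fft(tensor, axis=2)
    out = np.empty_like(f)
    for i in range(f.shape[2]):
        out[:, :, i] = psvt_matrix(f[:, :, i], n, tau)
    return np.real(np.fft.ifft(out, axis=2))    # imaginary part is numerical round-off
```

With `n` at least as large as the smaller slice dimension the operator returns its input unchanged, and with a very large `tau` each Fourier slice is reduced to rank `n`, which brackets the behaviour expected during the background update.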
The
k-th iteration flow of ADMM is shown in Algorithm 2. After the calculation of Algorithm 1, compared with
, the smaller singular values of
are suppressed, which inhibits more sparse components and causes more loss of detail. So, compared with
, the grayscale of sparse components whose grayscale values are higher than the local background will decrease, whereas the grayscale of sparse components whose grayscale values are lower than the local background will increase in
. Therefore, in Equation (18), there will be some pixels with values less than zero in
, which are discarded by the soft thresholding operator, because it only considers positive elements and assumes by default that the grayscale of the target is higher than the local background. However, as discussed above, the grayscale of the target is not always higher than the local background, so dark targets will be missed by Equation (18). Unfortunately, we cannot directly take the absolute value of the result of
, because in most cases, targets in the infrared maritime image have uniform polarity. If the absolute value is taken directly, it is bound to introduce some interference with the opposite polarity to the target and then affect the false alarm rate. Therefore, determining the polarity of the target in a certain scene is the key to ensure a low miss rate and false alarm rate.
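A toy example makes the missed dark target concrete. This is a deliberately simplified sketch: the background is assumed to be known exactly, the threshold `tau` and all pixel values are invented for illustration, and `positive_soft_threshold` is a hypothetical stand-in for a shrinkage operator that keeps only positive responses.

```python
import numpy as np

def positive_soft_threshold(x, tau):
    """Shrink by tau and keep only positive responses; negative residuals become zero."""
    return np.maximum(x - tau, 0.0)

def sparse_part(image, background, tau=10.0):
    """Toy sparse component: thresholded residual against a known background."""
    return positive_soft_threshold(image - background, tau)

background = np.full((5, 5), 100.0)
bright = background.copy()
bright[2, 2] = 160.0                 # target brighter than its surroundings
dark = background.copy()
dark[2, 2] = 40.0                    # target darker than its surroundings

kept_bright = sparse_part(bright, background)              # bright target survives
lost_dark = sparse_part(dark, background)                  # dark target is discarded
kept_inverted = sparse_part(255.0 - dark, 255.0 - background)  # survives after inversion
```

The dark target produces a negative residual and vanishes, whereas the same target in the grayscale-inverted image produces a positive residual of the same magnitude and is retained.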
Algorithm 2: The k-th iteration flow of ADMM
Input:
1. Fix the others and update by Algorithm 1;
2. Fix the others and update by Equation ;
3. Fix the others and update by Equation ;
4. Fix the others and update by
5.
6.
7. Update by Equation ;
Output: , , , , ,
We set the maximum grayscale of D to be 255, and the image of D with its polarity reversed ($255 - D$) is defined as $\bar{D}$. If the polarity of the target in D is negative, substituting $\bar{D}$ into Equation (12) ensures that the dark target is not missed. We find that the prior weight is always significant at the target location, regardless of whether the polarity of the target is positive or negative. Therefore, both D and $\bar{D}$ can be substituted into Equation (12) to calculate the sparse components with different polarities, and the polarity of the target can be judged by comparing the similarity of each sparse component with the prior weight. However, iterating both D and $\bar{D}$ to convergence using ADMM before judging their similarity to the prior weight would double the computation time.
We notice that in the iterative process, by the second iteration ($k = 2$), the sparse component is already sufficiently separated in both branches, and there is a significant difference between the two sparse components. Therefore, we only need to determine the polarity when $k = 2$, which can effectively shorten the running time.
Figure 5 shows the comparison of the sparse components
,
,
,
and
for the scene with different polarity targets. It can be seen that after the first iteration, the similarity of
and
to
is not well resolved. After the second iteration, in both
Figure 5a,b,
is clearly more similar to
, since the targets are of positive polarity; in both
Figure 5c,d,
is clearly more similar to
, since the targets are of negative polarity. Features whose prior weight
is clearly similar to the sparse component
or
are highlighted in the figure.
Since the sparse component with the correct polarity
or
has a higher intensity value at the same position as the prior weight
, we propose the concept of sparse-weight similarity
. The sparse components
and
are obtained by iterating the original image and its grayscale inversion image twice through ADMM. Then, the polarity of the target is determined by comparing their similarity with
:
Table 1 shows the values of
and
for each scene of
Figure 5. Apparently, when
, the polarity of the target is positive; when
, the polarity of the target is negative. After judging the target polarity, the branch with higher sparse-weight similarity continues to iterate, and the other branch stops iteration, which can save a lot of computing time.
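The branch selection can be sketched as a comparison of two similarity scores. Note the assumption: the similarity used below is a plain normalised inner product (cosine similarity) chosen only as a stand-in, and it is not claimed to be the sparse-weight similarity defined in this paper; the function names are likewise hypothetical.

```python
import numpy as np

def sparse_weight_similarity(sparse, weight, eps=1e-12):
    """Stand-in similarity: cosine similarity between |sparse| and |weight| (assumption)."""
    s = np.abs(np.asarray(sparse, dtype=float)).ravel()
    w = np.abs(np.asarray(weight, dtype=float)).ravel()
    return float(s @ w / (np.linalg.norm(s) * np.linalg.norm(w) + eps))

def judge_polarity(sparse_original, sparse_inverted, weight):
    """Compare both branches after two iterations and report which one to keep."""
    sim_pos = sparse_weight_similarity(sparse_original, weight)
    sim_neg = sparse_weight_similarity(sparse_inverted, weight)
    polarity = "positive" if sim_pos > sim_neg else "negative"
    return polarity, sim_pos, sim_neg
```

In use, the two sparse components would come from running the ADMM iteration twice on the original image and on its grayscale-inverted copy; only the branch reported by `judge_polarity` then continues to convergence.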
Since the -norm of the target patch-tensor will stop changing after several iterations, in order to reduce the running time of the method, the iteration will be stopped when the -norm of the target patch-tensor of the two adjacent iterations is equal or the relative error () is less than a certain threshold, as shown in Algorithm 3. The overall flow of the proposed PSTNN-based and ADMM methods with target polarity judgement is shown in Algorithm 4.
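A stopping test of this kind can be sketched as follows. Two assumptions are made and should be checked against Algorithm 3: the norm that stops changing is taken to be the number of nonzero entries of the target patch-tensor, and the relative error is taken to be ‖D − B − T‖_F / ‖D‖_F. The function name and the default tolerance are hypothetical.

```python
import numpy as np

def should_stop(d, b, t, prev_nnz, tol=1e-7):
    """Return (stop flag, current nonzero count) for one ADMM iteration.

    Stops when the nonzero count of t equals the previous one (assumed criterion)
    or when the relative reconstruction error drops below tol (assumed form).
    """
    nnz = int(np.count_nonzero(t))
    rel_err = np.linalg.norm(d - b - t) / max(np.linalg.norm(d), 1e-12)
    return (nnz == prev_nnz) or (rel_err < tol), nnz
```

The returned count is fed back as `prev_nnz` on the next iteration, so the caller only needs to keep one integer between iterations.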
3.4. The Overall Procedure of the Proposed Method
Figure 6 shows the overall procedure of the proposed method, which can be described in the following steps:
Prior weight extraction. The prior weight map is extracted by Equations (3)–(8) using structure tensor and multidirectional uniformity.
Patch-tensor construction. The patch-tensors of the original image
D and its polarity reversed
and prior weight
are constructed by the illustration of
Figure 4.
Target-background separation and polarity judgment. The input patch-tensor and are decomposed into low-rank patch-tensors , and sparse patch-tensors , by Algorithm 2. The polarity of target is judged by comparing the similarity of the sparse components and with after two iterations.
Image reconstruction and target extraction. When the iterative process meets the convergence condition in Algorithm 3, the background component B and target component T are reconstructed from the low-rank patch-tensor and sparse patch-tensor . The construction and reconstruction are opposite processes. Finally, the target we need to detect is obtained.
Algorithm 3: Iteration stop judgment
Algorithm 4: The proposed method
4. Experiments and Analysis
In this section, the experimental setup, including the dataset employed in this paper, the quantitative evaluation indicators and the baseline methods, is introduced first. Then, the parameters of the proposed method are determined by experiments. The proposed method is compared with the baseline methods qualitatively and quantitatively. Finally, the running time of each method is compared.
4.1. Experimental Setup
4.1.1. The Data Set
In this paper, 14 groups of infrared maritime images of different scenes are selected to verify the effectiveness and robustness of the proposed methods. Each image sequence contains 100 frames. The size of each frame in sequence (a) is 284 × 236 and in sequence (b) to (l) is 640 × 512. Typical images in each sequence are shown in
Figure 7. Among them, (a) to (e) are the scenes without island and with bright targets. (f) to (i) are the scenes with island, and (i) to (l) are the scenes with dark targets.
Table 2 shows the target sizes and local mean contrast (LMC) in each sequence. LMC can be calculated by Equation (
22), where
represents the average grayscale of the target area,
represents the average grayscale of the local background area, and the local background area is obtained by extending the target boundary by 20 pixels.
4.1.2. Evaluation Metrics
In target detection, MAR, FAR, BSF and SCRG are used as evaluation indexes to evaluate the effect of the method. MAR (Missing Alarm Rate) represents the ratio of the number of targets missed by the method to the number of real targets. FAR (False Alarm Rate) represents the ratio of the number of false targets detected by the method to the number of all detected targets. BSF (Background Suppression Factor) represents the residual degree of background clutter in the image and characterizes the effect of background noise suppression before and after detection. SCRG (Signal-to-Clutter Ratio Gain) is used to evaluate the signal enhancement performance. MAR and FAR can be expressed by the following equations:
$$\mathrm{MAR} = \frac{MT}{MT + DT}, \qquad \mathrm{FAR} = \frac{FT}{FT + DT}$$
where MT represents the number of missed targets, DT represents the number of detected real targets, and FT represents the number of detected false targets. BSF and SCRG can be represented by:
where
in BSF represents the standard deviation of the whole image except for the target area, in and out represent input and output images, respectively, SCR represents the signal-to-clutter ratio of input or output signals,
represents the average intensity of the target or background area,
t represents the target,
b represents the background, and
in SCR represents the standard deviation of the local background of target. We set the size of the background in SCR as the area obtained by extending the target area boundary by 20 pixels.
c is a small positive constant, which is set as 0.001 in this paper, to avoid the denominator becoming zero [
55]. The larger the standard deviation of the image is, the more complex the image is, and a small, weak target is more likely to be submerged in an image with a large standard deviation; otherwise, the target will be salient in the image. Therefore, the higher the BSF value, the more effective the background suppression, and the easier it is to detect the target. The larger the SCRG, the greater the saliency of the target relative to the background, indicating that the target is easier to detect.
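The four metrics can be sketched in NumPy as below. This is a sketch under assumptions rather than the evaluation code used for the tables: the placement of the constant `c` in each denominator is a guess, the local background is taken as the bounding box of the target extended by `margin` pixels, and the function names are hypothetical. MAR and FAR follow the verbal definitions given above.

```python
import numpy as np

def mar(missed, detected_true):
    """Missed targets over all real targets."""
    return missed / (missed + detected_true)

def far(false_targets, detected_true):
    """False detections over all detections."""
    return false_targets / (false_targets + detected_true)

def local_background_mask(target_mask, margin):
    """Bounding box of the target extended by margin pixels, minus the target itself."""
    rows, cols = np.nonzero(target_mask)
    r0, r1 = max(rows.min() - margin, 0), min(rows.max() + margin + 1, target_mask.shape[0])
    c0, c1 = max(cols.min() - margin, 0), min(cols.max() + margin + 1, target_mask.shape[1])
    local = np.zeros(target_mask.shape, dtype=bool)
    local[r0:r1, c0:c1] = True
    return local & ~target_mask

def scr(img, target_mask, margin=20, c=1e-3):
    """|mean(target) - mean(local background)| / (std(local background) + c)."""
    target_mask = np.asarray(target_mask, dtype=bool)
    bg = local_background_mask(target_mask, margin)
    return abs(img[target_mask].mean() - img[bg].mean()) / (img[bg].std() + c)

def bsf(img_in, img_out, target_mask, c=1e-3):
    """Std of the input outside the target over std of the output outside the target."""
    bg = ~np.asarray(target_mask, dtype=bool)
    return img_in[bg].std() / (img_out[bg].std() + c)

def scrg(img_in, img_out, target_mask, margin=20, c=1e-3):
    """Ratio of output SCR to input SCR."""
    return scr(img_out, target_mask, margin, c) / (scr(img_in, target_mask, margin, c) + c)
```

For example, `mar(10, 90)` gives 0.1 and `far(5, 95)` gives 0.05; an output image whose background clutter is ten times weaker than the input yields a BSF just under 10.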
4.1.3. Baseline Method
We selected nine typical, publicly available baseline methods and compared them with the proposed method to verify its effectiveness: YOLOv5 (
https://github.com/ultralytics/yolov5 (accessed on 20 July 2022 )), GST [
56], FKRW [
57], RLCM [
19], NRAM [
35], NOLC [
38], PSTNN [
43], SRWS [
41], NTFRA [
24]. Among them, YOLOv5 is a deep learning method, GST is based on the structure tensor, FKRW is based on the facet kernel and random walk, RLCM is based on local contrast, and NRAM, NOLC, PSTNN, SRWS and NTFRA are based on principal component analysis.
In the comparison test, the parameter settings of each traditional baseline method are shown in
Table 3. The parameters of the baseline methods are the same as the defaults in their open-source code, except for SRWS, whose default parameters led to slower speed and non-ideal results.
4.2. Analysis of Parameters
Four different scenes in
Figure 1 are selected to discuss the influence of the values of each parameter in the proposed methods on detection results so as to provide a basis for the selection of parameters and obtain the best parameters to achieve the best detection results.
Figure 8 shows the impact of six key parameters on the MAR and FAR.
(a) Segmentation threshold k.
In the multidirectional uniformity method proposed in this paper, in order to reduce the running time, a simple adaptive threshold segmentation is first carried out. If the value of
k is too large, the target with weak corner strength will be missed, and if it is too small, the running time of the method will be increased. By comparing the curves related to
k in
Figure 8, it can be seen that a larger value of
k increases MAR and a smaller value of
k increases FAR, so
is taken in this paper to trade off FAR and MAR.
(b) Extended length of multidirectional uniformity .
It is necessary to determine the extended length
when constructing the element-wise local multidirectional vectors. If the value of
is too small to cover the size of the target, the target will be missed; if the value of
is too large, other edge interference may be introduced when constructing multidirection vectors centered on the target, which will lead to missed detection. By comparing the curves related to
in
Figure 8, when
, both the FAR and the MAR are relatively low. Considering that the value of
should not be too large,
is chosen as the final value in this paper.
(c) Patch size.
The size of the patch in
Figure 4 affects the accuracy and complexity of the method. When the patch size is large, the target has better sparsity and is easier to be separated from the background. When the patch size is small, the complexity of singular value decomposition of each patch will be reduced. By comparing the curves related to patch size in
Figure 8, the ideal effect is achieved when the patch size = 40, so the patch size is set as 40 in this paper.
(d) Sliding step.
The sliding step in the construction of the patch-tensor should not be too small. Too small a sliding step will result in insufficient sparsity of the target and increase the running time. Meanwhile, the sliding step should not be larger than the patch size, so that no information in the image is lost. By comparing the curves related to the sliding step in
Figure 8, the ideal effect is achieved when the sliding step = 40, so the sliding step is set as 40 in this paper.
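The effect of the patch size and the sliding step on the number of patches can be checked with simple arithmetic. The sketch below assumes a plain sliding window that starts at the top-left corner and never pads the image, which may differ from the actual implementation; `patch_grid` is a hypothetical helper.

```python
def patch_grid(height, width, patch, step):
    """Return (rows of patches, columns of patches, total patches, covered (height, width))."""
    n_rows = (height - patch) // step + 1
    n_cols = (width - patch) // step + 1
    covered = ((n_rows - 1) * step + patch, (n_cols - 1) * step + patch)
    return n_rows, n_cols, n_rows * n_cols, covered

def leaves_gaps(patch, step):
    """A step larger than the patch size skips pixels between neighbouring patches."""
    return step > patch

# A 512-row by 640-column frame with patch size 40 and sliding step 40:
grid = patch_grid(512, 640, 40, 40)   # (12, 16, 192, (480, 640))
```

Under this indexing a 640 × 512 frame produces 192 non-overlapping patches, and the last 32 rows are not reached unless the image is padded or the final window is shifted inward.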
(e) Penalty Factor .
controls the tradeoff between the low-rank and sparse tensors. If the value of
is too small, more sparse components will remain in the low-rank background, resulting in an increase in MAR. If the value of
is too large, more background interference will be extracted into the sparse component, leading to an increase in FAR. By comparing the curves related to
in
Figure 8, the ideal effect is achieved when
, so
is set as
in this paper.
(f) Compromising Parameter .
also controls the tradeoff between the low-rank and sparse tensors, which is set as
with reference to [
48] (
,
and
denote the length, width and number of patches, respectively). By comparing the curves related to
L in
Figure 8, it can be seen that when
L increases, MAR tends to increase, and when
L decreases, FAR tends to increase. Therefore,
is set as
in this paper.
4.3. Accuracy of Polarity Judgment
The sparse-weight similarity proposed in this paper to judge the polarity of the target is not completely accurate, especially in scenes with obvious sparse interference whose polarity is opposite to that of the target. We counted the polarity judgment error rates
for the 14 sequences in
Figure 7, as shown in
Table 4. Combined with
Figure 7, it can be seen that there is a large amount of negative polarity wave clutter in sequences (d) and (e), and a large amount of positive polarity wave clutter in (i)–(l), leading to a certain number of wrong judgments.
4.4. The Qualitative Comparison
After determining the values of the parameters of the proposed method, we compared the detection results of nine baseline methods and the proposed method in 14 different sequences in
Figure 7 and show representative single frame results in
Figure 9. Among the traditional methods, FKRW and NTFRA have higher FAR and are easily disturbed by wave clutter, while NOLC and SRWS have higher MAR. When detecting sequences with strong island edges, RLCM and the proposed method have a high suppression effect, while GST, FKRW, NRAM, PSTNN and NTFRA have a relatively poor suppression effect. Although RLCM has a good effect in detecting targets with positive polarity, the morphology of the targets is lost. When detecting sequences with negative polarity targets, GST, PSTNN and NTFRA can detect some of the negative polarity targets, but they are accompanied by a large number of false alarms. YOLOv5 has a significant effect on background clutter suppression, and its main source of false alarms is misjudging islands as targets. At the same time, missed detections appear in a few scenes. Due to the low multidirectional uniformity at the edge of the target, the proposed method causes the target to shrink to a certain extent. In summary, the proposed method achieves robust detection for scenes with strong island edges and targets with different polarities at the cost of shrinking the detected target size.
4.5. The Quantitative Comparison
In this paper, MAR, FAR, BSF and SCRG are used to measure the detection effect of the baseline methods and the proposed method.
Table 5,
Table 6,
Table 7 and
Table 8, respectively, show the comparison of average values of MAR, FAR, BSF and SCRG calculated by different baseline methods and proposed methods for 14 different sequence scenes. Each sequence contains 100 frames. For BSF and SCRG, the input image is the original image, and the output image is the image of the final result after normalization and before binarization.
From the results for the four metrics MAR, FAR, BSF and SCRG, it can be seen that FKRW and NTFRA have high FAR in most scenes, NOLC and SRWS have high MAR in most scenes, and RLCM shows better detection results in scenes with positive polarity targets. Most of the methods do not have the ability to detect negative polarity targets. Although GST, RLCM and PSTNN can detect some negative polarity targets, they are accompanied by a large number of false alarms. YOLOv5 shows excellent results on many datasets and is not affected by the polarity of the target but shows high MAR and FAR on some specific datasets. Compared with other methods, the detection effect of the proposed method for negative polarity targets is significant. However, compared with RLCM and PSTNN, which have a better detection effect on positive polarity targets, the proposed method causes some false alarms and missed detections when detecting positive polarity targets in some cases. The corresponding datasets contain strong interference of polarity opposite to that of the target, which causes wrong judgment of the target polarity. From the BSF results, it can be seen that YOLOv5 shows significant background suppression ability, although some islands are wrongly detected as targets. The ability of the proposed method to suppress background interference is stronger than that of most traditional baseline methods. In order to suppress the strong island edges, the multidirectional uniformity method causes the target result to shrink, so the proposed method has no obvious advantage over other methods in the SCRG results. Although the results of the proposed method are not the best on some sequences when measuring either BSF or SCRG separately, the proposed method is still a better choice compared to other baseline methods when considering both BSF and SCRG, which are underlined in the table.
Combining the four parameters of FAR, MAR, BSF and SCRG, it can be concluded that the proposed method is more robust, can adapt to more complex scenes and has a wider range of application compared with traditional baseline methods.
4.6. Runtime Comparison
All experiments in this paper are run on a Mac computer with a 2 GHz quad-core Intel Core i5 CPU and 16 GB memory. The codes of the traditional methods are implemented in MATLAB 2022a. The code of YOLOv5 is implemented in PyCharm 2022.2. The average runtimes of the proposed method and nine other baseline methods for sequences (a) to (n) are calculated and shown in
Table 9. It can be seen that the runtime of the proposed method is relatively short. Let the size of the input image be
, the size of the patch-tensor be
, the size of the sliding window in multidirectional uniformity be
l, and
x be the number of nonzero elements in Equation (
5), which is a small number compared with
. In the calculation of multidirectional uniformity, a sliding window is used to traverse every nonzero element in the whole image, which needs an
cost. The main consumption in PSTNN lies in SVD and FFT, which requires
cost, so the total computation cost of the proposed model is
. Methods based on component analysis can greatly reduce the running time by GPU accelerated methods [
58]. We implemented the proposed method on VS2015 by GPU acceleration technology on the server equipped with an infrared detection system in the laboratory. After acceleration, the average processing time of each frame of 14 sequential scenes is 0.047 s, which can meet the requirements of real-time monitoring in engineering.
5. Discussion
There is still room to improve the accuracy of infrared maritime target detection. The biggest challenge lies in unexpected complex backgrounds and strong interference. Methods based on background estimation filtering and methods based on local features make use of the global information and the local information of the image, respectively, and each has certain deficiencies. The optimization method based on local feature weights and structure tensors takes full account of both global and local information and shows strong robustness. Some islands in infrared maritime images often cause many false alarms because of their obvious edges. It is critical to reduce this interference without affecting the intensity of the target. Therefore, we take advantage of the fact that the eigenvalue of the structure tensor has obvious edge features and propose the multidirectional uniformity to suppress the strong edges. Although the size of the target is shrunk to a certain extent, the detection accuracy is greatly improved. In addition, most methods neglect the case in which the target grayscale is lower than the background, which leads to a lack of robustness in practical applications. Therefore, it is particularly important to achieve target detection with unknown polarity. We adopt the strategy of substituting images of opposite polarities into the optimization algorithm, making the polarity judgment in the second iteration and stopping the iteration of the wrong polarity, which can accurately judge polarity and ensure the effectiveness of the method. After comparison with advanced baseline methods on a large number of datasets, it can be concluded that the proposed method is more robust, although there are a small number of polarity judgment errors. Deep learning-based methods also show excellent results, although the current infrared maritime datasets are still insufficient.
In the future, our research will focus on more accurate polarity judgment and on keeping the missed detection rate and false alarm rate in a lower range. In addition, we will consider the use of deep learning methods to obtain stronger robustness while expanding the dataset.