In this section, we introduce the imaging model of the optical system for dim-weak targets, describe the mathematical model and imaging process of each component, and present the multi-frame superposition method and its problems.
3.1. A Model of Optical Imaging System for Dim-Weak Targets
We consider the imaging process to be linear in both time and energy. In particular, the components form independently of one another, and the final imaging result comprises three components: background signal
, target signal
, and noise signal
. The following is the image model definition for this study:
The dark background, which is stable under normal conditions, is set to the constant
in the experiment to minimize the influence of the initial environment on the target imaging, as shown in the following formula.
The imaging of a point target can be regarded as a detector array down-sampling the response of the point target and forming discrete gray values on the image [
43]. According to the linear system response formula, the response of point target
after passing through the optical system can be considered the convolution of point target
and point spread function
. Therefore, the following point-target response formula can be obtained:
The convolution of the point pixel with the point spread function forms the point-target response, which is discretely down-sampled by the array to form the final point-target image.
Figure 2 shows the imaging process of a point target. Before passing through the imaging system, the point target is represented by the prominent value on the point pixel, whereas after passing through the imaging system, the target presents a stepped distribution of grayscale values on the array.
The point spread function is a two-dimensional Gaussian-like model centered on the point pixel response value; it is essentially the spatial-domain expression of the transfer function of the imaging system. In this simulation, the extended two-dimensional Gaussian function is used as the mathematical model of the point spread function, and the intensity of the target is set by controlling the values of
,
, and
to constrain the spread range of the target. Therefore, the target function is defined as follows:
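As an illustration of how a two-dimensional Gaussian point response turns into discrete gray values on an array, the following minimal sketch samples such a function at pixel centers. The function name and the parameters `amplitude`, `sigma_x`, `sigma_y`, `x0`, `y0`, and `background` are assumptions introduced here for illustration and are not the notation of this study; sampling at pixel centers is also a simplification of the array's down-sampling.

```python
import math

def gaussian_psf_image(width, height, x0, y0, amplitude, sigma_x, sigma_y, background=0.0):
    """Sample a 2D Gaussian point response on a width x height pixel grid.

    All parameter names are hypothetical: (x0, y0) is the target center,
    amplitude is the peak intensity, and sigma_x / sigma_y control the spread.
    """
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            exponent = ((x - x0) ** 2) / (2.0 * sigma_x ** 2) \
                     + ((y - y0) ** 2) / (2.0 * sigma_y ** 2)
            # Constant dark background plus the Gaussian target response.
            row.append(background + amplitude * math.exp(-exponent))
        image.append(row)
    return image

# A 9 x 9 frame with the target at the center pixel and a wider spread along y.
image = gaussian_psf_image(width=9, height=9, x0=4, y0=4,
                           amplitude=10.0, sigma_x=1.0, sigma_y=1.5)
```

In this sketch the gray values fall off in steps around the center pixel, and a larger `sigma_y` makes the response decay more slowly along the vertical direction than along the horizontal one.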
3.2. Multi-Frame Superposition
In general, the weak signal in a single-frame image can be enhanced by extending the exposure time of the imaging device; however, this improvement is limited. Once the exposure time reaches a critical value, the target energy no longer increases significantly, and if the target moves, continuous exposure changes the energy distribution of the target within the frame, resulting in smearing. The multi-frame superposition method can therefore be used to enhance dim-weak point-target data: it overcomes the detection limit of the physical device and effectively improves the SNR of the target, concentrating the target energy and facilitating subsequent detection. Formula (5) gives the power SNR, where
represents the signal power,
represents the noise power,
represents the average of the target signal, and
represents the standard deviation of the noise signal. After multi-frame superposition, the signal energy is also superimposed. As shown in Formula (6),
indicates the signal after
frames are superimposed. Because the target signal remains unchanged,
. Additionally, the noise in each frame is independent of the noise in the others, so
. Thus, from Formulas (5) and (6), it follows that superimposing K frames increases the SNR by a factor of
. However, because of the instability of the imaging system, the target makes small random movements on the imaging plane, shifting briefly to positions near its center. At each new position, the imaging system generates a new response. When the frames are accumulated, the responses at these multiple positions add up, which introduces an energy accumulation error into the multi-frame superposition. The target energy spreads into the neighborhood, making detection difficult. We propose a new detection method for data of this form.
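The SNR gain from superposition, before any jitter is taken into account, can be checked with a short derivation. This is a sketch under stated assumptions: the symbols $\mu$ (mean target signal), $\sigma$ (noise standard deviation), and $K$ (number of frames) are chosen here for illustration, and the power SNR is assumed to take the form $\mu^2/\sigma^2$.

```latex
\begin{aligned}
\mathrm{SNR}_1 &= \frac{\mu^2}{\sigma^2},\\
\mu_K &= K\mu, \qquad \sigma_K^2 = K\sigma^2
  \quad \text{(independent noise: variances add)},\\
\mathrm{SNR}_K &= \frac{\mu_K^2}{\sigma_K^2}
  = \frac{K^2\mu^2}{K\sigma^2}
  = K\,\mathrm{SNR}_1 .
\end{aligned}
```

Under these assumptions, the power SNR grows linearly with the number of frames, which corresponds to a gain of $\sqrt{K}$ in the amplitude ratio $\mu/\sigma$.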
The actual imaging process is subject to many sources of interference, such as platform jitter, instrument noise, light pollution, and atmospheric pollution. Most of these can be avoided, for example by changing the shooting location or selecting a suitable shooting time. However, the slight jitter of the shooting platform and the thermal noise of the instrument are unavoidable and have a significant impact on detection. Therefore, this study analyzes the effects of these two types of interference on the target: the first type is the small movement of the target image caused by the instability of the imaging system, and the second type is the thermal noise of the imaging system.
3.2.1. Type I Interference
This slight movement is caused by shaking of the shooting equipment or by other influences during imaging that shift the position of the object on the imaging surface. The movement is disordered and irregular and occurs in both directions simultaneously. We ignore its influence on the energy distribution of weak point objects within a frame; that is, because the shooting time is extremely short, a point object can be considered stationary within each frame. Accordingly, we model the disturbance of the target as a random Gaussian distribution, where
is the number of superimposed frames and (
,
) represents the offset of the target in two directions. The form is shown in Formula (7).
As shown in
Figure 3, because an offset point target generates a new response at its new position, superimposing the responses at multiple positions causes them to mix with one another. Generally, the peak no longer appears at the original center point, multiple peaks appear, energy diffuses into nearby pixels, or the image fragments.
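The effect of Type I interference on the accumulated peak can be illustrated with a small simulation. This is a minimal sketch, assuming an isotropic Gaussian point response and independent zero-mean Gaussian offsets per frame; the function names, the 15 x 15 frame size, and the jitter standard deviation are hypothetical values chosen for illustration, not the settings of this study.

```python
import math
import random

def point_response(size, x0, y0, amplitude=1.0, sigma=1.0):
    """Gaussian point response sampled on a size x size grid, centered at (x0, y0)."""
    return [[amplitude * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def superimpose(size, center, num_frames, jitter_std, rng):
    """Sum num_frames noise-free frames; each frame shifts the target by a Gaussian offset."""
    total = [[0.0] * size for _ in range(size)]
    for _ in range(num_frames):
        dx = rng.gauss(0.0, jitter_std)  # offset along x for this frame
        dy = rng.gauss(0.0, jitter_std)  # offset along y for this frame
        frame = point_response(size, center + dx, center + dy)
        for y in range(size):
            for x in range(size):
                total[y][x] += frame[y][x]
    return total

def peak(image):
    """Largest accumulated gray value in the image."""
    return max(max(row) for row in image)

rng = random.Random(0)
aligned = superimpose(size=15, center=7, num_frames=20, jitter_std=0.0, rng=rng)
jittered = superimpose(size=15, center=7, num_frames=20, jitter_std=1.5, rng=rng)
```

Without jitter, every frame contributes its full peak to the same pixel, so `peak(aligned)` equals the number of frames times the amplitude. With jitter, the same total contribution is spread over neighboring pixels and `peak(jittered)` is lower.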
3.2.2. Type II Interference
Unlike bright targets with high SNR, targets with low energy and dim features are highly susceptible to noise. The target may be completely submerged, or so distorted by contamination that it no longer has a complete Gaussian-like shape. As can be seen in
Figure 4, as the SNR decreases, the target becomes less prominent in the spatial domain. Because most of the noise introduced by electronic instruments is thermal noise, which is typically Gaussian white noise [
44], Gaussian white noise is used for simulation in this study.
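A minimal sketch of how Gaussian white noise at a chosen SNR could be added to a simulated frame is given below. It assumes that the power SNR is the squared mean target signal divided by the noise variance; the helper names `noise_sigma_for_snr` and `add_white_noise` and the numerical values are illustrative assumptions.

```python
import math
import random

def noise_sigma_for_snr(target_mean, power_snr):
    """Noise standard deviation that gives the requested power SNR.

    Assumes SNR = target_mean**2 / sigma**2 (an assumption of this sketch).
    """
    return target_mean / math.sqrt(power_snr)

def add_white_noise(image, sigma, rng):
    """Add independent zero-mean Gaussian noise to every pixel."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]

rng = random.Random(1)
clean = [[0.0] * 64 for _ in range(64)]
clean[32][32] = 5.0                                          # single-pixel target
sigma = noise_sigma_for_snr(target_mean=5.0, power_snr=4.0)  # gives sigma = 2.5
noisy = add_white_noise(clean, sigma, rng)
```

At this SNR the target value is only two noise standard deviations above the background, so in a 64 x 64 frame a number of pure-noise pixels can be expected to reach comparable values, which is the submersion effect described above.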
Combining the two types of interference shows that detection is difficult mainly because the target energy is weak: the target is submerged by noise, loses its significance in the spatial domain, and exhibits only faint features. In addition, during multi-frame superposition, small movements of the target cause energy accumulation errors, which disperse and fragment the target energy.
To address the above problems, we propose a multi-frame superposition detection algorithm based on clustering optimization, which models faint target detection as a problem of finding the center of the connected domain in graphics and uses the difference between the distribution of the target area and noise to detect the target.
3.3. Detection Based on Optimized Clustering Algorithm
To overcome the limited spatial-domain features of the target, we convert the image into a scatter plot and combine the difference between the energy distributions of the target and the noise with a clustering algorithm for detection. Because the target follows a Gaussian distribution in the imaging system, it projects onto the plane as a small connected domain after dimensionality reduction, whereas the noise is random and usually appears as isolated points. This difference can be used to detect the target.
It can be derived from Formulas (8)–(10) that, because the target satisfies the Gaussian distribution model, theoretically, when the specified
is set, the set of
, forms an ellipse centered on
,
under ideal circumstances; the energy values of the points inside the ellipse are higher than
, and, therefore, there must exist an ellipse-like connected domain on the projection surface. Because the noise does not have this distribution feature, we use this difference to detect the target.
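The elliptical region can be made explicit with a short worked derivation. This is a sketch under assumed notation: $A$ is the peak amplitude, $(x_0, y_0)$ the center, $\sigma_x$ and $\sigma_y$ the spreads, and $T$ the threshold with $0 < T < A$; these symbols are chosen here for illustration.

```latex
\begin{aligned}
A\exp\!\left(-\frac{(x-x_0)^2}{2\sigma_x^2}-\frac{(y-y_0)^2}{2\sigma_y^2}\right) &\ge T\\
\Longleftrightarrow\quad
\frac{(x-x_0)^2}{2\sigma_x^2}+\frac{(y-y_0)^2}{2\sigma_y^2} &\le \ln\frac{A}{T}\\
\Longleftrightarrow\quad
\frac{(x-x_0)^2}{a^2}+\frac{(y-y_0)^2}{b^2} &\le 1,
\qquad a=\sigma_x\sqrt{2\ln\frac{A}{T}},\quad b=\sigma_y\sqrt{2\ln\frac{A}{T}} .
\end{aligned}
```

The points at or above the threshold therefore fill an ellipse centered on $(x_0, y_0)$ whose semi-axes shrink as $T$ approaches $A$.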
In the proposed method, the three-dimensional data are first reduced to two-dimensional data to form a projected scatter diagram. Subsequently, the center point of the connected domain is calculated using a clustering method. Finally, this center point is taken as the original center point of the target energy distribution function for inspection. Each step is explained below.
For a faint target, once the intensity range is specified, the exact intensity of a pixel contributes little to detection; what matters is whether the pixel satisfies the intensity criterion. Each pixel is therefore reduced to a binary indicator, which reduces the original three-dimensional data (two pixel coordinates and intensity) to two dimensions and yields a two-dimensional scatter plot.
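The projection step can be sketched as a simple threshold test that keeps only the coordinates of qualifying pixels. The function name `project_to_scatter` and the bounds `low` and `high` are hypothetical names introduced for this illustration, as are the example gray values.

```python
def project_to_scatter(image, low, high=float("inf")):
    """Return the (x, y) coordinates of pixels whose value lies in [low, high].

    The intensity itself is discarded, so the result is a 2D scatter of points.
    """
    return [(x, y)
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if low <= value <= high]

image = [
    [0.1, 0.2, 0.1, 0.0],
    [0.3, 2.0, 1.5, 0.2],
    [0.1, 1.4, 0.2, 1.1],
]
points = project_to_scatter(image, low=1.0)
# points == [(1, 1), (2, 1), (1, 2), (3, 2)]
```

Three of the four retained points are adjacent to one another, while the point at (3, 2) touches the others only diagonally, which is the kind of distribution difference the later steps rely on.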
Figure 5 shows the processing results for different SNR values. A difference in distribution between the target and the noise remains even as the target energy weakens.
Clustering is an important classification technique that divides data into clusters according to given rules so that data within the same cluster are highly similar and data in different clusters differ as much as possible.
The k-means clustering algorithm, a partition-based clustering method that uses distance as the partitioning rule, was used to solve the above problems. The process is as follows: First, we randomly selected K data objects in the given data
as the initial centers of the K clusters
. For the remaining points, we calculated the distance
from the center point of each cluster according to the formula and added the point to the cluster whose center was closest. Subsequently, the center of the updated cluster was recomputed, and its change from the previous center was recorded; this was iterated until all the data were traversed. Various distance formulas can be used for k-means. Based on the experimental results, the Euclidean distance, Formula (11), was selected in this study. Because this study mainly considers a single-target detection scenario, the hyperparameter K is set to 1.
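The clustering step can be sketched in a few lines of plain Python. Note that this is the standard batch (Lloyd-style) form of k-means, which reassigns all points and then recomputes the centers until they stop moving; it is used here as an assumption for illustration and may differ from the exact update schedule described above. The function names, `max_iter`, and `tol` are hypothetical.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two 2D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def k_means(points, k, rng, max_iter=100, tol=1e-9):
    """Batch k-means on 2D points; returns the list of k cluster centers."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centers.append((sum(p[0] for p in cluster) / len(cluster),
                                    sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # stop once no center moves any further
            break
    return centers

points = [(4, 4), (5, 4), (4, 5), (5, 5), (6, 5)]
centers = k_means(points, k=1, rng=random.Random(0))
```

With K set to 1, every point belongs to the single cluster, so the result is simply the centroid of all scatter points regardless of the initial choice. This is why isolated noise points pull the estimated center away from the target.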
Because the noise is modeled as Gaussian white noise, its first-order moment is constant and its values at different locations are uncorrelated. Its amplitude follows a Gaussian distribution, its power spectral density is uniform, and the noise at any location and any moment is random with equal probability. This is expressed in Formulas (12) and (13), where
represents the variance of the noise signal, and
indicates the standard deviation of the noise signal. As a result, the cluster center that the Euclidean distance formula yields for noise tends to appear at the center of the image, so it must be distinguished from the target center.
As shown in
Figure 6, when the target center coincides with the image center, distinguishing the target center from the noise center becomes difficult. When the target center is not at the image center, the noise significantly affects the clustering result for the target center if left untreated. Therefore, after the scatter plot is formed, the data must be processed further to obtain a more accurate center position.
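The pull of noise on a single cluster center can be illustrated numerically. The sketch below assumes noise points that are uniformly distributed over a 100 x 100 image and a small 3 x 3 target block; all names and values are hypothetical and serve only as an illustration.

```python
import random

def centroid(points):
    """Mean position of a list of 2D points (the K = 1 cluster center)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

rng = random.Random(2)
size = 100
# Noise points spread uniformly over the image.
noise_points = [(rng.randrange(size), rng.randrange(size)) for _ in range(2000)]
# A compact 3 x 3 target away from the image center; its centroid is (21, 31).
target_points = [(20 + dx, 30 + dy) for dx in range(3) for dy in range(3)]

noise_center = centroid(noise_points)
mixed_center = centroid(noise_points + target_points)
```

In this sketch the centroid of the noise alone lies close to the image center, and adding the target points barely moves it, so the untreated cluster center says little about the target position.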
According to our analysis, the target area forms a connected domain, whereas noise does not and instead forms many randomly distributed, unrelated points on the scatter plot. To minimize the impact of noise on the detection results, we removed isolated points from the scatter plot, borrowing the idea of outlier removal from point cloud data preprocessing. Points that have no other points in their 8-neighborhood are defined as outliers.
Figure 7 is a schematic of the outliers.
In the experiment, to obtain better results, we adjusted the judgment criterion and tested both the 4-neighborhood and 8-neighborhood cases. We applied one pass of 4-neighborhood and of 8-neighborhood outlier removal to data with different SNRs, and the results are shown in
Figure 8 and
Figure 9. We found that defining outliers as points with no other points in their 4-neighborhood removed more noise. Subsequently, an ablation experiment was performed to compare the effects of the two criteria on the detection results. From
Table 1, it can be seen that the 4-neighborhood criterion improves the detection results more clearly, so this criterion is used to remove isolated noise in subsequent experiments.
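The two outlier criteria can be compared with a short sketch that performs one removal pass over a scatter of integer pixel coordinates. The function name `remove_isolated_points` and the example points are assumptions made for illustration.

```python
NEIGHBORS_4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def remove_isolated_points(points, neighborhood=4):
    """One pass of outlier removal: drop points with no other point in their neighborhood."""
    if neighborhood not in (4, 8):
        raise ValueError("neighborhood must be 4 or 8")
    offsets = NEIGHBORS_4 if neighborhood == 4 else NEIGHBORS_8
    occupied = set(points)  # fast membership test for neighbor lookups
    return [(x, y) for (x, y) in points
            if any((x + dx, y + dy) in occupied for dx, dy in offsets)]

points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 6), (9, 0)]
kept_4 = remove_isolated_points(points, neighborhood=4)
kept_8 = remove_isolated_points(points, neighborhood=8)
# kept_4 == [(1, 1), (2, 1), (1, 2)]
# kept_8 == [(1, 1), (2, 1), (1, 2), (5, 5), (6, 6)]
```

The pair (5, 5) and (6, 6) touches only diagonally, so it survives the 8-neighborhood test but is removed by the stricter 4-neighborhood test. This matches the observation that the 4-neighborhood criterion removes more noise.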
Figure 10 shows the processing results for different SNRs. When the SNR of the target is high, the distribution characteristics of the target are significant: although Type I interference is present, many aggregated points remain near the target point, and the processed scatter diagram effectively retains the connected domain formed by these points. However, as the SNR of the target decreases, the background noise affects the target more strongly, the energy distribution of the target is clearly diffused by its slight movement, and the point distribution in the target area changes. When the SNR drops to a very low level, the joint effect of the two kinds of interference obscures the original distribution characteristics of the target, and the noise and the target mix together, making them difficult to distinguish.