Article

Bio-Inspired Visual Network for Detecting Small Moving Targets in Low-Light Dynamic Complex Environments Based on Target Gradient Temporal Features

1 School of Food Science and Engineering, South China University of Technology, Guangzhou 510006, China
2 Postdoctoral Research Workstation of Mltor Numerical Control Technology Limited Company, Zhongshan 528400, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 9207; https://doi.org/10.3390/app15169207
Submission received: 25 July 2025 / Revised: 16 August 2025 / Accepted: 17 August 2025 / Published: 21 August 2025

Abstract

Monitoring and tracking small moving objects in cluttered environments remain a major challenge for artificial-intelligence-based motion vision systems. This difficulty stems not only from the limited features presented by small objects themselves but also from the numerous false features present in complex dynamic environments. Drawing inspiration from the efficient small target motion detection mechanisms in insects' brains, researchers have developed various visual networks for detecting tiny moving objects within complex natural environments. Although these networks perform well in detecting small-object motion by leveraging motion information, their ability to distinguish true targets from background noise remains severely limited under low-light conditions, where the contrast of small targets drops sharply and they are more easily overwhelmed by false motion in the background. To resolve this limitation, this research proposes a new visual neural network. The network discriminates between small moving targets and false background targets in low-light environments by leveraging the targets' motion information together with the differences in the response gradients between real moving targets and false background objects. The designed network is composed of two main components: a motion perception module and a response gradient analysis module. The motion perception module acquires the motion and position information of small targets, while the response gradient analysis module extracts the response gradients of tiny objects and background objects and integrates them with the motion information, thereby effectively distinguishing small targets from false background objects. The experimental results demonstrate that the proposed model can effectively distinguish small targets and suppress background false alarms in low-light environments. Under a fixed false alarm rate, our model achieved a detection rate of 0.8. In addition, the proposed method recorded an average precision of 0.1 and an average F1-score of 0.1888, whereas the highest average precision achieved by the other methods was only 0.0075 and the highest F1-score was 0.0151. These results indicate that our method substantially outperforms previous approaches in both average precision and F1-score and collectively validate the effectiveness and competitiveness of the proposed model in small target detection tasks under low-light conditions.

1. Introduction

Target tracking is a fundamental technology within the computer vision domain that aims to accurately identify and continuously track moving objects in their environments. This technology not only facilitates the perception of dynamic environments and a deeper understanding of scene changes but also provides essential data support and decision-making foundations for diverse high-level vision applications. For example, in intelligent surveillance systems [1], it offers strong public safety support through efficient early-warning mechanisms and post-event analyses. In the rapidly evolving field of autonomous driving [2], it serves as a critical component of environmental perception, enabling vehicles to accurately detect and avoid moving obstacles, thereby significantly enhancing driving safety. This technology also holds substantial value in domains such as natural disaster early warning and military reconnaissance.
Although recent years have seen significant progress in target motion detection in terms of the detection accuracy, processing efficiency, and adaptability to diverse environments—showing broad application prospects in various real-world scenarios—the detection of small targets in low-light, complex dynamic backgrounds still faces severe challenges. These difficulties mainly stem from the fact that when the target is extremely small or located at a far distance, it usually occupies only a few pixels—or even a single pixel—within an image. This results in extremely limited visual information (see Figure 1), lacking stable contours, a clear color distribution, or distinctive textures as recognizable physical features. Consequently, the traditional feature extraction methods, including background elimination [3,4], frame differencing [5], and optical flow estimation [6], become ineffective, and deep learning models struggle to capture sufficiently rich discriminative semantic and spatial information [7,8,9]. Moreover, in low-light complex real-world scenarios, small targets often exhibit low contrast, making them easily overwhelmed by dynamically changing background noise and false targets (such as swaying branches, flowers, and debris). These background distractions often share similar visual characteristics with the targets, significantly interfering with the detection process and greatly increasing the risk of false alarms and misclassification. Given the above challenges, there is an urgent need to develop more robust visual perception networks capable of accurately extracting fine-grained target features and effectively suppressing various types of interference under low-light and complex dynamic background conditions.
Research in biological sciences has shown that some insects, including dragonflies, are able to precisely track small flying targets within complex natural scenarios by relying solely on limited visual cues, achieving a prey capture success rate of over 97% [10]. This remarkable ability is attributed to a specialized category of neurons within their brains named Small Target Motion Detectors (STMDs) which exhibit high sensitivity to the motion of small-sized targets [11,12]. STMD neurons generate strong responses to small targets occupying approximately 1 to 3 degrees of the visual angle while showing little or no response to large objects or wide-field stimuli exceeding 10 degrees of the visual angle [13]. More importantly, research has revealed that even in complex and dynamically changing backgrounds, STMD neurons retain their ability to respond selectively to small moving targets. These findings suggest that visual systems in insects—particularly their precise mechanism for small target tracking—may offer valuable biological inspiration and theoretical grounding for the development of robust visual neural networks specifically designed for small target detection.
Over the past few decades, with a growing understanding of the electrophysiological mechanisms of STMD neurons, scholars have introduced a range of biologically inspired visual networks aiming to detect tiny moving targets in natural surroundings. For instance, Wiederman et al. originally established the ESTMD network [14] to emulate the size-selective characteristics of STMD neurons. This method identifies small moving targets by correlating the brightness variations in each pixel across consecutive frames. Subsequently, the DSTMD model [15], along with two cascaded models [16,17], incorporated information from paired spatial positions to enable recognition of the direction of motion. However, these models still exhibit certain limitations when applied to tracking small-scale targets in highly cluttered dynamic surroundings. To resolve these issues, Wang et al. and Ling et al. leveraged feedback mechanisms in combination with the neural perception mechanisms of insect small target motion to propose the Feedback STMD model [18], the Spatiotemporal Feedback STMD (ST-STMD) model [19], and the F-STMD model [20]. These models effectively enhance the detection accuracy by either suppressing background responses similar to those of real small targets or amplifying the responses to the motion of the actual target. Moreover, Xu et al. designed a fractional-order-inspired visual network founded on fractional derivative concepts [21], capable of detecting small moving targets in low-frame-rate scenarios. To achieve a better performance in identifying low-contrast tiny objects, Wang et al. enhanced the classic STMD framework by incorporating both an attention mechanism and a prediction module, resulting in the attention-based apg-STMD model [22]. This model dynamically adjusts the output of the current frame based on the detection results from the previous frame, thereby improving the system's sensitivity to potential tiny moving targets. Furthermore, Chen et al. provided a theoretical proof demonstrating that the computational STMD model can accurately convey the dynamic features of translational visual motion, maintaining consistency between the input motion stimuli and the model's output [23]. Billah et al. incorporated tiny object recognition techniques into control theory, establishing the corresponding theoretical constructs [24]. Meanwhile, Uzair et al. extended the STMD-inspired visual framework to the infrared imaging domain, achieving precise detection of small and distant targets [25,26,27].
Most of the aforementioned STMD-related models rely on motion information captured by large monopolar cells (LMCs) [28,29] and have successfully reproduced the response patterns of STMD neurons to small moving targets. However, since LMCs primarily sense luminance changes, they struggle to effectively distinguish tiny targets from background noise under low-light conditions, resulting in detection outputs containing numerous background-induced false positives. To address this challenge, it is imperative to introduce additional discriminative features that can separate real targets from spurious responses in dynamic, complex, and low-light environments. One highly promising approach is to exploit the temporal gradient differences between the responses of true targets and those of background false positives. Gradients characterize the rate of change in the responses, serving as a key metric for capturing spatial and spatiotemporal variations. For moving targets, temporal gradient variations are typically more pronounced than those for background-induced false features. Numerous studies have demonstrated that gradient-based features not only enable effective motion detection but also significantly improve the ability to identify and suppress background-triggered false alarms, thereby enhancing both the accuracy and robustness of small target detection.
This paper presents a biologically inspired visual neural system, termed RT-STMD, developed specifically for detecting tiny moving targets under low-light, complex, dynamic background conditions. The proposed system consists of two core modules: a motion perception module, which extracts the motion cues of tiny targets to accurately detect movement and localize the target, and a response gradient analysis module, which analyzes the gradient features of the output responses and integrates them with the motion information. By leveraging the temporal differences in the response gradients between true targets and background-induced false features, the system effectively distinguishes true targets from distractors. Extensive numerical experiments demonstrate that the RT-STMD network significantly enhances the separation of tiny targets from false background features. Compared with existing models, RT-STMD shows a superior detection accuracy and robustness under challenging conditions such as low illumination and complex dynamic scenes.
The structure of this paper is as follows. Section 2 provides a comprehensive review of related work. Section 3 elaborates on the proposed visual neural network in detail. Section 4 validates the effectiveness of the proposed network through extensive experiments. Section 5 highlights the model’s performance advantages in detecting small moving targets under low-light conditions via comparative studies. Finally, Section 6 concludes this paper.

2. Prior Work

This section summarizes prior research on networks inspired by motion perception neurons in insect vision, briefly reviews studies employing gradient features for object detection, and surveys approaches to small target detection in infrared images.

2.1. Networks Inspired by Motion Perception Neurons in Insect Vision

Physiological studies have revealed that insects in nature are capable of accurately perceiving diverse motion cues within their visual fields, including collision, wide-field, and tiny object motion stimuli. The visual processing system in an insect achieves these functions through three key neuron types specialized for motion detection: lobula giant movement detectors (LGMDs) [30,31], lobula plate tangential cells (LPTCs) [32,33], and STMD neurons [34]. Evidence indicates that LGMDs are highly sensitive to approaching stimuli, whereas their response to receding stimuli is weak. This functionality has inspired the design of several neural visual networks for collision detection [35,36,37]. Within insect vision research, it has been found that LPTC neurons respond to wide-field motion when objects cover a substantial part of the visual field. Drawing inspiration from LPTC neuron functions, researchers have introduced models such as EMD [38], TQD [39], and the weighted-quadrant detector [40] to identify broad-field motion. These visual systems are highly effective for detecting collisions and wide-area motion. However, they are unable to detect or distinguish small objects due to their inability to discern object size.
STMD neurons are a specialized type of neuron that can precisely detect motion stimuli from small objects. Leveraging the response properties of these neurons to tiny moving targets, researchers have developed various STMD-inspired visual systems, including ESTMD [14], DSTMD [15], Feedback STMD [18], F-STMD [20], ST-STMD [19], and fractional-order STMD [21], to identify small moving objects in complex environments. While these models demonstrate a strong performance in dynamic scenes, they continue to encounter difficulties in differentiating small targets from background noise under low-light conditions.

2.2. Gradient Features for Object Detection

A real-world visual scene is highly intricate, and multiple motion-related cues can facilitate the detection of movement, among which gradient information is particularly significant. A gradient represents the rate of change in an image's intensity with respect to the spatial coordinates and primarily describes how an image varies in a specific direction. It is widely utilized in computer vision applications, including target identification. For example, Hao et al. [41] proposed an infrared small target detection algorithm based on multi-directional gradient filtering and adaptive size estimation. By estimating and iteratively utilizing the optimal target size, this method effectively suppresses false alarms caused by bright clutter and achieves high-precision small target detection in complex scenes. Zhang et al. [42] introduced the Gradient-Correlation Filtering and Contrast Measurement (GCF-CM) approach, which leverages a synergistic enhancement–suppression mechanism based on infrared gradient vector fields, gradient-correlation filtering, and contrast measurement to achieve robust detection of weak infrared small targets against challenging backgrounds such as dense clouds and dynamic clutter. Li et al. [43] introduced a target detection algorithm based on a Gradient-Intensity Saliency Metric (GISM), which jointly leverages the gradient and intensity domains to suppress complex background clutter, enabling efficient and robust recognition of dim infrared small objects.
Moreover, previous studies have consistently demonstrated that incorporating gradient information can substantially enhance the system performance, with applications spanning edge detection [44], feature extraction [45], and image enhancement [46]. However, within the domain of STMD (Small Target Motion Detector) vision systems, few efforts have explored the integration of motion cues and gradient features for the precise detection of small target movements. This highlights a potential research gap and an opportunity for innovation in this field.

2.3. Small Object Detection in Infrared Images

Detecting small infrared targets primarily involves recognizing heat-emitting objects such as missiles and bombs. Numerous approaches have emerged over the past decade to confront this challenge. For example, Zhang et al. [47] introduced an improved IPT framework for infrared small target detection, employing a hybrid constraint that integrates a tensor nuclear norm with a weighted $L_1$ norm to effectively reduce background interference and enhance the detection accuracy. Nie et al. [48] presented a novel MLHM framework for infrared small target discrimination, achieving highly competitive detection outcomes. Zhang et al. [49] proposed an approach to improving infrared small target detection by jointly leveraging intensity and gradient information. Despite their ability to identify small target motion effectively, these methods depend strongly on temperature differences between the targets and backgrounds. Furthermore, they require clear backgrounds, which are seldom present in natural settings.

3. The Model Framework

Figure 2a illustrates the model's processing flowchart, and Figure 2b shows its structure. Our visual system is built around two core modules: the motion perception module and the response gradient analysis module (see Figure 2b). The motion perception module leverages the neural pathways of the STMD vision system to sense luminance changes induced by target motion, thereby detecting both the movement and position of a tiny object. Figure 3 illustrates the operating principle of the STMD vision system. Through its ommatidium structure, the STMD network receives external luminance inputs, which first undergo preliminary denoising to minimize the effect of environmental noise. The LMC neurons then compute the temporal variations in luminance, an essential step for motion detection since it captures brightness fluctuations caused by a moving target. The resulting signals are processed in parallel by Tm3 and Tm1 neurons. Finally, the STMD neurons integrate these outputs to produce the final response, which represents the detection of a small moving target, typically manifested as a strong activation signal indicating the target's presence and location.
The response gradient analysis module further refines this detection by exploiting the differences in the temporal gradient patterns between true targets and background false positives. Specifically, it extracts gradient information from the response outputs of the motion perception module and calculates the neural response gradients based on the detected motion data. The system then records the motion trajectory and its corresponding gradient trajectory. At each time step, it computes the coefficient of variation (CV) in the gradient along each trajectory (see Figure 4). This CV serves as a key discriminative feature: by applying a threshold to the gradient variability, the network reliably separates genuine tiny moving targets from false alarms.
The theoretical foundation of this approach stems from the observation that genuine small targets exhibit pronounced temporal fluctuations in their response gradients, whereas background-induced false features—typically static or slowly varying—show minimal gradient changes over time. Building on this insight, we propose a novel method that employs the temporal coefficient of variation in the response gradients as a reliable cue for distinguishing tiny moving targets in complex environments.

3.1. The Motion Perception Module

This subsection provides a detailed mathematical formulation of the motion information perception module.

3.1.1. Ommatidia

As depicted in Figure 2, the motion perception module relies on ommatidia [50,51], which act as sensory receptors to acquire brightness information from the surrounding environment. They are also equipped with Gaussian filters to apply Gaussian blurring to the incoming information. Mathematically, the luminance captured by the ommatidial neurons can be expressed as $I(x, y, t) \in \mathbb{R}$, $(x, y, t) \in \mathbb{R}^3$. The output responses of the ommatidia, denoted by $O_{OMM}(x, y, t)$, are given by

$$O_{OMM}(x, y, t) = \iint I(u, v, t)\, G_{\sigma_1}(x - u, y - v)\, du\, dv,$$

where $G_{\sigma_1}(x, y)$ stands for a Gaussian kernel, defined as

$$G_{\sigma_1}(x, y) = \frac{1}{2\pi\sigma_1^2} \exp\!\left( -\frac{x^2 + y^2}{2\sigma_1^2} \right),$$

where $\sigma_1$ is the standard deviation of the Gaussian function.
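To make this stage concrete, the following is a minimal sketch of the ommatidium layer in Python with NumPy/SciPy; the function name and the value of sigma1 are illustrative assumptions rather than details taken from Table 1.

```python
# A minimal sketch of the ommatidium stage: each luminance frame I(x, y, t)
# is Gaussian-blurred with standard deviation sigma_1 to suppress noise.
import numpy as np
from scipy.ndimage import gaussian_filter

def ommatidia_response(frame: np.ndarray, sigma1: float = 1.0) -> np.ndarray:
    """Compute O_OMM(x, y, t) for one frame by Gaussian smoothing."""
    return gaussian_filter(frame.astype(np.float64), sigma=sigma1)
```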

3.1.2. Large Monopolar Cells

Large monopolar cells (LMCs) are positioned downstream of the ommatidia and exhibit a strong response to variations in brightness [39,40]. As a target moves across a pixel, the corresponding brightness fluctuates over time t. In the motion perception module, the LMCs operate as temporal band-pass filters, filtering ommatidial signals to detect brightness fluctuations over time t. Specifically,
$$L_{LMC}(x, y, t) = \int O_{OMM}(x, y, s)\, H_{BP}(t - s)\, ds.$$

Here, $L_{LMC}(x, y, t)$ denotes the response of the LMC, with $H_{BP}(t)$ functioning as the temporal kernel, defined by

$$H_{BP}(t) = \Gamma_{n_1, \tau_1}(t) - \Gamma_{n_2, \tau_2}(t),$$

$$\Gamma_{n, \tau}(t) = \begin{cases} \dfrac{(nt)^{n}\, e^{-nt/\tau}}{(n-1)!\, \tau^{n+1}}, & t \geq 0, \\[4pt] 0, & t < 0, \end{cases}$$

where $n$ and $\tau$ are the order and time constant of the Gamma kernel [52].
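As a sketch of how this band-pass stage can be discretized, the snippet below samples the two Gamma kernels on a frame grid and applies a causal temporal convolution; the orders, time constants, and kernel length are placeholder assumptions, not the tuned values in Table 1.

```python
from math import factorial

import numpy as np

def gamma_kernel(n: int, tau: float, length: int) -> np.ndarray:
    """Discretized Gamma kernel Gamma_{n,tau}(t) at t = 0, 1, ..., length-1 frames."""
    t = np.arange(length, dtype=np.float64)
    return (n * t) ** n * np.exp(-n * t / tau) / (factorial(n - 1) * tau ** (n + 1))

def lmc_response(omm_seq: np.ndarray, n1=2, tau1=3.0, n2=6, tau2=9.0, length=30) -> np.ndarray:
    """Band-pass filter the ommatidium sequence (shape (T, H, W)) along time
    with H_BP = Gamma_{n1,tau1} - Gamma_{n2,tau2}, as a causal convolution."""
    h_bp = gamma_kernel(n1, tau1, length) - gamma_kernel(n2, tau2, length)
    T = omm_seq.shape[0]
    out = np.zeros_like(omm_seq, dtype=np.float64)
    for k in range(min(len(h_bp), T)):
        out[k:] += h_bp[k] * omm_seq[: T - k]  # out[t] += h_bp[k] * omm_seq[t - k]
    return out
```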

3.1.3. Tm3 and Tm1

Tm3 and Tm1 neurons constitute the downstream components of the LMC neurons (see Figure 2) [53], with the Tm3 neurons reacting vigorously to brightness increases and the Tm1 neurons being more responsive to brightness decreases. As part of the motion perception module, Tm3 and Tm1 function as half-wave rectifiers that segregate and process the LMC output in parallel. Let $O_{Tm3}^{ON}(x, y, t)$ denote the response of the Tm3 neuron, which is derived by extracting the excitatory part of the LMC response $L_{LMC}(x, y, t)$. Specifically,

$$O_{Tm3}^{ON}(x, y, t) = \left[ L_{LMC}(x, y, t) \right]^{+}.$$

Furthermore, research indicates that Tm1 responds with a temporal delay compared to Tm3 at identical coordinates $(x, y)$ (see Figure 3). Accordingly, Tm1's output $O_{Tm1}^{OFF}(x, y, t)$ is expressed as

$$O_{Tm1}^{OFF}(x, y, t) = \int \left[ L_{LMC}(x, y, s) \right]^{-}\, \Gamma_{n_3, \tau_3}(t - s)\, ds.$$

Here, $[u]^{+} = \frac{|u| + u}{2}$ and $[u]^{-} = \frac{|u| - u}{2}$.
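A compact sketch of this rectification-and-delay step follows, reusing gamma_kernel from the previous snippet; the delay-kernel parameters n3 and tau3 are again illustrative assumptions.

```python
import numpy as np

def half_wave(u: np.ndarray):
    """Return ([u]^+, [u]^-) = ((|u|+u)/2, (|u|-u)/2), applied elementwise."""
    return np.maximum(u, 0.0), np.maximum(-u, 0.0)

def tm3_tm1_responses(lmc_seq: np.ndarray, n3=1, tau3=5.0, length=30):
    """Tm3 keeps the positive (ON) part of the LMC output; Tm1 keeps the
    negative (OFF) part and delays it by a temporal Gamma_{n3,tau3} convolution."""
    tm3, off = half_wave(lmc_seq)
    h = gamma_kernel(n3, tau3, length)
    T = lmc_seq.shape[0]
    tm1 = np.zeros_like(lmc_seq, dtype=np.float64)
    for k in range(min(len(h), T)):
        tm1[k:] += h[k] * off[: T - k]
    return tm3, tm1
```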

3.1.4. STMDs

In the proposed visual network, STMD neurons, specialized for motion detection, utilize the signals from Tm3 and Tm1 to localize moving targets [14,15]. The STMD response is obtained by combining $O_{Tm3}^{ON}(x, y, t)$ and $O_{Tm1}^{OFF}(x, y, t)$ multiplicatively, producing a strong activation at $(x, y)$. Specifically,

$$S_{STMD}(x, y, t) = O_{Tm3}^{ON}(x, y, t) \times O_{Tm1}^{OFF}(x, y, t).$$

Moreover, to filter out large-object responses, the STMD response $S_{STMD}(x, y, t)$ undergoes lateral inhibition (see Figure 2). Specifically,

$$D_{STMD}(x, y, t) = \left[ \iint S_{STMD}(u, v, t)\, L_{IN}(x - u, y - v)\, du\, dv \right]^{+},$$

where $L_{IN}(x, y)$ corresponds to the inhibition kernel, formulated as

$$L_{IN}(x, y) = \lambda \left[ r(x, y) \right]^{+} + \eta \left[ r(x, y) \right]^{-},$$

and the function $r(x, y)$ is formulated by offsetting the linear combination of two Gaussian functions with a constant, defined as

$$r(x, y) = G_{\sigma_2}(x, y) - \epsilon\, G_{\sigma_3}(x, y) - \omega,$$

where $\lambda$, $\eta$, $\epsilon$, and $\omega$ are predefined constants.
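The following sketch assembles the STMD correlation and the lateral-inhibition convolution for one frame. The kernel size and the constants lambda, eta, eps, and omega are placeholders (Table 1 holds the actual values); we assume a negative eta so that the kernel's surround is inhibitory.

```python
import numpy as np
from scipy.ndimage import convolve

def inhibition_kernel(size=15, sigma2=1.5, sigma3=3.0, eps=1.0, omega=0.01,
                      lam=1.0, eta=-1.0) -> np.ndarray:
    """Build L_IN = lam*[r]^+ + eta*[r]^- with r = G_sigma2 - eps*G_sigma3 - omega.
    All constants here are placeholder assumptions; a negative eta makes the
    surround suppress wide-field (large-object) responses."""
    ax = np.arange(size, dtype=np.float64) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    gauss = lambda s: np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)
    r = gauss(sigma2) - eps * gauss(sigma3) - omega
    return lam * np.maximum(r, 0.0) + eta * np.maximum(-r, 0.0)

def stmd_output(tm3: np.ndarray, tm1: np.ndarray, kern: np.ndarray) -> np.ndarray:
    """D_STMD for one frame: multiplicative Tm3/Tm1 correlation, lateral
    inhibition by 2-D convolution, then half-wave rectification."""
    s = tm3 * tm1
    return np.maximum(convolve(s, kern, mode="constant"), 0.0)
```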

3.2. The Response Gradient Analysis Module

The response gradient analysis module comprises two parallel processing pathways. The first pathway is dedicated to determining the target’s spatial position and estimating its trajectory by examining the motion detection module’s output over successive frames. The second pathway is dedicated to extracting the response gradient information from the same motion module output. These two streams of information are then integrated to compute the coefficient of variation in the response gradient over time. By comparing this temporal variability against a predefined threshold, the module can accurately detect tiny target motion while effectively filtering out false positives caused by background interference.

3.2.1. Recording the Motion Trajectory

In the proposed visual neural network, the motion detection module's outputs are utilized to identify the positions of small features and to record their motion trajectories, as illustrated in Figure 5. By setting a detection threshold $\varsigma$, the module determines the position of a small feature through comparison with its outputs. For a specified threshold $\varsigma$ and time $t_0$, when the output $D_{STMD}(x_0, y_0, t_0)$ at location $(x_0, y_0)$ exceeds $\varsigma$, a small feature is identified at $(x_0, y_0)$. Similarly, at the subsequent time point $t_1$, another location $(x_1, y_1)$ can be identified and recorded. If $(x_1, y_1)$ at $t_1$ lies within a small vicinity of $(x_0, y_0)$ at $t_0$, the two positions are considered part of the same trajectory. By iteratively applying this process over time, a complete motion trajectory $T_{RC}$ can be constructed. Formally, the motion trajectory $T_{RC}$ is described as

$$T_{RC} = \left\{ (x(t), y(t)) \mid t \in [t_0, t_i] \right\},$$

where the pair $(x(t), y(t))$ specifies the spatial position at time $t$, with $t_0$ and $t_i$ corresponding to the initial and current times.
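A sketch of this thresholding-and-linking procedure under a simple greedy nearest-neighbour assumption; the linking radius is our own illustrative stand-in for the "small vicinity" mentioned above.

```python
import numpy as np

def detect_small_features(d_stmd_frame: np.ndarray, varsigma: float):
    """Positions (x, y) where the STMD output exceeds the detection threshold."""
    ys, xs = np.nonzero(d_stmd_frame > varsigma)
    return list(zip(xs.tolist(), ys.tolist()))

def link_trajectories(trajs, detections, t, radius=5.0):
    """Greedy nearest-neighbour linking: each detection extends the trajectory
    whose last point lies within `radius` pixels; otherwise it starts a new one.
    Each trajectory is a list of (t, x, y) tuples."""
    for x, y in detections:
        best, best_d = None, radius
        for tr in trajs:
            _, px, py = tr[-1]
            d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = tr, d
        if best is not None:
            best.append((t, x, y))
        else:
            trajs.append([(t, x, y)])
    return trajs
```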

3.2.2. Extracting Gradient Information and Gradient Trajectories

In the proposed visual model, we utilize the response output of the motion module to compute the spatial gradient. Specifically,

$$\nabla D_{STMD}(x, y, t) = \left( \frac{\partial D_{STMD}(x, y, t)}{\partial x},\ \frac{\partial D_{STMD}(x, y, t)}{\partial y} \right),$$

where $\frac{\partial D_{STMD}(x, y, t)}{\partial x}$ and $\frac{\partial D_{STMD}(x, y, t)}{\partial y}$ denote the partial derivatives of the response image along the $x$ and $y$ directions, respectively, at time $t$.

In the subsequent experiments, the two spatial gradient components are obtained by convolving the response output with two Sobel operators. The gradient information is then integrated by computing the gradient magnitude, i.e., by combining the components in the $x$ and $y$ directions:

$$G(x, y, t) = \sqrt{ \left( \frac{\partial D_{STMD}(x, y, t)}{\partial x} \right)^{2} + \left( \frac{\partial D_{STMD}(x, y, t)}{\partial y} \right)^{2} },$$

where $G(x, y, t)$ represents the Euclidean norm, i.e., the length, of the gradient vector. For each motion trajectory, a magnitude trace can be derived by sampling the gradient magnitude at the trajectory positions. The magnitude motion trajectory $M_{RC}$ is given by

$$M_{RC} = \left\{ G(x(t), y(t), t) \mid t \in [t_0, t_i] \right\}.$$
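A short sketch of the gradient-magnitude computation with SciPy's Sobel operators; sampling it along a linked trajectory yields the magnitude trace $M_{RC}$. The frame-indexed lookup assumes the trajectory times are frame indices.

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_magnitude(d_stmd_frame: np.ndarray) -> np.ndarray:
    """G(x, y, t) for one frame: Sobel derivatives along x and y combined
    into the Euclidean norm of the gradient vector."""
    gx = sobel(d_stmd_frame, axis=1)  # derivative along x (columns)
    gy = sobel(d_stmd_frame, axis=0)  # derivative along y (rows)
    return np.sqrt(gx**2 + gy**2)

def magnitude_trace(traj, grad_frames) -> np.ndarray:
    """Sample the gradient magnitude along one trajectory of (t, x, y) points,
    giving the magnitude motion trajectory M_RC."""
    return np.array([grad_frames[t][y, x] for t, x, y in traj])
```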
After extracting the magnitude motion trajectory $M_{RC}$, we can calculate the coefficient of variation $CV$ of the magnitude for each movement trace over time $t$. The coefficient of variation measures the degree of variation in the magnitude over time and is defined as the ratio of the standard deviation to the mean. Therefore, for each motion trajectory, we first calculate the mean and standard deviation of the magnitude samples recorded up to time point $t$, as follows.

First, we calculate the mean magnitude $\mu$:

$$\mu(t) = \frac{1}{n} \sum_{i=1}^{n} M_{RC}(t_i),$$

where $M_{RC}(t_i)$ is the magnitude of the trajectory at the $i$-th recorded time point and $n$ is the number of time points recorded along the trajectory up to time $t$. Next, we calculate the standard deviation $\sigma$ of the magnitude:

$$\sigma(t) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( M_{RC}(t_i) - \mu(t) \right)^{2} }.$$

Finally, we calculate the coefficient of variation $CV$:

$$CV(t) = \frac{\sigma(t)}{\mu(t)}.$$

In this way, we obtain the coefficient of variation $CV$ for each motion trajectory at each time point $t$, which measures the degree of variation in the magnitude over time. To differentiate small targets, we establish a threshold $\upsilon$ and evaluate the coefficient of variation at each time point $t$: when the $CV$ of a trajectory exceeds $\upsilon$, a small moving object is recognized at that trajectory's current position $(x(t), y(t))$.
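A sketch of the per-trajectory CV test, reusing magnitude_trace from the previous snippet; the threshold value of 9 follows the sensitivity analysis reported in Section 4, while the helper names are our own.

```python
import numpy as np

def coefficient_of_variation(magnitudes: np.ndarray) -> float:
    """CV = sigma / mu over the magnitude samples recorded along one trajectory."""
    mu = magnitudes.mean()
    return magnitudes.std() / mu if mu > 0 else 0.0

def select_true_targets(trajs, grad_frames, upsilon=9.0):
    """Keep trajectories whose gradient-magnitude CV exceeds the threshold
    upsilon; the rest are treated as background-induced false features."""
    kept = []
    for tr in trajs:
        m = magnitude_trace(tr, grad_frames)
        if coefficient_of_variation(m) > upsilon:
            kept.append(tr)
    return kept
```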

4. Experimental Results and Analysis

We performed a series of comprehensive evaluations on simulated and real image sequence datasets to validate the proposed method. The simulated videos [54] were generated by compositing small computer-generated targets onto complex background images, vividly simulating the motion of tiny targets in cluttered environments. All simulated sequences were sampled at a high frame rate of 1000 Hz (i.e., 1000 frames per second) to capture subtle motion variations, and a video segment consisting of 800 frames was used in the experiments. The RIST dataset [55], on the other hand, consists of real-world videos captured using high-definition cameras, containing various moving targets and diverse environmental conditions. The dataset features difficult conditions such as significant occlusion, camera movement, and variations in the overall brightness under low-light settings, enabling a rigorous assessment of our method's robustness in complex real-world situations. A sample video containing 750 frames was selected for evaluation. All of the experiments were conducted on a high-performance computer with a 3.10 GHz i7 CPU and 16 GB of RAM, running Windows 10. MATLAB R2017a was used as the experimental platform, leveraging its powerful image processing and data analysis capabilities to implement the proposed approach. The RT-STMD model parameters are specified in detail in Table 1. The parameters of the motion detection channel were determined through a sensitivity analysis based on the findings in [15], ensuring the model's adaptability to various motion patterns and compliance with the STMD neurons' biological constraints on the response speed and receptive field size. The CV threshold parameter was optimized through a series of sensitivity experiments, as visualized in Figure 6. The findings indicate that as the CV threshold increases, the model's detection performance first improves and then slightly declines. When the threshold is set too low, it fails to effectively suppress false positive signals from the background, leading to an increase in false detections. Conversely, if the threshold is too high, the model may mistakenly suppress true target signals by misclassifying them as background noise, resulting in more missed detections. The experiments ultimately determined that a CV threshold of nine yields the best detection performance, confirming that at this setting the model achieves its peak precision and robustness. Consequently, this threshold was employed across all subsequent simulations to ensure the reliability and uniformity of the findings.

4.1. The Effectiveness of the Motion Perception Module

To provide a clear understanding of RT-STMD's operation, we adopted Vision Egg-generated synthetic sequences to visualize its information flow. As illustrated in Figure 7a, a representative frame from the synthetic sequences captures a situation in which a tiny target moves against a complex low-light background, with respective speeds of 250 and 150 pixels per second. The directions of motion are represented by the vectors $V_T$ (target) and $V_B$ (background), respectively. Man-made background features are interpreted as false objects and share the background's motion pattern. Figure 7b visually illustrates the trajectory of the target. To clearly present the model's response output, we fix the time at $t_0 = 600$ ms and extract the neural responses at the vertical positions $y_0 = 69$, 195, 237, and 319 pixels and the artificial feature position $y_0 = 167$ pixels. Figure 8 shows the received luminance signal $I(x, y_0, t_0)$ and the corresponding ommatidium response $P(x, y_0, t_0)$. In detail, Figure 8a,c show the input brightness and ommatidium outputs for the column containing the true target, while Figure 8b,d illustrate these outputs for a column containing a false object. It is evident that the incoming signals are heavily contaminated by background noise, making it difficult to distinguish the real target from background interference. As shown in Figure 8c,d, the ommatidium neurons apply Gaussian denoising, resulting in smoother output signals compared to the raw input. Figure 9a,b further show the outputs of the LMCs, which are obtained by computing the temporal changes in the ommatidium responses. Positive LMC signals indicate an increase in luminance at time $t_0$, while negative signals indicate a decrease. However, since these figures only reflect the signal variations at $t_0$, they fail to highlight target-specific features and are insufficient for identifying target motion or suppressing false responses.
The responses generated by the Tm3 and Tm1 neurons are depicted in Figure 10. These results are obtained by splitting the LMC outputs into their positive and negative parts. In particular, the Tm3 neuron output represents the positive part of the LMC response, mainly indicating rises in luminance. Conversely, the Tm1 neuron encodes the delayed negative portion of the LMC signal, signifying decreases in luminance. Figure 11 shows the STMD output fusing the Tm3 and Tm1 neuron signals in the motion perception module. As shown in the figures, both the real tiny object (located at $x = 195$) and the false objects embedded in the surroundings elicit reactions from the STMD neurons. This highlights a critical limitation: relying solely on the intrinsic processing mechanisms of the STMD neurons is insufficient to effectively distinguish true tiny objects from complex background noise. Thus, enhancing tiny target detection requires incorporating additional discriminative features that enable precise identification and separation of the target in complex, low-light, cluttered environments, thereby improving both the accuracy and robustness.

4.2. The Working Mechanisms of the Response Gradient Analysis Module

In this study, we designed a response gradient analysis module based on the STMD neural network. This module is intended to extract gradient information from the response signals of tiny targets and integrate it with the original response output, thereby improving the system’s ability to distinguish between true moving targets and false features embedded into the background. To validate the effectiveness of this module, we first elaborate on its working principles and then evaluate the performance differences before and after its integration through an experimental analysis.
Figure 12 illustrates the gradient response information for tiny targets and false features. It can be noted that both the positions of the tiny targets and those of the background false alarms display pronounced gradient signals. Therefore, relying solely on gradient response information is insufficient to effectively suppress background false positives. Figure 13a,b illustrate the temporal variations in the gradient responses for tiny targets and false features. It can be observed that the gradient fluctuation curve for the tiny target exhibits a distinct oscillatory pattern over time, whereas the gradient signal of the false features remains relatively stable, with minimal variation. This difference indicates that by analyzing the temporal variability in the gradient information, we can effectively distinguish moving tiny targets from background-induced false features. Such an approach can significantly enhance the accuracy and robustness of target detection.
Hence, the distinct characteristics of the gradient variation make it possible to distinguish true tiny targets from background artifacts. The key to this separation lies in the fact that tiny targets and false background features exhibit markedly different dynamic characteristics in their gradient fluctuation curves. To quantify this intrinsic difference, we introduce the coefficient of variation as a statistical metric. The CV standardizes the degree of fluctuation in the gradient variation curve, allowing for an effective description of the differences between tiny targets and false background features. This enables accurate identification of small moving targets even under heavy background interference. Figure 13c,d present the temporal variation curves of the CV for the tiny target and the false target. We can see that the CV of the tiny target gradually increases over time and reaches relatively high values. In contrast, the CV of the background false target remains small, with negligible variation. This difference suggests that by selecting an appropriate CV threshold, false positive signals from the background can be effectively suppressed, thereby improving the accuracy of target detection. Figure 14 depicts the motion trajectories of small targets (blue) and false positives (red) obtained using various CV threshold values $\upsilon$ within our detection framework. It can be clearly observed that at low thresholds, a large number of false trajectories are present in the recorded results. Nevertheless, as the threshold parameter rises, the quantity of false trajectories steadily declines, highlighting the threshold's key role in managing the false alarm rate. A higher threshold enables a more effective isolation of true target trajectories, thereby reducing the probability of false alarms. This demonstrates that our model can effectively suppress false positives from the background, thus improving the detection accuracy. Figure 15 displays the motion traces identified by different models, namely the ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, ST-STMD, and RT-STMD methods. It can be observed that the trajectories generated by the ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, and ST-STMD models contain a large number of false features, making it difficult to accurately identify the tiny targets. In contrast, our proposed model, after applying the coefficient of variation (CV) threshold, effectively suppresses background false positives. As a result, the trajectories consist mainly of the true paths of the tiny targets, with only a few residual false positives from the background. This significantly reduces false detections and enhances both the accuracy and robustness of target detection.
Figure 16a,b illustrate the response outputs after threshold processing. As clearly shown, only the responses corresponding to the true tiny target locations are successfully retained, while the responses caused by false features at other locations are effectively suppressed. These results confirm the capability of the proposed method to precisely distinguish tiny targets from background noise, significantly boosting the detection performance and robustness. In addition, we present the final detection results for several comparison models, as shown in Figure 17. It is apparent that the other models generate outputs with excessive false positives, thereby impairing effective target–background discrimination. In contrast, the proposed model significantly suppresses false positive responses and retains only the responses corresponding to true tiny targets. These findings further substantiate the capability of the proposed approach to accurately and robustly detect tiny targets under low-light complex background conditions.

5. The Comparative Analysis

The advantages of the RT-STMD network are showcased in this section through comparative performance evaluations. A quantitative evaluation of the detection performance is carried out by comparing seven STMD-based frameworks: RT-STMD, ESTMD [14], DSTMD [15], Feedback STMD [18], F-STMD [20], ST-STMD [19], and Frac-STMD [21]. We employ the ROC curve, plotted as $D_A$ versus $F_A$, as the evaluation metric, with $D_A$ and $F_A$ defined as follows:

$$D_A = \frac{\text{number of true detections}}{\text{number of total targets}}, \qquad F_A = \frac{\text{number of false detections}}{\text{number of images}}.$$
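For reference, the following sketch computes one $(D_A, F_A)$ point from per-frame detections and ground truth; the pixel tolerance used to match a detection to a target is our own assumption, not a protocol detail from the paper.

```python
def roc_point(dets_per_frame, gts_per_frame, tol=3.0):
    """Compute (D_A, F_A) for one threshold setting. A detection counts as
    true when it falls within `tol` pixels of a ground-truth target position."""
    true_det = false_det = total_targets = 0
    for dets, gts in zip(dets_per_frame, gts_per_frame):
        total_targets += len(gts)
        for x, y in dets:
            if any((x - gx) ** 2 + (y - gy) ** 2 <= tol ** 2 for gx, gy in gts):
                true_det += 1
            else:
                false_det += 1
    d_a = true_det / total_targets if total_targets else 0.0
    f_a = false_det / len(dets_per_frame)
    return d_a, f_a
```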
We first compared our model with the existing benchmark models, including ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, and ST-STMD, considering various simulation factors such as the target speed, size, and luminance; the background motion speed and direction; and the background image. The characteristics of the simulated frame sequences are listed in Table 2. The ROC curves presented in Figure 18a demonstrate the detection capabilities of the different comparative models for identifying small objects in simulated image sequences. The RT-STMD model consistently outperforms the other models by maintaining elevated detection rates at various false alarm levels. This suggests that our proposed model successfully suppresses background false positives, thereby greatly improving the detection accuracy. Figure 18b–f show the ROC curves for the comparative networks evaluated under different simulation settings, keeping the false alarm rate ($F_A$) fixed at 5. In particular, Figure 18b displays the ROC curves for the detection performance at various object sizes in the simulated images. It can be observed that most models prefer target sizes between 4 × 4 and 8 × 8 pixels. However, thanks to the addition of the response gradient analysis module, our model's detection performance significantly surpasses that of the other six models. This demonstrates that incorporating the response gradient analysis module markedly enhances the detection capability, yielding superior results across various target sizes. Figure 18c illustrates the detection performance curves for the different models across varying target brightness levels. It is evident that as the target's brightness increases from 0 to 0.1, the ROC curves for all of the models exhibit a downward trend. Notably, across this brightness range, the RT-STMD model consistently maintains the best performance at all target brightness levels. This indicates that our model remains highly robust when detecting low-contrast targets in low-light conditions, further validating the crucial role of the response gradient analysis module in enhancing both the robustness and detection accuracy in such environments. Figure 18d illustrates the detection curves for the different models under varying target velocities. As shown in Figure 18d, across the entire range of 0–300 pixels/s, the ROC curve of the RT-STMD model consistently remains above those of ESTMD [14], DSTMD [15], Feedback STMD [18], F-STMD [20], ST-STMD [19], and Frac-STMD [21]. This indicates that the RT-STMD model achieves a higher detection rate ($D_A$) while significantly reducing the false alarm rate. These findings suggest that the introduced response gradient analysis module effectively captures velocity-sensitive motion features, thereby suppressing dynamic background interference and markedly enhancing the model's robustness and accuracy across different motion speeds. The detection results for all of the compared models under changing background velocities and movement directions are shown in Figure 18e,f. These results show that regardless of changes in the background speed or target motion direction, the ROC curve for the RT-STMD model consistently remains above those of ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, and ST-STMD, achieving higher detection rates while significantly lowering false alarms.
These findings further confirm that the incorporated response gradient analysis module can adaptively capture the differential features between background and target motion, thereby enhancing the model's robustness and accuracy in complex dynamic background scenarios.
The ROC curves in Figure 19 demonstrate the detection capabilities of the comparison models for real small objects across multiple low-light and complicated motion settings. Figure 19a–c,g–i depict sample frames of small target motion in complex scenes, while Table 3 shows that all of the videos have an average brightness below 0.5, reflecting low-light conditions. The motion orientations of the background and the real target are represented by the arrows $V_B$ and $V_T$. In Figure 19d–f,j–l, the ROC curves depict the performance of the seven models in small target detection under dynamic, complex background conditions. The experimental data reveal that our visual neural network exceeds the performance of the six comparison models in all assessed backgrounds. Figure 20 presents the performance of the various comparison models in multi-target detection tasks under different complex and dynamic backgrounds. The results demonstrate that even when detecting two or more moving targets simultaneously, our model can fully exploit the temporal characteristics of the response gradient differences between true targets and background false positives, effectively suppress background interference, and maintain a significant performance advantage. This further validates the robustness and applicability of the proposed method in multi-target detection scenarios. In addition, the detection performance of the RT-STMD model on real-world datasets was examined. Figure 21 illustrates representative frames and comparative results across the seven models. Figure 21a–c present typical frames from three distinct real datasets, with Figure 21d–f showing the ROC performance curves of the comparison models in various practical environments. The findings indicate that the RT-STMD model achieves a better detection performance than ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, and ST-STMD while keeping the false alarm rate unchanged.
Table 3, Table 4, Table 5 and Table 6 present a comprehensive quantitative comparison of the various models, including GISM, GCF-CM, DSTMD, ESTMD, F-STMD, Feedback STMD, Frac-STMD, ST-STMD, and RT-STMD, in terms of the detection rate, precision, F1-score, and runtime. When the false alarm rate ($F_A$) is set to 5, our proposed RT-STMD achieves an average detection rate of 0.80, significantly outperforming all of the other models: GISM (0.04), GCF-CM (0.10), DSTMD (0.39), ESTMD (0.53), F-STMD (0.55), Feedback STMD (0.43), Frac-STMD (0.29), and ST-STMD (0.51), all of which remain below 0.60. In terms of the precision and F1-score, RT-STMD attains an average precision of 0.1 and an F1-score of 0.1888, markedly higher than those of all of the other models (all precision values below 0.008 and all F1-scores below 0.1), demonstrating superior accuracy in identifying small moving targets while effectively suppressing background false alarms and maintaining a balanced trade-off between precision and recall. Regarding the runtime, our model averages 88.8 ms/frame, outperforming GISM (145.8 ms/frame), GCF-CM (585.5 ms/frame), and ST-STMD (234.1 ms/frame) while being slower than DSTMD (86.8 ms/frame), ESTMD (21.5 ms/frame), F-STMD (25.9 ms/frame), Feedback STMD (33.3 ms/frame), and Frac-STMD (20.2 ms/frame). Although it is slower than some models, its high accuracy and robustness in complex backgrounds make it highly promising for practical applications. Overall, these results highlight the exceptional robustness, stability, and detection performance of the RT-STMD visual network under low-light and complex dynamic backgrounds, as well as across different target types, confirming its significant advantages for practical small target detection applications.

6. Conclusions

This paper presents a visual network designed to distinguish tiny moving targets from background false alarms in low-light, dynamic, and complex environments by exploiting differences in their temporal response gradient patterns. The network comprises two key modules: a motion perception module and a response gradient analysis module. The motion perception module identifies target positions and trajectories, while the response gradient analysis module obtains the gradient features of the motion responses through the motion perception pathway. The network then derives gradient trajectories aligned with the target's movement and computes their coefficients of variation. Utilizing the differences in the gradient characteristics between true targets and background artifacts, along with a thresholding strategy, the network successfully reduces false alarms and accurately detects small moving objects. To assess the effectiveness of the proposed approach, comprehensive experiments were performed on both simulated and real-world datasets. The results clearly demonstrate that compared to existing approaches, this visual network significantly reduces false alarms and improves the detection accuracy. Moreover, the method attains an average detection rate of 0.8, substantially outperforming the other state-of-the-art techniques, thereby confirming its strong robustness and precision in detecting small moving targets in complex scenarios.

Author Contributions

Methodology: J.L.; validation: J.L.; investigation: H.M.; data curation: J.L. and H.M.; writing—original draft preparation: J.L.; writing—review and editing: J.L., H.M. and D.G.; visualization: J.L.; supervision: H.M. and D.G.; project administration: H.M. and D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Prof. Deming Gong was employed by the Miltor Numerical Control Technology Limited Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Escobar-Alvarez, H.D.; Ohradzansky, M.; Keshavan, J.; Ranganathan, B.N.; Humbert, J.S. Obstacle avoidance and path planning methods for autonomous navigation of mobile robot. Sensors 2024, 24, 3573. [Google Scholar] [CrossRef]
  2. Mahaur, B.; Mishra, K.K. Small-object detection based on YOLOv5 in autonomous driving systems. Pattern Recognit. Lett. 2023, 168, 115–122. [Google Scholar] [CrossRef]
  3. Teja, Y.D. Static object detection for video surveillance. Multimed Tools Appl. 2023, 82, 21627–21639. [Google Scholar] [CrossRef]
  4. Kalsotra, R.; Arora, S. Background subtraction for moving object detection: Explorations of recent developments and challenges. Vis. Comput. 2022, 38, 4151–4178. [Google Scholar] [CrossRef]
  5. Xiao, Y.; Yuan, Q.; Jiang, K.; Jin, X.; He, J.; Zhang, L.; Lin, C.W. Local-global temporal difference learning for satellite video super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2789–2802. [Google Scholar] [CrossRef]
  6. Sun, S.; Mo, B.; Xu, J.; Li, D.; Zhao, J.; Han, S. Multi-YOLOv8: An infrared moving small object detection model based on YOLOv8 for air vehicle. Neurocomputing 2024, 588, 127685. [Google Scholar] [CrossRef]
  7. Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 2023, 527, 204–232. [Google Scholar] [CrossRef]
  8. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef]
  9. Zhou, Y. A YOLO-NL object detector for real-time detection. Multimed Tools Appl. 2024, 238, 122256. [Google Scholar] [CrossRef]
  10. Mischiati, M.; Lin, H.T.; Herold, P.; Imler, E.; Olberg, R.; Leonardo, A. Internal models direct dragonfly interception steering. Nature 2015, 517, 333–338. [Google Scholar] [CrossRef]
  11. Barnett, P.D.; Nordström, K.; O’carroll, D.C. Retinotopic organization of small-field-target-detecting neurons in the insect visual system. Curr. Biol. 2007, 17, 569–578. [Google Scholar] [CrossRef]
  12. Keleş, M.F.; Frye, M.A. Object-detecting neurons in Drosophila. Curr. Biol. 2017, 27, 680–687. [Google Scholar] [CrossRef]
  13. Nordström, K.; Barnett, P.D.; O’Carroll, D.C. Insect detection of small targets moving in visual clutter. PLoS Biol. 2006, 4, e54. [Google Scholar] [CrossRef]
  14. Wiederman, S.D.; Shoemaker, P.A.; O’Carroll, D.C. A model for the detection of moving targets in visual clutter inspired by insect physiology. PLoS ONE 2008, 3, e2784. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, H.; Peng, J.; Yue, S. A directionally selective small target motion detecting visual neural network in cluttered backgrounds. IEEE Trans. Cybern. 2018, 50, 1541–1555. [Google Scholar] [CrossRef] [PubMed]
  16. Wiederman, S.D.; O’Carroll, D.C. Biologically inspired feature detection using cascaded correlations of off and on channels. J. Artif. Intell. Soft. 2013, 3, 5–14. [Google Scholar] [CrossRef]
  17. Wiederman, S.D.; O’Carroll, D.C. Biomimetic target detection: Modeling 2 nd order correlation of off and on channels. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence for Multimedia, Signal and Vision Processing (CIMSIVP), Singapore, 16–19 April 2013; pp. 16–21. [Google Scholar]
  18. Wang, H.; Wang, H.; Zhao, J.; Hu, C.; Peng, J.; Yue, S. A time-delay feedback neural network for discriminating small, fast-moving targets in complex dynamic environments. IEEE Trans. Neural Netw Learn. Syst. 2021, 34, 316–330. [Google Scholar] [CrossRef]
  19. Wang, H.; Zhong, Z.; Lei, F.; Peng, J.; Yue, S. Bio-inspired small target motion detection with spatio-temporal feedback in natural scenes. IEEE Trans. Image Process. 2024, 33, 451–465. [Google Scholar] [CrossRef]
  20. Ling, J.; Wang, H.; Xu, M.; Chen, H.; Li, H.; Peng, J. Mathematical study of neural feedback roles in small target motion detection. Front. Neurorobot. 2022, 16, 984430. [Google Scholar] [CrossRef]
  21. Xu, M.; Wang, H.; Chen, H.; Li, H.; Peng, J. A fractional-order visual neural model for small target motion detection. Neurocomputing 2023, 550, 126459. [Google Scholar] [CrossRef]
  22. Wang, H.; Zhao, J.; Wang, H.; Hu, C.; Peng, J.; Yue, S. Attention and prediction-guided motion detection for low-contrast small moving targets. IEEE Trans. Cybern. 2022, 53, 6340–6352. [Google Scholar] [CrossRef]
  23. Chen, H.; Fan, B.; Li, H.; Peng, J. Rigid propagation of visual motion in the insect’s neural system. Neural Netw. 2025, 181, 106874. [Google Scholar] [CrossRef]
  24. Billah, M.A.; Faruque, I.A. Modeling Small-Target Motion Detector Neurons as Switched Systems with Dwell Time Constraints. In Proceedings of the 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 3192–3197. [Google Scholar]
  25. Uzair, M.; Finn, A.; Brinkworth, R.S. Efficient Sampling of Bayer Pattern for Long Range Small Target Detection in Color Images. In Proceedings of the 2023 38th International Conference on Image and Vision Computing New Zealand (IVCNZ), Palmerston North, New Zealand, 29–30 November 2023; pp. 1–5. [Google Scholar]
  26. Uzair, M.; Brinkworth, R.S.; Finn, A. Detecting small size and minimal thermal signature targets in infrared imagery using biologically inspired vision. Neuron 2021, 21, 1812. [Google Scholar] [CrossRef]
  27. Uzair, M.; Brinkworth, R.S.A.; Finn, A. A bio-inspired spatiotemporal contrast operator for small and low-heat-signature target detection in infrared imagery. Neuron 2021, 33, 7311–7324. [Google Scholar] [CrossRef]
  28. Stöckl, A.L.; O’Carroll, D.C.; Warrant, E.J. Hawkmoth lamina monopolar cells act as dynamic spatial filters to optimize vision at different light levels. Sci. Adv. 2020, 6, eaaz8645. [Google Scholar] [CrossRef]
  29. Clark, D.A.; Demb, J.B. Parallel computations in insect and mammalian visual motion processing. Curr. Biol. 2016, 26, R1062–R1072. [Google Scholar] [CrossRef]
  30. Sztarker, J.; Rind, F.C. A look into the cockpit of the developing locust: Looming detectors and predator avoidance. Dev. Neurobiol. 2014, 74, 1078–1095. [Google Scholar] [CrossRef] [PubMed]
  31. Rind, F.C. Recent advances in insect vision in a 3D world: Looming stimuli and escape behaviour. Curr. Opin. Insect Sci. 2024, 63, 101180. [Google Scholar] [CrossRef] [PubMed]
32. Scheffer, L.K.; Xu, C.S.; Januszewski, M.; Lu, Z.; Takemura, S.Y.; Hayworth, K.J.; Huang, G.B.; Shinomiya, K.; Maitin-Shepard, J.; Berg, S.; et al. A connectome and analysis of the adult Drosophila central brain. eLife 2020, 9, e57443. [Google Scholar] [CrossRef] [PubMed]
  33. Maisak, M.S.; Haag, J.; Ammer, G.; Serbe, E.; Meier, M.; Leonhardt, A.; Schilling, T.; Bahl, A.; Rubin, G.M.; Nern, A.; et al. A directional tuning map of Drosophila elementary motion detectors. Nature 2013, 500, 212–216. [Google Scholar] [CrossRef]
  34. Hussaini, M.M.; Evans, B.J.; O’Carroll, D.C.; Wiederman, S.D. Temperature modulates the tuning properties of small target motion detector neurons in the dragonfly visual system. Curr. Biol. 2024, 34, 4332–4337. [Google Scholar] [CrossRef]
  35. Zheng, Y.; Wang, Y.; Wu, G.; Li, H.; Peng, J. Enhancing LGMD-based model for collision prediction via binocular structure. Front. Neurosci. 2023, 17, 1247227. [Google Scholar] [CrossRef]
  36. Chang, Z.; Fu, Q.; Chen, H.; Li, H.; Peng, J. A look into feedback neural computation upon collision selectivity. Neural Netw. 2023, 166, 22–37. [Google Scholar] [CrossRef] [PubMed]
  37. Chang, Z.; Chen, H.; Hua, M.; Fu, Q.; Peng, J. A bio-inspired visual collision detection network integrated with dynamic temporal variance feedback regulated by scalable functional countering jitter streaming. Neural Netw. 2025, 182, 106882. [Google Scholar] [CrossRef] [PubMed]
38. Hassenstein, B.; Reichardt, W. Systemtheoretische Analyse der Zeit-, Reihenfolgen- und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Z. Naturforsch. B 1956, 11, 513–524. [Google Scholar] [CrossRef]
  39. Clark, D.A.; Bursztyn, L.; Horowitz, M.A.; Schnitzer, M.J.; Clandinin, T.R. Defining the computational structure of the motion detector in Drosophila. Neuron 2011, 70, 1165–1177. [Google Scholar] [CrossRef]
40. Warrant, E.J. Matched filtering and the ecology of vision in insects. In The Ecology of Animal Senses: Matched Filters for Economical Sensing; Springer: Cham, Switzerland, 2016; pp. 143–167. [Google Scholar]
  41. Hao, C.; Li, Z.; Zhang, Y.; Chen, W.; Zou, Y. Infrared Small Target Detection Based on Adaptive Size Estimation by Multi-directional Gradient Filter. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5007915. [Google Scholar] [CrossRef]
  42. Zhang, X.; Ru, J.; Wu, C. Infrared small target detection based on gradient correlation filtering and contrast measurement. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
43. Li, Y.; Li, Z.; Li, W.; Liu, Y. Infrared small target detection based on gradient-intensity joint saliency measure. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7687–7699. [Google Scholar] [CrossRef]
  44. Liu, L.; Liu, Z.; Hou, A.; Qian, X.; Wang, H. Adaptive edge detection of rebar thread head image based on improved Canny operator. IET Image Process. 2024, 18, 1145–1160. [Google Scholar] [CrossRef]
45. Tiwari, M.; Bhargava, A.; Chaurasia, V.; Shandilya, M.; Siddiqui, E.A.; Bhardwaj, S. Automated lung cancer detection using Prewitt gradient kernel and SVM from CT-Lung images. In Proceedings of the 2023 1st International Conference on Innovations in High Speed Communication and Signal Processing (IHCSP), Bhopal, India, 4–5 March 2023; pp. 508–513. [Google Scholar]
  46. Vijayalakshmi, D.; Nath, M.K. A novel contrast enhancement technique using gradient-based joint histogram equalization. Circuits Syst. Signal Process. 2021, 40, 3929–3967. [Google Scholar] [CrossRef]
  47. Zhang, L.; Peng, Z. Infrared small target detection based on partial sum of the tensor nuclear norm. Remote Sens. 2019, 11, 382. [Google Scholar] [CrossRef]
  48. Nie, J.; Qu, S.; Wei, Y.; Zhang, L.; Deng, L. An infrared small target detection method based on multiscale local homogeneity measure. Infrared. Phys. Technol. 2018, 90, 186–194. [Google Scholar] [CrossRef]
  49. Zhang, H.; Zhang, L.; Yuan, D.; Chen, H. Infrared small target detection based on local intensity and gradient properties. Infrared. Phys. Technol. 2018, 89, 88–96. [Google Scholar] [CrossRef]
  50. Borst, A.; Helmstaedter, M. Common circuit design in fly and mammalian motion vision. Nat. Neurosci. 2015, 18, 1067–1076. [Google Scholar] [CrossRef]
  51. Sanes, J.R.; Zipursky, S.L. Design principles of insect and vertebrate visual systems. Neuron 2010, 66, 15–36. [Google Scholar] [CrossRef]
52. De Vries, B.; Príncipe, J. A theory for neural networks with time delays. In Advances in Neural Information Processing Systems 3 (NIPS 1990), Denver, CO, USA, 26–29 November 1990; pp. 162–168. [Google Scholar]
  53. Joesch, M.; Schnell, B.; Raghu, S.V.; Reiff, D.F.; Borst, A. ON and OFF pathways in Drosophila motion vision. Nature 2010, 468, 300–304. [Google Scholar] [CrossRef]
54. Straw, A.D. Vision Egg: An open-source library for realtime visual stimulus generation. Front. Neuroinform. 2008, 2, 4. [Google Scholar] [CrossRef]
  55. RIST Data Set. 2020. Available online: https://sites.google.com/view/hongxinwang-personalsite/download/ (accessed on 6 April 2020).
Figure 1. Small targets in a low-light background.
Figure 2. (a) A flowchart of the RT-STMD model framework. (b) The architecture of the RT-STMD model comprises two main components: a motion information extraction module and a response gradient feature extraction module. The former identifies the positions of small targets by capturing their motion characteristics, while the latter computes gradient features from the response data. Ultimately, integrating motion cues with gradient features enables the system to effectively suppress background-induced false alarms under low-light conditions.
Figure 3. The operational mechanism of the STMD (Small Target Motion Detector) neural network can be divided into several stages. First, the STMD network acquires external luminance signals through its ommatidium structure. These raw optical inputs undergo an initial denoising step to minimize the influence of environmental noise on subsequent processing. Next, the LMC (large monopolar cell) neurons compute the temporal variations in the received luminance, which is critical for motion detection, as it captures the brightness fluctuations caused by the target’s movement. The extracted luminance changes are then processed in parallel by Tm3 and Tm1 neurons. Finally, the STMD neurons integrate the outputs from Tm3 and Tm1 to generate a final response. This response reflects the network’s detection of small target motion, typically manifesting as a strong activation signal that indicates the presence and position of the target.
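As a reading aid, the following Python sketch mimics the four stages just described under simplifying assumptions (it is an illustration, not the authors' released code): a Gaussian blur stands in for the ommatidia's spatial low-pass, a frame difference stands in for the LMC temporal band-pass kernel, and Tm3/Tm1 are half-wave-rectified ON/OFF channels whose correlation, with the OFF channel delayed, forms the STMD output. All function names and parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ommatidia(frame, sigma=1.0):
    # Spatial low-pass: each ommatidium blurs the raw luminance input.
    return gaussian_filter(frame.astype(float), sigma)

def lmc(prev_blur, curr_blur):
    # LMC stage: temporal change in luminance; a simple frame difference
    # stands in for the band-pass temporal kernel of the full model.
    return curr_blur - prev_blur

def stmd(lmc_now, lmc_delayed):
    # Tm3 = rectified ON signal (brightness increase); Tm1 = rectified,
    # delayed OFF signal (brightness decrease). Their product peaks where a
    # small dark object sweeps across a pixel: an OFF edge followed by an ON edge.
    tm3 = np.maximum(lmc_now, 0.0)
    tm1 = np.maximum(-lmc_delayed, 0.0)
    return tm3 * tm1

# Minimal usage over a synthetic sequence, keeping a short delay buffer
# so the OFF channel lags the ON channel by `delay` frames.
frames = [np.random.rand(64, 64) for _ in range(10)]
delay, buffer, prev = 3, [], ommatidia(frames[0])
for f in frames[1:]:
    curr = ommatidia(f)
    change = lmc(prev, curr)
    buffer.append(change)
    if len(buffer) > delay:
        response = stmd(change, buffer.pop(0))  # strong peaks mark candidate targets
    prev = curr
```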
Figure 4. The working mechanism of the response gradient extraction module is as follows: this module first derives gradient information from the output of the motion information perception module and then analyzes the temporal gradient variations between true small targets and background-induced false positives. As illustrated, the gradient of a genuine small target exhibits significant fluctuations over time, whereas that of a background false positive remains nearly constant due to its fusion with the background. Based on this distinction, the system computes the coefficient of variation (CV) in the gradient for each trajectory: true small targets produce noticeably higher CV curves, while background false positives yield markedly lower ones. Consequently, the gradient coefficient of variation serves as a reliable criterion for distinguishing real small targets from false alarms.
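To make the criterion concrete, here is a minimal sketch of the CV test, assuming the temporal gradient of the response along one candidate trajectory has already been collected as a 1-D array. The default threshold of 9 follows the sensitivity analysis in Figure 6; the exact normalization and the helper name are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def is_true_target(gradient_series, cv_threshold=9.0, eps=1e-9):
    """Classify one trajectory by the coefficient of variation (CV) of its
    response-gradient time series: genuine small targets fluctuate strongly
    (high CV), while background false positives stay nearly constant (low CV)."""
    g = np.asarray(gradient_series, dtype=float)
    cv = g.std() / (abs(g.mean()) + eps)  # std/mean; eps guards against division by zero
    return cv > cv_threshold
```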
Figure 5. The procedure for documenting the motion trajectories involves checking whether a small target detected at $(x_{t+1}, y_{t+1})$ at time $t+1$ lies within a limited neighborhood of the location $(x_t, y_t)$ at time $t$. If so, both points are assigned to a common trajectory. This step is repeated to record the trajectory over time.
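A minimal sketch of this linking rule follows, assuming detections arrive as per-frame lists of (x, y) pixel coordinates; the greedy nearest-neighbour search and the 5-pixel radius are illustrative choices rather than the paper's exact procedure.

```python
import math

def link_trajectories(detections_per_frame, radius=5.0):
    """Greedy frame-to-frame association: a detection at time t + 1 joins the
    trajectory whose latest point lies within `radius` pixels; otherwise it
    starts a new trajectory. Returns a list of point lists."""
    trajectories = []
    for points in detections_per_frame:
        for (x, y) in points:
            best, best_d = None, radius
            for traj in trajectories:
                xt, yt = traj[-1]
                d = math.hypot(x - xt, y - yt)
                if d <= best_d:
                    best, best_d = traj, d
            if best is not None:
                best.append((x, y))            # extend the nearest trajectory
            else:
                trajectories.append([(x, y)])  # open a new trajectory
    return trajectories
```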
Figure 6. The sensitivity analysis curve for the coefficient of variation (CV) threshold. As the threshold increases from low to high, the detection rate of the model first rises steadily, reaches its peak, and then gradually declines. The experimental results indicate that the model achieves the optimal detection performance when the CV threshold is set to 9.
Figure 7. (a) A dataset frame presents a case where the tiny object and the complex low-light background move in opposite directions, with velocities of 250 and 150 pixels per second, respectively. The movement directions are denoted by the vectors V T for the target and V B for the background. Man-made structures within the background are regarded as false targets and share the same motion as the background. (b) The trace of a true small target.
Figure 8. (a) The original input for the small target’s position. (b) The original input for the fake target’s position. (c) The ommatidia response for the small target’s position. (d) The ommatidia response for the fake target’s position.
Figure 9. (a) The LMC response for the small target’s position. (b) The LMC response for the fake target’s position.
Figure 10. (a) The Tm3 response for the small target’s position (x = 195) and background false positives (x = 69, 237, 319). (b) The Tm3 response for the fake target’s position (x = 167). (c) The Tm1 response for the small target’s position (x = 195) and background false positives (x = 69, 237, 319). (d) The Tm1 response for the fake target’s position (x = 167).
Figure 11. (a) The STMD response for the small target’s position (x = 195) and background false positives (x = 69, 237, 319). (b) The STMD response for the fake target’s position (x = 167).
Figure 12. (a) The gradient response at the small target’s position (x = 195) and background false positives (x = 69, 237, 319). (b) The gradient response at the fake target’s position (x = 167).
Figure 13. (a,b) The temporal variation in the gradient signals for both the tiny target and spurious objects within [0, 800] ms. (c,d) The curves of the coefficient of variation for both the tiny target and spurious objects across the interval [0, 800] ms.
Figure 14. The trajectories detected by the RT-STMD model across varying thresholds ν.
Figure 15. The movement traces detected by the existing benchmark models and the RT-STMD approach.
Figure 16. (a) The final STMD response at the true small target’s position. (b) The final STMD response at the fake target’s position.
Figure 17. Two-dimensional views of the detection results from several methods. A comparison reveals that the results from ESTMD, DSTMD, F-STMD, Feedback STMD, Frac-STMD, and ST-STMD contain numerous false positive features. In contrast, the RT-STMD detection results primarily consist of the true trajectories of the tiny targets, significantly reducing the interference from background false positives. This demonstrates the superior capability of RT-STMD to suppress false features, thereby enhancing the accuracy of tiny target detection.
Figure 18. (a) The receiver operating characteristic (ROC) curves obtained by the existing benchmark models and the RT-STMD method on the original video. (b–f) The ROC curves derived from the existing benchmark models and the RT-STMD method under different data parameters, including the object size, object brightness, object velocity, background velocity, and background motion direction.
Figure 19. (a–c,g–i) Representative frames from six simulated datasets under low-light conditions. (d–f,j–l) The ROC curves obtained by the existing benchmark models and the RT-STMD method in detecting small moving targets in the different low-light synthetic datasets.
Figure 20. (a–c) Representative multi-target frames from three simulated datasets under low-light conditions. (d–f) The ROC curves obtained by the existing benchmark models and the RT-STMD method in detecting multiple small moving targets in the different low-light synthetic datasets.
Figure 21. (a–c) Representative frames from three real datasets under low-light conditions. (d–f) The ROC curves obtained by the existing benchmark models and the RT-STMD method in detecting small moving targets in the different low-light real datasets.
Table 1. Model coefficients.

| Equation | Parameters |
| --- | --- |
| (1) | $\sigma_1 = 1$ |
| (3) | $n_1 = 2$, $\tau_1 = 3$, $n_2 = 6$, $\tau_2 = 9$ |
| (7) | $n_3 = 5$, $\tau_3 = 25$ |
| (10) | $\sigma_2 = 1.5$, $\sigma_3 = 3$ |
| (11) | $\varphi = 1$, $\psi = 3$, $e = 1$, $\rho = 0$ |
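Equations (1)–(11) referenced in Table 1 appear in the main text. Purely as an illustration of how order/time-constant pairs such as $(n_1, \tau_1)$ are commonly consumed in STMD-family models (a gamma-kernel delay filter, which we assume here; the paper's exact formulation may differ), the corresponding discrete kernel could be built as follows.

```python
import math
import numpy as np

def gamma_kernel(n, tau, length=50):
    """Discrete gamma kernel of order n and time constant tau, the delay
    filter commonly used in STMD-style models; `length` truncates the
    impulse response, and the kernel is normalized to unit sum."""
    t = np.arange(length, dtype=float)
    k = (n * t) ** n * np.exp(-n * t / tau) / (math.factorial(n - 1) * tau ** (n + 1))
    return k / (k.sum() + 1e-12)

# e.g., the first pair from the row for Eq. (3): n1 = 2, tau1 = 3
k1 = gamma_kernel(2, 3)
```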
Table 2. Settings for the simulated data.

| Video Data | Primary Video | Evaluation One | Evaluation Two | Evaluation Three | Evaluation Four | Evaluation Five |
| --- | --- | --- | --- | --- | --- | --- |
| Target scale | 5 × 5 | 1 × 1 to 11 × 11 | 5 × 5 | 5 × 5 | 5 × 5 | 5 × 5 |
| Target illumination range | 0 | 0 | 0–0.1 | 0 | 0 | 0 |
| Target motion speed range (pixels per second) | 250 | 250 | 250 | 100–350 | 250 | 250 |
| Range of background motion speeds (pixels per second) | 150 | 150 | 150 | 150 | 100–250 | 100–250 |
| Background motion direction | Rightward | Rightward | Rightward | Rightward | Rightward | Leftward |
| Background scene | Figure 7a | Figure 7a | Figure 7a | Figure 7a | Figure 7a | Figure 7a |
Table 3. Detection rate across various approaches ($F_A = 5$).

| Video | Mean Brightness | GISM | GCF-CM | DSTMD | ESTMD | F-STMD | Feedback STMD | Frac-STMD | ST-STMD | RT-STMD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary video | 0.29 | 0.01 | 0.03 | 0.41 | 0.64 | 0.64 | 0.35 | 0.19 | 0.64 | 0.96 |
| Simulated one | 0.24 | 0.01 | 0.01 | 0.17 | 0.30 | 0.38 | 0.11 | 0.03 | 0.30 | 0.91 |
| Simulated two | 0.21 | 0.03 | 0.03 | 0.46 | 0.50 | 0.50 | 0.37 | 0.32 | 0.49 | 0.71 |
| Simulated three | 0.21 | 0.02 | 0.02 | 0.52 | 0.56 | 0.56 | 0.33 | 0.17 | 0.58 | 0.92 |
| Simulated four | 0.30 | 0.04 | 0.04 | 0.46 | 0.64 | 0.68 | 0.51 | 0.33 | 0.64 | 0.92 |
| Simulated five | 0.23 | 0.03 | 0.02 | 0.45 | 0.58 | 0.58 | 0.51 | 0.25 | 0.59 | 0.73 |
| Simulated six | 0.16 | 0.02 | 0.06 | 0.18 | 0.41 | 0.47 | 0.47 | 0.09 | 0.41 | 0.74 |
| Real one | 0.48 | 0.08 | 0.13 | 0.28 | 0.50 | 0.47 | 0.45 | 0.43 | 0.43 | 0.59 |
| Real two | 0.44 | 0.06 | 0.07 | 0.45 | 0.55 | 0.57 | 0.56 | 0.50 | 0.47 | 0.65 |
| Real three | 0.43 | 0.06 | 0.62 | 0.36 | 0.62 | 0.62 | 0.62 | 0.62 | 0.59 | 0.84 |
| Mean | – | 0.04 | 0.10 | 0.39 | 0.53 | 0.55 | 0.43 | 0.29 | 0.51 | 0.80 |
Table 4. Comparative precision of various algorithms.

| Video | GISM | GCF-CM | DSTMD | ESTMD | F-STMD | Feedback STMD | Frac-STMD | ST-STMD | RT-STMD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary video | 0.0048 | 0.0072 | 0.0087 | 0.0070 | 0.0070 | 0.0067 | 0.0071 | 0.0070 | 0.3667 |
| Simulated one | 0.0047 | 0.0062 | 0.0097 | 0.0074 | 0.0074 | 0.0076 | 0.0093 | 0.0074 | 0.1094 |
| Simulated two | 0.0035 | 0.0049 | 0.0101 | 0.0081 | 0.0080 | 0.0061 | 0.0098 | 0.0084 | 0.0857 |
| Simulated three | 0.0035 | 0.0050 | 0.0090 | 0.0078 | 0.0077 | 0.0070 | 0.0093 | 0.0080 | 0.0509 |
| Simulated four | 0.0042 | 0.0056 | 0.0074 | 0.0068 | 0.0068 | 0.0061 | 0.0068 | 0.0068 | 0.0555 |
| Simulated five | 0.0040 | 0.0029 | 0.0078 | 0.0068 | 0.0066 | 0.0061 | 0.0073 | 0.0070 | 0.0316 |
| Simulated six | 0.0038 | 0.0051 | 0.0062 | 0.0058 | 0.0058 | 0.0063 | 0.0063 | 0.0058 | 0.1064 |
| Real one | 0.0044 | 0.0051 | 0.0055 | 0.0045 | 0.0043 | 0.0042 | 0.0046 | 0.0046 | 0.0264 |
| Real two | 0.0039 | 0.0050 | 0.0058 | 0.0042 | 0.0042 | 0.0041 | 0.0046 | 0.0043 | 0.0857 |
| Real three | 0.0031 | 0.0050 | 0.0048 | 0.0037 | 0.0036 | 0.0044 | 0.0042 | 0.0036 | 0.0748 |
| Mean | 0.0040 | 0.0050 | 0.0075 | 0.0062 | 0.0061 | 0.0059 | 0.0069 | 0.0063 | 0.1000 |
Table 5. Comparative F1-scores of various algorithms.

| Video | GISM | GCF-CM | DSTMD | ESTMD | F-STMD | Feedback STMD | Frac-STMD | ST-STMD | RT-STMD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary video | 0.0096 | 0.0142 | 0.0173 | 0.0139 | 0.0139 | 0.0133 | 0.0141 | 0.0139 | 0.5343 |
| Simulated one | 0.0093 | 0.0123 | 0.0191 | 0.0146 | 0.0146 | 0.0317 | 0.0184 | 0.0147 | 0.1969 |
| Simulated two | 0.0070 | 0.0096 | 0.0200 | 0.0160 | 0.0159 | 0.0121 | 0.0194 | 0.0167 | 0.1565 |
| Simulated three | 0.0070 | 0.0098 | 0.0196 | 0.0154 | 0.0153 | 0.0139 | 0.0185 | 0.0160 | 0.3071 |
| Simulated four | 0.0083 | 0.0110 | 0.0146 | 0.0136 | 0.0135 | 0.0122 | 0.0134 | 0.0135 | 0.1050 |
| Simulated five | 0.0080 | 0.0057 | 0.0154 | 0.0135 | 0.0131 | 0.0122 | 0.0145 | 0.0138 | 0.0611 |
| Simulated six | 0.0076 | 0.0100 | 0.0123 | 0.0115 | 0.0116 | 0.0126 | 0.0125 | 0.0116 | 0.1882 |
| Real one | 0.0087 | 0.0101 | 0.0110 | 0.0090 | 0.0087 | 0.0085 | 0.0091 | 0.0091 | 0.0511 |
| Real two | 0.0078 | 0.0100 | 0.0116 | 0.0084 | 0.0084 | 0.0082 | 0.0091 | 0.0086 | 0.1539 |
| Real three | 0.0063 | 0.0100 | 0.0096 | 0.0073 | 0.0072 | 0.0088 | 0.0083 | 0.0073 | 0.1342 |
| Mean | 0.0080 | 0.0103 | 0.0151 | 0.0123 | 0.0122 | 0.0134 | 0.0137 | 0.0125 | 0.1888 |
Table 6. Runtime of various algorithms (ms/frame).

| Video | GISM | GCF-CM | DSTMD | ESTMD | F-STMD | Feedback STMD | Frac-STMD | ST-STMD | RT-STMD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary video | 145.2 | 627.1 | 91.8 | 23.2 | 27.8 | 35.4 | 21.5 | 234.3 | 108.0 |
| Simulated one | 149.0 | 599.4 | 86.7 | 21.0 | 26.4 | 32.4 | 16.5 | 234.2 | 96.2 |
| Simulated two | 149.2 | 582.8 | 82.1 | 20.3 | 24.3 | 31.0 | 19.7 | 233.8 | 92.3 |
| Simulated three | 137.4 | 610.0 | 96.8 | 20.8 | 24.3 | 35.5 | 19.0 | 234.1 | 94.9 |
| Simulated four | 146.8 | 595.0 | 110.8 | 24.8 | 28.7 | 33.6 | 24.6 | 233.2 | 112.4 |
| Simulated five | 145.2 | 580.7 | 107.8 | 25.8 | 36.2 | 35.6 | 20.7 | 235.0 | 119.8 |
| Simulated six | 175.2 | 602.7 | 90.9 | 25.7 | 29.7 | 39.2 | 21.9 | 245.1 | 115.2 |
| Real one | 118.8 | 490.0 | 53.7 | 15.0 | 15.6 | 24.5 | 16.2 | 212.8 | 41.4 |
| Real two | 183.2 | 730.9 | 98.5 | 24.9 | 31.6 | 47.4 | 25.7 | 278.1 | 71.7 |
| Real three | 107.6 | 437.1 | 48.5 | 13.5 | 14.7 | 18.5 | 15.7 | 200.6 | 36.3 |
| Mean | 145.8 | 585.5 | 86.8 | 21.5 | 25.9 | 33.3 | 20.2 | 234.1 | 88.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
