Article

A Gaussian Mixture Model-Based Unsupervised Dendritic Artificial Visual System for Motion Direction Detection

Zhiyu Qiu, Yuxiao Hua, Tianqi Chen, Yuki Todo, Zheng Tang, Delai Qiu and Chunping Chu
1 Division of Electrical Engineering and Computer Science, Graduate School of Natural Science & Technology, Kanazawa University, Kakuma, Kanazawa 920-1192, Japan
2 Faculty of Electrical and Computer Engineering, Kanazawa University, Kakuma-Machi, Kanazawa 920-1192, Japan
3 Institute of AI for Industries, Chinese Academy of Sciences, 168 Tianquan Road, Nanjing 211135, China
4 Brain Science Institute, Jilin Medical University, Jilin 132000, China
* Authors to whom correspondence should be addressed.
Biomimetics 2025, 10(5), 332; https://doi.org/10.3390/biomimetics10050332
Submission received: 8 April 2025 / Revised: 9 May 2025 / Accepted: 16 May 2025 / Published: 19 May 2025

Abstract
Motion perception is a fundamental function of biological visual systems, enabling organisms to navigate dynamic environments, detect threats, and track moving objects. Inspired by the mechanisms of biological motion processing, we propose an Unsupervised Artificial Visual System for motion direction detection. Unlike traditional supervised learning approaches, our model employs unsupervised learning to classify local motion direction detection neurons and group those with similar directional preferences to form macroscopic motion direction detection neurons. The activation of these neurons is proportional to the received input, and the neuron with the highest activation determines the macroscopic motion direction of the object. The proposed system consists of two layers: a local motion direction detection layer and an unsupervised global motion direction detection layer. For local motion detection, we adopt the Local Motion Detection Neuron (LMDN) model proposed in our previous work, which detects motion in eight different directions. The outputs of these neurons serve as inputs to the global motion direction detection layer, which employs a Gaussian Mixture Model (GMM) for unsupervised clustering. GMM, a probabilistic clustering method, effectively classifies local motion detection neurons according to their preferred directions, aligning with biological principles of sensory adaptation and probabilistic neural processing. Through repeated exposure to motion stimuli, our model self-organizes to detect macroscopic motion direction without the need for labeled data. Experimental results demonstrate that the GMM-based global motion detection layer successfully classifies motion direction signals, forming structured motion representations akin to biological visual systems. Furthermore, the system achieves motion direction detection accuracy comparable to previous supervised models while offering a more biologically plausible mechanism. This work highlights the potential of unsupervised learning in artificial vision and contributes to the development of adaptive motion perception models inspired by neural computation.

1. Introduction

Motion perception is a fundamental capability of biological visual systems, essential for survival and effective interaction with ever-changing environments. Over millions of years of evolution, these systems have developed highly efficient and specialized mechanisms for detecting and processing motion direction, enabling organisms to respond rapidly to potential threats, locate prey, and navigate complex surroundings [1,2,3,4,5,6]. In recent years, the study of motion vision has gained significant attention, not only due to its relevance in advancing image-processing technologies but also because it provides valuable insights into the underlying neural computations of the brain [7,8,9,10]. Understanding how biological systems perceive motion can inspire the development of more sophisticated artificial vision models and improve real-time motion detection in various applications [11,12].
Traditional machine learning approaches for computer vision often depend on supervised learning, requiring large, labeled datasets and extensive computational resources [13,14]. Although supervised learning models have achieved remarkable success across various domains, researchers have pointed out that continuous mathematical optimizations in modern deep learning models have led to a growing divergence from the mechanisms of the human brain [15,16]. Moreover, studies suggest that biological visual systems do not depend on labeled data; instead, they leverage unsupervised processes that self-organize through repeated exposure to motion stimuli [17,18,19]. This natural adaptation enables the formation of structured and highly efficient neural representations. Recent studies further support this view by demonstrating biologically plausible unsupervised learning frameworks that can capture complex visual dynamics without labeled supervision [19,20,21]. For example, Ligeralde et al. [19] showed that efficient spatial representations can emerge in neural networks solely through training on spontaneous retinal activity, aligning with early-stage biological development. Studies have also demonstrated that motion selectivity, including direction and speed, can emerge in an unsupervised manner within hierarchical spiking neural networks. For instance, Paredes-Vallés et al. [22] presented a spiking architecture where motion selectivity developed through unsupervised learning from raw stimuli captured by event-based cameras. Additionally, perceptual learning studies have shown that training on motion direction discrimination tasks can enhance sensitivity to motion through unsupervised learning processes. For example, research by Thompson et al. [23] demonstrated that participants improved their ability to discriminate motion direction after repeated exposure to motion stimuli, suggesting that the visual system can adapt to motion information without explicit supervision.
The processing of visual motion begins in the retina, where specialized retinal ganglion cells, particularly direction-selective ganglion cells (DSGCs), respond to motion stimuli and encode initial directional information [24]. These signals are then transmitted via the optic nerve to the lateral geniculate nucleus (LGN) of the thalamus, which serves as a relay station before forwarding visual input to the primary visual cortex (V1) [25]. Within V1, motion information is further processed by orientation-selective and motion-sensitive neurons, which extract essential motion features such as direction and speed. From there, the signals are sent to the middle temporal (MT) and medial superior temporal (MST) areas, where neurons become highly selective to motion direction and complex motion patterns [26,27]. These areas undergo synaptic plasticity and self-organizing processes, allowing for continuous adaptation and refinement of motion perception [28,29]. Studies on unsupervised learning in biological vision have shown that neurons can develop motion selectivity purely through exposure to motion stimuli, without requiring explicit feedback or labeled training data [12,30].
In our previous research, we constructed various motion direction-detecting models inspired by the visual system [31,32,33]. Most of these studies focused on modeling direction-selective cells in the retina, forming local motion direction-detecting models. For global motion direction detection, since most models were built on the visual systems of adult individuals, we merely assumed that the outputs of motion direction-selective neurons with the same preferred direction converge, without discussing in detail how this convergence forms. Some studies have suggested that the number of activated motion direction-selective neurons in different directions influences the final perception of an object’s overall motion direction [34]. However, this phenomenon may develop through postnatal learning, and the precise mechanism by which local motion information integrates into global motion perception remains unclear.
In this paper, inspired by these principles, we propose a novel mechanism for detecting macroscopic motion direction. We apply unsupervised learning to classify local motion direction detection neurons, grouping the outputs of neurons that detect the same direction to form macroscopic motion direction detection neurons. The output of these macroscopic neurons is positively correlated with the received input, and the neuron with the highest output ultimately represents the macroscopic motion direction of the object. Based on these principles, we propose an Unsupervised Artificial Visual System for motion direction detection. This model consists of two layers: the local motion direction-detecting neuron layer and the unsupervised global motion direction detection layer. In the local layer, we adopt the Local Motion Detection Neuron (LMDN) model proposed by Hua et al. [35], which can detect motion in eight directions as well as speed; in this study, we use it only to detect local motion directions. The modified output of the local motion direction-detecting neurons constructed in our previous work serves as the input to the global motion direction-detecting layer, which is based on the Gaussian Mixture Model (GMM). In the global layer, we use the GMM to classify local motion detection neurons according to their preferred directions. The GMM is a widely used unsupervised learning method for clustering and density estimation: it represents complex data distributions as a weighted sum of multiple Gaussian components, capturing underlying patterns in high-dimensional data [36]. This probabilistic framework aligns with biological sensory processing, where neurons respond to stimuli with varying degrees of activation, akin to the overlapping distributions of Gaussian components. In the visual system, neurons in regions such as V1 and MT exhibit selective tuning to motion direction and spatial features, forming structured representations through continuous adaptation to sensory inputs [37]. Similarly, GMMs assign data points to different Gaussian components based on probability distributions, dynamically adapting to data structures. Unlike deterministic clustering methods, GMMs accommodate uncertainty and overlapping categories, reflecting the probabilistic nature of neural processing [38]. This alignment makes the GMM a valuable tool for modeling biological motion perception and for developing artificial visual systems that adaptively cluster the outputs of similar neurons. At the end of the unsupervised learning layer, we select the neuron with the maximum output as indicating the global motion direction of the object, following the theory of Cafaro et al. [34]. We demonstrate how this approach captures key properties of biological motion perception and enables the formation of a global motion direction-detecting system through repeated exposure to motion stimuli, providing a more natural method for motion detection in artificial visual systems. The results demonstrate that the GMM-based global motion direction-detecting layer can accurately classify signals from different directions after repeated exposure to motion stimuli from those directions.

2. Materials and Methods

2.1. Dendritic Neuron Model for Motion Direction Detection

In our previous work, we constructed multiple Artificial Visual Systems (AVSs) designed to detect motion in eight different directions [31,32,33]. The structure of these systems consists of two main components: the local motion direction detection layer and the global motion direction detection layer.
Our model typically takes two images as input, representing the scene before and after the motion occurs. The input images are first processed by the local motion direction detection layer, which corresponds to the retinal stage of the visual pathway. The number of nodes in this layer corresponds to the resolution of the input image, and each node is associated with eight types of neurons, each specialized in detecting motion in a different direction. These neurons detect motion direction at the pixel level. Extensive studies have shown that some retinal ganglion cells in mammals and certain neurons in the optic lobe of flies exhibit direction selectivity, meaning that they become activated only when stimulated by motion in a specific direction [2]. In our previous models, the local motion direction detection layers were primarily based on Barlow’s model, the Hassenstein–Reichardt Correlator (HRC) model, or the dendritic neuron model [39,40,41], and previous studies have demonstrated that these neuronal models can effectively extract local motion direction from two consecutive image frames. Although these models represent different fundamental approaches to motion detection in biological visual systems, they ultimately provide two key pieces of information: the detected motion direction and whether the neuron is activated. These two pieces of information serve as the input to the global motion direction detection layer, which corresponds to parts of the visual cortex. Studies have shown that certain neurons in the visual cortex determine the overall motion direction by statistically analyzing the number of activated local motion direction-detecting neurons [34]. In previous models, neurons in the global motion direction detection layer count the activations of local motion direction neurons for each of the eight directions and determine the global motion direction from the direction with the highest activation count.
In this study, we adopt the Local Motion Detection Neuron (LMDN) model proposed by Hua et al. [35] for local motion direction detection. The LMDN simulates direction-selective ganglion cells (DSGCs) in the retina and detects motion in eight directions (e.g., upper-right, leftward); it is built on the principle of the dendritic neuron model. Our decision to adopt this model rather than the HRC or Barlow models is supported by evidence suggesting that the dendritic neuron model provides a closer approximation to the functional characteristics of real biological neurons [42,43,44]. Figure 1A shows the structure of the LMDN. It consists of photoreceptor cells (PCs), bipolar cells (BCs), horizontal cells (HCs), and ganglion cells (GCs). Each LMDN type shares a hierarchical structure inspired by the retinal layers and processes motion scenarios by scanning receptive fields with predefined spatial–temporal parameters.
Photoreceptor cells (PCs) convert light signals into electrical impulses. They are categorized into luminance-sensitive rod cells (RCs) and color-specific cone cells (L-CC, M-CC, S-CC) for RGB processing. Mathematically, their function is defined as $P_{i,j,t} = X$, where $X$ represents the luminance value at coordinates $(i, j)$ and time $t$. PCs route data to subsequent layers based on image type (grayscale or color), mimicking biological phototransduction.
Horizontal cells (HCs) detect edges by comparing luminance differences between neighboring regions. The lateral inhibition from horizontal cells enhances contour detection, which is critical for motion direction analysis. As expressed in Equation (1), an HC activates when the absolute difference between a pixel’s luminance at time $t$ and that of its neighbor at $t + \Delta t$ exceeds a threshold $L$:
$HC = \begin{cases} 1, & \left| X_{i,j,t} - X_{i+\alpha,\, j+\beta,\, t+\Delta t} \right| > L; \\ 0, & \text{otherwise}. \end{cases}$  (1)
Bipolar cells (BCs) detect temporal changes by comparing sequential frames. They relay motion-triggered signals to ganglion cells, filtering static or insignificant luminance variations. Their activation logic (Equation (2)) mirrors HCs but focuses on time intervals:
$BC = \begin{cases} 1, & \left| X_{i,j,t} - X_{i,j,t+\Delta t} \right| > L; \\ 0, & \text{otherwise}. \end{cases}$  (2)
Ganglion cells (GCs) integrate synaptic inputs using sigmoid functions (Equation (3)) to emulate excitatory and inhibitory synapses. The parameter $x_{i,j}$ denotes the external signal received by the $j$th synapse on the $i$th branch, and the constants $\omega_{i,j}$ and $\theta_{i,j}$ are parameters specific to each synapse. By adjusting these parameters, the model can emulate the characteristics and functionalities of both excitatory and inhibitory synapses: excitatory synapses satisfy $0 < \theta_{i,j} < \omega_{i,j}$, while inhibitory synapses satisfy $\omega_{i,j} < \theta_{i,j} < 0$. In this model, BCs form excitatory connections with GCs, while HCs form inhibitory connections. Finally, GCs combine the synaptic outputs of BCs and HCs via AND logic, with the HC signal inverted, as shown in Equation (4):
$S_{i,j} = \dfrac{1}{1 + e^{-k\left(\omega_{i,j} x_{i,j} - \theta_{i,j}\right)}}$  (3)
$Output = Output_{BC} \cdot \overline{Output_{HC}}$  (4)
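To make the cell-level logic concrete, the following is a minimal Python sketch of Equations (1)–(4) for a single receptive-field position. The function names, the displacement pair (alpha, beta), and the default parameter values for L and k are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def hc_activation(frame_t, frame_t1, i, j, alpha, beta, L=0.1):
    """Horizontal cell (Equation (1)): compare a pixel at time t with a
    spatially shifted neighbor at time t + dt (edge detection)."""
    return 1 if abs(frame_t[i, j] - frame_t1[i + alpha, j + beta]) > L else 0

def bc_activation(frame_t, frame_t1, i, j, L=0.1):
    """Bipolar cell (Equation (2)): detect a temporal change at the same pixel."""
    return 1 if abs(frame_t[i, j] - frame_t1[i, j]) > L else 0

def synapse(x, omega, theta, k=10.0):
    """Sigmoid synapse (Equation (3)); the signs of omega and theta select
    excitatory (0 < theta < omega) or inhibitory (omega < theta < 0) behavior."""
    return 1.0 / (1.0 + np.exp(-k * (omega * x - theta)))

def lmdn_output(bc_out, hc_out):
    """Ganglion cell (Equation (4)): BC AND (NOT HC)."""
    return bc_out * (1 - hc_out)
```

For the upward-selective LMDN of Figure 1B, (alpha, beta) would encode a one-pixel upward displacement of the receptive field.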
A previous study presented a schematic representation of eight kinds of LMDNs, each corresponding to a distinct direction of motion. Figure 1B shows the structure of the LMDN that detects upward motion: the orange area represents the central pixel before the movement, while the blue area indicates its position after the movement. An LMDN is activated only in response to motion along its preferred direction; the example in Figure 1B responds only to upward motion.

2.2. GMM-Based Unsupervised AVS

The Gaussian Mixture Model (GMM) is an unsupervised learning algorithm designed to model complex data distributions by representing them as a mixture of multiple Gaussian components [36]. This method effectively captures underlying patterns in high-dimensional input data, making it a valuable tool for clustering and classification tasks. In biological systems, similar clustering mechanisms are believed to contribute to sensory processing, particularly in the organization of receptive fields in the visual cortex.
In the visual system, neurons in the primary visual cortex (V1) exhibit topographically organized receptive fields, meaning that nearby neurons respond to similar visual stimuli, a phenomenon observed in retinotopic maps [2,26]. Likewise, Gaussian Mixture Models (GMMs) categorize neurons with analogous response properties into distinct clusters, facilitating an efficient representation of motion information. Studies have shown that clustering mechanisms akin to GMMs play a crucial role in neural coding and perceptual organization [37,38]. This parallel between biological neural organization and GMM-based clustering has led to the adoption of GMMs as computational models for motion perception and feature integration in artificial vision systems [45,46].
In this study, we combined the AVS for motion direction detection with the GMM to construct a GMM-based Unsupervised AVS for motion direction detection. This approach enables the formation of a global motion direction-detecting system through repeated exposure to motion stimuli, providing a more adaptive and biologically plausible method for motion detection in artificial visual systems. Figure 2A illustrates the overall framework of the model. The general structure of this model is similar to that of our previous work, as both consist of a local motion direction detection layer and a global motion direction detection layer. The primary difference lies in the output format of the local motion direction detection layer and the structure of the global motion direction detection layer.
The local motion direction detection layer retains the same mechanism as in our previous models, with a modified output format. The input format also remains consistent with prior models, comprising two consecutive image frames representing temporally adjacent visual stimuli. In previous studies, the identity of each local motion direction-detecting neuron was unimportant; we only needed binary values indicating whether the neurons were activated. In the GMM-based Unsupervised AVS, each local motion direction-detecting neuron instead outputs an (8, 2) matrix $I$ that encodes the neuron’s identity. The eight rows correspond to the eight motion directions: we define the direction corresponding to the first row as rightward (0°), and each subsequent row rotates the corresponding direction counterclockwise by 45°. When a neuron is activated, it outputs a matrix in which the row corresponding to the detected motion direction contains the index of the neuron’s position, while all other entries remain zero. If a neuron is not activated, it produces no output. The two components of index $I$, $I_x$ and $I_y$, are calculated using the following equations:
$I_x = a + h$  (5)
$I_y = a + w$  (6)
In these equations, to prevent $I$ from being entirely zero, we introduce a constant offset $a$; in this study, we set $a = 16$. The variables $h$ and $w$ represent the neuron’s horizontal and vertical coordinates, respectively. Since we use 32 × 32 images as input, the maximum value of both $h$ and $w$ is 31. The matrix of index $I$ is shown below:
$I = \begin{bmatrix} I_{1x} & I_{1y} \\ I_{2x} & I_{2y} \\ I_{3x} & I_{3y} \\ I_{4x} & I_{4y} \\ I_{5x} & I_{5y} \\ I_{6x} & I_{6y} \\ I_{7x} & I_{7y} \\ I_{8x} & I_{8y} \end{bmatrix}$
In this matrix, each row represents a motion direction. The first column indicates the neuron’s x-coordinate, and the second column indicates the y-coordinate. For each neuron, only one row contains nonzero values. For example, the output matrix of a neuron that detects rightward motion is as follows:
$\begin{bmatrix} I_{1x} & I_{1y} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$
Before passing I to the GMM, we normalize and flatten it into a one-dimensional vector, as shown below:
$\begin{bmatrix} I_{1x} & I_{1y} & I_{2x} & I_{2y} & I_{3x} & I_{3y} & \cdots & I_{8x} & I_{8y} \end{bmatrix}$
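This encoding is straightforward to implement. The sketch below assumes NumPy and the offset $a = 16$ given in the text; the normalization by the maximum possible index value is our own assumption, since the text does not specify the normalization scheme.

```python
import numpy as np

A = 16          # constant offset a from the text
H = W = 32      # retina resolution

def lmdn_index_vector(h, w, direction):
    """Build the (8, 2) index matrix I for an activated LMDN at (h, w) whose
    preferred direction is `direction` (0 = rightward, counterclockwise in
    45-degree steps), then normalize and flatten it into a 16-D vector."""
    I = np.zeros((8, 2))
    I[direction] = [A + h, A + w]      # Equations (5) and (6)
    return (I / (A + 31)).ravel()      # normalize by the maximum index (assumed scheme)

# Example: an LMDN at pixel (3, 5) detecting rightward motion (direction 0)
vec = lmdn_index_vector(3, 5, 0)
```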
Then, we pass these indexed data into the GMM-based global motion direction detection layer. In this layer, we apply the Gaussian Mixture Model (GMM) principle to cluster the LMDNs corresponding to each motion direction. According to the GMM mechanism, each data point is assumed to be generated from a mixture of multiple Gaussian distributions, and the probability of a data point belonging to a particular Gaussian component (its responsibility) is computed in the expectation step of the Expectation-Maximization (EM) algorithm.
The responsibility of each Gaussian component for a given input vector $I_i$ is computed using the following equation:
$R_{ik} = \dfrac{\pi_k\, \mathcal{N}(I_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(I_i \mid \mu_j, \Sigma_j)}$  (7)
where $R_{ik}$ represents the responsibility of the $k$th Gaussian component for the input vector $I_i$, $\pi_k$ is the mixture weight, $\mu_k$ is the mean, and $\Sigma_k$ is the covariance matrix of the $k$th Gaussian distribution.
During training, once the responsibilities are computed, the parameters of the Gaussian components are updated iteratively in the maximization step of the EM algorithm. The parameter updates are governed by the following equations; a compact NumPy sketch of one full EM iteration follows the list.
  • Mean Update Equation:
    The mean of each Gaussian component is updated as the responsibility-weighted sum of all input vectors, where $N_k = \sum_{i=1}^{N} R_{ik}$ represents the effective number of data points assigned to the $k$th Gaussian component. This reflects the soft clustering nature of the GMM, in which data points are not assigned to a single cluster but distributed across components with associated probabilities.
    $\mu_k^{new} = \dfrac{1}{N_k} \sum_{i=1}^{N} R_{ik} I_i$  (8)
  • Covariance Matrix Update:
    The covariance matrix of each Gaussian component is updated based on the responsibility-weighted sum of the outer products of the differences between the input vectors and the updated mean.
    $\Sigma_k^{new} = \dfrac{1}{N_k} \sum_{i=1}^{N} R_{ik} (I_i - \mu_k^{new})(I_i - \mu_k^{new})^{T}$  (9)
  • Mixing Coefficient Update:
    The mixing coefficient, which determines the proportion of data points assigned to each Gaussian component, is updated as follows, where $N$ is the total number of data points:
    $\pi_k^{new} = \dfrac{N_k}{N}$  (10)
  • Log-Likelihood Computation:
    To assess the convergence of the EM algorithm, the log-likelihood of the observed data is computed at each iteration; the algorithm iterates until the log-likelihood converges to a stable value.
    $\log L = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(I_i \mid \mu_k, \Sigma_k)$  (11)
Once GMM training is complete, we can use the model to classify the LMDNs. For each LMDN, we calculate its responsibility for every Gaussian component using Equation (7); the LMDN is then assigned to the component with the highest responsibility. Equation (12) performs this hard assignment of LMDNs to a specific Gaussian component after the GMM has been trained:
$k^{*} = \arg\max_{k} R_{ik}$  (12)
The labels assigned to the LMDNs by the GMM can be seen as the connections between the LMDNs and the global motion direction-detecting layer, meaning that each LMDN is linked to the global motion direction-detecting neuron corresponding to its label. Before the GMM is trained, these labels are randomly assigned, so the connections between the LMDNs and the global layer are random. After training, however, all LMDNs that detect motion in the same direction are connected to the same global motion direction-detecting neuron. Figure 2C illustrates the connections between one LMDN and the global motion direction-detecting neurons before and after training. The objects at the top represent an LMDN, those at the bottom represent the global motion direction detection layer, and the lines in the middle indicate the connections between the two layers. Before training, the connections are random, meaning that each LMDN has an equal probability of being assigned to any direction. After training, only the connections between neurons with the most similar preferred direction are strengthened, while the others are weakened. As a result, strong connections are preserved only among neurons tuned to the same motion direction, indicating that LMDNs detecting the same direction are grouped together.

3. Results

In the first experiment, we assume an ideal case in which, after birth, a biological system has sufficient time to observe motion in all directions without perceptual impairments. We assume that the resolution of the retina is 32 × 32, with eight LMDNs detecting different motion directions beneath each pixel, so the 32 × 32 retinal array comprises 8192 individual neurons. The training dataset therefore consists of the outputs of all LMDNs across all pixels, forming a comprehensive dataset with a shape of (8192, 8, 2). We initialized the GMM with eight components without performing training. The initial means of the components are manually set using random values rather than being inferred from the data; we generated a random number, 8595, as the seed for this study. The covariance matrices are initialized as identity matrices, ensuring an isotropic variance structure, and the component weights are uniformly distributed, assigning equal prior probability to each component. Finally, the Cholesky decomposition of the precision matrices is computed from the inverses of the covariance matrices.
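This description is consistent with manually populating an untrained scikit-learn GaussianMixture. The sketch below reproduces that setup under the assumption that scikit-learn is the underlying library (its max_iter default of 100 matches the text); the attribute names follow scikit-learn’s fitted-model convention.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

K, D = 8, 16                    # 8 components; 16-D flattened (8, 2) index vectors
rng = np.random.default_rng(8595)

gmm = GaussianMixture(n_components=K, covariance_type="full")
gmm.means_ = rng.random((K, D))                # manually set random means
gmm.covariances_ = np.stack([np.eye(D)] * K)   # identity covariance matrices
gmm.weights_ = np.full(K, 1.0 / K)             # uniform component weights
# Cholesky factors of the precision (inverse-covariance) matrices;
# for identity covariances these are themselves identity matrices.
gmm.precisions_cholesky_ = np.stack([np.eye(D)] * K)

# The untrained model can now assign (essentially random) labels via gmm.predict(...)
```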
To better visualize the classification results, we used t-SNE to reduce the dimensionality of the LMDN outputs. We chose t-SNE because it maps LMDNs detecting the same motion direction to nearby regions after dimensionality reduction, forming distinct clusters, or 'islands', for each direction. Figure 3A illustrates the results of t-SNE dimensionality reduction, with the data labeled by their original tags. The results show that LMDNs detecting different directions are grouped into separate islands, allowing us to determine the true motion direction represented by each cluster.
Next, we classified the LMDNs using the GMM. We first performed classification without training the GMM, generating initial labels. The second row of Table 1 shows the number of elements assigned to each label before training, while Figure 4A presents the t-SNE visualization with these untrained labels. The results indicate that the untrained GMM failed to correctly classify LMDNs by direction. We then trained the GMM using the full dataset with a shape of (8192, 8, 2). By default, the GMM runs for up to 100 EM iterations (the max_iter limit) or until convergence; in this experiment, training converged within two iterations. The third row of Table 1 shows the number of elements assigned to each label after training, while Figure 4B presents the t-SNE visualization with labels from the trained GMM. The results demonstrate that the elements were correctly assigned, with only one label appearing in each island. Since GMM-based classification does not inherently determine the angle associated with each label, we referenced the t-SNE visualization in Figure 3 to identify the corresponding angle for each island. This allows us to determine the directional angle for each label. Table 2 presents the mapping between labels, angles, and motion directions.
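A minimal sketch of this training-plus-visualization step is shown below, assuming scikit-learn’s GaussianMixture and TSNE; the placeholder `data` array stands in for the real (8192, 16) flattened LMDN outputs, which are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture

# Placeholder for the (8192, 16) flattened LMDN index vectors described in the text.
data = np.random.default_rng(8595).random((8192, 16))

gmm = GaussianMixture(n_components=8, max_iter=100, random_state=8595)
gmm.fit(data)                        # EM training; converged quickly in the paper's experiment
labels_after = gmm.predict(data)     # cluster labels from the trained GMM

# 2-D embedding for visualization; points with the same label should form 'islands'.
embedded = TSNE(n_components=2, random_state=8595).fit_transform(data)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels_after, cmap="tab10", s=4)
plt.show()
```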
Subsequently, we embedded the trained GMM parameters into the AVS to construct a GMM-based Unsupervised AVS. To verify whether the model was successfully trained, we randomly generated a pair of images containing a 64-pixel object moving to the right for testing. When these images were input into the GMM-based Unsupervised AVS, the local motion direction detection layer first processed them, activating several LMDNs, which then sent signals to the GMM-based global motion detection neuron layer. These signals were classified by the GMM using assigned labels. By applying t-SNE for dimensionality reduction, we obtained an activation map, as shown in Figure 5. Due to the intrinsic requirements of t-SNE, we visualized the activated LMDNs by concatenating them with the previously used full dataset. This resulted in a shift in island positions compared to earlier experiments. However, since the GMM-assigned labels remained unchanged, the direction corresponding to each label remained the same as in Figure 3. In Figure 5, we marked the newly activated neurons with black circles.
From Figure 5B, we can observe that signals corresponding to different motion directions have clustered into distinct regions. All activated neurons are located within the islands corresponding to their detected motion directions. Additionally, label 0, which represents neurons detecting 0-degree (rightward) motion, was activated the most, aligning with the motion direction of the object in the image. We counted the number of activated neurons in each region and found that these numbers match the counts of activated local motion detection neurons for the corresponding directions, as shown in Table 3. As a result, the GMM layer successfully classified LMDNs detecting different directions, allowing us to determine the global motion direction by comparing the activation counts across directions.
To further validate the feasibility of our model, we randomly selected 10% of the complete training dataset and retrained the model. The results are shown in Figure 6. Even with this smaller training set, the model still successfully separated the regions corresponding to different motion directions. The key difference from the fully trained model is that the angle associated with each label has changed, indicating that after each training session, the correspondence between labels and motion directions must be verified before the model can be used.
Finally, we conducted an accuracy experiment. We embedded the GMM trained on the full dataset into the AVS, forming the GMM-based Unsupervised AVS. In this system, when an image pair is input, certain LMDNs are activated. The signals from these activated LMDNs are then passed to the GMM layer, where they are automatically classified according to their corresponding motion directions. The model then outputs the number of activated LMDNs for each direction. The final motion direction of the object is determined based on the direction associated with the highest number of activated LMDNs.
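The decision rule reduces to counting the GMM labels among the activated LMDNs and taking the argmax. A sketch follows, where detect_global_direction is a hypothetical helper and the label-to-angle mapping is the one recovered in Table 2.

```python
import numpy as np

# Label-to-angle mapping obtained from the t-SNE check (Table 2)
LABEL_TO_ANGLE = {0: 0, 1: 225, 2: 45, 3: 315, 4: 90, 5: 180, 6: 270, 7: 135}

def detect_global_direction(gmm, activated_vectors):
    """Classify the activated LMDN outputs with the trained GMM and return
    the direction with the highest activation count, plus the per-label counts."""
    labels = gmm.predict(np.asarray(activated_vectors))
    counts = np.bincount(labels, minlength=8)
    winner = int(counts.argmax())
    return LABEL_TO_ANGLE[winner], counts
```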
We utilized Dataset-C and NoiseType1 from an earlier study by Hua et al. [35] to conduct our experiments. Examples from each of the two datasets are shown separately in Figure 7. Dataset-C is a color image dataset in which both the object and background in each sample are assigned randomly selected, uniform colors that remain constant throughout the object’s motion. NoiseType1 introduces visual noise by injecting randomly colored pixels at random spatial locations; these noise pixels remain static and do not move with the object. Table 4 presents the test results. The results show that our GMM-based Unsupervised AVS produced the same outcomes as the AVS developed by Hua et al. [35], demonstrating that a GMM-based global layer can perform the same function as the global layer in previous studies.

4. Summary

In this paper, we combined the motion direction-detecting AVS with a Gaussian Mixture Model (GMM) to construct an unsupervised motion direction-detecting model, simulating the effects of early visual experience on the development of the visual system. The model consists of two main components:
  • A local motion direction detection layer, which corresponds to the retina and retains the previously established structure and mechanisms.
  • A global motion direction detection layer, which was redesigned from a simple summation-based approach to a GMM-based unsupervised learning mechanism.
We used the GMM training process to simulate how early visual experience shapes the development of motion direction recognition in visual systems. When trained on a dataset that simulated a normal environment (8-direction stimuli), the model produced results consistent with prior summation-based models, confirming its validity in motion detection. Additionally, our model introduces a new approach for detecting macroscopic motion direction: we apply unsupervised learning to classify local motion direction-detecting neurons, clustering neurons that detect the same direction to form macroscopic motion direction detection neurons. The outputs of these macroscopic neurons are proportional to the input signals they receive, and the neuron with the highest output ultimately indicates the macroscopic motion direction of the object. This study not only proposes a new motion direction-detecting model but also introduces a theoretical framework for understanding how early visual experience influences visual system development.

5. Discussion

In this study, our primary goal was to validate the model’s conceptual soundness and biological plausibility, rather than to optimize for performance on large-scale datasets. Although the proposed LMDN-GMM framework demonstrates the potential of biologically inspired unsupervised learning for motion direction detection, we acknowledge that the current output structure of the LMDN layer—specifically, the use of an (8,2) matrix per neuron—is not necessarily the most efficient or optimal choice for representing motion features. This format was originally selected to balance biological relevance and computational simplicity; however, it may impose limitations in terms of scalability and performance.
In future work, we aim to systematically investigate alternative model architectures and data representations that enhance computational efficiency while maintaining consistency with biological principles. This includes experimenting with lower-dimensional outputs, dynamic connection strategies, and adaptive data encoding schemes. Additionally, further investigation is needed to determine the most suitable forms of unsupervised learning for capturing motion direction in a robust and scalable manner.
In conclusion, while our current implementation provides a foundational step toward biologically inspired motion processing, we recognize its limitations and are committed to refining both the model and its data structures to achieve better efficiency and accuracy.

Author Contributions

Conceptualization, Z.T. and Y.T.; methodology, C.C. and D.Q.; software, Z.Q.; validation, T.C. and Y.H.; formal analysis, Z.Q.; investigation, Z.Q., T.C. and Y.H.; resources, Y.H.; data curation, T.C.; writing—original draft preparation, Z.Q.; writing—review and editing, Y.T. and Z.T.; visualization, Z.Q.; supervision, Y.T. and Z.T.; project administration, Y.T. and Z.T.; funding acquisition, Z.Q., Y.T. and Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JST SPRING, grant number JPMJSP2135.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Barlow, H.B. Possible principles underlying the transformation of sensory messages. Sens. Commun. 1961, 1, 217–233.
  2. Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 1962, 160, 106.
  3. Land, M.F.; Nilsson, D.E. Animal Eyes; Oxford University Press: Oxford, UK, 2012.
  4. Clifford, C.W.; Ibbotson, M.R. Fundamental mechanisms of visual motion detection: Models, cells and functions. Prog. Neurobiol. 2002, 68, 409–437.
  5. Nakayama, K. Biological image motion processing: A review. Vis. Res. 1985, 25, 625–660.
  6. Fleishman, L.J. The influence of the sensory system and the environment on motion patterns in the visual displays of anoline lizards and other vertebrates. Am. Nat. 1992, 139, S36–S61.
  7. Pack, C.C.; Bensmaia, S.J. Seeing and feeling motion: Canonical computations in vision and touch. PLoS Biol. 2015, 13, e1002271.
  8. Zarei Eskikand, P.; Grayden, D.B.; Kameneva, T.; Burkitt, A.N.; Ibbotson, M.R. Understanding visual processing of motion: Completing the picture using experimentally driven computational models of MT. Rev. Neurosci. 2024, 35, 243–258.
  9. Mazzia, V.; Angarano, S.; Salvetti, F.; Angelini, F.; Chiaberge, M. Action transformer: A self-attention model for short-time pose-based human action recognition. Pattern Recognit. 2022, 124, 108487.
  10. Rideaux, R.; Welchman, A.E. Exploring and explaining properties of motion processing in biological brains using a neural network. J. Vis. 2021, 21, 11.
  11. Fu, Q.; Wang, H.; Hu, C.; Yue, S. Towards computational models and applications of insect visual systems for motion perception: A review. Artif. Life 2019, 25, 263–311.
  12. Abel, R.; Ullman, S. Biologically Inspired Learning Model for Instructed Vision. Adv. Neural Inf. Process. Syst. 2024, 37, 45315–45358.
  13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  14. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
  15. Zhou, Y.; Liu, E.; Neubig, G.; Tarr, M.; Wehbe, L. Divergences between Language Models and Human Brains. Adv. Neural Inf. Process. Syst. 2025, 37, 137999–138031.
  16. Song, Y.; Millidge, B.; Salvatori, T.; Lukasiewicz, T.; Xu, Z.; Bogacz, R. Inferring neural activity before plasticity as a foundation for learning beyond backpropagation. Nat. Neurosci. 2024, 27, 348–358.
  17. Krotov, D.; Hopfield, J.J. Unsupervised learning by competing hidden units. Proc. Natl. Acad. Sci. USA 2019, 116, 7723–7731.
  18. Chen, L.; Singh, S.; Kailath, T.; Roychowdhury, V. Brain-inspired automated visual object discovery and detection. Proc. Natl. Acad. Sci. USA 2019, 116, 96–105.
  19. Ligeralde, A.; Kuang, Y.; Yerxa, T.E.; Pitcher, M.N.; Feller, M.; Chung, S. Unsupervised learning on spontaneous retinal activity leads to efficient neural representation geometry. In Proceedings of the UniReps: The First Workshop on Unifying Representations in Neural Models, New Orleans, LA, USA, 15 December 2023.
  20. Ciampi, L.; Lagani, G.; Amato, G.; Falchi, F. Biologically-inspired Semi-supervised Semantic Segmentation for Biomedical Imaging. arXiv 2024, arXiv:2412.03192.
  21. Yun, Z.; Zhang, J.; Olshausen, B.; LeCun, Y.; Chen, Y. Urlost: Unsupervised representation learning without stationarity or topology. arXiv 2023, arXiv:2310.04496.
  22. Paredes-Vallés, F.; Scheper, K.Y.; De Croon, G.C. Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: From events to global motion perception. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2051–2064.
  23. Thompson, B.; Tjan, B.S.; Liu, Z. Perceptual learning of motion direction discrimination with suppressed and unsuppressed MT in humans: An fMRI study. PLoS ONE 2013, 8, e53458.
  24. Wei, W. Neural mechanisms of motion processing in the mammalian retina. Annu. Rev. Vis. Sci. 2018, 4, 165–192.
  25. Su, C.; Mendes-Platt, R.F.; Alonso, J.M.; Swadlow, H.A.; Bereshpolova, Y. Retinal direction of motion is reliably transmitted to visual cortex through highly selective thalamocortical connections. Curr. Biol. 2025, 35, 217–223.
  26. Hubel, D.H.; Wiesel, T.N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 1968, 195, 215–243.
  27. Orban, G.A. Higher order visual processing in macaque extrastriate cortex. Physiol. Rev. 2008, 88, 59–89.
  28. Kohn, A.; Movshon, J.A. Adaptation changes the direction tuning of macaque MT neurons. Nat. Neurosci. 2004, 7, 764–772.
  29. Sengpiel, F.; Kind, P.C. The role of activity in development of the visual system. Curr. Biol. 2002, 12, R818–R826.
  30. Bharmauria, V.; Ouelhazi, A.; Lussiez, R.; Molotchnikoff, S. Adaptation-induced plasticity in the sensory cortex. J. Neurophysiol. 2022, 128, 946–962.
  31. Yan, C.; Todo, Y.; Kobayashi, Y.; Tang, Z.; Li, B. An Artificial Visual System for Motion Direction Detection Based on the Hassenstein–Reichardt Correlator Model. Electronics 2022, 11, 1423.
  32. Tang, C.; Todo, Y.; Ji, J.; Tang, Z. A novel motion direction detection mechanism based on dendritic computation of direction-selective ganglion cells. Knowl.-Based Syst. 2022, 241, 108205.
  33. Han, M.; Todo, Y.; Tang, Z. Mechanism of Motion Direction Detection Based on Barlow’s Retina Inhibitory Scheme in Direction-Selective Ganglion Cells. Electronics 2021, 10, 1663.
  34. Cafaro, J.; Zylberberg, J.; Field, G.D. Global motion processing by populations of direction-selective retinal ganglion cells. J. Neurosci. 2020, 40, 5807–5819.
  35. Hua, Y.; Yuki, T.; Tao, S.; Tang, Z.; Cheng, T.; Qiu, Z. Bio-inspired computational model for direction and speed detection. Knowl.-Based Syst. 2024, 300, 112195.
  36. Reynolds, D.A. Gaussian mixture models. In Encyclopedia of Biometrics; Springer: Boston, MA, USA, 2009; p. 3.
  37. Simoncelli, E.P.; Heeger, D.J. A model of neuronal responses in visual area MT. Vis. Res. 1998, 38, 743–761.
  38. Orbán, G.; Berkes, P.; Fiser, J.; Lengyel, M. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 2016, 92, 530–543.
  39. Barlow, H.; Levick, W.R. The mechanism of directionally selective units in rabbit’s retina. J. Physiol. 1965, 178, 477.
  40. Hassenstein, B.; Reichardt, W. Systemtheoretische analyse der zeit-, reihenfolgen- und vorzeichenauswertung bei der bewegungsperzeption des rüsselkäfers chlorophanus. Z. Naturforschung B 1956, 11, 513–524.
  41. Zhou, T.; Gao, S.; Wang, J.; Chu, C.; Todo, Y.; Tang, Z. Financial time series prediction using a dendritic neuron model. Knowl.-Based Syst. 2016, 105, 214–224.
  42. Borst, A.; Egelhaaf, M. Principles of visual motion detection. Trends Neurosci. 1989, 12, 297–306.
  43. Joesch, M.; Plett, J.; Borst, A.; Reiff, D.F. Response properties of motion-sensitive visual interneurons in the lobula plate of Drosophila melanogaster. Curr. Biol. 2008, 18, 368–374.
  44. de Polavieja, G.G. Neuronal algorithms that detect the temporal order of events. Neural Comput. 2006, 18, 2102–2121.
  45. Zhang, F.; Han, S.; Gao, H.; Wang, T. A Gaussian mixture based hidden Markov model for motion recognition with 3D vision device. Comput. Electr. Eng. 2020, 83, 106603.
  46. Huang, H.; Ye, H.; Sun, Y.; Liu, M. Gmmloc: Structure consistent visual localization with gaussian mixture models. IEEE Robot. Autom. Lett. 2020, 5, 5043–5050.
Figure 1. (A) Basic structure of the Local Motion Detection Neuron (LMDN) [35]. The model consists of photoreceptor cells (PCs), bipolar cells (BCs), horizontal cells (HCs), and ganglion cells (GCs). (B) Structure of the LMDN that detects upward motion.
Figure 2. (A) Structure of the GMM-based Unsupervised AVS. (B) An example of the input data. The input data comprise an image of an object before moving and an image of the object after moving, with a time difference of Δt between the two images. (C) A schematic diagram of the connection state between one LMDN and global motion direction-detecting neurons before and after training. Before training, the connections between the LMDN and global motion direction-detecting neurons are random. However, after training, only the connections between the most similar neurons are strengthened, while the others are weakened.
Figure 3. (A) Results of t-SNE dimensionality reduction with the true labels. (B) The t-SNE dimensionality reduction results using the labels assigned by the trained GMM.
Figure 4. (A) The t-SNE dimensionality reduction results using the labels assigned by the GMM before training. (B) The t-SNE dimensionality reduction results using the labels assigned by the trained GMM.
Figure 5. (A) Two input images. The object moved one pixel to the right. (B) The distribution map of activated neurons after visualization with t-SNE.
Figure 6. (A) The t-SNE dimensionality reduction results using the labels assigned by the GMM trained on 10% of the full dataset. (B) The t-SNE dimensionality reduction results using the labels assigned by the GMM trained on the full dataset.
Figure 7. (A,B) Examples of Dataset-C. (C,D) Examples of NoiseType1.
Table 1. Number of LMDNs assigned to each label before (random) and after training.

| Label   | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    |
|---------|------|------|------|------|------|------|------|------|
| Random  | 0    | 0    | 788  | 0    | 0    | 4450 | 0    | 2954 |
| Trained | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
Table 2. Relations between labels and directions.

| Label     | 0         | 1          | 2           | 3           | 4      | 5        | 6        | 7          |
|-----------|-----------|------------|-------------|-------------|--------|----------|----------|------------|
| Direction | Rightward | Left-Lower | Right-Upper | Right-Lower | Upward | Leftward | Downward | Left-Upper |
| Angle     | 0°        | 225°       | 45°         | 315°        | 90°    | 180°     | 270°     | 135°       |
Table 3. Activated neurons and directions.

| Label       | 0         | 1          | 2           | 3           | 4      | 5        | 6        | 7          |
|-------------|-----------|------------|-------------|-------------|--------|----------|----------|------------|
| Direction   | Rightward | Left-Lower | Right-Upper | Right-Lower | Upward | Leftward | Downward | Left-Upper |
| Activations | 56        | 11         | 5           | 7           | 7      | 11       | 16       | 17         |
Table 4. Result of accuracy test (detection accuracy by object size, in pixels, and noise level).

| Size \ Noise | 0%      | 1%      | 5%      | 10%     |
|--------------|---------|---------|---------|---------|
| 1            | 97.50%  | 97.63%  | 94.75%  | 92.25%  |
| 2            | 98.75%  | 97.88%  | 97.00%  | 93.63%  |
| 4            | 99.13%  | 99.75%  | 98.50%  | 97.38%  |
| 8            | 100%    | 100%    | 100%    | 99.38%  |
| 16           | 100%    | 100%    | 100%    | 100%    |
| 32           | 100%    | 100%    | 100%    | 100%    |
| 64           | 100%    | 100%    | 100%    | 100%    |
| 128          | 100%    | 100%    | 100%    | 100%    |
| 256          | 100%    | 100%    | 100%    | 100%    |
| 512          | 100%    | 100%    | 100%    | 100%    |
