Review

Exploring Kalman Filtering Applications for Enhancing Artificial Neural Network Learning

Departamento de Innovacion Basada en la Informacion y el Conocimiento, CUCEI, Universidad de Guadalajara, Blvd. Marcelino Garcia Barragan 1421, Col. Olimpica, Guadalajara 44430, Jalisco, Mexico
Algorithms 2025, 18(9), 587; https://doi.org/10.3390/a18090587
Submission received: 6 August 2025 / Revised: 9 September 2025 / Accepted: 11 September 2025 / Published: 17 September 2025
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Abstract

The Kalman filter is a widely used estimation algorithm with numerous applications, including parameter estimation, classification, prediction, pattern recognition, tuning, and filtering. Recently, it has gained attention in artificial intelligence and machine learning as a mathematical framework for the learning process. As a methodology designed for stochastic environments, the Kalman filter effectively manages noise and unstructured data with incomplete information while preventing premature stagnation, enabling faster learning and reducing the need for extensive pre-processing. These characteristics make it ideal for training artificial neural networks and other machine learning techniques. Given its significance, this paper presents a review of Kalman filter applications for artificial neural network learning.

1. Introduction

Working with physical signals, regardless of their context and application, requires the use of underlying techniques to ensure they are properly characterized before processing. This enables the extraction of useful information despite the presence of noise [1], incomplete information [2], stochastic environments [3], control constraints [4], and undesirable components [5]. Over time, different disciplines have made significant contributions to this problem, leading to notable advancements in signal characterization techniques, the scope of problems addressed, and their applications.
The history of filtering has evolved continuously from basic electrical circuits to sophisticated digital and statistical methods. The development of filtering approaches has been primarily driven by the demands of communication systems, control theory, and signal and data processing. From pre-20th-century analog filters to the latest methodologies based on artificial intelligence and machine learning, filtering has become a fundamental tool in engineering, robotics, telecommunications, and data science. As technology advances, filtering will continue to evolve to address increasingly complex problems in dynamic, noisy, unstructured, nonlinear, and non-stationary environments [5].
Early references to filtering appear in works related to Fourier analysis before the 20th century. In 1809, Gauss proposed an optimal filter, the least squares method, to determine the trajectory of celestial bodies. This method demonstrated several key advancements, such as not requiring prior knowledge of the signals, making it widely applicable across many scientific fields. According to the literature, this methodology remains a fundamental part of both linear and statistical filtering, playing a crucial role in the foundations of modern filters. It is important to highlight the role of analog filters in the early stages of telegraph communication, where they were used to suppress undesirable frequencies and reduce noise in electrical circuits. Their development continued into the early 20th century, with the introduction of low-pass, high-pass, and band-pass filters, which facilitated more complex applications and paved the way for modern filtering. A major milestone occurred with the advent of digital signal processing in the first half of the 20th century, which led to the formal development of digital filtering methods. As digital filtering matured and digital signals became increasingly used in communications and control systems, researchers began developing more advanced digital filtering techniques. This progress, along with the increasing complexity of applications in areas such as electrical circuits [2], control improvement [6], sophisticated communication systems [7], aeronautics [8], military [9], naval [10], and transportation [9], required solutions that met the evolving demands of technology.
In its early stages, the development of control systems was heavily influenced by filtering techniques, initially through analog filters and later by digital filters, which became essential for implementations enabled by emerging computing technologies. However, at the end of the 19th century, Poincaré, with his seminal work on the “New Methods of Celestial Mechanics” [11], recognized the need to formulate a general theory of dynamic systems based on sets of first-order differential equations. He introduced the now fundamental concept of considering a relevant set of system variables as a trajectory of a point in an n-dimensional space. This approach quickly gained popularity and became known as the state-space method. Thus, the concept of state became dominant in the study of dynamic systems. A critical aspect of this methodology is that the current behavior of a system is influenced by its past history, meaning that the behavior of the system cannot be specified simply as an instantaneous relationship between sets of input and output variables. An additional set of variables, known as state variables, is needed to account for the history of the system, which represents the minimum amount of information necessary to summarize the entire dynamic past of the system. These state variables provide all the information needed to predict the future behavior of the system in response to any input signal. The use of the state-space methodology for control systems design emerged by the mid-20th century; since then, this concept has allowed scientists to formally describe dynamic systems in order to manipulate their behavior through an appropriate controller design. State-space control and optimal control theory marked the beginning of what is known as modern control theory, with wide applications in aerospace technology, robotics, communications, energy, manufacturing, and transportation, to name a few. The role of Rudolf Kalman in modern control is of great importance, particularly through the formalization of two key concepts: controllability and observability [12]. The first concept establishes the conditions required to manipulate system behavior according to pre-established operating conditions, while the second concept defines the conditions required to model the evolution of internal variables based on the measurement of inputs and outputs for a given time. Both conditions, along with a third concept known as stability, are essential for the implementation and proper performance of modern control systems. Their development laid the foundation for optimal control techniques and state-estimation methods, which remain fundamental in the design of modern control systems [13].
In 1949, Norbert Wiener [14] introduced the Wiener filter in the frequency domain, which successfully addresses the problem of linear optimal dynamic estimation in stationary stochastic process systems. Despite its importance, this method produces a large computational burden while also requiring that both the estimated signal and the measured signal satisfy stationary stochastic processes, limiting its generalization. In 1960, Rudolf Emil Kalman introduced the Kalman filter (KF) [12], which does not require that the measured signal and noise follow the assumption of a stationary stochastic process. The state equation describes the relationship between input and output, considering the signal process as the result of a linear system that is affected by Gaussian noise in both the input and system state, as well as in its measurement. The Kalman filter provides an optimal estimate in terms of minimum mean-square error for linear filters of non-stationary stochastic processes [3]. Kalman developed the filter for both continuous-time and discrete-time systems. These results represent a unified methodology between the stochastic treatment of signals and the concept of dynamic systems in state-space, achieving a balance between the concepts of modern control and statistical filtering. This filter solves the problem of optimal state estimation of a linear dynamic system in the presence of measurement noise. The Kalman filter is a recursive algorithm, which means that it can process new measurements without needing to store all previous data. This characteristic is inherited from the state-space model, which is the minimum amount of information necessary to describe the complete behavior of a system. As a result, the KF only needs to store the state of the system, as opposed to storing all past signal data, making it computationally efficient. The KF also updates its state estimate by comparing the predicted state with the actual measurement, improving the accuracy of the state estimation in real-time, expanding its use in non-stationary processes, and contributing to the advancements of optimal real-time estimation [5].
The relevance of Kalman's work on estimation and optimal control in state-space was fundamental for several reasons: real-time operation, recursive nature and low computational complexity, and robustness to noise and uncertainties. The KF provides an efficient solution for optimal real-time state estimation in dynamical systems. In practice, many systems are not fully observable due to sensor limitations or noisy measurements, but the KF enables accurate state estimation even under incomplete and noisy information, without requiring storage or reprocessing of historical system data. By assuming Gaussian-distributed system noise, the KF computes optimal estimates under this assumption. This makes it particularly effective in systems where the dynamics are not perfectly known and noise is present in the measurements. This is essential in applications where real-time decisions are crucial, such as robotics, aerospace, autonomous vehicles, navigation and guidance systems, biomedical engineering, energy systems, transportation, and communications, among others, especially when computational resources may be limited [5].
The development of quantum systems has expanded rapidly in recent years, and the KF has been extended accordingly. A detailed description of quantum filtering can be found in [15]. A quantum extended Kalman filter (QEKF) is presented in [16], which employs a commutative approximation and a time-varying linearization for nonlinear quantum stochastic differential equations. The quantum Kalman filter (QKF) for linear quantum systems with known parameters has been studied in [17], where an optical system containing an uncertain parameter in the laser probe is described, demonstrating better estimation compared to the classical KF. A deeper review of the linear quantum KF is developed in [18]. A bibliographic review of KF-based approaches for quantum systems has been presented in [19], where an improved method for optimal estimation is demonstrated. That work also examines a practical scenario involving magnetic field estimation in quantum systems, where nonlinear KFs could be considered as an estimation solution. More recently, ref. [20] proposed an orbit deviation propagation approach based on a deep neural network.
In summary, the KF was developed by Rudolf Kalman in 1960 as an optimal state estimator for linear dynamic systems in the presence of Gaussian noise, noisy or incomplete measurements, uncertainties, and non-stationary environments. Due to its recursive nature, the KF ensures low computational complexity, making it widely applicable and successful in numerous real-time applications. While the KF is inherently linear, this work inspired extensions for nonlinear systems, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). These extensions allow state estimation in nonlinear systems, which are common in practical control applications. Since most real-world applications involve noise, uncertainty, non-measurable signals, unreliable sensors, and time-varying conditions [21], these developments remain highly relevant. Therefore, this work contemplates a somewhat different application of the KF: its use for artificial neural network training. Although this topic is not entirely new, it has gained renewed attention in recent years due to the widespread popularity of applying artificial intelligence and machine learning techniques to complex and relevant problems.
Hence, considering all these facts, this review focuses on the KF in artificial neural network learning, emphasizing its ability to handle noisy and uncertain data efficiently. By reducing the need for complex preprocessing and speeding up learning, the Kalman filter offers a practical and powerful tool for improving machine learning methods. This review highlights its key advantages and contributions, showing how a classical estimation technique continues to play a vital role in advancing modern artificial intelligence.
This work is organized as follows: In Section 2, the concepts underlying the deduction of the linear Kalman filter and its consequent first-order analytical approximation (EKF) are presented; then, different variants of the KF are described for both linear and nonlinear systems, based on both analytical and numerical approximations, with references to several key applications in the literature. In Section 3, the training problem of neural networks is presented as an optimization problem, from which the applicability of the KF and its variants to the training of neural networks is derived, including a review of the literature corresponding to different implementations. This is exemplified in Section 4 by developing the training of three types of neural networks widely used for function approximation (multilayer perceptron neural networks, radial basis neural networks, and high-order recurrent neural networks), highlighting the elements relevant to the design of KF-based algorithms and their variants for the learning of artificial neural networks. In Section 5, the challenges, limitations, open problems, and future work related to the use of the KF and its variants in the learning of neural networks are discussed. Finally, in Section 6, the conclusions of the review are established.

2. Main Variants of Kalman Filter

In this section, concepts underlying the deduction of the linear Kalman filter and its consequent variants are presented, as well as their basis, evolution, and current applications. Since the proposal of the Kalman filter as the solution to the optimal estimation problem in both continuous-time and discrete-time linear dynamic systems, its applicability has been evident. The motivation and mathematical formulation of the KF were already discussed in the previous section.
In 1967 [22,23], one of the most widely used variants of KF was proposed: the extended Kalman filter (EKF). The motivation for this variant was to apply KF methodology in nonlinear systems, approximated by a first-order Taylor series. Since highly nonlinear systems are only approximated, the approximation error can lead to estimation errors. For this reason, EKF cannot be considered an optimal estimator, unlike the original KF. The EKF linearizes nonlinear functions around the current estimated state. Its motivation is justified because nonlinear systems are prevalent in real-world applications where nonlinearities arise due to the construction characteristics of the components in real-world systems, as well as the nature of the systems themselves [5]. Despite the estimation errors caused by linear approximation, this variant remains widely used due to its straightforward and clear mathematical formulation. Another drawback of this variant results from the computation of Jacobians due to the system linearization, which must be evaluated in each state-space point. This results in greater design complexity of the EKF compared to KF, in addition to an increase in computational complexity. These issues have led to various improvements to EKF, such as the robust distributed EKF [24], adaptive square root EKF [25], and lightweight EKF [26].
In 1997, Simon Julier and Jeffrey Uhlmann [8] proposed another variant of KF capable of handling nonlinearities more precisely. This motivation arose from the approximation errors caused by the linearization method used in the EKF. This new variant introduced the unscented transformation, leading to the unscented Kalman filter (UKF). The UKF has better performance for highly nonlinear systems and does not require the computation of Jacobians. However, it does require the selection of sample points in a probability distribution. Furthermore, its computational complexity is significantly higher, which is why it cannot be implemented in applications that require real-time processing [27]. To address these drawbacks, some modifications have been proposed; for example, ref. [28] embedded a new deterministic sampling point set into the UKF framework, and in [18], a new exponential attenuation factor was designed according to changes in noise variance.
In 2009, Arasaratnam and Haykin, ref. [29], introduced another variant of KF to address nonlinearities: the cubature Kalman filter (CKF). The motivation of the CKF lies in selecting a spherical volume of radial type and then using a set of generated cubature points based on sampling to approximately estimate the state of a nonlinear system.
All these KF variants have been developed primarily for nonlinear dynamic systems. These variants, as well as the original KF, are known as white-box models since their design is based on a system model described in state variables. However, for more complex applications, black box models are now commonly used. These models are often obtained experimentally through the acquisition of data representing the behavior of the system under certain conditions. For example, in [30], their use in data science is considered.
Other notable KF variants include the information filter (or inverse Kalman filter), which operates in the information space rather than the state space, using information matrices instead of covariance matrices; the H-infinity filter, designed for systems with unknown uncertainties; the federated Kalman filter, developed for distributed systems; and the non-Gaussian Kalman filter with adaptive noise, intended for highly nonlinear systems with varying probability distributions. Another family consists of KF variants with learning; these variants emphasize applications that require experimental modeling, that is, data-driven models, as in the work presented by [31] in a simulation data-driven context. In this case, two main groups can be differentiated: in the first group, the KF is hybridized with machine learning or artificial intelligence techniques, mainly artificial neural networks, while the second group involves neural networks in the iterative process of the KF, either for parameter estimation or for other internal processes.
Representative examples include [32] and [33], which apply neural networks for nonlinear modeling; [34], which uses neural networks for time series prediction; and [35], which uses a KF for parameter estimation and state prediction. Figure 1 shows the main variants of the KF described in this section.

3. Neural Network Learning

The advent of artificial neural networks dates back to the work of McCulloch and Pitts in 1943 [36], who formulated a mathematical model of a neural network based on observations of biological neuron behavior made by Santiago Ramón y Cajal in 1911 [37]. However, this model did not include any learning mechanism. It was not until Rosenblatt's perceptron of 1958 [38] that a learning rule for an artificial neural network was established for the first time. However, this neural network was very limited in terms of its applications. In 1961, Widrow and Hoff [39] addressed the problem of learning in artificial neural networks with supervised learning as an optimization problem. Their approach expanded the applications of neural networks, as well as the possible structures and topologies, marking the beginning of artificial neural networks and machine learning as we conceive them today [27,40,41].
Considering neural network learning as an optimization problem allows the use of different optimization and parametric estimation methods in training. From this perspective, the KF can be considered for neural network training, with the specific KF variant depending on the type of learning and the information available to the network [5]. In this sense, an excellent review of the use of the KF in neural network training can be found in Haykin's book [41]. In the present work, a review of the KF is given, followed by the presentation of various implementations of the KF in the learning process of neural networks, most of them for offline learning, as shown in Table 1 and represented graphically in Figure 2.
Nowadays, the need to perform real-time learning has increased, which highlights the use of the KF for training artificial neural networks. In addition to its recursive nature, the KF reduces the need for preprocessing by working directly with noisy, uncertain, and non-stationary signals [42]. It is important to note that the KF is considered a second-order gradient optimization algorithm, which reduces the probability of falling into local minima, as shown in [43,44,45]. Furthermore, in [43], it is proved that the Kalman filter is a global observer for linear (discrete-time) time-varying systems, and the EKF is then shown to act as a quasi-local asymptotic observer for discrete-time nonlinear systems. In [46], a backpropagation training algorithm is shown to be three orders of magnitude less computationally expensive than the EKF in terms of the number of floating-point operations. However, in [47], a decoupled extended Kalman filter is proposed in order to decrease the computational effort of the learning algorithm; in its analysis, ref. [47] demonstrates the computational superiority of KF learning algorithms over gradient descent ones. Also, in [41], different approaches are proposed to reduce the computational complexity of implementations of KF variants for training neural networks offline and online, whereas [48] demonstrated that KF algorithms converge in fewer iterations than backpropagation for the same neural network configurations. In [42], an analysis is carried out that gives insight into the convergence mechanisms, showing that, with a modification of the algorithm, global convergence results can be achieved for general cases. The scheme can then be interpreted either as maximization of the likelihood estimation or as a recursive prediction error algorithm. Additionally, ref. [49], using a Lyapunov stability approach, showed that KF-based learning achieves faster convergence than traditional algorithms. Similarly, the work performed by [43] establishes specific conditions for improved convergence of the KF as a learning algorithm.
This characteristic has also been exploited in machine learning to improve the performance of the Adam optimizer, leading to the Kadam model, where the KF is used to estimate the first and second moments required by the algorithm [50]. Adam performs first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments [51], while the EKF can be used as a second-order gradient descent algorithm to estimate optimal weights for an RNN, as explained in [44,45], helping the EKF algorithm to avoid local minima.
It is worth highlighting in Table 1 the growing interest in real-time implementations. Such applications are particularly relevant for decentralized approaches as well as for embedded implementations, edge-computing approaches, and reduced-communication problems. Table 2 compares the reported processing times of different implementations, showing the suitability of EKF-based learning algorithms for online operation with small sampling times. The implementations included in Table 2 are aligned in terms of the elements considered for each scheme, allowing a meaningful comparison.
Table 1. KF variant neural network learning applications.
Author | Type of KF | Main Contribution | Learning | Examples
[52] | EKF | Real-time learning algorithm for a multilayered neural network | Online | Numerical
[53] | DEKF | Feedforward multilayered neural networks based on an EKF | Offline | Simulation
[54] | EKF | Real-time neural controller for three-phase induction motors | Online | Experimental
[55] | EKF, UKF | Sequential growing-and-pruning learning algorithm | Offline | Simulation
[41] | KF, EKF, UKF, DEKF | Kalman filtering as applied to the learning and use of neural networks | Offline, Online | Numerical, Simulation, Experimental
[5] | EKF, DEKF | Radial basis neural networks trained with an extended Kalman filter | Online | Simulation
[49] | EKF | State-space recurrent neural networks for nonlinear system identification | Offline | Simulation
[56] | KF | Q-learning with KF for action selection in cooperative control | Offline | Simulation
[57] | KF | Continuous state-space via Q-learning for Markov decision process | Offline | Numerical
[58] | KF | Online Sequential Extreme Learning Machine and Kalman filter regression | Offline | Simulation
[59] | KF | Kalman filter with iterative learning control | Offline | Simulation
[60] | EKF | Induction motor control, combining EKF with a fuzzy logic controller | Offline | Simulation
[61] | KF | Kalman filter and temporal differencing | Offline | Simulation
[21] | EKF | Real-time neural controller for autonomous robotic navigation | Online | Experimental
[62] | KF | Kalman filter to update weights of a single-layer feedforward network | Offline | Simulation
[63] | KF | Q-learning represented in the framework of a Kalman filter model | Offline | Simulation
[64] | KF | NN-based learning modules to update a Kalman filter for estimation | Offline | Simulation
[35] | DEKF | Charge estimation for lithium-ion batteries | Online | Experimental
[65] | KF | KF learning for stochastic claims reserving | Offline | Simulation
[66] | KF | Federated Kalman filters | Offline | Simulation
[67] | KF | Q-learning approach with Kalman filter for a self-balancing robot | Offline | Simulation
[68] | EKF | State estimation algorithm combining the EKF and a Q-learning method | Offline | Simulation
[69] | KF | Extreme learning Kalman filter for NNs | Offline | Simulation
[70] | KF | Kalman filtering with a dedicated recurrent neural network | Online | Numerical
[71] | KF | KF combined with a NN to predict transaction throughput in a blockchain | Offline | Experimental
[72] | KF | Reinforcement learning adaptive KF for autoregressive signal modeling | Online | Experimental
[73] | EKF | Continuous action learning automata for tuning of a Kalman filter | Offline | Experimental
[74] | KF | KF agents operating sequentially to estimate the optimal learning rate | Offline | Simulation
[75] | KF | Kalman filter-based cycle-consistent adversarial learning framework for time series | Offline | Simulation
[76] | EKF | Neural controller applied to an auxiliary energy system for electric vehicles | Online | Experimental
[77] | EKF | Real-time fault-tolerant closed-loop neural controller | Online | Experimental
[78] | KF | A neural network combined with a robust KF | Offline | Simulation
Table 2. Comparison for real-time implementation of neural network learning-based algorithms.
Work | Application | Processing Hardware | Processing Time
[54] | Three-phase induction motor | DSP DS1104 | 1 ms
[79] | Mobile robot | FPGA Cyclone IV, DE2-115 | 14 μs
[21] | Mobile robot | DS1104 | 1 ms
[80] | Smart grid | LAUNCHXL-F28379D | 0.5 ms
[77] | Three-phase induction motor | DS1104 | 1 ms
Lastly, the KF has a rigorous stability analysis, which allows its initialization to be established according to analytically defined stability and convergence conditions [41,43]. In this sense, the literature shows a large number of successful implementations of the KF for training neural networks, applied mainly to classification, control [6,22,23,54,60], energy [25,35,81,82], estimation [26,69,82,83], forecasting [65,74,84,85], robotics [9,21,56,67], and NN training, as presented in Table 1; these applications, together with their combinations and interrelationships, are depicted in Figure 2. Therefore, in the next section, neural learning is described from an optimization point of view, and a solution obtained within the KF framework is analyzed.

4. Kalman Filter for Neural Network Learning

In this section, the use of the Kalman filter for neural network learning is introduced as an application of its properties as a state estimator. This approach has been applied by several authors to different types of neural networks. In this work, only the multilayer perceptron (MLP), radial basis, and recurrent high-order neural networks (RHONN) are considered, whose learning processes are solved with the KF approach. Similar learning approaches for other types of neural networks can be found in the literature; however, the three types of neural networks considered in this work have been experimentally implemented in real time, as shown in Table 2 [5,21,54,76,77].
As previously explained, the KF was formulated for a linear dynamic system in state-space to provide a solution to the linear optimal filtering problem. This solution applies to both stationary and non-stationary environments. Also, as mentioned before, the solution is recursive, meaning that each update of the estimated state is calculated using the previous estimate and new input data, requiring only the previous estimate to be stored. This also implies that storing all past data is not necessary. Let us now consider a linear dynamical system in discrete time, as depicted in Figure 3.

4.1. Concepts Prior to KF

Before discussing the formulation of the Kalman filter, it is important to contextualize other significant results.

Optimal Estimation

First, let us review the fundamental concepts of optimal estimation. Consider the following equation:
$y_k = w_k + v_k$
where $w_k$ is an unknown signal and $v_k$ is additive Gaussian noise. The a posteriori estimate of the signal $w_k$, given the measurements $y_1, y_2, \ldots, y_k$, is denoted $\hat{w}_k$. Typically, the estimate $\hat{w}_k$ differs from the unknown signal $w_k$.
The first step for an optimization problem is to define a cost (loss) function, which must satisfy the following requirements:
  • The cost function is non-negative.
  • The cost function is a non-decreasing function of the estimation error, defined by:
$\tilde{w}_k = w_k - \hat{w}_k$
These two requirements are satisfied by the expected square error, defined by:
$J_k = E\{ (w_k - \hat{w}_k)^2 \} = E\{ \tilde{w}_k^2 \}$
where $E$ is the expected value operator. The cost function $J_k$ depends on the sample instant $k$, which emphasizes the non-stationary nature of the recursive estimation process. To deduce the optimal value of the estimate $\hat{w}_k$, the following theorems are required [13].
Theorem 1.
Conditional expectation estimator. If the stochastic processes $w_k$ and $y_k$ are jointly Gaussian, then the optimal estimate $\hat{w}_k$ that minimizes the mean square error $J_k$ is the conditional expectation:
$\hat{w}_k = E\{ w_k \mid y_1, y_2, \ldots, y_k \}$
Theorem 2.
Orthogonality principle. Let $w_k$ and $y_k$ be stochastic processes with zero mean, such that:
$E\{ w_k \} = E\{ y_k \} = 0 \quad \forall k$
If either:
i.
the stochastic processes $w_k$ and $y_k$ are Gaussian, or
ii.
the optimal estimate $\hat{w}_k$ is restricted to be a linear function of the measurements $y_1, y_2, \ldots, y_k$ and the cost function is the mean square error,
then the optimal estimate $\hat{w}_k$, given the measurements $y_1, y_2, \ldots, y_k$, is the projection of $w_k$ onto the space generated by those measurements; equivalently, the estimation error $\tilde{w}_k$ is orthogonal to that space.
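As a concrete illustration of Theorems 1 and 2, the following minimal Python sketch (an illustrative numerical check added here, with assumed signal and noise variances) estimates a zero-mean Gaussian signal from the scalar measurement model above and verifies that the conditional-mean estimate produces an error orthogonal to the measurement:
    import numpy as np

    # Scalar example of y_k = w_k + v_k with zero-mean Gaussian w_k and v_k (variances assumed).
    rng = np.random.default_rng(0)
    sigma_w2, sigma_v2 = 4.0, 1.0
    N = 100_000
    w = rng.normal(0.0, np.sqrt(sigma_w2), N)      # unknown signal w_k
    v = rng.normal(0.0, np.sqrt(sigma_v2), N)      # additive Gaussian noise v_k
    y = w + v                                      # measurement y_k

    # Conditional-mean (MMSE) estimate for jointly Gaussian w and y (Theorem 1):
    # E{w | y} = sigma_w^2 / (sigma_w^2 + sigma_v^2) * y
    w_hat = sigma_w2 / (sigma_w2 + sigma_v2) * y
    err = w - w_hat

    print("mean square error J:", np.mean(err ** 2))   # close to sigma_w2 * sigma_v2 / (sigma_w2 + sigma_v2)
    print("E{(w - w_hat) y}  :", np.mean(err * y))     # close to 0: the error is orthogonal to the data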

4.2. Kalman Filter Realization

  • State-space model:
    $w(k+1) = F(k+1, k)\, w(k) + u(k)$
    $y(k) = H(k)\, w(k) + v(k)$
    where $u(k)$ and $v(k)$ are independent Gaussian noises with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively.
  • Initialization:
    $\hat{w}(0) = E\{ w(0) \}$
    $P(0) = E\{ [w(0) - E\{ w(0) \}][w(0) - E\{ w(0) \}]^T \}$
  • Propagation of the estimated state:
    $\hat{w}^-(k) = F(k, k-1)\, \hat{w}(k-1)$
  • Propagation of the estimation error covariance:
    $P^-(k) = F(k, k-1)\, P(k-1)\, F^T(k, k-1) + Q(k-1)$
  • Kalman gain matrix:
    $K(k) = P^-(k) H^T(k) \left[ R(k) + H(k) P^-(k) H^T(k) \right]^{-1}$
  • State estimation update:
    $\hat{w}(k) = \hat{w}^-(k) + K(k) \left( y(k) - H(k) \hat{w}^-(k) \right)$
  • Estimation error covariance update:
    $P(k) = \left( I - K(k) H(k) \right) P^-(k)$
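The recursion above can be summarized in a few lines of code. The following minimal numpy sketch implements one Kalman filter step using the a priori/a posteriori quantities defined above; the scalar constant-signal example and its noise covariances are illustrative assumptions, not taken from the review:
    import numpy as np

    def kalman_step(w_hat, P, y, F, H, Q, R):
        """One step of the linear Kalman filter recursion given above."""
        # Propagation (a priori estimate and covariance)
        w_prior = F @ w_hat
        P_prior = F @ P @ F.T + Q
        # Kalman gain
        K = P_prior @ H.T @ np.linalg.inv(R + H @ P_prior @ H.T)
        # Measurement update (a posteriori estimate and covariance)
        w_hat = w_prior + K @ (y - H @ w_prior)
        P = (np.eye(P.shape[0]) - K @ H) @ P_prior
        return w_hat, P

    # Illustrative constant-signal example (all values assumed):
    F = np.eye(1); H = np.eye(1); Q = 1e-4 * np.eye(1); R = 0.5 * np.eye(1)
    w_hat, P = np.zeros(1), np.eye(1)
    rng = np.random.default_rng(1)
    for k in range(50):
        y = np.array([2.0]) + rng.normal(0.0, np.sqrt(0.5), 1)   # noisy measurement of w = 2
        w_hat, P = kalman_step(w_hat, P, y, F, H, Q, R)
    print(w_hat)   # converges towards 2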
The Kalman filter described earlier assumes a linear model of a dynamic system, as can be seen in Figure 4. However, in most cases the model is nonlinear; next, the use of the KF is extended through a linearization procedure, resulting in the EKF. Such an extension is feasible since the KF is described in terms of difference equations for discrete-time systems.

4.3. Extended Kalman Filter

  • State-space model for discrete-time nonlinear systems:
    $w(k+1) = f(k, w(k)) + u(k)$
    $y(k) = h(k, w(k)) + v(k)$
    where $u(k)$ and $v(k)$ are independent Gaussian noises with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively.
  • Initialization:
    $\hat{w}(0) = E\{ w(0) \}$
    $P(0) = E\{ [w(0) - E\{ w(0) \}][w(0) - E\{ w(0) \}]^T \}$
The basic idea of the EKF is to linearize the state-space model at each time instant around the most recent state estimate, which can be taken as the a posteriori estimate $\hat{w}(k)$ or the a priori estimate $\hat{w}^-(k)$, as appropriate. Once the linearized model is obtained, the KF recursion can be applied. The linearization of the discrete-time nonlinear system is defined by the Jacobians:
$F(k+1, k) = \left. \dfrac{\partial f(k, w(k))}{\partial w} \right|_{w = \hat{w}(k)}$
$H(k) = \left. \dfrac{\partial h(k, w(k))}{\partial w} \right|_{w = \hat{w}^-(k)}$
The EKF realization is given by:
  • Propagation of the estimated state:
    $\hat{w}^-(k) = F(k, k-1)\, \hat{w}(k-1)$
  • Propagation of the estimation error covariance:
    $P^-(k) = F(k, k-1)\, P(k-1)\, F^T(k, k-1) + Q(k-1)$
  • Kalman gain matrix:
    $K(k) = P^-(k) H^T(k) \left[ R(k) + H(k) P^-(k) H^T(k) \right]^{-1}$
  • State estimation update:
    $\hat{w}(k) = \hat{w}^-(k) + K(k) \left( y(k) - H(k) \hat{w}^-(k) \right)$
  • Estimation error covariance update:
    $P(k) = \left( I - K(k) H(k) \right) P^-(k)$
Then, following the conditional expectation estimator for the KF, it is possible to define:
$\hat{w}_k^- = E\{ w_k \mid y_1, \ldots, y_{k-1} \}$
with $w_k = F_{k,k-1} w_{k-1} + u_{k-1}$, so that
$\hat{w}_k^- = E\{ (F_{k,k-1} w_{k-1} + u_{k-1}) \mid y_1, \ldots, y_{k-1} \} = F_{k,k-1} E\{ w_{k-1} \mid y_1, \ldots, y_{k-1} \} + E\{ u_{k-1} \mid y_1, \ldots, y_{k-1} \}$
By definition:
$E\{ w_{k-1} \mid y_1, \ldots, y_{k-1} \} = \hat{w}_{k-1}$
$E\{ u_{k-1} \mid y_1, \ldots, y_{k-1} \} = E\{ u_{k-1} \} = 0$
Therefore, the best a priori estimate of $w_k$ based on the available measurements is:
$\hat{w}_k^- = F_{k,k-1} \hat{w}_{k-1}$
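For completeness, the EKF recursion can be sketched in the same style as the linear case. In the sketch below, the Jacobian functions F_jac and H_jac are user-supplied callables; the update ordering follows the realization given above, and the interface itself is an illustrative assumption:
    import numpy as np

    def ekf_step(w_hat, P, y, F_jac, H_jac, Q, R, k):
        """One EKF step: linearize around the most recent estimates and apply the KF recursion."""
        # Propagation using the Jacobian of f evaluated at the a posteriori estimate
        F = F_jac(k, w_hat)
        w_prior = F @ w_hat
        P_prior = F @ P @ F.T + Q
        # Measurement update using the Jacobian of h evaluated at the a priori estimate
        H = H_jac(k, w_prior)
        K = P_prior @ H.T @ np.linalg.inv(R + H @ P_prior @ H.T)
        w_hat = w_prior + K @ (y - H @ w_prior)
        P = (np.eye(P.shape[0]) - K @ H) @ P_prior
        return w_hat, P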

4.4. Relevant Results on KF for Neural Network Learning

From the previous explanation, it is easy to see that KF can be used to train artificial neural networks, as described in this section.

4.4.1. Comparison Between Kalman Filter and Recursive Least Squares Algorithm

It is well known that most supervised neural network models are trained with the recursive least squares (RLS) algorithm, whose equations are:
$P(k) = P(k-1) - \dfrac{P(k-1)\varphi(k)\varphi^T(k)}{1 + \varphi^T(k) P(k-1) \varphi(k)}\, P(k-1)$
$\hat{\Theta}(k) = \hat{\Theta}(k-1) + P(k)\varphi(k)\left[ y(k) - \varphi^T(k)\hat{\Theta}(k-1) \right]$
where $\Theta$ is the vector of estimated parameters and $\varphi$ is the regression vector. On the other hand, the KF covariance equation can be written as:
$P(k+1) = P(k) - P(k) H^T \left[ H P(k) H^T + R \right]^{-1} H P(k) + Q$
Then, with $Q = 0$, the following is obtained:
$P(k+1) = P(k) - P(k) H^T \left[ H P(k) H^T + R \right]^{-1} H P(k)$
With $\Theta(k+1) = \Theta(k)$, it follows that:
$P(k) = P(k-1) - P(k-1) H^T \left[ H P(k-1) H^T + R \right]^{-1} H P(k-1)$
and, considering $H = \varphi^T(k)$:
$P(k) = P(k-1) - P(k-1)\varphi(k)\left[ \varphi^T(k) P(k-1) \varphi(k) + R \right]^{-1} \varphi^T(k) P(k-1)$
Finally, with $R = 1$, the covariance update of the recursive least squares algorithm is recovered:
$P(k) = P(k-1) - \dfrac{P(k-1)\varphi(k)\varphi^T(k)}{1 + \varphi^T(k) P(k-1) \varphi(k)}\, P(k-1)$
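This equivalence is easy to verify numerically by running both covariance recursions side by side with Q = 0, R = 1, and H(k) = φ^T(k); the random regressor sequence below is only an illustrative assumption:
    import numpy as np

    rng = np.random.default_rng(2)
    L = 3
    P_rls = np.eye(L)    # RLS covariance
    P_kf = np.eye(L)     # KF covariance with Q = 0, R = 1 and time-varying H(k) = phi^T(k)

    for k in range(20):
        phi = rng.normal(size=(L, 1))                    # regression vector phi(k)
        # RLS covariance update
        P_rls = P_rls - (P_rls @ phi @ phi.T @ P_rls) / (1.0 + phi.T @ P_rls @ phi)
        # KF covariance update with H = phi^T, R = 1, Q = 0
        H = phi.T
        P_kf = P_kf - P_kf @ H.T @ np.linalg.inv(H @ P_kf @ H.T + 1.0) @ H @ P_kf

    print(np.allclose(P_rls, P_kf))   # True: both recursions coincide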

4.4.2. Backpropagation Versus EKF

The backpropagation algorithm, based on the gradient descent rule [46], is the most widely used learning algorithm for supervised multilayer neural networks [48], including deep learning models. Its weight update is:
$\Delta w_{ij} = -\eta \dfrac{\partial E_{av}}{\partial w_{ij}}$
$E_{av} = \dfrac{1}{2}\left( d_p - z_p \right)^2$
where $\eta$ is the learning rate parameter, $d_p$ is the $p$-th desired output, and $z_p$ is the $p$-th neural network (approximated) output. Then:
$\dfrac{\partial E_{av}}{\partial w_{ij}} = -\left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
so that:
$\Delta w_{ij} = \eta \left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
In the EKF algorithm, the following is defined:
$\Delta w = K \left( d_p - z_p \right)$
$K = P H \left[ H^T P H + R \right]^{-1}$
$\Delta w = P H \left[ H^T P H + R \right]^{-1} \left( d_p - z_p \right)$
Considering that $P = p I$ and $\left[ H^T P H + R \right]^{-1} = a I$, where $a$ and $p$ are scalar values, then, for a particular weight $w_{ij}$, $\Delta w_{ij} = a\, p \left( d_p - z_p \right) h_{ij}$ is obtained, where $h_{ij} = \dfrac{\partial z_p}{\partial w_{ij}}$:
$\Delta w_{ij} = a\, p \left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
Note that this is the delta rule used in the backpropagation algorithm with $\eta = a\, p$. Therefore, the backpropagation algorithm can be considered a special case of the EKF. Since it is a simplification of the EKF, the backpropagation algorithm may discard information that is important during the weight update [46].

4.5. Multilayer Perceptron Trained with EKF

It is well known that the MLP is one of the most widely used neural networks for dealing with complex nonlinear data, traditionally trained with backpropagation-type and optimizer-type algorithms [41]; most of these learning implementations work in an offline mode using epoch or batch approaches (see Table 1). In addition, various researchers have proposed learning algorithms for neural networks based on the EKF; this kind of learning approach improves the learning process without the need for heavy preprocessing tasks. These applications treat the training of an MLP as an estimation problem for a nonlinear system, which can be solved with the EKF. EKF-based algorithms are recursive, as is the gradient descent algorithm; recursive least squares algorithms, backpropagation algorithms, and others can be considered specific cases of the KF, as shown in [46].
In the application of the KF to MLP learning, the weight vector is considered as the state to be estimated by the Kalman filter, and the neural network output is the measurement used by the KF. The use of the EKF is necessary since the MLP is a nonlinear system. When neural network training is posed as an optimal filtering problem, the need for a recursive solution that uses current information and does not require storing the entire weight evolution becomes evident. This is the essence of the KF applied to training neural networks.
Estimation is performed recursively, meaning that each update of the estimate (synaptic weights) is calculated using the previous estimate and current data, requiring only the previous estimate to be stored. Training is conducted from a set of N input–output measurements with the objective of finding the optimal weights that minimize prediction error. It is well known that neural networks can be modeled as:
$\hat{w}(k+1) = \hat{w}(k)$
$\hat{y}(k) = \varphi\left( \hat{w}(k), u(k) \right)$
where $\varphi(\cdot)$ is a nonlinear function of the neural weights $\hat{w}$ and the external input $u$.
According to this, the equations necessary for the development of the learning algorithm based on EKF are:
$K(k) = P(k) H^T(k) \left[ R(k) + H(k) P(k) H^T(k) \right]^{-1}$
$w(k+1) = w(k) + K(k) \left[ y(k) - \hat{y}(k) \right]$
$P(k+1) = P(k) - K(k) H(k) P(k) + Q(k)$
where $P(k) \in \mathbb{R}^{L \times L}$ is the estimation error covariance matrix at time $k$, $w \in \mathbb{R}^{L}$ is the state vector (weight vector), $L$ is the number of neural network weights (the dimension of the weight vector), $y \in \mathbb{R}^{m}$ is the vector of desired outputs, $\hat{y} \in \mathbb{R}^{m}$ is the vector of neural network outputs, where $m$ is its dimension, $K \in \mathbb{R}^{L \times m}$ is the Kalman gain, $Q \in \mathbb{R}^{L \times L}$ is the covariance matrix of the weight estimation noise, $R \in \mathbb{R}^{m \times m}$ is the covariance matrix of the measurement noise, and $H \in \mathbb{R}^{m \times L}$ is a matrix containing the derivative of each neural network output with respect to each weight:
$H_{ij}(k) = \left. \dfrac{\partial \hat{y}_i(k)}{\partial w_j(k)} \right|_{w(k) = \hat{w}(k-1)}, \quad i = 1, \ldots, m; \; j = 1, \ldots, L$
Typically, $P$, $R$, and $Q$ are initialized as diagonal matrices whose non-zero elements are defined as $P(0)$, $R(0)$, and $Q(0)$, respectively. Consider the MLP depicted in Figure 5, with $p$ inputs and $h$ hidden neurons; its weight vector is defined as:
$w = \left[\, w_{10}^{(1)} \;\cdots\; w_{1p}^{(1)} \;\; w_{20}^{(1)} \;\cdots\; w_{hp}^{(1)} \;\; w_{10}^{(2)} \;\cdots\; w_{1h}^{(2)} \,\right]^{T}$
The total number of weight elements, $L$, is defined as:
$L = (p+1)\, h + (h+1)\, m$
where $m$ is the number of neural network outputs. For the MLP of Figure 5, the hidden and output layers are computed as:
$\sigma_i = \dfrac{1}{1 + e^{-n_i}}, \quad i = 1, \ldots, h$
$n_i = \sum_{j=0}^{p} w_{ij}^{(1)} x_j, \quad x_0 = +1$
$v_1 = \sum_{k=0}^{h} w_{1k}^{(2)} \sigma_k, \quad \sigma_0 = +1$
$\hat{y} = v_1$
Then:
$\dfrac{\partial \hat{y}}{\partial w} = \left[\, \dfrac{\partial \hat{y}}{\partial w_{10}^{(1)}} \;\; \dfrac{\partial \hat{y}}{\partial w_{11}^{(1)}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial w_{1h}^{(2)}} \,\right]$
Therefore:
$H = \left[\, \gamma(n_1) x_0 \;\cdots\; \gamma(n_1) x_p \;\cdots\; \gamma(n_h) x_p \;\; \sigma_0 \;\cdots\; \sigma_h \,\right]$
with
$\gamma(n_i) = w_{1i}^{(2)} \dfrac{e^{-n_i}}{\left( 1 + e^{-n_i} \right)^{2}}, \quad i = 1, \ldots, h$
which is valid for the MLP depicted in Figure 5 with sigmoid activation functions for the hidden neurons and a linear activation function for the output neuron.
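The following minimal Python sketch puts the pieces of this subsection together: the MLP of Figure 5 (sigmoid hidden layer, linear output), the row vector H built from γ(n_i) and σ_k, and the EKF weight update. The toy target function, the network size, and the initialization values are illustrative assumptions:
    import numpy as np

    def mlp_forward(w, x, h):
        """MLP of Figure 5: p inputs, h sigmoid hidden units, one linear output.
        w packs the first-layer weights W1 (h x (p+1)) followed by the output weights w2 (h+1,)."""
        p = x.size
        W1 = w[: h * (p + 1)].reshape(h, p + 1)
        w2 = w[h * (p + 1):]
        xb = np.concatenate(([1.0], x))                  # x_0 = +1 (bias input)
        n = W1 @ xb                                      # hidden pre-activations n_i
        sigma = 1.0 / (1.0 + np.exp(-n))                 # sigmoid hidden outputs sigma_i
        sb = np.concatenate(([1.0], sigma))              # sigma_0 = +1 (bias for output layer)
        y_hat = w2 @ sb                                  # linear output neuron
        # Row vector H = d y_hat / d w, with gamma(n_i) = w_1i^(2) * sigma'(n_i)
        gamma = w2[1:] * np.exp(-n) / (1.0 + np.exp(-n)) ** 2
        H = np.concatenate([np.outer(gamma, xb).ravel(), sb])
        return y_hat, H

    def ekf_train_step(w, P, x, y, h, Q, R):
        """One EKF update of the MLP weights (the weights are the state to be estimated)."""
        y_hat, Hrow = mlp_forward(w, x, h)
        H = Hrow.reshape(1, -1)
        K = P @ H.T / (R + H @ P @ H.T)                  # Kalman gain (single output, scalar innovation)
        w = w + (K * (y - y_hat)).ravel()                # weight update
        P = P - K @ H @ P + Q                            # covariance update
        return w, P

    # Illustrative use on an assumed toy target y = sin(x):
    rng = np.random.default_rng(3)
    p, hidden = 1, 5
    L = (p + 1) * hidden + (hidden + 1)
    w = 0.1 * rng.standard_normal(L)
    P, Q, R = 100.0 * np.eye(L), 1e-6 * np.eye(L), 1.0
    for _ in range(2000):
        x = rng.uniform(-np.pi, np.pi, size=1)
        w, P = ekf_train_step(w, P, x, np.sin(x[0]), hidden, Q, R)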

4.6. Recurrent High-Order Neural Network Trained with EKF

As the third type of neural network considered in this work, a recurrent high-order neural network (RHONN) is used to show the applicability of the EKF as an algorithm for training artificial neural networks online, exploiting the stochastic estimation capabilities of the EKF; this type of neural network is based on a Hopfield-type neural network [86]. Recurrent neural networks offer a better-suited tool to model data from dynamic environments [54]. Since the seminal paper [87], there has been continuously increasing interest in applying NNs to the identification and control of nonlinear systems, especially RHONNs, due to their excellent approximation capabilities using few units. Compared to first-order NNs, these networks are more flexible and robust when faced with new and/or noisy data patterns [88]. Furthermore, RHONNs have performed better than multilayer first-order networks while using a small number of free parameters [89]. Additionally, different authors have demonstrated the feasibility and advantages of using these architectures in system identification and control applications. The best-known training approach for RNNs is backpropagation through time [48]. However, it is a first-order gradient-descent method, and hence its learning speed can be very slow. More recently, extended Kalman filter (EKF)-based algorithms have been introduced to train NNs in order to improve the learning convergence [41]. Next, an EKF-based learning algorithm for the RHONN is derived.
Consider the EKF-based learning algorithm given by the Kalman gain, weight update, and covariance update equations of Section 4.5, with:
$H_{ij}(k) = \left. \dfrac{\partial \hat{y}_i(k)}{\partial w_j(k)} \right|_{w(k) = \hat{w}(k-1)}, \quad i = 1, \ldots, m; \; j = 1, \ldots, L$
For a discrete-time RHONN used as an identifier:
$\hat{y}(k) = \hat{x}(k) = w^T(k)\, z\left( x(k), u(k) \right)$
Therefore:
$H(k) = \dfrac{\partial \hat{y}(k)}{\partial w} = z\left( x(k), u(k) \right)$
Since the initial weight values do not depend on the RHONN state:
$H_{ij}(0) = 0$
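A corresponding sketch for the RHONN identifier is shown below; since H(k) is simply the regressor z(x(k), u(k)), the EKF update takes a particularly compact form. The specific high-order terms chosen for z are an illustrative assumption:
    import numpy as np

    def rhonn_ekf_step(w, P, x, u, x_next, Q, R):
        """EKF weight update for a single-state RHONN identifier x_hat = w^T z(x(k), u(k))."""
        s = 1.0 / (1.0 + np.exp(-x))                 # sigmoid of the measured state
        z = np.array([s, s ** 2, u, 1.0])            # illustrative high-order regressor z(x, u)
        x_hat = w @ z                                # RHONN output (identifier prediction)
        H = z.reshape(1, -1)                         # H(k) = d x_hat / d w = z(x(k), u(k))
        K = P @ H.T / (R + H @ P @ H.T)
        w = w + (K * (x_next - x_hat)).ravel()       # update driven by the identification error
        P = P - K @ H @ P + Q
        return w, P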

4.7. Radial Basis Neural Network Trained with an EKF

From the literature, it is evident that another of the most widely used neural networks for dealing with complex nonlinear data is the radial basis function (RBF) neural network. This type of neural network has a simple structure that allows its use without requiring expertise in neural network learning. Most of its learning algorithms are designed for offline learning; however, in [47], an online learning algorithm based on an EKF is proposed. In this application, the neural network parameters are the state variables estimated by the EKF. Consider the radial basis neural network (RBF) depicted in Figure 6:
$x = \left[\, x_1 \;\; x_2 \;\cdots\; x_p \,\right]^T$
$c_i = \left[\, c_{i1} \;\; c_{i2} \;\cdots\; c_{ip} \,\right]^T, \quad i = 1, \ldots, m$
With $p$ inputs and $m$ hidden neurons, the RBF parameters can be collected as:
$\theta = \left[\, w_1 \;\cdots\; w_m \;\; c_{11} \;\cdots\; c_{1p} \;\; c_{21} \;\cdots\; c_{mp} \,\right]^T$
$\hat{y} = \sum_{j=1}^{m} w_j\, G\left( \| x - c_j \| \right) = W G$
$\dfrac{\partial \hat{y}}{\partial \theta} = \left[\, \dfrac{\partial \hat{y}}{\partial w_1} \;\cdots\; \dfrac{\partial \hat{y}}{\partial w_m} \;\; \dfrac{\partial \hat{y}}{\partial c_{11}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial c_{1p}} \;\; \dfrac{\partial \hat{y}}{\partial c_{21}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial c_{mp}} \,\right]^T$
In this case, the total number of neural network parameters, $L$, is defined as:
$L = m + m p$
Therefore:
$\dfrac{\partial \hat{y}}{\partial w_1} = G\left( x, c_1 \right), \;\ldots,\; \dfrac{\partial \hat{y}}{\partial w_m} = G\left( x, c_m \right)$
$\dfrac{\partial \hat{y}}{\partial c_1} = w_1 \dfrac{\partial G\left( x, c_1 \right)}{\partial c_1}, \;\ldots,\; \dfrac{\partial \hat{y}}{\partial c_m} = w_m \dfrac{\partial G\left( x, c_m \right)}{\partial c_m}$
Considering a Gaussian basis function:
$G\left( x, c_i \right) = e^{-\frac{\| x - c_i \|^2}{2\sigma^2}} = e^{-\frac{1}{2\sigma^2}\left[ (x_1 - c_{i1})^2 + (x_2 - c_{i2})^2 + \cdots + (x_p - c_{ip})^2 \right]}$
$\dfrac{\partial G\left( x, c_i \right)}{\partial c_{ij}} = e^{-\frac{\| x - c_i \|^2}{2\sigma^2}}\, \dfrac{1}{2\sigma^2}\, 2\left( x_j - c_{ij} \right) = G\left( x, c_i \right) \dfrac{x_j - c_{ij}}{\sigma^2}$
where $i = 1, \ldots, m$ and $j = 1, \ldots, p$.
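These derivatives translate directly into code. The sketch below assembles the Jacobian row for all RBF parameters (output weights and centers, with a fixed width σ) and applies the EKF update of Section 4.4; the parameter packing order follows the definition of θ above, while the specific values are illustrative assumptions:
    import numpy as np

    def rbf_forward(theta, x, m, sigma):
        """RBF network of Figure 6: p inputs, m Gaussian units, one linear output.
        theta packs the output weights w (m,) followed by the centers C (m x p)."""
        p = x.size
        w = theta[:m]
        C = theta[m:].reshape(m, p)
        G = np.exp(-np.sum((x - C) ** 2, axis=1) / (2.0 * sigma ** 2))   # Gaussian activations G(x, c_i)
        y_hat = w @ G
        dG_dC = G[:, None] * (x - C) / sigma ** 2        # dG_i/dc_ij = G(x, c_i) * (x_j - c_ij) / sigma^2
        H = np.concatenate([G, (w[:, None] * dG_dC).ravel()])   # [dy/dw_i, dy/dc_ij] in the order of theta
        return y_hat, H

    def rbf_ekf_step(theta, P, x, y, m, sigma, Q, R):
        """One EKF update of all RBF parameters (output weights and centers)."""
        y_hat, Hrow = rbf_forward(theta, x, m, sigma)
        H = Hrow.reshape(1, -1)
        K = P @ H.T / (R + H @ P @ H.T)
        theta = theta + (K * (y - y_hat)).ravel()
        P = P - K @ H @ P + Q
        return theta, P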

Possible Modifications

  • Weight estimation only (arbitrary or fixed centers).
  • Global EKF.
  • Decoupled EKF.
    Centers decoupled from weights.
    Weights decoupled from centers and other weights.
  • Other combinations.
These examples of EKF use in neural network learning serve as the basis for the following discussion of challenges, limitations, open problems, and future work.

5. Challenges, Limitations, Open Problems and Future Work

Although the KF has advantages over traditional learning algorithms, primarily gradient descent algorithms, many challenges remain, such as the computational cost: using the EKF can be resource-intensive, which limits its real-time application on limited hardware. Initializing the learning algorithm also remains a challenge. Regarding implementation limitations, the nonlinear approach to neural network learning also requires a mathematical model for the analytical approximation of the EKF, while numerical approaches require training per epoch, batch, or semi-batch, which limits their real-time implementation.
As mentioned in this work, the use of the KF and its variants for training neural networks requires a meticulous design tailored to the neural structure used, so the establishment of general-purpose, real-time implementable strategies remains an open problem. Another open problem is the choice of KF initialization parameters, since these design parameters directly impact the algorithm's convergence conditions. Future work includes relevant engineering applications such as advanced robotics, autonomous vehicles, energy systems, and biomedical systems, as well as complex innovative problems, including edge computing and the hardware implementation of the KF and its variants. Likewise, publications regarding KF variants with a quantum approach are expected to increase in the coming years.

6. Conclusions

This work provides a practical review of the development of the Kalman filter and its application to neural network training, including filtering and estimation concepts prior to the KF, its main variants, and the modifications introduced to address nonlinear systems. Additionally, an introduction to the development of artificial neural networks and their inherent learning problems has been included. An analysis is also presented that highlights the potential of the KF as a learning algorithm for neural networks. Moreover, the training of three different types of artificial neural networks (MLP, RBF, and RHONN) using the EKF was included. This review demonstrates the extensive applicability of the KF in a wide range of applications, ranging from traditional applications, estimation, and optimal filtering of linear systems to applications that combine the use of the KF with artificial intelligence and machine learning techniques. Undoubtedly, in the coming years, an increasing number of implementations that combine the potential of the KF with emerging artificial intelligence and machine learning methodologies to solve real-world problems will emerge.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The author thanks CUCEI, Universidad de Guadalajara, for its support in the development of this work.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KF: Kalman Filter
EKF: Extended Kalman Filter
UKF: Unscented Kalman Filter
CKF: Cubature Kalman Filter
QKF: Quantum Kalman Filter
QEKF: Quantum Extended Kalman Filter
MLP: Multilayer Perceptron
RBF: Radial Basis Function
RHONN: Recurrent High-Order Neural Network

References

  1. Crassidis, J.L.; Junkins, J.L. Optimal Estimation of Dynamic Systems; Chapman and Hall/CRC: Boca Raton, FL, USA, 2004. [Google Scholar]
  2. Grewal, M.S.; Andrews, A.P. Kalman Filtering Theory and Practice Using MATLAB; Wiley: Hoboken, NJ, USA, 2023. [Google Scholar]
  3. Jazwinski, A.H. Stochastic Processes and Filtering Theory; Courier Corporation: North Chelmsford, MA, USA, 2013. [Google Scholar]
  4. Maybeck, P.S. Stochastic Models, Estimation, and Control; Academic Press: Cambridge, MA, USA, 1982; Volume 3. [Google Scholar]
  5. Simon, D. Optimal state Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  6. Panomruttanarug, B.; Longman, R.W. Using Kalman filter to attenuate noise in learning and repetitive control can easily degrade performance. In Proceedings of the 2008 SICE Annual Conference, Chofu, Japan, 20–22 August 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 3453–3458. [Google Scholar]
  7. Khodarahmi, M.; Maihami, V. A review on Kalman filter models. Arch. Comput. Methods Eng. 2023, 30, 727–747. [Google Scholar] [CrossRef]
  8. Julier, S.J.; Uhlmann, J.K. New extension of the Kalman filter to nonlinear systems. In Proceedings Volume 3068, Signal Processing, Sensor Fusion, and Target Recognition VI; SPIE: Bellingham, WA, USA, 1997; pp. 182–193. [Google Scholar]
  9. Urrea, C.; Agramonte, R. Kalman filter: Historical overview and review of its use in robotics 60 years after its creation. J. Sensors 2021, 2021, 9674015. [Google Scholar] [CrossRef]
  10. Kim, S.; Petrunin, I.; Shin, H.S. A review of Kalman filter with artificial intelligence techniques. In Proceedings of the 2022 Integrated Communication, Navigation and Surveillance Conference (ICNS), Dulles, VA, USA, 5–7 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–12. [Google Scholar]
  11. Poincaré, H. Les méThodes Nouvelles de la méCanique céLeste; Gauthier-Villars: Paris, France, 1892. [Google Scholar]
  12. Kalman, R.E. Contributions to the theory of optimal control. Bol. Soc. Mat. Mex. 1960, 5, 102–119. [Google Scholar]
  13. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  14. Wiener, N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series; MIT Press: Cambridge, MA, USA, 1964. [Google Scholar]
  15. Belavkin, V. Measurement, filtering and control in quantum open dynamical systems. Rep. Math. Phys. 1999, 43, A405–A425. [Google Scholar] [CrossRef]
  16. Emzir, M.F.; Woolley, M.J.; Petersen, I.R. A quantum extended Kalman filter. J. Phys. A: Math. Theor. 2017, 50, 225301. [Google Scholar] [CrossRef]
  17. Iida, S.; Ohki, K.; Yamamoto, N. Robust quantum Kalman filtering under the phase uncertainty of the probe-laser. In Proceedings of the 2010 IEEE International Symposium on Computer-Aided Control System Design, Yokohama, Japan, 8–10 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 749–754. [Google Scholar]
  18. Zhang, G.; Dong, Z. Linear quantum systems: A tutorial. Annu. Rev. Control 2022, 54, 274–294. [Google Scholar]
  19. Ma, K.; Kong, J.; Wang, Y.; Lu, X.M. Review of the applications of Kalman filtering in quantum systems. Symmetry 2022, 14, 2478. [Google Scholar] [CrossRef]
  20. Zhou, X.; Qiao, D.; Li, X. Neural network-based method for orbit uncertainty propagation and estimation. IEEE Trans. Aerosp. Electron. Syst. 2023, 60, 1176–1193. [Google Scholar] [CrossRef]
  21. Alanis, A.Y.; Arana-Daniel, N.; Lopez-Franco, C. Artificial Neural Networks for Engineering Applications; Academic Press: Cambridge, MA, USA, 2019. [Google Scholar]
  22. Gruber, M. An Approach to Target Tracking; Technical Note 1967-8, DDC 654272; MIT Lincoln Laboratory: Lexington, MA, USA, 1967. [Google Scholar]
  23. Larson, R.E.; Dressler, R.M.; Ratner, R.S. Application of the Extended Kalman Filter to Ballistic Trajectory Estimation; Final Report 5188-103; Stanford Research Institute: Monlo Park, CA, USA, 1967. [Google Scholar]
  24. Duan, P.; Duan, Z.; Lv, Y.; Chen, G. Distributed finite-horizon extended Kalman filtering for uncertain nonlinear systems. IEEE Trans. Cybern. 2019, 51, 512–520. [Google Scholar] [CrossRef]
  25. Jiang, C.; Wang, S.; Wu, B.; Fernandez, C.; Xiong, X.; Coffie-Ken, J. A state-of-charge estimation method of the power lithium-ion battery in complex conditions based on adaptive square root extended Kalman filter. Energy 2021, 219, 119603. [Google Scholar] [CrossRef]
  26. Dai, Z.; Jing, L. Lightweight extended Kalman filter for MARG sensors attitude estimation. IEEE Sens. J. 2021, 21, 14749–14758. [Google Scholar] [CrossRef]
  27. Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989, 1, 270–280. [Google Scholar] [CrossRef]
  28. Chang, L.; Hu, B.; Li, A.; Qin, F. Transformed unscented Kalman filter. IEEE Trans. Autom. Control 2012, 58, 252–257. [Google Scholar] [CrossRef]
  29. Arasaratnam, I.; Haykin, S. Cubature Kalman filters. IEEE Trans. Autom. Control 2009, 54, 1254–1269. [Google Scholar] [CrossRef]
  30. Provost, F.; Fawcett, T. Data Science for Business: What you Need to Know About Data Mining and Data-Analytic Thinking; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013. [Google Scholar]
  31. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson Education: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
  32. Härter, F.P.; de Campos Velho, H.F. New approach to applying neural network in nonlinear dynamic model. Appl. Math. Model. 2008, 32, 2621–2633. [Google Scholar] [CrossRef]
  33. Chen, X.; Bettens, A.; Xie, Z.; Wang, Z.; Wu, X. Kalman filter and neural network fusion for fault detection and recovery in satellite attitude estimation. Acta Astronaut. 2024, 217, 48–61. [Google Scholar] [CrossRef]
  34. Wu, X.; Wang, Y. Extended and unscented Kalman filtering based feedforward neural networks for time series prediction. Appl. Math. Model. 2012, 36, 1123–1131. [Google Scholar] [CrossRef]
  35. Xu, Y.; Hu, M.; Zhou, A.; Li, Y.; Li, S.; Fu, C.; Gong, C. State of charge estimation for lithium-ion batteries based on adaptive dual Kalman filter. Appl. Math. Model. 2020, 77, 1255–1272. [Google Scholar] [CrossRef]
  36. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  37. Cajal, S.R.Y.; Azoulay, D. Histology of the Nervous System: Of Man and Vertebrates; Oxford Academic: Oxford, UK, 1995. [Google Scholar]
  38. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386. [Google Scholar] [CrossRef]
  39. Widrow, B.; Hoff, M.E. Adaptive switching circuits. In Neurocomputing: Foundations of Research; MIT Press: Cambridge, MA, USA, 1988; pp. 123–134. [Google Scholar]
  40. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Committee on Applied Mathematics, Harvard University, Cambridge, MA, USA, 1974. [Google Scholar]
  41. Haykin, S. Kalman Filtering and Neural Networks; John Wiley & Sons: Hoboken, NJ, USA, 2001. [Google Scholar]
  42. Ljung, L. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. Autom. Control 1979, 24, 36–50. [Google Scholar] [CrossRef]
  43. Song, Y.; Grizzle, J.W. The extended Kalman filter as a local asymptotic observer for discrete-time nonlinear systems. J. Math. Syst. Estim. Control 1995, 5, 59–78. [Google Scholar]
  44. De Mulder, W.; Bethard, S.; Moens, M.F. A survey on the application of recurrent neural networks to statistical language modeling. Comput. Speech Lang. 2015, 30, 61–98. [Google Scholar] [CrossRef]
  45. Jaeger, H. Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the Echo State Network Approach; GMD-Forschungszentrum Informationstechnik: Bonn, Germany, 2002; Volume 5. [Google Scholar]
  46. Ruck, D.W.; Rogers, S.K.; Kabrisky, M.; Maybeck, P.S.; Oxley, M.E. Comparative analysis of backpropagation and the extended Kalman filter for training multilayer perceptrons. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 686–691. [Google Scholar] [CrossRef]
  47. Simon, D. Training radial basis neural networks with the extended Kalman filter. Neurocomputing 2002, 48, 455–475. [Google Scholar] [CrossRef]
  48. Singhal, S.; Wu, L. Training feed-forward networks with the extended Kalman algorithm. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, UK, 23–26 May 1989; Volume 2, pp. 1187–1190. [Google Scholar] [CrossRef]
  49. Cordova, J.J.; Yu, W. Recurrent wavelets neural networks learning via dead zone Kalman filter. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–7. [Google Scholar]
  50. Camacho, J.; Villaseñor, C.; Alanis, A.Y.; Lopez-Franco, C.; Arana-Daniel, N. sKAdam: An improved scalar extension of KAdam for function optimization. Intell. Data Anal. 2020, 24, 87–104. [Google Scholar]
  51. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  52. Iiguni, Y.; Sakai, H.; Tokumaru, H. A real-time learning algorithm for a multilayered neural network based on the extended Kalman filter. IEEE Trans. Signal Process. 1992, 40, 959–966. [Google Scholar] [CrossRef]
  53. Jin, L.; Nikiforuk, P.N.; Gupta, M.M. Weight-Decoupled Kalman Filter Learning Algorithm of Multi-Layered Neural Networks. 1995. Available online: https://madangupta.com/pages/info/mmg/paper/RJ/RJ-088.pdf (accessed on 10 September 2025).
  54. Alanis, A.Y.; Sanchez, E.N.; Loukianov, A.G. Discrete-time adaptive backstepping nonlinear control via high-order neural networks. IEEE Trans. Neural Netw. 2007, 18, 1185–1195. [Google Scholar]
  55. Yingxin, L.; Min, W.; Jinhua, S.; Kaoru, H. Sequential growing-and-pruning learning for recurrent neural networks using unscented or extended Kalman filter. In Proceedings of the 2008 27th Chinese Control Conference, Kunming, China, 16–18 July 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 242–247. [Google Scholar]
  56. Liu, Q.; Jiachen, M.; Wei, X. Action selection in cooperative robot soccer using Q-learning with Kalman filter. J. Comput. Inf. Syst. 2012, 8, 10367–10374. [Google Scholar]
  57. Tripp, C.; Shachter, R.D. Approximate Kalman filter Q-learning for continuous state-space MDPs. arXiv 2013, arXiv:1309.6868. [Google Scholar]
  58. Nobrega, J.P.; Oliveira, A.L. Kalman filter-based method for online sequential extreme learning machine for regression problems. Eng. Appl. Artif. Intell. 2015, 44, 101–110. [Google Scholar] [CrossRef]
  59. Cao, Z.; Lu, J.; Zhang, R.; Gao, F. Iterative learning Kalman filter for repetitive processes. J. Process Control 2016, 46, 92–104. [Google Scholar] [CrossRef]
  60. Douiri, M.R. Extended Kalman Filter Based Learning Fuzzy for Parameters Adaptation of Induction Motor Drive. In Proceedings of the 2014 13th Mexican International Conference on Artificial Intelligence, Tuxtla Gutierrez, Mexico, 16–22 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 147–151. [Google Scholar]
  61. Bekhtaoui, Z.; Meche, A.; Dahmani, M.; Meraim, K.A. Maneuvering target tracking using q-learning based Kalman filter. In Proceedings of the 2017 5th International Conference on Electrical Engineering-Boumerdes (ICEE-B), Boumerdes, Algeria, 29–31 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar]
  62. Nobrega, J.P.; Oliveira, A.L. A sequential learning method with Kalman filter and extreme learning machine for regression and time series forecasting. Neurocomputing 2019, 337, 235–250. [Google Scholar] [CrossRef]
  63. Li, Z.; Shi, L.; Yang, L.; Shang, Z. An Adaptive Learning Rate Q-Learning Algorithm Based on Kalman Filter Inspired by Pigeon Pecking-Color Learning. In Proceedings of the International Conference on Bio-Inspired Computing: Theories and Applications, Zhengzhou, China, 22–25 November 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 693–706. [Google Scholar]
  64. Ullah, I.; Fayaz, M.; Naveed, N.; Kim, D. ANN based learning to Kalman filter algorithm for indoor environment prediction in smart greenhouse. IEEE Access 2020, 8, 159371–159388. [Google Scholar]
  65. Chukhrova, N.; Johannssen, A. Kalman filter learning algorithms and state space representations for stochastic claims reserving. Risks 2021, 9, 112. [Google Scholar] [CrossRef]
  66. Hu, K.; Wu, J.; Weng, L.; Zhang, Y.; Zheng, F.; Pang, Z.; Xia, M. A novel federated learning approach based on the confidence of federated Kalman filters. Int. J. Mach. Learn. Cybern. 2021, 12, 3607–3627. [Google Scholar] [CrossRef]
  67. Srichandan, A.; Dhingra, J.; Hota, M.K. An improved Q-learning approach with Kalman filter for self-balancing robot using OpenAI. J. Control. Autom. Electr. Syst. 2021, 32, 1521–1530. [Google Scholar] [CrossRef]
  68. Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control 2021, 23, 1803–1816. [Google Scholar] [CrossRef]
  69. Wang, H. Extreme learning Kalman filter for short-term wind speed prediction. Front. Energy Res. 2023, 10, 1047381. [Google Scholar] [CrossRef]
  70. Revach, G.; Shlezinger, N.; Ni, X.; Escoriza, A.L.; Van Sloun, R.J.; Eldar, Y.C. KalmanNet: Neural network aided Kalman filtering for partially known dynamics. IEEE Trans. Signal Process. 2022, 70, 1532–1547. [Google Scholar] [CrossRef]
  71. Hang, L.; Ullah, I.; Yang, J.; Chen, C. An improved Kalman filter using ANN-based learning module to predict transaction throughput of blockchain network in clinical trials. Peer-to-Peer Netw. Appl. 2023, 16, 520–537. [Google Scholar] [CrossRef]
  72. He, P.; Wang, B.; Liu, X. Reinforcement learning adaptive Kalman filter for AE signal’s AR-mode denoise. In Proceedings of the 2023 IEEE 11th International Conference on Information, Communication and Networks (ICICN), Xi’an, China, 17–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 643–648. [Google Scholar]
  73. de Araujo, P.R.M.; Noureldin, A.; Givigi, S. Continuous Action Learning Automata: A Strategy for Dynamic Optimization of Invariant Kalman Filter Covariances. In Proceedings of the 2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Kingston, ON, Canada, 6–9 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 160–161. [Google Scholar]
  74. Krishnamurthy, V.; Rojas, C.R. Slow convergence of interacting Kalman filters in word-of-mouth social learning. In Proceedings of the 2024 60th Annual Allerton Conference on Communication, Control, and Computing, Urbana, IL, USA, 24–27 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  75. Liu, S.T.; Fan, J.J.; Wang, R.D.; Han, H.; Zhang, D.Y. Kalman Filter-based Cycle-Consistent Adversarial Learning for Time Series Anomaly Detection. J. Netw. Intell. 2024, 9, 790–803. [Google Scholar]
  76. Ruz Canul, M.A.; Ruz-Hernandez, J.A.; Alanis, A.Y.; Rullan-Lara, J.L.; Garcia-Hernandez, R.; Vior-Franco, J.R. Intelligent Robust Controllers Applied to an Auxiliary Energy System for Electric Vehicles. World Electr. Veh. J. 2024, 15, 479. [Google Scholar] [CrossRef]
  77. Alanis, A.Y.; Alvarez, J.G.; Sanchez, O.D.; Hernandez, H.M.; Valdivia-G, A. Fault-Tolerant Closed-Loop Controller Using Online Fault Detection by Neural Networks. Machines 2024, 12, 844. [Google Scholar]
  78. Hao, P.; Karakuş, O.; Achim, A. RKFNet: A novel neural network aided robust Kalman filter. Signal Process. 2025, 230, 109856. [Google Scholar] [CrossRef]
  79. Quintal, G.; Sanchez, E.N.; Alanis, A.Y.; Arana-Daniel, N.G. Real-time FPGA decentralized inverse optimal neural control for a Shrimp robot. In Proceedings of the 2015 10th System of Systems Engineering Conference (SoSE), San Antonio, TX, USA, 17–20 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 250–255. [Google Scholar]
  80. Sensor fault-tolerant control for a doubly fed induction generator in a smart grid. Eng. Appl. Artif. Intell. 2023, 117, 105527. [CrossRef]
  81. Kamwa, I.; Grondin, R. Fast adaptive schemes for tracking voltage phasor and local frequency in power transmission and distribution systems. IEEE Trans. Power Deliv. 1992, 7, 789–795. [Google Scholar] [CrossRef]
  82. Zhang, L.; Luh, P.B. Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method. IEEE Trans. Power Syst. 2005, 20, 59–66. [Google Scholar] [CrossRef]
  83. Malartic, Q.; Farchi, A.; Bocquet, M. State, global, and local parameter estimation using local ensemble Kalman filters: Applications to online machine learning of chaotic dynamics. Q. J. R. Meteorol. Soc. 2022, 148, 2167–2193. [Google Scholar] [CrossRef]
  84. Chen, H.; Grant-Muller, S. Use of sequential learning for short-term traffic flow forecasting. Transp. Res. Part C Emerg. Technol. 2001, 9, 319–336. [Google Scholar] [CrossRef]
  85. Gerber, A.; Green, D.P. Rational learning and partisan attitudes. Am. J. Political Sci. 1998, 42, 794–818. [Google Scholar] [CrossRef]
  86. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. [Google Scholar] [CrossRef] [PubMed]
  87. Narendra, K.S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27. [Google Scholar]
  88. Ghosh, J.; Shin, Y. Efficient higher-order neural networks for classification and function approximation. Int. J. Neural Syst. 1992, 3, 323–350. [Google Scholar] [CrossRef]
  89. Feldkamp, L.A.; Prokhorov, D.V.; Feldkamp, T.M. Simple and conditioned adaptive behavior from Kalman filter trained recurrent networks. Neural Netw. 2003, 16, 683–689. [Google Scholar] [CrossRef]
Figure 1. Kalman variants.
Figure 2. Kalman filter applications.
Figure 3. Kalman filter update process for discrete-time linear systems.
Figure 4. Kalman filter update process.
Figure 5. MLP trained with an EKF-based algorithm.
Figure 6. RBF trained with an EKF-based algorithm.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.