Review

Exploring Kalman Filtering Applications for Enhancing Artificial Neural Network Learning

Departamento de Innovacion Basada en la Informacion y el Conocimiento, CUCEI, Universidad de Guadalajara, Blvd. Marcelino Garcia Barragan 1421, Col. Olimpica, Guadalajara 44430, Jalisco, Mexico
Algorithms 2025, 18(9), 587; https://doi.org/10.3390/a18090587
Submission received: 6 August 2025 / Revised: 9 September 2025 / Accepted: 11 September 2025 / Published: 17 September 2025
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Abstract

The Kalman filter is a widely used estimation algorithm with numerous applications, including parameter estimation, classification, prediction, pattern recognition, tuning, and filtering. Recently, it has gained attention in artificial intelligence and machine learning as a mathematical framework for the learning process. As a methodology designed for stochastic environments, the Kalman filter effectively manages noise and unstructured data with incomplete information while preventing premature stagnation, enabling faster learning and reducing the need for extensive pre-processing. These characteristics make it ideal for training artificial neural networks and other machine learning techniques. Given its significance, this paper presents a review of Kalman filter applications for artificial neural network learning.

1. Introduction

Working with physical signals, regardless of their context and application, requires the use of underlying techniques to ensure they are properly characterized before processing. This enables the extraction of useful information despite the presence of noise [1], incomplete information [2], stochastic environments [3], control constraints [4], and undesirable components [5]. Over time, different disciplines have made significant contributions to this problem, leading to notable advancements in signal characterization techniques, the scope of problems addressed, and their applications.
The history of filtering has evolved continuously from basic electrical circuits to sophisticated digital and statistical methods. The development of filtering approaches has been primarily driven by the demands of communication systems, control theory, and signal and data processing. From pre-20th-century analog filters to the latest methodologies based on artificial intelligence and machine learning, filtering has become a fundamental tool in engineering, robotics, telecommunications, and data science. As technology advances, filtering will continue to evolve to address increasingly complex problems in dynamic, noisy, unstructured, nonlinear, and non-stationary environments [5].
Early references to filtering appear in works related to Fourier analysis before the 20th century. In 1809, Gauss proposed an optimal filter, the least squares method, to determine the trajectory of celestial bodies. This method demonstrated several key advancements, such as not requiring prior knowledge of the signals, making it widely applicable across many scientific fields. According to the literature, this methodology remains a fundamental part of both linear and statistical filtering, playing a crucial role in the foundations of modern filters. It is important to highlight the role of analog filters in the early stages of telegraph communication, where they were used to suppress undesirable frequencies and reduce noise in electrical circuits. Their development continued into the early 20th century, with the introduction of low-pass, high-pass, and band-pass filters, which facilitated more complex applications and paved the way for modern filtering. A major milestone occurred with the advent of digital signal processing in the first half of the 20th century, which led to the formal development of digital filtering methods. As digital filtering matured and digital signals became increasingly used in communications and control systems, researchers began developing more advanced digital filtering techniques. This progress, along with the increasing complexity of applications in areas such as electrical circuits [2], control improvement [6], sophisticated communication systems [7], aeronautics [8], military [9], naval [10], and transportation [9], required solutions that met the evolving demands of technology.
In its early stages, the development of control systems was heavily influenced by filtering techniques, initially through analog filters and later by digital filters, which became essential for implementations enabled by emerging computing technologies. However, at the end of the 19th century, Poincaré, with his seminal work on the “New Methods of Celestial Mechanics” [11], recognized the need to formulate a general theory of dynamic systems based on sets of first-order differential equations. He introduced the now fundamental concept of considering a relevant set of system variables as a trajectory of a point in an n-dimensional space. This approach quickly gained popularity and became known as the state-space method. Thus, the concept of state became dominant in the study of dynamic systems. A critical aspect of this methodology is that the current behavior of a system is influenced by its past history, meaning that the behavior of the system cannot be specified simply as an instantaneous relationship between sets of input and output variables. An additional set of variables, known as state variables, is needed to account for the history of the system, which represents the minimum amount of information necessary to summarize the entire dynamic past of the system. These state variables provide all the information needed to predict the future behavior of the system in response to any input signal. The use of the state-space methodology for control systems design emerged by the mid-20th century; since then, this concept has allowed scientists to formally describe dynamic systems in order to manipulate their behavior through an appropriate controller design. State-space control and optimal control theory marked the beginning of what is known as modern control theory, with wide applications in aerospace technology, robotics, communications, energy, manufacturing, and transportation, to name a few. The role of Rudolf Kalman in modern control is of great importance, particularly through the formalization of two key concepts: controllability and observability [12]. The first concept establishes the conditions required to manipulate system behavior according to pre-established operating conditions, while the second concept defines the conditions required to model the evolution of internal variables based on the measurement of inputs and outputs for a given time. Both conditions, along with a third concept known as stability, are essential for the implementation and proper performance of modern control systems. Their development laid the foundation for optimal control techniques and state-estimation methods, which remain fundamental in the design of modern control systems [13].
In 1949, Norbert Wiener [14] introduced the Wiener filter in the frequency domain, which successfully addresses the problem of linear optimal dynamic estimation in stationary stochastic process systems. Despite its importance, this method produces a large computational burden while also requiring that both the estimated signal and the measured signal satisfy stationary stochastic processes, limiting its generalization. In 1960, Rudolf Emil Kalman introduced the Kalman filter (KF) [12], which does not require that the measured signal and noise follow the assumption of a stationary stochastic process. The state equation describes the relationship between input and output, considering the signal process as the result of a linear system that is affected by Gaussian noise in both the input and system state, as well as in its measurement. The Kalman filter provides an optimal estimate in terms of minimum mean-square error for linear filters of non-stationary stochastic processes [3]. Kalman developed the filter for both continuous-time and discrete-time systems. These results represent a unified methodology between the stochastic treatment of signals and the concept of dynamic systems in state-space, achieving a balance between the concepts of modern control and statistical filtering. This filter solves the problem of optimal state estimation of a linear dynamic system in the presence of measurement noise. The Kalman filter is a recursive algorithm, which means that it can process new measurements without needing to store all previous data. This characteristic is inherited from the state-space model, which is the minimum amount of information necessary to describe the complete behavior of a system. As a result, the KF only needs to store the state of the system, as opposed to storing all past signal data, making it computationally efficient. The KF also updates its state estimate by comparing the predicted state with the actual measurement, improving the accuracy of the state estimation in real-time, expanding its use in non-stationary processes, and contributing to the advancements of optimal real-time estimation [5].
The relevance of Kalman's work on estimation and optimal control in state-space was fundamental for several reasons: real-time operation, recursive nature and low computational complexity, and robustness to noise and uncertainties. The KF provides an efficient solution for optimal real-time state estimation in dynamical systems. In practice, many systems are not fully observable due to sensor limitations or noisy measurements, but the KF enables accurate state estimation even under incomplete and noisy information, without requiring storage or reprocessing of historical system data. By assuming Gaussian-distributed system noise, the KF computes optimal estimates under this assumption. This makes it particularly effective in systems where the dynamics are not perfectly known and noise is present in the measurements. This is essential in applications where real-time decisions are crucial, such as robotics, aerospace, autonomous vehicles, navigation and guidance systems, biomedical engineering, energy systems, transportation, and communications, among others, especially when computational resources may be limited [5].
The development of quantum systems has expanded rapidly in recent years, and the KF has been extended accordingly. A detailed description of quantum filtering can be found in [15]. A quantum extended Kalman filter (QEKF) is presented in [16], which employs a commutative approximation and a time-varying linearization for nonlinear quantum stochastic differential equations. The quantum Kalman filter (QKF) for linear quantum systems with known parameters has been studied in [17], where an optical system containing an uncertain parameter in the laser probe is described, demonstrating better estimation compared to the classical KF. A deeper review of the linear quantum KF is developed in [18]. A bibliographic review of KF-based approaches for quantum systems has been presented in [19], where an improved method for optimal estimation is demonstrated. That work also examines a practical scenario involving magnetic field estimation in quantum systems, where nonlinear KFs could be considered as an estimation solution. More recently, ref. [20] proposed an orbit deviation propagation approach based on a deep neural network.
In summary, the KF was developed by Rudolf Kalman in 1960 as an optimal state estimator for linear dynamic systems in the presence of Gaussian noise, noisy or incomplete measurements, uncertainties, and non-stationary environments. Due to its recursive nature, the KF ensures low computational complexity, making it widely applicable and successful in numerous real-time applications. While the KF is inherently linear, this work inspired extensions for nonlinear systems, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). These extensions allow state estimation in nonlinear systems, which are common in practical control applications. Since most real-world applications involve noise, uncertainty, non-measurable signals, unreliable sensors, and time-varying conditions [21], these developments remain highly relevant. Therefore, this work contemplates a somewhat different application of the KF: its use for artificial neural network training. Although this topic is not entirely new, it has gained renewed attention in recent years due to the widespread popularity of applying artificial intelligence and machine learning techniques to complex and relevant problems.
Hence, considering all these facts, this review focuses on the KF in artificial neural network learning, emphasizing its ability to handle noisy and uncertain data efficiently. By reducing the need for complex preprocessing and speeding up learning, the Kalman filter offers a practical and powerful tool for improving machine learning methods. This review highlights its key advantages and contributions, showing how a classical estimation technique continues to play a vital role in advancing modern artificial intelligence.
This work is organized as follows: In Section 2, the concepts underlying the deduction of the linear Kalman filter and its consequent first-order analytical approximation (EKF) are presented; then, different variants of the KF are described for both linear and nonlinear systems, based on both analytical and numerical approximations, with references to several key applications in the literature. In Section 3, the training problem of neural networks is presented as an optimization problem, from which the applicability of the KF and its variants to the training of neural networks is derived, including a review of the literature corresponding to different implementations. This is exemplified in Section 4 by developing the training of three types of neural networks widely used for function approximation (multilayer perceptron neural networks, radial basis neural networks, and high-order recurrent neural networks), highlighting the elements relevant to the design of KF-based algorithms and their variants for the learning of artificial neural networks. In Section 5, the challenges, limitations, open problems, and future work related to the use of the KF and its variants in the learning of neural networks are discussed. Finally, in Section 6, the conclusions of the review are established.

2. Main Variants of Kalman Filter

In this section, concepts underlying the deduction of the linear Kalman filter and its consequent variants are presented, as well as their basis, evolution, and current applications. Since the proposal of the Kalman filter as the solution to the optimal estimation problem in both continuous-time and discrete-time linear dynamic systems, its applicability has been evident. The motivation and mathematical formulation of the KF were already discussed in the previous section.
In 1967 [22,23], one of the most widely used variants of KF was proposed: the extended Kalman filter (EKF). The motivation for this variant was to apply KF methodology in nonlinear systems, approximated by a first-order Taylor series. Since highly nonlinear systems are only approximated, the approximation error can lead to estimation errors. For this reason, EKF cannot be considered an optimal estimator, unlike the original KF. The EKF linearizes nonlinear functions around the current estimated state. Its motivation is justified because nonlinear systems are prevalent in real-world applications where nonlinearities arise due to the construction characteristics of the components in real-world systems, as well as the nature of the systems themselves [5]. Despite the estimation errors caused by linear approximation, this variant remains widely used due to its straightforward and clear mathematical formulation. Another drawback of this variant results from the computation of Jacobians due to the system linearization, which must be evaluated in each state-space point. This results in greater design complexity of the EKF compared to KF, in addition to an increase in computational complexity. These issues have led to various improvements to EKF, such as the robust distributed EKF [24], adaptive square root EKF [25], and lightweight EKF [26].
In 1997, Simon Julier and Jeffrey Uhlmann [8] proposed another variant of KF capable of handling nonlinearities more precisely. This motivation arose from the approximation errors caused by the linearization method used in the EKF. This new variant introduced the unscented transformation, leading to the unscented Kalman filter (UKF). The UKF has better performance for highly nonlinear systems and does not require the computation of Jacobians. However, it does require the selection of sample points in a probability distribution. Furthermore, its computational complexity is significantly higher, which is why it cannot be implemented in applications that require real-time processing [27]. To address these drawbacks, some modifications have been proposed; for example, ref. [28] embedded a new deterministic sampling point set into the UKF framework, and in [18], a new exponential attenuation factor was designed according to changes in noise variance.
In 2009, Arasaratnam and Haykin, ref. [29], introduced another variant of KF to address nonlinearities: the cubature Kalman filter (CKF). The motivation of the CKF lies in selecting a spherical volume of radial type and then using a set of generated cubature points based on sampling to approximately estimate the state of a nonlinear system.
All these KF variants have been developed primarily for nonlinear dynamic systems. These variants, as well as the original KF, are known as white-box models since their design is based on a system model described in state variables. However, for more complex applications, black box models are now commonly used. These models are often obtained experimentally through the acquisition of data representing the behavior of the system under certain conditions. For example, in [30], their use in data science is considered.
Other notable KF variants include the information filter (or inverse Kalman filter), which operates in the information space rather than the state space, using information matrices instead of covariance matrices; the H-infinity filter, designed for systems with unknown uncertainties; the federated Kalman filter, developed for distributed systems; and the non-Gaussian Kalman filter with adaptive noise, intended for highly nonlinear systems with varying probability distributions. Another family consists of KF variants with learning; these variants emphasize applications that require experimental modeling, that is, data-driven models, as in the work presented by [31] in a simulation data-driven context. In this case, two main groups can be differentiated: in the first group, the KF is hybridized with machine learning or artificial intelligence techniques, mainly artificial neural networks, while the second group involves neural networks in the iterative process of the KF, either for parameter estimation or for other internal processes.
Representative examples include [32] and [33], which apply neural networks for nonlinear modeling; [34], which uses neural networks for time series prediction; and [35], which uses a KF for parameter estimation and state prediction. Figure 1 shows the main variants of the KF described in this section.

3. Neural Network Learning

The advent of artificial neural networks dates back to the work of McCulloch and Pitts in 1943 [36], who formulated a mathematical model of a neural network based on observations of biological neuron behavior made by Santiago Ramón y Cajal in 1911 [37]. However, this model did not include any learning mechanism. It was not until Rosenblatt's perceptron of 1958 [38] that a learning rule for an artificial neural network was established for the first time. However, this neural network was very limited in terms of its applications. In 1961, Widrow and Hoff [39] addressed the problem of learning in artificial neural networks with supervised learning as an optimization problem. Their approach expanded the applications of neural networks, as well as the possible structures and topologies, marking the beginning of artificial neural networks and machine learning as we conceive them today [27,40,41].
Considering neural network learning as an optimization problem allows the use of different optimization and parametric estimation methods in training. From this perspective, the KF can be considered for neural network training, with the specific KF variant depending on the type of learning and the information available to the network [5]. In this sense, an excellent review of the use of the KF in neural network training can be found in Haykin's book [41]. In the present work, a review of the KF is given, followed by the presentation of various implementations of the KF in the learning process of neural networks, most of them for offline learning, as shown in Table 1 and represented graphically in Figure 2.
Nowadays, the need to perform real-time learning has increased, which highlights the use of the KF for training artificial neural networks. In addition to its recursive nature, the KF reduces the need for preprocessing by working directly with noisy, uncertain, and non-stationary signals [42]. It is important to note that the KF is considered a second-order gradient optimization algorithm, which reduces the probability of falling into local minima, as shown in [43,44,45]. Furthermore, in [43], it is proved that the Kalman filter is a global observer for linear (discrete-time) time-varying systems, and the EKF is then shown to act as a quasi-local asymptotic observer for discrete-time nonlinear systems. In [46], a backpropagation training algorithm is shown to be three orders of magnitude less computationally expensive than the EKF in terms of the number of floating-point operations. However, in [47], a decoupled extended Kalman filter is proposed in order to decrease the computational effort of the learning algorithm; in its analysis, ref. [47] demonstrates the computational superiority of KF learning algorithms over gradient descent ones. Also, in [41], different approaches are proposed to reduce the computational complexity of implementations of KF variants for training neural networks offline and online, whereas [48] demonstrated that KF algorithms converge in fewer iterations than backpropagation for the same neural network configurations. In [42], an analysis is carried out that gives insight into the convergence mechanisms, showing that, with a modification of the algorithm, global convergence results can be achieved for general cases. The scheme can then be interpreted either as maximization of the likelihood estimation or as a recursive prediction error algorithm. Additionally, ref. [49], using a Lyapunov stability approach, showed that KF-based learning achieves faster convergence than traditional algorithms. Similarly, the work performed by [43] establishes specific conditions for improved convergence of the KF as a learning algorithm.
This characteristic has also been exploited in machine learning to improve the performance of the Adam optimizer, leading to the Kadam model, where the KF is used to estimate the first and second moments required by the algorithm [50]. Adam performs first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments [51], while the EKF can be used as a second-order gradient descent algorithm to estimate optimal weights for an RNN, as explained in [44,45], helping the EKF algorithm to avoid local minima.
It is worth highlighting in Table 1 the growing interest in real-time implementations. Such applications are particularly relevant for decentralized approaches as well as for embedded implementations, edge-computing approaches, and reduced-communication problems. Table 2 compares the reported processing times of different implementations, showing the suitability of EKF-based learning algorithms for online operation with small sampling times. The implementations included in Table 2 are aligned in terms of the elements considered for each scheme, allowing a meaningful comparison.
Table 1. KF variant neural network learning applications.
Author | Type of KF | Main Contribution | Learning | Examples
[52] | EKF | Real-time learning algorithm for a multilayered neural network | Online | Numerical
[53] | DEKF | Feedforward multilayered neural networks based on an EKF | Offline | Simulation
[54] | EKF | Real-time neural controller for three-phase induction motors | Online | Experimental
[55] | EKF, UKF | Sequential growing-and-pruning learning algorithm | Offline | Simulation
[41] | KF, EKF, UKF, DEKF | Kalman filtering as applied to the learning and use of neural networks | Offline, Online | Numerical, Simulation, Experimental
[5] | EKF, DEKF | Radial basis neural networks trained with an extended Kalman filter | Online | Simulation
[49] | EKF | State-space recurrent neural networks for nonlinear system identification | Offline | Simulation
[56] | KF | Q-learning with KF for action selection in cooperative control | Offline | Simulation
[57] | KF | Continuous state-space via Q-learning for Markov decision process | Offline | Numerical
[58] | KF | Online Sequential Extreme Learning Machine and Kalman filter regression | Offline | Simulation
[59] | KF | Kalman filter with iterative learning control | Offline | Simulation
[60] | EKF | Induction motor control, combining EKF with a fuzzy logic controller | Offline | Simulation
[61] | KF | Kalman filter and temporal differencing | Offline | Simulation
[21] | EKF | Real-time neural controller for autonomous robotic navigation | Online | Experimental
[62] | KF | Kalman filter to update weights of a single-layer feedforward network | Offline | Simulation
[63] | KF | Q-learning represented in the framework of a Kalman filter model | Offline | Simulation
[64] | KF | NN-based learning modules to update a Kalman filter for estimation | Offline | Simulation
[35] | DEKF | Charge estimation for lithium-ion batteries | Online | Experimental
[65] | KF | KF learning for stochastic claims reserving | Offline | Simulation
[66] | KF | Federated Kalman filters | Offline | Simulation
[67] | KF | Q-learning approach with Kalman filter for a self-balancing robot | Offline | Simulation
[68] | EKF | State estimation algorithm combining the EKF and a Q-learning method | Offline | Simulation
[69] | KF | Extreme learning Kalman filter for NNs | Offline | Simulation
[70] | KF | Kalman filtering with a dedicated recurrent neural network | Online | Numerical
[71] | KF | KF combined with a NN to predict transaction throughput in a blockchain | Offline | Experimental
[72] | KF | Reinforcement learning adaptive KF for autoregressive signal modeling | Online | Experimental
[73] | EKF | Continuous action learning automata for tuning of a Kalman filter | Offline | Experimental
[74] | KF | KF agents operating sequentially to estimate the optimal learning rate | Offline | Simulation
[75] | KF | Kalman filter-based cycle-consistent adversarial learning framework for time series | Offline | Simulation
[76] | EKF | Neural controller applied to an auxiliary energy system for electric vehicles | Online | Experimental
[77] | EKF | Real-time fault-tolerant closed-loop neural controller | Online | Experimental
[78] | KF | A neural network combined with a robust KF | Offline | Simulation
Table 2. Comparison for real-time implementation of neural network learning-based algorithms.
Work | Application | Processing Hardware | Processing Time
[54] | Three-phase induction motor | DSP DS1104 | 1 ms
[79] | Mobile robot | FPGA Cyclone IV, DE2-115 | 14 μs
[21] | Mobile robot | DS1104 | 1 ms
[80] | Smart grid | LAUNCHXL-F28379D | 0.5 ms
[77] | Three-phase induction motor | DS1104 | 1 ms
Lastly, the KF has a rigorous stability analysis, which allows its initialization to be established according to analytically defined stability and convergence conditions [41,43]. In this sense, the literature shows a large number of successful implementations of the KF for training neural networks, applied mainly to classification, control [6,22,23,54,60], energy [25,35,81,82], estimation [26,69,82,83], forecasting [65,74,84,85], robotics [9,21,56,67], and NN training, as presented in Table 1; these applications, together with their combinations and interrelationships, are depicted in Figure 2. Therefore, in the next section, neural learning is described from an optimization point of view, and a solution obtained within the KF framework is analyzed.

4. Kalman Filter for Neural Network Learning

In this section, the use of the Kalman filter for neural network learning is introduced as an application of its properties as a state estimator. This approach has been applied by several authors to different types of neural networks. In this work, only the multilayer perceptron (MLP), radial basis, and recurrent high-order neural networks (RHONN) are considered, whose learning processes are solved with the KF approach. Similar learning approaches for other types of neural networks can be found in the literature; however, the three types of neural networks considered in this work have been experimentally implemented in real time, as shown in Table 2 [5,21,54,76,77].
As previously explained, the KF was formulated for a linear dynamic system in state-space to provide a solution to the linear optimal filtering problem. This solution applies to both stationary and non-stationary environments. Also, as mentioned before, the solution is recursive, meaning that each update of the estimated state is calculated using the previous estimate and new input data, requiring only the previous estimate to be stored. This also implies that storing all past data is not necessary. Let us now consider a linear dynamical system in discrete time, as depicted in Figure 3.

4.1. Concepts Prior to KF

Before discussing the formulation of the Kalman filter, it is important to contextualize other significant results.

Optimal Estimation

First, let us review the fundamental concepts of optimal estimation. Consider the following equation:
$y_k = w_k + v_k$
where $w_k$ is an unknown signal and $v_k$ is additive Gaussian noise. The a posteriori estimate of the signal $w_k$, given the measurements $y_1, y_2, \ldots, y_k$, is denoted $\hat{w}_k$. Typically, the estimate $\hat{w}_k$ differs from the unknown signal $w_k$.
The first step for an optimization problem is to define a cost (loss) function, which must satisfy the following requirements:
  • The cost function is non-negative.
  • The cost function is a non-decreasing function of the estimation error, defined by:
$\tilde{w}_k = w_k - \hat{w}_k$
These two requirements are satisfied by the expected square error, defined by:
$J_k = E\{ (w_k - \hat{w}_k)^2 \} = E\{ \tilde{w}_k^2 \}$
where $E$ is the expected value operator. The cost function $J_k$ depends on the sample instant $k$, which emphasizes the non-stationary nature of the recursive estimation process. To deduce the optimal value of the estimate $\hat{w}_k$, the following theorems are required [13].
Theorem 1.
Conditional expectation estimator. If the stochastic processes $w_k$ and $y_k$ are jointly Gaussian, then the optimal estimate $\hat{w}_k$ that minimizes the mean square error $J_k$ is the conditional expectation:
$\hat{w}_k = E\{ w_k \mid y_1, y_2, \ldots, y_k \}$
Theorem 2.
Orthogonality principle. Let $w_k$ and $y_k$ be stochastic processes with zero mean, such that:
$E\{ w_k \} = E\{ y_k \} = 0 \quad \forall k$
If either:
i.
the stochastic processes $w_k$ and $y_k$ are Gaussian, or
ii.
the optimal estimate $\hat{w}_k$ is restricted to be a linear function of the measurements $y_1, y_2, \ldots, y_k$ and the cost function is the mean square error,
then the optimal estimate $\hat{w}_k$, given the measurements $y_1, y_2, \ldots, y_k$, is the projection of $w_k$ onto the space generated by those measurements; equivalently, the estimation error $\tilde{w}_k$ is orthogonal to that space.
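As a concrete illustration of Theorems 1 and 2, the following minimal Python sketch (an illustrative numerical check added here, with assumed signal and noise variances) estimates a zero-mean Gaussian signal from the scalar measurement model above and verifies that the conditional-mean estimate produces an error orthogonal to the measurement:
    import numpy as np

    # Scalar example of y_k = w_k + v_k with zero-mean Gaussian w_k and v_k (variances assumed).
    rng = np.random.default_rng(0)
    sigma_w2, sigma_v2 = 4.0, 1.0
    N = 100_000
    w = rng.normal(0.0, np.sqrt(sigma_w2), N)      # unknown signal w_k
    v = rng.normal(0.0, np.sqrt(sigma_v2), N)      # additive Gaussian noise v_k
    y = w + v                                      # measurement y_k

    # Conditional-mean (MMSE) estimate for jointly Gaussian w and y (Theorem 1):
    # E{w | y} = sigma_w^2 / (sigma_w^2 + sigma_v^2) * y
    w_hat = sigma_w2 / (sigma_w2 + sigma_v2) * y
    err = w - w_hat

    print("mean square error J:", np.mean(err ** 2))   # close to sigma_w2 * sigma_v2 / (sigma_w2 + sigma_v2)
    print("E{(w - w_hat) y}  :", np.mean(err * y))     # close to 0: the error is orthogonal to the data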

4.2. Kalman Filter Realization

  • State-space model:
    $w(k+1) = F(k+1, k)\, w(k) + u(k)$
    $y(k) = H(k)\, w(k) + v(k)$
    where $u(k)$ and $v(k)$ are independent Gaussian noises with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively.
  • Initialization:
    $\hat{w}(0) = E\{ w(0) \}$
    $P(0) = E\{ [w(0) - E\{ w(0) \}][w(0) - E\{ w(0) \}]^T \}$
  • Propagation of the estimated state:
    $\hat{w}^-(k) = F(k, k-1)\, \hat{w}(k-1)$
  • Propagation of the estimation error covariance:
    $P^-(k) = F(k, k-1)\, P(k-1)\, F^T(k, k-1) + Q(k-1)$
  • Kalman gain matrix:
    $K(k) = P^-(k) H^T(k) \left[ R(k) + H(k) P^-(k) H^T(k) \right]^{-1}$
  • State estimation update:
    $\hat{w}(k) = \hat{w}^-(k) + K(k) \left( y(k) - H(k) \hat{w}^-(k) \right)$
  • Estimation error covariance update:
    $P(k) = \left( I - K(k) H(k) \right) P^-(k)$
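The recursion above can be summarized in a few lines of code. The following minimal numpy sketch implements one Kalman filter step using the a priori/a posteriori quantities defined above; the scalar constant-signal example and its noise covariances are illustrative assumptions, not taken from the review:
    import numpy as np

    def kalman_step(w_hat, P, y, F, H, Q, R):
        """One step of the linear Kalman filter recursion given above."""
        # Propagation (a priori estimate and covariance)
        w_prior = F @ w_hat
        P_prior = F @ P @ F.T + Q
        # Kalman gain
        K = P_prior @ H.T @ np.linalg.inv(R + H @ P_prior @ H.T)
        # Measurement update (a posteriori estimate and covariance)
        w_hat = w_prior + K @ (y - H @ w_prior)
        P = (np.eye(P.shape[0]) - K @ H) @ P_prior
        return w_hat, P

    # Illustrative constant-signal example (all values assumed):
    F = np.eye(1); H = np.eye(1); Q = 1e-4 * np.eye(1); R = 0.5 * np.eye(1)
    w_hat, P = np.zeros(1), np.eye(1)
    rng = np.random.default_rng(1)
    for k in range(50):
        y = np.array([2.0]) + rng.normal(0.0, np.sqrt(0.5), 1)   # noisy measurement of w = 2
        w_hat, P = kalman_step(w_hat, P, y, F, H, Q, R)
    print(w_hat)   # converges towards 2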
The Kalman filter described earlier assumes a linear model of a dynamic system, as can be seen in Figure 4. However, in most cases the model is nonlinear; next, the use of the KF is extended through a linearization procedure, resulting in the EKF. Such an extension is feasible since the KF is described in terms of difference equations for discrete-time systems.

4.3. Extended Kalman Filter

  • State-space model for discrete-time nonlinear systems:
    $w(k+1) = f(k, w(k)) + u(k)$
    $y(k) = h(k, w(k)) + v(k)$
    where $u(k)$ and $v(k)$ are independent Gaussian noises with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively.
  • Initialization:
    $\hat{w}(0) = E\{ w(0) \}$
    $P(0) = E\{ [w(0) - E\{ w(0) \}][w(0) - E\{ w(0) \}]^T \}$
The basic idea of the EKF is to linearize the state-space model at each time instant around the most recent state estimate, which can be taken as the a posteriori estimate $\hat{w}(k)$ or the a priori estimate $\hat{w}^-(k)$, as appropriate. Once the linearized model is obtained, the KF recursion can be applied. The linearization of the discrete-time nonlinear system is defined by the Jacobians:
$F(k+1, k) = \left. \dfrac{\partial f(k, w(k))}{\partial w} \right|_{w = \hat{w}(k)}$
$H(k) = \left. \dfrac{\partial h(k, w(k))}{\partial w} \right|_{w = \hat{w}^-(k)}$
The EKF realization is given by:
  • Propagation of the estimated state:
    $\hat{w}^-(k) = F(k, k-1)\, \hat{w}(k-1)$
  • Propagation of the estimation error covariance:
    $P^-(k) = F(k, k-1)\, P(k-1)\, F^T(k, k-1) + Q(k-1)$
  • Kalman gain matrix:
    $K(k) = P^-(k) H^T(k) \left[ R(k) + H(k) P^-(k) H^T(k) \right]^{-1}$
  • State estimation update:
    $\hat{w}(k) = \hat{w}^-(k) + K(k) \left( y(k) - H(k) \hat{w}^-(k) \right)$
  • Estimation error covariance update:
    $P(k) = \left( I - K(k) H(k) \right) P^-(k)$
Then, following the conditional expectation estimator for the KF, it is possible to define:
$\hat{w}_k^- = E\{ w_k \mid y_1, \ldots, y_{k-1} \}$
with $w_k = F_{k,k-1} w_{k-1} + u_{k-1}$, so that
$\hat{w}_k^- = E\{ (F_{k,k-1} w_{k-1} + u_{k-1}) \mid y_1, \ldots, y_{k-1} \} = F_{k,k-1} E\{ w_{k-1} \mid y_1, \ldots, y_{k-1} \} + E\{ u_{k-1} \mid y_1, \ldots, y_{k-1} \}$
By definition:
$E\{ w_{k-1} \mid y_1, \ldots, y_{k-1} \} = \hat{w}_{k-1}$
$E\{ u_{k-1} \mid y_1, \ldots, y_{k-1} \} = E\{ u_{k-1} \} = 0$
Therefore, the best a priori estimate of $w_k$ based on the available measurements is:
$\hat{w}_k^- = F_{k,k-1} \hat{w}_{k-1}$
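For completeness, the EKF recursion can be sketched in the same style as the linear case. In the sketch below, the Jacobian functions F_jac and H_jac are user-supplied callables; the update ordering follows the realization given above, and the interface itself is an illustrative assumption:
    import numpy as np

    def ekf_step(w_hat, P, y, F_jac, H_jac, Q, R, k):
        """One EKF step: linearize around the most recent estimates and apply the KF recursion."""
        # Propagation using the Jacobian of f evaluated at the a posteriori estimate
        F = F_jac(k, w_hat)
        w_prior = F @ w_hat
        P_prior = F @ P @ F.T + Q
        # Measurement update using the Jacobian of h evaluated at the a priori estimate
        H = H_jac(k, w_prior)
        K = P_prior @ H.T @ np.linalg.inv(R + H @ P_prior @ H.T)
        w_hat = w_prior + K @ (y - H @ w_prior)
        P = (np.eye(P.shape[0]) - K @ H) @ P_prior
        return w_hat, P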

4.4. Relevant Results on KF for Neural Network Learning

From the previous explanation, it is easy to see that KF can be used to train artificial neural networks, as described in this section.

4.4.1. Comparison Between Kalman Filter and Recursive Least Squares Algorithm

It is well known that most supervised neural network models are trained with the recursive least squares (RLS) algorithm, whose equations are:
$P(k) = P(k-1) - \dfrac{P(k-1)\varphi(k)\varphi^T(k)}{1 + \varphi^T(k) P(k-1) \varphi(k)}\, P(k-1)$
$\hat{\Theta}(k) = \hat{\Theta}(k-1) + P(k)\varphi(k)\left[ y(k) - \varphi^T(k)\hat{\Theta}(k-1) \right]$
where $\Theta$ is the vector of estimated parameters and $\varphi$ is the regression vector. On the other hand, the KF covariance equation can be written as:
$P(k+1) = P(k) - P(k) H^T \left[ H P(k) H^T + R \right]^{-1} H P(k) + Q$
Then, with $Q = 0$, the following is obtained:
$P(k+1) = P(k) - P(k) H^T \left[ H P(k) H^T + R \right]^{-1} H P(k)$
With $\Theta(k+1) = \Theta(k)$, it follows that:
$P(k) = P(k-1) - P(k-1) H^T \left[ H P(k-1) H^T + R \right]^{-1} H P(k-1)$
and, considering $H = \varphi^T(k)$:
$P(k) = P(k-1) - P(k-1)\varphi(k)\left[ \varphi^T(k) P(k-1) \varphi(k) + R \right]^{-1} \varphi^T(k) P(k-1)$
Finally, with $R = 1$, the covariance update of the recursive least squares algorithm is recovered:
$P(k) = P(k-1) - \dfrac{P(k-1)\varphi(k)\varphi^T(k)}{1 + \varphi^T(k) P(k-1) \varphi(k)}\, P(k-1)$
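This equivalence is easy to verify numerically by running both covariance recursions side by side with Q = 0, R = 1, and H(k) = φ^T(k); the random regressor sequence below is only an illustrative assumption:
    import numpy as np

    rng = np.random.default_rng(2)
    L = 3
    P_rls = np.eye(L)    # RLS covariance
    P_kf = np.eye(L)     # KF covariance with Q = 0, R = 1 and time-varying H(k) = phi^T(k)

    for k in range(20):
        phi = rng.normal(size=(L, 1))                    # regression vector phi(k)
        # RLS covariance update
        P_rls = P_rls - (P_rls @ phi @ phi.T @ P_rls) / (1.0 + phi.T @ P_rls @ phi)
        # KF covariance update with H = phi^T, R = 1, Q = 0
        H = phi.T
        P_kf = P_kf - P_kf @ H.T @ np.linalg.inv(H @ P_kf @ H.T + 1.0) @ H @ P_kf

    print(np.allclose(P_rls, P_kf))   # True: both recursions coincide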

4.4.2. Backpropagation Versus EKF

The backpropagation algorithm, based on the gradient descent rule [46], is the most widely used learning algorithm for supervised multilayer neural networks [48], including deep learning models. Its weight update is:
$\Delta w_{ij} = -\eta \dfrac{\partial E_{av}}{\partial w_{ij}}$
$E_{av} = \dfrac{1}{2}\left( d_p - z_p \right)^2$
where $\eta$ is the learning rate parameter, $d_p$ is the $p$-th desired output, and $z_p$ is the $p$-th neural network (approximated) output. Then:
$\dfrac{\partial E_{av}}{\partial w_{ij}} = -\left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
so that:
$\Delta w_{ij} = \eta \left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
In the EKF algorithm, the following is defined:
$\Delta w = K \left( d_p - z_p \right)$
$K = P H \left[ H^T P H + R \right]^{-1}$
$\Delta w = P H \left[ H^T P H + R \right]^{-1} \left( d_p - z_p \right)$
Considering that $P = p I$ and $\left[ H^T P H + R \right]^{-1} = a I$, where $a$ and $p$ are scalar values, then, for a particular weight $w_{ij}$, $\Delta w_{ij} = a\, p \left( d_p - z_p \right) h_{ij}$ is obtained, where $h_{ij} = \dfrac{\partial z_p}{\partial w_{ij}}$:
$\Delta w_{ij} = a\, p \left( d_p - z_p \right) \dfrac{\partial z_p}{\partial w_{ij}}$
Note that this is the delta rule used in the backpropagation algorithm with $\eta = a\, p$. Therefore, the backpropagation algorithm can be considered a special case of the EKF. Since it is a simplification of the EKF, the backpropagation algorithm may discard information that is important during the weight update [46].

4.5. Multilayer Perceptron Trained with EKF

It is well known that the MLP is one of the most widely used neural networks for dealing with complex nonlinear data, traditionally trained with backpropagation-type and optimizer-type algorithms [41]; most of these learning implementations work in an offline mode using epoch or batch approaches (see Table 1). In addition, various researchers have proposed learning algorithms for neural networks based on the EKF; this kind of learning approach improves the learning process without the need for heavy preprocessing tasks. These applications treat the training of an MLP as an estimation problem for a nonlinear system, which can be solved with the EKF. EKF-based algorithms are recursive, as is the gradient descent algorithm; recursive least squares algorithms, backpropagation algorithms, and others can be considered specific cases of the KF, as shown in [46].
In the application of the KF to MLP learning, the weight vector is considered as the state to be estimated by the Kalman filter, and the neural network output is the measurement used by the KF. The use of the EKF is necessary since the MLP is a nonlinear system. When neural network training is posed as an optimal filtering problem, the need for a recursive solution that uses current information and does not require storing the entire weight evolution becomes evident. This is the essence of the KF applied to training neural networks.
Estimation is performed recursively, meaning that each update of the estimate (synaptic weights) is calculated using the previous estimate and current data, requiring only the previous estimate to be stored. Training is conducted from a set of N input–output measurements with the objective of finding the optimal weights that minimize prediction error. It is well known that neural networks can be modeled as:
$\hat{w}(k+1) = \hat{w}(k)$
$\hat{y}(k) = \varphi\left( \hat{w}(k), u(k) \right)$
where $\varphi(\cdot)$ is a nonlinear function of the neural weights $\hat{w}$ and the external input $u$.
According to this, the equations necessary for the development of the learning algorithm based on EKF are:
$K(k) = P(k) H^T(k) \left[ R(k) + H(k) P(k) H^T(k) \right]^{-1}$
$w(k+1) = w(k) + K(k) \left[ y(k) - \hat{y}(k) \right]$
$P(k+1) = P(k) - K(k) H(k) P(k) + Q(k)$
where $P(k) \in \mathbb{R}^{L \times L}$ is the estimation error covariance matrix at time $k$, $w \in \mathbb{R}^{L}$ is the state vector (weight vector), $L$ is the number of neural network weights (the dimension of the weight vector), $y \in \mathbb{R}^{m}$ is the vector of desired outputs, $\hat{y} \in \mathbb{R}^{m}$ is the vector of neural network outputs, where $m$ is its dimension, $K \in \mathbb{R}^{L \times m}$ is the Kalman gain, $Q \in \mathbb{R}^{L \times L}$ is the covariance matrix of the weight estimation noise, $R \in \mathbb{R}^{m \times m}$ is the covariance matrix of the measurement noise, and $H \in \mathbb{R}^{m \times L}$ is a matrix containing the derivative of each neural network output with respect to each weight:
$H_{ij}(k) = \left. \dfrac{\partial \hat{y}_i(k)}{\partial w_j(k)} \right|_{w(k) = \hat{w}(k-1)}, \quad i = 1, \ldots, m; \; j = 1, \ldots, L$
Typically, $P$, $R$, and $Q$ are initialized as diagonal matrices whose non-zero elements are defined as $P(0)$, $R(0)$, and $Q(0)$, respectively. Consider the MLP depicted in Figure 5, with $p$ inputs and $h$ hidden neurons; its weight vector is defined as:
$w = \left[\, w_{10}^{(1)} \;\cdots\; w_{1p}^{(1)} \;\; w_{20}^{(1)} \;\cdots\; w_{hp}^{(1)} \;\; w_{10}^{(2)} \;\cdots\; w_{1h}^{(2)} \,\right]^{T}$
The total number of weight elements, $L$, is defined as:
$L = (p+1)\, h + (h+1)\, m$
where $m$ is the number of neural network outputs. For the MLP of Figure 5, the hidden and output layers are computed as:
$\sigma_i = \dfrac{1}{1 + e^{-n_i}}, \quad i = 1, \ldots, h$
$n_i = \sum_{j=0}^{p} w_{ij}^{(1)} x_j, \quad x_0 = +1$
$v_1 = \sum_{k=0}^{h} w_{1k}^{(2)} \sigma_k, \quad \sigma_0 = +1$
$\hat{y} = v_1$
Then:
$\dfrac{\partial \hat{y}}{\partial w} = \left[\, \dfrac{\partial \hat{y}}{\partial w_{10}^{(1)}} \;\; \dfrac{\partial \hat{y}}{\partial w_{11}^{(1)}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial w_{1h}^{(2)}} \,\right]$
Therefore:
$H = \left[\, \gamma(n_1) x_0 \;\cdots\; \gamma(n_1) x_p \;\cdots\; \gamma(n_h) x_p \;\; \sigma_0 \;\cdots\; \sigma_h \,\right]$
with
$\gamma(n_i) = w_{1i}^{(2)} \dfrac{e^{-n_i}}{\left( 1 + e^{-n_i} \right)^{2}}, \quad i = 1, \ldots, h$
which is valid for the MLP depicted in Figure 5 with sigmoid activation functions for the hidden neurons and a linear activation function for the output neuron.
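The following minimal Python sketch puts the pieces of this subsection together: the MLP of Figure 5 (sigmoid hidden layer, linear output), the row vector H built from γ(n_i) and σ_k, and the EKF weight update. The toy target function, the network size, and the initialization values are illustrative assumptions:
    import numpy as np

    def mlp_forward(w, x, h):
        """MLP of Figure 5: p inputs, h sigmoid hidden units, one linear output.
        w packs the first-layer weights W1 (h x (p+1)) followed by the output weights w2 (h+1,)."""
        p = x.size
        W1 = w[: h * (p + 1)].reshape(h, p + 1)
        w2 = w[h * (p + 1):]
        xb = np.concatenate(([1.0], x))                  # x_0 = +1 (bias input)
        n = W1 @ xb                                      # hidden pre-activations n_i
        sigma = 1.0 / (1.0 + np.exp(-n))                 # sigmoid hidden outputs sigma_i
        sb = np.concatenate(([1.0], sigma))              # sigma_0 = +1 (bias for output layer)
        y_hat = w2 @ sb                                  # linear output neuron
        # Row vector H = d y_hat / d w, with gamma(n_i) = w_1i^(2) * sigma'(n_i)
        gamma = w2[1:] * np.exp(-n) / (1.0 + np.exp(-n)) ** 2
        H = np.concatenate([np.outer(gamma, xb).ravel(), sb])
        return y_hat, H

    def ekf_train_step(w, P, x, y, h, Q, R):
        """One EKF update of the MLP weights (the weights are the state to be estimated)."""
        y_hat, Hrow = mlp_forward(w, x, h)
        H = Hrow.reshape(1, -1)
        K = P @ H.T / (R + H @ P @ H.T)                  # Kalman gain (single output, scalar innovation)
        w = w + (K * (y - y_hat)).ravel()                # weight update
        P = P - K @ H @ P + Q                            # covariance update
        return w, P

    # Illustrative use on an assumed toy target y = sin(x):
    rng = np.random.default_rng(3)
    p, hidden = 1, 5
    L = (p + 1) * hidden + (hidden + 1)
    w = 0.1 * rng.standard_normal(L)
    P, Q, R = 100.0 * np.eye(L), 1e-6 * np.eye(L), 1.0
    for _ in range(2000):
        x = rng.uniform(-np.pi, np.pi, size=1)
        w, P = ekf_train_step(w, P, x, np.sin(x[0]), hidden, Q, R)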

4.6. Recurrent High-Order Neural Network Trained with EKF

As the third type of neural network considered in this work, a recurrent high-order neural network (RHONN) is used to show the applicability of the EKF as an algorithm for training artificial neural networks online, exploiting the stochastic estimation capabilities of the EKF; this type of neural network is based on a Hopfield-type neural network [86]. Recurrent neural networks offer a better-suited tool to model data from dynamic environments [54]. Since the seminal paper [87], there has been continuously increasing interest in applying NNs to the identification and control of nonlinear systems, especially RHONNs, due to their excellent approximation capabilities using few units. Compared to first-order NNs, these networks are more flexible and robust when faced with new and/or noisy data patterns [88]. Furthermore, RHONNs have performed better than multilayer first-order networks while using a small number of free parameters [89]. Additionally, different authors have demonstrated the feasibility and advantages of using these architectures in system identification and control applications. The best-known training approach for RNNs is backpropagation through time [48]. However, it is a first-order gradient-descent method, and hence its learning speed can be very slow. More recently, extended Kalman filter (EKF)-based algorithms have been introduced to train NNs in order to improve the learning convergence [41]. Next, an EKF-based learning algorithm for the RHONN is derived.
Consider the EKF-based learning algorithm given by the Kalman gain, weight update, and covariance update equations of Section 4.5, with:
$H_{ij}(k) = \left. \dfrac{\partial \hat{y}_i(k)}{\partial w_j(k)} \right|_{w(k) = \hat{w}(k-1)}, \quad i = 1, \ldots, m; \; j = 1, \ldots, L$
For a discrete-time RHONN used as an identifier:
$\hat{y}(k) = \hat{x}(k) = w^T(k)\, z\left( x(k), u(k) \right)$
Therefore:
$H(k) = \dfrac{\partial \hat{y}(k)}{\partial w} = z\left( x(k), u(k) \right)$
Since the initial weight values do not depend on the RHONN state:
$H_{ij}(0) = 0$
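A corresponding sketch for the RHONN identifier is shown below; since H(k) is simply the regressor z(x(k), u(k)), the EKF update takes a particularly compact form. The specific high-order terms chosen for z are an illustrative assumption:
    import numpy as np

    def rhonn_ekf_step(w, P, x, u, x_next, Q, R):
        """EKF weight update for a single-state RHONN identifier x_hat = w^T z(x(k), u(k))."""
        s = 1.0 / (1.0 + np.exp(-x))                 # sigmoid of the measured state
        z = np.array([s, s ** 2, u, 1.0])            # illustrative high-order regressor z(x, u)
        x_hat = w @ z                                # RHONN output (identifier prediction)
        H = z.reshape(1, -1)                         # H(k) = d x_hat / d w = z(x(k), u(k))
        K = P @ H.T / (R + H @ P @ H.T)
        w = w + (K * (x_next - x_hat)).ravel()       # update driven by the identification error
        P = P - K @ H @ P + Q
        return w, P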

4.7. Radial Basis Neural Network Trained with an EKF

From the literature, it is evident that another of the most widely used neural networks for dealing with complex nonlinear data is the radial basis function (RBF) neural network. This type of neural network has a simple structure that allows its use without requiring expertise in neural network learning. Most of its learning algorithms are designed for offline learning; however, in [47], an online learning algorithm based on an EKF is proposed. In this application, the neural network parameters are the state variables estimated by the EKF. Consider the radial basis neural network (RBF) depicted in Figure 6:
$x = \left[\, x_1 \;\; x_2 \;\cdots\; x_p \,\right]^T$
$c_i = \left[\, c_{i1} \;\; c_{i2} \;\cdots\; c_{ip} \,\right]^T, \quad i = 1, \ldots, m$
With $p$ inputs and $m$ hidden neurons, the RBF parameters can be collected as:
$\theta = \left[\, w_1 \;\cdots\; w_m \;\; c_{11} \;\cdots\; c_{1p} \;\; c_{21} \;\cdots\; c_{mp} \,\right]^T$
$\hat{y} = \sum_{j=1}^{m} w_j\, G\left( \| x - c_j \| \right) = W G$
$\dfrac{\partial \hat{y}}{\partial \theta} = \left[\, \dfrac{\partial \hat{y}}{\partial w_1} \;\cdots\; \dfrac{\partial \hat{y}}{\partial w_m} \;\; \dfrac{\partial \hat{y}}{\partial c_{11}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial c_{1p}} \;\; \dfrac{\partial \hat{y}}{\partial c_{21}} \;\cdots\; \dfrac{\partial \hat{y}}{\partial c_{mp}} \,\right]^T$
In this case, the total number of neural network parameters, $L$, is defined as:
$L = m + m p$
Therefore:
$\dfrac{\partial \hat{y}}{\partial w_1} = G\left( x, c_1 \right), \;\ldots,\; \dfrac{\partial \hat{y}}{\partial w_m} = G\left( x, c_m \right)$
$\dfrac{\partial \hat{y}}{\partial c_1} = w_1 \dfrac{\partial G\left( x, c_1 \right)}{\partial c_1}, \;\ldots,\; \dfrac{\partial \hat{y}}{\partial c_m} = w_m \dfrac{\partial G\left( x, c_m \right)}{\partial c_m}$
Considering a Gaussian basis function:
$G\left( x, c_i \right) = e^{-\frac{\| x - c_i \|^2}{2\sigma^2}} = e^{-\frac{1}{2\sigma^2}\left[ (x_1 - c_{i1})^2 + (x_2 - c_{i2})^2 + \cdots + (x_p - c_{ip})^2 \right]}$
$\dfrac{\partial G\left( x, c_i \right)}{\partial c_{ij}} = e^{-\frac{\| x - c_i \|^2}{2\sigma^2}}\, \dfrac{1}{2\sigma^2}\, 2\left( x_j - c_{ij} \right) = G\left( x, c_i \right) \dfrac{x_j - c_{ij}}{\sigma^2}$
where $i = 1, \ldots, m$ and $j = 1, \ldots, p$.
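These derivatives translate directly into code. The sketch below assembles the Jacobian row for all RBF parameters (output weights and centers, with a fixed width σ) and applies the EKF update of Section 4.4; the parameter packing order follows the definition of θ above, while the specific values are illustrative assumptions:
    import numpy as np

    def rbf_forward(theta, x, m, sigma):
        """RBF network of Figure 6: p inputs, m Gaussian units, one linear output.
        theta packs the output weights w (m,) followed by the centers C (m x p)."""
        p = x.size
        w = theta[:m]
        C = theta[m:].reshape(m, p)
        G = np.exp(-np.sum((x - C) ** 2, axis=1) / (2.0 * sigma ** 2))   # Gaussian activations G(x, c_i)
        y_hat = w @ G
        dG_dC = G[:, None] * (x - C) / sigma ** 2        # dG_i/dc_ij = G(x, c_i) * (x_j - c_ij) / sigma^2
        H = np.concatenate([G, (w[:, None] * dG_dC).ravel()])   # [dy/dw_i, dy/dc_ij] in the order of theta
        return y_hat, H

    def rbf_ekf_step(theta, P, x, y, m, sigma, Q, R):
        """One EKF update of all RBF parameters (output weights and centers)."""
        y_hat, Hrow = rbf_forward(theta, x, m, sigma)
        H = Hrow.reshape(1, -1)
        K = P @ H.T / (R + H @ P @ H.T)
        theta = theta + (K * (y - y_hat)).ravel()
        P = P - K @ H @ P + Q
        return theta, P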

Possible Modifications

  • Weight estimation only (arbitrary or fixed centers).
  • Global EKF.
  • Decoupled EKF.
    Centers decoupled from weights.
    Weights decoupled from centers and other weights.
  • Other combinations.
These examples of EKF use in neural network learning serve as the basis for the following discussion of challenges, limitations, open problems, and future work.

5. Challenges, Limitations, Open Problems and Future Work

Although the KF has advantages over traditional learning algorithms, primarily gradient descent algorithms, many challenges remain, such as the computational cost: using the EKF can be resource-intensive, which limits its real-time application on limited hardware. Initializing the learning algorithm also remains a challenge. Regarding implementation limitations, the nonlinear approach to neural network learning also requires a mathematical model for the analytical approximation of the EKF, while numerical approaches require training per epoch, batch, or semi-batch, which limits their real-time implementation.
As mentioned in this work, the use of the KF and its variants for training neural networks requires a meticulous design tailored to the neural structure used, so the establishment of general-purpose, real-time implementable strategies remains an open problem. Another open problem is the choice of KF initialization parameters, since these design parameters directly impact the algorithm's convergence conditions. Future work includes relevant engineering applications such as advanced robotics, autonomous vehicles, energy systems, and biomedical systems, as well as complex innovative problems, including edge computing and the hardware implementation of the KF and its variants. Likewise, publications regarding KF variants with a quantum approach are expected to increase in the coming years.

6. Conclusions

This work provides a practical review of the development of the Kalman filter and its application to neural network training, including filtering and estimation concepts prior to the KF, its main variants, and the modifications introduced to address nonlinear systems. Additionally, an introduction to the development of artificial neural networks and their inherent learning problems has been included. An analysis is also presented that highlights the potential of the KF as a learning algorithm for neural networks. Moreover, the training of three different types of artificial neural networks (MLP, RBF, and RHONN) using the EKF was included. This review demonstrates the extensive applicability of the KF in a wide range of applications, ranging from traditional applications, estimation, and optimal filtering of linear systems to applications that combine the use of the KF with artificial intelligence and machine learning techniques. Undoubtedly, in the coming years, an increasing number of implementations that combine the potential of the KF with emerging artificial intelligence and machine learning methodologies to solve real-world problems will emerge.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The author thanks CUCEI, Universidad de Guadalajara, for its support in the development of this work.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KF: Kalman Filter
EKF: Extended Kalman Filter
UKF: Unscented Kalman Filter
CKF: Cubature Kalman Filter
QKF: Quantum Kalman Filter
QEKF: Quantum Extended Kalman Filter
MLP: Multilayer Perceptron
RBF: Radial Basis Function
RHONN: Recurrent High-Order Neural Network

References

  1. Crassidis, J.L.; Junkins, J.L. Optimal Estimation of Dynamic Systems; Chapman and Hall/CRC: Boca Raton, FL, USA, 2004. [Google Scholar]
  2. Grewal, M.S.; Andrews, A.P. Kalman Filtering Theory and Practice Using MATLAB; Wiley: Hoboken, NJ, USA, 2023. [Google Scholar]
  3. Jazwinski, A.H. Stochastic Processes and Filtering Theory; Courier Corporation: North Chelmsford, MA, USA, 2013. [Google Scholar]
  4. Maybeck, P.S. Stochastic Models, Estimation, and Control; Academic Press: Cambridge, MA, USA, 1982; Volume 3. [Google Scholar]
  5. Simon, D. Optimal state Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  6. Panomruttanarug, B.; Longman, R.W. Using Kalman filter to attenuate noise in learning and repetitive control can easily degrade performance. In Proceedings of the 2008 SICE Annual Conference, Chofu, Japan, 20–22 August 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 3453–3458. [Google Scholar]
  7. Khodarahmi, M.; Maihami, V. A review on Kalman filter models. Arch. Comput. Methods Eng. 2023, 30, 727–747. [Google Scholar] [CrossRef]
  8. Julier, S.J.; Uhlmann, J.K. New extension of the Kalman filter to nonlinear systems. In Proceedings Volume 3068, Signal Processing, Sensor Fusion, and Target Recognition VI; SPIE: Bellingham, WA, USA, 1997; pp. 182–193. [Google Scholar]
  9. Urrea, C.; Agramonte, R. Kalman filter: Historical overview and review of its use in robotics 60 years after its creation. J. Sensors 2021, 2021, 9674015. [Google Scholar] [CrossRef]
  10. Kim, S.; Petrunin, I.; Shin, H.S. A review of Kalman filter with artificial intelligence techniques. In Proceedings of the 2022 Integrated Communication, Navigation and Surveillance Conference (ICNS), Dulles, VA, USA, 5–7 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–12. [Google Scholar]
  11. Poincaré, H. Les méThodes Nouvelles de la méCanique céLeste; Gauthier-Villars: Paris, France, 1892. [Google Scholar]
  12. Kalman, R.E. Contributions to the theory of optimal control. Bol. Soc. Mat. Mex. 1960, 5, 102–119. [Google Scholar]
  13. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  14. Wiener, N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series; MIT Press: Cambridge, MA, USA, 1964. [Google Scholar]
  15. Belavkin, V. Measurement, filtering and control in quantum open dynamical systems. Rep. Math. Phys. 1999, 43, A405–A425. [Google Scholar] [CrossRef]
  16. Emzir, M.F.; Woolley, M.J.; Petersen, I.R. A quantum extended Kalman filter. J. Phys. A: Math. Theor. 2017, 50, 225301. [Google Scholar] [CrossRef]
  17. Iida, S.; Ohki, K.; Yamamoto, N. Robust quantum Kalman filtering under the phase uncertainty of the probe-laser. In Proceedings of the 2010 IEEE International Symposium on Computer-Aided Control System Design, Yokohama, Japan, 8–10 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 749–754. [Google Scholar]
  18. Zhang, G.; Dong, Z. Linear quantum systems: A tutorial. Annu. Rev. Control 2022, 54, 274–294. [Google Scholar]
  19. Ma, K.; Kong, J.; Wang, Y.; Lu, X.M. Review of the applications of Kalman filtering in quantum systems. Symmetry 2022, 14, 2478. [Google Scholar] [CrossRef]
  20. Zhou, X.; Qiao, D.; Li, X. Neural network-based method for orbit uncertainty propagation and estimation. IEEE Trans. Aerosp. Electron. Syst. 2023, 60, 1176–1193. [Google Scholar] [CrossRef]
  21. Alanis, A.Y.; Arana-Daniel, N.; Lopez-Franco, C. Artificial Neural Networks for Engineering Applications; Academic Press: Cambridge, MA, USA, 2019. [Google Scholar]
  22. Gruber, M. An Approach to Target Tracking; Technical Note 1967-8, DDC 654272; MIT Lincoln Laboratory: Lexington, MA, USA, 1967. [Google Scholar]
  23. Larson, R.E.; Dressler, R.M.; Ratner, R.S. Application of the Extended Kalman Filter to Ballistic Trajectory Estimation; Final Report 5188-103; Stanford Research Institute: Monlo Park, CA, USA, 1967. [Google Scholar]
  24. Duan, P.; Duan, Z.; Lv, Y.; Chen, G. Distributed finite-horizon extended Kalman filtering for uncertain nonlinear systems. IEEE Trans. Cybern. 2019, 51, 512–520. [Google Scholar] [CrossRef]
  25. Jiang, C.; Wang, S.; Wu, B.; Fernandez, C.; Xiong, X.; Coffie-Ken, J. A state-of-charge estimation method of the power lithium-ion battery in complex conditions based on adaptive square root extended Kalman filter. Energy 2021, 219, 119603. [Google Scholar] [CrossRef]
  26. Dai, Z.; Jing, L. Lightweight extended Kalman filter for MARG sensors attitude estimation. IEEE Sens. J. 2021, 21, 14749–14758. [Google Scholar] [CrossRef]
  27. Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989, 1, 270–280. [Google Scholar] [CrossRef]
  28. Chang, L.; Hu, B.; Li, A.; Qin, F. Transformed unscented Kalman filter. IEEE Trans. Autom. Control 2012, 58, 252–257. [Google Scholar] [CrossRef]
  29. Arasaratnam, I.; Haykin, S. Cubature Kalman filters. IEEE Trans. Autom. Control 2009, 54, 1254–1269. [Google Scholar] [CrossRef]
  30. Provost, F.; Fawcett, T. Data Science for Business: What you Need to Know About Data Mining and Data-Analytic Thinking; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013. [Google Scholar]
  31. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson Education: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
  32. Härter, F.P.; de Campos Velho, H.F. New approach to applying neural network in nonlinear dynamic model. Appl. Math. Model. 2008, 32, 2621–2633. [Google Scholar] [CrossRef]
  33. Chen, X.; Bettens, A.; Xie, Z.; Wang, Z.; Wu, X. Kalman filter and neural network fusion for fault detection and recovery in satellite attitude estimation. Acta Astronaut. 2024, 217, 48–61. [Google Scholar] [CrossRef]
  34. Wu, X.; Wang, Y. Extended and unscented Kalman filtering based feedforward neural networks for time series prediction. Appl. Math. Model. 2012, 36, 1123–1131. [Google Scholar] [CrossRef]
  35. Xu, Y.; Hu, M.; Zhou, A.; Li, Y.; Li, S.; Fu, C.; Gong, C. State of charge estimation for lithium-ion batteries based on adaptive dual Kalman filter. Appl. Math. Model. 2020, 77, 1255–1272. [Google Scholar] [CrossRef]
  36. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  37. Cajal, S.R.Y.; Azoulay, D. Histology of the Nervous System: Of Man and Vertebrates; Oxford Academic: Oxford, UK, 1995. [Google Scholar]
  38. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386. [Google Scholar] [CrossRef]
  39. Widrow, B.; Hoff, M.E. Adaptive switching circuits. In Neurocomputing: Foundations of Research; MIT Press: Cambridge, MA, USA, 1988; pp. 123–134. [Google Scholar]
  40. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Committee on Applied Mathematics, Harvard University, Cambridge, MA, USA, 1974. [Google Scholar]
  41. Haykin, S. Kalman Filtering and Neural Networks; John Wiley & Sons: Hoboken, NJ, USA, 2001. [Google Scholar]
  42. Ljung, L. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. Autom. Control 1979, 24, 36–50. [Google Scholar] [CrossRef]
  43. Song, Y.; Grizzle, J.W. The extended Kalman filter as a local asymptotic observer for discrete-time nonlinear systems. J. Math. Syst. Estim. Control 1995, 5, 59–78. [Google Scholar]
  44. De Mulder, W.; Bethard, S.; Moens, M.F. A survey on the application of recurrent neural networks to statistical language modeling. Comput. Speech Lang. 2015, 30, 61–98. [Google Scholar] [CrossRef]
  45. Jaeger, H. Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the Echo State Network Approach; GMD-Forschungszentrum Informationstechnik: Bonn, Germany, 2002; Volume 5. [Google Scholar]
  46. Ruck, D.W.; Rogers, S.K.; Kabrisky, M.; Maybeck, P.S.; Oxley, M.E. Comparative analysis of backpropagation and the extended Kalman filter for training multilayer perceptrons. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 686–691. [Google Scholar] [CrossRef]
  47. Simon, D. Training radial basis neural networks with the extended Kalman filter. Neurocomputing 2002, 48, 455–475. [Google Scholar] [CrossRef]
  48. Singhal, S.; Wu, L. Training feed-forward networks with the extended Kalman algorithm. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Glasgow, UK, 23–26 May 1989; Volume 2, pp. 1187–1190. [Google Scholar] [CrossRef]
  49. Cordova, J.J.; Yu, W. Recurrent wavelets neural networks learning via dead zone Kalman filter. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–7. [Google Scholar]
  50. Camacho, J.; Villaseñor, C.; Alanis, A.Y.; Lopez-Franco, C.; Arana-Daniel, N. sKAdam: An improved scalar extension of KAdam for function optimization. Intell. Data Anal. 2020, 24, 87–104. [Google Scholar]
  51. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  52. Iiguni, Y.; Sakai, H.; Tokumaru, H. A real-time learning algorithm for a multilayered neural network based on the extended Kalman filter. IEEE Trans. Signal Process. 1992, 40, 959–966. [Google Scholar] [CrossRef]
  53. Jin, L.; Nikiforuk, P.N.; Gupta, M.M. Weight-Decoupled Kalman Filter Learning Algorithm of Multi-Layered Neural Networks. 1995. Available online: https://madangupta.com/pages/info/mmg/paper/RJ/RJ-088.pdf (accessed on 10 September 2025).
  54. Alanis, A.Y.; Sanchez, E.N.; Loukianov, A.G. Discrete-time adaptive backstepping nonlinear control via high-order neural networks. IEEE Trans. Neural Netw. 2007, 18, 1185–1195. [Google Scholar]
  55. Yingxin, L.; Min, W.; Jinhua, S.; Kaoru, H. Sequential growing-and-pruning learning for recurrent neural networks using unscented or extended Kalman filter. In Proceedings of the 2008 27th Chinese Control Conference, Kunming, China, 16–18 July 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 242–247. [Google Scholar]
  56. Liu, Q.; Jiachen, M.; Wei, X. Action selection in cooperative robot soccer using Q-learning with Kalman filter. J. Comput. Inf. Syst. 2012, 8, 10367–10374. [Google Scholar]
  57. Tripp, C.; Shachter, R.D. Approximate Kalman filter Q-learning for continuous state-space MDPs. arXiv 2013, arXiv:1309.6868. [Google Scholar]
  58. Nobrega, J.P.; Oliveira, A.L. Kalman filter-based method for online sequential extreme learning machine for regression problems. Eng. Appl. Artif. Intell. 2015, 44, 101–110. [Google Scholar] [CrossRef]
  59. Cao, Z.; Lu, J.; Zhang, R.; Gao, F. Iterative learning Kalman filter for repetitive processes. J. Process Control 2016, 46, 92–104. [Google Scholar] [CrossRef]
  60. Douiri, M.R. Extended Kalman Filter Based Learning Fuzzy for Parameters Adaptation of Induction Motor Drive. In Proceedings of the 2014 13th Mexican International Conference on Artificial Intelligence, Tuxtla Gutierrez, Mexico, 16–22 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 147–151. [Google Scholar]
  61. Bekhtaoui, Z.; Meche, A.; Dahmani, M.; Meraim, K.A. Maneuvering target tracking using q-learning based Kalman filter. In Proceedings of the 2017 5th International Conference on Electrical Engineering-Boumerdes (ICEE-B), Boumerdes, Algeria, 29–31 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar]
  62. Nobrega, J.P.; Oliveira, A.L. A sequential learning method with Kalman filter and extreme learning machine for regression and time series forecasting. Neurocomputing 2019, 337, 235–250. [Google Scholar] [CrossRef]
  63. Li, Z.; Shi, L.; Yang, L.; Shang, Z. An Adaptive Learning Rate Q-Learning Algorithm Based on Kalman Filter Inspired by Pigeon Pecking-Color Learning. In Proceedings of the International Conference on Bio-Inspired Computing: Theories and Applications, Zhengzhou, China, 22–25 November 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 693–706. [Google Scholar]
  64. Ullah, I.; Fayaz, M.; Naveed, N.; Kim, D. ANN based learning to Kalman filter algorithm for indoor environment prediction in smart greenhouse. IEEE Access 2020, 8, 159371–159388. [Google Scholar]
  65. Chukhrova, N.; Johannssen, A. Kalman filter learning algorithms and state space representations for stochastic claims reserving. Risks 2021, 9, 112. [Google Scholar] [CrossRef]
  66. Hu, K.; Wu, J.; Weng, L.; Zhang, Y.; Zheng, F.; Pang, Z.; Xia, M. A novel federated learning approach based on the confidence of federated Kalman filters. Int. J. Mach. Learn. Cybern. 2021, 12, 3607–3627. [Google Scholar] [CrossRef]
  67. Srichandan, A.; Dhingra, J.; Hota, M.K. An improved Q-learning approach with Kalman filter for self-balancing robot using OpenAI. J. Control. Autom. Electr. Syst. 2021, 32, 1521–1530. [Google Scholar] [CrossRef]
  68. Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control 2021, 23, 1803–1816. [Google Scholar] [CrossRef]
  69. Wang, H. Extreme learning Kalman filter for short-term wind speed prediction. Front. Energy Res. 2023, 10, 1047381. [Google Scholar] [CrossRef]
  70. Revach, G.; Shlezinger, N.; Ni, X.; Escoriza, A.L.; Van Sloun, R.J.; Eldar, Y.C. KalmanNet: Neural network aided Kalman filtering for partially known dynamics. IEEE Trans. Signal Process. 2022, 70, 1532–1547. [Google Scholar] [CrossRef]
  71. Hang, L.; Ullah, I.; Yang, J.; Chen, C. An improved Kalman filter using ANN-based learning module to predict transaction throughput of blockchain network in clinical trials. Peer-to-Peer Netw. Appl. 2023, 16, 520–537. [Google Scholar] [CrossRef]
  72. He, P.; Wang, B.; Liu, X. Reinforcement learning adaptive Kalman filter for AE signal’s AR-mode denoise. In Proceedings of the 2023 IEEE 11th International Conference on Information, Communication and Networks (ICICN), Xi’an, China, 17–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 643–648. [Google Scholar]
  73. de Araujo, P.R.M.; Noureldin, A.; Givigi, S. Continuous Action Learning Automata: A Strategy for Dynamic Optimization of Invariant Kalman Filter Covariances. In Proceedings of the 2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Kingston, ON, Canada, 6–9 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 160–161. [Google Scholar]
  74. Krishnamurthy, V.; Rojas, C.R. Slow convergence of interacting Kalman filters in word-of-mouth social learning. In Proceedings of the 2024 60th Annual Allerton Conference on Communication, Control, and Computing, Urbana, IL, USA, 24–27 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  75. Liu, S.T.; Fan, J.J.; Wang, R.D.; Han, H.; Zhang, D.Y. Kalman Filter-based Cycle-Consistent Adversarial Learning for Time Series Anomaly Detection. J. Netw. Intell. 2024, 9, 790–803. [Google Scholar]
  76. Ruz Canul, M.A.; Ruz-Hernandez, J.A.; Alanis, A.Y.; Rullan-Lara, J.L.; Garcia-Hernandez, R.; Vior-Franco, J.R. Intelligent Robust Controllers Applied to an Auxiliary Energy System for Electric Vehicles. World Electr. Veh. J. 2024, 15, 479. [Google Scholar] [CrossRef]
  77. Alanis, A.Y.; Alvarez, J.G.; Sanchez, O.D.; Hernandez, H.M.; Valdivia-G, A. Fault-Tolerant Closed-Loop Controller Using Online Fault Detection by Neural Networks. Machines 2024, 12, 844. [Google Scholar]
  78. Hao, P.; Karakuş, O.; Achim, A. RKFNet: A novel neural network aided robust Kalman filter. Signal Process. 2025, 230, 109856. [Google Scholar] [CrossRef]
  79. Quintal, G.; Sanchez, E.N.; Alanis, A.Y.; Arana-Daniel, N.G. Real-time FPGA decentralized inverse optimal neural control for a Shrimp robot. In Proceedings of the 2015 10th System of Systems Engineering Conference (SoSE), San Antonio, TX, USA, 17–20 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 250–255. [Google Scholar]
  80. Sensor fault-tolerant control for a doubly fed induction generator in a smart grid. Eng. Appl. Artif. Intell. 2023, 117, 105527. [CrossRef]
  81. Kamwa, I.; Grondin, R. Fast adaptive schemes for tracking voltage phasor and local frequency in power transmission and distribution systems. IEEE Trans. Power Deliv. 1992, 7, 789–795. [Google Scholar] [CrossRef]
  82. Zhang, L.; Luh, P.B. Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method. IEEE Trans. Power Syst. 2005, 20, 59–66. [Google Scholar] [CrossRef]
  83. Malartic, Q.; Farchi, A.; Bocquet, M. State, global, and local parameter estimation using local ensemble Kalman filters: Applications to online machine learning of chaotic dynamics. Q. J. R. Meteorol. Soc. 2022, 148, 2167–2193. [Google Scholar] [CrossRef]
  84. Chen, H.; Grant-Muller, S. Use of sequential learning for short-term traffic flow forecasting. Transp. Res. Part C Emerg. Technol. 2001, 9, 319–336. [Google Scholar] [CrossRef]
  85. Gerber, A.; Green, D.P. Rational learning and partisan attitudes. Am. J. Political Sci. 1998, 42, 794–818. [Google Scholar] [CrossRef]
  86. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. [Google Scholar] [CrossRef] [PubMed]
  87. Narendra, K.S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27. [Google Scholar]
  88. Ghosh, J.; Shin, Y. Efficient higher-order neural networks for classification and function approximation. Int. J. Neural Syst. 1992, 3, 323–350. [Google Scholar] [CrossRef]
  89. Feldkamp, L.A.; Prokhorov, D.V.; Feldkamp, T.M. Simple and conditioned adaptive behavior from Kalman filter trained recurrent networks. Neural Netw. 2003, 16, 683–689. [Google Scholar] [CrossRef]
Figure 1. Kalman variants.
Figure 2. Kalman filter applications.
Figure 3. Kalman filter update process for discrete-time linear systems.
Figure 4. Kalman filter update process.
Figure 5. MLP trained with an EKF-based algorithm.
Figure 6. RBF trained with an EKF-based algorithm.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.