Unscented Kalman Filter-Aided Long Short-Term Memory Approach for Wind Nowcasting

Abstract: Obtaining reliable wind information is critical for efficiently managing air traffic and airport operations. Wind forecasting has been considered one of the most challenging tasks in the aviation industry. Recently, with the advent of artificial intelligence, many machine learning techniques have been widely used to address a variety of complex phenomena in wind predictions. In this paper, we propose a hybrid framework that combines a machine learning model with Kalman filtering for a wind nowcasting problem in the aviation industry. More specifically, this study has three objectives as follows: (1) compare the performance of the machine learning models (i.e., Gaussian process, multi-layer perceptron, and long short-term memory (LSTM) network) to identify the most appropriate model for wind predictions, (2) combine the machine learning model selected in step (1) with an unscented Kalman filter (UKF) to improve the fidelity of the model, and (3) perform Monte Carlo simulations to quantify uncertainties arising from the modeling process. Results show that short-term time-series wind datasets are best predicted by the LSTM network compared to the other machine learning models and the UKF-aided LSTM (UKF-LSTM) approach outperforms the LSTM network only, especially when long-term wind forecasting needs to be considered.


Introduction
According to the Federal Aviation Administration (FAA), the FAA's air traffic organization served more than 44,000 flights and 2.7 million airline passengers daily in over 29 million square miles of airspace before the COVID-19 pandemic [1]. This is already a large number of flights and passengers; however, the FAA expects the United States (U.S.) domestic carrier passenger growth to average 1.8 percent per year over the next 20 years [2]. As aviation traffic continues to grow, most airport operators are concerned about ground delays directly related to operating costs [3]. Among various factors that affect the ground delays, accurate wind information around an airport is the most significant factor in evaluating the efficiency of airport operations [4]. Wind forecasting has recently been recognized as one of the most challenging tasks in the aviation industry [5].
Many research groups have been dedicated to developing numerical weather models to predict weather patterns. The two best known numerical weather models are the Global Forecast System [6] developed by the National Oceanic and Atmospheric Administration (NOAA) and the European Centre for Medium-Range Weather Forecasts [7] developed by the European Centre, which are called the American model and the European model, respectively. While numerical weather models have been widely used in the aviation industry, they have some limitations in predicting wind patterns due to aleatory uncertainty. Recently, many machine learning techniques have been used along with a myriad of data-driven approaches to enhance the level of understanding of various complex phenomena in nature such as wind predictions.
In this paper, we propose a hybrid framework that combines a machine learning model and a Kalman filtering technique for a wind nowcasting problem in the aviation industry. More specifically, this research has three goals as follows: (1) compare machine learning models (i.e., Gaussian process (GP), multi-layer perceptron (MLP), and long short-term memory (LSTM) network [8]) to identify the most suitable model for wind predictions in Section 3.5, (2) combine the machine learning model selected in step (1) with an unscented Kalman filter (UKF) [9] to improve the fidelity of the model in Section 3.6, and (3) perform Monte Carlo simulations (MCSs) to quantify uncertainties arising from each modeling process in Section 3.7.
For the data-driven wind nowcasting approach proposed in this paper, we utilize the Modern-Era Retrospective analysis for Research and Applications-2 (MERRA-2) dataset [10] provided by the National Aeronautics and Space Administration (NASA) given that the MERRA-2 wind dataset is widely used in the aviation industry. For example, the Aviation Environmental Design Tool [11] developed by the FAA utilizes MERRA-2 wind data to calculate fuel consumption in a simulation environment. However, even though the MERRA-2 dataset contains reliable wind information, it is important to note that the MERRA-2 wind dataset is not adequate for wind speed predictions as it is basically historical data. The main purpose of this research is to develop a hybrid framework (i.e., combination of a machine learning algorithm and Kalman filtering technique) that uses the MERRA-2 wind dataset for wind nowcasting in the aviation industry. The remainder of this paper contains the following sections: literature review (Section 2), proposed methodology (Section 3), results and discussion (Section 4), and conclusion (Section 5).

Related Work
With the advent of artificial intelligence (AI), many machine learning techniques have been widely used to address various and complex phenomena in nature. In particular, many researchers have proposed new approaches using machine learning techniques to predict wind speed information at a specific location. As an illustration, for wind speed predictions, Mohandes et al. [12] used the support vector machine and the MLP and Kulkarni et al. [13] compared the artificial neural network (ANN) model performance with several statistical regression methods. Furthermore, Rozas-Larraondo et al. [14] proposed a new method based on non-parametric multivariate locally weighted regression for wind speed forecasting in airports and Khosravi et al. [15] conducted a case study to compare machine learning algorithms for time-series wind speed prediction at a wind farm in Brazil. Recently, various versions of the LSTM network have been widely used for short-term wind speed predictions [16][17][18].
Although machine learning techniques generally outperform traditional approaches (e.g., numerical weather models) in wind predictions, they do not always provide accurate wind information due to unpredictable uncertainties. In some cases, machine learning techniques are combined in a hybrid approach to provide more accurate wind predictions. For instance, for short-term wind speed predictions, extreme learning machines [19] are combined with either the nearest neighbors approach [20], the adaptive noise and autoregressive integrated moving average approach [21], or the improved seagull optimization algorithm [22]. Furthermore, Nezhad et al. [23] developed a new combined model that integrates wind source potential assessment and forecasting using image processing of satellite data and an adaptive neuro-fuzzy inference system. In 2021, Imani et al. [24] combined the rough and fuzzy set theory in the LSTM model to enhance accuracy and reduce data uncertainties.
Although the aforementioned hybrid methods improved the accuracy of short-term wind predictions, they might not be suitable frameworks for long-term wind forecasting since they did not use current and in-site measurements unlike Kalman filtering. In other words, it has been recently found that Kalman filtering [25,26] can improve the fidelity of machine learning models. As an illustration, Lee and Johnson [27] showed that Kalman filtering improves the accuracy of machine learning models such as GP regression and Ullah et al. [28] proposed a hybrid method combining an ANN and a Kalman filter technique to improve the performance of a prediction algorithm under dynamic conditions. In addition, Hur [29] recently presented a wind speed prediction scheme that comprises two stages: estimation by an extended Kalman filter (EKF) and prediction by a neural network. While the aforementioned literature survey has demonstrated the capability for wind forecasting, the proposed methods may not be applicable for the MERRA-2 wind dataset, which is one of the most commonly used wind datasets in the aviation industry, as the methods were not trained and developed using the MERRA-2 dataset. It is also worth mentioning that the methods did not consider a validation process at aviation-related locations such as cruise points of aircraft. Thus, the main contribution of this paper is to develop a hybrid framework that combines a machine learning model with Kalman filtering for wind nowcasting, especially using MERRA-2 wind data.

Methodology
This paper aims to develop a framework that performs wind nowcasting by combining a machine learning model with a Kalman filtering technique. The framework proposed in this paper consists of four phases as follows: (1) decompose annual wind data into training and validation samples as described in Section 3.1, (2) compare machine learning models (i.e., MLP, GP, and LSTM) to identify the most appropriate model for wind predictions as presented in Section 3.5, (3) combine the selected machine learning model with a UKF to improve the fidelity of the model as presented in Section 3.6, and (4) perform MCSs to quantify uncertainties arising from the modeling processes as described in Section 3.7. Figure 1 delineates an overall flowchart of the proposed framework.

Data Preparation
MERRA-2 contains a set of detailed weather-related properties (e.g., wind, humidity, and temperature) against longitude, latitude, altitude, and timestamp. Among various MERRA-2 weather variables, we specifically collected eastward and northward wind speed data as we wanted to develop a framework that performs wind predictions (e.g., eastward wind model and northward wind model) with the aim of providing an accurate wind forecast to the aviation industry. Figure 2 shows an example visualization of the MERRA-2 winds at a certain time and altitude. In terms of time-series data preparation, we retrieved the MERRA-2 wind dataset (i.e., three-hour interval dataset) from January to December in 2018 at a point of interest. To specify the point of interest, we collected the previous Delta Air Lines 2638 flight trajectory information through the FlightAware flight tracking data platform [30]. We then selected one of the points consisting of the cruise phase of the flight (i.e., altitude = 34,000 feet). As the next step, we collected wind information at the point of interest from January to December in 2018 and generated a CSV file that includes information with respect to date and wind speed. Additionally, we decomposed the time-series dataset into a training (84%) phase (i.e., from January to October) and a validation (16%) phase (i.e., from November to December) for a holdout validation purpose. Figure 3 shows how the MERRA-2 annual wind datasets are decomposed into training and validation phases. Mathematically, given $N$ data samples, we generated a dataset of $N-1$ input and output pairs by shifting each data point for one-step-ahead forecasting; thus, the inputs become $x_{1:N-1}$ and their corresponding outputs are $x_{2:N}$. Here, subscript $1:N$ means the history of the data from discrete time 1 to time $N$.
In a nutshell, given the set of training data $D = \{x_{1:N-1}, x_{2:N}\}$, a machine learning model identifies the relationship between input $x_{i-1}$ and output $x_i$ by predicting output $x_{i+1}$ with new input point $x_i$. Here, we model wind speed information with one-step-ahead forecasting because the overall framework proposed in this paper includes the Kalman filter technique that generally follows one-step-ahead propagation [25]. The discrete dynamic equation of each wind is defined as follows:

$$x_i = f(x_{i-1}) + \eta_{i-1} \tag{1}$$

where $f(\cdot)$ is wind dynamics that will be modeled by a machine learning technique and $\eta$ is white noise. Here, $x$ is an accessible ground-truth value (i.e., MERRA-2 wind data in this paper) in training and validation phases. It is important to note that we concentrated only on the point of interest to validate the applicability of the proposed methodology and further investigations may be conducted at other locations in U.S. territories (e.g., weather stations) when necessary.
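The one-step-ahead pairing and the chronological holdout split described above can be sketched in a few lines of Python. This is only an illustration: the function names, the ten-sample wind series, and the use of a fraction (rather than calendar months) to place the split are assumptions made for the example, not the preprocessing code used in this study.

```python
def make_one_step_pairs(series):
    """Return (inputs, outputs) where outputs[i] is the sample one step after inputs[i]."""
    if len(series) < 2:
        raise ValueError("need at least two samples")
    return list(series[:-1]), list(series[1:])


def holdout_split(series, train_fraction=0.84):
    """Chronological split: earliest samples for training, latest for validation."""
    cut = int(round(len(series) * train_fraction))
    return list(series[:cut]), list(series[cut:])


# Hypothetical eastward wind speeds (m/s) sampled every three hours.
wind = [12.1, 12.8, 13.4, 13.0, 12.2, 11.7, 11.9, 12.5, 13.1, 13.6]
train, valid = holdout_split(wind)
x_in, x_out = make_one_step_pairs(train)  # x_{1:N-1} and x_{2:N}
```

Keeping the split chronological (no shuffling) matters for time-series data, since shuffling would leak future samples into the training set.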

Gaussian Process (GP)
A GP, which is also known as Kriging, was originally introduced by Matheron [31] as a geostatistical estimation method. The GP has been widely utilized in general regression problems because of the advantage that its prediction is probabilistic in such a way that it provides uncertainty bounds of the predictions. In this paper, we implemented the GP to create a nonlinear probabilistic regression model of MERRA-2 wind data. Given N input and output pairs, the hyper-parameters of the kernel were optimized during the GP regression process by maximizing the log marginal likelihood of the outputs. Additional details are summarized in Appendix A.

Multi-Layer Perceptron (MLP)
We implemented the MLP to create a nonlinear regression model of MERRA-2 wind data with the aim of finding the best weight parameters to minimize errors between predicted and target values. To find the best weight parameters in the model, we utilized the Adam algorithm that is an extended version of the stochastic gradient descent method. The MLP-based wind regression model entails the following fully-connected layers: (1) an input layer to receive MERRA-2 wind data, (2) an output layer with the linear activation function that makes a prediction, and (3) two hidden layers with the sigmoid function. Figure 4 shows the diagram of the MLP model structure used for this paper. To isolate the free hyper-parameters of the MLP wind regression model, we formulated the design of experiment (DoE) with respect to a number of hidden layers, a number of hidden nodes, learning rate, regularization penalty parameter, and batch size. The effective MLP model was finally determined by the choice of hyper-parameters tabulated in Table 1.
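To make the layer structure concrete, the following is a minimal pure-Python sketch of a forward pass through a network with two sigmoid hidden layers and a linear output. The layer widths, the random initial weights, and all function names are hypothetical; the actual hyper-parameters are those in Table 1, and training with the Adam optimizer is not shown.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: weights[j][i] connects input i to unit j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        s = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(s) if activation else s)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """layers: list of (weights, biases); sigmoid on hidden layers, linear output."""
    h = [x]
    for weights, biases in layers[:-1]:
        h = dense(h, weights, biases, activation=sigmoid)
    weights, biases = layers[-1]
    return dense(h, weights, biases)[0]


rng = random.Random(0)
# Hypothetical sizes: 1 input, two hidden layers of 4 units, 1 output.
layers = [init_layer(1, 4, rng), init_layer(4, 4, rng), init_layer(4, 1, rng)]
prediction = mlp_forward(12.5, layers)  # untrained one-step-ahead output
```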

Long Short-Term Memory (LSTM) Network
While the MLP is widely used for nonlinear regression problems, it is important to note that the MLP-based regression method may not be robust for solving complex problems that frequently appear in nature. For this reason, many AI researchers have been committed to developing deep learning (DL) models that are generally defined with more than two hidden layers without losing the key idea of the MLP technique. The recurrent neural network (RNN), which is one of the DL models devised by mimicking the sequential processes of the human brain, has been introduced to particularly deal with time-series prediction tasks by allowing information to persist in network loops. However, one potential issue is that the RNN may not be capable of handling long-term dependencies, indicating that it only works well if the gap between previous and present information is small. In response to this concern, the LSTM network [8] was introduced to specifically handle long-term dependencies. The LSTM network typically has a memory block (i.e., output gate, input gate, forget gate) interacting in a very special way in modules. In this paper, we implemented the LSTM network, one of the popular DL models especially for time-series predictions, to handle sequential MERRA-2 wind data. That is, the LSTM network models true wind dynamics $f(\cdot)$ described in Equation (1), resulting in $f_{\mathrm{LSTM}}(\cdot)$ as follows:

$$\hat{x}_i = f_{\mathrm{LSTM}}(x_{i-1}) \tag{2}$$

where $x_{i-1}$ is the ground truth value at time $(i-1)$ and $\hat{x}_i$ is a predicted value at time $i$. In particular, we used the Keras LSTM library [32] to perform wind predictions with MERRA-2 data. The LSTM model structure used for this paper was constructed with the following properties as tabulated in Table 2.
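As an illustration of how the input, forget, and output gates interact inside a memory block, the sketch below implements one step of the standard LSTM cell equations for a scalar input and scalar hidden state. It is not the Keras implementation used in this study; the function name, the dictionary layout of the parameters, and the weight values are all hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps gate name -> (input weight, recurrent weight, bias)."""
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (step output)
    return h, c


# Hypothetical, untrained parameters.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.6, 0.2, 0.0),
}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:  # a short, normalized input sequence
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is carried forward additively through the forget gate, which is what lets the network retain information over longer gaps than a plain RNN.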

Model Evaluation and Comparison
To identify the most appropriate machine-learning-based time-series wind prediction model, we computed the coefficient of determination (i.e., R-squared), root-mean-square error (RMSE), and mean absolute error (MAE) [33] with respect to the validation dataset. The error metrics used in this paper are defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{M} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{M} (x_i - \bar{x})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (x_i - \hat{x}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| x_i - \hat{x}_i \right|$$

where $M$ is the number of data for this validation phase and $\bar{x}$ is the mean of the ground-truth values. The results are tabulated in Table 3. We also calculated the prediction errors ($= \hat{x}_i - x_i$) and uncertainty bounds of the models as shown in Figure 5. The results from Table 3 and Figure 5 indicate that time-series wind datasets are best predicted by the LSTM network (e.g., smallest RMSE and MAE) compared to the other machine learning models (i.e., GP and MLP).
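For reference, the three metrics can be computed as in the short sketch below, using their standard definitions. The function names and the four-point example series are illustrative assumptions, not values from Table 3.

```python
import math


def r_squared(truth, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def rmse(truth, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical ground-truth and predicted wind speeds (m/s).
truth = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 12.0, 13.0, 16.0]
scores = (r_squared(truth, pred), rmse(truth, pred), mae(truth, pred))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether errors are uniformly small or dominated by a few outliers.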

UKF-Aided LSTM (UKF-LSTM) Approach
The most widely used algorithm for estimating the state variables of a dynamic system is a Kalman filter [25,26]. The Kalman filtering framework consists of two steps: (1) time update and (2) measurement update. In the step of time update (i.e., state propagation), it predicts state variables (e.g., wind speed in this paper) from one discrete-time k to next time (k + 1), like the one-step-ahead prediction in Equation (1). Once measured sensor data at time (k + 1) are incoming, the measurement-update step occurs. In the measurement update step, the filter corrects the predicted state value using predicted uncertainty based on RMSE minimization and recursive Bayesian estimation. In addition, to run the Kalman filter, the dynamic model and measurement model of the system are required to be known. When the model of the dynamic system is unknown or hard to be known (e.g., wind speed), we are able to use machine learning techniques to learn it. In fact, the machine learning community has applied machine learning techniques to both controls and estimation processes in the past. Estimation methods have ranged from Bayesian filtering with machine learning to nonlinear Kalman filtering with machine learning [27,34,35].
The Kalman filter is a linear quadratic estimator; thus, it performs only in linear systems. For handling nonlinear systems such as wind speed forecasting, we need to design nonlinear filters (e.g., EKF [36,37] and UKF [9]) modified from the linear Kalman filter. In this paper, we chose a UKF for the LSTM network selected for wind speed dynamics in Section 3.5. Since the LSTM network has no exact expression of analytical governing equations, obtaining the Jacobian of the LSTM network is challenging. Whereas an EKF requires computing Jacobian matrices, a UKF does not since it is a more straightforward statistical approach. Moreover, theoretically, a UKF is more accurate than an EKF in nonlinear systems since the UKF has no linearization errors [38]. Although the UKF typically requires a high computational cost, the UKF has been selected for this research because the computational cost can be systematically manageable especially in the aviation domain. Mathematically, $f_{\mathrm{LSTM}}(\cdot)$ modeled in Equation (2) is used in the time update step as follows:

$$\hat{x}^-_{k+1} = f_{\mathrm{LSTM}}(\hat{x}_k) \tag{3}$$

where subscript $k$ represents the $k$-th time step in the testing phase (i.e., long-term forecasting phase) and hat "ˆ" denotes an estimate; state estimate $\hat{x} := E[x]$. Superscript $-$ represents an a priori estimate. Unlike Equation (2), ground truth is not accessible in the long-term forecasting phase, so the input of $f_{\mathrm{LSTM}}(\cdot)$ in Equation (3) is not $x$ but $\hat{x}$. Initial condition $x(0) = x_0$ is given from the last point of the validation data (i.e., 21 UTC on December 31, 2018). In the step of measurement update:

$$\hat{x}_{k+1} = \hat{x}^-_{k+1} + \Delta x_{k+1} \tag{4}$$

$$\Delta x_{k+1} = K_{k+1} \left( \tilde{z}_{k+1} - \hat{x}^-_{k+1} \right) \tag{5}$$

where optimal Kalman gain $K$ is computed using predicted uncertainty and tilde "˜" denotes a measurement by a sensor. This measurement-update step is performed only when measurement data $\tilde{z}$ are available in the filter. In other words, if there is no measurement at time $(k+1)$, there is no measurement-update step in Kalman filtering.
That is, without a measurement at time $(k+1)$, there is no correction (i.e., $\Delta x_{k+1} = 0$), and an a priori estimate is assigned to the final nowcasting at time $(k+1)$ (i.e., $\hat{x}_{k+1} \Leftarrow \hat{x}^-_{k+1}$). For implementation, we used an open-sourced package of a UKF [39]. For more details of the UKF, see Appendix B.
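The time-update and measurement-update steps described above can be sketched for a scalar wind state as follows. This is a minimal illustration, not the open-source package [39] used in this study: it assumes an equal-weight unscented transformation with two sigma points, a direct measurement model, and a hypothetical affine function `f_stand_in` in place of the trained LSTM predictor. All names and numbers are assumptions.

```python
import math


def sigma_points(mean, var):
    """Equal-weight sigma points for a scalar state (n = 1): mean +/- sqrt(n * var)."""
    spread = math.sqrt(var)
    return [mean + spread, mean - spread]


def time_update(x_hat, p, f, q):
    """Propagate the estimate one step through the (possibly nonlinear) model f."""
    propagated = [f(s) for s in sigma_points(x_hat, p)]
    x_prior = sum(propagated) / len(propagated)
    p_prior = sum((s - x_prior) ** 2 for s in propagated) / len(propagated) + q
    return x_prior, p_prior


def measurement_update(x_prior, p_prior, z, r, h=lambda s: s):
    """Correct the prior with measurement z; with z=None the prior is returned unchanged."""
    if z is None:
        return x_prior, p_prior
    pts = sigma_points(x_prior, p_prior)
    z_pts = [h(s) for s in pts]
    z_hat = sum(z_pts) / len(z_pts)
    p_z = sum((zi - z_hat) ** 2 for zi in z_pts) / len(z_pts) + r
    p_xz = sum((s - x_prior) * (zi - z_hat) for s, zi in zip(pts, z_pts)) / len(pts)
    gain = p_xz / p_z
    return x_prior + gain * (z - z_hat), p_prior - gain * p_z * gain


def f_stand_in(x):
    """Hypothetical stand-in for the trained LSTM one-step predictor."""
    return 0.9 * x + 1.0


# One filter cycle with hypothetical values: estimate 10 m/s, variance 4.
x_prior, p_prior = time_update(10.0, 4.0, f_stand_in, q=1.0)
x_post, p_post = measurement_update(x_prior, p_prior, z=11.0, r=4.24)
```

Because the sigma points are pushed through `f` as a black box, no Jacobian of the predictor is needed, which is the property that motivates pairing a UKF with a neural network.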
Although it is possible to obtain more accurate data by averaging outputs from multiple high-accuracy sensors if the given time is not constrained, this is not practical. In practical applications in the aviation industry, a real-time sensor reading process typically generates random noise because the sensor is a cheap and easy-to-implement sensor. Hence, it is inevitable to fuse real-time noisy measurements into the estimate values especially when a wind nowcasting analysis is performed. In fact, we use simulated noisy measurements in this paper to focus on testing our UKF-LSTM framework since we do not have an actual sensor that measures real-time wind speed at a certain location for our experiments. In this paper, we assume that the simulated sensor rate is as fast as the time rate of the learned LSTM network. That is, whenever the time-update step in Equation (3) is performed, the measurement-update step (i.e., correction) in Equations (4) and (5) is assumed to be performed. Figure 6 shows an overview of our methodology proposed in this paper. The proposed methodology consists of three phases (i.e., training, validation, and nowcasting) to predict wind speed information. We first decompose annual wind data points into $N$ training and $M$ validation samples. With $N$ data points, we generate $x_{1:N-1}$ input and $x_{2:N}$ output pairs by one-time shifting each point for preparing the training process. The training phase generates the LSTM network structure represented in Table 2. Next, the validation phase evaluates the performance of the trained LSTM model as illustrated in Table 3 and Figure 5. Once the LSTM model is validated, the nowcasting phase utilizes the UKF with sensor measurements (e.g., simulated noisy measurements in this paper) to improve the fidelity of the LSTM network. With the collaboration of the LSTM network and the UKF (i.e., UKF-LSTM), it eventually nowcasts wind values at each $(k+1)$ timestamp.

Monte Carlo Simulation (MCS)-Based Uncertainty Quantification
It is important to quantify the various uncertainties arising from input data (e.g., aleatory uncertainty) or modeling processes (e.g., epistemic uncertainty), especially if there is a need to generate a predictive model using a supervised machine learning algorithm. An uncertainty bound is generally defined as an interval that consists of probabilistic upper and lower bounds on the estimate of outcomes generated from the predictive model. The uncertainty bound for a linear regression method can easily be calculated with given equations (i.e., prediction/confidence interval) [40]; however, it is challenging to compute an uncertainty bound for nonlinear regression techniques such as the MLP. While the GP can provide meaningful uncertainty bounds along with the training process where both the mean and covariance are designed to be computed (Appendix A), it is not possible for the LSTM network to directly calculate an uncertainty bound. For this reason, we performed the MCS, a technique used to illustrate the impact of uncertainty in forecasting models, to quantify uncertainties that could arise from the UKF-LSTM modeling process.
Even if a machine learning technique trains the same dataset, the hyper-parameters for the learned model could be different at each time due to the model uncertainty. For uncertainty quantification of the machine learning technique, we ran ten Monte Carlo trials. In other words, we generated ten machine learning models by applying one machine learning technique to the same data ten times and then we computed the mean and variance of the ten models. The mean and variance became the outcomes of the MCS as shown in Figure 5.
Since it was assumed that the UKF in the proposed framework uses an easy-to-implement sensor that typically generates random noise, we employed the Gaussian random walk to generate simulated measurement data in this paper. In other words, as the actual sensor hardware does not exist in the experiment of this study, within the scope of only validation of our framework, our sensor model used in the UKF comprised ground-truth values plus random noise. That is, in Equation (5), $\tilde{z}_{k+1} = x_{k+1} + \zeta_{k+1}$, where $\zeta$ is the Gaussian random measurement noise for the simulated measurement. Likewise, the simulated direct measurements were randomized, so we ran the MCS to quantify the uncertainty of our UKF-LSTM approach. More specifically, we conducted the following procedures to estimate an uncertainty bound of the UKF-LSTM process: (1) optimize hyper-parameters of the LSTM network, (2) generate ten distinct simulated measurements using ground truth (i.e., MERRA-2 wind), (3) run the UKF-LSTM ten times with the isolated hyper-parameters, and (4) compute the mean and variance to quantify an uncertainty bound. Figure 7 shows a notional sketch of the MCS-based uncertainty quantification process flowchart.
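The four-step procedure above can be sketched as follows. The sketch assumes the sensor model of ground truth plus zero-mean Gaussian noise and treats the filter as an arbitrary callable `run_filter` that maps a measurement sequence to an estimate sequence; that callable, the noise level, and the sample values are hypothetical placeholders rather than the configuration used in this study.

```python
import random
import statistics


def simulate_measurements(truth, noise_std, rng):
    """Simulated sensor: ground truth plus zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, noise_std) for x in truth]


def monte_carlo(truth, run_filter, noise_std, trials=10, seed=0):
    """Run `run_filter` on `trials` noisy measurement sets.

    Returns the per-time-step mean and sample variance across trials.
    """
    rng = random.Random(seed)
    runs = [run_filter(simulate_measurements(truth, noise_std, rng))
            for _ in range(trials)]
    per_step = list(zip(*runs))  # regroup: one tuple of trial values per time step
    means = [statistics.mean(step) for step in per_step]
    variances = [statistics.variance(step) for step in per_step]
    return means, variances


# Hypothetical ground truth (m/s); the identity "filter" just passes measurements through.
truth = [12.0, 12.5, 13.0, 12.8]
means, variances = monte_carlo(truth, run_filter=lambda z: list(z), noise_std=0.5)
```

An uncertainty bound at each time step can then be drawn as the mean plus or minus a chosen multiple of the square root of the variance.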

Results and Discussion
We went through the goodness-of-fit procedure for three different machine-learning-based time-series wind prediction models that include the GP, the MLP, and the LSTM network to identify the most appropriate model for time-series wind forecasting. Figure 5 presents the following implications: (1) the prediction results from the LSTM network are much closer to the ground truth data (i.e., MERRA-2 wind data) than those from the other machine learning models (that is, the modeling error of the LSTM is the smallest), and (2) the uncertainty bound of the LSTM network is also smaller than the other uncertainty bound generated by the GP. In other words, the red solid line is more accurate than the black dashed line or the blue dash-circle line, indicating that the LSTM network is the most suitable machine learning technique that models the time-series wind dataset. It is worth mentioning that the validation process utilized validation datasets (i.e., MERRA-2 wind data) as an input for the models. After identifying the best model for time-series wind predictions (i.e., LSTM network) among three machine learning models, we combined the LSTM network with the UKF to improve the fidelity of the LSTM network. To compare the UKF-LSTM approach with the LSTM network, we utilized a testing dataset (i.e., January 2019) as the ground truth. In the testing phase, the ground-truth values are not accessible, and they are used only for evaluating the performance of the proposed framework. As shown in each top side of Figure 8, it was observed that UKF-LSTM provided a better wind prediction compared to the LSTM network, especially when long-term wind forecasting needs to be considered, while the LSTM network generated wind forecasts that are valid only for short-term predictions.
Moreover, as shown in each bottom half of Figure 8, the errors of the UKF-LSTM are smaller than those of measurements represented as red asterisks, indicating that our approach is more accurate than measured values from a cheap sensor.
It is important to note that the LSTM network (i.e., LSTM network model without UKF) did not perform well for the testing dataset because the model used wind data predicted by the model in the previous step as an input (i.e., recursively predicted value), which potentially results in feeding relatively incorrect input data to the model; thus, the model error sequentially increased over time. In other words, unlike the proposed UKF-LSTM approach, the LSTM network used a priori estimate $\hat{x}^-$ (i.e., values without the correction step described in Equation (4)) as the input of $f_{\mathrm{LSTM}}(\cdot)$ in Equation (3). In fact, given that the MERRA-2 dataset does not provide wind forecasts in a timely manner but focuses more on providing historical data, this motivated us to develop the UKF-LSTM approach to resolve the issue by implementing a filtering process into the LSTM network. One may claim that the LSTM network with a rolling prediction [41] possibly forecasts over the long-term; however, we did not consider the approach in this paper because it would typically require a high computational cost, which may result in it not being applicable for wind nowcasting in the aviation industry. To quantify uncertainties arising from the UKF-LSTM process, we performed the MCS with the isolated hyper-parameters of the LSTM network. As a result, the uncertainty bounds of the UKF-LSTM were identified, as shown in each bottom half of Figure 8. The prediction error of the UKF-LSTM stays within the uncertainty bound, which indicates that the filtering process in our approach is well designed and performs well. However, it seems that the results of the LSTM network are averaged and its uncertainty is almost doubled compared to the UKF-LSTM approach. Moreover, the prediction error of the LSTM network is sometimes out of bounds, which suggests that the LSTM network alone is too uncertain.
Based on the results tabulated in Table 4, the LSTM network may not be appropriate for wind nowcasting given that next-step wind information is not provided.
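The error growth described above can be reproduced with a toy example. In the sketch below, a slightly biased one-step model is run recursively on its own outputs (gain of zero) and then with a correction toward a measurement at every step. The affine "true" dynamics, the biased model, the noise-free measurements, and the fixed blending gain are all assumptions chosen for clarity; the fixed gain stands in for, and is not, the Kalman gain computed by the UKF.

```python
def truth_step(x):
    """Hypothetical true wind dynamics."""
    return 0.95 * x + 1.0


def model_step(x):
    """Hypothetical learned model with a small bias relative to the truth."""
    return 0.95 * x + 1.2


def run(steps, gain):
    """gain = 0 reproduces open-loop recursion; gain > 0 blends in each measurement."""
    x_true, x_hat, errors = 10.0, 10.0, []
    for _ in range(steps):
        x_true = truth_step(x_true)
        x_prior = model_step(x_hat)             # prediction from the previous estimate
        z = x_true                              # noise-free measurement, for clarity
        x_hat = x_prior + gain * (z - x_prior)  # correction step
        errors.append(x_hat - x_true)
    return errors


open_loop = run(30, gain=0.0)   # error accumulates step after step
corrected = run(30, gain=0.5)   # error stays small and bounded
```

In the open-loop run each step inherits the previous step's error and adds the model bias on top, whereas the corrected run repeatedly pulls the estimate back toward the measurement before the next prediction.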

Conclusions
This research established a framework that combines a machine-learning-based wind prediction algorithm with a Kalman filter technique with the aim of performing wind nowcasting in a more accurate manner. Three different machine learning algorithms (i.e., Gaussian process, multi-layer perceptron, and long short-term memory (LSTM) network) were evaluated to identify the most appropriate machine learning model for time-series wind predictions. The results indicate that the LSTM network performed better than the other machine learning models for time-series wind forecasts. However, the LSTM network provided relatively incorrect wind predictions especially when it needed to account for long-term wind forecasting. This was mainly due to the fact that the LSTM network took as input the wind value predicted by the model in the previous step (i.e., recursively predicted value), resulting in sequentially increasing the model error over time. To improve the fidelity of the LSTM network, we implemented an unscented Kalman filter (UKF) into the LSTM model, named the UKF-aided LSTM (UKF-LSTM) framework, and performed Monte Carlo simulations to validate whether the proposed framework generated results within uncertainty bounds. The results show that the UKF-LSTM approach outperformed the LSTM network and the prediction errors of the UKF-LSTM approach were within the uncertainty bounds, indicating that the filtering process in the framework was well designed. Although this research used the MERRA-2 wind dataset during the development period, the outcome of this research could be used with other wind datasets such as ground-based observational wind data. Future work will include further investigation of the framework developed in this research along with other locations in U.S. territories such as ground-based weather stations or cruise points of aircraft. 
This will not change the complexity of the framework but it will help airport operators improve airport planning procedures (e.g., runway operations) given that the proposed framework provides better wind nowcasting at an airport.

Appendix A. Gaussian Process
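Let us consider a case in which we have the observation corrupted with white noise

$$y_i = f(x_i) + \nu_i$$

where $\nu_i \sim \mathcal{N}(0, \beta^{-1})$. Since the white noise is independent of each data point,

$$p(y_{1:N}) = \mathcal{N}(0, C_N)$$

where the definition of covariance matrix $C_N \in \mathbb{R}^{N \times N}$ is $C_N = K + \beta^{-1} I_{N \times N}$. Hence, every element of covariance matrix $C$ has the form $C(x_i, x_j) = k(x_i, x_j) + \beta^{-1} \delta_{i,j}$. The most widely used kernel function is the squared exponential, and its form is

$$k(x_i, x_j) = \sigma_f^2 \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2 \ell^2} \right)$$

where signal variance $\sigma_f^2$ and length scale $\ell$ are the hyper-parameters of the kernel. For a new input $x_{N+1}$, let $c = k(x_{N+1}, x_{N+1}) + \beta^{-1}$, and then $p(y_{N+1}) = \mathcal{N}(0, c)$. Now we claim the conditional distribution $p(y_{N+1} \mid y_{1:N})$ is a Gaussian distribution with mean $GP_\mu(\cdot)$ and covariance $GP_\Sigma(\cdot)$ specified as follows:

$$GP_\mu(x_{N+1}) = k_*^T C_N^{-1} y_{1:N}, \qquad GP_\Sigma(x_{N+1}) = c - k_*^T C_N^{-1} k_* \tag{A3}$$

where $k_* \in \mathbb{R}^N$ and it has elements $k(x_1, x_{N+1}), k(x_2, x_{N+1}), \cdots, k(x_N, x_{N+1})$. For simplicity of presentation, we denote Equation (A3) as $y_{i+1} = GP(y_i)$.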

Appendix B. Unscented Kalman Filter
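Appendix B.1. Time Update
We use the following time-update equations to propagate the state estimate and covariance from one measurement time to the next. To propagate from time step $k$ to $(k+1)$, we first choose unscented transformation $UT(\cdot)$ to obtain sigma points $\hat{x}^{(i)}_k$ with appropriate changes since the current best guess for the mean and covariance of $x_k$ are $\hat{x}^+_k$ and $P^+_k$:

$$\hat{x}^{(i)}_k = UT\left( \hat{x}^+_k, P^+_k \right), \quad i = 1, \ldots, 2n \tag{A5}$$

where error covariance $P := E\left[ (x - \hat{x})(x - \hat{x})^T \right]$ and $x \in \mathbb{R}^n$; for more details of $UT(\cdot)$, see ref. [38]. Next, we use nonlinear system equation $f(\cdot)$ to transform the sigma points into $\hat{x}^{(i)}_{k+1}$ vectors. That is, $\hat{x}^{(i)}_{k+1} = f(\hat{x}^{(i)}_k)$. We average $\hat{x}^{(i)}_{k+1}$ vectors to obtain a priori state estimate $\hat{x}^-_{k+1}$ at time $(k+1)$, and then we estimate a priori error covariance $P^-_{k+1}$:

$$\hat{x}^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{x}^{(i)}_{k+1} \tag{A6}$$

$$P^-_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \left( \hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1} \right) \left( \hat{x}^{(i)}_{k+1} - \hat{x}^-_{k+1} \right)^T + Q_k \tag{A7}$$

Here, we add the $Q_k$ term to take process noise such as modeling errors into account.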

Appendix B.2. Measurement Update
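Now that the time update is completed, we implement the following measurement-update equations when actual measurement $z_{k+1}$ arrives at the filter. Similar to Equation (A5), we generate sigma points $\hat{x}^{(i)}_{k+1}$ by $UT(\cdot)$ using prior state estimates $\hat{x}^-_{k+1}$ and covariance $P^-_{k+1}$. Next, we use known nonlinear measurement equation $h(\cdot)$ to transform the sigma points into $\hat{z}^{(i)}_{k+1}$ vectors. That is, $\hat{z}^{(i)}_{k+1} = h(\hat{x}^{(i)}_{k+1})$. Similar to Equations (A6) and (A7), we average $\hat{z}^{(i)}_{k+1}$ vectors to obtain predicted measurement $\hat{z}_{k+1}$ at time $(k+1)$, and then we estimate covariance $P_z$ of the predicted measurement:

$$\hat{z}_{k+1} = \frac{1}{2n} \sum_{i=1}^{2n} \hat{z}^{(i)}_{k+1}, \qquad P_z = \frac{1}{2n} \sum_{i=1}^{2n} \left( \hat{z}^{(i)}_{k+1} - \hat{z}_{k+1} \right) \left( \hat{z}^{(i)}_{k+1} - \hat{z}_{k+1} \right)^T + R_{k+1}$$

Here, we add the $R_{k+1}$ term to take measurement noise into account.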
The measurement update of the state estimate is performed using the normal Kalman filter equations as follows: where K is called the Kalman gain, and Equation (A10) is Joseph's form [42] of the covariance measurement update, so this form preserves the symmetry and positive definiteness of the covariance.
For more details such as optimality and derivation, see refs. [38,43].