Development of an Online Adaptive Parameter Tuning vSLAM Algorithm for UAVs in GPS-Denied Environments

In recent years, unmanned aerial vehicles (UAVs) have been applied in many fields owing to their mature flight control technology and easy-to-operate characteristics. These UAV applications undoubtedly rely heavily on the location information provided by the positioning system. Most UAVs nowadays use a global navigation satellite system (GNSS) to obtain location information. However, this outside-in, third-party positioning system is particularly susceptible to environmental interference and cannot be used in indoor environments, which limits the application diversity of UAVs. To deal with this problem, a stereo-based visual simultaneous localization and mapping (vSLAM) technology is applied in this paper. The presented vSLAM algorithm fuses onboard inertial measurement unit (IMU) information to solve the navigation problem in an unknown environment without the use of a GNSS signal and provides reliable localization information. The overall visual positioning system is based on the stereo parallel tracking and mapping (S-PTAM) architecture. However, experiments found that the feature matching threshold has a significant impact on positioning accuracy. The threshold is a Hamming distance without any physical meaning, which makes it quite difficult to set manually. Therefore, this work develops an online adaptive matching threshold tuned according to the keyframe poses. Experiments show that the developed adaptive matching threshold improves positioning accuracy. Since the attitude calculation of the IMU is carried out by a Mahony complementary filter, the difference between the measured acceleration and gravity is used as the metric to dynamically tune the gain value online, which improves the accuracy of attitude estimation under aggressive motions.
Moreover, a static state detection algorithm based on the moving window method and the measured acceleration is proposed to accurately calculate the conversion mechanism between the vSLAM system and the IMU information; this initialization mechanism helps the IMU provide a better initial guess for the bundle adjustment (BA) algorithm in the tracking thread. Finally, a performance evaluation of the proposed algorithm is conducted on the popular EuRoC dataset. All the experimental results show that the developed online adaptive parameter tuning algorithm can effectively improve the vSLAM accuracy and robustness.


Introduction
Due to their high flexibility and excellent maneuverability, UAVs have become one of the most popular aerial vehicle platforms over the past few years. While the outdoor applications of UAVs are well known to the public, this paper focuses on visual positioning in GPS-denied environments; therefore, several indoor applications of UAVs are particularly illustrated. For example, UAVs can provide appropriate assistance to humans in high-risk missions such as rescue operations and indoor equipment maintenance in nuclear power plants [1]. Besides, UAVs can also effectively enhance human productivity, for example, in intelligent warehousing management in large factories or plant care and monitoring in greenhouses [2], as well as in military indoor inspection. A feature-based vSLAM system involves many preset parameters, such as various matching thresholds. In general, these parameters are not only fixed after system startup but also mostly lack intuitive physical meaning, making a well-founded manual adjustment difficult. For UAVs with aggressive motions or in complex operating environments, these parameters require instant online updates to keep the system in optimal condition. For the above reasons, the concept of a dynamic threshold has been introduced into the field of vSLAM, such as dynamically adjusting the threshold value that controls keyframe selection based on the field-of-view (FoV) repetition rate [23]. Meanwhile, according to the results of several experiments, it is found that the selection of the feature matching threshold has a dominant effect on positioning accuracy. Therefore, how to effectively select the matching threshold online according to different situations is one of the key issues discussed in this paper.
In fact, not only the vSLAM system requires online parameter adjustment, but so does the Mahony complementary filter [24] used for UAV attitude estimation. As the UAV undergoes aggressive motions, the measured acceleration varies drastically, which causes biased attitude estimation. The biased estimates not only deteriorate the basic flight stability of UAVs, but also lower the vSLAM accuracy. Therefore, this paper proposes to adjust the K_p gain of the Mahony complementary filter online to further improve the attitude estimation accuracy, not only to achieve better flight performance of UAVs, but also to prepare for the information fusion with the vSLAM system. The following summarizes the three major problems that are going to be addressed in this paper:

1. Online adaptive parameter tuning for the feature matching threshold, which lacks physical meaning.
2. Online adaptive K_p gain adjustment for the Mahony complementary filter to resist aggressive motions.
3. Fusion and motion compensation loop design between vSLAM and IMU.
Following the above-mentioned issues, the online adaptive parameter tuning algorithm and motion compensation loop proposed in this paper make the following three primary contributions.

1. The matching threshold and the K_p gain, which are not easy to determine via manual tuning, are adjusted adaptively according to the UAV's flight status.
2. The proposed online adaptive parameter tuning algorithm can effectively improve the pose estimation accuracy and the frames per second (FPS) by up to 70% and 29%, respectively, on the EuRoC dataset.
3. The developed motion compensation loop subroutine can effectively utilize IMU information to improve the anti-shading robustness of the original vSLAM system. Moreover, incorporating the presented online adaptive parameter tuning algorithm can further improve the robustness to a higher level.

The Framework of the vSLAM System
This paper develops the proposed online adaptive parameter tuning algorithm based on S-PTAM [25], whose performance is comparable to the state-of-the-art ORB-SLAM2 [9] and whose accuracy is better than that of stereo large-scale direct SLAM (S-LSD-SLAM) [26]. The most significant feature of S-PTAM is that the overall system involves two independent threads, i.e., the tracking thread and the mapping thread. The former mainly performs real-time online localization, while the latter is responsible for local map optimization. In order to introduce the framework of the proposed algorithm, it is essential to briefly explain several important concepts in these threads and the settings adopted in this paper.

Coordinate Setup
The EuRoC [27] dataset is used for the final algorithm validation. The EuRoC dataset not only provides stereo image sequences and IMU measurements but also the relative pose relationships between sensors and the ground truth 6-degrees-of-freedom (DoF) poses, which will be used as a reference in this paper. More details about this dataset can be accessed in the "Data Availability Statement" section at the end of this paper. In order to facilitate the comparison of results, this paper follows its coordinate settings. The overall coordinate system contains a left camera frame (C), right camera frame (C_right), body frame (B), world frame (W), and navigation frame (S). Both camera frames take the front shooting direction as the z-axis, while the right and bottom of the shooting direction are the x-axis and y-axis, respectively. The optical center of the camera is taken as the origin. The body frame also has its z-axis parallel to the front shooting direction but, different from the camera frames, the upper and right sides of the shooting direction are the x-axis and the y-axis. The world frame, whose z-axis is parallel to gravity, is defined by the instruments that provide the ground truth poses, such as the Vicon tracking system or Leica MS50. In order to show the definition of all frames more clearly, Figure 1 is provided.
It must be noted that the vSLAM system does not know any information about the world frame during the positioning process, which means that the world frame cannot be used as the reference frame to describe the pose, so it is necessary to introduce the navigation frame. In this paper, the first left camera frame, after activating the vSLAM system, is defined as the navigation frame, and all subsequent positioning information is described as the left camera frame with respect to the navigation frame, as shown in Figure 2.
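The navigation-frame convention above amounts to a simple composition of homogeneous transforms: if T_W_C1 is the pose of the first left camera frame (which defines S) and T_W_Ck is the pose of a later left camera frame, then T_S_Ck = inv(T_W_C1) · T_W_Ck. A minimal numpy sketch of this relation (function names are illustrative, not from the paper's code):

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def pose_in_navigation_frame(T_W_C1, T_W_Ck):
    """Pose of the k-th left camera frame expressed in the navigation frame (S),
    which is defined as the first left camera frame: T_S_Ck = inv(T_W_C1) @ T_W_Ck."""
    return np.linalg.inv(T_W_C1) @ T_W_Ck
```

By construction, the first frame's pose in the navigation frame is the identity, which matches the definition of S above.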

Keyframe Selection
The current tracking frame is treated as a keyframe as long as it meets the keyframe selection criteria. It is worth mentioning that the core strategy of keyframe selection is to check whether the whole map can provide enough feature points for the current tracking frame to match. S-PTAM in [25] selects the keyframe that has the most covisible points with the current tracking frame and then takes the number of map points observed by this keyframe as the reference. When the number of map points matched with the current tracking frame is less than half of the reference or less than twenty, the tracking frame is considered a keyframe. In short, the keyframe observes more places that have not been explored before, as shown in Figure 3.
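The selection rule just described can be sketched in a few lines (a hedged illustration of the S-PTAM criterion as paraphrased above; the function name and parameter handling are ours):

```python
def is_keyframe(num_matched_points, reference_count, min_points=20):
    """S-PTAM-style keyframe test: the current tracking frame becomes a keyframe
    when it matches fewer than half the map points observed by the most covisible
    keyframe (the reference), or fewer than min_points overall."""
    return (num_matched_points < 0.5 * reference_count
            or num_matched_points < min_points)
```

Intuitively, a frame that can no longer be explained by the existing map is observing new territory and should extend the map.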

Tracking Thread
By searching for keyframes that have a covisibility relationship with the previous tracking frame, the map points that may be observed at that moment can be quickly predicted from the global map, namely the submap. Then, the initial guess of the left camera pose can be predicted by the constant velocity motion model, which can be combined with frustum culling to further remove the map points that cannot be seen in the submap, and thus obtain the local map. Thereafter, feature matching between the stereo images at that moment and the local map is conducted. Finally, according to the matching result, BA is applied to obtain the optimal pose, taking the output of the constant velocity motion model as the initial guess. The overall tracking thread repeats the above steps continuously. The associated detailed flowchart is summarized in Figure 5. There are two noteworthy points for the aforementioned algorithm flowchart:

• The output of the constant velocity motion model may be a weak initial guess, especially when the UAV is in aggressive motions such as sharp turnings or when image information is lost.
• The optimization of the BA is highly dependent on the accuracy of feature matching.
In other words, the initial guess and the outliers of the matching pairs must be handled carefully during the development process. For the former, this paper obtains a more reliable value by using IMU information, which will be explained clearly in Sections 4 and 5. First of all, let's introduce feature matching and BA in the following sections.

Feature Extraction and Matching
Before feature matching, feature extraction is performed on the stereo images to find the 2D static feature points, which consist of two parts: key points and descriptors. It is worth mentioning that the key points cannot be matched directly, while the further extracted descriptors allow the key points to describe their surrounding areas, which makes it possible to perform feature matching. In this paper, the Shi-Tomasi corner detection algorithm [28] proposed by Shi and Tomasi is used for key point detection, while binary robust independent elementary features (BRIEF) [29] is used for descriptor extraction. Figure 6 shows the key point detection results for the EuRoC dataset. Examining Figure 6b, it is obvious that fewer key points are detected in the low-illuminated area, which may further lead to positioning divergence or drift. The BRIEF descriptor is a 256-bit binary vector; therefore, the Hamming distance is adopted to evaluate the feature similarity. For example, if there are two binary vectors, such as 10001 and 01011, with three different bits in the same positions between them, then the Hamming distance is defined as three. As a result, the larger the Hamming distance, the lower the similarity of the two vectors.
Following the above description, in this paper, each 3D map point on the local map is checked against all the feature points on the current stereo images by the brute-force feature matching algorithm, further matching those with the minimum Hamming distance and below the matching threshold. The illustration of the overall matching algorithm is shown in Figure 7, while Figure 8 shows the result of feature matching in the EuRoC dataset. At the same time, since stereo rectification is performed in advance, the ideal matching pairs can be guaranteed to have close v values, which can be used as a physical constraint to preliminarily filter out the extreme outliers. However, as illustrated in Figure 8, although this strong matching constraint can remove most of the outliers, certain mismatching pairs remain inevitable, as shown by the pink lines in Figure 8. Therefore, the BA algorithm, which is susceptible to mismatching, must introduce an additional mechanism to maintain the pose estimation accuracy. The related details will be presented in the next section.
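The Hamming-distance matching described above can be sketched as follows (illustrative Python; descriptors are stored as plain integers here, and the paper's additional epipolar v-value check is not reproduced):

```python
import numpy as np

def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors stored as integers:
    the number of differing bits, counted on the XOR."""
    return bin(d1 ^ d2).count("1")

def brute_force_match(map_descs, frame_descs, threshold=10):
    """For each map-point descriptor, pick the frame feature with the minimum
    Hamming distance, accepted only when that distance is below the matching
    threshold. Returns (map_index, frame_index) pairs."""
    pairs = []
    for i, dm in enumerate(map_descs):
        dists = [hamming(dm, df) for df in frame_descs]
        j = int(np.argmin(dists))
        if dists[j] < threshold:
            pairs.append((i, j))
    return pairs
```

The paper's 10001 vs. 01011 example gives a distance of three, consistent with this definition.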

Bundle Adjustment
Before constructing the BA cost function, the camera model must be established first. The pinhole model is often used to describe the conversion mechanism between the camera frame and the pixel frame. The projection can be expressed as Equation (1), where u, v are pixel coordinate values, while f_x, f_y are the focal lengths (in pixels) in the u and v directions, respectively, and c_u and c_v are the corresponding principal point coordinates. C_R R and t are the rotation matrix and translation vector, respectively, for the reference frame (R) with respect to the camera frame (C), p_R is the map point position represented in the reference frame, s is the scale factor, the subscript (·)_1:3 means taking the first three values of the vector, and C_R T is the transform matrix written as Equation (2). In essence, the BA algorithm solves for the optimal camera poses and map point positions by minimizing the re-projection error, and its cost function can be expressed as Equation (3).
where C_S T_j,est is the transform matrix for the navigation frame with respect to the j-th camera frame, p_S i,est is the map point position represented in the navigation frame, and u_k, v_k are the pixel values on the j-th camera frame that match the map point p_S i,est. The re-projection error can then be defined as Equation (4), where e_k ∈ R^(2×1). Substitute Equation (4) into Equation (3) and convert it into a vector form.
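The projection of Equation (1) and the re-projection error of Equation (4) can be sketched numerically as follows (a minimal numpy illustration; the matrix and function names are ours, not the paper's):

```python
import numpy as np

def project(K, T_C_S, p_S):
    """Pinhole projection of Equation (1): a map point p_S (navigation frame)
    is transformed into the camera frame and projected; the scale factor s is
    the depth, i.e. the z-component of the point in the camera frame."""
    p_h = np.append(p_S, 1.0)          # homogeneous map point
    p_C = (T_C_S @ p_h)[:3]            # point in the camera frame, (.)_{1:3}
    uv_h = K @ p_C                     # s * [u, v, 1]^T
    return uv_h[:2] / uv_h[2]

def reprojection_error(K, T_C_S, p_S, uv_measured):
    """e_k of Equation (4): measured pixel minus predicted pixel, in R^2."""
    return np.asarray(uv_measured, dtype=float) - project(K, T_C_S, p_S)
```

Summing the squared norms of these residuals over all matches gives the BA cost of Equation (3).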
where e re can be written as: After defining the cost function, the derivatives of the state, i.e., camera poses C S T j,est and map points, p S i,est must be determined in order to construct the Jacobian matrix of e re that will be used in the optimization procedure. However, it is worth noting that the rotating matrix itself has six constraints, that is, the L2-norm of each column and row must equal 1. In other words, it must guarantee that these constraints always hold during each iteration of the optimization, which will not only make the process difficult to perform but also increase the time spent. Fortunately, by introducing Lie algebra [30], this problem can be solved perfectly.
Firstly, define a small pose perturbation as shown in Equation (7).
Equation (7) can be written in the form of Lie algebra by logarithmic mapping, shown as follows.
where (·)^∧_SE3 denotes the inverse operation of Equation (9). An additional similar operator, (·)^∧_SO3, must also be introduced, whose definition is shown in Equation (12); both (·)^∧_SE3 and (·)^∧_SO3 will be used in the subsequent derivation.
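The two hat operators can be illustrated as follows (a common convention with the translational part ρ placed before the rotational part φ inside ξ; the paper's exact ordering may differ):

```python
import numpy as np

def hat_so3(phi):
    """(.)^ for so(3): maps a 3-vector to its skew-symmetric matrix, so that
    hat_so3(a) @ b equals the cross product a x b."""
    x, y, z = phi
    return np.array([[0.0, -z,  y],
                     [ z, 0.0, -x],
                     [-y,  x, 0.0]])

def hat_se3(xi):
    """(.)^ for se(3): xi = (rho, phi) is mapped to the 4x4 matrix
    [[phi^, rho], [0, 0]], used when exponentiating pose perturbations."""
    rho, phi = np.asarray(xi[:3], float), np.asarray(xi[3:], float)
    M = np.zeros((4, 4))
    M[:3, :3] = hat_so3(phi)
    M[:3, 3] = rho
    return M
```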
Then, the derivative of the re-projection error for small pose perturbation (represented as the Lie algebra) can be obtained by the chain rule.
where p_C i,est is defined as Equation (14). Therefore, according to Equations (1) and (14), the first term ∂e_k/∂p_C i,est in Equation (13) can be derived as Equation (15), where the scale factor s is equal to p_C i,est,z. Next, the left perturbation model will be introduced in order to obtain the second term ∂p_C i,est/∂ξ_j,est in Equation (13), which is the derivative with respect to the Lie algebra.
The derivative of the re-projection error with respect to the pose is obtained by substituting Equations (15) and (16) into Equation (13), which yields Equation (17). The derivative of the re-projection error with respect to the map point can be obtained by using the chain rule again, which gives Equation (18). It is worth noting that the first term ∂e_k/∂p_C i,est of Equation (18) has already been derived in Equation (15). Therefore, only the second term ∂p_C i,est/∂p_S i,est needs to be derived. This result can be obtained directly from Equation (14), as shown in Equation (19). Substituting Equations (15) and (19) into Equation (18) yields Equation (20). According to Equations (17) and (20), the Jacobian matrix of e_re can be constructed as Equation (21). Based on the Jacobian matrix in Equation (21), the increment of the state, ∆x, defined in Equation (23), can be solved by applying the Levenberg-Marquardt (LM) algorithm, Equation (22), to further obtain the optimal camera poses and map point positions. In each optimization iteration, the state is continuously updated according to Equation (24) until the increment ∆x is small enough to stop. Moreover, the termination condition applied in this paper is that the root mean square of ∆x is less than 10^-9.
However, it is worth noting that BA is sensitive to outliers, namely feature mismatching. To solve this issue, some strategies for filtering outliers must be designed. In this paper, the Huber loss will be used to deal with this problem.
Different from the square error expressed in Equation (3), the cost function is now modified as follows by considering the Huber loss.
where L(·) is the Huber function defined in Equation (26), e ∈ R^(n×1), e_i represents the i-th element of the vector e, and δ is the Huber threshold, which is set to 5.991 in this paper and must be set manually in advance. It is evident that when the error is greater than the threshold δ, the error grows linearly instead of with the original squared increment, which can effectively eliminate the large errors caused by the outliers. A detailed derivation and explanation of how the outlier suppression is realized in the state increment calculation shown in Equation (22) is given in the following. With the use of the Huber function in Equation (26), let us redefine the problem as searching for a state increment ∆x, based on a fixed state x_fix, that can make the modified cost function as small as possible. By using the Taylor expansion, the first-order approximation can be obtained, with e_re defined as in Equation (29). Using the chain rule, the derivative of the cost function with respect to the state increment can be expressed by Equation (30), where b and B are defined by Equations (31) and (32), respectively. According to Equations (26) and (31), the detailed configuration of the diagonal matrix B can be written in terms of e_re,1, e_re,2, ..., and e_re,2m (the 1st, 2nd, ..., and 2m-th elements of the vector e_re, respectively) and the weighting function ψ(·). Substituting Equation (29) into Equation (30) yields Equation (36). Compared with Equation (22), the cost function with Huber loss is clearly almost equivalent to iteratively reweighted least squares (IRLS). Therefore, based on Equations (36) and (24), the BA algorithm with outlier rejection can be carried out to obtain the optimal camera poses and map point positions.
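The IRLS interpretation above can be illustrated with the standard Huber weighting (a sketch only; the exact ψ(·) of the paper may differ from this textbook form by constant factors):

```python
import numpy as np

def huber_weight(e, delta=5.991):
    """IRLS weight implied by the Huber loss: residuals inside the threshold
    keep full (quadratic) weight 1, while residuals outside it are down-weighted
    by delta/|e|, so their contribution grows linearly rather than quadratically."""
    e = np.abs(np.asarray(e, dtype=float))
    return np.where(e <= delta, 1.0, delta / np.maximum(e, 1e-12))
```

In practice, these weights scale the rows of the Jacobian and residual vector in each LM iteration, which is how large mismatching errors are suppressed in the state increment of Equation (22).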
Following the above description, not only the sensitivity of the BA with respect to the initial value but also the resistance against outliers will be tested. More specifically, the ground truth pose of the camera ( C S T gt ) is multiplied by a perturbation as the initial value, which is shown in Equation (37).
where ζ ∈ se(3). Finally, we plot the root mean square error (RMSE) of the estimated camera pose according to different initial guesses C_S T_initial guess, which are represented by different L2-norm values of ζ. At the same time, it must be emphasized that the BA used in the tracking thread does not optimize the numerous map points but only the current camera pose, in order to improve real-time performance; this is also known as motion-only BA [8]. Therefore, the following tests (with outliers) are all based on this type of BA. The first row of Figure 9 shows the results without the Huber loss, while the second row presents those with it. Apparently, with the aid of the Huber loss, the performance of BA is much more accurate, which means that the Huber loss can effectively resist the impact of outliers. Meanwhile, the sensitivity against the initial value perturbation is also much lower. The right side of the horizontal axis of each graph in Figure 9 also illustrates that if the initial guess is quite poor, the estimated state will still diverge even when using the Huber loss. Fortunately, the image frame rate is mostly between 20 and 30 FPS, and the UAV's pose usually does not change much in such a short period of time (about 0.05 s). Therefore, a terrible initial guess is less likely to occur under normal circumstances.

Accuracy Analysis under Different Matching Thresholds
After introducing the problems faced by BA, let's move on to another issue. Feature matching plays a quite important role in the feature-based vSLAM system, and the matching result will directly affect many critical procedures, such as keyframe selection, BA, triangulation used in mapping, and so on. Undoubtedly, all of the above will directly or indirectly impact the final vSLAM accuracy. Meanwhile, as mentioned in Section 2.4, feature matching relies heavily on the matching threshold, which is a Hamming distance with no physical meaning and must be set before starting the vSLAM system. Based on the above description, a correlation between the matching threshold and the positioning accuracy definitely exists. As a result, a guide regarding the selection of the threshold should be provided.
Take the MH_01_easy series as an example, and observe the positioning accuracy with matching thresholds 10, 15 and 20, respectively. As listed in Table 1, the absolute trajectory error (ATE) and relative pose error (RPE) [31] are adopted as the benchmark to evaluate the accuracy. According to Table 1, it can be seen that the RPE and ATE are not the smallest for the most severe matching threshold, 10. The reason is that when the matching threshold is very small, it drastically reduces the number of correct match pairs and thus loses the constraint for state optimization. However, the increase in the matching threshold does not necessarily guarantee an increase in localization accuracy. Table 1 clearly reveals that a robust vSLAM should be able to adjust the associated matching threshold automatically. The online automatic threshold scheduling not only improves the overall localization accuracy but prevents tedious manual adjustment as well. In the following section, an online adaptive matching threshold tuning algorithm is proposed.
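For reference, the translational part of the ATE used in Table 1 reduces to an RMSE over corresponding positions; a minimal sketch (assuming the estimated and ground truth trajectories are already time-synchronized and expressed in a common frame, which in practice requires an alignment step):

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """Absolute trajectory error: root mean square of the Euclidean distances
    between corresponding estimated and ground truth positions (N x 3 arrays)."""
    err = np.asarray(est_positions, float) - np.asarray(gt_positions, float)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```

RPE differs in that it compares relative pose increments over a fixed step instead of absolute positions, making it sensitive to local drift rather than accumulated error.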

Online Adaptive Matching Threshold Tuning Algorithm
This research uses the more physically meaningful displacement and yaw-angle differences between keyframes as indicators to adjust the matching threshold. As mentioned in Section 2.2, a keyframe is created when the overlap rate between its FoV and the current global map is relatively low. Therefore, when the displacement and yaw-angle difference between two adjacent keyframes are both less than certain thresholds, defined as thres_m,1 and thres_yaw,1, this contradicts the definition of a keyframe. It can then be inferred that the Hamming-distance threshold is too severe, and the value is increased immediately. In the opposite case, the matching threshold is reduced. Because the initial stereo image is the most stable, the initial matching threshold is set to 10, and an upper bound thres_max and a lower bound thres_min limit the matching threshold range. The overall process is summarized in Figure 10, and the parameter settings used in this paper are listed in Table 2.
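The update rule above can be sketched as a small function. All numeric values here (thresholds, bounds, step size) are illustrative placeholders, not the paper's Table 2 settings.

```python
def update_matching_threshold(threshold, disp, yaw_diff,
                              thres_m=0.1, thres_yaw=5.0,
                              thres_min=8, thres_max=25, step=1):
    """Adjust the Hamming-distance matching threshold from keyframe geometry.

    disp / yaw_diff: displacement and yaw-angle difference between the two
    most recent keyframes. If a keyframe was created although both changes
    are small, the current threshold is likely too strict (too few matches
    survived), so it is relaxed; otherwise it is tightened again. The
    result is clamped to [thres_min, thres_max].
    """
    if disp < thres_m and yaw_diff < thres_yaw:
        threshold += step      # threshold too severe: accept more matches
    else:
        threshold -= step      # loose enough: be stricter again
    return max(thres_min, min(thres_max, threshold))
```

Called once per new keyframe, this keeps the threshold inside a safe band while reacting immediately to the keyframe-geometry contradiction described above.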


Mahony Complementary Filter
The IMU is an indispensable component of UAVs, and most flight control algorithms rely heavily on it for attitude estimation. In other words, IMU information is basically available on UAVs. Moreover, it is expected that positioning accuracy and robustness can be improved by fusing IMU and vSLAM information. Therefore, this paper not only uses vSLAM to obtain positioning information but also uses the Mahony complementary filter to calculate the UAV's attitude from IMU measurements. A brief introduction to the Mahony complementary filter is given below.
The first step in carrying out the Mahony complementary filter algorithm is to integrate the gyroscope measurement, i.e., the angular velocity, which can be expressed as Equation (38), where G represents the gravity frame, whose z-axis is always parallel to gravity. B_G q_k is the normalized quaternion representing the attitude of the gravity frame with respect to the body frame at moment k, ω_{k−1,mes} is the angular velocity measurement at moment k − 1, and Δk is the sampling period of the IMU, which is 0.005 s in the EuRoC dataset. Ω(·) is defined as in Equation (39). The unit gravity vector a_{k,est} is then estimated from the attitude B_G q_k derived from Equation (38), as shown in Equation (40),
where a_{k,est} is the estimated unit gravity vector, the star in B_G q*_k denotes the conjugate of the quaternion, and ⊗ is the quaternion product. After obtaining the estimated unit gravity vector, the error e_k can be calculated by comparing a_{k,est} with the acceleration measurement, where a_{k,filter} represents the acceleration measurement after passing through a low-pass filter (LPF) at moment k. The setting of the LPF in this paper is described in Section 5.1. Then, a proportional gain K_p is applied for IMU pose estimation correction. The correction effort u_k is used to compensate the angular velocity measurement. As a result, the new attitude estimate at the next moment is derived by integrating the compensated angular velocity. However, there are some problems in this process, which are analyzed and improved in the following section.
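The update cycle above can be sketched as a single filter step. This is a minimal, generic Mahony formulation assuming a z-up gravity frame and Hamilton quaternion convention; the paper's exact frame rotation and gain values are not reproduced, and `kp` and `dt` here are illustrative.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product of two quaternions [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def mahony_step(q, gyro, acc_filt, kp=1.0, dt=0.005):
    """One Mahony complementary-filter update (sketch).

    q        : attitude quaternion [w, x, y, z]
    gyro     : measured angular velocity (rad/s), body frame
    acc_filt : low-pass-filtered accelerometer reading (m/s^2)
    """
    # Estimated gravity direction in the body frame from the current attitude
    # (third column of the transposed rotation matrix).
    w, x, y, z = q
    g_est = np.array([2*(x*z - w*y),
                      2*(w*x + y*z),
                      w*w - x*x - y*y + z*z])
    # Error between measured and estimated gravity direction (cross product).
    a = acc_filt / np.linalg.norm(acc_filt)
    e = np.cross(a, g_est)
    # Compensate the gyro with a proportional term, then integrate.
    omega = gyro + kp * e
    dq = 0.5 * quat_mult(q, np.array([0.0, *omega]))
    q = q + dq * dt
    return q / np.linalg.norm(q)
```

With zero angular velocity and an accelerometer reading aligned with the estimated gravity, the error term vanishes and the attitude stays put, which is the stationary fixed point the filter corrects toward.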

Online Adaptive K p Tuning
According to Equations (42) and (43), the attitude error correction in Equation (44) rests on a premise: the acceleration measurement should contain gravity information only, so that it truly reflects the accumulated attitude error. In other words, the compensation timing of Equation (44) should be restricted. Put simply, when the difference between the acceleration measurement and gravity is greater than a threshold thres_norm, the compensation effort u_k should be disabled.
Based on the above description, a confidence level is introduced to represent the difference between the acceleration measurement and gravity: the smaller the difference, the less linear acceleration of the UAV is contained in the measurement, and the more truly it reflects the accumulated error. Based on this motivation, this paper further proposes an adaptive K_p, applied in Equation (43), that complies with different motions in order to improve estimation accuracy.
The following presents several adjustment strategies (including control and experimental group) to analyze the accuracy improvement.

• Pure Integration (control group): The attitude is directly obtained by integrating the angular velocity according to Equation (38).
• Pure Mahony (control group): Regardless of whether the difference between the acceleration measurement and gravity is smaller than the threshold thres_norm, the angular velocity is always compensated according to Equation (44).
• Conditional Method (experimental group): If the difference between the acceleration measurement and gravity is greater than the threshold thres_norm, u_k is set to zero.
• Adaptive Method Version. 1 (experimental group): K_p is adjusted online according to the difference between the acceleration measurement and gravity, where g is the gravity acceleration and is set to 9.82121 m/s² in this paper.
• Adaptive Method Version. 2 (experimental group): Based on Adaptive Method Version. 1, a minor change is made in which κ is an additional parameter that adjusts the sensitivity of K_p to acceleration change.
• Adaptive Method Version. 3 (experimental group): Based on Adaptive Method Version. 2, Equation (47) is modified into the adaptation law of Equation (48).

Table 3 shows the parameter settings used in this paper. The EuRoC dataset is then used to evaluate the above adjustment strategies, and the estimated results are shown in Figures 11-13. These figures present the roll and pitch angle estimation results for the different adjustment strategies in the first and second rows of each figure, respectively, while the third row shows a flag indicating whether the difference between the measured acceleration and gravity is smaller than the threshold thres_norm. Finally, the RMSE of the Euler angles is listed in Table 4. Examining Table 4, it can be observed that Pure Mahony achieved good accuracy in the MH_01_easy and MH_02_easy series. The main reason is analyzed as follows. According to [27], the UAV's motions in both series are the slowest in the whole dataset, which means that the error e_k is reliable most of the time, so Pure Mahony, which always compensates the angular velocity, attains high attitude estimation accuracy. At the same time, we observe that the Conditional Method has considerably higher accuracy than Pure Mahony in the aggressive-motion series, such as MH_03_medium, MH_04_difficult and MH_05_difficult.
These experimental results strongly suggest that conditional compensation improves accuracy under aggressive motions. Regarding Adaptive Method Versions 1-3, Adaptive Method Version. 3 yields the most stable and accurate results, which shows that the online K_p adjustment is helpful for attitude estimation in aggressive motions. Finally, in the MH_04_difficult series, Pure Integration achieves good roll-angle accuracy; this is because the EuRoC dataset uses a higher-grade IMU and, according to [27], MH_04_difficult has the shortest duration among the five series, so the accumulated error induced by pure integration is not significant. Nevertheless, using pure integration to obtain the attitude will eventually cause estimation drift. In short, this paper adopts Adaptive Method Version. 3 for the attitude calculation.
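One plausible form of the confidence-scaled gain is sketched below. The paper's Equations (46)-(48) are not reproduced here; the linear decay, the clamp, and all parameter values (`kp_max`, `thres_norm`, `kappa`) are assumptions chosen only to illustrate the idea that the gain shrinks as the measurement departs from gravity.

```python
def adaptive_kp(acc_norm, g=9.82121, kp_max=2.0, thres_norm=0.3, kappa=5.0):
    """Confidence-scaled proportional gain (illustrative sketch).

    The closer the measured acceleration norm is to gravity, the more the
    accelerometer reflects attitude error alone, so the larger the gain.
    Beyond thres_norm the correction is disabled entirely (Conditional
    Method behavior); kappa tunes the sensitivity to acceleration change.
    """
    diff = abs(acc_norm - g)
    if diff > thres_norm:
        return 0.0                      # aggressive motion: no compensation
    return kp_max * max(0.0, 1.0 - kappa * diff / g)
```

During hover the gain stays near `kp_max`, while under aggressive motion it collapses to zero, matching the qualitative behavior the experiments above attribute to Adaptive Method Version. 3.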

Static State Detection Algorithm
With the above introduction and analysis of the Mahony complementary filter, it is obvious that the attitude calculated through the IMU is relative to the gravity frame, while the information obtained by vSLAM is relative to the navigation frame. Therefore, information alignment from the IMU gravity frame to the vSLAM navigation frame is essential for pose compensation.
Equation (49) shows the process of converting the IMU information into the same reference frame as vSLAM.
where G_B T_k,imu is the k-th pose of the body frame relative to the gravity frame calculated by the IMU, S_C T_k,imu is the converted IMU information, and B_0 is the first body frame. According to the definition of the navigation frame, S_B0 T is the inverse matrix of B_C T, which represents the relative pose between the body frame and the left camera frame. It is worth mentioning that S_B0 T and B_C T are fixed values related only to the hardware setup and must be derived from a calibration procedure in advance; fortunately, in the EuRoC dataset, these values are precisely provided. Obviously, each conversion involves G_B0 T; in other words, it must be as accurate as possible in order to avoid negative compensation effects. Meanwhile, when a UAV is stationary, the acceleration measurements, which do not involve linear acceleration, are much cleaner. Based on this stationary property, accelerometer measurements in the static state are used to obtain an accurate G_B0 T before activating the vSLAM system. First, in order to know exactly when the UAV is stationary, this paper proposes a static state detection algorithm, which is illustrated in detail below.
However, even in the stationary state, the accelerometer inevitably contains high-frequency noise, which causes inaccurate pose estimation. An LPF, as shown in Equation (50), a_{k,filter} = β·a_{k−1,filter} + (1 − β)·a_{k,mes}, is applied to suppress the high-frequency noise in the acceleration measurements, where a_{k,filter} is the k-th filtered acceleration measurement and a_{k,mes} is the k-th unfiltered acceleration measurement, while β is defined in terms of the cutoff frequency f_c, which is set to 0.4775 Hz in this paper. After filtering, the acceleration measurements are placed into a moving window (500 samples in size). When the moving window is full, the judgment procedure is triggered. If the standard deviation within the moving window is less than thres_std, which is set to 0.02, and the difference between the latest measurement and gravity is less than thres_norm as well, the moment is considered stationary. If the condition is not satisfied, 70% of the history data in the moving window is cleared, and the above procedure is repeated until the static state is detected. The flowchart of the overall detection algorithm is illustrated in Figure 14, while Figure 15 shows the static state detection results for the EuRoC dataset: the filtered 3-axis acceleration and the L2-norm of the acceleration are illustrated, along with the timing of the static state determined by the proposed algorithm. According to the detection results, the MH_01_easy series is considered stationary at 23.5 s, MH_03_medium at 11.25 s, and MH_04_difficult at 13 s. The results also show that the UAVs are not immediately detected as static when they become stationary.
The reason is that the detection accuracy will significantly affect the subsequent compensation procedure, so the relevant thresholds in the static state detection algorithm are set more severely to guarantee the detection quality.
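The moving-window judgment can be sketched as follows. The window size, standard-deviation threshold, and 70% clearing rule follow the description above; the value of `thres_norm` is an illustrative assumption, since the paper does not restate it here.

```python
import math
from collections import deque

def detect_static(acc_norms, window=500, thres_std=0.02,
                  thres_norm=0.3, g=9.82121):
    """Return the first sample index judged static, or None.

    acc_norms: sequence of low-pass-filtered acceleration norms.
    When the moving window is full, the state is static if the window's
    standard deviation is below thres_std AND the latest sample is within
    thres_norm of gravity; otherwise the oldest 70% of the window is
    discarded and collection continues.
    """
    buf = deque(maxlen=window)
    for k, a in enumerate(acc_norms):
        buf.append(a)
        if len(buf) == window:
            mean = sum(buf) / window
            std = math.sqrt(sum((x - mean) ** 2 for x in buf) / window)
            if std < thres_std and abs(buf[-1] - g) < thres_norm:
                return k
            for _ in range(int(window * 0.7)):   # clear 70% of the history
                buf.popleft()
    return None
```

Because only 30% of the window survives a failed check, a full second of fresh, quiet data is still required before the next judgment, which explains the deliberate detection delay noted above.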

Motion Compensation Process
In order to fit the body frame setup and facilitate the calculation of G_B0 T, the gravity frame is first rotated by −90 degrees about its y-axis, so that the x-axis of the rotated gravity frame is parallel to the opposite direction of gravity. Based on this coordinate configuration, when the UAV is stationary, the relationship between the averaged acceleration measurements [a_aver,x, a_aver,y, a_aver,z]^T and gravity can be expressed by Equation (52), where a_aver,x, a_aver,y and a_aver,z are the averages of the latest measurements within 0.5 s in the moving window, φ_0, θ_0 and ψ_0 are the Euler angles of the body frame with respect to the rotated gravity frame in the static state, and R is the rotation matrix defined in Equation (53). Substituting Equation (53) into Equation (52) gives Equation (54), from which φ_0 and θ_0 can be calculated by

φ_0 = atan2(−a_aver,y, a_aver,x),  θ_0 = atan2(a_aver,z, sqrt(a_aver,x² + a_aver,y²)).

Letting ψ_0 equal 0 degrees, the rotation matrix of the first body frame with respect to the gravity frame can be derived as in Equation (56), where H is defined in Equation (57). Note that the setting of ψ_0 does not affect the attitude estimation in the Mahony complementary filter. In this paper, we further define that the origins of the gravity frame and the first body frame coincide, as shown in Equation (58).
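The angle extraction above can be sketched directly. The atan2 forms follow Equation (55) as reconstructed here (the square root in the second argument of θ_0 is an assumption recovered from the standard gravity-alignment formula); in the rotated gravity frame, a level, stationary IMU measures approximately [g, 0, 0], which maps to zero roll and pitch.

```python
import math

def initial_attitude(a_x, a_y, a_z):
    """Roll (phi_0) and pitch (theta_0) of the first body frame w.r.t. the
    rotated gravity frame (x-axis anti-parallel to gravity), from averaged
    static accelerometer readings. Yaw (psi_0) is fixed to zero, as in the
    paper, since it is unobservable from gravity alone.
    """
    phi0 = math.atan2(-a_y, a_x)
    theta0 = math.atan2(a_z, math.hypot(a_x, a_y))
    return phi0, theta0
```

Feeding in the 0.5 s window averages from the static state detector then yields the φ_0 and θ_0 needed to assemble G_B0 T.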
According to Equations (56) and (58), G_B0 T can be obtained as in Equation (59), and the flowchart of the G_B0 T calculation is illustrated in Figure 16. The estimated φ_0 and θ_0 for the EuRoC dataset are listed in Table 5, with errors calculated against the ground truth. According to Table 5, the proposed algorithm accurately detects the stationary state, and the associated estimation errors for φ_0 and θ_0 are almost all less than 0.1 degree, which meets the accuracy demand of G_B0 T. After obtaining an accurate G_B0 T, based on the attitude estimated by the Mahony complementary filter, the position information can be derived by integrating the free acceleration twice:

G_t_k,origin,B = G_t_{k−1},origin,B + G_v_{k−1},origin,B·Δk + 0.5·G_a_{k−1},free·(Δk)²,
G_v_k,origin,B = G_v_{k−1},origin,B + G_a_{k−1},free·Δk,

where G_t_k,origin,B and G_v_k,origin,B represent the k-th position and velocity of the UAV in the gravity frame, respectively, and G_a_k,free is the k-th acceleration measurement with the gravitational component removed, also known as the free acceleration and defined in Equation (61), where G_B R_k is obtained by converting B_G q_k, which is derived from the Mahony complementary filter. It is also worth mentioning that in the EuRoC dataset, the IMU and image information are aligned every ten sampling periods (10Δk), which is a good timing to fuse them.
Therefore, G_B T_{k+10} is first defined as in Equation (62), and its reference coordinate system is converted by Equation (49) to obtain S_C T_{i+1},com, which serves as the initial guess of BA in the tracking thread at moment i + 1, replacing the constant velocity motion model. These processes are illustrated in Figure 17. However, due to the double integration in Equation (60), the position G_t_k,origin,B and velocity G_v_k,origin,B errors accumulate rapidly. Therefore, the position and velocity are reset by the compensated vSLAM output S_C T_{i+1} to reduce the accumulated error. First, the reference frame of S_C T_{i+1} is converted by Equation (63),
where G_B T_{k+10},res is defined in Equation (64). Thus, the reset position G_t_{k+10},res can be obtained, while the reset velocity G_v_{k+10},res is derived from the previous reset position, as shown in Equation (65).
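The propagate-and-reset loop can be sketched as follows. The integration step mirrors Equation (60); the finite-difference form of the velocity reset is an assumption consistent with "derived from the previous reset position" (Equation (65)), and all symbols here are simplified stand-ins for the paper's frame-annotated quantities.

```python
import numpy as np

def propagate(t_prev, v_prev, a_free, dt=0.005):
    """Integrate free acceleration once for velocity and twice for
    position (Equation (60)); drift grows quickly, hence the reset."""
    t = t_prev + v_prev * dt + 0.5 * a_free * dt**2
    v = v_prev + a_free * dt
    return t, v

def reset_from_vslam(t_reset_now, t_reset_prev, dt_align):
    """Reset position directly from the compensated vSLAM output and
    velocity from two consecutive reset positions (finite difference)."""
    v = (t_reset_now - t_reset_prev) / dt_align
    return t_reset_now, v
```

Between image alignments the state is propagated at the IMU rate; every ten IMU samples the vSLAM output overwrites the drifting integrals, so the accumulated double-integration error never grows beyond one alignment interval.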
After obtaining the reset position and reset speed, Equation (60) can be performed again based on them. The overall compensation subroutine working in real-time is illustrated in Figure 18.

Experiment Verification
In this section, the advantages of the proposed algorithm are verified using the EuRoC dataset. Not only are the pose estimation accuracy, FPS and anti-shading robustness examined, but the online adaptive tuning processes of both the matching threshold and K_p are also shown to give readers a more comprehensive understanding of the proposed algorithm.

Ablation Study for Accuracy Comparison
This section analyzes the estimation accuracy of the following four strategies so as to independently evaluate the effect of each online adaptive tuning algorithm:

• Case. 1: use neither of the proposed online adaptive tuning algorithms.
• Case. 2: only use the online adaptive matching threshold tuning algorithm.
• Case. 3: only use the online adaptive K_p tuning algorithm.
• Case. 4: use both proposed online adaptive tuning algorithms.

Moreover, to objectively validate the contribution of the proposed algorithm to localization accuracy, loop closure for drift compensation is not considered in the following experiments. Based on the above four conditions, the corresponding accuracy analysis results are shown in Figures 19-22. In Figures 19 and 20, the error bars in each dataset series with respect to the above four strategies are displayed from left to right.

Experimental examinations clearly show that the proposed online adaptive matching threshold tuning algorithm can effectively improve the accuracy. In particular, in the MH_03_medium series, 63% and 70% accuracy improvements are achieved in RPE and ATE, respectively. Even though the accuracy did not improve in the MH_04_difficult series, the overall accuracy did not change drastically, and meanwhile, according to Figure 20, the attitude estimation accuracy is still improved.
However, the online adaptive K_p tuning algorithm alone is not significantly helpful for localization accuracy improvement, even though, in Table 4, the IMU data of each dataset series was tested independently and the results show that it improves the attitude estimation accuracy effectively. Based on the conclusion given in Figure 9, it can be further inferred that since the motion-only BA with the Huber robust kernel significantly reduces the sensitivity to the initial value, the pose compensation with the IMU does not have a noticeable effect. However, in severe situations, such as when the stereo cameras are temporarily occluded, the IMU's pose compensation plays an important role in maintaining positioning stability, which is tested and analyzed in the next section. Figure 23 shows the time cost of the four strategies, while Table 6 lists the computer specifications used in this paper. From the onboard implementation point of view, an Intel NUC, a lightweight and small computer with comparable computing power, is sufficient to carry out the proposed algorithms. Figure 23 clearly shows that the online adaptive matching threshold tuning algorithm improves FPS effectively. The reason is that the proposed algorithm reasonably adjusts the matching threshold to avoid over-triggering keyframe selection, namely excessive mapping involving too many map points that need optimization. Table 6. Computer specifications used for the proposed algorithm.

CPU: Intel Core i7-11800H @ 2.30 GHz
RAM: 16 GB

Figure 24 shows the automatic tuning process of the matching threshold and the K_p parameter (taking the MH_01_easy and MH_03_medium series as examples). The matching threshold is not adjusted at every frame, but only when a keyframe is established, which corresponds to the content described in Section 3.2. In contrast, the online adaptive K_p tuning is performed at every moment, as long as acceleration information is available.

Anti-Shading Robustness Test
Since the IMU information is integrated into the vSLAM algorithm, the UAV's flight trajectory can temporarily rely on the motion compensation loop, which further enhances the overall anti-shading robustness of the original vSLAM. Following the above statement, the robustness of the overall system against shading is examined. The examination method is to turn off the left and right camera images simultaneously to simulate the shading situation. In order to analyze the anti-shading performance more comprehensively, the following three vSLAM scenarios are examined and analyzed. Table 7 shows the corresponding results, including the time stamp of image loss and the pose accuracy in terms of RPE and ATE. According to Table 7, Case. A, the original vSLAM system, is unable to localize successfully without the proposed online adaptive algorithms and the motion compensation loop subroutine. In addition, in Case. B, even though the system works properly in the MH_01_easy, MH_02_easy and MH_05_difficult series, localization still fails in the MH_03_medium and MH_04_difficult series. On the contrary, with the proposed online adaptive algorithms and the motion compensation loop subroutine, namely Case. C, pose estimation is successfully conducted in all data series and the localization accuracy is maintained. These experimental results firmly reveal that the developed motion compensation loop subroutine indeed increases robustness against shading through the addition of IMU information, and more importantly, a better level of robustness is achieved by further combining the online adaptive parameter tuning algorithms. The key factor is that the online adaptive parameter tuning algorithms effectively increase the attitude estimation accuracy, thereby improving the motion compensation quality, which relies heavily on the estimated free acceleration and attitude shown in Equation (61).
The trajectories of the anti-shading experiment are shown in Figure 25. The black crosses in these figures represent the localization results (Case. C) via using the proposed algorithm during the critical image loss period. These figures also show that the positioning system without the proposed algorithm not only fails to track after image loss, but also makes a straight-line estimation based on the constant velocity motion model. This experiment implies that there will be less chance to pull back the trajectory under aggressive motions even when the image information recovers. On the contrary, these failures can be solved successfully by applying the proposed method.
In the following, we present the results of the anti-shading robustness tests in parallel, as shown in Figure 26, to make the robustness analysis of the proposed vSLAM more intuitive. Figure 26a shows the localization results for three different scenarios after masking the stereo camera at the 849th frame of the MH_03_medium series. It is obvious that the proposed algorithm is the only survivor that completes the indoor flight localization and mapping examination successfully, while the others all fail to track the flight trajectory; this result highlights again that the proposed algorithm is effective in improving the robustness of the vSLAM system against shading. In addition, Figure 26b shows the localization results for the three scenarios before the stereo camera is masked. Even though all three scenarios conduct localization successfully, the proposed algorithm has the lowest number of keyframes, which implies that the developed vSLAM algorithm can preserve precise vSLAM with fewer map feature points. It is worth mentioning that requiring fewer map feature points reduces the computational loading and saves considerable storage memory; these advantages make the proposed algorithm more suitable for practical applications on light UAVs.

Conclusions
UAVs have become one of the most promising vehicles in recent years, and the underlying positioning technology is receiving more and more attention. However, over-reliance on GPS has limited the application diversity of UAVs. To solve the navigation problem in GPS-denied or GPS-degraded environments, the stereo vSLAM solution has gradually gained attention. Therefore, this paper presents an improved UAV vSLAM system based on the S-PTAM architecture and the Mahony complementary filter. The proposed online adaptive matching-threshold tuning algorithm and online adaptive Kp tuning algorithm improve the overall vSLAM accuracy and robustness: they automatically adjust, respectively, the influential matching threshold and the Kp gain used in the Mahony complementary filter, eliminating the tedious manual adjustment of these non-physically meaningful parameters. To fuse the IMU and vSLAM information, an additional static-state detection algorithm is proposed, which has been tested and proven to detect the static state accurately. Once the static state has been detected, an accurate initial relationship can be computed, which greatly helps the subsequent information conversion; the error of this initial relationship is below approximately 0.1 degree. Based on this accurate initial relationship, a reliable conversion mechanism between the IMU and vSLAM is established. Finally, a series of experimental studies validates the proposed algorithms and the vSLAM architecture. The results show that the online adaptive matching-threshold tuning algorithm effectively improves localization accuracy and FPS. Moreover, an anti-shading robustness test was conducted; the experiments firmly show that robustness against temporary image loss is achieved successfully by incorporating the proposed algorithms.
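As a rough illustration of the moving-window static-state test summarized above, the sketch below declares the IMU static when the accelerometer magnitudes in the window stay close to gravity (no free acceleration) and vary little (no vibration or motion). The function name, window handling, and thresholds are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

GRAVITY = 9.81  # m/s^2, assumed local gravity magnitude

def is_static(accel_window, norm_tol=0.05, std_tol=0.02):
    """Moving-window static-state test on raw accelerometer samples.

    accel_window: (N, 3) array of the most recent accelerometer readings
    in the body frame. Returns True when, over the whole window, the mean
    acceleration magnitude is near gravity and its standard deviation is
    small; tolerances are expressed as fractions of gravity.
    """
    norms = np.linalg.norm(np.asarray(accel_window, dtype=float), axis=1)
    mean_ok = abs(norms.mean() - GRAVITY) < norm_tol * GRAVITY
    std_ok = norms.std() < std_tol * GRAVITY
    return bool(mean_ok and std_ok)
```

Checking both the mean and the spread of the magnitudes is what makes the window-based test robust: a window of smooth but accelerated flight can match gravity in spread while failing the mean check, and a vibrating hover fails the spread check.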
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22208067/s1, Video S1: An Online Adaptive Parameter Tuning vSLAM Algorithm for UAVs in GPS-denied Environments-Series (3). Data Availability Statement: The EuRoC dataset, including download links, is available at the official website https://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets (accessed on 10 October 2022). The EuRoC dataset provides stereo image sequences, IMU measurements, 6-DoF ground-truth poses, and the relative pose relationships between sensors, which are included in the sensor.yaml file in each dataset series folder. Information about the sensor types, the UAV used, the characteristics of the dataset series, and other details can be found in [27] or on the official website presented above.

Conflicts of Interest:
The authors declare no conflict of interest.