Position Estimation Method for Small Drones Based on the Fusion of Multisource, Multimodal Data and Digital Twins

: In response to the issue of low positioning accuracy and insufficient robustness in small UAVs (unmanned aerial vehicle) caused by sensor noise and cumulative motion errors during flight in complex environments, this paper proposes a multisource, multimodal data fusion method. Initially, it employs a multimodal data fusion of various sensors, including GPS (global positioning system), an IMU (inertial measurement unit)


Introduction
With the gradual opening of low-altitude airspace, the application of drones is becoming increasingly widespread in various fields, ranging from circuit inspection and logistics delivery to urban traffic management [1].The effective deployment of drones has become a hallmark of contemporary technological progress; however, precise positioning and navigation capabilities are essential for drones to perform tasks safely and efficiently.Currently, there are multiple positioning methods for drones, including the global positioning system (GPS), inertial navigation system (INS), terrain-aided navigation (TAN), Doppler velocity Log (DVL), odometer (OD), and visual navigation (VO) [2,3].Nevertheless, as application scenarios become more complex, the accuracy and robustness of navigation systems face severe challenges during high-speed maneuvers or prolonged flights.
Traditional drone positioning relies on a single sensor system, which may be adequate in specific usage environments, but in complex environments, the interference from multipath effects, signal blockage, and environmental noise can lead to the accumulation of sensor noise and motion errors, impacting positioning accuracy.For instance, the inertial navigation system (INS) is a type of relative positioning technology that can only provide accurate solutions for a limited time [4], as errors in inertial sensors and integration drift can lead to solution divergence [5,6]; therefore, to enhance the positioning accuracy and system stability of drones, researchers have begun to explore multisource and multimodal data fusion methods.Data fusion technology [7] involves the merging and processing of multiple complementary or similar information sources, where the merged information is more precise than any single source.The information fusion technique of combined navigation involves first converting the navigation data from various sensors into a unified coordinate system, and then using appropriate mathematical estimation methods to fuse the data within this system, obtaining the system's optimal estimate [8,9].Thus, the data fusion process entails using mathematical optimal estimation algorithms to estimate the status of the navigation system.There are several categories of data fusion algorithms, among which, analyticalbased methods and learning-based methods are the most widely used.Analytical-based methods use analytical functions to simulate system states and external measurements, referred to as "estimation" in the literature [10][11][12].Many analytical-based methods are widely utilized, such as Kalman filtering (KF) and particle filtering (PF) [13], while several learning-based data fusion methods, such as artificial neural networks (ANNs), fuzzy logic, and support vector machines (SVMs), are also used because they can model systems without prior statistical information about the process and measurement noise [14].Since the introduction of the Kalman filter (KF), it has been extensively used to address many challenging and complex information fusion problems; however, traditional Kalman filtering can only be applied in linear systems.To apply federated Kalman filtering in nonlinear systems, several improved KF methods have been proposed [15].Based on the changing Bayesian adaptive Kalman filtering [16], a method has been proposed to track the continuously varying noise variance in the global positioning system/inertial navigation system.Insufficient prior information can cause the filter to diverge [17].The extended Kalman filter (EKF) is a widely used method for integrating the inertial/global positioning systems.It linearizes nonlinear systems through a first-order Taylor expansion [18]; for example, the fusion of visual sensors and an IMU can be divided into loose and tight coupling [19].In loose coupling, an extended Kalman filter or the unscented Kalman filter (UKF) fuses the pose information obtained from the visual sensors and IMU as separate modules.The information fusion used in studies [20] is of loose coupling.The author of utilizes tight coupling, where the pose information obtained after processing the intermediate data from the visual and IMU sensors through filters has higher accuracy compared to loose coupling, but its robustness is poorer, and the failure of a sensor can paralyze the entire system.Additionally, for multisensor data fusion, the time synchronization of data directly affects the accuracy of the fusion.A method of sequential measurement updating has been proposed in the literature [21], where the update timing nodes are the same as the IMU data measurement nodes.Other sensors act as auxiliary sensors to correct IMU data, and their update nodes must be on the IMU update nodes.Due to the different data acquisition frequencies of various sensors, it is necessary to synchronize the timing of various data at the fusion center, which can determine the best sequence for fusion [22].The pre-fusion processes add many complex processing and computational procedures to the multisensor data fusion process.
Digital twin technology, as an innovative technology that has emerged in recent years, has demonstrated its significant application potential and value across multiple domains.For instance, in military operations [23], study reflects on the construction of a digital twin battlefield, proposing a universal architecture for digital twin battlefields and providing feasible solutions for their construction.In drone-related fields, study introduces an algorithm based on random finite sets, analyzing the system architecture and model, and demonstrating the application of digital twin robots in drone operations [24].Study discusses and designs the digital twin framework for military large-scale drones using digital twin technology combined with cloud computing, analyzing its functions to some extent and providing solutions for constructing drone digital twin systems.In the area of drone positioning and navigation research, the introduction of digital twin technology offers new ideas and methods with which to address the challenges faced by traditional positioning systems.Digital twin technology creates a virtual drone model capable of real-time mapping and simulating the actual drone's state and behavior [25].This bridge between the virtual realm and reality not only aids in predicting and analyzing drone performance but also provides more precise positioning and navigation support under complex environmental conditions.
However, although multisource, multimodal data fusion and digital twin technology theoretically hold significant potential, effectively integrating these technologies to achieve higher positioning accuracy and robustness remains a challenge in practical applications.Most existing studies focus on improving specific technologies or algorithms, lacking an in-depth exploration of how to effectively merge these different technologies to meet the positioning needs of drones in complex environments.
Therefore, this paper proposes a novel positioning estimation method for small unmanned aerial vehicles (UAVs) based on multisource, multimodal data fusion and digital twin technology.As shown in Figure 1, this method initially utilizes various sensors, including GPS, IMU, and visual sensors, to acquire multimodal data.The IMU performs quaternion calculations to obtain attitude data, the RealSense camera provides environmental depth information as well as distance positioning, and the GPS performs coordinate system conversion in addition to providing global positioning data.These data are combined through tightly coupled optimization, integrating visual and inertial navigation data.The GPS fusion, based on pose graph optimization, is further refined, and the digital twin UAV is used as a critical data source.The extended Kalman filter (EKF) algorithm is employed to achieve data fusion between the real UAV and its digital twin.
Electronics 2024, 13, x FOR PEER REVIEW 3 of 23 traditional positioning systems.Digital twin technology creates a virtual drone model capable of real-time mapping and simulating the actual drone's state and behavior [25].This bridge between the virtual realm and reality not only aids in predicting and analyzing drone performance but also provides more precise positioning and navigation support under complex environmental conditions.However, although multisource, multimodal data fusion and digital twin technology theoretically hold significant potential, effectively integrating these technologies to achieve higher positioning accuracy and robustness remains a challenge in practical applications.Most existing studies focus on improving specific technologies or algorithms, lacking an in-depth exploration of how to effectively merge these different technologies to meet the positioning needs of drones in complex environments.
Therefore, this paper proposes a novel positioning estimation method for small unmanned aerial vehicles (UAVs) based on multisource, multimodal data fusion and digital twin technology.As shown in Figure 1, this method initially utilizes various sensors, including GPS, IMU, and visual sensors, to acquire multimodal data.The IMU performs quaternion calculations to obtain attitude data, the RealSense camera provides environmental depth information as well as distance positioning, and the GPS performs coordinate system conversion in addition to providing global positioning data.These data are combined through tightly coupled optimization, integrating visual and inertial navigation data.The GPS fusion, based on pose graph optimization, is further refined, and the digital twin UAV is used as a critical data source.The extended Kalman filter (EKF) algorithm is employed to achieve data fusion between the real UAV and its digital twin.This fusion approach effectively eliminates the limitations of a single sensor and reduces the positioning deviations caused by sensor errors as well as external environmental changes.Moreover, the digital twin drone provides a simulation model that predicts the real drone under the same conditions, utilizing the predictive capabilities of the digital twin model to correct and optimize the sensor data of the real drone.Especially in cases where sensor data are unstable or severely interfered with, it provides a more accurate reference for real-time location information, allowing for the timely correction of This fusion approach effectively eliminates the limitations of a single sensor and reduces the positioning deviations caused by sensor errors as well as external environmental changes.Moreover, the digital twin drone provides a simulation model that predicts the real drone under the same conditions, utilizing the predictive capabilities of the digital twin model to correct and optimize the sensor data of the real drone.Especially in cases where sensor data are unstable or severely interfered with, it provides a more accurate reference for real-time location information, allowing for the timely correction of positioning information when sensor data undergo sudden changes, ensuring the stability and accuracy of the drone control system.This study not only offers new ideas and methods for the precise positioning of small drones but also explores new possibilities for the practical application of digital twin technology in drone navigation and applications.

System and Model Construction
This paper establishes a data fusion system architecture based on digital twins for multisource and multimodal data fusion methods, as illustrated in Figure 2.This diagram depicts a multisource and multimodal data fusion system primarily deployed in UAV software layers and digital twin simulations.During the digital twin simulation process, sensor models provide the UAV flight control with sensor information obtained from the simulation environment.The flight controller computes the current state estimates and outputs actuation signals to the UAV model, which calculates the required forces as well as torques and assigns motor voltages.Additionally, inputs such as drag, friction, gravity, and torque are used by the physical engine to compute the UAV's next motion state in the simulation environment, and the estimated data are relayed in real time to the physical UAV.The UAV dynamics information, enriched with environmental model data like gravity, air density, atmospheric pressure, magnetic fields, and GPS coordinates, is then transmitted back to the sensor model, creating a feedback loop that captures sensor-acquired data.On the UAV's physical side, the system encompasses data collection, processing, and algorithmic fusion, utilizing raw data from the IMUs and GPS, along with environmental information captured by depth cameras.It also receives real-time location estimates from the digital twin UAV, which are fed into the data fusion module.The data fusion and control execution section form the core of the system.Here, data from various sensors are integrated to enhance positioning accuracy and robustness.The fused information is used to generate precise flight control signals, which are subsequently transmitted back to the UAV flight control through the MavLink control protocol, achieving accurate flight control.
positioning information when sensor data undergo sudden changes, ensuring the stab and accuracy of the drone control system.This study not only offers new ideas and m ods for the precise positioning of small drones but also explores new possibilities for practical application of digital twin technology in drone navigation and applications

System and Model Construction
This paper establishes a data fusion system architecture based on digital twin multisource and multimodal data fusion methods, as illustrated in Figure 2.This diag depicts a multisource and multimodal data fusion system primarily deployed in U software layers and digital twin simulations.During the digital twin simulation pro sensor models provide the UAV flight control with sensor information obtained from simulation environment.The flight controller computes the current state estimates outputs actuation signals to the UAV model, which calculates the required forces as as torques and assigns motor voltages.Additionally, inputs such as drag, friction, gra and torque are used by the physical engine to compute the UAV's next motion state in simulation environment, and the estimated data are relayed in real time to the phy UAV.The UAV dynamics information, enriched with environmental model data like g ity, air density, atmospheric pressure, magnetic fields, and GPS coordinates, is then tr mitted back to the sensor model, creating a feedback loop that captures sensor-acqu data.On the UAV's physical side, the system encompasses data collection, processing, algorithmic fusion, utilizing raw data from the IMUs and GPS, along with environme information captured by depth cameras.It also receives real-time location estimates f the digital twin UAV, which are fed into the data fusion module.The data fusion control execution section form the core of the system.Here, data from various sensor integrated to enhance positioning accuracy and robustness.The fused information is u to generate precise flight control signals, which are subsequently transmitted back to UAV flight control through the MavLink control protocol, achieving accurate flight trol.Through this architecture, the UAV can swiftly adapt to complex environments effectively handle uncertainties, enhancing its operational stability and reliabilit Through this architecture, the UAV can swiftly adapt to complex environments and effectively handle uncertainties, enhancing its operational stability and reliability in complex application scenarios.Additionally, the architecture supports modularity and distributed processing, offering flexibility and scalability for the further development of UAV technology.
To effectively simulate and predict the behavior as well as performance of physical unmanned aerial vehicles (UAVs) and achieve consistency between the digital twin UAV and the physical UAV, it is essential to construct a multidimensional and multiscale UAV model within the digital twin framework.This foundation supports the data interaction between the virtual and physical UAVs.This article will provide detailed explanations of the model parameters and model composition, and then construct a multidimensional and multiscale UAV model from three aspects: geometric, physical, and behavioral.This enables the digital twin UAV to simulate the physical properties, dynamic behaviors, and environmental interactions of the physical UAV in real time.Through such modeling, the digital twin acts as a critical tool in aligning the virtual operations with the real-world dynamics of UAVs, enhancing predictive maintenance and operational planning.

Geometric Model
Incorporating the actual geometric dimensions and mass parameters of the unmanned aerial vehicle (UAV), a foundational model is developed using modeling software, based on the hardware architecture of the UAV as illustrated in Figure 3. Particular emphasis is placed on the construction of critical components such as motors, propellers, flight controllers, and GPS modules, which are integral to the flight control stability of the UAV.Detailed attention is devoted to the assembly specifications of each UAV component, as demonstrated in Figure 4, which depicts the UAV's external features, part dimensions, and material types, among other fundamental indicators.This process ensures a high-fidelity construction model of the physical UAV, enabling the digital twin system to accurately replicate the real UAV.This underpins subsequent digital twin UAV flight experiments by providing a fundamental model support, mitigating disparities between flight experimental results and actual conditions that may arise due to differences in geometric model parameters.Furthermore, it facilitates the development and testing of interfaces between the virtual and physical domains within digital twin technology.
Electronics 2024, 13, x FOR PEER REVIEW 5 of 23 complex application scenarios.Additionally, the architecture supports modularity and distributed processing, offering flexibility and scalability for the further development of UAV technology.
To effectively simulate and predict the behavior as well as performance of physical unmanned aerial vehicles (UAVs) and achieve consistency between the digital twin UAV and the physical UAV, it is essential to construct a multidimensional and multiscale UAV model within the digital twin framework.This foundation supports the data interaction between the virtual and physical UAVs.This article will provide detailed explanations of the model parameters and model composition, and then construct a multidimensional and multiscale UAV model from three aspects: geometric, physical, and behavioral.This enables the digital twin UAV to simulate the physical properties, dynamic behaviors, and environmental interactions of the physical UAV in real time.Through such modeling, the digital twin acts as a critical tool in aligning the virtual operations with the real-world dynamics of UAVs, enhancing predictive maintenance and operational planning.

Geometric Model
Incorporating the actual geometric dimensions and mass parameters of the unmanned aerial vehicle (UAV), a foundational model is developed using modeling software, based on the hardware architecture of the UAV as illustrated in Figure 3. Particular emphasis is placed on the construction of critical components such as motors, propellers, flight controllers, and GPS modules, which are integral to the flight control stability of the UAV.Detailed attention is devoted to the assembly specifications of each UAV component, as demonstrated in Figure 4, which depicts the UAV's external features, part dimensions, and material types, among other fundamental indicators.This process ensures a high-fidelity construction model of the physical UAV, enabling the digital twin system to accurately replicate the real UAV.This underpins subsequent digital twin UAV flight experiments by providing a fundamental model support, mitigating disparities between flight experimental results and actual conditions that may arise due to differences in geometric model parameters.Furthermore, it facilitates the development and testing of interfaces between the virtual and physical domains within digital twin technology.To effectively simulate and predict the behavior as well as performance of physical unmanned aerial vehicles (UAVs) and achieve consistency between the digital twin UAV and the physical UAV, it is essential to construct a multidimensional and multiscale UAV model within the digital twin framework.This foundation supports the data interaction between the virtual and physical UAVs.This article will provide detailed explanations of the model parameters and model composition, and then construct a multidimensional and multiscale UAV model from three aspects: geometric, physical, and behavioral.This enables the digital twin UAV to simulate the physical properties, dynamic behaviors, and environmental interactions of the physical UAV in real time.Through such modeling, the digital twin acts as a critical tool in aligning the virtual operations with the real-world dynamics of UAVs, enhancing predictive maintenance and operational planning.

Geometric Model
Incorporating the actual geometric dimensions and mass parameters of the unmanned aerial vehicle (UAV), a foundational model is developed using modeling software, based on the hardware architecture of the UAV as illustrated in Figure 3. Particular emphasis is placed on the construction of critical components such as motors, propellers, flight controllers, and GPS modules, which are integral to the flight control stability of the UAV.Detailed attention is devoted to the assembly specifications of each UAV component, as demonstrated in Figure 4, which depicts the UAV's external features, part dimensions, and material types, among other fundamental indicators.This process ensures a high-fidelity construction model of the physical UAV, enabling the digital twin system to accurately replicate the real UAV.This underpins subsequent digital twin UAV flight experiments by providing a fundamental model support, mitigating disparities between flight experimental results and actual conditions that may arise due to differences in geometric model parameters.Furthermore, it facilitates the development and testing of interfaces between the virtual and physical domains within digital twin technology.

Physical Model
The virtual drone entity model primarily consists of two parts: built-in sensors of the PX4 flight controller, and physical connections.The built-in sensors mainly include a gyroscope, magnetometer, GPS, and barometer.The physical connection methods define the basic parameters for serial ports, local UDP, local TCP, remote TCP, QGC ground stations, and messaging protocols.The sensor module mainly contains devices such as gyroscopes, magnetometers, GPS, and barometers.The physical engine module mainly implements the dynamics modeling of quadrotor drones.A quadrotor drone with mass (m) is defined as a collection of four vertices [x 1 , x 2 , x 3 , x 4 ], each vertex having a control input [u 1 , u 2 , u 3 , u 4 ], where x i represents the position of the vertex relative to the centroid and u i ∈ [0, 1] represents the corresponding control command.Finally, the control inputs, (u_i), are mapped to the forces, F i = f(u i ), and torques, τ i = g(u i ), at vertex i.The digital twin UAV control input, u i , is mapped to the angular velocity at each propeller located at the four vertices, and the local forces and torques at the four vertices are represented by Equations ( 1) and ( 2): wherein C T and C pow represent the thrust coefficient and power coefficient calculated based on the physical characteristics of the propeller, respectively.σ stands for air density, D is the propeller diameter, and ω max is the maximum angular velocity (rotations per second).
Based on the parameters of the physical drone and the experimental environment, this paper assumes the related parameters of the drone, as shown in Table 1.

Behavior Model
The behavioral model represents a precise control framework for UAV flight, and a PID (proportional, integral, and derivative) control loop is an effective method for regulating UAV flight.Building upon this, the paper proposes a design for a dual closed-loop cascaded PID control circuit, as shown in Figure 5.In the diagram, the input signals, after filtration, are separately directed to the position PID controller and the attitude PID controller.The position PID controller takes the UAV's altitude value signal, which is linked to the UAV's NED (north, east, down) coordinate system.The signal represents the difference between the set target altitude and the actual flight altitude, with the controller's output being the vertical acceleration.The attitude PID controller receives the UAV's horizontal displacement signals, which are related to its horizontal coordinates.Its values represent the discrepancies in the UAV's roll, pitch, and yaw angles from the set horizontal target values to their actual values, with the controller's output being the horizontal acceleration.Through position and attitude calculations, the system converts the input quantities allocated to each direction into PWM (pulse width modulation) control signals, which are then sent to the motors.This process controls the motors, ensuring the precision of the UAV's flight control.

Multisource, Multimodal Data Fusion
Navigating with only one type of sensor has its drawbacks.Due to the inherent d ferences among individual sensors, relying on a single sensor for navigation may lead unsuitability for the environment, resulting in poor navigation accuracy.To address v ious complex environments, this paper adopts a navigation method using multiple se sors.The data collected by multiple sensors need to be fused, filtering out unnecessa data and extracting the required data, to achieve the best results.

Vision IMU Pose Fusion
Since pure visual positioning algorithms rely entirely on data obtained from camer they are susceptible to environmental lighting interference, resulting in low robustne Cameras mounted on drones are prone to positioning failures during high-speed mov ments or rotational motions, making pure visual positioning highly dependent on en ronmental conditions and low in output frequency, which can lead to inaccurate pose timation and cumulative errors [26].In contrast, IMUs (inertial measurement units) off good dynamic response performance and high pose update frequency but suffer fro significant long-term cumulative errors, drift, and poor static characteristics; therefore, improve the accuracy and applicability range of pose estimation, this paper fuses the po estimation results of visual odometry with those of the IMU, enhancing the robustne and real-time performance of drone pose estimation.
Visual-IMU fusion methods can be categorized into two types: loose coupling a tight coupling.The basic approach of loose coupling is to process the inertial measu ment unit and visual sensors separately, which is relatively simple, requires less comp tation, and is highly scalable, but the accuracy of the fused pose estimation is relative poor.The tight coupling approach combines image information with inertial navigati information and calculates the pose information through the fused data.This method significantly more complex and requires more computation, but can effectively utilize formation from each sensor, resulting in the higher accuracy of the fused pose estimatio This paper adopts the tight coupling method in order to achieve the fusion of visual a inertial navigation information.

IMU Pre-Integration
Given the disparity between the capture frequency of cameras and the signal f quency of IMUs, it is essential to consider time synchronization or measurement f

Multisource, Multimodal Data Fusion
Navigating with only one type of sensor has its drawbacks.Due to the inherent differences among individual sensors, relying on a single sensor for navigation may lead to unsuitability for the environment, resulting in poor navigation accuracy.To address various complex environments, this paper adopts a navigation method using multiple sensors.The data collected by multiple sensors need to be fused, filtering out unnecessary data and extracting the required data, to achieve the best results.

Vision IMU Pose Fusion
Since pure visual positioning algorithms rely entirely on data obtained from cameras, they are susceptible to environmental lighting interference, resulting in low robustness.Cameras mounted on drones are prone to positioning failures during high-speed movements or rotational motions, making pure visual positioning highly dependent on environmental conditions and low in output frequency, which can lead to inaccurate pose estimation and cumulative errors [26].In contrast, IMUs (inertial measurement units) offer good dynamic response performance and high pose update frequency but suffer from significant long-term cumulative errors, drift, and poor static characteristics; therefore, to improve the accuracy and applicability range of pose estimation, this paper fuses the pose estimation results of visual odometry with those of the IMU, enhancing the robustness and real-time performance of drone pose estimation.
Visual-IMU fusion methods can be categorized into two types: loose coupling and tight coupling.The basic approach of loose coupling is to process the inertial measurement unit and visual sensors separately, which is relatively simple, requires less computation, and is highly scalable, but the accuracy of the fused pose estimation is relatively poor.The tight coupling approach combines image information with inertial navigation information and calculates the pose information through the fused data.This method is significantly more complex and requires more computation, but can effectively utilize information from each sensor, resulting in the higher accuracy of the fused pose estimation.This paper adopts the tight coupling method in order to achieve the fusion of visual and inertial navigation information.

IMU Pre-Integration
Given the disparity between the capture frequency of cameras and the signal frequency of IMUs, it is essential to consider time synchronization or measurement frequency before fusing IMU data with image information.The method of pre-integration is chosen to overcome the accumulation of errors in the world coordinate system, addressing the issue of the discrepancy in frequency between cameras and IMUs.The fundamental idea behind IMU pre-integration is to extract the invariant items during each optimization iteration beforehand, thereby reducing the workload of reintegration on each occasion.As depicted in Figure 6, assume that we are in the global coordinate system of the camera, integrating each piece of IMU data from a previous keyframe (at time (t_1)) to a subsequent keyframe (at time (t_2)).
Electronics 2024, 13, x FOR PEER REVIEW 8 of occasion.As depicted in Figure 6, assume that we are in the global coordinate system the camera, integrating each piece of IMU data from a previous keyframe (at time (t_ to a subsequent keyframe (at time (t_2)).
Camera keyframes IMU test value To optimize the poses and velocities at times (t_1) and (t_2), as well as the IMU bi (denoted as ), where the IMU measurement is acceleration, ( ), the following can deduced: By integrating directly into the global coordinate system, the following can be i ferred: Since the drone operates in three-dimensional space, the velocity, (v), is direction Therefore, each time optimization occurs, updating the velocity, (v_t1), in the global coo dinate system requires recalculating the result within the summation sign.
The calculation is performed as a result of pre-integration.Then, the formula for c culating the position is modified to the following: Therefore, each time the velocity, , in the global coordinate system is updated is only necessary to substitute the results of pre-integration to obtain the new integrati results: This approach also reduces the necessity of recalculating when the original state i formation changes, significantly lowering computational complexity.By expanding a cording to Equation (7) using the Euler integration method, the values of these three va iables can ultimately be obtained.This completes the process of IMU value pr To optimize the poses and velocities at times (t_1) and (t_2), as well as the IMU bias (denoted as b a ), where the IMU measurement is acceleration, (a k ), the following can be deduced: By integrating directly into the global coordinate system, the following can be inferred: Since the drone operates in three-dimensional space, the velocity, (v), is directional.Therefore, each time optimization occurs, updating the velocity, (v_t1), in the global coordinate system requires recalculating the result within the summation sign.
The calculation is performed as a result of pre-integration.Then, the formula for calculating the position is modified to the following: Therefore, each time the velocity, v t1 , in the global coordinate system is updated, it is only necessary to substitute the results of pre-integration to obtain the new integration results: This approach also reduces the necessity of recalculating when the original state information changes, significantly lowering computational complexity.By expanding according to Equation (7) using the Euler integration method, the values of these three variables can ultimately be obtained.This completes the process of IMU value pre-integration.Subsequently, each optimization only requires multiplying the new velocity by time, thus greatly reducing the amount of computation required.

Tight Coupling Optimization Model of Vision Inertial Navigation
As illustrated in Figure 7, visual feature measurements can be obtained through visual sensors.Subsequently, corresponding image feature information is derived using relevant algorithms.By integrating these with IMU measurements from the same time period, more accurate pose and velocity information can be acquired.
Electronics 2024, 13, x FOR PEER REVIEW 9 of 23 integration.Subsequently, each optimization only requires multiplying the new velocity by time, thus greatly reducing the amount of computation required.

Tight Coupling Optimization Model of Vision Inertial Navigation
As illustrated in Figure 7, visual feature measurements can be obtained through visual sensors.Subsequently, corresponding image feature information is derived using relevant algorithms.By integrating these with IMU measurements from the same time period, more accurate pose and velocity information can be acquired.The specific steps for the visual-inertial fusion are as follows: 1. Based on the IMU motion model, integration can be performed on the IMU for the next image frame, obtaining a predicted value of the IMU increment; 2. The difference between the predicted pre-integration value of the IMU and the observed value obtained from the IMU measurement at the next image frame is calculated, yielding the residual of the inertial data; 3. The residual of the inertial data is introduced into the optimal cost function of visual SLAM, making the optimal cost function a sum of the visual reprojection error and inertial data residual.
Equation ( 8) represent the motion equation and observation equation that describe the process of the body's motion.The motion state at time k + 1, denoted as k + 1, can be predicted based on the state at time k and the IMU measurements between times k and k + 1.The observation equation is used to describe the process of the camera detecting visual features at time k + 1: The symbol represents the system's state at time k + 1, which encompasses the rotation information, velocity information, translation The specific steps for the visual-inertial fusion are as follows: 1.
Based on the IMU motion model, integration can be performed on the IMU for the next image frame, obtaining a predicted value of the IMU increment; 2.
The difference between the predicted pre-integration value of the IMU and the observed value obtained from the IMU measurement at the next image frame is calculated, yielding the residual of the inertial data; 3.
The residual of the inertial data is introduced into the optimal cost function of visual SLAM, making the optimal cost function a sum of the visual reprojection error and inertial data residual.
Equation ( 8) represent the motion equation and observation equation that describe the process of the body's motion.The motion state at time k + 1, denoted as k + 1, can be predicted based on the state at time k and the IMU measurements between times k and k + 1.The observation equation is used to describe the process of the camera detecting visual features at time k + 1: The symbol represents the system's state at time k + 1, which encompasses the rotation information, velocity information, translation information, and the biases of the gyroscope and accelerometer from time k.The term µ k+1 refers to the motion measurement provided by the IMU.
The kinematic process of a body is a cyclic iterative process.Based on the kinematic equations and inputs, it predicts the state at the next moment and uses the sensor's measurement equations to correct the state at the next moment.In the visual-inertial fusion algorithm, IMU data also act as an input, thereby allowing for more accurate predictions of the system's state using the kinematic equations.This process emphasizes the continuous integration of motion data and sensor measurements to refine the system's understanding of its state over time, leveraging both the predictive power of the motion model and the corrective insights from real-time sensor data.

GPS Fusion Based on Pose Graph Optimization
This section introduces a sensor fusion algorithm based on pose graph optimization, which is used to integrate the GPS with the visual-inertial positioning algorithm introduced previously.It utilizes an optimization approach to complement the advantages of high local positioning accuracy and global positioning without cumulative errors.
After converting the absolute position data obtained from the GPS sensor into the coordinate system, the position under the east, north, up (ENU) coordinate system, taking the position at the first moment as the origin, can be obtained to determine the drone's relative position.This is then aligned in time with the relative position data from the visualinertial odometer obtained through the fusion of visual and IMU sensors, eliminating data with mismatched timestamps to ensure the correspondence of the data (Figure 8).
Electronics 2024, 13, x FOR PEER REVIEW 10 of 23 information, and the biases of the gyroscope and accelerometer from time k.The term refers to the motion measurement provided by the IMU.The kinematic process of a body is a cyclic iterative process.Based on the kinematic equations and inputs, it predicts the state at the next moment and uses the sensor's measurement equations to correct the state at the next moment.In the visual-inertial fusion algorithm, IMU data also act as an input, thereby allowing for more accurate predictions of the system's state using the kinematic equations.This process emphasizes the continuous integration of motion data and sensor measurements to refine the system's understanding of its state over time, leveraging both the predictive power of the motion model and the corrective insights from real-time sensor data.

GPS Fusion Based on Pose Graph Optimization
This section introduces a sensor fusion algorithm based on pose graph optimization, which is used to integrate the GPS with the visual-inertial positioning algorithm introduced previously.It utilizes an optimization approach to complement the advantages of high local positioning accuracy and global positioning without cumulative errors.
After converting the absolute position data obtained from the GPS sensor into the coordinate system, the position under the east, north, up (ENU) coordinate system, taking the position at the first moment as the origin, can be obtained to determine the drone's relative position.This is then aligned in time with the relative position data from the visual-inertial odometer obtained through the fusion of visual and IMU sensors, eliminating data with mismatched timestamps to ensure the correspondence of the data (Figure 8).Since the relative poses from the visual-inertial odometer are quite accurate with minimal short-term drift, the covariance of the visual-inertial odometer is set to 0.1.The covariance of the GPS is determined by the strength of the signal.In the fusion algorithm, the weights of the various sensors are dynamically allocated in real-time based on the covariance of GPS and the visual-inertial odometer to achieve better fusion effects [27].
Pose graph optimization is a computational method used in SLAM to optimize the poses of robots.Compared to the commonly used bundle adjustment (BA) optimization in SLAM, pose graph optimization excludes the numerous three-dimensional space points considered in BA optimization, focusing only on the drone's position and attitude.This significantly reduces the scale of optimization and avoids unnecessary computational power consumption.Figure 9 illustrates the construction of the pose graph.Since the relative poses from the visual-inertial odometer are quite accurate with minimal short-term drift, the covariance of the visual-inertial odometer is set to 0.1.The covariance of the GPS is determined by the strength of the signal.In the fusion algorithm, the weights of the various sensors are dynamically allocated in real-time based on the covariance of GPS and the visual-inertial odometer to achieve better fusion effects [27].
Pose graph optimization is a computational method used in SLAM to optimize the poses of robots.Compared to the commonly used bundle adjustment (BA) optimization in SLAM, pose graph optimization excludes the numerous three-dimensional space points considered in BA optimization, focusing only on the drone's position and attitude.This significantly reduces the scale of optimization and avoids unnecessary computational power consumption.Figure 9 illustrates the construction of the pose graph.In Figure 9, the triangles represent the drone's motion posture at each moment, referred to as nodes in the optimization process.The circles represent the change between two consecutive motion postures, known as relative measurements, and in the context of optimization problems they are also called edges.In bundle adjustment (BA) optimiza- In Figure 9, the triangles represent the drone's motion posture at each moment, referred to as nodes in the optimization process.The circles represent the change between two consecutive motion postures, known as relative measurements, and in the context of optimization problems they are also called edges.In bundle adjustment (BA) optimization, the number of map points significantly exceeds the number of poses.If the system operates over a long duration, the number of nodes awaiting optimization increases, leading to slower solution speeds.To improve the solution speed, the concept of pose graph optimization emerged.The relative measurements between two motion states can be directly obtained through various sensors carried by the drone (inertial navigation, vision, and GPS).
Assuming that the pose of the UAV at two times is T i , T j , and the relative pose change of the UAV between these two times is ∆T i j , then there is the following formula: Build an error term: The objective function of pose graph optimization can be written as follows: In the above equation, Λ ij represents the information matrix, which is the inverse of the covariance, indicating the weight of each error in the optimization function.If we set the initial point of the optimization problem as x k , and consider an increment added on top of this initial point, then the estimated value between state nodes becomes F k (x k + ∆x), while the error value between nodes is e k (x k + ∆x).Subsequently, the overall error cost function is linearized for processing: For the k-th edge, the optimization function is as follows: In the above formula, the constant term is sorted into C k .In fact, the variation in the final optimization function is as follows: The core of solving the optimization problem is to find a state variable, ∆x, to minimize the variation in the above formula, so the most direct way is to take the derivative of the above formula and then make the derivative of ∆x the smallest derivative in the definition domain, and then the optimal solution can be obtained: When ∆x decreases to a certain extent, the iteration is stopped.

Digital Twin-Kalman Fusion Algorithm
In multisensor fusion positioning systems, although the addition of sensors can provide more sources of data, enhancing the system's perception of the external environment and positioning accuracy, it also introduces more complexity and uncertainty.From the perspectives of probability theory and systems engineering, as the number of sensors increases the complexity of the system grows linearly or at an even higher rate, leading to a corresponding increase in the probability of failures.Each sensor could potentially become a point of failure, increasing the overall probability of system failure [28].Moreover, errors between different sensors may also accumulate or amplify each other, affecting the final data fusion quality and positioning accuracy; therefore, this study introduces digital twin technology as an innovative strategy.By constructing a digital twin model of the real drone, it achieves the real-time simulation of the real drone's flight status.This digital twin drone can not only predict and simulate the behavior and state of the real drone but also provide alternative positioning information in case the actual drone encounters sensor failures or data anomalies.This paper will employ pose optimization techniques based on the extended Kalman filter (EKF) to improve the accuracy of the drone's motion posture.The core of pose optimization techniques based on the EKF lies in the fusion of different information sources.Applying the Kalman filter to pose optimization involves using data generated by the digital twin model as estimated values, combined with real drone sensor data for posture prediction.Through the extended Kalman filter, the position information of the drone being measured in space is continuously corrected, and the filtered position information is fed back to the real drone, thereby enhancing the accuracy and robustness of the drone's motion posture and position information.
In UAV systems, attitude information directly influences the movement direction and velocity vector of a UAV.Given that attitude information is typically of high frequency, accurate attitude estimation can significantly reduce uncertainties in position estimation.This enhancement in accuracy allows a system to exhibit greater robustness against external disturbances and sensor noise; by initially using high-frequency attitude information for attitude estimation, it provides precise direction and orientation references for subsequent low-frequency position estimation.
Take the attitude information of the digital twin UAV at the current time as the following: The computational state update equation can be described as a first-order Markov process: where θ ′ roll , θ ′ yaw , θ ′ pitch are the pose state of the target at the next moment; τ ϵ e , τ ϵ n , τ ϵ θ are the reciprocal of the time constant of the noise process; and ω ϵ e , ω ϵ n , ω ϵ θ are white noise with zero mean.
The discrete state equation of X from time k to time k + 1 is as follows: where T is the sampling period and W X (k) is white noise.Assuming that the state noise at each time is uncorrelated, the correlation state covariance matrix can be obtained as follows: Q= diag(q 11 , q 12 , q 13 ) The position and attitude state of the previous state point Ω is the observation value: In this formula, Z represents the three observed values obtained from actual sensor measurements.These values are derived from the sensor's measurement of the position and attitude changes at the current moment.P ′ e and P ′ n are the observed values in the edirection and n-direction at the next moment, respectively; P e and P n are the observed values in the e-direction and n-direction at the current moment, respectively; θ ′ is the observed pitch angle at the next moment; and θ is the observed pitch angle at the current moment.
The discrete observation equation is as follows: In the formula, H = diag(1, 1, 1); W Z (k) denotes the observation noise.The covariance matrix, R, of the observation noise is represented as follows: After completing the attitude estimation, position estimation is performed.Using the position coordinates and velocity of the digital twin UAV as state variables, the discretized state equation is as follows: Using the position data obtained from the sensor as measurement information, the observation equation is established as follows: where W k and V k+1 are uncorrelated noise sequences with a mean of zero, X k+1 is the state vector, and Z k+1 is the observation vector.By expanding f [X k , k] around Xk using the Taylor series and taking the first-order term, and similarly expanding h[X k+1 , k + 1] around Xk+l|k using the Taylor series and taking the first-order term, we obtain the following: Using the initial position coordinates, velocity, and attitude of the UAV as the initial values of the state vector, and applying the standard Kalman filter, the extended Kalman filter equations are derived as follows, with the filtering process divided into prediction and update stages.
The prior estimate of the state variables at time k + 1 is given as follows: The error covariance at time k + 1 is as follows: The Kalman gain is calculated as follows: The state estimate is as follows: The error covariance update at time k + 1 is as follows:

Introduction to the Experimental Platform
For data collection and conducting real flight experiments, a quadcopter drone hardware platform was initially established, as illustrated in Figure 10.This platform was equipped with various sensors as previously described, including an inertial measurement unit (IMU), GPS, and depth cameras.Simultaneously, the sensor models and measurement frequencies used in the digital twin simulation platform matched those of the actual sensors.The GPS provided coarse positioning information at a frequency of 1 Hz, the IMU supplied high-speed dynamic attitude change data at 100 Hz, and the visual sensors delivered visual-assisted positioning information at a rate of 20 Hz.Furthermore, to validate the effectiveness of the digital twin system, this study first established a virtual external testing environment as a primary prerequisite for digital twin testing.To conduct more accurate experiments with the drone digital twin system, the actual experiment environment the school playground was modeled in multiple dimensions.Three-dimensional scene real images and model diagrams are shown in Figure Furthermore, to validate the effectiveness of the digital twin system, this study first established a virtual external testing environment as a primary prerequisite for digital twin testing.To conduct more accurate experiments with the drone digital twin system, the actual experiment environment the school playground was modeled in multiple dimensions.Three-dimensional scene real images and model diagrams are shown in Figure 11.Furthermore, to validate the effectiveness of the digital twin system, this study first established a virtual external testing environment as a primary prerequisite for digital twin testing.To conduct more accurate experiments with the drone digital twin system, the actual experiment environment the school playground was modeled in multiple dimensions.Three-dimensional scene real images and model diagrams are shown in Figure 11.The experimental method employs a multisource multimodal data fusion architecture based on digital twin technology, involving two main components: a physical drone and a digital twin platform, as illustrated in Figure 12.On the physical drone end, various sensors such as GPS, IMU, and RealSense cameras collect pose data in real time.These data are transmitted to the drone's onboard computer for processing and multimodal fusion.The ground-based digital twin platform serves as the virtual part of the experiment, receiving the drone's location information in real time, using it as a reference value, and combining it with the digital twin model to calculate the drone's pose.The calculated pose information, in the form of estimated values, is transmitted back to the drone via data transmission modules using UDP network and MAVLink communication protocols.The drone's onboard computer receives these estimated values and further fuses them with the measurements obtained from its own multisensor fusion using the extended Kalman filter (EKF) algorithm to optimize the accuracy of pose estimation.The extended Kalman filter, through its iterative prediction and updating process, corrects the drone's real-time pose, thereby reducing estimation errors.The experimental method employs a multisource multimodal data fusion architecture based on digital twin technology, involving two main components: a physical drone and a digital twin platform, as illustrated in Figure 12.On the physical drone end, various sensors such as GPS, IMU, and RealSense cameras collect pose data in real time.These data are transmitted to the drone's onboard computer for processing and multimodal fusion.The ground-based digital twin platform serves as the virtual part of the experiment, receiving the drone's location information in real time, using it as a reference value, and combining it with the digital twin model to calculate the drone's pose.The calculated pose information, in the form of estimated values, is transmitted back to the drone via data transmission modules using UDP network and MAVLink communication protocols.The drone's onboard computer receives these estimated values and further fuses them with the measurements obtained from its own multisensor fusion using the extended Kalman filter (EKF) algorithm to optimize the accuracy of pose estimation.The extended Kalman filter, through its iterative prediction and updating process, corrects the drone's real-time pose, thereby reducing estimation errors.The entire experimental process forms a closed-loop system, where the real-time teraction between the physical drone and the digital twin platform ensures the continuo synchronization and precise calibration of pose data.This method, by fusing physi multisensor data with virtual data from the digital twin platform, aims to enhance accuracy of pose estimation and validate the effectiveness as well as the reliability of m tisource, multimodal data in practical applications.The entire experimental process forms a closed-loop system, where the real-time interaction between the physical drone and the digital twin platform ensures the continuous synchronization and precise calibration of pose data.This method, by fusing physical multisensor data with virtual data from the digital twin platform, aims to enhance the accuracy of pose estimation and validate the effectiveness as well as the reliability of multisource, multimodal data in practical applications.

Comparison of Integrated Navigation Experiments
The digital twin environment for this experiment utilizes UE4 to simulate a real campus playground environment.While the real drone is conducting actual flight operations, the digital twin drone runs synchronously in the twin environment, collecting sensor data.The frequencies of the sensors used in the digital twin simulation match those of the real sensors: the inertial sensor sampling rate is 100 Hz, the GPS sensor frequency is 1 Hz, and the vision sensor frequency is 20 Hz.This paper focuses on a comparative analysis of the integrated application of various combined navigation methods.It compares three different combination navigation strategies: visual odometry/inertial measurement unit (VO/IMU), global positioning system/inertial measurement unit (GPS/IMU), and global positioning system/visual odometry/inertial measurement unit (GPS/VO/IMU).By analyzing their performance in positioning accuracy, theoretical foundations can be provided for the selection of UAV navigation systems.
The experimental results in Figure 13 show that the GPS/VO/IMU combination navigation method exhibits smaller positioning errors and higher stability.This finding suggests that the fusion of GPS, VO, and IMU data in drone positioning can effectively suppress errors that may arise from individual systems, reduce positioning fluctuations caused by the performance variability of a single system, and significantly enhance overall navigation accuracy.Compared to using VO/IMU or GPS/IMU alone, the GPS/VO/IMU combination not only provides more accurate location information but also maintains stable system performance under complex environmental conditions.These experimental results further confirm the importance of multimodal information fusion in improving the robustness of positioning systems.Especially when GPS signals are limited or the accuracy of the VO system is affected by indistinct environmental features, multisource information fusion ensures the continuity and accuracy of the system.Additionally, the GPS/VO/IMU combination navigation, by complementing the strengths and weaknesses of each system, not only enhances positioning accuracy but also improves the system's adaptability to dynamic changes and resistance to interference, providing strong support for the stable operation of drones in variable environments.

Analysis of Navigation Stability Experiment
Figure 14 showcases the distribution of positional and attitudinal errors and the covariance of the drone system before applying the data fusion technique proposed in this paper.From the figure, it can be observed that although the system can maintain errors within a relatively small range in the initial phase, indicating a certain degree of accuracy and stability in the short term, the errors in position and attitude exhibit a trend of gradual increase over time, manifesting as a systematic drift phenomenon.This drift leads to the divergence of covariance, reflecting an increase in the uncertainty of system state estimation.The performance degradation of the system indicates a lack of effective error correction mechanisms during long-term operation, ultimately leading to the inability of the drone's navigation system to maintain a normal working state, thereby validating the insufficiency of single-sensor or unoptimized multisensor systems in maintaining long-term positioning stability.In contrast, Figure 15 shows the error distribution and covariance after implementing the data fusion method proposed in this study.Compared to Figure 14, the covariance matrix maintained smaller fluctuations throughout the experiment period without significant divergence.This proves that the multimodal fusion positioning method proposed in this paper can effectively solve the problem of system state divergence caused by visual positioning errors, increasing the stability and robustness of state estimation, and improving accuracy based on actual operational results.This means that, under diverse operational conditions and various disturbances that may be encountered, the drone's navigation system can still maintain high positioning accuracy, ensuring the reliability and safety of the drone's mission execution.In contrast, Figure 15 shows the error distribution and covariance after implementing the data fusion method proposed in this study.Compared to Figure 14, the covariance matrix maintained smaller fluctuations throughout the experiment period without significant divergence.This proves that the multimodal fusion positioning method proposed in this paper can effectively solve the problem of system state divergence caused by visual positioning errors, increasing the stability and robustness of state estimation, and improving accuracy based on actual operational results.This means that, under diverse operational conditions and various disturbances that may be encountered, the drone's navigation system can still maintain high positioning accuracy, ensuring the reliability and safety of the drone's mission execution.
this paper can effectively solve the problem of system state divergence caused by visual positioning errors, increasing the stability and robustness of state estimation, and improving accuracy based on actual operational results.This means that, under diverse operational conditions and various disturbances that may be encountered, the drone's navigation system can still maintain high positioning accuracy, ensuring the reliability and safety of the drone's mission execution.

Experimental Analysis of Fault-Tolerant Navigation
It is evident from Figure 16 that a gradual fault was applied between 50 s and 150 s, while a sudden fault was applied between 250 s and 350 s.The component applied

Experimental Analysis of Fault-Tolerant Navigation
It is evident from Figure 16 that a gradual fault was applied between 50 s and 150 s, while a sudden fault was applied between 250 s and 350 s.The component applied pertains to the drone's x-axis direction, with little difference observed in the y-axis and z-axis components; however, the occurrence of the fault and the distinction between the two can be clearly seen in the x-axis component.pertains to the drone's x-axis direction, with little difference observed in the y-axis and zaxis components; however, the occurrence of the fault and the distinction between the two can be clearly seen in the x-axis component.In the absence of a digital twin drone data source and the utilization of a multisource, multimodal navigation algorithm, the drone system experiences abrupt changes, especially in the x-axis direction, when subjected to the applied faults, leading to system instability.This instability manifests as rapid changes in position and attitude, thereby affecting the normal operation of the drone; however, the drone system, incorporating the mul- In the absence of a digital twin drone data source and the utilization of a multisource, multimodal navigation algorithm, the drone system experiences abrupt changes, especially in the x-axis direction, when subjected to the applied faults, leading to system instability.This instability manifests as rapid changes in position and attitude, thereby affecting the normal operation of the drone; however, the drone system, incorporating the multisource multimodal navigation algorithm, demonstrates significant stability.In the face of gradual faults, despite potential delays in detection that cause some variation in the response of the multisource multimodal navigation algorithm, the variation is minimal, and the system can quickly return to a stable state.In the case of sudden faults, the multisource, multimodal navigation algorithm exhibits high robustness, with almost no impact, allowing for a smooth transition and ensuring the stability of the combined navigation system.
PX4 is an open-source flight control system that is widely used in drones and other unmanned systems.Its navigation module employs integrated navigation methods, which are more complex than the standalone GPS/IMU algorithm as it fuses data from multiple sensors.This paper uses PX4 as a benchmark with which to validate the effectiveness of the designed fusion algorithm.We conducted waypoint mission flights near tall buildings, and the PX4 flight control data were transmitted to the ground station in real-time via a wireless link to obtain the trajectory.The experimental results are shown in Figure 16.In Figure 17b, the area marked with an orange ellipse indicates the trajectory when the drone was near tall buildings and the number of satellites was fewer than six.It can be seen that the position data from the PX4 flight control system, which relies on GPS signals, began to drift and fluctuate, whereas the position estimated by the fusion algorithm only exhibited minor oscillations and remained stable.Additionally, in Figure 17a, the three-dimensional coordinates represent the relative x, y, and z positions of the drone.Overall, compared to the PX4 flight control trajectory, the drone using the data fusion algorithm maintained a smooth and consistent trajectory throughout the flight, reducing sharp fluctuations and abrupt changes.This indicates that the data fusion technology effectively filters out noise and errors in sensor data.Specifically, the digital twin technology provides a robust reference for data fusion while simulating the real drone; by utilizing predictive data generated from the digital twin model and integrating it with real sensor data, the algorithm effectively combats the instability caused by unexpected faults.In facing sensor errors or external disturbances, the system can adjust state estimates more stably.Moreover, this method also enhances positioning accuracy.As shown in Figures 16 and 17, the fluctuation range of positional errors is relatively smaller and more concentrated, indicating the precision advantage of the fusion method.Specifically, the digital twin technology provides a robust reference for data fusion while simulating the real drone; by utilizing predictive data generated from the digital twin model and integrating it with real sensor data, the algorithm effectively combats the instability caused by unexpected faults.In facing sensor errors or external disturbances, the system can adjust state estimates more stably.Moreover, this method also enhances positioning accuracy.As shown in Figures 16 and 17, the fluctuation range of positional errors is relatively smaller and more concentrated, indicating the precision advantage of the fusion method.

Conclusions and Future Work
This paper first discussed the challenges of maintaining positioning accuracy and stability during the continuous maneuvers of small drones in complex environments.To address this, it proposed a fusion positioning method that incorporates three different sensor technologies: GPS, inertial navigation, and visual navigation.After collecting relevant data from each sensor, a tightly coupled integration optimized the fusion of visual and inertial data, which were then further integrated with GPS data based on pose graph optimization principles.Several experimental comparisons of navigation combinations and navigational stability analyses were carried out.The experiments proved that the combined navigation system utilizing GPS/VO/IMU enhances positional accuracy by leveraging the strengths and mitigating the weaknesses of each individual system.This combination not only provides more precise location information but also maintains a stable system performance in complex environmental conditions, resulting in a smoother UAV flight trajectory; however, as the number of sensors increases, the system's complexity grows linearly or even at a higher rate, which in turn increases the likelihood of system failures.Consequently, this research introduced a digital twin UAV data source.By using the data generated by the digital twin model as an estimation input, combined with real UAV sensor data via the extended Kalman filter (EKF) algorithm, the approach corrects for positional information deviations caused by sensor noise and environmental disturbances.The filtered positional information is then fed back into the control system of the real UAV to achieve real-time corrections of the UAV's movement attitude and positional information.The method has been empirically proven to significantly improve the UAV's positioning accuracy, exhibiting a high level of robustness and ensuring the stability of the combined navigation system as well as the reliability of the navigation data.
In conclusion, the data fusion method proposed in this study has been theoretically and experimentally proven to be effective in addressing UAV positioning issues in complex dynamic environments.The main contributions of this paper include the following: (i) Proposing a multimodal data fusion method: This paper introduces a positioning method that tightly couples and optimally fuses data from GPS, IMU, and visual sensors.This approach successfully addresses the problem of insufficient accuracy from single sensors, significantly enhancing UAV positioning accuracy and stability, especially in complex and dynamic environments.(ii) Introducing digital twin technology: To address the increased probability of failure due to the rising number of sensors, this study establishes a high-fidelity UAV model within the digital twin.By introducing the digital twin UAV as a robust reference in the data fusion process, this innovation effectively reduces the impact of abrupt changes in sensor data on positioning accuracy.It also provides a solid foundation for the long-term stable operation of the system and offers important references as well as research directions for the design and optimization of future UAV navigation systems.(iii) Application of the extended Kalman filter algorithm: This study employs the extended Kalman filter algorithm to achieve efficient multisource data fusion.The extended Kalman filter algorithm can handle state estimation problems in nonlinear systems, correcting position information deviations caused by sensor noise and environmental interference in real time.This ensures the stable flight and precise navigation of UAVs in various complex scenarios, significantly improving positioning accuracy and system robustness in challenging environments.
Future inquiries shall pivot towards an enhanced exploration of fusion algorithm optimization tailored to accommodating more intricate environments and rigorous application demands, harnessing cutting-edge computational technologies such as edge computing or cloud computing to expedite data processing, diminishing system response latencies, and bolstering real-time application performance.An in-depth examination and refinement of the extant Kalman filter algorithms are envisaged to fortify their adaptability and precision in the management of data within highly dynamic and convoluted environmental contexts.Moreover, a deep dive into the application of digital twin technology within UAV locational matrices, particularly within the realms of fault diagnostics and prognostic maintenance, is anticipated.The construction of digital twin models for UAVs will permit the real-time surveillance of UAV operational statuses, the prognostication of potential malfunctions, and the pre-emptive enactment of preventative measures, thereby augmenting system stability and reliability.

Figure 1 .
Figure 1.Schematic diagram of the multisource, multimodal data fusion positioning method.

Figure 1 .
Figure 1.Schematic diagram of the multisource, multimodal data fusion positioning method.

Figure 2 .
Figure 2. Architecture of the multisource and multimodal data fusion system.

Figure 2 .
Figure 2. Architecture of the multisource and multimodal data fusion system.

Figure 3 .
Figure 3. Real drone and geometric model of the drone.

Figure 3 .
Figure 3. Real drone and geometric model of the drone.

Figure 3 .
Figure 3. Real drone and geometric model of the drone.

Figure 9 .
Figure 9. Schematic diagram of the pose graph.

Figure 13 .
Figure 13.The error curves of integrated navigation in the x, y, and z directions.The blue line represents the navigation error of the visual and IMU fusion, the green line represents the navigation error of the GPS and IMU fusion, and the red line represents the navigation error of the GPS, IMU, and visual fusion.These experimental results further confirm the importance of multimodal information fusion in improving the robustness of positioning systems.Especially when GPS signals are limited or the accuracy of the VO system is affected by indistinct environmental features, multisource information fusion ensures the continuity and accuracy of the

Figure 13 .
Figure 13.The error curves of integrated navigation in the x, y, and z directions.The blue line represents the navigation error of the visual and IMU fusion, the green line represents the navigation error of the GPS and IMU fusion, and the red line represents the navigation error of the GPS, IMU, and visual fusion.

23 Figure 14 .
Figure 14.Visual positioning error and covariance line chart.The red line represents the covariance curve, and the blue line represents the error line.

Figure 14 .
Figure 14.Visual positioning error and covariance line chart.The red line represents the covariance curve, and the blue line represents the error line.

Figure 15 .
Figure 15.Multimodal data fusion navigation error and covariance line.The red line represents the covariance curve, and the blue line represents the error line.

Figure 15 .
Figure 15.Multimodal data fusion navigation error and covariance line.The red line represents the covariance curve, and the blue line represents the error line.

Figure 16 .
Figure 16.The error line graph comparing multimodal with multisource, multimodal (incorporating digital twin drone sources).

Figure 16 .
Figure 16.The error line graph comparing multimodal with multisource, multimodal (incorporating digital twin drone sources).

Figure 17 .
Figure 17.Actual flight trajectories of the drone near tall buildings.(a) Comparison of the threedimensional flight trajectory lines of the drone.The red dashed line represents the trajectory data from the PX4 flight control system, while the green line represents the trajectory data from the fusion-based system.(b) Comparison of the two-dimensional planar trajectories on the ground station.The blue line represents the trajectory data from the fusion-based system, and the red line represents the flight trajectory data provided by the PX4 flight control system, the area marked with an orange ellipse indicates the trajectory when the drone was near tall buildings and the number of satellites was fewer than six.

Figure 17 .
Figure 17.Actual flight trajectories of the drone near tall buildings.(a) Comparison of the threedimensional flight trajectory lines of the drone.The red dashed line represents the trajectory data from the PX4 flight control system, while the green line represents the trajectory data from the fusion-based system.(b) Comparison of the two-dimensional planar trajectories on the ground station.The blue line represents the trajectory data from the fusion-based system, and the red line represents the flight trajectory data provided by the PX4 flight control system, the area marked with an orange ellipse indicates the trajectory when the drone was near tall buildings and the number of satellites was fewer than six.