Pose Measurement for Unmanned Aerial Vehicle Based on Rigid Skeleton

: Pose measurement is a necessary technology for UAV navigation. Accurate pose measurement is the most important guarantee for a UAV stable ﬂight. UAV pose measurement methods mostly use image matching with aircraft models or 2D points corresponding with 3D points. These methods will lead to pose measurement errors due to inaccurate contour and key feature point extraction. In order to solve these problems, a pose measurement method based on the structural characteristics of aircraft rigid skeleton is proposed in this paper. The depth information is introduced to guide and label the 2D feature points to eliminate the feature mismatch and segment the region. The space points obtained from the marked feature points ﬁt the space linear equation of the rigid skeleton, and the UAV attitude is calculated by combining with the geometric model. This method does not need cooperative identiﬁcation of the aircraft model, and can stably measure the position and attitude of short-range UAV in various environments. The effectiveness and reliability of the proposed method are veriﬁed by experiments on a visual simulation platform. The method proposed can prevent aircraft collision and ensure the safety of UAV navigation in autonomous refueling or formation ﬂight.


Introduction
UAVs are unmanned aircraft operated by radio remote control equipment and selfprovided program control devices. They are widely used in aerial photography, agriculture, express transportation, disaster relief, surveying and mapping, news reporting, disaster relief, and other fields [1,2]. Safety and stability are the most basic and important requirements for a UAV in flight. Therefore, flight control of the UAV is a particularly essential feature. Attitude calculation is an imperative aspect of flight control [3]. The estimated attitude is released to the attitude controller [4] to ensure stable flight. Therefore, the attitude estimation is the most important guarantee of flight stability.
The measurement of aircraft position and attitude can be divided into contact measurement and non-contact measurement according to the type and position of sensors. The classification of pose measurement methods is shown in Table 1. Contact measurement is also called attitude internal measurement. The measurement sensor used in this method is attached to the aircraft and moves with it. Non-contact measurement is also called attitude external measurement. In this case, the measurement sensor is not in contact with the aircraft; instead, it is usually placed on the ground or some other platform.
Several mature sensing instruments for attitude internal measurement are available, such as global navigation satellite system (GNSS) [5,6] and inertial navigation equipment (IMU) [7,8]. These equipment can directly give high-precision attitude parameters but also present a number of obvious shortcomings in their attitude internal measurement methods, which are mainly reflected in the robustness of the measurement method and measurement accuracy will become worse under certain conditions. At the same time, the attitude measurement method is affected by factors such as calculation frequency and signal reception. contact GNSS In ref. [5,6], GNSS is used to solve the aircraft pose, but the data update fremeasurement quency of GNSS is slow, and the signal transmission is easily affected.
(attitude internal IMU In ref. [7,8], IMU is used to calculate aircraft pose, but the positioning error measurement) of IMU increases with time, and the long-term accuracy is poor.
Ref. [9] proposed SoftPOSIT, which can solve pose parameters iteratively. Ref. [10] proposed the N-point perspective problem, which is a PNP problem. Monocular Ref. [11] proposed a method to generate a template to estimate the target pose. non-contact Ref. [12] performed pose measurement based on contour matching method. measurement Ref. [13] recognized the lines of aircraft to realize the structure extraction.
Ref. [14] proposed a method to solve aircraft pose by using the line feature. (attitude external Ref. [15] proposed a method to compare simulated images with real model. measurement) Binocular or Ref. [16] proposed a combined vision technology based on a multi-camera. multiocular Ref. [17,18] proposed optimization-based methods to estimate aircraft pose.
Ref. [19] extracted and clustered image line features to solve aircraft pose. Proposed Method proposes pose measurement based on rigid skeleton.
The attitude external measurement of UAV algorithms has been studied by research institutes worldwide. Some methods measure the aircraft's pose by monocular. These methods usually need measuring equipment such as theodolite and radar. Dementhon and Davis [9] proposed SoftPOSIT, an improved algorithm of POSIT. In this algorithm, the target pose parameters can be solved iteratively when the corresponding 2D-3D points are unknown. Fishler et al. [10] proposed the N-point perspective problem, which is a PNP problem. The coordinates of N points in the target coordinate system, as well as those of the projection points on the image plane, are determined to solve the relative position and attitude relationship. Hinterstoisser et al. [11] proposed a method to generate a template according to the viewpoint and match real images to estimate the target pose. Sample templates, which include color and depth gradients, are obtained from different perspectives. The real image is matched with templates from different viewpoints, thresholds are set to obtain ideal matching results, and pose estimation is completed. The robust ICP algorithm is used to achieve 3D point set matching. Liebelt performed pose measurement based on the target contour matching method [12]. Given an initial value, iterative optimization is used to minimize deviations by calculating the deviation between the result of the target edge reprojection and the actual imaging result of the target edge. The target pose is then solved. In this method, the target edge contour need to be obtained. Luo et al. [13] adopted a line segment detector to extract line segment features, mean-shift and k-means++ approaches are applied to group line segments. This method recognized the lines corresponding to the fuselage and wing in the image to realize the structure extraction of straight wing aircraft.
Studies on attitude external measurement of UAV based on multiple cameras have been conducted. Wang Bin proposed a method to solve aircraft attitudes by using the straight-line feature information of the aircraft [14]. The rotation matrix obtained by the angle bisector method is used as an initial value, and the coplanar error formula of the linear feature space of multiple camera images is established. The spatial coplanar error of all observed straight lines is used as an objective function to solve the optimal rotation matrix iteratively. Finally, the attitude parameters are analyzed. This calculation process requires the assistance of a theodolite. Li Guorong [15] proposed a method to compare simulated images with real ones and invert the target flight attitude. The pyramid-based regional matching algorithm may be used to determine the initial value of the target's pose parameters, after which the iterative optimization algorithm is refined and optimized on the basis of outer contour point set matching. Yuan et al. [16] proposed a combined vision technology based on a multi-camera, which combines multi-view and mono-view constraints for UAV attitude estimation. The particle filter framework combined with constraints to improve the accuracy of the algorithm. Zhang et al. [17,18] combined stereo vision with the geometric information of the object, proposed a method based on optimization to measure the position and attitude of the spacecraft and realized the attitude solution of the spacecraft with the non-cooperative target. Teng et al. [19] extracted and clustered image line features based on two widely separated cameras, combined with the common geometric constraints of straight wing aircraft to solve the aircraft attitude. The aircraft attitude measurement is realized under the condition of the large field of view and long-distance.
In practical applications, an image may be unavailable or the target may not be captured when a photometric device is used to capture images. Furthermore, algorithms that rely on 2D images usually need other information or equipment to solve the spatial pose accurately. Existing multi-eye measurement methods mainly match images with models or seek the correspondence between 2D and 3D points. These methods are negatively affected by factors such as inaccurate contour extraction, occlusion of key feature points, and incorrect matching in complex environments. Therefore, the pose measurement accuracy of these methods is low, and knowledge of the aircraft model information must be available.
In the process of flight, the skeleton structure of UAVs is rigid and almost not affected by external factors. Therefore, the attitude of UAV can be solved by the spatial straight line in the rigid skeleton.
This paper proposes a measurement method for the position and attitude of UAV based on a rigid skeleton that integrates 2D images and 3D point cloud information without requiring aircraft size or model information and then solves the problem of UAV aircraft attitude determination. The main contributions of this article are as follows: • A method framework based on the structural characteristics of a rigid skeleton of the aircraft is proposed to solve the position and attitude of short-range UAVs without knowledge of the aircraft model and size. • A key point screening method that combines 2D and 3D information is proposed for aircraft pose measurement. The method in this paper is based on a stereo vision measurement system. Depth information is combined with edge contour and corner point information to reduce the mismatch of aircraft key feature points effectively. • The proposed method can solve the online attitude of UAVs without identification and other auxiliary equipment, thereby effectively improving the reliability and robustness of the attitude calculation algorithm. Thus, the method can be applied to low-consumption airborne environments.
The rest of this article is organized as follows. Section 2 introduces the related knowledge and system framework. Section 3 describes the algorithm in detail. Section 4 shows the simulation experiment. The results of the entire method and comparative experiments are provided in Section 5. Finally, Section 6 summarizes the article.

System Framework
The attitude of an aircraft refers to the state between the three axes of the aircraft's coordinate system and a fixed coordinate system. The flight attitude of an aircraft affects the altitude and direction of the flight. A pilot can judge the airplane's flight attitude through the horizon when flying at low speed. However, because the attitude of the pilot body changes with the attitude of the aircraft, this type of experience-based judgment is unreliable and prone to misjudgment. Therefore, the measurement of aircraft attitude is a very important research topic in efforts to ensure the safety of aircraft navigation.

Definition of Aircraft Attitude Angle
We define the aircraft local coordinate system O p − x p y p z p , as shown in Figure 1. The center point (center of mass) of the aircraft is taken as the origin of its local coordinate system O p , the direction of the wing (pointing to the right wing) is defined as the O p x p axis, the direction of the nose is the O p y p axis, and the direction perpendicular to the aircraft is the O p z p axis. This coordinate system changes as the aircraft attitude changes. Solving the aircraft attitude is equivalent to finding the rotation angle between the current coordinate system and the initial coordinate system. In this system, the three Euler angles, namely, pitch, yaw, and roll, are used to describe the attitude of the aircraft. Pitch refers to rotation around the X axis and is also called the pitch angle θ. Yaw refers to rotation around the Z axis and is also called the yaw angle ϕ. Roll refers to rotation around the Y axis and is also called the roll angle γ.

Proposed Architecture
The position and attitude measurement of UAVs is mainly used for formation flying and autonomous refueling in the air. During this process, the rear aircraft must sail at a speed slightly faster than that of the front aircraft. The two aircraft are constantly approaching, and the observation angle and relative distance between them may change remarkably over the course of flight. Thus, measuring the relative distance and attitude between the two aircraft is necessary to ensure safe navigation. This paper focuses on the online measurement of the position and attitude of shortrange UAVs. The measurement system is shown in Figure 2. Two cameras are placed on the rear machine, video streams acquired by cameras are used as input. Our method solves the pose of the front UAV by each pair of images. images, and the straight-line equation of the wing edge space, the intersection line of the two wing planes, and the position of the front aircraft tail are subsequently calculated. Integrating the straight-line equation of the wing edge and the plane's intersection line realizes the calculation of the three attitude angles. In the next section, the specific implementation of the proposed method will be described in detail.

Research Methodology
The algorithm proposed in this paper is divided into four parts, namely the initial stage, the tracking and positioning stage, the 3D reconstruction stage, and the pose calculation stage. The algorithm framework is shown in Figure 3. In the initialization phase, the candidate region can use the pre-trained model to perform initial frame processing on the image and then apply the epipolar geometric constraints in stereo vision to detect the consistency of the candidate image. In the tracking and positioning stage, the pre-training model is used to detect the initial position of the target. The 3D positioning result obtained by the detection result, the online tracking result, the target motion estimation, and the stereo vision constitute a confidence determination formula. The confidence of the target and final position are then located. In the 3D reconstruction stage, the tracking and positioning areas are matched in the binocular camera, and, finally, the 3D data of the target area are formed. In the pose solution stage, 3D data and 2D information are combined. In the disordered 3D data, the key points on the wings and fuselage are screened and marked separately to form a key point set. The key point set is used to solve the intersection of the wing edge line and wing plane, and the space line is used to calculate the attitude angle of the target aircraft in three directions. Motion estimation and tracker module in the tracking stage can use the target position information obtained in the solving stage to update.

Initialization Stage
Before tracking and positioning, the tracking target needs to be initially positioned. In the initialization phase, an offline trained classifier (offline model) is used to find possible candidates in the image sequence that is input. Candidates we find are not all the region we want to track. Therefore, the candidate regions need to be filtered [20].
{x t li } M i=0 is the candidate found in the t th frame of the left image sequence. and {x t rj } N j=0 is the candidate region found in the t th frame of the right image sequence. M and N indicate the number of candidate regions detected. Each candidate area is x t = a t , b t , w t , h t , where a, b, w, h denotes the column and row coordinates of the top left corner and the width and height of the candidate, respectively. p li and p rj are the center points of x t li and x t rj , respectively. w li and w rj are the widths of x t li and x t rj , respectively [21]. In a fixed binocular system, the target is a single-target detection. Therefore, in the corresponding frame, in the M candidate regions {x t li } M i=0 and N candidate regions {x t rj } N j=0 corresponding to all the left and right images, only one pair of candidates should satisfied the following conditions: widths of candidates w li and w rj should satisfied w li − w rj < 0.05 × max(w li , w rj ). Based on the epipolar, the constraint shows that the relevant center points p li and p rj of should be satisfied p T rj F p li < ε (where F is the fundamental matrix and ε is a reliable threshold for judging). By applying binocular filtering, candidate in the image can be determined, and the other candidates that are incorrect are filtered out.

Tracking Stage
After the initialization is complete, the algorithm enters the tracking stage. For the corresponding frame in each image sequence, the target position is located through detection (that is, obtained by the offline classifier model), online tracking and motion estimation [22]. In this stage, the detection algorithm is the ADAboost algorithm [23] with haar feature. The tracking algorithm which used is KCF [24] with hog feature.
For the target in each image sequence, Kalman filtering [25] and the position and attitude of the target in the previous frame can be used to predict the current position of the target, and and the position of the motion estimation is represented as where a k , b k , w k , h k denotes the column and row coordinates of the top left corner and the width and height of the motion estimation x k respectively. Furthermore, an ROI (region of interest) is generated around the estimated position for subsequent local detection, which is three times the size of the width w k and height h k .
The offline classifier is used to detect the target, and {x di } M i=0 is the candidate found in the current frame. Each candidate area is denotes the column and row coordinates of the top left corner and the width and height of the candidate x d respectively. p d is the center point of x d . Furthermore, the spatial center position p cen (x cen , y cen , z cen ) of the previous frame is calculated to be projected onto the image plane to obtain its coordinates p pcen (u pcen , v pcen ).
The shape and position of the drogue do not mutate throughout the process. Therefore, the detection candidates which detected by detection algorithm close to the region given by the motion estimation have high confidence. The judgment of confidence of the detection candidate can be completed by the following: and c ds and c dc are defined as The function f is used to remap the value of p N within 2σ of (a k , b k ) to the interval [0, 1]. s is also a normal distribution, which is given according to the distance from the center point p d to the projected point p pcen , similar as c dc , here σ = u 2 pcen + v 2 pcen . If there are multiple candidates, the value with the highest confidence is selected as the detection result.
Similarly, the confidence judgment formula for online tracking is as follows: The parameter definition is similar to that in the confidence of the detection candidate.
For the positioning process of each frame, the target state can be defined as {x, c}, where x is the target position and c is the confidence of the position. Given motion estimation x k , detection x d and online tracking x o , the confidence evaluation function is as follows: where θ 1 and θ 2 are the reliability thresholds of detection and tracking confidence, respectively. The confidence of detection and tracking is compared with a set reliability threshold to locate the current frame target. If the detection confidence is 0 for 20 frames, the confidence of the entire algorithm is considered to be 0, and the algorithm returns to the initialization stage.

3D Reconstruction Stage
This study uses a semi-global matching algorithm to solve the disparity problem of input image pair [26]. The algorithm idea is to form a disparity map by selecting the disparity of each pixel. The global energetic function associated with the disparity map is set to minimize the energetic function to achieve the optimal goal of each pixel. The energetic function that depends on the disparity image D is as follows: where p and q represent pixels. D represents the disparity image. D P and D q represent the disparity between p and q, respectively. C p, D p is the matching cost of pixel p with respect to disparity D p , which can be solved by the sum of absolute differences. q is in the neighborhood N p of p, and its disparity changes slightly (that is, 1 pixel). The operator T[]) is the probability distribution of the corresponding intensity, which is 0 if its argument is false and 1, otherwise. P 1 and P 2 represent a penalty coefficient of 1 and a disparity difference greater than 1, respectively. Solving using the energetic function in the Formula (6) is an NP-complete problem. It is approximated as multiple one-dimensional problems and solved by dynamic programming [27]. For example, considering the direction from left to right, each pixel is only related to its coordinate pixel. The formula is as follows: where L r (p, d) represents the minimum cost of the current pixel when the disparity of the current pixel is d. Here, r can be understood as the neighboring pixels of the pixel p in this direction.
Costs in the eight directions are accumulated, and the smallest accumulated cost is selected as the final disparity of the pixel. The cumulative formula is expressed as follows: A disparity map of the entire image can be obtained by performing the above operation for each pixel in image. Because the wing of the aircraft is symmetrical, according to the plane's own coordinate system and the plane's top view, the straight lines l 1 and l 2 are in the plane coordinate system O p − x p y p z p in O p x p y p . The projected line on the plane has the same angle relative to the x p axis and is axisymmetric about y p . Therefore, the aircraft yaw angle can be solved according to the space straight lines l 1 and l 2 . According to the rear view and coordinate system of the aircraft, similar to the yaw angle solution, the roll angle of the aircraft can be calculated on the basis of the space straight lines l 1 and l 2 . The projections of the space straight lines l 1 and l 2 on the O p y p z p plane are challenging to distinguish and easily affected by noise; thus, the pitch angle is unstable. From our analysis of the side view of the aircraft, the intersection line of the left and right wing planes l 3 on the O p y p z p plane is parallel to the fuselage line. Therefore, the pitch angle of the aircraft can be solved by the angle between this intersection and the plane.  In summary, the process of solving the 3D attitude angle of the aircraft is transformed into the process of solving the space straight lines l 1 and l 2 where the rear edges of the left and right wings of the aircraft are located, and the intersection line l 3 of the two wing planes.

Solving Stage
The proposed method uses the disparity map I d to obtain the target area of the aircraft and then count the points in this map according to the columns. The disparity map can be divided into three regions according to the number of corresponding points in the columns, namely, the left wing region I d l , the fuselage area I d c , and the right wing area I d r .
The proposed method extracts the edge space points of the left wing of the aircraft, that is, filter the edge point set in the left wing area. Next, the proposed method calculates the corresponding depth map I depth l according to the disparity map I d l . Because the wing area is projected on the image plane according to the perspective projection, the order of the wing part in each column of the image is from far to near. The input image has a one-to-one correspondence with the disparity map; thus, the left wing edge point set {p edge l } can be filtered out according to the depth information of the corresponding points obtained by the disparity map calculation. In the process of screening the edge point set,  According to the point set {p edge l } fit the space straight line L edge l on the edge of the left wing: x Similar to the above solution process, the corresponding depth map I d r is calculated for the right wing area I depth r , and the right wing edge space point set {p edge r } is obtained. The fitting is the edge space line of the right wing L edge r : According to the linear equations of L edge l and L edge r , the angles ϕ edge l and ϕ edge r between their projection lines in O p x p y p of the aircraft coordinate system (i.e., Oxz plane of the camera coordinate system) and the axis of x can be solved. Given the symmetrical properties of the wings, the initial state angle can be defined as ϕ 0 , and the yaw angle of the aircraft is ϕ. The corresponding relationship is shown in Figure 6: Because the angles between the two linear space vectors and the positive direction of the x axis are acute, the formula for calculating the yaw angle ϕ of the aircraft is as follows: The aircraft pitch angle θ is solved by the intersection of the two wing planes. Although the disparity map I d can be divided into the wing area and the fuselage area, errors may be encountered during the process of fitting the plane because a large number of noise points in the 3D point cloud are calculated in the stereo matching algorithm. Moreover, because the amount of point cloud data is large, the calculation efficiency of the method may be negatively affected. Therefore, this paper uses the spatial point set formed by the feature points in the wing area to fit the plane and calculate the intersection of the wing plane. Here, the proposed method use the disparity map I d and its area distinction to filter the matching feature points, constrain the matching of the feature points through the disparity (i.e., depth information) corresponding to the feature points, eliminate the incorrectly matched points on the same horizontal line, as shown in Figure 7. The feature used in the image is ORB feature [29].
The 3D coordinates of each point in the feature set {p f ea l } are calculated, and the plane equation of the left wing is fitted as follows: A wing l x + B wing l y + C wing l = z (13) In the same way, the feature point set on the right wing {p f ea r } can be extracted through the above process, and the right wing plane equation can be fitted as shown below: A wing r x + B wing r y + C wing r = z The normal vectors − → M wing l and − → M wing r of the plane where the two wings are located are obtained, and the cross product are used to calculate the direction vector − → M l 3 of the plane intersection l 3 .

− →
The angle between the projection of the direction vector − → M l 3 of the plane intersection line l 3 in O p y p z p plane of the aircraft coordinate system (i.e., Oyz plane of the camera coordinate system) and the y axis of the aircraft coordinate system (i.e., the z axis of the camera coordinate system) is the pitch angle θ of the aircraft.
According to the conventional aircraft shape, the position of the aircraft can be spatially located through the tail of the aircraft. The point with the smallest depth information is found in the fuselage area I d c in the disparity map. The difference between the disparity of the point in the adjacent area is used to eliminate the noise points, and the position p tail (x tail , y tail , z tail ) of the tail of the aircraft is obtained.
The proposed method is presented in Algorithm 1. After calculating the aircraft's space pose, its position information will be adopted to update the confidence of the tracking stage. if I l , I r is initialized then end if x l &&x r = 0 then I depth = disparity(x l , x r ) ;

Simulation Experiment
Plane fitting in the pose measurement process may exert a great impact on the accuracy of the method. Therefore, a simulation experiment is carried out on the plane fitting method. A plane is constructed in the space, several points in the plane are projected onto the corresponding plane of the binocular camera, noises of 0.25, 0.5, and 1 pixel are added to the image plane points, and the method is verified. In different noise environments, within the range of 10 to 32 m, the error is between the solved plane and the constructed plane equation. Figure 8a shows the result of the method obtained under a 0.25-pixel noise. The angle error is approximately 1 • within 32 m. Figure 8b shows the result of the method obtained under a 0.5-pixel noise. The results show that the error basically increases as the distance increases. The error reaches 2.38 • . Figure 8c shows the result of the method obtained under a 1-pixel noise. The error reaches 2.44 • at 25 m. At 32 m, the error is close to 4 • , which exerts a greater impact on the measurement accuracy.  Corresponding to the experimental conditions, the pixel noise is used as a variable, and simulation experiments are performed at 10, 20, and 30 m, as shown in Figure 9. Figure 9c shows that at 30 m, after the error of more than 0.65 pixels, the angle error has exceeded 2 • ; when the noise is close to 1 pixel, the angle error is close to 4 • , which echoes the previous experiment.  In summary, the simulation experiment shows that the matching error must be reduced as much as possible to meet actual needs. This error should ideally be controlled to 0.5 pixels, that is, the angle error is controlled at about 2 • , which can better meet the demand.

Physical Experiment
In this section, the complete experiment process and analysis are discussed. We then introduce the experimental environment, datasets, and experimental procedure.

Implementation Details
The algorithm in this paper is verified on a digital simulation platform. The visual simulation software used for display can simulate the effect of UAV navigation in various geographical environments, altitudes, and weather environments. The image sequence outputted by the simulation software can simulate the images taken by the dual cameras and output the true values of the real position and attitude. These images can be used as datasets to verify and compare measurement methods for the position and attitude of the UAV. Because of the limited parameters in the simulation software, the roll angle of the aircraft attitude is set to a fixed value of 0.
The dataset used is shown in Figure 10: The proposed method is written in vs2013+OpenCV 2.4.13 using C++ and ran on a computer with an Intel i7-6820HQ (2.7 GHz) CPU, 16 GB RAM, and an NVIDIA Quadro M2000M GPU. The resolution of the image sequence collected by the simulated binocular stereo vision system is 888 × 500 pixels. The internal parameters of the virtual camera are fixed, the focal length is a x = a y = 1080 pixels, the principal point coordinates are (u 0 , v 0 ) = (444, 250) pixels. The rotation matrix in the stereo vision system is the identity matrix, the translation vector is T = (0, 0, 1000) mm.
The proposed method in this study is formed by the combination of motion estimation, an offline model, and online tracking. The proposed method uses ADAboost [23] with the haar feature as the offline model and KCF [24] with the hog feature as the online tracking algorithm. Some parameters must be set in each stage of the proposed method. During classifier pre-training, the number of stages is set to 20 based on experience. The minimum detection rate of each class of strong classifier is set to 0.999. In the tracking component, the linear interpolation factor for adaptation is set to 0.012 and the Gaussian kernel bandwidth is set to 0.6. In the solving stage, the method of feature extraction is ORB feature [29]. Other parameters are set to default values.

Measurement Accuracy Evaluation
This article considers different weather conditions, angles, and flying altitudes (i.e., different backgrounds) and collects eight datasets, including aircraft out of view and partial occlusion, among others. The measurement distance is approximately 14-33 m. The aircraft pose was measured on these eight datasets and compared with the real values recorded by the visual simulation software. The attitude angles and distance RMS values of the eight datasets are shown in Figures 11 and 12, respectively.   Figure 13 shows the error values of the three attitude angles of some measurement data. In most datasets, the RMS of the roll angle is small and stable within 0.5 • . However, in some of the datasets, the error of the roll angle is relatively large because of the influence of the background environment and other attitude angles. In this case, the RMS may reach 1.9 • . The RMS of the yaw angle is stable at approximately 1.0 • but may occasionally be as high as 1.7 • . The error of the pitch angle is the largest among the errors of the three attitude angles. In most datasets, the RMS of the pitch angle is stable at approximately 2.0 • ; in some datasets, the values obtained may rise to 2.5 • . In summary, the RMS of most of attitude angles which calculated by the proposed method is within 2.0 • between 14 and 33 m. During the determination of the relative position of UAVs, distance, that is, the depth direction in the camera coordinate system, is the most important measurement. The proposed method calculates the distance between drones in each dataset. In all datasets, the RMS of the distance measurement result is 0.61 m. The data in Figure 12 show that most of the RMS values of the distance measurement results in all datasets are within 0.84 m.
Only one dataset, where the true distance is approximately 25-30 m, has an RMS of 1.08 m. Figure 14 shows the RMS values of the results in this dataset. Our findings indicate that the proposed method can be used for collision avoidance measurements when the aircraft is sailing in the air. The aircraft attitude angles calculated by the proposed method are shown in Table 2. The error contrast between the yaw angle measured by the proposed method and the true value is relatively small and remains within 1.5 • . The error of the roll angle is clearly distinguished. When the actual yaw angle is small, the measurement error of the roll angle is small. When the yaw angle is large, the measurement error of the roll angle is large, but the RMS remains within 2 • . The error of the pitch angle is well within a stable range, and most of the measurement errors are within approximately 2.0 • . While some measurement values are occasionally greater than 2.5 • , which could affect the measurement accuracy to a certain extent, it will not bring devastating results and the safety of aircraft navigation can still be ensured. The method proposed in this paper is compared with conventional methods of calculating aircraft attitude, including PNP, EPNP [30], and Li's method [31], to verify its validity and reliability. PNP is a traditional pose estimation algorithm, which estimates the pose of the target through multiple space points and their projection positions. Li's method is based on parallel vision and uses the triangulation method for stereo matching and 3D reconstruction. Furthermore, the pose estimation algorithm of Li's method is also widely used. So they were selected for comparison to validate our proposed method. In this experiment, eight key corners of the aircraft are artificially marked for the comparison algorithm to calculate the aircraft attitude. The speed of the measurement method in this paper is 12 fps. Because some data of other methods are extracted manually, the calculation speeds of the algorithms cannot be compared. The eight points extracted are shown in Figure 15. The measured RMSEs of aircraft attitude angle and depth distance are shown in Table 3. The RMSE of the three attitude angles are 1.1 • , 0.8 • and 2.1 • . The data show no large gap between the accuracy of the proposed method in this article and the comparison method, but the proposed method can be applied to the aircraft attitude when some key points, such as the wing and tail part, are blocked. In the case of the comparison method, the extracted 2D point data do not correspond to the 3D points, and large deviations in the measurement results are noted. These deviations present a safety hazard during the flight of the aircraft. Li's method has the same accuracy as the proposed method in distance measurement, but it has a larger angle error in attitude measurement. In Li's method, when calculating three-dimensional space points, a large error will be introduced because of the long distance of the object. The attitude is calculated through the corresponding three-dimensional points. So the result of attitude calculation is inaccurate.  Figure 16 shows some of the measured attitude angle data. The yaw and roll measured by the comparison algorithm produces serious errors in images 22 and 23. This error is caused by the UAV partially occluding the right wing in the image, which results in a corresponding point error. The pitch also shows obvious errors in images of 22-24, 34-40, and 46 because the wing plane is linearized in the image on account of the occlusion of the wing and the small pitch angle, thereby resulting in inaccurate lifting points. By comparison, the measured values of the proposed method are stable within a certain error range. This finding confirms the robustness and reliability of the proposed method.

Conclusions
Aimed at the flying formation and AAR of UAVs, this paper proposes a UAV pose measurement method based on rigid skeleton structure, which uses the spatial linear equation of rigid skeleton combined with the geometric model to solve the aircraft pose. This paper first proposes a key point extraction method that integrates 2D image data with 3D depth information to solve the problem of reduced feature point mismatch. This method improves the accuracy and stability of the pose measurement method. Compared with other methods, the proposed method in this paper can solve the problem of UAV attitude calculation without the need for auxiliary equipment, such as a theodolite, or remote transmission of data signals. Thus, the proposed method effectively improves the reliability and robustness of the pose measurement method and provides a safety guarantee for the aerial flight of UAVs. The proposed method can measure the position and attitude of UAVs without the need for knowledge of the aircraft model and size. The RMS error of the attitude angle is within 2.1 • and the RMS error of distance measurement is within 0.61 m. The calculation frequency is 12 fps, which meets the collision avoidance measurement conditions of short-range UAVs in flight. The effectiveness and feasibility of the proposed method are verified in a simulation environment. The proposed method does not have a complex framework, which is easy to be implemented and used in actual project engineering.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: UAV Unmanned Aerial Vehicle AAR Automated Aerial Refueling