Monocular Vision-Based Pose Determination in Close Proximity for Low Impact Docking

Pose determination in close proximity is critical for space missions, and monocular vision is one of the most promising solutions. Although numerous approaches, such as using artificial beacons or specific shapes on spacecraft, have proved to be effective, their high individuation and large time delay limit their use in low impact docking. This paper proposes a unified framework to determine the relative pose between two docking mechanisms by treating their guide petals as measurement objects. Fusing the pose information of one docking mechanism to simplify image processing and creating an intermediate coordinate system to solve the perspective-n-point problem greatly improve the real-time performance and the robustness of the method. Experimental results show that the position measurement error is within 3.7 mm and that the rotation error around the docking direction is less than 0.16°, while the measurement time is reduced by 85%.


Introduction
Low impact docking [1] is a subject of intense research in the context of current docking systems. It is widely used in on-orbit servicing (OOS) [2], comet and asteroid exploration [3,4], and active debris removal (ADR) [5]. One of its core technologies is pose determination in close proximity. Pose determination generally refers to computing the relative position and the attitude between objects. The relative pose is unambiguously identified by six degrees of freedom (DOFs): three DOFs for the relative position and three DOFs for the relative attitude. For low impact docking, pose determination occurs over a distance of less than several meters (depending on the size of the target), and its ultimate goal is to obtain the relative pose between the docking mechanisms of two spacecraft (either cooperative or non-cooperative) with high speed, precision, and robustness. Most current methods are indirect: they measure the relative pose between the two spacecraft and then calculate the relative pose between the two docking mechanisms by using the assembly relation between each spacecraft and its docking mechanism.
Over the last decades, state-of-the-art techniques and algorithms have been developed for cooperative and uncooperative pose determination by electro-optical (EO) sensors [6]. EO sensors have a low power consumption and can be used to estimate all pose parameters. Consequently, such sensors are the preferred instruments for this application. In general, EO sensor systems can be classified as passive systems, which consist of a single camera (monocular) or multiple cameras (stereo), and active light detection and ranging (LIDAR) systems. Among these systems, monocular vision systems have the lowest hardware complexity and cost and can be used for remote monitoring. A stereo vision system uses more than one camera, enabling it to acquire three-dimensional (3D) information about the target. However, monocular and stereo vision systems suffer from the same handicaps as all vision systems: sensitivity to illumination conditions and difficulty segmenting objects from complex backgrounds. In contrast, LIDAR is robust to differences in illumination and can obtain both position and intensity data in 3D; however, a LIDAR system consumes more energy and exhibits

Problem Formulation
To make the subsequent description and derivation process clearer and easier to understand, we first define the relevant coordinate systems in the first part of this section. Then, the rest of this section presents a comparison of the pose determination methods and theoretically proves the superiority of the proposed method.

Definition of the Coordinate Systems
Here, we define the coordinate systems used in this paper: the camera coordinate system (CCS), the docking mechanism coordinate system of the chasing spacecraft (DMCS_CS), the docking mechanism coordinate system of the target spacecraft (DMCS_TS), the mark coordinate system (MCS), the chasing spacecraft coordinate system (CSCS), and the target spacecraft coordinate system (TSCS). In general, a spacecraft consists of the spacecraft body, the docking mechanism, a camera or marks, and other equipment, such as a pair of long solar panels, an antenna, and/or a space manipulator (see Figure 1a,b). The defined coordinate systems are shown in Figure 1c and described in Table 1. When defining the CCS, we assume that the camera is ideal, namely, that its principal point coincides with the center of the image and that it has zero skew and an aspect ratio of 1. The origin $O_C$ of the CCS is located at the optical center of the camera, and its distance from the center of the image is the focal length $f$. The $Z_C$ axis coincides with the camera's optical axis and points toward the object being measured. The $X_C$ and $Y_C$ axes are parallel to the X and the Y directions of the imaging plane, respectively. The homogeneous transformation matrix of the CCS in the CSCS is ${}^{1}T_{C}$. Note that the MCS represents the coordinate system of either the artificial beacons or the specific shape of the spacecraft used to determine the target pose, as mentioned above. Therefore, only the homogeneous transformation matrix ${}^{2}T_{M}$ of the MCS in the TSCS is given.


Comparison of Different Methods
It is well known that no measurement is exact. When a quantity is measured, the outcome depends on the measuring equipment, the measurement procedure, the environment, and other factors. Based on the type of measurement object, we distinguish two possible measurement methods. For the first method (M1), the measurement objects are artificial beacons or specific spacecraft shapes. For the second method (M2, our proposed method), the measurement objects are the guide petals of the docking mechanisms. The limitations of current methods were explained in the Introduction. To further evaluate the performance of the two measurement methods, we compare their uncertainty, error, and frequency. Under the same conditions (the same EO sensor and the same assembly accuracy), the measurement quality of the methods can be evaluated by comparing the measurement uncertainty: the lower the measurement uncertainty, the better the quality. The detailed process is described as follows.
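First, expressions for the relative poses between the docking mechanisms under the two methods are derived via homogeneous transformation as follows, where Equation (1) is for M1 and Equation (2) is for M2:

$${}^{CS}T_{TS} = {}^{CS}T_{1}\,{}^{1}T_{C}\,{}^{C}T_{M}\,{}^{M}T_{2}\,{}^{2}T_{TS} \tag{1}$$

and

$${}^{CS}T_{TS} = {}^{CS}T_{C}\,{}^{C}T_{TS} \tag{2}$$

With reference to the coordinate systems shown in Figure 1c, ${}^{A}T_{B}$ represents the homogeneous transformation matrix from coordinate system A into coordinate system B. It is a 4 × 4 square matrix consisting of a rotation matrix ${}^{A}R_{B}$ and a translation vector ${}^{A}P_{B}$, i.e.,

$${}^{A}T_{B} = \begin{bmatrix} {}^{A}R_{B} & {}^{A}P_{B} \\ 0 & 1 \end{bmatrix}.$$

Suppose that $N$ is the quantity to be measured and that $x, y, z, \ldots$ are direct measurements, such that $N = f(x, y, z, \ldots)$. From the measurement uncertainty formula, one can obtain the measurement uncertainty of $N$ as follows:

$$\sigma_N = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2 + \left(\frac{\partial f}{\partial z}\right)^2 \sigma_z^2 + \cdots} \tag{3}$$

Then, by substituting Equations (1) and (2) into Equation (3), we can obtain the measurement uncertainty formulas for the two methods, as shown in Equations (4) and (5). For M1, the measurement uncertainty formula is:

$$\sigma_{M1} = \sqrt{\left(\frac{\partial f}{\partial x_1}\right)^2 \sigma_{x_1}^2 + \left(\frac{\partial f}{\partial y_1}\right)^2 \sigma_{y_1}^2 + \left(\frac{\partial f}{\partial z_1}\right)^2 \sigma_{z_1}^2 + \left(\frac{\partial f}{\partial w_1}\right)^2 \sigma_{w_1}^2 + \left(\frac{\partial f}{\partial u_1}\right)^2 \sigma_{u_1}^2} \tag{4}$$

where $x_1 = {}^{CS}T_{1}$, $y_1 = {}^{1}T_{C}$, $z_1 = {}^{C}T_{M}$, $w_1 = {}^{M}T_{2}$, and $u_1 = {}^{2}T_{TS}$. For M2, the measurement uncertainty formula is:

$$\sigma_{M2} = \sqrt{\left(\frac{\partial f}{\partial x_2}\right)^2 \sigma_{x_2}^2 + \left(\frac{\partial f}{\partial y_2}\right)^2 \sigma_{y_2}^2} \tag{5}$$

where $x_2 = {}^{CS}T_{C}$ and $y_2 = {}^{C}T_{TS}$.

$\sigma_{x_1}$, $\sigma_{y_1}$, $\sigma_{w_1}$, and $\sigma_{u_1}$ (the measurement uncertainties of ${}^{CS}T_{1}$, ${}^{1}T_{C}$, ${}^{M}T_{2}$, and ${}^{2}T_{TS}$, respectively) are mainly due to $E_1$ (manufacturing error and assembly error); thus, it is reasonable to assume that $\sigma_{x_1} = \sigma_{y_1} = \sigma_{w_1} = \sigma_{u_1}$. $\sigma_{z_1}$ and $\sigma_{y_2}$ (the measurement uncertainties of ${}^{C}T_{M}$ and ${}^{C}T_{TS}$, respectively) are mainly due to $E_2$ (measurement error from pose determination); thus, $\sigma_{z_1} = \sigma_{y_2}$. For $\sigma_{x_2}$ (the measurement uncertainty of ${}^{CS}T_{C}$), we choose the source with the least error. Therefore, $\sigma_{M1} > \sigma_{M2}$. Finally, the results can be summarized as follows:

• As shown in Equations (1) and (2), our proposed method M2 works well for pose measurement for both cooperative and non-cooperative targets, and it is much simpler and more efficient than the existing method M1.

• The measurement accuracy of M2 is higher than that of M1 since $\sigma_{M1} > \sigma_{M2}$.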
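
The comparison can be illustrated numerically with a minimal sketch. It assumes scalar uncertainties, sensitivity coefficients equal to 1, and that the uncertainty of the camera mounting term in M2 equals the assembly-level uncertainty; the numeric values and the names `rss`, `sigma_e1`, and `sigma_e2` are hypothetical and are not taken from the experiments.

```python
import math


def rss(sigmas):
    """Root-sum-square combination of independent uncertainties.

    Assumes every sensitivity coefficient (partial derivative) equals 1.
    """
    return math.sqrt(sum(s * s for s in sigmas))


# Hypothetical values in millimetres, chosen only for illustration.
sigma_e1 = 0.10  # manufacturing and assembly uncertainty (source E1)
sigma_e2 = 0.50  # pose-determination uncertainty (source E2)

# M1 chains five transforms: four assembly terms and one measured term.
sigma_m1 = rss([sigma_e1, sigma_e1, sigma_e2, sigma_e1, sigma_e1])

# M2 chains two transforms; the first is assumed here to equal sigma_e1.
sigma_m2 = rss([sigma_e1, sigma_e2])

print(f"M1: {sigma_m1:.4f} mm, M2: {sigma_m2:.4f} mm")
```

Under these assumptions the shorter transform chain of M2 always yields the smaller combined uncertainty, because M1 contains the same two terms plus three further non-negative terms.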

Design of the Monocular Vision System
We designed a monocular vision system for determining the relative pose between two docking mechanisms for low impact docking. This system, especially the active light source, is described in detail in this section.

Architecture of the Monocular Vision System
The monocular vision system consists of three parts, namely, an active light source, a camera, and a data processing computer, as shown in Figure 2. The active light source is mounted on the docking ring of the active docking mechanism. It moves with the docking ring to provide active illumination; its detailed structure is introduced in Section 3.2 below. Because of the short measuring distance, the small range of movement of the docking mechanism, and the high measurement accuracy required, we chose the Manta G-419B camera produced by Allied Vision Technologies. Its physical resolution is 2048 × 2048 pixels, and the cell size is 5.5 µm. The maximum frame rate at full resolution is 28.6 fps, and the lens's theoretical focal length is 8 mm. As mentioned above, the camera is installed inside the docking mechanism, i.e., on the hatches (see Figure 1), and does not affect the normal passage of astronauts and cargo. The camera periodically captures images and transmits them to the data processing computer via a gigabit Ethernet (GigE) interface. Then, the data processing computer performs image processing and pose calculation to obtain the relative pose between the docking mechanisms. In the method proposed in this paper, the spacecraft as a whole is not the measurement object; instead, only the vertices of the guide petals (see Figure 3) are measured. Therefore, the monocular vision system is designed to capture a clear image of the guide petals.

Design of the Active Light Source
In a vision system designed to determine the relative pose for low impact docking in a complex space environment, illumination becomes a key factor. It is particularly difficult to maintain the objects to be measured under suitable illumination conditions. The typical, simple method is to use an integrating sphere as a uniform source to illuminate the target. Then, all objects in the field of view have a similar grayscale range. However, segmenting the target object from the background requires complex image processing algorithms, and this complexity seriously affects the stability and the real-time performance of the system. Considering this restriction, a distributed active light source was designed, which can adaptively adjust both the brightness and the illuminated area. The active light source consists of three arc-shaped LED panels that are mounted at even intervals on the inner wall of the active docking ring (see Figure 2) and move with the active docking ring to accommodate changes in the relative pose. During the close-proximity docking process, only the guide petals of the two docking mechanisms are illuminated. Each LED panel consists of a panel, multiple LEDs, and a diffuser film (see Figure 4a). Because of the diffuser film, the light from the LED point sources is diffused into 120 degrees of diffuse light. Thus, the guide petals are uniformly illuminated, while the surrounding objects are not, as shown in Figure 4b,c. Hence, the active light source ensures that the guide petals remain under suitable lighting conditions throughout the docking process.


Key Algorithms of the Monocular Vision System
The algorithmic framework for monocular vision-based pose determination is shown in Figure 5. It has two key components: multitarget tracking and pose determination. The details of these steps are presented in Sections 4.1 and 4.2.

Multitarget Tracking
Because of the complexity of the space environment, the structure of the target spacecraft and the imaging characteristics of a monocular vision system with a high original image resolution (2048 × 2048), it is challenging to design a monocular vision system with high performance, low algorithm complexity, and insensitivity to the pose and the geometry of the target spacecraft. To solve these problems, we introduce the pose information of the active docking mechanism for multitarget tracking.


ROI Extraction
At the beginning of the low impact docking process, the relative pose between the docking mechanisms is within a certain range, as described by the initial docking conditions. To achieve low impact docking, the pose of the active docking mechanism must be adjusted during the docking process. The regions of interest (ROIs), namely, the areas corresponding to the guide petals in the image, are related to the pose of the active docking mechanism. There are six square ROIs in the image during the docking process; we define the center coordinates of each ROI as $P^{roi}_i = (u_i, v_i)$ $(i = 1, \ldots, 6)$ and the side length of each ROI as $d_{roi}$ (in pixels). $P^{roi}_i$ and $d_{roi}$ are derived as follows.

As shown in Figure 3, the coordinates ${}^{CS}P_i = ({}^{CS}X_i, {}^{CS}Y_i, {}^{CS}Z_i, 1)^T$ $(i = 1, \ldots, 6)$ denote the geometrical center of each guide petal, and $D$ is the true length corresponding to $d_{roi}$ (the side length of each square ROI). ${}^{CS}P_1$, ${}^{CS}P_3$, and ${}^{CS}P_5$ are located on the active docking mechanism, and ${}^{CS}P_2$, ${}^{CS}P_4$, and ${}^{CS}P_6$ are located on the hypothetical passive docking mechanism, as shown in Figure 6. ${}^{C}T^{ideal}_{CS}$ is the ideal homogeneous transformation matrix from the CCS to the DMCS_CS, and it is not an exact value. $T_{CS}$ is the homogeneous transformation matrix of the active docking mechanism, i.e., the pose information. By applying a homogeneous transformation, ${}^{CS}P_i$ can be transformed into the CCS to obtain ${}^{C}P_i$ $(i = 1, \ldots, 6)$:

$${}^{C}P_i = {}^{C}T^{ideal}_{CS}\,T_{CS}\,{}^{CS}P_i \tag{6}$$

Substituting ${}^{CS}P_i$ into Equation (6), we obtain Equation (7) as follows:

$${}^{C}P_i = ({}^{C}X_i, {}^{C}Y_i, {}^{C}Z_i, 1)^T \tag{7}$$

Suppose that the camera's intrinsic and extrinsic parameters are $K$ and $T$. A three-dimensional point $(X_W, Y_W, Z_W)$ in world coordinates maps to the two-dimensional pixel point $(u, v)$ as follows:

$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,T \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}$$

After calibration, we can obtain the intrinsic parameters. The extrinsic parameters are related to ${}^{C}T^{ideal}_{CS}$ and $T_{CS}$. Thus, $P^{roi}_i$ can be obtained as follows:

$${}^{C}Z_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K \begin{bmatrix} {}^{C}X_i \\ {}^{C}Y_i \\ {}^{C}Z_i \end{bmatrix} \quad (i = 1, \ldots, 6)$$

In addition, since the rotations of the docking mechanisms during the docking process are relatively small, $d_{roi}$, the length of each ROI in pixels, is mainly related to the distance of the active docking mechanism:

$$\frac{d_{roi}}{D} = \frac{f}{{}^{C}Z_i} \quad (i = 1, \ldots, 6)$$

Then,

$$d_{roi} = \frac{f\,D}{{}^{C}Z_i} \quad (i = 1, \ldots, 6)$$

Thus, using ROI extraction, we can track the measurement objects accurately and rapidly, as shown in Figure 7.
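
A minimal sketch of the ROI computation is given below. It assumes that the petal center has already been transformed into the CCS, that the camera is ideal (principal point at the image center, square pixels), and that the focal length is converted to pixels using the 8 mm lens and 5.5 µm cell size quoted earlier. The function name `project_roi` and the sample point are hypothetical.

```python
def project_roi(point_c, side_mm, f_mm=8.0, cell_mm=0.0055, size_px=2048):
    """Return (u, v, d_roi) in pixels for a point given in the camera frame.

    point_c : (X, Y, Z) of a guide-petal center in the CCS, in millimetres.
    side_mm : true side length D that the square ROI must cover.
    """
    x, y, z = point_c
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    f_px = f_mm / cell_mm       # focal length expressed in pixels
    c = size_px / 2.0           # ideal camera: principal point at image center
    u = f_px * x / z + c        # pinhole projection, horizontal pixel coordinate
    v = f_px * y / z + c        # pinhole projection, vertical pixel coordinate
    d_roi = f_px * side_mm / z  # similar triangles: d_roi / D = f / Z
    return u, v, d_roi


# Hypothetical petal center 800 mm in front of the camera, 120 mm ROI side.
u, v, d = project_roi((100.0, -50.0, 800.0), side_mm=120.0)
print(round(u, 1), round(v, 1), round(d, 1))
```

In practice the resulting square would also be clipped to the image bounds before cropping; that step is omitted here.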


Image Processing
After the ROI extraction, we detect the twelve vertices of the six guide petals with a series of image processing algorithms. The steps of this algorithm include image filtering, edge detection, line extraction, and feature acquisition.
In general, noise is introduced into a visual system, and this noise can contaminate the images acquired by the system. It is necessary to filter out the noise in an image before edge detection. For image filtering, the most common methods include normalized box filtering, Gaussian filtering, median filtering, and bilateral filtering. To balance computational speed and filtering performance, we choose median filtering [21]. Median filtering is a nonlinear image smoothing technique that sets the gray value of each pixel to the median value among all pixels in a selected neighborhood. This technique can protect the edges in an image such that they are not blurred when the noise is filtered out. The mathematical expression of the median filtering process is as follows:

$$g(x, y) = \mathrm{median}\{\, f(x - k,\, y - l),\ (k, l) \in W \,\}$$

where $f(x, y)$ and $g(x, y)$ are the original and the processed images, respectively, and $W$ is a two-dimensional template. This template usually has square dimensions of 3 × 3 or 5 × 5; alternatively, it can be a different shape, such as a line, circle, or cross. Figure 8a illustrates the effect of median filtering.
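
The median filtering step can be sketched in a few lines. This is an illustrative pure-Python version with a square template, not the implementation used in the experiments; the border handling (border pixels copied unchanged) and the function name `median_filter` are assumptions.

```python
from statistics import median


def median_filter(img, k=3):
    """Apply a k x k median filter to a 2D list of gray values.

    Border pixels, where the template does not fit, are copied unchanged.
    """
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            # Collect every gray value covered by the template W.
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = median(window)
    return out


# A single bright outlier (salt noise) in an otherwise flat region.
noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
clean = median_filter(noisy)
```

The outlier is replaced by the neighborhood median, while a step edge spanning the template would keep its position, which is why the filter preserves edges better than box averaging.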
For edge detection, the Canny operator is well known as an adaptable and efficient operator [21]. Hence, it is used in this paper to detect the edges of the guide petals. The Canny edge detection algorithm includes four calculation steps: first, Gaussian smoothing is performed; next, the gradient magnitude and direction are calculated from the first-order partial derivatives; then, non-maximum suppression is applied; and finally, the edges are connected using a dual threshold. In this way, we obtain the edges of the six guide petals, as shown in Figure 8b.
Binary images containing the edges of the guide petals are obtained after the application of the Canny algorithm. To recognize the linear edges of the targets, the Hough transform [21] is used. A prominent advantage of this approach is its robustness due to its insensitivity to data inaccuracies and noise. The Hough transform maps the points in an image from Cartesian space into polar coordinates. More specifically, N curves that intersect at the same point in polar space correspond to N points on the same straight line in Cartesian space. Figure 8c shows the line extraction results achieved using the Hough transform.
To resolve the vertices of the guide petals, we first calculate the centerline of each side of each guide petal. In polar coordinates, the line corresponding to one edge of one side of a guide petal is represented by $(\theta_L, r_L)$, and the line corresponding to the other edge is represented by $(\theta_R, r_R)$. Here, $\theta_i$ represents the polar angle, and $r_i$ represents the polar radius $(i = L, R)$. Then, the centerline is represented by $(\theta, r)$:

$$\theta = \frac{\theta_L + \theta_R}{2}, \qquad r = \frac{r_L + r_R}{2}$$

Therefore, in Cartesian space, the corresponding straight line equation is as follows:

$$r = u\cos\theta + v\sin\theta \tag{13}$$

To calculate the intersection between two such centerlines, we assume that the two lines are represented by $(\theta_1, r_1)$ and $(\theta_2, r_2)$. According to Equation (13), we can establish the following linear equations:

$$\begin{bmatrix} \cos\theta_1 & \sin\theta_1 \\ \cos\theta_2 & \sin\theta_2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}$$

where $(u, v)$ denotes the intersection coordinates, representing the vertex of the guide petal, in Cartesian space. Thus, $(u, v)$ can be solved for as follows:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \cos\theta_1 & \sin\theta_1 \\ \cos\theta_2 & \sin\theta_2 \end{bmatrix}^{-1} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}$$

Finally, we obtain the twelve vertices of the six guide petals, as shown in Figure 8d.
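
The vertex computation can be sketched as follows. The sketch assumes that the centerline is the parameter-wise average of two nearly parallel Hough lines and solves the 2 × 2 system by Cramer's rule; the helper names `centerline` and `intersect` and the sample lines are hypothetical.

```python
import math


def centerline(edge_l, edge_r):
    """Average two nearly parallel Hough lines, each given as (theta, r)."""
    return ((edge_l[0] + edge_r[0]) / 2.0, (edge_l[1] + edge_r[1]) / 2.0)


def intersect(line1, line2, eps=1e-9):
    """Intersect two lines r = u*cos(theta) + v*sin(theta) by Cramer's rule."""
    (t1, r1), (t2, r2) = line1, line2
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)
    if abs(det) < eps:
        raise ValueError("lines are parallel; no unique intersection")
    u = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    v = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return u, v


# Two edges of a vertical petal side (u = 98 and u = 102) and
# two edges of a horizontal side (v = 48 and v = 52), in pixels.
side_a = centerline((0.0, 98.0), (0.0, 102.0))
side_b = centerline((math.pi / 2, 48.0), (math.pi / 2, 52.0))
vertex = intersect(side_a, side_b)
print(vertex)
```

The determinant check guards against two centerlines that are parallel within numerical tolerance, a case in which the linear system has no unique solution.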

Pose Determination
The next stage is to resolve the relative pose, which includes feature correspondence, the solution of the perspective-n-point problem, and coordinate conversion.

Solution of the PnP Problem
The solution of the PnP problem is the most important and difficult step of this part. There are many algorithms available to solve the PnP problem, such as P3P, EPnP [22], and UPnP [23]. However, each algorithm has its restrictions. For example, P3P limits the input perspective points to 4, EPnP requires that the perspective points be non-coplanar, and UPnP's calculations are rather complex. To calculate the relative pose efficiently and robustly, an indirect solution using an intermediate coordinate system is proposed. The following derivation determines the relative pose of the docking mechanism coordinate system of the chasing spacecraft (DMCS_CS) with respect to the camera coordinate system (CCS).
After the previous processing, the projections of the six vertices of DMCS_CS in the normalized image plane can be obtained, which are $p_n = (u_n, v_n)$. Given the limitations of the initial docking conditions, at least four and up to six vertices can be obtained during the docking process. That is, $n$ can be 4, 5, or 6.
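To create an intermediate coordinate system $O_{med}X_{med}Y_{med}Z_{med}$, the line segment $P_L P_R$ with the longest projection length is selected, as shown in Figure 9. Then, we use the vector $\overrightarrow{P_L P_R}$ as the rotation axis $X_{med}$, and the origin is at the midpoint of $P_L P_R$. Inspired by a robust solution to the perspective-n-point problem (RPnP) [24], we divide the $n$ vertices into three-point subsets $\{P_L P_R P_k \mid k \neq L,\ k \neq R,\ k \in \{1, \ldots, n\}\}$. The constraint of each subset yields one polynomial of order 4 as follows:

$$\begin{cases} f_1(x) = a_1 x^4 + b_1 x^3 + c_1 x^2 + d_1 x + e_1 = 0 \\ \quad\vdots \\ f_{n-2}(x) = a_{n-2} x^4 + b_{n-2} x^3 + c_{n-2} x^2 + d_{n-2} x + e_{n-2} = 0 \end{cases} \tag{17}$$

By using the least-squares residual, a cost function $F = \sum_{i=1}^{n-2} f_i^2(x)$ is defined as the square sum of Equation (17). The minimum of $F$ can be obtained by solving $F' = \sum_{i=1}^{n-2} f_i(x)\,f_i'(x) = 0$. As soon as $x$ is determined, the vertices in the CCS can be calculated, and $X_{med} = \overrightarrow{P_L P_R} / \|\overrightarrow{P_L P_R}\|$. Then, the rotation matrix from the intermediate coordinate system to the CCS can be expressed as:

$$R = R_0\,\mathrm{rot}(X, \alpha) = \begin{bmatrix} r_1 & r_4 & r_7 \\ r_2 & r_5 & r_8 \\ r_3 & r_6 & r_9 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{bmatrix}$$

where $R_0$ is an arbitrary orthogonal matrix whose first column $[r_1\ r_2\ r_3]^T$ equals the rotation axis $X_{med}$ (the axis must be the first column for $\mathrm{rot}(X, \alpha)$ to describe a rotation about $X_{med}$), and $\mathrm{rot}(X, \alpha)$ denotes a rotation by the angle $\alpha$ around $X_{med}$, with $s = \sin\alpha$ and $c = \cos\alpha$. The projection from the 3D points in the intermediate coordinate system to the 2D normalized image plane can be expressed as follows:

$$\lambda_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = R \begin{bmatrix} X^{med}_i \\ Y^{med}_i \\ Z^{med}_i \end{bmatrix} + t \quad (i = 1, \ldots, n) \tag{19}$$

where $t = [t_x\ t_y\ t_z]^T$ is the translation vector and $\lambda_i$ is the depth of the $i$-th point. Rearranging Equation (19) to eliminate $\lambda_i$, we have:

$$M \begin{bmatrix} c & s & t_x & t_y & t_z & 1 \end{bmatrix}^T = 0$$

where $M$ is a $2n \times 6$ matrix in which the $i$-th point contributes the two rows

$$\begin{bmatrix} u_i (r_6 Y^{med}_i + r_9 Z^{med}_i) - (r_4 Y^{med}_i + r_7 Z^{med}_i) & u_i (r_9 Y^{med}_i - r_6 Z^{med}_i) - (r_7 Y^{med}_i - r_4 Z^{med}_i) & -1 & 0 & u_i & (u_i r_3 - r_1) X^{med}_i \\ v_i (r_6 Y^{med}_i + r_9 Z^{med}_i) - (r_5 Y^{med}_i + r_8 Z^{med}_i) & v_i (r_9 Y^{med}_i - r_6 Z^{med}_i) - (r_8 Y^{med}_i - r_5 Z^{med}_i) & 0 & -1 & v_i & (v_i r_3 - r_2) X^{med}_i \end{bmatrix}$$

The unknown variable vector $[c\ s\ t_x\ t_y\ t_z\ 1]^T$ can be retrieved by solving the linear equation system using singular value decomposition. That is, the rotation matrix and the translation vector from the intermediate coordinate system to the CCS can be obtained.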
After the intermediate coordinate system is determined, we can easily obtain the rotation matrix and the translation vector from $O_{CS}$ to $O_{med}$. Then, using homogeneous transformation, ${}^{CS}T_{C}$, which comprises the rotation matrix and the translation vector from $O_{CS}$ to $O_C$, can be obtained.

Coordinate Conversion
Here, we have obtained the rotation matrices and the translation vectors from DMCS_CS and DMCS_TS to CCS, i.e., ${}^{CS}T_{C}$ and ${}^{TS}T_{C}$, respectively. Thus, the relative pose between the two docking mechanisms is:

$${}^{CS}T_{TS} = {}^{CS}T_{C}\,\left({}^{TS}T_{C}\right)^{-1}$$
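
The conversion can be sketched with plain 4 × 4 matrices. The sketch follows the chain of Equation (2), taking the transform from the CCS to DMCS_TS as the inverse of the transform from DMCS_TS to the CCS, and uses the closed-form inverse of a rigid transform. The helper names and the pure-translation sample poses are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4 x 4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t):
    """Invert a homogeneous transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]],
            [0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Homogeneous transform with identity rotation (illustration only)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Hypothetical poses in millimetres; real poses also contain rotations.
cs_t_c = translation(0.0, 0.0, 300.0)
ts_t_c = translation(5.0, -2.0, -400.0)

# Relative pose between the two docking mechanisms.
cs_t_ts = matmul(cs_t_c, rigid_inverse(ts_t_c))
print([row[3] for row in cs_t_ts[:3]])
```

Using the closed-form rigid inverse avoids a general matrix inversion and keeps the rotation block exactly orthogonal whenever the input rotation is.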

Ground-Based Semi-Physical Simulation Experiments
To verify the proposed method, semi-physical simulation experiments are presented in this section. All experiments were performed with the same semi-physical simulation platform. This platform is mainly composed of an active docking mechanism (Stewart platform), a passive docking mechanism, a monocular vision system, a data processing and control cabinet, a human-machine interface (HMI), a Leica T-Mac (TMC30-B), and a Leica Absolute Tracker (AT960) (as shown in Figure 10). The active docking mechanism at the bottom of the frame is used to simulate the chasing spacecraft, and the passive docking mechanism at the top of the frame is used to simulate the target spacecraft. The structure of the monocular vision system and its installation relationship with the docking mechanism were previously described. The cabinet realizes the motion control of the docking mechanism and the data processing for the vision system. The T-Mac (TMC30-B) and the Absolute Tracker (AT960) are laser measuring devices manufactured by Leica Geosystems. The combination of the T-Mac and the Absolute Tracker enables the measurement of the six DOFs between the docking mechanisms. The corresponding measurement uncertainties are shown in Table 2.

Table 2. T-Mac uncertainty.

| T-Mac | Uncertainty |
| --- | --- |
| Accuracy of rotation angles | 0.01° = 18 µm/100 mm (0.002″/ft) |
| Accuracy of time stamps | ±5 ms |
| Positional accuracy (for one single coordinate, X, Y or Z) | ±15 µm + 6 µm/m (±0.0006″ + 0.00007″/ft) |

The purpose of the experiments was to verify the proposed method under close-proximity conditions. The docking mechanisms of the platform are approximately half the size of an actual docking mechanism. Therefore, for docking in close proximity, we assume that the distance between the docking mechanisms is less than 0.1 m. Before the experiments, we fixed the passive docking mechanism and measured its relative pose in the coordinate system of the laser tracker. Then, we fixed the active docking mechanism and measured its pose. During the experiments, the passive docking mechanism remained fixed, and the active docking mechanism was controlled to move through eight groups of specific poses. Each group consisted of 243 ($243 = 3^5$) poses defined by selected combinations of X, Y, Z, Rx, Ry, and Rz, each of which took three different values, as shown in Table 3. These values were not precise relative poses but rather served as input data to control the active docking mechanism. After the active docking mechanism moved to each of these poses, the monocular vision system captured images, calculated the relative pose, and saved the data. At the same time, the laser tracker measured and saved the T-Mac pose. We take the result measured by the laser tracker as the true value of the relative pose between the docking mechanisms. Accordingly, the difference between the value measured by the monocular vision system and this true value is the measurement error of the monocular vision system. To illustrate the process, six typical cases are shown in the Supplementary Materials.

As shown in Figure 11, the measurement errors are E_X ∈ [−2.1, 3…) (mm/°). Measurement results marked with a "+" symbol indicate high noise in the measurement process, but this does not mean that the data are invalid.
It is important to note that the data of each group are normally distributed due to the existence of Gaussian noise. Thus, in most cases, the relative position is accurate at the millimeter scale and the relative attitude at the scale of one-tenth of a degree. The measurement errors observed in these experiments are smaller than those of existing measurement systems, such as PXS. Moreover, the measurement frequency is approximately 13.5 Hz; that is, the measurement time is 85% less than that of PXS (2 Hz). These results show that the proposed method is feasible and efficient.
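The two quantities used in this evaluation can be written out as a short sketch: the per-axis measurement error (vision result minus laser-tracker reference) and the reduction in time per measurement implied by the two measurement rates quoted above. The function names are hypothetical, and no measured pose data are reproduced here.

```python
def measurement_error(vision_pose, tracker_pose):
    """Per-DOF error: vision measurement minus the laser-tracker reference value."""
    return [v - r for v, r in zip(vision_pose, tracker_pose)]


def time_reduction(rate_hz, reference_rate_hz):
    """Fractional reduction in time per measurement relative to a reference rate.

    Time per measurement is 1/rate, so the reduction is
    1 - (1/rate) / (1/reference_rate) = 1 - reference_rate / rate.
    """
    return 1.0 - reference_rate_hz / rate_hz


# Rates quoted in the text: about 13.5 Hz for the proposed method, 2 Hz for PXS.
print(f"{time_reduction(13.5, 2.0):.1%}")  # prints 85.2%
```

At 13.5 Hz one measurement takes about 74 ms, compared with 500 ms at 2 Hz, which is consistent with the stated reduction of roughly 85%.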


Conclusions
This paper discusses the influence of relative pose determination for low impact docking in close proximity and analyzes the advantages and the disadvantages of various methods. A new pose determination method based on monocular vision is proposed after a theoretical consideration of the measurement uncertainty. The main contributions of this work can be summarized as follows:
• This paper proposes a unified framework for determining the relative pose between two docking mechanisms, which reduces the dependence on artificial beacons or on the specific shape of the target spacecraft, as well as the introduction of manufacturing and assembly errors. Therefore, the method can be widely applied to low impact docking.
• The fusion of pose information and the optimization of the PnP problem solution greatly improve the real-time performance and the robustness of pose determination.
• The experiments verified that the method can be used to determine the relative pose between two docking mechanisms in close proximity for low impact docking. Moreover, the measurement accuracy and the speed of the proposed method are superior to those of the PXS. The position measurement error is within 3.7 mm, the rotation error around the docking direction is less than 0.16°, and the measurement time is reduced by 85%.
In the future, the improvement and the optimization of the hardware and software will be investigated by, for example, increasing the measurement speed using a graphics processing unit (GPU) and parallel computing. In particular, the cellular neural network [25] can be used for parallel processing of the six ROIs, which will greatly improve the efficiency of image processing.

Conflicts of Interest:
The authors declare no conflict of interest.