Remote Marker-Based Tracking for UAV Landing Using Visible-Light Camera Sensor

Unmanned aerial vehicles (UAVs), which are commonly known as drones, have proved to be useful not only on the battlefields where manned flight is considered too risky or difficult, but also in everyday life purposes such as surveillance, monitoring, rescue, unmanned cargo, aerial video, and photography. More advanced drones make use of global positioning system (GPS) receivers during the navigation and control loop which allows for smart GPS features of drone navigation. However, there are problems if the drones operate in heterogeneous areas with no GPS signal, so it is important to perform research into the development of UAVs with autonomous navigation and landing guidance using computer vision. In this research, we determined how to safely land a drone in the absence of GPS signals using our remote maker-based tracking algorithm based on the visible light camera sensor. The proposed method uses a unique marker designed as a tracking target during landing procedures. Experimental results show that our method significantly outperforms state-of-the-art object trackers in terms of both accuracy and processing time, and we perform test on an embedded system in various environments.


Introduction
The global market for unmanned aerial vehicles (UAVs) has grown dramatically in recent years along with rapid development of new applications [1]. UAVs haves been widely adopted in robotics research because they can extend human's capabilities in a variety of areas especially for military application such as search-and-rescue and surveillance, as well as applications such as transportation, artistic photography and video. Typical UAVs are controlled by humans, and experience is required as the UAV still haves low control accuracy. The mission path of autonomous UAVs requires them to fly at low speed while following a path or to track an object of interest, and to perform a series of actions. Nevertheless, in order to address a broader range of applications, one has to migrate to integrated processing add-ons that would carry out on demand, on-board, collaborative or autonomous functions, with the aim of realizing an intelligent UAV functionality. With the increase in the computational potential of UAVs, previous studies do not only address computationally demanding tasks, but have adopted the UAVs into autonomous system that can take control of its own flight and perform optimized missions.
In addition, they can be used to identifying and rescue potential victims [20]. Recently, Amazon, which is a giant E-commerce company successfully developed a delivery system designed to deliver packages to customers within 30 min using a UAV service called Amazon Prime Air [21]. It uses an Amazon-branded landing mat as a homing-beacon for the drone to land and deposit its payload. In general, however global positioning system (GPS) signal is usually owing to obstruction from tall building in urban environment.
Inspired by the amazing work from Amazon and considering the scenarios that result in GPS signal being lost, we propose a remote marker-based tracking algorithm for UAV landing procedure for various time and condition (during morning, afternoon, evening, and night time) without the need for any support from GPS signal. Our algorithm has two basic requirements, which are implemented on an onboard system: (1) real-time: the tracking algorithm can be processed on the onboard system at the real-time speed; and (2) accuracy: in order to successfully land on the landing pad, the tracking algorithm should track the marker precisely even in the presence of the aforementioned challenging factors.
The outline of the paper is organized as follows: In Section 2, we present recent works related to vision-based autonomous landing for drones. In Section 3, we introduce the proposed visual marker tracking algorithm. Then in Section 4, we discuss the performance evaluations in various conditions during a given day and we make comparisons with popular state-of-the-art visual trackers. Finally, we conclude the paper in Section 5.

Related Works
Previous studies of the vision-based landing for UAVs are classified into two categories, namely passive and active methods. Previous researches on passive methods utilized the camera sensors distributed on the ground and the complicated set up for both UAV or ground environments is necessary. Martínez et al. designed a trinocular system, which is composed of three FireWire cameras fixed on the ground, to estimate the vehicle's position and orientation by tracking color land markers on the UAV [22]. Kong et al. deployed a custom-built infrared stereo camera with large field of view and claimed that their system could resist all weather conditions [23]. In his research, he successfully implemented several algorithms to track the UAV during landing operation. However, the accuracy is still low in case of fixed-wing touchdown points and high temperature objects in the background. Recent work from Yang et al. [24] shows the promising results on UAV auto landing in GPS-denied environment using a ground-based near infrared (NIR) camera system. Using an infrared laser lamp mounted at the nose of the UAV, their system achieved high landing accuracy with the distance over 1000 m. There is no denial that these passive methods achieved impressive results, but in some circumstances such as urban or low altitude operations, the issue for setting up a ground station with complicated equipment should be considered to be solved.
With respect to solving the landing problem in a much more convenient and less complicated way, active methods are considered, which use on-board camera mounted on the UAV to detect interested region or fiducial marker on the ground for accurate landing. They can be classified into marker-less and marker-based approaches. With respect to the former category, Anitha et al. [25] proposed a simple algorithm to estimate the relative position of the UAV and a runway in order to perform automatic landing based on the images captured by camera. While this algorithm performs well in the daytime, guiding lamps should be attached on both sides of the runway, making it difficult for use in various places. Using a single onboard camera, Li presents a two-stage processing procedure to find and evaluate all possible landing areas in order to select the best one for landing. To do so, they use machine learning algorithm based on naive Bayesian classifier [26]. However, this study did not perform the experiments using multiple images captured in various places and times, and the performance will be affected by the kinds of image.
To consider the limitation of the marker-less approach, several marker-based methods have been researched. Taking advantages of features pertaining to markers, recent studies on precise UAV Sensors 2017, 17,1987 3 of 38 landing have achieved improved landing performance. The marker center and direction are predicted for each input image in order to guide the drone to land at the marker's center position with the correct orientation.
Sharp et al. used a visible light camera to capture an input image, and a square marker with white border and black background including a few smaller white squares inside, to simplify its segmentation from the background. They used corner detection and correspondence matching [27]. The circle patterned-marker has been used, and it could be identified from various heights by applying a fixed threshold to the input image followed by the contour detection to detect the concentric rings inside [28]. An improved version of speeded up robust features (SURF) was proposed by Zhao et al. [29] to resolve the inefficiency of the SURF algorithm in the autonomous landing system. Recently, many studies [30,31] used AprilTag [32] as a landing target owning to its high-contrast, and two-dimensional (2D) tags are designed to be robust to low image resolution, occlusions, rotations and lighting variation. Kyristsis et al. [33] used AprilTags C++ Library [34] along with the OpenCV4Tegra framework [35], which allows the performance of all OpenCV functions in parallel as graphics processing unit (GPU) functions and finally achieved the detection rate of 26-31 fps with the help of the global navigation satellite system (GNSS). The hardware that they used was quite powerful, and they employed a DJI Matrice UAV [36] along with an NVIDIA Tegra K1 SOC embedded processor [37]. These studies could successfully track the marker when the marker was clearly visible in the input image captured at daytime, but they would fail if the images become too dark at nighttime. To consider this issue, other studies show that infrared radiation images captured from a thermal imager may increase the rate of identification for the nighttime scenario [38,39]. The target emits far infrared (FIR) light actively in order to overcome the problem involving the incorrect detection of targets under low-light conditions. Using a letter-based marker, they can easily detect feature points so that the drone can perform translation or rotation movements in order to perform safe landing at the desired location. However, in these studies, expensive thermal cameras are required [38,39], and this is not possible in conventional drone system including only visible-light camera.
In our research, we focus on tracking the marker using one visible-light camera in a real-time manner on a common onboard system having a low processing power during both daytime and nighttime. Therefore, it is difficult to use high-computation algorithm [40][41][42][43] or features detection and description [44][45][46][47]. To do that, we propose two different tracking strategies to precisely land a UAV using a robust tracking algorithm regardless of whether the marker is visible during the day-time or not visible during the nighttime. Our research was novel compared to previous work in the following three ways: (1) Our algorithm can track and detect a marker not only when the marker is visible in the morning, afternoon, and evening, but also when the marker is hardly visible at nighttime. (2) Using images captured from a single visible-light camera with an onboard system that has low processing power, our algorithm outperforms previous state-of-the-art object trackers in terms of both accuracy and processing speed. (3) Our marker design is simple and unique compared to those used by other marker-based tracking algorithm. Our database was self-constructed using a visible-light camera mounted on the DJI Phantom 4 drone [36] at various time during the day, and this database was made public so that other researchers can compare and evaluate its performance.
A comparison of previous tracking algorithms employed for autonomous drone landing with our proposed method is summarized in Table 1. A NIR laser lamp is fixed on the nose of the UAV for easy detection.
A wide baseline camera array-based method was proposed to achieve high precision for calibration and localization results.
It is not practical to be used in the narrow landing area. Complicated set-up of two camera array systems on the ground is required.

Without marker Visible light camera
Daytime and nighttime with guiding lamp [25] Image binarization and Hough transform used to extract the runway border line for the landing in flight-gear flight simulator.
A simple algorithm is used for border-line detection of runway.
Guiding lamps are required at nighttime, which make it difficult to be used in various places.
Day time [26] Two-stage processing procedure to find all possible landing areas and select the best one using the naive Bayesian classifier.
Without the marker, drone can find the landing site in emergency case.
Experiments were not performed in various places and time. In addition, system is evaluated on a Mac Pro laptop computer.  [38,39] Using letter-based marker emitting FIR light, feature points can be extracted from the marker so that the drone can perform translation or rotation movements in order to perform safety landing at the desired location.
Using the thermal image, marker detection can be less affected by illumination, time, and environmental change.
A costly thermal camera should be used in drone, and this cannot be used in conventional drone systems, including the use of only visible-light camera.
Marker detection is possible using conventional visible light camera in drone.
Marker is detected only during daytime. High computing algorithm cannot be processed in real-time with on-board system [29] Daytime, and nighttime (proposed method) Using real-time marker-based tracking algorithm tested on an onboard system having low processing power.
Marker detection method can be operated both in daytime and nighttime.
A specific marker is required for the proposed method.

Overview of the Proposed Method
As mentioned in Section 1, our work focuses on supporting the safe landing of drone in sophisticated environments where the GPS signal is not available to use. Assume the scenario where the drone reaches the target destination with guidance from the GPS system, after which it begins to land to deliver its cargos or packages (delivery mission). Our objective is to solve the problem for the case when the GPS signal is not available, requiring the drone to depend on visual systems for landing operation. Our research does not deal with actual control or guidance of the drone while landing but we focus on implementing a high-speed vision-based tracking approach, enabling accurate landing at a desired location. In addition, using hardware drone stabilizer, the roll and pitch of the drone in Figure 1 can be maintained during the landing operation of drone. Therefore, we detect three parameters of translations for the X-and Y-axes (Xd and Yd of Figure 1), and the yaw rotation of Figure  1, from which the drone can land at the correct position of the marker with the correct direction. The yaw (direction) estimation is required because the change of yaw rotation of the drone causes changes in the X-and Y-axes, which causes the incorrect translations of the drone. As shown in Figure 1a In our research, we are not only trying to land the drone accurately during the daytime but also at nighttime. We used the information from the system clock of the embedded system and the image brightness level captured by drone camera to determine whether the operation is at day or night so that our algorithm would arbitrary select different tracking algorithms in the daytime (Section 3.2) and nighttime (Section 3.3).

Proposed Marker Design
During the daytime, the marker is visible and if we can find 2D coordinates of the center of the marker, we can send a command to the drone to move closer to the center of the desired target. Our goal is to overcome the disadvantages of the commonly used patterns and create a target pattern that can be uniquely identified irrespective of the vertical distance Z from the drone to ground. In addition, the target pattern should be identified even when parts of the target are not visible. Moreover, the design pattern should be simple enough to be easily identified by vision algorithm at a high frame rate. Considering these requirements, our designed marker consists of three inner In our research, we are not only trying to land the drone accurately during the daytime but also at nighttime. We used the information from the system clock of the embedded system and the image brightness level captured by drone camera to determine whether the operation is at day or night so that our algorithm would arbitrary select different tracking algorithms in the daytime (Section 3.2) and nighttime (Section 3.3).

Proposed Marker Design
During the daytime, the marker is visible and if we can find 2D coordinates of the center of the marker, we can send a command to the drone to move closer to the center of the desired target. Our goal is to overcome the disadvantages of the commonly used patterns and create a target pattern that can be uniquely identified irrespective of the vertical distance Z from the drone to ground. In addition, the target pattern should be identified even when parts of the target are not visible.
Moreover, the design pattern should be simple enough to be easily identified by vision algorithm at a high frame rate. Considering these requirements, our designed marker consists of three inner circles, each of which is evenly divided by 8 areas as shown in Figure 2. The width × height of our marker is 1 × 1 m. In order to make our marker unique and simple, we utilize an even distribution of black and white color in the area between the inner circles.
For example, in the area between the center of the marker and the smallest circle, there are seven white areas and one black area among the eight fan-shaped areas. The reason for putting one black area is that the direction of the drone should be maintained so as to move the drone based its axis, illustrated in Figure 1. The marker is printed on conventional fabric (polyester fabric) by MUTOH printer using water color ink [48]. The pixel contrast is 255 (0 for black area and 255 white region) in our marker image. Other detail specification including the reflectance factors of the paints is not open to the public because the commercial printer with ink was used for printing.
Sensors 2017, 17,1987 7 of 38 circles, each of which is evenly divided by 8 areas as shown in Figure 2. The width × height of our marker is 1 × 1 m. In order to make our marker unique and simple, we utilize an even distribution of black and white color in the area between the inner circles. For example, in the area between the center of the marker and the smallest circle, there are seven white areas and one black area among the eight fan-shaped areas. The reason for putting one black area is that the direction of the drone should be maintained so as to move the drone based its axis, illustrated in Figure 1. The marker is printed on conventional fabric (polyester fabric) by MUTOH printer using water color ink [48]. The pixel contrast is 255 (0 for black area and 255 white region) in our marker image. Other detail specification including the reflectance factors of the paints is not open to the public because the commercial printer with ink was used for printing.  Figure 3 shows the overall flowchart of the proposed marker-based tracking algorithm during the daytime. Using the 1st image (captured by the camera) whose width and height are Wo and Ho, respectively, we reduce the size of the image by the factor of four (W = Wo/4, H = Ho/4) for faster processing in the next step of the template-matching process. There are several types of template matching, and we choose the correlation coefficient-based method because of its high matching accuracy [49]. Given T as the template image and I as a part of the input image, we obtained the normalized ones from both of them as T' and I', and we used them for the correlation coefficientbased matching score (M) as follows. .
In our research, our method is applied at the moment that GPS signal is lost. At this time, because the height of drone can be obtained based on the GPS signal, and the camera focal length with the marker dimension are known in advance, we can calculate the size of template. After that, because there is no additional information of height of drone including GPS signal, our method uses the same sized template. Using the template matching strategy, we obtained the center position of our marker, and we created a new template with the same size (w × h) as the original template. From the 2nd input image, we do not perform template matching for the whole image but in a small region of interest (ROI), The size of the ROI is (w + m) × (h + m) where m is a margin that we empirically considered  Figure 3 shows the overall flowchart of the proposed marker-based tracking algorithm during the daytime. Using the 1st image (captured by the camera) whose width and height are W o and H o , respectively, we reduce the size of the image by the factor of four (W = W o /4, H = H o /4) for faster processing in the next step of the template-matching process. There are several types of template matching, and we choose the correlation coefficient-based method because of its high matching accuracy [49]. Given T as the template image and I as a part of the input image, we obtained the normalized ones from both of them as T' and I', and we used them for the correlation coefficient-based matching score (M) as follows.
In our research, our method is applied at the moment that GPS signal is lost. At this time, because the height of drone can be obtained based on the GPS signal, and the camera focal length with the marker dimension are known in advance, we can calculate the size of template. After that, because there is no additional information of height of drone including GPS signal, our method uses the same sized template. Using the template matching strategy, we obtained the center position of our marker, and we created a new template with the same size (w × h) as the original template. From the 2nd input image, we do not perform template matching for the whole image but in a small region of interest (ROI), The size of the ROI is (w + m) × (h + m) where m is a margin that we empirically considered large enough to ensure that the marker appears inside the selected ROI. We called this algorithm the adaptive template matching (ATM) algorithm because in every frame, the template image is updated, after which the change of the input image due to the drone landing and movement can be covered. However, there is drifting phenomenon of the detected center owning by the ATM algorithm based on the ground-truth center in each frame. Many previous studies [50][51][52][53][54][55] used template matching which has shown a large drifting phenomenon that significantly affects the tracking performance over long term. The reason for this is that during landing as the marker becomes bigger, its appearance changes rapidly, so the new (updated) template cannot keep up and therefore, in the long run, the tracking results deteriorate.
Sensors 2017, 17,1987 8 of 38 large enough to ensure that the marker appears inside the selected ROI. We called this algorithm the adaptive template matching (ATM) algorithm because in every frame, the template image is updated, after which the change of the input image due to the drone landing and movement can be covered. However, there is drifting phenomenon of the detected center owning by the ATM algorithm based on the ground-truth center in each frame. Many previous studies [50][51][52][53][54][55] used template matching which has shown a large drifting phenomenon that significantly affects the tracking performance over long term. The reason for this is that during landing as the marker becomes bigger, its appearance changes rapidly, so the new (updated) template cannot keep up and therefore, in the long run, the tracking results deteriorate. In order to overcome this problem, we perform the profile-checker algorithm as shown in Figure  3. In our design of the marker of Figure 2, a black area is always between two white areas (in the circular direction) as shown in Figure 4. Therefore, based on the center of the marker, we can draw a circle that will contains 14 circular segments as shown in Figures 4 and 5. We refer this circle as the profile and all of its segments as sub-profiles. Therefore, using the center detected by the ATM algorithm, as the angle θ increases from 0° to 360° as shown in Figures 4 and 5, we obtain the values of all pixels along the circle (profile) in the counter-clockwise direction. With these values, we apply a threshold to create a profile with seven black and seven white sub-profiles ( Figure 5). The threshold is defined as the mean value of the maximum and minimum value of all the obtained values. The value that is larger than the threshold is determined as 1; otherwise it is set to 0. From Figure 5, we observe that all black sub-profiles have a similar width while there is a white sub-profile that is wider than other white profiles.
Therefore, our proposed profile checker algorithm can detect the positions of P1, …, P14. In addition, we can detect M1, …, M4. Here, M1 is the midpoint between P4 and P5 as shown in Figure 6, and as in this method, we can also obtain M2, …, M4. Because the width of sub-profile P2-P3 is wider than those of others, we can differentiate this sub-profile from others. From the two points of and , we obtained the line . In addition, from those of and , we obtained the line as shown in Figure 6. The intersect of these two lines is determined to be the marker center using the profile checker algorithm. In order to overcome this problem, we perform the profile-checker algorithm as shown in Figure 3. In our design of the marker of Figure 2, a black area is always between two white areas (in the circular direction) as shown in Figure 4. Therefore, based on the center of the marker, we can draw a circle that will contains 14 circular segments as shown in Figures 4 and 5. We refer this circle as the profile and all of its segments as sub-profiles. Therefore, using the center detected by the ATM algorithm, as the angle θ increases from 0 • to 360 • as shown in Figures 4 and 5, we obtain the values of all pixels along the circle (profile) in the counter-clockwise direction. With these values, we apply a threshold to create a profile with seven black and seven white sub-profiles ( Figure 5). The threshold is defined as the mean value of the maximum and minimum value of all the obtained values. The value that is larger than the threshold is determined as 1; otherwise it is set to 0. From Figure 5, we observe that all black sub-profiles have a similar width while there is a white sub-profile that is wider than other white profiles. Therefore, our proposed profile checker algorithm can detect the positions of P 1 , . . . , P 14 . In addition, we can detect M 1 , . . . , M 4 . Here, M 1 is the midpoint between P 4 and P 5 as shown in Figure 6, and as in this method, we can also obtain M 2 , . . . , M 4 . Because the width of sub-profile P2-P3 is wider than those of others, we can differentiate this sub-profile from others. From the two points of M 1 and M 2 , we obtained the line M 1 M 2 . In addition, from those of M 3 and M 4 , we obtained the line M 3 M 4 as shown in Figure 6. The intersect of these two lines is determined to be the marker center using the profile checker algorithm.        Our proposed profile checker algorithm can be described in detail as follows.
Step 1: Extract all pixel value in the circle profile with radius r in a vector: Step 2: Find threshold Th: Step 5: If C is equal to 14, the sub-profile that has max( ) k δ is selected. Then, from the four adjacent sub-profiles of this selected sub-profile, two lines are detected (shown in Figure 6), and the intersect point of these two lines is determined as the marker center by the profile checker algorithm. Then, the direction is estimated based on the average position (between the starting and ending index of the selected sub-profile) and this detected center.
Step 6: If C is not equal to 14, we increase the radius of the circle profile of   Our proposed profile checker algorithm can be described in detail as follows.
Step 1: Extract all pixel value in the circle profile with radius r in a vector: Step 2: Find threshold Th: Step 3: Obtain the binarized vector Step 4: Count the number of values (C) that are changed compared with previous value (from 0 to 1 or 1 to 0) and create C sub-profiles SP = {α k , β k , γ k , δ k } k= 1, . . . , C , where α k , β k , γ k , δ k are the index of the starting point, index of the ending point, code (0 for black, 1 for white), and the width of the sub-profile (distance between the starting and ending point of the sub-profile), respectively Step 5: If C is equal to 14, the sub-profile that has max(δ k ) is selected. Then, from the four adjacent sub-profiles of this selected sub-profile, two lines are detected (shown in Figure 6), and the intersect point of these two lines is determined as the marker center by the profile checker algorithm. Then, the direction is estimated based on the average position (between the starting and ending index of the selected sub-profile) and this detected center.
Step 6: If C is not equal to 14, we increase the radius of the circle profile of R = r o ± i.∆r (where ∆r = w 8 , r o = w 4 , and w is the width of template) as shown in Figure 7, and steps 1~5 are repeated until R < w 2 (or C is equal to 14). If R ≥ w 2 , we use the detection result obtained by ATM as the marker center.

Updating the Detected Center of Marker by Kalman Filtering
Using our proposed profile checker algorithm, we find the accurate center position and direction of the marker. However, the detected center position can still can vibrate based on the ground-truth center by the rapid change in the input image, and to solve this problem, we use Kalman filtering. Kalman filtering is a framework for predicting a process's state and the use of new measurements to correct or update these predictions. For each time step k, a Kalman filter first makes a prediction ˆk x − of the state at this time step: where is a vector that represent the processing time at state k − 1 and A is the process transition matrix.
is a control vector at time step k and B converts the control vector into state space [56].
In our model of moving marker on 2D camera images, state is a 4 dimensional-vector [ ] , , , T x y dx dy where x and y represent the X-and Y-coordinates of the marker's center, respectively. dx and dy represent its velocity on X-and Y-axes, respectively. For simplicity, we choose to use the following transition matrix: Our UAV is able to make yaw rotation based on marker direction's prediction so that is just a scalar representing how much the object is expected to move along the X-and Y-axes in response to control. Converting into state space is simple by using the B vector of Equation (4).

Updating the Detected Center of Marker by Kalman Filtering
Using our proposed profile checker algorithm, we find the accurate center position and direction of the marker. However, the detected center position can still can vibrate based on the ground-truth center by the rapid change in the input image, and to solve this problem, we use Kalman filtering. Kalman filtering is a framework for predicting a process's state and the use of new measurements to correct or update these predictions. For each time step k, a Kalman filter first makes a predictionx − k of the state at this time step: where x k−1 is a vector that represent the processing time at state k − 1 and A is the process transition matrix. u k is a control vector at time step k and B converts the control vector u k into state space [56]. In our model of moving marker on 2D camera images, state is a 4 dimensional-vector [x, y, dx, dy] T where x and y represent the X-and Y-coordinates of the marker's center, respectively. dx and dy represent its velocity on X-and Y-axes, respectively. For simplicity, we choose to use the following transition matrix: Our UAV is able to make yaw rotation based on marker direction's prediction so that u k is just a scalar representing how much the object is expected to move along the X-and Y-axes in response to control. Converting u k into state space is simple by using the B vector of Equation (4). The Kalman filter concludes the time update steps by projecting the estimated error covariance P − k forward by one-time step: where P k−1 is a matrix that represent the error covariance having full diagonals in the state prediction at time k, and Q is the process noise covariance. In our research, we choose a fixed process noise covariance as below: After the state x − k is predicted at time k, the Kalman filter uses a new measurement to correct the prediction during the subsequent update steps. At the 1st input image, we initialize the Kalman filter and use the center detected by the profile checker algorithm as the measurements for the next step. Initially, we compute the Kalman gain and it is later used to correct the state estimate x − k : where H is the matrix that convert the state space into measurement space and R k is the measurement noise covariance. In our case, we use a fixed R k for all future time updates: where e is Euler's number. Using the Kalman gain K k and measurement z k from time step k, we can update the new estimate:x In our approach, measurements z k are the output from our proposed marker tracking algorithm so that z k contains two dimensions and has the form [x 0 , y 0 ] T . As a result, H has the form: The final step of the Kalman filter at each iteration is to update the error covariance P − k into P k : However, in the case when the drone gets very closer to the marker like, as in Figure 7, even though we tried to increase the radius of the circle many times, the number of obtained sub-profiles is larger than 14 at the largest blue circle. In this case, we choose the center detected by ATM as the new measurement (dark blue point) for the Kalman filter. Using the final center detected by Kalman filtering (green point), we again perform the proposed profile checker algorithm and selected the sub-profile that has the largest width to find its bisecting angle as the final direction of the marker.

Marker-Based Tracking Algorithm (Night Time)
In our research, we aim to develop an algorithm that works well in any kind of lighting conditions. Our strategy in the daytime is to take advantages of the unique design marker and to track its position as the drone's altitude decreases. However, because the image is too dark at nighttime as shown in Figure 9a, the use of the same approach leads to errors in the marker detection because the ATM algorithm performs poorly when there is a very small difference between the marker and non-marker areas. As explained in Section 3.1, we use the information obtained from the system clock of the embedded system and the image brightness level captured by drone camera to determine whether the operation is during the daytime or nighttime. Therefore, our algorithm would arbitrary select different tracking algorithms base on the time of the day. In detail, our algorithm does not depend on only the time (the system clock of drone), but refers to both the time and image brightness level captured by the drone camera. The reason why our method does not depend on only the image bright level is that it can be affected by the brightness of background such as bright or dark ground. Therefore, our algorithm first checks the time. If the time is at night, our method checks the image brightness level again for the higher credibility of determination of daytime and nighttime. If the image brightness level is lower than threshold, then our method performs the tracking algorithm for nighttime of Figure 8. If either the time or image brightness level is not satisfied with our condition, our method determines that it is daytime and performs the tracking algorithm for day time of Figure 3. Figure 8 shows the overall marker detection procedure at nighttime. At nighttime, image segmentation is performed to roughly estimate the position of the marker from the input image. Of the various image segmentation techniques [57][58][59][60][61], we propose a simple segmentation algorithm to distinguish our marker and the background from the input image. First, we apply adaptive thresholding to the input image. The adaptive threshold is determined based on the brightness histogram of the ROI of the image, as well as the image-binarization algorithm [62]. There is also a need for noise to be removed in the image after thresholding. To do this, we used a morphology technique called the Hit and Miss algorithm [63]. The purpose of the Hit-and-Miss algorithm is to detect certain patterns in an image. A structure element containing 0, 1 or blank is used as a template that slides over the image and the pixel corresponding to the center of the template is set to 1 if the template matches the images or 0 otherwise. After that, a thin pattern of our marker is successfully detected as shown in Figure 9c; next we need to roughly estimate the area that belongs to the marker. To do that, we process the result using the dilation algorithm [63], which helps to increase the boundary to the background, as shown in Figure 9d.
Sensors 2017, 17,1987 13 of 38 because the ATM algorithm performs poorly when there is a very small difference between the marker and non-marker areas. As explained in Section 3.1, we use the information obtained from the system clock of the embedded system and the image brightness level captured by drone camera to determine whether the operation is during the daytime or nighttime. Therefore, our algorithm would arbitrary select different tracking algorithms base on the time of the day. In detail, our algorithm does not depend on only the time (the system clock of drone), but refers to both the time and image brightness level captured by the drone camera. The reason why our method does not depend on only the image bright level is that it can be affected by the brightness of background such as bright or dark ground. Therefore, our algorithm first checks the time. If the time is at night, our method checks the image brightness level again for the higher credibility of determination of daytime and nighttime. If the image brightness level is lower than threshold, then our method performs the tracking algorithm for nighttime of Figure 8. If either the time or image brightness level is not satisfied with our condition, our method determines that it is daytime and performs the tracking algorithm for day time of Figure 3. Figure 8 shows the overall marker detection procedure at nighttime. At nighttime, image segmentation is performed to roughly estimate the position of the marker from the input image. Of the various image segmentation techniques [57][58][59][60][61], we propose a simple segmentation algorithm to distinguish our marker and the background from the input image. First, we apply adaptive thresholding to the input image. The adaptive threshold is determined based on the brightness histogram of the ROI of the image, as well as the image-binarization algorithm [62]. There is also a need for noise to be removed in the image after thresholding. To do this, we used a morphology technique called the Hit and Miss algorithm [63]. The purpose of the Hit-and-Miss algorithm is to detect certain patterns in an image. A structure element containing 0, 1 or blank is used as a template that slides over the image and the pixel corresponding to the center of the template is set to 1 if the template matches the images or 0 otherwise. After that, a thin pattern of our marker is successfully detected as shown in Figure 9c; next we need to roughly estimate the area that belongs to the marker. To do that, we process the result using the dilation algorithm [63], which helps to increase the boundary to the background, as shown in Figure 9d.  The detected marker is displayed as white pixels while black pixels indicate the background area as shown in Figure 9d. We roughly estimate the center of the marker by calculating the geometric center of the white pixels. Based on that, we can obtain the width and height of the. Then, we apply our proposed profile checker algorithm as we try to find the direction available in the marker at nighttime. Using the result image (Figure 10a), we created a profile based on the predicted center. We empirically choose the radius of the circle profile as 0.4 × (width of marker). From the generated profile (Figure 10a), we see that there are only two sub-profiles: one black and one white. From that, we detect the "A" and "B" position in Figure 10b, and the direction is estimated based on the midpoint ("K") of the arc connecting "A" and "B" with the detected center "O" of Figure 10a.  The detected marker is displayed as white pixels while black pixels indicate the background area as shown in Figure 9d. We roughly estimate the center of the marker by calculating the geometric center of the white pixels. Based on that, we can obtain the width and height of the. Then, we apply our proposed profile checker algorithm as we try to find the direction available in the marker at nighttime. Using the result image (Figure 10a), we created a profile based on the predicted center. We empirically choose the radius of the circle profile as 0.4 × (width of marker). From the generated profile (Figure 10a), we see that there are only two sub-profiles: one black and one white. From that, we detect the "A" and "B" position in Figure 10b, and the direction is estimated based on the midpoint ("K") of the arc connecting "A" and "B" with the detected center "O" of Figure 10a. The detected marker is displayed as white pixels while black pixels indicate the background area as shown in Figure 9d. We roughly estimate the center of the marker by calculating the geometric center of the white pixels. Based on that, we can obtain the width and height of the. Then, we apply our proposed profile checker algorithm as we try to find the direction available in the marker at nighttime. Using the result image (Figure 10a), we created a profile based on the predicted center. We empirically choose the radius of the circle profile as 0.4 × (width of marker). From the generated profile (Figure 10a), we see that there are only two sub-profiles: one black and one white. From that, we detect the "A" and "B" position in Figure 10b, and the direction is estimated based on the midpoint ("K") of the arc connecting "A" and "B" with the detected center "O" of Figure 10a.

Experimental Platform and Environments
In our experiments, we used a DJI Phantom 4 quadcopter [36] to capture the video while the drone was landing. It includes a color camera with a 1/2.3-inch-thick complementary metal-oxide-semiconductor (CMOS) sensor, with a 94 • field-of-view (FOV) and an f/2.8 lens. The captured videos are in mpeg-4 (MP4) format with 30 fps, and have a size of 1280 × 720 pixels. For fast processing, in our experiment, the captured color image is converted to a gray one. By averaging the R, G, and B pixel values, we obtained the gray information. The drone's gimbal is adjusted 90 • downward so that during landing, the camera can be facing the ground. We implemented our algorithm on an onboard system that has a 32-bit 800-MHz ARM Cortex-A9 central processing unit (CPU) [64], 512 MB RAM, 1.5 GB flash memory, and a Linux kernel (version 3.12.10). In previous studies [33,65,66], they used a sophisticated tracking algorithm with a high-end embedded system such as an NVIDIA Jetson TK1 developer kit, including an ARM Cortex-A15 CPU (higher than 1 GHz [64]) and GPU [37], or an Intel NUC board with a 3.4 GHz CPU [67]. In particular, in [33,66], parallel processing is possible using a GPU, but our system does not include a GPU, which makes it difficult to utilize parallel processing. We developed our algorithm using an OpenCV library (version 3.1 [68]) and C++ program using Microsoft Visual Studio 2015 [69]. Then, we ported our program on the onboard system on which our system actually operates. Originally, our onboard computer did not support OpenCV library, and we created a custom OpenCV library using the ARM cross compiler tool and CMake software [70]. Then, we transferred it into our onboard system. Our onboard system is shown in Figure 11.

Experimental Platform and Environments
In our experiments, we used a DJI Phantom 4 quadcopter [36] to capture the video while the drone was landing. It includes a color camera with a 1/2.3-inch-thick complementary metal-oxidesemiconductor (CMOS) sensor, with a 94° field-of-view (FOV) and an f/2.8 lens. The captured videos are in mpeg-4 (MP4) format with 30 fps, and have a size of 1280 × 720 pixels. For fast processing, in our experiment, the captured color image is converted to a gray one. By averaging the R, G, and B pixel values, we obtained the gray information. The drone's gimbal is adjusted 90° downward so that during landing, the camera can be facing the ground. We implemented our algorithm on an onboard system that has a 32-bit 800-MHz ARM Cortex-A9 central processing unit (CPU) [64], 512 MB RAM, 1.5 GB flash memory, and a Linux kernel (version 3.12.10). In previous studies [33,65,66], they used a sophisticated tracking algorithm with a high-end embedded system such as an NVIDIA Jetson TK1 developer kit, including an ARM Cortex-A15 CPU (higher than 1 GHz [64]) and GPU [37], or an Intel NUC board with a 3.4 GHz CPU [67]. In particular, in [33,66], parallel processing is possible using a GPU, but our system does not include a GPU, which makes it difficult to utilize parallel processing. We developed our algorithm using an OpenCV library (version 3.1 [68]) and C++ program using Microsoft Visual Studio 2015 [69]. Then, we ported our program on the onboard system on which our system actually operates. Originally, our onboard computer did not support OpenCV library, and we created a custom OpenCV library using the ARM cross compiler tool and CMake software [70]. Then, we transferred it into our onboard system. Our onboard system is shown in Figure 11. There are open databases that are captured by drone cameras, such as the Stanford Drone Dataset [71], Mini-drone Video Dataset [72], and SenseFly Dataset [73]. However, there are no open databases with images acquired while drones perform landing operations. Therefore, we acquired videos to build a new database (Dongguk Drone Camera Database (DDroneC-DB1) [74]) for our method. Our database (shown in Table 2) is divided in two sub databases: drone landing on the marker and drone hovering over the same position while the marker is moving on the ground. For each sub database, we captured four videos at 10 AM, 2 PM, 6 PM, and 10 PM. We acquired videos in varying types of environments (humidity level, wind velocity, temperature, and weather). The marker was visible in the sequences for the morning, afternoon, and evening, but it was barely seen in the night video. We made our DDroneC-DB1 public to other researchers through [74] to enable them to evaluate the performance of their marker-tracking methods using our database. There are open databases that are captured by drone cameras, such as the Stanford Drone Dataset [71], Mini-drone Video Dataset [72], and SenseFly Dataset [73]. However, there are no open databases with images acquired while drones perform landing operations. Therefore, we acquired videos to build a new database (Dongguk Drone Camera Database (DDroneC-DB1) [74]) for our method. Our database (shown in Table 2) is divided in two sub databases: drone landing on the marker and drone hovering over the same position while the marker is moving on the ground. For each sub database, we captured four videos at 10 AM, 2 PM, 6 PM, and 10 PM. We acquired videos in varying types of environments (humidity level, wind velocity, temperature, and weather). The marker was visible in the sequences for the morning, afternoon, and evening, but it was barely seen in the night video. We made our DDroneC-DB1 public to other researchers through [74] to enable them to evaluate the performance of their marker-tracking methods using our database.

Marker Detection Accuracy and Processing Time
Using DDroneC-DB1, we compared the accuracies and processing time of our method with those obtained by the state-of-the-art methods of object tracking such as Multiple Instance Learning (MIL) [40], Tracking-Learning-Detection (TLD) [41], Median Flow [42], and Kernelized Correlation Filter (KCF) [43].
In this work, we calculated the center location error (CLE) and predicted direction error (PDE) of the tracked marker to compare the accuracies as follows: where O E K and O GT K are the estimated and ground truth positions of the marker's center, respectively. D E K and D GT K are the predicted and ground truth direction of the marker, respectively. In Tables 3  and 4, we compared the CLE and the PDE obtained using our method with those obtained using other methods. The previous methods, i.e., MIL, TLD, Median Flow, and KCF do not produce the marker direction, but only the center of the marker. Therefore, we compared the PDE obtained by our method with those obtained by ATM algorithm, and with our method without Kalman filtering as shown in Table 4.
In our research, our goal is to find the marker in the current frame, which we have tracked successfully in previous frames. Conventional state-of-the-art trackers [40][41][42][43] use a bounding box manually provided at the first frame or by other detection algorithms, and they take it as the positive example for the object. Many image patches outside the bounding box are considered as the background. In order to test these trackers with our self-constructed dataset DDroneC-DB1, we perform template matching at the first image to select the initial bounding box which contains the marker and use it as the input of these state-of-the-art trackers. MIL tracker considered a small number of neighborhood locations around the predicted bounding box from previous step as positive examples and group them in a positive bag [40]. The collection of images in the positive bag are not all positive examples. Instead, only one image in the positive bag needs to be a positive example. Even if the current location of the tracked object is not accurate, when samples from the neighborhood of the current location are put in the positive bag, there is a good chance that this bag contains at least one image in which the object is nicely centered. TLD algorithm decomposes the long-term tracking task into three components: tracking, learning and detection [41]. The TLD tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning component estimates detector's errors and updates it to avoid these errors in the future. In the other hands, Median Flow tracks the object in both forward and backward directions in time and measures the discrepancies between these two trajectories [42]. Minimizing forward and backward error enables them to reliably detect tracking failures and select reliable trajectories in video sequences. Recent KCF tracker builds on the idea of MIL tracker [43]. This tracker utilizes the fact that the multiple positive sample used in the MIL tracker have large overlapping regions. This overlapping data leads to some nice mathematical properties that is exploited by this tracker to make tracking faster and more accurate at the same time.
Although the state-of-the-art trackers [40][41][42][43] are designed to track generic objects, in our experiments, we optimized these state-of-the-art trackers for our marker for fair comparison. Although the performance of these trackers is lower than that by our method, the performance degradation of these trackers with the sub-database 2 (drone hovering) is lower than that with the sub-database 1 (drone landing). That is because these trackers are designed to track the object whose size does not change much. However, our method can track the marker even in the case that the size of marker changes drastically. The reason why we used these state-of-the-art trackers for comparisons is that there is no open source of mark tracker which can be used for our marker. The previous algorithms for ArUco, AprilTags, and Alvar cannot be used for our specific marker, but they can be used for their own types of marker. Therefore, we used these state-of-the-art trackers for comparisons.
As shown in Tables 3 and 4, our method outperforms the other methods as well as our method without Kalman filtering ("Ours without KF") in case of the sequences of morning, afternoon, and evening. However, for the night sequence, our method without Kalman filtering shows higher accuracies than our method with Kalman filtering ("Ours with KF"). Because the visibility of the marker is degraded at nighttime as shown in Figure 9a, the detection accuracy of the marker center obtained by our method is degraded compared to those obtained by our method in daytime. This increases the fluctuation of the detected position of the marker center at nighttime, and consequently, the errors of the Kalman filtering also increases. Therefore, our proposed algorithm does not use Kalman filtering at nighttime as shown in Figure 8.  Figure 12a-d shows the comparative graphs of the CLE and PDE obtained by our method compared with previous methods using sub-database 1. Figure 13a-d shows those of the CLE and PDE with sub-database 2. As previously explained, the previous methods, i.e., MIL, TLD, Median Flow, and KCF do not produce the marker direction, but only the center of the marker. Therefore, we compared the PDE obtained by our method with those obtained by ATM, and our method without Kalman filtering ("Ours without KF" in Figures 12 and 13). In addition, in Figures 12 and 13, "Ours with KF" refers to our proposed method, and "MEDIAN" represents the method of Median Flow. As shown in Figure 12a-c, our proposed method outperforms the MIL, TLD, Median Flow, KCF, as well as our method without Kalman filtering in terms of CLE and PDE. In Figure 12d, although our method outperforms MIL, TLD, Median Flow, and KCF, our method without Kalman filtering shows higher accuracies than our method with Kalman filtering. That is because the visibility of the marker is severely degraded at nighttime as shown in Figure 9a, which causes the detection accuracy of the marker center obtained by our method to degrade compared to those obtained by our method in the daytime. This increases the fluctuation of the detected position of the marker center at nighttime, and consequently, the errors of the Kalman filtering also increases.  Figure 14 shows the marker-detection examples obtained by our method and previous methods with sub-database 1. Our experiments conducted at two different heights: 6 m and 10 m in the case of morning, afternoon, and evening. For the night time case, we only test marker detection at the height of 6 m because images captured at the height further than 6 m are too dark and cannot perform any detection. In our experiments, we use the DJI Phantom 4's remote controller in its default setting of Mode 2 (with the left stick controlling the throttle). At the heights of 6 m, we let the drone descend by manually pushing the left stick down until the drone safely lands on the ground and all the wings stop rotating. Meanwhile, at the height of 10 m, we let it descend using Return-to-Home (RTH) function.
As shown in Table 3 and Figure 14, MIL and KCF trackers are the 2nd and 3rd ranked in accuracy, respectively. Both of them have comparable results due to the similarity in the nature of their algorithms. We especially note that the TLD and Median Flow trackers easily lose target even when the marker is still in the FOV of the camera. In addition, our proposed marker tracker is able to locate the target accurately even in the case that part of marker disappears in the FOV of the camera. At the 62nd frame of Figure 14a, both TLD and Median Flow trackers completely lose the marker when some portions of the marker are out of view, i.e., some parts of the marker are not shown in the camera's FOV. The same results can be observed at the 61st frame of Figure 14c and the 60th frame of Figure 14e. In the case of night landing, as shown in Figure 14g, our proposed method maintains a good bounding box prediction which covers the most area of the marker while other trackers fail.
Sensors 2017, 17,1987 22 of 38 Figure 13. Comparisons of CLE and PDE obtained by our method with those obtained by other methods using sub-database 2 including the videos captured (a) in the morning, (b) in the afternoon, (c) in the evening, and (d) at night. Figure 14 shows the marker-detection examples obtained by our method and previous methods with sub-database 1. Our experiments conducted at two different heights: 6 m and 10 m in the case of morning, afternoon, and evening. For the night time case, we only test marker detection at the height of 6 m because images captured at the height further than 6 m are too dark and cannot perform any detection. In our experiments, we use the DJI Phantom 4's remote controller in its default setting of Mode 2 (with the left stick controlling the throttle). At the heights of 6 m, we let the drone descend by manually pushing the left stick down until the drone safely lands on the ground and all the wings stop rotating. Meanwhile, at the height of 10 m, we let it descend using Return-to-Home (RTH) function.
As shown in Table 3 and Figure 14, MIL and KCF trackers are the 2nd and 3rd ranked in accuracy, respectively. Both of them have comparable results due to the similarity in the nature of their algorithms. We especially note that the TLD and Median Flow trackers easily lose target even when the marker is still in the FOV of the camera. In addition, our proposed marker tracker is able to locate the target accurately even in the case that part of marker disappears in the FOV of the camera. At the 62nd frame of Figure 14a, both TLD and Median Flow trackers completely lose the marker when some portions of the marker are out of view, i.e., some parts of the marker are not shown in the camera's FOV. The same results can be observed at the 61st frame of Figure 14c   In the 10 m landing experiments, scale variation and cluttered background are the main challenging factors. We compare marker detection at many different areas and environments. For example, as shown in Figure 14b,d, test images are both captured in a sunny day but at different environments. We purposely put our marker under the shade of a tree with lots of noisy lighting hole in Figure 14b in order to check our proposed method's robustness. Meanwhile, in Figure 14d, our marker is placed at an open area which has a very strong sunlight. In contrast, our marker in Figure 14f looks very blurred at the height of 10 m. Despite having several challenges, our proposed tracker maintains higher accuracy compared to other trackers, followed by MIL, KCF, Median Flow and TLD trackers.
In addition, Figure 15 shows the marker-detection examples by our method and previous methods with sub-database 2 at the height of 10 m. Different from the previous experiments, we tested our algorithm and previous methods to determine the detection results while the marker is translated or rotated on the ground. This type of experiment is similar with other state-of-the-art's general object tracking test. As shown in Figure 15a,c,e, even though we manually translate marker with some little changes of orientation, our tracker has no problem detecting marker center and its direction in successive frames. A much more complex background with marker's high-velocity translation and 360 • rotations are main challenges in Figure 15b,d,f. Instead of slowly moving the marker by hands, we kick and push the marker further away which results in massive movements and rotations. Moreover, in Figure 15d, we design and print two toy markers with a similar design with our proposed marker and put them near our proposed marker in the testing scene. Our proposed tracking algorithm still achieves the best tracking performance and returns no false positive detection. We observe that there is no huge degradation of tracking performance from TLD and Median Flow trackers compared to previous landing test. MIL and KCF trackers successfully follow marker location and have reasonable performance. Overall, our method outperforms previous marker-detection methods, and our method can correctly detect the marker even in complicated backgrounds. In addition, even with the night images, our proposed method can detect both the correct position and direction of the marker, as shown in Figures 14g and 15g. Sensors 2017, 17, 1987 26 of 38 In the 10 m landing experiments, scale variation and cluttered background are the main challenging factors. We compare marker detection at many different areas and environments. For example, as shown in Figure 14b,d, test images are both captured in a sunny day but at different environments. We purposely put our marker under the shade of a tree with lots of noisy lighting hole in Figure 14b in order to check our proposed method's robustness. Meanwhile, in Figure 14d, our marker is placed at an open area which has a very strong sunlight. In contrast, our marker in Figure  14f looks very blurred at the height of 10 m. Despite having several challenges, our proposed tracker maintains higher accuracy compared to other trackers, followed by MIL, KCF, Median Flow and TLD trackers.
In addition, Figure 15 shows the marker-detection examples by our method and previous methods with sub-database 2 at the height of 10 m. Different from the previous experiments, we tested our algorithm and previous methods to determine the detection results while the marker is translated or rotated on the ground. This type of experiment is similar with other state-of-the-art's general object tracking test. As shown in Figure 15a,c,e, even though we manually translate marker with some little changes of orientation, our tracker has no problem detecting marker center and its direction in successive frames. A much more complex background with marker's high-velocity translation and 360° rotations are main challenges in Figure 15b,d,f. Instead of slowly moving the marker by hands, we kick and push the marker further away which results in massive movements and rotations. Moreover, in Figure 15d, we design and print two toy markers with a similar design with our proposed marker and put them near our proposed marker in the testing scene. Our proposed tracking algorithm still achieves the best tracking performance and returns no false positive detection. We observe that there is no huge degradation of tracking performance from TLD and Median Flow trackers compared to previous landing test. MIL and KCF trackers successfully follow marker location and have reasonable performance. Overall, our method outperforms previous marker-detection methods, and our method can correctly detect the marker even in complicated backgrounds. In addition, even with the night images, our proposed method can detect both the correct position and direction of the marker, as shown in Figures 14g and 15g. (a)   Table 5 shows the comparative processing time per image obtained by our method and previous methods. We measured the processing time using the embedded system of Figure 11. As shown in Table 5, the processing speed achieved our method is much faster than those achieved by previous methods, and our method can be operated at a real-time speed of more than 40 (1000/25) fps. Although MIL shows a lower marker detection error than our method with the morning and night videos of sub-database 2, as shown in Table 3, the processing speed of MIL is too slow for use in real-time embedded systems compared to our method as shown in Table 5. Therefore, the effectiveness of our method is higher than previous developed methods.

Pose Estimation Experiments
In the next experiment, we compare the accuracy of full pose estimation by our method with that by fiducial marker tracker of ArUco. In order to compute the full pose of our detected marker with respect to the camera frame, we need to obtain following information: intrinsic parameters of the camera (camera matrix and distortion coefficients), 2D coordinates of a few points in the input image and their 3D locations in the real world.
Before we try to find the pose of our marker which refers to its relative orientation and position with respect to drone camera, we have to perform camera calibration to obtain camera matrix and distortion coefficients vector. As shown in Figure 16, we print out a chessboard pattern image [75] in an A4 paper and take several images with different chessboard's poses using the DJI Phantom 4 camera. Using OpenCV's calibrateCamera function [76], we obtain camera matrix (M) and distortion coefficients (C) shown as below: Our marker pose estimation is carried out through the OpenCV's solvePnP function [76]. The output of this function is the current pose of the camera with respect to the center of the marker. The solvePnP function is based on the pinhole camera model. In this model, each point of view is formed by projecting each image point into the corresponding image plane point using a perspective transformation: r 11 r 12 r 13 t x r 21 r 22 r 23 t y r 31 r 32 where X, Y, Z are coordinates of 3D points P in the world coordinates space and u, v are the coordinates of the projection point p in image plane. The coefficients (c x , c y ) and (f x , f y ) representing respectively the coordinates of the principal point, that is usually at the image center, and the focal lengths expressed in pixel units. We already mentioned how we obtained these coefficients through the calibration procedure of the camera described above. Using the solvePnP function, we can obtain all parameters of rotation matrix (R) and translation vector (t). In order to get three Euler angles (yaw, pitch, roll) from acquired rotation matrix R, we use OpenCV's decomposeProjectionMatrix function [76].
where X, Y, Z are coordinates of 3D points P in the world coordinates space and u, v are the coordinates of the projection point p in image plane. The coefficients (cx, cy) and (fx, fy) representing respectively the coordinates of the principal point, that is usually at the image center, and the focal lengths expressed in pixel units. We already mentioned how we obtained these coefficients through the calibration procedure of the camera described above. Using the solvePnP function, we can obtain all parameters of rotation matrix (R) and translation vector (t). In order to get three Euler angles (yaw, pitch, roll) from acquired rotation matrix R, we use OpenCV's decomposeProjectionMatrix function [76]. In this experiment, we want to show that our proposed profile checker algorithm is not only capable of detecting marker center or its direction but also support computing full pose of the pattern respect to the camera frame and achieve comparable results with another fiducial marker tracker, such as ArUco tracker [77]. Using the same MUTOH printer [48], we print a predefined version of ArUco marker (DICT_6x6_50) which has the same dimension (1-m width and 1-m height) with our proposed marker. Same camera parameters of Equations (13) and (14) were used for our method and ArUco marker-based method for fair comparison.
Because we cannot have ground-truth values of full pose (yaw, pitch, roll, three translations on X-, Y-, and Z-axes) from drone, we measured the accuracy of full pose estimation by comparing the estimated values of full pose by our method with those by ArUco marker-based method. For that, experiments were done as follows. Figure 17 shows how we manually place ArUco marker next to our proposed marker, the distance between two marker centers O1 and O2 is 1.2 m. In detail, the Y-and Z-axes of our marker and ArUco marker are coincident, and the two X-axes of these two markers have only the disparity of 1.2 m. In this case, we let the drone flying freely with lots of X, Y, Z translations and yaw, pitch, roll rotations. Using OpenCV's detectMarkers function [77] which is based on the research [78], we detect 4 corners A, B, C, D of ArUco marker at each image frame and use them as the key points for pose estimation. Applying the same solvePnP function described above, we can also compute pose estimation of ArUco marker respected to camera frame. In this experiment, we want to show that our proposed profile checker algorithm is not only capable of detecting marker center or its direction but also support computing full pose of the pattern respect to the camera frame and achieve comparable results with another fiducial marker tracker, such as ArUco tracker [77]. Using the same MUTOH printer [48], we print a predefined version of ArUco marker (DICT_6x6_50) which has the same dimension (1-m width and 1-m height) with our proposed marker. Same camera parameters of Equations (13) and (14) were used for our method and ArUco marker-based method for fair comparison.
Because we cannot have ground-truth values of full pose (yaw, pitch, roll, three translations on X-, Y-, and Z-axes) from drone, we measured the accuracy of full pose estimation by comparing the estimated values of full pose by our method with those by ArUco marker-based method. For that, experiments were done as follows. Figure 17 shows how we manually place ArUco marker next to our proposed marker, the distance between two marker centers O 1 and O 2 is 1.2 m. In detail, the Y-and Z-axes of our marker and ArUco marker are coincident, and the two X-axes of these two markers have only the disparity of 1.2 m. In this case, we let the drone flying freely with lots of X, Y, Z translations and yaw, pitch, roll rotations. Using OpenCV's detectMarkers function [77] which is based on the research [78], we detect 4 corners A, B, C, D of ArUco marker at each image frame and use them as the key points for pose estimation.
Applying the same solvePnP function described above, we can also compute pose estimation of ArUco marker respected to camera frame.
In our research, we used our own method for detecting four key points with our marker not using OpenCV function. That is, as shown in Figure 4, we select four detected key points P 2 , P 4 , P 9 , and P 13 from our proposed profile checker algorithm as 4 key points for pose estimation. The reason we choose these four points is that even if the drone is getting closer to the marker, these four key points would always be seen by the drone's camera. By conclusion, for our marker, we used our own detection/tracking algorithm for four key points whereas we used their own, freely available detection/tracking code for four key points from ArUco marker [78] for fair comparison. From the four detected key points from our marker and ArUco marker, we used the same OpenCV's solvePnP function [76] in order to obtain the 6 parameters (the 3 translations of X-, Y-, Z-axes, and the 3 rotations of yaw, pitch, roll) for pose estimation. This solvePnP function has been widely used for pose estimation purpose in previous researches (even in [78]), and we used the same solvePnP function for both our marker and ArUco marker for fair comparisons. Table 6 shows all 3D coordinates of selected key points using for both markers.
Sensors 2017, 17,1987 32 of 38 In our research, we used our own method for detecting four key points with our marker not using OpenCV function. That is, as shown in Figure 4, we select four detected key points P2, P4, P9, and P13 from our proposed profile checker algorithm as 4 key points for pose estimation. The reason we choose these four points is that even if the drone is getting closer to the marker, these four key points would always be seen by the drone's camera. By conclusion, for our marker, we used our own detection/tracking algorithm for four key points whereas we used their own, freely available detection/tracking code for four key points from ArUco marker [78] for fair comparison. From the four detected key points from our marker and ArUco marker, we used the same OpenCV's solvePnP function [76] in order to obtain the 6 parameters (the 3 translations of X-, Y-, Z-axes, and the 3 rotations of yaw, pitch, roll) for pose estimation. This solvePnP function has been widely used for pose estimation purpose in previous researches (even in [78]), and we used the same solvePnP function for both our marker and ArUco marker for fair comparisons. Table 6 shows all 3D coordinates of selected key points using for both markers.   Figure 18 shows our pose estimation result of our proposed marker and ArUco marker with 3D coordinates axes (red, green and blue line representing X-, Y-and Z-axes). As shown in Figure 18, using key points detected from our proposed method, we successfully achieve comparable results to ArUco marker's pose estimation. At the 65th, 467th and 796th frames, even though the drone has lot of X, Y, Z translation and rotates a lot in the yaw, pitch and roll, but our marker's predicted coordinates axes are almost the same with ArUco's estimation. As shown from Figures 19 and 20, our marker's pose estimation (red solid line) is almost the same with ArUco's pose estimation (green dash  Table 6. 3D coordinates of all detect key points of ArUco marker and our proposed marker.

Methods
Key Points X Y Z Our marker  Figure 18 shows our pose estimation result of our proposed marker and ArUco marker with 3D coordinates axes (red, green and blue line representing X-, Y-and Z-axes). As shown in Figure 18, using key points detected from our proposed method, we successfully achieve comparable results to ArUco marker's pose estimation. At the 65th, 467th and 796th frames, even though the drone has lot of X, Y, Z translation and rotates a lot in the yaw, pitch and roll, but our marker's predicted coordinates axes are almost the same with ArUco's estimation. As shown from Figures 19 and 20, our marker's pose estimation (red solid line) is almost the same with ArUco's pose estimation (green dash line) in all video frames. Table 7 shows average error of X, Y, Z translation and yaw, pitch, roll rotation in the above case. As shown in this table, we can find that the accuracy by our marker-based estimation of full pose is similar to that by ArUco marker-based method.
Sensors 2017, 17,1987 33 of 38 line) in all video frames. Table 7 shows average error of X, Y, Z translation and yaw, pitch, roll rotation in the above case. As shown in this table, we can find that the accuracy by our marker-based estimation of full pose is similar to that by ArUco marker-based method.    Table 7 shows average error of X, Y, Z translation and yaw, pitch, roll rotation in the above case. As shown in this table, we can find that the accuracy by our marker-based estimation of full pose is similar to that by ArUco marker-based method.

Conclusions
In this paper, we proposed a novel method for detecting the marker center and estimating the marker direction based on the ATM, profile checker, and Kalman filtering algorithm in order to precisely land a UAV. In particular, our proposed method can be operated using nighttime video based on the adaptive thresholding and morphological processing algorithm. We performed extensive tests in various environments that show that our algorithm outperformed the state-of-theart visual trackers in terms of both robustness and accuracy. In addition, the processing speed of our method was much faster than those obtained by previous methods, and we confirmed that our proposed method can be operated at a real-time speed exceeding 40 (1000/25) fps in an actual embedded system.
Based on the specification of DJI phantom 4 drone used in our experiment [36], the maximum wind speed resistance is 10 m/s. However, when we collected lots of data of Table 2 under various weather and time situations, there was no case that the wind speed exceeds in 3.5 m/s, and it is very difficult to collect the data by waiting the weather of strong wind. We would have experiments with the additional data collected at strong wind in future work. Our algorithm can detect the marker center and the estimate marker direction without the need for any training procedures. Therefore, for future work, we hope to enhance the performance of our method by adopting a training scheme, and we will consider employing the deep learning-based tracking algorithm in our system.

Conclusions
In this paper, we proposed a novel method for detecting the marker center and estimating the marker direction based on the ATM, profile checker, and Kalman filtering algorithm in order to precisely land a UAV. In particular, our proposed method can be operated using nighttime video based on the adaptive thresholding and morphological processing algorithm. We performed extensive tests in various environments that show that our algorithm outperformed the state-of-the-art visual trackers in terms of both robustness and accuracy. In addition, the processing speed of our method was much faster than those obtained by previous methods, and we confirmed that our proposed method can be operated at a real-time speed exceeding 40 (1000/25) fps in an actual embedded system.
Based on the specification of DJI phantom 4 drone used in our experiment [36], the maximum wind speed resistance is 10 m/s. However, when we collected lots of data of Table 2 under various weather and time situations, there was no case that the wind speed exceeds in 3.5 m/s, and it is very difficult to collect the data by waiting the weather of strong wind. We would have experiments with the additional data collected at strong wind in future work. Our algorithm can detect the marker center and the estimate marker direction without the need for any training procedures. Therefore, for future work, we hope to enhance the performance of our method by adopting a training scheme, and we will consider employing the deep learning-based tracking algorithm in our system.