General Moving Object Localization from a Single Flying Camera

Abstract: Object localization is an important task in the visual surveillance of scenes, and it has important applications in locating personnel and/or equipment in large open spaces such as a farm or a mine. Traditionally, object localization can be performed using the technique of stereo vision: using two fixed cameras for a moving object, or using a single moving camera for a stationary object. This research addresses the problem of determining the location of a moving object using only a single moving camera, and it does not make use of any prior information on the type or the size of the object. Our technique makes use of a single camera mounted on a quadrotor drone, which flies in a specific pattern relative to the object in order to remove the depth ambiguity associated with their relative motion. In our previous work, we showed that with three images, we can recover the location of an object moving parallel to the direction of motion of the camera. In this research, we show that with four images, we can recover the location of an object moving linearly in an arbitrary direction. We evaluated our algorithm on over 70 image sequences of objects moving in various directions, and the results showed a much smaller depth error rate (typically less than 8.0%) than other state-of-the-art algorithms.


Introduction
In the visual surveillance of scenes, it is not sufficient to simply detect and track an object [1]. We often need to know the location of the object (or at least the relative location of the object from the camera). This has important applications in finding and locating personnel and/or equipment on a farm [2] or in a mine [3].
There are many methods of estimating the distance of an object from a camera in a scene, such as those based on the reflection of light, lasers, radio waves, or ultrasonic waves [4][5][6][7][8][9][10]. However, due to cost considerations, it is more popular to use one or more cameras for object localization [11]. The standard vision-based approach is to use two fixed cameras to obtain two perspectives of a target object. This method is called "stereo vision" [12], and it has shown high accuracy for stationary objects and for objects at a near distance [13].
However, the problem with traditional stereo vision algorithms is that the distance between the two cameras (i.e., the baseline) is short, so it is hard to localize distant objects accurately, as the error becomes large. An alternative is to use only one camera that moves to a new position to take a second image, which allows the baseline to be made arbitrarily large [14]. Unfortunately, the object that we are tracking may have moved in the meantime, rendering traditional stereo vision algorithms useless [15].
In this research, we will address the problem of determining the location of a moving object using a single moving camera. The objectives are two-fold: (1) we want to be able to make the baseline arbitrarily large; (2) we want to remove the depth ambiguity caused by a moving object relative to a moving camera. We approached this problem by mounting a single camera on a quadrotor drone and instructing it to fly in a specific pattern so as to remove the depth ambiguity associated with its relative motion. To maintain generality, we did not assume that we knew the class of the object (e.g., person, vehicle, farming equipment, etc.) or the size, speed, and direction of motion of the object. This allows us to apply this technique to various applications such as locating farm personnel and equipment, mining vehicles, delivery vehicles belonging to logistics supply chains, wild animals in a prairie, etc. In our previous work [16], we showed that, by using three images, we are able to recover the location of an object moving in a direction parallel to the motion of the camera. In this research, we will show that, by using four images, we are able to recover the location of an object moving linearly in an arbitrary direction.
This paper is organized as follows. Section 2 reviews related work on object localization. The system design of our proposed method is described in Section 3, and the mathematical formulation is discussed in Section 4. Section 5 gives the implementation details, and Section 6 presents the performance evaluation of the proposed method. Finally, Section 7 discusses the localization errors, and Section 8 concludes with a summary of our proposed approach.

Related Works
A very popular solution for object localization (and object tracking) is to use drones. The most direct way to achieve this is to get the drone to fly directly to the object and then retrieve its global positioning system (GPS) location. Tracking can be achieved simply by following the object when it moves. However, this very simple approach is inefficient, and it is also not possible to localize multiple objects simultaneously.
A better approach is to use stereo vision on drones. Cigla et al. [17] proposed an onboard stereo system for a drone to perform pursuit as well as sense-and-avoid. Grewe and Stevenson [18] used a drone-based stereo camera system to assist a low-vision person with environment awareness by detecting the user's heading and direction. Nam and Gun-Woo [19] showed the possibility of using stereo vision with an Inertial Measurement Unit (IMU) to form a tightly coupled Visual-Inertial Navigation System (VINS), which showed better robustness and accuracy. There is no restriction on the number of cameras in the stereo vision method. Fehrman and McGough [20] performed real-time depth mapping using a low-cost array of 16 cameras; compared with a single camera pair, depth mapping becomes more accurate as the number of camera pairs increases. Stereo vision has also been shown to recover distances accurately for near objects. Wan and Zhou [21] introduced a novel stereo rectification method for a dual-fixed Pan-Tilt-Zoom (PTZ) camera system that can recover distances accurately up to 20 m indoors. Tran et al. [22] proposed a stereo camera system that can perform face detection, with a high detection rate (99.92%) in an indoor environment (5 to 20 m). Ramírez-Hernández et al. [23] presented a novel camera calibration method based on the least squares method that can improve the accuracy of stereo vision systems for three-dimensional point localization up to a distance of 20 cm. Stereo vision systems can also detect and localize moving objects. Islam et al. [24] proposed a stereo camera rig with an ultra-wide baseline distance, using conventional cameras with fish-eye lenses. They recovered 3D positions by developing a relationship between the pixel dimensions and distances in an image and real-world coordinates.
In their research, although the target was moving, the depth extraction results were reasonably accurate. However, their work was conducted at close range (around 2 m), and the object moved only a short distance (around 900 mm). This illustrates the biggest disadvantage of the stereo vision method for localizing objects. Because the algorithm relies on a fixed distance between the two cameras (i.e., the baseline), it is poorly suited to localizing far objects, since such objects show only a small disparity between the two images. Localizing far objects requires a much larger baseline: the shorter the baseline, the larger the error. Kim et al. [25] tried to overcome this problem by using multiple drones, each carrying a single camera. They studied ground obstacle mapping for Unmanned Ground Vehicles (UGVs) using two drones with a single camera each and without the aid of GPS. Their baseline is adjustable, since each drone can be positioned to achieve the desired separation. Xu et al. [26] used three or more unmanned aerial vehicles to simultaneously observe the same ground target and acquire multiple remote-sensing images to improve the precision of the target's localization. This multi-UAV cooperative target localization method can use more observation information, which results in higher rendezvous accuracy and improved performance.
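The link between baseline and error can be made concrete with the standard rectified pinhole-stereo relation $Z = fB/d$, where $f$ is the focal length in pixels, $B$ the baseline and $d$ the disparity in pixels. Differentiating gives $|\Delta Z| \approx Z^2 \Delta d / (fB)$. For a fixed disparity error, the depth error therefore grows with the square of the distance and shrinks in proportion to the baseline. The sketch below assumes an illustrative focal length of 700 pixels and a one-pixel disparity error; neither value comes from the experiments in this paper.

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Rectified pinhole stereo: Z = f * B / d (f and d in pixels, B in metres)."""
    return f_px * baseline_m / disparity_px


def depth_error(f_px: float, baseline_m: float, depth_m: float,
                disparity_err_px: float = 1.0) -> float:
    """First-order depth error |dZ| ~= Z**2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_err_px / (f_px * baseline_m)


f = 700.0  # hypothetical focal length in pixels
for b in (0.1, 1.0, 5.0):
    print(f"baseline {b:.1f} m -> depth error at 30 m: {depth_error(f, b, 30.0):.2f} m")
```

With these assumed numbers, the error at 30 m drops from roughly 12.9 m for a 10 cm baseline to about 0.26 m for a 5 m baseline. This is why an arbitrarily large baseline is attractive.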
The major problems with putting stereo cameras on a drone are power and cost. Achtelik et al. [27] described a system that uses stereo vision and laser odometry for drone navigation; however, carrying two cameras consumes more power and reduces both the payload and the flight duration. For lightweight or low-cost drones, stereo cameras are not a solution, since such drones cannot afford to carry even one additional camera. Santamaria-Navarro et al. [28] proposed a navigation system for low-cost micro aerial vehicles (MAVs) using two Kalman filters, in the extended and error-state flavors. Their system uses a single camera and is based on low-cost sensors. This is one reason why most drones currently on the market come with only a single camera.
Another solution is to use predetermined information to localize objects with a single camera. If we know information about the target object, such as its size, shape, and color, we can immediately calculate the distance to the object using simple ratio calculations. Chen and Wang [29] gave a good example of such a system. They performed efficient calibration for multiple PTZ cameras by using a single object with known parameters lying on a horizontal plane. Their algorithm works by detecting a sheet of A4 paper, estimating the angle and altitude of one camera, projecting the A4 paper's location into 3D space, and then calibrating the other cameras with this information.
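When the real size of the object is known, the "simple ratio calculation" follows from similar triangles in the pinhole camera model. An object of height $H$ at depth $Z$ appears with an image height of $h = fH/Z$ pixels, so $Z = fH/h$. The minimal sketch below assumes a calibrated focal length $f$ in pixels, and its numbers are hypothetical. This prior size information is exactly what the method in this paper avoids relying on.

```python
def depth_from_known_size(f_px: float, real_height_m: float, image_height_px: float) -> float:
    """Pinhole similar triangles: h = f * H / Z, hence Z = f * H / h."""
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return f_px * real_height_m / image_height_px


# Hypothetical example: a 1.7 m tall person imaged 85 px tall with f = 700 px (about 14 m away).
print(depth_from_known_size(700.0, 1.7, 85.0))
```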
For drones, this method of using prior information is very simple and has no effect on flight duration or payload, since it needs no additional equipment. Zhang and Liu [30] proposed a system in which a drone estimates its altitude relative to a ground object, with or without movement, if the size of the object is known. Sereewattana et al. [31] performed depth estimation of color markers for the automatic landing control of Unmanned Aerial Vehicles (UAVs) using stereo vision with a single camera. In their work, the color markers are static (i.e., not moving) and close to the drone (below 3 m). Sereewattana et al. [32] also extended the concept of automatic landing control to a fixed-wing UAV. As in their earlier work, they took images with a single downward-looking camera at different time intervals and used static color markers on flat ground. Nguyen et al. [33] also proposed a novel system to land a drone safely in the absence of GPS signals, using a unique marker designed as a tracking target during landing procedures.
Meanwhile, object localization can also be performed with a single camera through image analysis. This approach uses features extracted from images to perform navigation and depth extraction. Huang et al. [34] proposed an accurate autonomous indoor navigation system for a quadcopter with monocular vision. They combined information from multiple onboard sensors with a vision-based Simultaneous Localization and Mapping (SLAM) system and an Extended Kalman Filter (EKF) to achieve robust and accurate 3D position and velocity estimation. However, their system requires a large amount of time (about 1 min) relative to the distance travelled (around 8 m), and their research is confined to an indoor environment that covers a small area.
The solution we are looking for is to use stereo vision principles with a single-camera system instead. This can be achieved by taking multiple images of the object with a camera that moves in a certain pattern, so as to obtain the required baseline and image disparity. Such a system also avoids any reduction in payload or flight duration, since it does not require any additional hardware. Thus far, we have found few related works that address these requirements. Bian et al. [35] proposed a novel monocular vision-based environmental perception approach and implemented it in an unmanned aerial vehicle system for close-proximity transmission tower inspection. Their framework comprises tower localization and an improved point-line-based simultaneous localization and mapping framework consisting of feature matching, frame tracking, local mapping, loop closure, and nonlinear optimization. Although their goal is close-proximity tower inspection, they could localize the tower fairly accurately.
In our previous work [16], we addressed the simple cases for (1) a stationary object, (2) an object moving parallel to the drone's motion, and (3) an object moving perpendicular to the direction of the drone's motion. For the stationary object case (case 1), by moving the single-camera drone horizontally to a new position, it becomes exactly the same as the classical stereo vision problem. We can recover the distance to the object using just two images. For the case of an object moving perpendicular to the direction of the drone's motion (case 3), it is also exactly the same as the classical stereo vision problem. It does not matter where the starting position of the object is, as we only need two images to recover the position of the object at the time the second image is taken. For the case of the object moving parallel to the drone's motion (case 2), by assuming that the object is moving at a constant speed, we showed that we can recover the distance to the object using three images.
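For the stationary case, the two-image computation can be sketched as follows. The coordinate convention is our own assumption and not necessarily the one used in [16]. The camera looks along the $+y$ axis, and the drone takes the first image at $x = 0$ and the second at $x = B$ after flying along $+x$. The bearing angles $\theta_1$ and $\theta_2$ are measured from the optical axis and are positive towards $+x$. An object at lateral position $X$ and depth $Z$ then satisfies $\tan\theta_1 = X/Z$ and $\tan\theta_2 = (X - B)/Z$, which gives $Z = B/(\tan\theta_1 - \tan\theta_2)$.

```python
import math


def triangulate_stationary(baseline_m, theta1_rad, theta2_rad):
    """Depth Z and lateral offset X relative to the first camera position.

    Assumed convention: the camera looks along +y, the drone moves along +x by
    baseline_m between the two images, and angles are measured from the optical
    axis, positive towards +x.
    """
    denom = math.tan(theta1_rad) - math.tan(theta2_rad)
    if abs(denom) < 1e-12:
        raise ValueError("no parallax: baseline too short or object too far")
    depth = baseline_m / denom
    lateral = depth * math.tan(theta1_rad)
    return depth, lateral


# Synthetic check: object at X = 2 m, Z = 20 m, drone moves B = 3 m.
t1 = math.atan2(2.0, 20.0)
t2 = math.atan2(2.0 - 3.0, 20.0)
print(triangulate_stationary(3.0, t1, t2))  # approximately (20.0, 2.0)
```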

System Overview
In this research, we propose a new algorithm that allows drones with a single camera to localize objects moving in an arbitrary direction. We do not assume any prior knowledge of the object class (e.g., person, vehicle, farm equipment, etc.) or of its size, speed, and direction of motion. To maximize the baseline, we instruct the drone to fly in a direction perpendicular to the camera's viewing direction. As the drone flies, it takes a burst of images at short time intervals. Because the drone's motion is known, the location of the drone where each image is taken is completely determined. Using this information and the location of the object in each image, we can recover the distance of the object from the camera even though the object's motion is unknown. We demonstrate that, by using four images, we can recover the location of an object moving linearly in an arbitrary direction.
Our algorithm has the advantage that it does not require any additional cameras or apparatus, so it is very effective for lightweight or low-cost drones (e.g., the Parrot Rolling Spider [36]) that have a low payload or low lift force. We demonstrate our algorithm using a pedestrian detector, but we only use the centroid of the bounding box in our computations and make no assumptions about the shape or size of the object. By changing the object detector, our algorithm can work for objects of any class. Once the distance of the object from the drone is known, we can map the object's location to any global coordinate system (assuming that we know the drone's position, e.g., through the Global Positioning System (GPS) or WiFi positioning). Our system is also robust to occlusions, as the drone takes a series of images as it flies, and we can always select only those that contain the object. We can also choose the length of the baseline arbitrarily to meet the accuracy that we need. Figure 1 shows an overview of our system. Our system consists of a drone mounted with a single camera at position $(x_d, y_d)$. Using the algorithm described later in this section, the drone computes the distance $Z_d$ and angle $\theta_d$ of an object (e.g., a pedestrian) relative to itself, and then uses them to compute the position of the object $(x, y)$ in global coordinates. We assume that the position of the drone is always known (e.g., through GPS or WiFi positioning, or from an Inertial Navigation System (INS), if available).
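The final mapping from a drone-relative measurement to global coordinates can be sketched as a planar rigid transform. The sketch makes three assumptions. First, $Z_d$ is the depth measured along the optical axis, as $Z$ is in the calibration geometry. Second, $\theta_d$ is positive to the right of the axis. Third, the camera heading $\psi$ is known, for example from a compass or INS. Here $\psi$ is the direction of the optical axis, measured counter-clockwise from the global $x$ axis. This heading parameter is our own addition and does not appear in the original description.

```python
import math


def object_global_position(x_d, y_d, depth_m, theta_rad, heading_rad):
    """Map a drone-relative measurement to global planar coordinates.

    depth_m: depth Z_d along the optical axis (assumed convention)
    theta_rad: angle theta_d from the optical axis, positive to the right
    heading_rad: hypothetical camera heading psi, counter-clockwise from global x
    """
    forward = (math.cos(heading_rad), math.sin(heading_rad))
    right = (math.sin(heading_rad), -math.cos(heading_rad))  # forward rotated 90 deg clockwise
    lateral = depth_m * math.tan(theta_rad)
    x = x_d + depth_m * forward[0] + lateral * right[0]
    y = y_d + depth_m * forward[1] + lateral * right[1]
    return x, y


# Camera pointing along +y (heading 90 deg); object 10 m deep and 45 deg to the right.
print(object_global_position(5.0, 0.0, 10.0, math.radians(45), math.radians(90)))
# approximately (15.0, 10.0)
```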
After calculating the position of the object, the drone will compute a new position for it to fly to (if the object has moved) so that it will continue to have the object in its view. Figure 2 gives a flowchart of the overall system, including takeoff and landing. Our algorithm is the part in the loop, and it can be repeated as often as necessary.
Appl. Sci. 2020, 10, 6945 5 of 25

Camera Calibration
In order to obtain an accurate estimate of the depth of the object, the camera needs to be calibrated. We want to be able to recover the angle subtended by an object from the camera axis based on the pixel offset of the object in the image. Figure 3 illustrates the scenario. The drone is at the bottom of the figure with its camera pointing up the page. An object is at the top right-hand corner of the figure, at a depth Z from the camera and a horizontal displacement of P from the camera axis. The object subtends an angle θ from the optical center axis and causes an image offset of p from the image center. We want to obtain θ from p.
Figure 3. Geometry of view angle calibration. The object (pedestrian) is at depth Z and horizontal displacement P from the camera's optical center. It subtends an angle of θ from the camera axis and has an image offset of p pixels from the image center.

The computation of θ from p depends entirely on the focal length of the camera and the image resolution (dots per inch or dpi). To make the computations simpler, we introduce a new parameter R where:
$$R = \frac{p\,Z}{P}$$
The value of R is a constant for a particular camera. We can obtain this value empirically by placing an object in the drone's Field Of View (FOV) at a depth Z from the camera with a horizontal displacement P from the camera axis, and then measuring the image offset p. Table 1 shows the data measured from our experiment. Since $\tan\theta = P/Z = p/R$, once the pixel offset p of an object from the image center is known, we can immediately calculate the angle θ subtended:
$$\theta = \tan^{-1}\left(\frac{p}{R}\right)$$
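The calibration and its use can be sketched as follows. The sketch assumes that the ratio is defined as $R = pZ/P$, which is consistent with $\tan\theta = P/Z$ and makes $R$ effectively the focal length in pixels. Under that assumption, $\theta = \tan^{-1}(p/R)$. Several measurements $(Z, P, p)$ of the kind listed in Table 1 are averaged into a single estimate of $R$. The sample values below are hypothetical and are not the values in Table 1.

```python
import math
from statistics import mean


def calibrate_ratio(samples):
    """Estimate R = p * Z / P from (Z, P, p) measurements.

    Z: depth of the object (m); P: horizontal displacement from the camera
    axis (m); p: measured image offset from the image centre (px).
    """
    return mean(p * Z / P for Z, P, p in samples)


def angle_from_offset(p_px, ratio):
    """Angle from the optical axis: theta = arctan(p / R), in radians."""
    return math.atan(p_px / ratio)


# Hypothetical measurements (not the values from Table 1).
samples = [(5.0, 1.0, 112.0), (10.0, 2.0, 110.0), (15.0, 3.0, 111.0)]
R = calibrate_ratio(samples)
print(R, math.degrees(angle_from_offset(111.0, R)))
```

Averaging over several placements reduces the influence of bounding-box jitter in any single measurement.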

Baseline Calibration
Another important set of parameters is the set of distances between the drone positions at successive image captures, which define the baselines between the images. We can determine the drone positions from GPS or WiFi positioning, but these computations are slow and have large errors. It is easier and more efficient to command the drone to perform exactly the same action in every run, so that the distances are the same in every run; we can then determine these distances by calibration.
The constant action that the drone is tasked to perform is to take a sequence of images at a constant rate while flying at a preset speed to the left (or right). Any pair of images in this sequence can be used to recover the distance of the object, but to reduce the error in estimating depth, the baseline should be as long as possible. This means that we should use the first and the last images in the sequence for the best results. However, as the object may leave the camera's view while the drone flies, we can instead use the last intermediate image in the sequence in which the object is still seen.
In our calibration experiments, we let the drone take off and hover for a few seconds until it has stabilized, and then we instruct it to fly to the left (or right) at maximum speed (i.e., roll angle φ = ±1.0, normalized) while taking a burst of images at 200 ms intervals, for a total of 40 images. These specific values were chosen after numerous trials to give the best tradeoff between the distance flown and the time taken to complete the sequence, as the object (pedestrian) may have moved during that time interval. Figure 4 shows the layout of the baseline calibration. Here, the direction of object movement and the direction of drone flight should be the same. For example, if the target object is moving left relative to the drone, then the drone should fly left as well; otherwise, the object will disappear from the image very quickly. Procedure 1 describes the steps of the baseline calibration procedure.
1. Drone takes off and hovers until stabilized.
2. Instruct the drone to fly to the left (or right) with maximum speed (i.e., roll angle φ = ±1.0, normalized).
3. Repeat
   Take an image.
4. Continue to fly for 200 ms.
   Until (number of images taken equals 40)
5. Match each image taken with the pattern on the wall and measure the distance travelled at the locations where each image is taken.
End
It is worth noting that a quadrotor drone cannot move horizontally immediately from a stabilized hovering position. It first needs to rotate (i.e., "roll") towards the direction of intended motion so that a component of its lift provides the force necessary to begin the motion. Because of this, we observe that the position of the drone when the command is sent (which we call position 0) and its position when the first image is taken (which we call position 1) are exactly the same. The drone was hovering when the command was sent (i.e., before it had even started to roll), and the first image was taken just after the drone completed its roll to the left or right but before any horizontal translation had taken place, so its position had not changed. The remaining images (at positions 2 to n − 1) were taken at increasingly large spacings, because the drone needs time to overcome its inertia and accelerate to the desired speed.
To measure the lengths of the baselines, we made the drone fly parallel to a wall with a specific tiling pattern while taking images as described above. From the images, we can determine the positions of the drone where the images were taken and measure the distances between them. We repeated the experiment five times and took the average of these distances as our calibrated baselines. The baseline between any two images in the sequence is then simply the sum of the calibrated distances between consecutive drone positions from the first of the two images to the second.
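The averaging and summation can be sketched as follows. For each calibration run, the input is assumed to be the list of distances between consecutive image positions, as measured against the wall pattern. Averaging these distances across runs and taking a cumulative sum gives the calibrated position of every image, with the first image at 0 because positions 0 and 1 coincide. The baseline between images $i$ and $j$ is then the difference of their positions. The example uses two hypothetical runs instead of the five used in the experiments. Its values only illustrate the growing spacing of an accelerating drone.

```python
from itertools import accumulate


def calibrate_positions(runs):
    """Average per-interval distances over runs and return cumulative positions.

    runs: list of runs, each a list of distances (m) between consecutive images;
    every run must contain the same number of intervals.
    Returns the position of each image along the flight line, first image at 0.0.
    """
    n_intervals = len(runs[0])
    if any(len(r) != n_intervals for r in runs):
        raise ValueError("all runs must have the same number of intervals")
    mean_steps = [sum(r[k] for r in runs) / len(runs) for k in range(n_intervals)]
    return [0.0] + list(accumulate(mean_steps))


def baseline(positions, i, j):
    """Baseline between images i and j (0-based indices into positions)."""
    return abs(positions[j] - positions[i])


runs = [[0.05, 0.15, 0.30, 0.45], [0.07, 0.17, 0.28, 0.47]]  # hypothetical, metres
pos = calibrate_positions(runs)
print(pos, baseline(pos, 0, 4))
```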

Object Detection
Our system is not restricted to work on only one class of objects. It can be modified to work on any class of objects, but we need to have a reliable object detection algorithm for it. In this paper, we demonstrate our system on pedestrians, and we make use of the Histogram of Oriented Gradients (HOG) pedestrian descriptor from the OpenCV 2.4.10 library ("peopledetect.cpp"). Figure 5 shows an example of the result of the pedestrian detection algorithm in OpenCV, which displays a bounding box over the detected pedestrian.
The pedestrian detector from OpenCV can detect more than one person in a scene. As with any other object tracking algorithm, if there are multiple objects in a scene, we need to make use of additional information (e.g., color, size, motion vectors, etc.) to match the objects correctly across images. False positives can also occur; in our experiments, we manually selected the correct pedestrian whenever a false positive happened. The design of a more accurate pedestrian detector is beyond the scope of our work.
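Because only the centroid of the bounding box is used, each detection can be reduced to a horizontal pixel offset $p$ from the image center. The sketch below assumes boxes in the `(x, y, w, h)` form returned by OpenCV's `detectMultiScale`. When several people are detected, it picks the one whose centroid is closest to the centroid chosen in the previous image. This nearest-centroid rule is only an illustrative heuristic of ours. In the experiments described here, the correct pedestrian was selected manually when false positives occurred.

```python
def centroid(box):
    """Centroid (cx, cy) of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def horizontal_offset(box, image_width):
    """Signed offset p of the box centroid from the image centre (px), positive to the right."""
    return centroid(box)[0] - image_width / 2.0


def pick_nearest(boxes, previous_centroid):
    """Hypothetical matching rule: the detection closest to the previous centroid."""
    if not boxes:
        return None
    px, py = previous_centroid
    return min(boxes, key=lambda b: (centroid(b)[0] - px) ** 2 + (centroid(b)[1] - py) ** 2)


detections = [(100, 80, 64, 128), (400, 90, 60, 120)]  # hypothetical detector output
chosen = pick_nearest(detections, previous_centroid=(420.0, 150.0))
print(chosen, horizontal_offset(chosen, image_width=640))  # (400, 90, 60, 120) 110.0
```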
We could have chosen YOLO (You Only Look Once) [37] or popular object tracking algorithms such as TLD (Tracking-Learning-Detection) [38] or KCF (Kernelized Correlation Filter) [39] in an attempt to make the bounding boxes more accurate. However, newer and better detection and tracking algorithms will continue to be developed, and because our method does not depend on a particular detector, its accuracy can only improve as more sophisticated detectors are used.

Flight Pattern
To obtain the required image disparity, the drone has to fly in a specific pattern. As pointed out in Section 3.2.2, the drone is instructed to fly straight to the left or right at maximum speed. However, the drone takes time to reach its maximum speed from its hovering state. Since the drone is instructed to fly at maximum speed, it will be accelerating, which causes the distance travelled in each time interval to increase over time. This is the flight pattern that our algorithm needs in order to work. If the drone flew at a constant velocity and the object also moved at a constant velocity, we would obtain multiple candidate paths due to the geometric similarity between them.
This problem is illustrated in Figure 6. At the bottom is the quadrotor drone moving at a constant speed in a horizontal direction (it does not matter whether it is flying to the left or right). The object we are trying to localize is shown as a peach-colored rectangle moving in an arbitrary direction (along the peach-colored path). However, an object at a greater distance moving with a lower velocity (the red rectangles along the red path), or one at a nearer distance moving with a higher velocity (the blue rectangles along the blue path), would generate exactly the same image disparities in the image sequence. Of course, the size of the object in the image would differ depending on whether it is nearer or further away, but as we do not assume that the size of the object is known, we cannot determine which case it is.
As it is fair to assume that the object moves at a constant speed, the drone must be in a state of acceleration to avoid this depth ambiguity.
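The ambiguity, and the reason acceleration removes it, can be checked numerically. Let the drone position be $D(t)$ and the object position be $P(t)$. For any $k > 0$, the point $P_k(t) = D(t) + k\,(P(t) - D(t))$ lies on the same viewing ray at every instant, so it produces identical bearing angles. If $D(t)$ and $P(t)$ are both linear in $t$, then $P_k(t)$ is also linear in $t$, which makes it another valid constant-velocity object (the red and blue paths of Figure 6). If the drone accelerates, $P_k(t)$ is no longer a constant-velocity path. The sketch uses a hypothetical planar setup in which the camera looks along $+y$ and the drone flies along the $x$ axis. It tests constant velocity through the second differences of equally spaced samples.

```python
import math


def bearing(drone, obj):
    """Angle of obj from the optical axis (+y) seen from drone, positive towards +x."""
    return math.atan2(obj[0] - drone[0], obj[1] - drone[1])


def scaled_path(drone_xs, obj_path, k):
    """P_k(t) = D(t) + k * (P(t) - D(t)) for a drone moving along the x axis."""
    return [(dx + k * (ox - dx), k * oy) for dx, (ox, oy) in zip(drone_xs, obj_path)]


def is_constant_velocity(path, tol=1e-9):
    """Equally spaced samples of a constant-velocity path have zero second differences."""
    for a, b, c in zip(path, path[1:], path[2:]):
        if abs(a[0] - 2 * b[0] + c[0]) > tol or abs(a[1] - 2 * b[1] + c[1]) > tol:
            return False
    return True


def check(drone_xs, obj_path, k):
    """Return (same bearings as the true object?, alternative path constant-velocity?)."""
    alt = scaled_path(drone_xs, obj_path, k)
    same = all(
        math.isclose(bearing((dx, 0.0), p), bearing((dx, 0.0), q))
        for dx, p, q in zip(drone_xs, obj_path, alt)
    )
    return same, is_constant_velocity(alt)


times = [0.0, 0.2, 0.4, 0.6]                              # equally spaced captures (s)
obj = [(2.0 + 0.5 * t, 15.0 - 0.3 * t) for t in times]    # constant-velocity object
constant = [-3.0 * t for t in times]                      # drone at constant speed
accelerating = [-3.0 * t ** 2 for t in times]             # drone accelerating from rest (6 m/s^2)

print("constant:", check(constant, obj, 1.5))          # (True, True): ambiguous
print("accelerating:", check(accelerating, obj, 1.5))  # (True, False): ambiguity removed
```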

Figure 6. Illustration of depth ambiguity. The drone flies from the right to the left at a constant speed, keeping a separation of C between the drone positions $(x_d, y_d)$ where consecutive images were taken. The object (pedestrian) with position $(x_p, y_p)$ is the peach-colored rectangle moving in a straight line at a constant speed. We can see that if the object (pedestrian) were moving along the blue line or red line instead, it would be captured at exactly the same position in all the images.

Mathematical Formulation
We now present our formulation for the general moving object localization problem, i.e., estimating the depth of an object moving in an arbitrary direction from a single flying camera. The simple cases in our previous work [16] are special cases of this general problem in which the motion of the object is restricted to certain axes. We first present the assumptions that reduce the general moving object localization problem to a linear motion problem, and then we show that, by using four images, we are able to recover the distance of an object moving linearly in an arbitrary direction. Figure 7 shows the geometry of our localization problem.
We make three important assumptions, as follows:
1. The drone is moving much faster than the object;
2. The drone's camera has a high frame rate;
3. The drone is always in a state of acceleration.
The first two assumptions reduce the general motion of the object to a linear motion. We can usually expect a moving object to follow a straight line at constant speed, but there are situations in which it does not. If the first two assumptions hold, the movement of the object can be approximated as linear during the time interval in which the drone is flying and capturing a burst of images of the object. If the object continues to move in a curved or non-linear fashion, the error can be corrected when the drone flies the repeated pattern again, as described in Figure 2.
The third assumption is needed to remove the depth ambiguity illustrated in Figure 6.
In our experiments, we use a Parrot AR.Drone 2.0, which has a maximum speed of 11.11 m/s. We instructed the drone to capture a sequence of 40 images at 200 ms intervals (i.e., 5 frames per second), which is a sufficiently high frame rate relative to the motion of a walking pedestrian. The drone reaches its maximum speed, and stops accelerating, only by the end of the 40 images. Thus, we believe that all three assumptions are valid and reasonable.
Using all three assumptions, we can represent the object's movement as a straight line and set up the geometry as shown in Figure 8.
Figure 8. Overview of the arbitrary movement of an object. The drone is flying from the right to the left, accelerating towards maximum speed. The distances $C_1$, $C_2$, etc., increase as the drone moves with increasing speed. The object may move in an arbitrary fashion, subtending angles $\theta_1$, $\theta_2$, etc., with the camera axes at each position.
The movement of the drone is from the right to the left. Note that the first image (the first black line from the right) is taken when the drone has just completed its rotation, but before any horizontal translation has taken place. Notice also that, in subsequent images, the drone positions are spaced further and further apart, because the drone is accelerating towards its maximum speed.
To achieve the largest baseline, and hence the best accuracy, we should use the first two images and the last two images in the sequence. However, the pedestrian may disappear from the drone's view before the end of the sequence. In such situations, we use the last two images in which the pedestrian is still detected (along with the first two images of the sequence) to compute the distance estimate.
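The frame choice described above can be sketched as follows. This is a minimal Python sketch, assuming a hypothetical input format (one detection flag per frame, with frames 1 and 2 assumed to contain a detection); the helper names `select_frames` and `spacing_ratios` are illustrative and are not part of the authors' system. Under the constant-speed assumption, the distances $D_A$, $D_B$, and $D_C$ between successive object positions are proportional to the frame gaps returned by `spacing_ratios`.

```python
from typing import List, Tuple


def select_frames(detected: List[bool]) -> Tuple[int, int, int, int]:
    """Pick 1-based frame numbers (n1, n2, n3, n4) for the four-image solve.

    detected[k] is True if the pedestrian was found in frame k + 1
    (hypothetical input format). Frames 1 and 2 are always used; the last
    two consecutive frames with a detection give the largest baseline.
    """
    if len(detected) < 4 or not (detected[0] and detected[1]):
        raise ValueError("need detections in the first two frames")
    # Scan backwards for the latest pair of consecutive detections after frame 2.
    for k in range(len(detected) - 1, 2, -1):
        if detected[k] and detected[k - 1]:
            return 1, 2, k, k + 1
    raise ValueError("no usable pair of late frames")


def spacing_ratios(frames: Tuple[int, int, int, int]) -> Tuple[int, int, int]:
    """Frame gaps between the chosen images; with constant object speed,
    D_A : D_B : D_C equals these gaps."""
    n1, n2, n3, n4 = frames
    return n2 - n1, n3 - n2, n4 - n3


# Pedestrian visible in frames 1-19 of a 40-frame burst.
frames = select_frames([True] * 19 + [False] * 21)
print(frames, spacing_ratios(frames))  # (1, 2, 18, 19) (1, 16, 1)
```

The gaps (1, 16, 1) reproduce the relation $16D_A = D_B = 16D_C$ for frames 1, 2, 18, and 19.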
Since our final goal is to obtain $(x_4, y_4)$, we need to determine the equation for line $L_0$ in Figure 9, which can be expressed as follows:
$$ L_0:\; y = ax + b \qquad (3) $$
To find this equation, we need to know the equations of lines $L_1$ to $L_4$ so as to determine their intersection points with $L_0$. The equations for lines $L_1$ to $L_4$ can be expressed as follows:
$$ L_1:\; y = I(x + i), \qquad L_2:\; y = J(x + j), \qquad L_3:\; y = K(x + k), \qquad L_4:\; y = L(x + l). $$
Here, $I$, $J$, $K$, and $L$ are the slopes of the lines, given by
$$ I = \tan\theta_1, \qquad J = \tan\theta_2, \qquad K = \tan\theta_3, \qquad L = \tan\theta_4, $$
and $i$, $j$, $k$, and $l$ determine the x-axis intercepts of the lines ($x = -i$, $x = -j$, $x = -k$, and $x = -l$), which are essentially the baselines between the respective camera positions and the origin. These baselines are defined in terms of $C_n$, the distance between successive positions of the drone. The indices $n_1$, $n_2$, $n_3$, and $n_4$ refer to the time step at which each image is taken (which is also the image number in the sequence) for the positions "i", "j", "k", and "l", respectively; for example, $n_1 = 20$ if the image for position "i" is the 20th image of the sequence.
The intersections between line $L_0$ and the other lines are obtained by equating $ax + b$ with each line equation:
$$ x_1 = \frac{Ii - b}{a - I}, \qquad x_2 = \frac{Jj - b}{a - J}, \qquad x_3 = \frac{Kk - b}{a - K}, \qquad x_4 = \frac{Ll - b}{a - L}, \qquad y_n = a x_n + b. $$
We can now use these relations to find expressions for the distances $D_A$, $D_B$, and $D_C$ between successive positions of the object. From the geometry in Figure 9, we obtain Equations (15)-(17), which can be reduced further to Equations (18) and (19). By simplifying Equations (18) and (19), we can isolate the y-axis intercept $b$ as an expression in the slope $a$ from Equation (3), which gives $b_1$ and $b_2$ (Equations (20) and (21), respectively). Since $b_1$ and $b_2$ must be equal, we can equate (20) and (21) and derive the equation for the slope $a$. Note that, under our linear assumption, the object moves at a constant speed in a straight line, so $D_A = D_B = D_C$ if four consecutive images in the sequence are used. If, instead, the first two images (i.e., frames 1 and 2) and the last two images in which the pedestrian is still detected (say, frames 18 and 19) are used, we have $16D_A = D_B = 16D_C$.
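Intersecting $L_0: y = ax + b$ with a bearing line $y = S(x + s)$ gives $x = (Ss - b)/(a - S)$ and $y = ax + b$. The Python sketch below checks this numerically on made-up data: an object moving at constant velocity, a camera accelerating along the x-axis, and frames 1, 2, 18, and 19. The helpers `intersect` and `bearing_line` and all numbers are hypothetical. Because it uses the true $a$ and $b$, it only verifies the geometry and the $16D_A = D_B = 16D_C$ spacing; in the method itself, $a$ comes from the cubic Equation (22).

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def intersect(a: float, b: float, slope: float, offset: float) -> Point:
    """Intersection of L0: y = a*x + b with a bearing line y = slope*(x + offset)."""
    if math.isclose(a, slope):
        raise ValueError("L0 is parallel to the bearing line")
    x = (slope * offset - b) / (a - slope)
    return x, a * x + b


def bearing_line(cam_x: float, obj: Point) -> Tuple[float, float]:
    """Slope and offset of the line from a camera at (cam_x, 0) through obj,
    written as y = slope*(x + offset), so its x-axis intercept is -offset."""
    ox, oy = obj
    return oy / (ox - cam_x), -cam_x


# Hypothetical scene: object on a straight line at constant velocity,
# drone accelerating from right to left along the x-axis.
frames = (1, 2, 18, 19)
p0, v = (2.0, 10.0), (0.3, 0.1)          # start position (m), velocity (m/frame)
truth = [(p0[0] + v[0] * n, p0[1] + v[1] * n) for n in frames]
cams = [-0.05 * n ** 2 for n in frames]  # drone x-positions (m)
a = v[1] / v[0]
b = truth[0][1] - a * truth[0][0]
pts = [intersect(a, b, *bearing_line(c, t)) for c, t in zip(cams, truth)]
d_a, d_b, d_c = (math.dist(p, q) for p, q in zip(pts, pts[1:]))
print(round(d_b / d_a, 6), round(d_c / d_a, 6))  # 16.0 1.0
```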
The final equation for the slope $a$ is a cubic equation, Equation (22). By solving Equation (22), we obtain the slope $a$ of line $L_0$, and the intercept $b$ then follows from either Equation (20) or (21). The Cartesian location of the target object $(x_4, y_4)$ can then be determined. We solve the cubic Equation (22) using Newton's method.
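Because the coefficients of Equation (22) depend on the derivation in Equations (18)-(21), the Python sketch below treats them as inputs and applies plain Newton's method to a generic cubic $c_3a^3 + c_2a^2 + c_1a + c_0 = 0$. The placeholder cubic $(a - 1)(a - 2)(a + 3)$ is illustrative only and is not Equation (22). Different starting guesses converge to different real roots, which is why a validity check on the resulting position is still required.

```python
from typing import Tuple


def newton_cubic(coeffs: Tuple[float, float, float, float], x0: float,
                 tol: float = 1e-12, max_iter: int = 100) -> float:
    """Return a real root of c3*x**3 + c2*x**2 + c1*x + c0 = 0 found by
    Newton's method from the starting guess x0."""
    c3, c2, c1, c0 = coeffs
    x = x0
    for _ in range(max_iter):
        f = ((c3 * x + c2) * x + c1) * x + c0   # cubic, Horner form
        df = (3 * c3 * x + 2 * c2) * x + c1     # its derivative
        if df == 0:
            raise ZeroDivisionError("zero derivative; try another x0")
        step = f / df
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# Placeholder cubic x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3), not Equation (22).
for guess in (-10.0, 0.5, 10.0):
    print(round(newton_cubic((1.0, 0.0, -7.0, 6.0), guess), 9))  # -3.0, 1.0, 2.0
```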
Since we are dealing with a cubic equation, there can be up to three real roots for the slope. Only one of the roots is correct: it gives a valid object position $(x_4, y_4)$ that is consistent with the position and the flight path of the camera. Incorrect roots give invalid positions, such as negative values of $y_4$ (i.e., behind the camera) or extreme values of $x_4$.
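The root-selection rule above can be expressed as a small filter. In this Python sketch, `position_of` is a hypothetical callable that maps a candidate slope to $(x_4, y_4)$ (in the method, via Equation (20) or (21) and the intersection with $L_4$), and the 100 m lateral bound is an assumed threshold suggested by the size of the test field, not a value from the paper. Returning `None` when zero or several roots pass is a design choice of this sketch, so that an ambiguous burst can simply be discarded and the flight pattern repeated.

```python
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[float, float]


def select_valid_root(roots: Iterable[float],
                      position_of: Callable[[float], Point],
                      max_abs_x: float = 100.0) -> Optional[Tuple[float, Point]]:
    """Keep the slope root whose object position (x4, y4) lies in front of
    the camera (y4 > 0) and within |x4| <= max_abs_x (assumed bound).

    Returns None when no root, or more than one root, passes the checks.
    """
    valid = []
    for a in roots:
        x4, y4 = position_of(a)
        if y4 > 0 and abs(x4) <= max_abs_x:
            valid.append((a, (x4, y4)))
    return valid[0] if len(valid) == 1 else None


# Hypothetical positions for three candidate roots.
table = {1.0: (3.0, 8.0), 2.0: (-5.0, -2.0), -3.0: (400.0, 12.0)}
print(select_valid_root(table, table.__getitem__))  # (1.0, (3.0, 8.0))
```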
Note also that, in our approach, fluctuations in elevation do not affect accuracy; the accuracy of the algorithm depends only on the movement along the x-axis.

Hardware
The drone used in this study is the Parrot AR.Drone 2.0 GPS Edition [40], as shown in Figure 10. It has a forward-looking 720p HD camera and a vertical QVGA camera. The drone is controlled from an Intel i5 laptop running Windows 8.1 with 4 GB of RAM.

Software
The software we used to control the drone is the "CV Drone" package, which is available from GitHub [41]. The image processing routines are from the OpenCV 2.4.10 library, and the entire system was developed in Microsoft Visual Studio 2013.

Test Environment
We should choose a test environment in which the test subjects can move freely in any direction. In our experiments, we chose to detect pedestrians, as they are not confined to moving along fixed roads (as opposed to cars). Our algorithm is sufficiently general to work on other object classes if we replace our object detector with an appropriate one for the object class of interest.
Ideally, we should allow our test subjects (i.e., pedestrians in this case) to walk around freely in the test environment, in any direction of their choice. However, people do not usually walk around aimlessly. Thus, instead of conducting a contrived experiment in which we pay the same subject to walk around the test environment in fixed directions, we chose a busy path along which many pedestrians walk at varying speeds. To obtain the effect of pedestrians walking in different directions, we made the drone fly in different directions with respect to the path. Figures 11 and 12 illustrate the ideal layout and the equivalent layout of the experiment.
We recorded data with the camera moving at five different slopes with respect to the path. As the test environment requires a sufficiently large area for the drone to fly, we selected a large open field (approximately 100 m wide and 50 m long) at the Gwangju Institute of Science and Technology, with a path that pedestrians frequently use to cross from one side of the campus to the other. Figure 13 shows the layout of the area from Google Maps. We captured over 70 image sequences of different pedestrians moving at their own natural pace.

Performance Indicators
For each of the experiments, we compute the depth error:
$$ e_Z = \frac{|Z - Z'|}{Z} \times 100\% \qquad (23) $$
where $Z$ is the actual depth measured from the position of the drone at the last of the four images used in the computation, and $Z'$ is the depth computed by our algorithm. We also compute the position error using the following equation:
$$ e_P = \sqrt{(x - x')^2 + (y - y')^2} \qquad (24) $$
where $(x, y)$ is the actual position of a pedestrian and $(x', y')$ is the computed position.
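A minimal Python sketch of the two performance indicators, assuming the depth error is the relative absolute error in percent and the position error is the Euclidean distance between the actual and computed positions; the function names are illustrative.

```python
import math
from typing import Tuple


def depth_error_percent(z_true: float, z_est: float) -> float:
    """Relative depth error |Z - Z'| / Z, expressed in percent."""
    return abs(z_true - z_est) / z_true * 100.0


def position_error(actual: Tuple[float, float], est: Tuple[float, float]) -> float:
    """Euclidean distance between the actual (x, y) and computed (x', y')."""
    return math.dist(actual, est)


print(round(depth_error_percent(10.0, 9.2), 3))   # 8.0
print(position_error((3.0, 4.0), (0.0, 0.0)))     # 5.0
```

For example, an estimate of 9.2 m for a pedestrian actually 10 m away sits exactly at the 8% depth error level quoted in the results.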

Performance Evaluation
For each image sequence, we took 40 images. The distance from the drone to the pedestrian varies between 5 and 20 m. The speed of each pedestrian is assumed to be constant, although its exact value is unknown. Examples of the images captured by the drone are shown in Figures 14-18. Each figure shows five image frames (frames 0, 1, 10, 18, and 19). Frame 0 is the image seen by the drone before the command is sent to the drone.

Figure 18. Image sequence of a pedestrian moving perpendicular to the camera's motion.
In each sequence, the pedestrian may disappear from the drone's view. In such situations, we use the last two images in which the pedestrian is still detected (along with the first two images of the sequence) to compute the distance estimate. In this way, we keep the baseline as large as possible so as to achieve the most accurate results. For each experiment, we computed the depth error using Equation (23) and the position error using Equation (24). The results are shown in Tables 2-6. Except for the case of the pedestrian moving at an angle of 45 degrees, all the experiments yielded good results, with a depth error rate below 8.0%. For the 45-degree case, one trial gave good accuracy, with a depth error of 1.1%. In the other two trials, the algorithm had difficulty detecting the pedestrian near the end of the image sequence because the color of the pedestrian's clothing was too similar to the background color. As a result, only the earlier images in the sequence could be used, which led to a small baseline and thus a high error.

Comparison with Other Techniques
We compare our algorithm with the classical stereo vision technique of computing the image disparity and then calculating the distance from it. This comparison is performed only for the stationary pedestrian case. We took two images of the pedestrian with the drone's camera at positions 1 m apart (not in flight) and used the same pedestrian detector (based on the HOG descriptor) to find the bounding boxes and then their centroids. The image disparity between the two centroids is then used to compute the distance in the same way as in classical stereo vision. Table 7 shows the results. While our method performs worse at a short depth (2.75 m), it works better than the classical method at longer depths. This is because, at shorter depths, the object disappears from the drone's view rather quickly, so we can only use the earlier images in the sequence, which correspond to a small baseline. At longer depths, the object stays in view of the drone's camera while the drone flies over a longer distance, thus generating a larger baseline.
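For reference, the classical two-view relation used in this comparison is $Z = fB/d$, where $f$ is the focal length in pixels, $B$ the baseline, and $d$ the disparity. The Python sketch below assumes rectified, side-by-side images; the 700 px focal length is a made-up value, not the AR.Drone's calibration, and only the 1 m baseline comes from the text.

```python
def stereo_depth(u_first: float, u_second: float, focal_px: float,
                 baseline_m: float) -> float:
    """Classical two-view depth Z = f * B / d for a rectified, side-by-side pair.

    u_first, u_second: horizontal centroid coordinates (pixels) in the two images.
    """
    disparity = abs(u_first - u_second)
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity or matching error")
    return focal_px * baseline_m / disparity


f, B = 700.0, 1.0                          # hypothetical focal length (px), 1 m baseline
print(stereo_depth(420.0, 350.0, f, B))    # 10.0
print(stereo_depth(422.0, 350.0, f, B))    # a 2 px centroid shift gives about 9.72
```

Because $Z$ is inversely proportional to $d$, any error in the detected centroid feeds directly into the depth estimate, which is why the bounding-box centroid is the limiting factor for this baseline method.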
We note that, at the shorter distances of 5.5 and 7.25 m, the classical stereo vision method has a large depth error rate. This is because the only image feature used in the computation is the centroid of the bounding box from the pedestrian detector. There are errors in computing this centroid, and when these errors are divided by a small denominator (i.e., the shorter distances), they result in a larger relative error.
We further compare our method with the method proposed by Sereewattana et al. [31]. In their approach, they placed colored markers on their target and used a drone with a single camera that flies to a second location to obtain the image disparity of the colored marker, so as to estimate the depth for UAV landing control. Their experiments were conducted only within a range of 2 m. In this evaluation, we extend their method by placing a larger marker on our object so that it can be seen from further away, and we made the drone fly sideways for a distance of 15 m to capture the second image. In our method, the distance travelled by the drone while taking 40 images at 200 ms intervals is 15.5 m, which makes the comparison fair. However, our method uses the HOG pedestrian detector to find the bounding box and its centroid, whereas Sereewattana et al.'s method uses color thresholding to find the centroid of their circular marker. Table 8 shows the results. Our algorithm estimates depth with accuracy equal to or better than that of Sereewattana et al.'s method. As the two baselines are almost the same, the depth error rates are similar, with our method being slightly better at longer distances but poorer at shorter distances. The difference arises because they use a circular marker, which can be extracted more accurately at shorter distances but not at longer distances.

Discussion
Our algorithm has been shown to work well and to have a much lower depth error rate (typically less than 8.0%) than related state-of-the-art methods. However, there are exceptional cases in which the depth accuracy of our algorithm drops. The possible causes of error are as follows.

Near Constant Speed Problem
As mentioned in Section 3.4, depth ambiguity arises when both the pedestrian and the drone travel at constant speed. Thus, the drone must remain in a state of acceleration until it has captured the entire image sequence. However, we observe that at around frame 25 of every sequence, the acceleration of the drone has dropped to a fairly small value (i.e., it is almost at constant velocity). Figure 19 shows frames 35 to 39, in which the drone moves at an almost constant speed. In Figure 19, the pedestrian appears at the same location in multiple images due to the almost constant speed of the drone. Thus, it is important not to choose baseline fragments in which the drone has almost reached a constant speed. This problem can be mitigated by tuning the flight parameters of the drone so that it accelerates at a slower rate, thus reducing the likelihood of attaining an almost constant speed before the end of the sequence. Alternatively, the drone can fly in a sine-wave pattern (i.e., it accelerates to almost constant speed, then decelerates towards zero, and the cycle repeats) so that a significant acceleration is always maintained.
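One way to avoid baseline fragments taken at near-constant speed is to screen frames by estimated acceleration. The Python sketch below assumes that per-frame along-track drone positions are available (for example, from odometry or an external tracker; this input is hypothetical, as is the 0.3 m/s² threshold) and keeps frames whose central-difference acceleration exceeds the threshold.

```python
from typing import List, Sequence


def accelerating_frames(positions: Sequence[float], dt: float,
                        min_accel: float) -> List[int]:
    """Return 0-based frame indices whose central-difference acceleration
    |x[k+1] - 2*x[k] + x[k-1]| / dt**2 is at least min_accel (m/s^2)."""
    frames = []
    for k in range(1, len(positions) - 1):
        accel = (positions[k + 1] - 2 * positions[k] + positions[k - 1]) / dt ** 2
        if abs(accel) >= min_accel:
            frames.append(k)
    return frames


# Hypothetical along-track positions (m): 1 m/s^2 for 1 s, then a constant 1 m/s.
xs = [0.0, 0.02, 0.08, 0.18, 0.32, 0.50, 0.70, 0.90, 1.10]
print(accelerating_frames(xs, dt=0.2, min_accel=0.3))  # [1, 2, 3, 4, 5]
```

Frames 6 and 7, where the drone has settled to constant speed, are rejected, which mirrors the advice not to build baselines from the tail of the sequence.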

Wind Interference
Wind is the most critical and common factor that directly affects the accuracy of the algorithm. Wind can change the position of the drone during flight: a tail wind moves the drone further along its axis, and a head wind slows it down or even moves it backwards. This is significant because it changes the baseline between the camera positions. Figure 20 shows a situation in which a sudden wind affects the drone's position and attitude while it is still hovering.
This situation can be mitigated if the wind speed and direction are known, e.g., from a wireless anemometer. We can find the component of the wind in the direction of the drone's flight and then add it to or subtract it from the drone's speed. This allows us to compute a multiplier for the distance travelled by the drone so as to compensate for the effect of the wind. Another alternative is to use positioning systems such as GPS or Wi-Fi positioning to provide the correction needed due to the wind. However, such positioning systems can be quite slow, so this alternative will only work for very slow-moving or stationary objects.
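The wind correction described above can be sketched as a simple kinematic multiplier. This Python sketch assumes the anemometer reports the direction the wind blows towards (not the meteorological "from" convention) and ignores any compensation by the drone's own flight controller; the function name is hypothetical.

```python
import math


def wind_distance_multiplier(drone_speed: float, wind_speed: float,
                             wind_heading_rad: float,
                             flight_heading_rad: float) -> float:
    """Multiplier for the distance travelled when the along-track wind
    component is added to (tail wind) or subtracted from (head wind) the
    drone's speed. wind_heading_rad is the direction the wind blows towards."""
    if drone_speed <= 0:
        raise ValueError("drone_speed must be positive")
    along_track = wind_speed * math.cos(wind_heading_rad - flight_heading_rad)
    return (drone_speed + along_track) / drone_speed


print(wind_distance_multiplier(2.0, 1.0, 0.0, 0.0))      # tail wind: 1.5
print(wind_distance_multiplier(2.0, 1.0, math.pi, 0.0))  # head wind: 0.5
```

Each baseline $C_n$ would then be scaled by this multiplier before it is used in the localization equations.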

Low Resolution
The resolution of the AR.Drone's forward camera is sufficient for localizing close-range pedestrians. However, for pedestrians at far distances (e.g., >50 m), the resolution is not sufficient for accurate pedestrian detection, and the computed center location of the pedestrian may be wrong. Figure 21 shows the error caused by poor resolution when localizing a pedestrian. To mitigate this problem, we can replace the drone's camera with a higher-resolution camera or use more sophisticated algorithms to detect our objects.

False Detection of the Pedestrian Detector
We used the pedestrian detector provided by OpenCV 2.4.10, and the accuracy of this detector is outside the scope of our research. However, the detector's accuracy affects the robustness of our algorithm, since the disparity depends on the detection performance. Figure 22 shows an example of a false pedestrian detection.
Our experiments showed that false detections occur more frequently when the pedestrian is further away than 11 m, or when there are other objects in the background with long, thin vertical shapes (such as trees, street lights, and pillars). This problem can be mitigated by using more sophisticated pedestrian detectors.

Fast Moving Objects not Moving in a Straight Line
Our solution relies on the assumptions that the drone is moving much faster than the object and that the drone's camera has a high frame rate. These assumptions allow us to model the object's motion as a straight line. For slow-moving objects (e.g., a walking pedestrian), the drone's speed (up to 11.11 m/s) and the frame rate (one image every 200 ms) are sufficient for these assumptions to hold. For fast-moving objects that are not moving in a straight line (e.g., a cheetah chasing prey, or a car driving in a zigzag), these assumptions may not be valid, and there may be significant errors in the results. However, such objects do not naturally move at high speed in an erratic fashion for long periods of time, and they are likely to revert to slower, linear motion (e.g., when the cheetah tires, or when the driver of a fast car prefers to drive in a straight line). Since we can repeat our algorithm as often as we like, repeated application allows us to minimize the error and obtain an accurate measurement.

Conclusions
This paper proposed a novel system to estimate the location of an object from a single moving camera mounted on a drone. The proposed algorithm aimed to fulfil two objectives: (1) to be able to make the baseline between cameras arbitrarily large, and (2) to be able to remove the depth ambiguity caused by a moving object relative to a moving camera. We have shown in our previous work [16] that, by using three images, we are able to recover the location of an object moving in a direction parallel to the motion of the camera. In this research, we showed that, by using four images, we are able to recover the location of an object moving linearly in an arbitrary direction.
The algorithm works by instructing the drone to fly in a specific pattern with acceleration, which allows us to use a varying set of camera baselines to compute the object position, thus avoiding the depth ambiguity. To maintain generality, we do not assume that we know the class of the object (e.g., person, vehicle, farming equipment, etc.) or the size, speed, and direction of motion of the object. This allows us to apply this technique to various applications, such as locating farm personnel and equipment, mining vehicles, delivery vehicles belonging to logistics supply chains, wild animals in a prairie, etc. Our evaluation results show that our algorithm can achieve a depth-error rate of <8%, and it is equal or better in terms of localization accuracy when compared to other state-of-the-art algorithms. Our experiments showed that the false detection happens more frequently when the pedestrian is further away than 11 m, or when there are other objects in the background which have long and thin vertical shapes (such as trees, road lights, and pillars). This problem can be mitigated by using more sophisticated pedestrian detectors.

The proposed algorithm is particularly useful for drones with a single camera, including lightweight and low-cost drones. It can be used in many applications, such as drone navigation, intelligent CCTV systems, pursuit missions, and military missions. For example, the system can serve as a mobile intelligent CCTV system, indoors or outdoors, that locates suspicious entities in settings such as supermarkets, factories, private estates, and government properties. The algorithm can also help when a drone pursues a distant target, such as following a suspected criminal (while avoiding detection by the suspect and hostile action against the drone), following endangered animal species (stealthily, avoiding close encounters), or performing military reconnaissance (since the system is passive and emits no signals).