4.1. Algorithm Specification
To develop the intended motion model, a drone equipped with a high-resolution camera and auto-pilot capabilities was selected. The terrain over which the drone would fly was marked with easily distinguishable visual markers. Three basic motion types were defined: translational, rotational, and elevational (ascending/descending). The final complex motion was defined as the superposition of these three basic motion types.
For each basic motion type, multiple sets of flight parameters were selected. Corresponding flight plans were generated within the auto-pilot application and validated using its built-in simulation environment. The drone was then autonomously flown over the marked terrain multiple times, following these predefined flight plans. During each flight, video footage was recorded using the onboard camera.
Each recorded video was processed using a custom application developed with the OpenCV library. The positions of the markers in each video frame were detected, and their motion vectors were computed for temporal intervals of 5 and 25 frames. This formed the dataset used to define the motion model.
Temporal intervals of 5 and 25 frames were specifically selected because they are widely used in GOP sequences for real-world streaming. For 4K or 8K content at 50 or 100 fps, a GOP size of 25 frames (or a multiple of 25) is typically utilized: the GOP starts with an I-frame, and P-frames follow at regular intervals, with two to four B-frames between consecutive anchor frames. This closed GOP structure is generally used for streaming applications, and its coding sequence is typically organized as I, B, B, B, P, B, B, B, P, and so on. TD = 5 was selected so that motion vectors between two consecutive I- or P-coded frames can be detected directly, while motion vectors for smaller temporal distances can be obtained by interpolation.
For every basic motion type and associated set of flight parameters, a polynomial surface fitting was performed for the motion vector components MVx and MVy, relative to normalized coordinates within the video frame. Subsequently, for each coefficient obtained from this fitting process, a secondary polynomial surface fit was carried out with respect to the corresponding flight parameters. As a result, two mathematical motion models (corresponding to 5- and 25-frame intervals) were derived for each of the three basic motion types.
To evaluate the accuracy of the basic motion models, motion vectors for each marker in every video frame were computed using the constructed models and compared against the actual vectors from the dataset. The resulting differences were considered as the errors introduced by the models. Error distributions for MVx and MVy components were analyzed independently, and their mean and standard deviation values were computed. This evaluation procedure was repeated for all basic motion types across both temporal intervals.
Upon completion of the basic motion model construction and corresponding error analysis, a final complex motion model was established as the vectorial summation (superposition) of the three basic motion models.
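In vector form, this superposition can be written as follows (the subscript labels here are descriptive rather than the paper's own notation):

$$\overrightarrow{MV}_{complex}(x_n, y_n) = \overrightarrow{MV}_{trans}(x_n, y_n) + \overrightarrow{MV}_{rot}(x_n, y_n) + \overrightarrow{MV}_{elev}(x_n, y_n)$$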
To assess the accuracy of the complex motion model, additional flight plans were created involving combined motion types: ramp (translation + elevation), orbit (translation + rotation), and helix (translation + rotation + elevation). The drone was flown over the same marked terrain under auto-pilot control using these plans, and additional test videos were obtained.
For each frame in the test videos, motion vectors of the markers were calculated using both the OpenCV-based application and the previously developed complex motion model. The discrepancies between these vectors were considered the errors introduced by the complex model. Histograms of the errors in the MVx and MVy components were generated, and their statistical properties (mean and standard deviation) were evaluated for both 5- and 25-frame temporal intervals.
To conclude, a polynomial surface fit-based mathematical motion model was developed for estimating motion vectors from drone-captured video by using the flight data of the drone.
4.4. OpenCV Processing of Recorded Video and Use of MATLAB Tools
To process the recorded videos and quantitatively detect the positions of the markers on the video frames, an application based on OpenCV version 4.10 is developed. This application processes every frame of the input video, detects the markers on each frame, draws their trajectories, and calculates the motion vectors of each marker for temporal distances of 5 and 25 frames.
In order to detect the markers on the video frames, Canny edge detection, dilation, erosion, and blob detection algorithms are used. Since the markers appear smaller in the captured video as the flight level increases, the detection parameters are varied accordingly.
During the processing of video frames with OpenCV tools, only the luma component is used, and the chroma is discarded.
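A minimal sketch of this detection pipeline in Python with the OpenCV bindings is given below. The Canny thresholds, kernel size, and area limits are illustrative assumptions rather than the tuned values of the actual application; only the overall flow (luma extraction, edge detection, morphology, blob detection) follows the description above.

```python
import cv2
import numpy as np

def detect_markers(frame_bgr, flight_level_m):
    # Only the luma component is used; chroma is discarded.
    luma = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Canny edge detection followed by morphological cleanup.
    edges = cv2.Canny(luma, 50, 150)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(edges, kernel, iterations=2)
    mask = cv2.erode(mask, kernel, iterations=1)

    # Blob detection; the minimum blob area shrinks with altitude,
    # since markers appear smaller at higher flight levels
    # (the 2000/H scaling is an assumed placeholder, not the paper's value).
    params = cv2.SimpleBlobDetector_Params()
    params.filterByColor = True
    params.blobColor = 255          # detect bright blobs on the binary mask
    params.filterByArea = True
    params.minArea = max(20.0, 2000.0 / flight_level_m)
    detector = cv2.SimpleBlobDetector_create(params)

    keypoints = detector.detect(mask)
    return [kp.pt for kp in keypoints]  # (x, y) marker centers in pixels
```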
Figure 4 illustrates the processing stages of a sample frame captured at flight level H = 10 m.
During the processing of the captured videos, the initial seconds, which contain the acceleration of the drone to the requested flight parameters, and the final seconds, which contain the deceleration to a stationary state, are discarded. Only the steady-state parts of the video with the requested flight parameters are processed. A total of 154 video files, amounting to 91 GB, are recorded and processed in this way.
Afterwards, two-dimensional polynomial surface fitting is applied to the motion vectors of the markers extracted from the recorded video files by using the MATLAB R2024b environment. The fit() function with the poly11, poly22, or poly33 setting is used for this purpose.
Finally, during the performance evaluation of the constructed motion models, histograms of the error values are drawn. For this purpose, the histogram(), mean(), and std() functions are used, and the resulting graphics are included in this paper.
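The same evaluation step can be sketched in Python as follows; the computation matches what the MATLAB histogram(), mean(), and std() calls produce, and the array and function names are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def evaluate_model_error(mv_actual, mv_model, label):
    # mv_actual and mv_model are (N, 2) arrays of OpenCV-detected and
    # model-predicted motion vectors (MVx, MVy) for one motion type
    # and one temporal distance.
    error = mv_model - mv_actual               # per-marker model error
    for i, component in enumerate(("MVx", "MVy")):
        mu = error[:, i].mean()                # mean error
        sigma = error[:, i].std()              # standard deviation
        plt.hist(error[:, i], bins=50)
        plt.title(f"{label} {component}: mean={mu:.3f}, std={sigma:.3f}")
        plt.xlabel(f"{component} error (pixels)")
        plt.ylabel("count")
        plt.show()
```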
4.5. Modeling of Basic Translational Motion
In order to independently model the basic translational motion of the drone, 90 video files are captured and processed at all combinations of H = {5, 10, 15, 20, 25, 30, 40} (flight level, m) and V = {1, 2, 3, 4, 5, 6} (translational velocity, m/s).
As illustrated in Figure 5, the flight plan over the marked terrain is prepared and verified. The drone repeatedly flies over the marked terrain at different flight levels with a fixed translational velocity. The flight plan is repeated for all available translational velocity settings. During the straight forward flight paths, the video coming from the downward-facing camera onboard the drone is captured.
After all the flights are complete, the set of 90 video files is processed using the analysis software written with OpenCV 4.10; the markers are detected, the marker trajectories are visualized, and the motion vectors corresponding to the markers are calculated.
Figure 6a illustrates an example of how the marker trajectories (orange lines) are visualized for the basic translational motion type. The video used as an example here is shot at flight level H = 25 m with a translational velocity of V = 3 m/s. Since the drone is moving straight forward, the markers in the video are translating downwards.
By using the positions of the markers on each frame, motion vectors corresponding to every marker at every frame are calculated for two different temporal distances.
Figure 6b illustrates the motion vectors for frame #648 of the example video file for a temporal distance of TD = 5 frames, whereas Figure 6c illustrates the motion vectors for TD = 25.
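The per-marker motion vector computation itself is straightforward. A minimal sketch is given below, assuming marker positions have already been associated across frames (the association/tracking step is abstracted away here) and that a motion vector is defined as the displacement from the position TD frames earlier:

```python
# tracks: dict mapping a marker ID to a list of (x, y) pixel positions,
# one entry per frame, produced by the detection step above.
def motion_vectors(tracks, frame_idx, td, frame_w, frame_h):
    rows = []
    for marker_id, positions in tracks.items():
        if frame_idx < td or frame_idx >= len(positions):
            continue
        xa, ya = positions[frame_idx]
        xp, yp = positions[frame_idx - td]
        mvx, mvy = xa - xp, ya - yp          # displacement over TD frames
        xn, yn = xa / frame_w, ya / frame_h  # normalized coords in [0, 1)
        rows.append((xa, ya, xn, yn, mvx, mvy))
    return rows  # one CSV row per marker, as in Table 1
```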
For every video frame of the video file, the detected motion vectors for a fixed temporal distance value are dumped to a CSV file in the format illustrated in Table 1.
(Xa,Ya) are the absolute and (Xn,Yn) are the normalized coordinates of the motion vectors. In reality, these correspond to the marker location on the video frame. Normalized coordinates are derived by dividing the absolute coordinates by the resolution of the video frame and vary in the range [0, 1); for example, in a 3840 × 2160 frame, a marker at (Xa,Ya) = (1920, 1080) has (Xn,Yn) = (0.5, 0.5). (MVx,MVy) denotes the X and Y components of the calculated motion vector in pixels. H denotes the flight level.
This file is then analyzed by MATLAB R2024b tools, and two-dimensional surface fits are performed for the motion vector components. To illustrate the dataset used to model MVx, the corresponding data in Table 1 is marked with a blue background. Both poly11 and poly22 type fits are performed. As a result, for this flight level and translational speed setting, a polynomial model for the motion vectors for that temporal distance value is constructed as in the following equation:

$$\overrightarrow{MV}(x_n, y_n) = \vec{p}_{00} + \vec{p}_{10}\, x_n + \vec{p}_{01}\, y_n \tag{2}$$

where $(x_n, y_n)$ are the normalized coordinates of the location of the object on the video frame in the range [0, 1), and $\overrightarrow{MV}$ is the 2D motion vector for that temporal distance value. The polynomial coefficients $\vec{p}_{00}$, $\vec{p}_{10}$, and $\vec{p}_{01}$ are also 2D vectors.
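A least-squares sketch of this surface fit in Python is given below; it mirrors what MATLAB's fit() computes for the poly11 and poly22 settings for one MV component (function and variable names are illustrative):

```python
import numpy as np

def fit_poly11(xn, yn, mv_component):
    # Design matrix for MV = p00 + p10*xn + p01*yn; xn, yn, and
    # mv_component are 1D arrays with one entry per detected motion vector.
    A = np.column_stack([np.ones_like(xn), xn, yn])
    coeffs, *_ = np.linalg.lstsq(A, mv_component, rcond=None)
    return coeffs  # [p00, p10, p01]

def fit_poly22(xn, yn, mv_component):
    # Adds the second-order terms p20*xn^2 + p11*xn*yn + p02*yn^2.
    A = np.column_stack([np.ones_like(xn), xn, yn,
                         xn**2, xn * yn, yn**2])
    coeffs, *_ = np.linalg.lstsq(A, mv_component, rcond=None)
    return coeffs  # [p00, p10, p01, p20, p11, p02]
```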
Figure 7 illustrates the surface fits for the X- and Y-components of the motion vectors for the example video file for a temporal distance of 5 frames. In this illustration, the OpenCV-detected motion vector components are shown as the blue-white grid, the poly11 fit as the planar yellow-black surface, and the poly22 fit as the curved green surface.
Table 2 illustrates the polynomial fit parameters for each MV component.
This procedure is repeated for every translational motion test video captured at different flight levels (H) and translational velocities (V) for the same temporal distance setting. In this way, coefficient sets for the poly11 and poly22 fits are obtained for the translational motion vectors with respect to the normalized object coordinates on the video frame. Because poly22 fits introduce non-linear behavior, the poly11 fits are selected.
As illustrated in Table 3, the poly11 fit coefficients for different flight parameters are grouped together and dumped to a new CSV file. In other words, the polynomial surface fit coefficients calculated for all specific flight parameters are accumulated in this new file. The first two columns are the flight parameters, the following three columns are the poly11 surface fit coefficients of the X-component, and the final three columns are the poly11 surface fit coefficients of the Y-component of the motion vectors.
To illustrate this process, the flight parameters shown by orange cells, the poly11 fit coefficients of MVx shown by blue cells, and the poly11 fit coefficients of MVy shown by green cells in Table 2 are copied to the corresponding locations in the CSV file, as shown in Table 3.
Our objective at this stage is to construct secondary models for the poly11 surface fit parameters $p_{00}^x$, $p_{10}^x$, $p_{01}^x$, $p_{00}^y$, $p_{10}^y$, and $p_{01}^y$ in Table 3 in terms of the flight parameters H and V. For each of the six poly11 surface fit parameters listed above, poly11-, poly22-, and poly33-based secondary surface fits with respect to the flight parameters H and V are performed, and polynomial models as in (5) are defined.
To explain this process further, by using the data illustrated in Table 4, the surface fits for the $p_{00}^x$ and $p_{00}^y$ coefficients illustrated in Figure 8 are obtained. In Table 4, the grey cells illustrate the data used to derive the model for the $p_{00}^x$ fit parameter, and the italic cells illustrate the data used to derive the model for the $p_{00}^y$ fit parameter.
In Figure 8, the X and Y coordinates are the flight parameters for the translational motion, and the Z coordinate is the $p_{00}$ coefficient for the X- and Y-components of the motion vector. The coefficient data derived at the preceding step is illustrated by the blue-white grid structure, and the poly33 fit by the dark blue curved surface. This procedure is repeated for the $p_{00}^x$, $p_{10}^x$, $p_{01}^x$, $p_{00}^y$, $p_{10}^y$, and $p_{01}^y$ fit coefficients listed in Table 3, and the poly33 fit is preferred. The motion model obtained for the translational motion is in the form of (3) and (4)

$$MV_x(x_n, y_n) = p_{00}^x(H, V) + p_{10}^x(H, V)\, x_n + p_{01}^x(H, V)\, y_n \tag{3}$$

$$MV_y(x_n, y_n) = p_{00}^y(H, V) + p_{10}^y(H, V)\, x_n + p_{01}^y(H, V)\, y_n \tag{4}$$

where each coefficient is modeled as a general third-order polynomial function of the flight parameters H (flight level) and V (translational velocity) in the form of (5):

$$p(H, V) = q_{00} + q_{10} H + q_{01} V + q_{20} H^2 + q_{11} H V + q_{02} V^2 + q_{30} H^3 + q_{21} H^2 V + q_{12} H V^2 + q_{03} V^3 \tag{5}$$
For every parameter, there is a different set of [$q_{00}$, $q_{10}$, $q_{01}$, $q_{20}$, $q_{11}$, $q_{02}$, $q_{30}$, $q_{21}$, $q_{12}$, $q_{03}$] third-order surface fit coefficients. Six different parameters, each with a ten-element coefficient set, give us a sixty-parameter motion model for the temporal distance of TD = 5 frames.
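Putting the two fitting stages together, evaluating the translational model reduces to a nested polynomial evaluation. The sketch below assumes the coefficient layout described above (six poly11 parameters, each with a ten-element poly33 set ordered as in (5)); it illustrates the model structure rather than reproducing the actual implementation:

```python
import numpy as np

def eval_poly33(q, h, v):
    # q is a ten-element coefficient set of (5), ordered
    # [q00, q10, q01, q20, q11, q02, q30, q21, q12, q03].
    terms = np.array([1, h, v, h**2, h * v, v**2,
                      h**3, (h**2) * v, h * (v**2), v**3])
    return float(np.dot(q, terms))

def eval_translational_model(Q, h, v, xn, yn):
    # Q is a (6, 10) array: one poly33 coefficient row for each of the
    # six poly11 parameters [p00x, p10x, p01x, p00y, p10y, p01y].
    p00x, p10x, p01x, p00y, p10y, p01y = (eval_poly33(q, h, v) for q in Q)
    mvx = p00x + p10x * xn + p01x * yn   # Equation (3)
    mvy = p00y + p10y * xn + p01y * yn   # Equation (4)
    return mvx, mvy
```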
This procedure is run for two different temporal distances: 5 and 25 frames. Two different basic translational motion models for the motion vectors are constructed, as explained in the preceding paragraphs. The coefficient set for TD = 5 is listed in Table 5, and the coefficient set for TD = 25 is listed in Table 6.
4.6. Modeling of Basic Rotational Motion
In order to independently model the basic rotational motion of the drone, 40 video files are captured and processed at all combinations of H = {5, 10, 15, 20, 25, 30, 40} (flight level, m) and ω = {20, 30, 40, 50, 60} (rotational rate, deg/s).
As illustrated in Figure 9, the flight plan over the marked terrain is prepared and verified. The drone repeatedly flies over the marked terrain at different flight levels, with a fixed rotational rate around its Z-axis. It is elevated to the desired flight level, settles there, starts rotating clockwise at the desired rate, stops when a predetermined time elapses, elevates to the next flight level and rotates again, and so forth. During the rotational motion around its Z-axis, the flight level is kept constant, and the video coming from the downward-facing camera onboard the drone is captured.
After all the flights are complete, the set of 40 video files is processed using the analysis software written with OpenCV: the acceleration and deceleration transients at the beginning and end of the video files are discarded, the markers are detected in the steady-state rotation phase, the marker trajectories are visualized, and the motion vectors corresponding to the markers are calculated.
Figure 10a illustrates an example of how the marker trajectories are visualized for the basic rotational motion type. The video used as an example here is shot at flight level H = 20 m with a rotational rate of ω = 30 degrees/s. Since the drone is rotating around its Z-axis, the markers in the video are in circular motion. The orange lines are the trajectories of the markers.
By using the positions of the markers on each frame, motion vectors corresponding to every marker at every frame are calculated for two different temporal distances.
Figure 10b illustrates the motion vectors for frame #261 of the example video file for the temporal distance of TD = 5 frames, whereas Figure 10c illustrates them for TD = 25 frames.
For every video frame of the video file, the detected motion vectors for a fixed temporal distance value are dumped to a CSV file in the format illustrated in Table 7.
(Xa,Ya) are the absolute and (Xn,Yn) are the normalized coordinates of the motion vector; the normalized coordinates vary in the range [0, 1). (MVx,MVy) denotes the X and Y components of the calculated motion vector in pixels. H denotes the flight level; since the flight level is constant during the translational and rotational flights, this column is not used for the analysis and modeling of those motion types, only for the analysis of the elevational (ascending/descending) motion.
This file is then analyzed by MATLAB tools, and two-dimensional surface fits are performed for the motion vector components. To illustrate the dataset used to model MVx, the corresponding data in Table 7 is marked with a blue background. Both poly11 and poly22 type fits are performed. As a result, for this flight level and rotation rate setting, a polynomial model for the motion vectors for that temporal distance value is constructed as in (2).
Here, $(x_n, y_n)$ are the normalized coordinates of the location of the object on the video frame in the range [0, 1), and $\overrightarrow{MV}$ is the 2D motion vector for that temporal distance value. The polynomial coefficients are also 2D vectors.
Figure 11 illustrates the surface fits for the X- and Y-components of the motion vectors for the example video file for a temporal distance of 5 frames. In this illustration, the OpenCV-detected motion vector component values are shown as the blue-white grid, the poly11 fit as the planar yellow-black surface, and the poly22 fit as the curved green surface.
Table 8 illustrates the polynomial fit parameters for each MV component.
As can be observed from Figure 11, the fits are mostly planar, not curved. It can also be observed from Table 8 that the fit coefficients [$p_{00}$, $p_{10}$, $p_{01}$] of the poly11 and poly22 type fits are nearly identical, and that the remaining coefficients of the poly22 fit are very small compared to them. This shows that the difference between the poly11 and poly22 fits will be incremental.
This procedure is repeated for every rotational motion test video captured at different flight levels (H) and rotational rates (ω) for the same temporal distance setting. In this way, coefficient sets for the poly11 and poly22 fits are obtained for the rotational motion vectors with respect to the normalized object coordinates on the video frame.
As illustrated in Table 9, the poly11 fit coefficients for different flight parameters are grouped together and dumped to a new CSV file. The first two columns are the flight parameters, the following three columns are the poly11 surface fit coefficients of the X-component, and the final three columns are the poly11 surface fit coefficients of the Y-component of the motion vectors.
To illustrate this process, the flight parameters shown by orange cells, the poly11 surface fit coefficients of the MVx component shown by blue cells, and the poly11 surface fit coefficients of the MVy component shown by green cells in Table 8 are copied to the corresponding locations in the new CSV file, as illustrated in Table 9.
Additionally, as illustrated in Table 10, the poly22 fit coefficients for different flight parameters are also grouped together and dumped to another CSV file. The first two columns are the flight parameters, the following six columns are the poly22 surface fit coefficients of the X-component, and the final six columns are the poly22 surface fit coefficients of the Y-component of the motion vectors.
To illustrate this process, the flight parameters shown by orange cells, the poly22 surface fit coefficients of the MVx component shown by blue cells, and the poly22 surface fit coefficients of the MVy component shown by green cells in Table 8 are copied to the corresponding locations in the new CSV file, as illustrated in Table 10.
Our objective at this stage is to construct secondary models for the six poly11 surface fit parameters $p_{00}^x$, $p_{10}^x$, $p_{01}^x$, $p_{00}^y$, $p_{10}^y$, and $p_{01}^y$ in Table 9 and the twelve poly22 surface fit parameters $p_{00}^x$, $p_{10}^x$, $p_{01}^x$, $p_{20}^x$, $p_{11}^x$, $p_{02}^x$, $p_{00}^y$, $p_{10}^y$, $p_{01}^y$, $p_{20}^y$, $p_{11}^y$, and $p_{02}^y$ in Table 10 in terms of the flight parameters H and ω. Therefore, for each and every one of those parameters, poly11, poly22, and poly33 surface fits with respect to the flight parameters H and ω are performed. This is again performed for the same temporal distance.
To explain this process further, the data marked with italic numbers in Table 9 is used to construct the surface fits for the $p_{00}$ coefficient of the poly11 fit model, and the data marked with italic numbers in Table 10 is used to construct the surface fits for the $p_{00}$ coefficient of the poly22 fit model. All coefficients are surface-fitted for both the poly11 and the poly22 fit model, and the performance of both models is compared.

Figure 12 illustrates the poly33 surface fits for the $p_{00}^x$ and $p_{00}^y$ coefficients of the poly22 fit model. Please note that the X and Y coordinates are the flight parameters for the rotational motion, and the Z coordinate is the $p_{00}$ coefficient for the X- and Y-components of the motion vector. The coefficient data derived at the preceding step is illustrated by the blue-white grid structure, and the poly33 fit by the dark blue curved surface. Note that the fitting surfaces are almost planar.
The motion model obtained for the basic rotational motion in the case of the poly11 fit is in the following form:

$$\overrightarrow{MV}(x_n, y_n) = \vec{p}_{00} + \vec{p}_{10}\, x_n + \vec{p}_{01}\, y_n$$

whereas in the case of the poly22 fit, it is in the following form:

$$\overrightarrow{MV}(x_n, y_n) = \vec{p}_{00} + \vec{p}_{10}\, x_n + \vec{p}_{01}\, y_n + \vec{p}_{20}\, x_n^2 + \vec{p}_{11}\, x_n y_n + \vec{p}_{02}\, y_n^2$$

where each coefficient is modeled as a general third-order polynomial function of the flight parameters H (flight level) and ω (rotational rate) in the following form:

$$p(H, \omega) = q_{00} + q_{10} H + q_{01} \omega + q_{20} H^2 + q_{11} H \omega + q_{02} \omega^2 + q_{30} H^3 + q_{21} H^2 \omega + q_{12} H \omega^2 + q_{03} \omega^3$$

For every parameter, there is a different set of [$q_{00}$, $q_{10}$, $q_{01}$, $q_{20}$, $q_{11}$, $q_{02}$, $q_{30}$, $q_{21}$, $q_{12}$, $q_{03}$] third-order surface fit coefficients. For the poly11 fit-based motion model, there are six different parameters, each with a ten-element coefficient set, giving us a 60-parameter motion model, whereas for the poly22 fit-based motion model, there are twelve different parameters, each with a ten-element coefficient set, giving us a 120-parameter motion model.
The coefficients of the poly11 and poly22 fits are so close that the difference between them is incremental. The success rates of both models are studied, and no meaningful difference between them is observed. For the temporal distance TD = 5 model, the poly11 fit-based rotational motion model is selected, whereas for the temporal distance TD = 25 model, the poly22 fit-based rotational motion model is selected for higher precision, since the motion vectors are longer.
This procedure is run for two different temporal distances, 5 and 25 frames, and two different basic rotational motion models for the motion vectors are constructed, as explained in the preceding paragraphs. The coefficient set for TD = 5 is listed in Table 11, but the coefficient set for TD = 25 is omitted due to space concerns. Please note that, for every coefficient, the first three fit parameters [$q_{00}$, $q_{10}$, $q_{01}$] are larger than the rest of the parameters, which yields almost flat surfaces.
4.7. Modeling of Basic Elevational (Ascending/Descending) Motion
In order to model the basic elevational motion, 10 video files are captured and processed for E = {−3, −2, −1, 1, 2, 3} (ascend/descend rate, m/s). As illustrated in Figure 13, the flight plan over the marked terrain is prepared and verified in the simulator environment.
The drone starts at ground level, ascends to a 45 m flight level with a fixed elevational rate, settles there, starts descending with the same rate, and finally lands on the ground. During the elevational motion along its Z-axis, the video coming from the downward-facing camera is captured. This sequence is repeated for different elevational rates as stated.
After all the flights are complete, the set of 10 video files is processed using the analysis software written with OpenCV: the acceleration and deceleration transients at the beginning and end of the video files are clipped, the markers are detected in the steady-state ascending/descending phase, the marker trajectories are visualized, and the motion vectors corresponding to the markers are calculated.
Figure 14a illustrates an example of how the marker trajectories are visualized for the elevational motion type. The video used as an example here is shot with an elevational rate of E = −3 m/s. The video frame in the figure corresponds to flight level H = 30 m. Since the drone is descending along its Z-axis, the markers in the video are moving away from the center towards the periphery. The orange lines are the trajectories of the markers.
By using the positions of the markers on each frame, motion vectors corresponding to every marker at every frame are calculated for two different temporal distances.
Figure 14b illustrates the motion vectors for frame #159 of the example video file for the temporal distance TD = 5 frames, whereas Figure 14c illustrates the motion vectors for the temporal distance TD = 25 frames. The flight level is H = 30 m for those samples.
Exactly the same procedure applied for the modeling of the basic rotational motion is also applied for the basic elevational motion. The motion vectors for the markers on the video frames are extracted, and 2D surface fits with respect to the normalized object coordinates are applied.
At this stage, poly11 and poly22 fits are utilized. Afterwards, poly33 fits to those surface fit coefficients with respect to the flight parameters are carried out. For the temporal distance TD = 5 model, the poly11 fit-based elevational motion model is selected, whereas for the temporal distance TD = 25 model, the poly22 fit-based elevational motion model is selected for higher precision, since the motion vectors are longer. This procedure is run for two different temporal distances, 5 and 25 frames, and two different basic elevational motion models are constructed. The coefficient set for TD = 5 is listed in Table 12.
4.9. Test Videos with Complex Motion
In order to test the complex motion model derived in the previous section, complex motion videos including two or more basic motion types are recorded. They are as follows:
Orbit, a combination of translation and rotation;
Ramp, a combination of translation and elevation;
Helix, a combination of translation, rotation, and elevation.
These videos are captured in the same way as the basic motion test videos: the same terrain, the same auto-pilot application, and the same drone are used. During the examination of the captured video files, the same procedures explained in the previous sections are followed.
4.9.1. Orbit
This motion is a combination of translational and rotational motion. At a fixed flight level (H), the drone performs a circular motion around a center point. The drone moves in the forward direction, not sideways. The translational velocity and the rotational rate of the drone are constant. The camera faces forward and is pitched down to −90 degrees with respect to the horizon, i.e., it is downward-facing.
Four different videos with the following flight parameters are captured: H = 10 m, V = 3 m/s, ω = 10 degrees/s; and H = 20 m, V = 5 m/s, ω = 20 degrees/s.
The flight plan over the marked terrain is prepared and verified as described earlier. Figure 15 illustrates the aerial and 3D views of it.
Figure 16 illustrates an example of how the marker trajectories are visualized for this type of motion. It shows an example frame from the orbit video captured with flight parameters H = 20 m, V = 5 m/s, and ω = 20 degrees/s.
4.9.2. Ramp
This motion is a combination of translational and elevational motion. At a starting flight level of H1 = 5 m, the drone starts moving forward and reaches its target translational speed. At a fixed waypoint in its flight plan, it starts ascending at a fixed elevational rate and keeps going until the desired flight level H2 is reached. During this ramp-up motion, it turns on its camera and captures video.
The drone moves in the forward direction, not sideways. The translational velocity and the elevational rate of the drone are constant. The camera faces forward and is pitched down to −90 degrees with respect to the horizon, i.e., it is downward-facing.
Four different videos with the following flight parameters are captured: V = 2 m/s, E = 1 m/s; and V = 4 m/s, E = 2 m/s.
The flight plan over the marked terrain is prepared and verified as described earlier. Figure 17 illustrates the aerial and 3D views of it.
Figure 18 illustrates an example of how the marker trajectories are visualized for this type of motion. It shows an example frame from the ramp video captured with flight parameters V = 4 m/s and E = 2 m/s. In this frame, the drone is at a flight level of 21.70 m and ascending.
4.9.3. Helix
This motion is a combination of all of the basic motion types: translational, rotational, and elevational motion at the same time. The drone starts performing an orbit motion but also starts ascending.
The drone moves in the forward direction, not sideways. The translational velocity, the rotational rate, and the elevational rate of the drone are constant. The camera faces forward and is pitched down to −90 degrees with respect to the horizon, i.e., it is downward-facing.
Six different videos with the following flight parameters are captured: V = 5 m/s, ω = 15 deg/s, E = 1 m/s; and V = 3 m/s, ω = 10 deg/s, E = 0.5 m/s.
The flight plan over the marked terrain is prepared and verified as described earlier. Figure 19 illustrates the aerial and 3D views of it.
Figure 20 illustrates an example of how the marker trajectories are visualized for this type of motion. It shows an example frame from the helix-type video captured with flight parameters V = 5 m/s, ω = 15 deg/s, and E = 1 m/s. In this frame, the drone is at a flight level of 18.50 m.
Those complex motion videos are processed with our OpenCV-based utility as described previously; the positions of the markers are detected, and the actual motion vectors of the markers for the temporal distances of 5 and 25 frames are calculated.
Afterwards, using the positions of the markers, the motion vectors are calculated by utilizing the complex motion model we built in (11).
The difference between the actual motion vectors and the calculated ones is the model error, and thus a direct indication of how successful our final complex motion model is. The results for the basic and complex motion types are evaluated in the following section.