UAV Object Tracking Application Based on Patch Color Group Feature on Embedded System

Abstract: Discriminative object tracking systems for unmanned aerial vehicles (UAVs) are widely used in numerous applications. Although ample research has been carried out in this domain, implementing a low-computational-cost algorithm on a UAV onboard embedded system remains challenging. To address this issue, we propose a low-complexity discriminative object tracking system for UAVs based on the patch color group feature (PCGF) framework. The tracked object is separated into several non-overlapping local image patches, and the features are extracted into PCGFs, which consist of Gaussian mixture models (GMMs). The object location is calculated by comparing similar PCGFs between the previous frame and the current frame. The background PCGFs of the object are removed by four-direction feature scanning and dynamic threshold comparison, which improves accuracy. In terms of execution speed, the proposed algorithm achieves 32.5 frames per second (FPS) on an x64 CPU platform without a GPU accelerator and 17 FPS on a Raspberry Pi 4. Therefore, this work offers a practical solution for running the low-complexity PCGF algorithm on a UAV onboard embedded system and thereby extending flight times.


Introduction
Unmanned aerial vehicles (UAVs) have evolved rapidly, and UAV analysis applications abound in various fields, such as transportation engineering systems [1], UAV bridge inspection platforms [2], UAV-based traffic analysis [3], and oil pipeline patrol and factory inspection [4]. In recent years, several object monitoring strategies have been proposed for discriminative object tracking. Low computational complexity, accurate object tracking, real-time operation, and embedded-system implementation have all become necessary requirements. In general, UAV object tracking algorithms are categorized into deep learning (DL) methods and generic methods.
The DL method has been used repeatedly in many works for data compression [5], noise reduction [6], image classification [7], speech recognition [8], disease diagnosis [9], and so on. To extract the important features from input data, a DL model consists of several convolutional layers, activation functions, and pooling layers. The loss function compares the DL output with the ground truth. To reach higher accuracy, the model needs to be pre-trained before being applied to real cases [10]. In DL tracking [11][12][13][14][15][16][17][18][19][20][21][22], CNNs [11,12,13,16,22] and RNNs [18,19,21] are used to achieve higher accuracy in discriminative object tracking. DL models deliver powerful tracking performance. However, they require a high-end computational platform, a high-performance x86-64 multi-core CPU, a GPU accelerator, and enormous computation, all of which are also power-consuming. It is therefore hard to run DL tracking on UAV onboard embedded systems within their resource limitations.
In the generic method, the image features of each frame are obtained through filters, and these features are then used to estimate the object's location. Recently, the discriminative correlation filter (DCF) has become a popular and efficient approach for extracting features for discriminative object tracking [23][24][25][26][27][28][29][30][31][32][33]. The objective of the DCF is to find suitable parameters that maximize the response of the features extracted from the object. Because of the cyclic correlation operation, the boundary effect caused by periodic expansion degrades DCF tracking performance [24]. In [25], a regularization term is applied to suppress the boundary effect and enhance performance. However, when the object moves faster than the local response varies between frames, the missing information cannot be recovered under the regularization constraint. Additionally, the extra computation increases complexity, and the FPS decreases significantly in these extended DCF variants.
In this work, we propose the PCGF algorithm, a generic approach that works well while minimizing computational complexity and enhancing the efficiency of the object tracking control system on UAVs. The PCGF is made up of four GMMs [34] derived from the hue, saturation, and value (HSV) color model. The tracked object is divided into numerous non-overlapping patches, which are then converted into PCGFs to represent its attributes. By comparing the PCGFs of the previous and current frames, the object's location is calculated. The background features are removed from the object window, ensuring that the current PCGFs contain no background information. In addition, we assume the object does not move very fast, so the pixel difference from one frame to the next is small; the object position in the frame sequence therefore only needs to be searched around the previous position to find matching features. The main contributions of this work are summarized as follows:

- An efficient PCGF algorithm for a generic approach is proposed to represent the object features and compare the feature matching score between the previous frame and the current frame.
- We introduce a background subtraction technique for the PCGF to eliminate background characteristics and ensure the object features are preserved.
- The proposed method has been implemented on an embedded system; in addition, it achieves real-time speed on a CPU platform without a GPU accelerator.
The rest of this paper is organized as follows: Section 2 revisits the introduction of the previous works. Section 3 describes the proposed method. Section 4 exhibits experimental results as well as related discussions. Finally, the concluding remarks are drawn in Section 5.

Deep Learning-Based Tracking
The DL-based method consists of neural network layers containing millions of parameters, and training finds values for every parameter that minimize the error between the output and the ground truth. In recent years, the DL-based method has been broadly used for discriminative tracking: the object location is predicted from features extracted through the hidden layers of the network. In [11,12,14,17,18,23], a fully convolutional neural network was used to extract features from the input image, and the pre-trained VGG Net [7] was adopted as the first layers to improve the object feature representation [11,12,17,18,21,22]. In [11], two networks, the generic network (GNet) and the specialized network (SNet), were used to produce two heat maps from the region of interest (ROI) of the input image; the object position was then identified by distracter detection based on the two heat maps. Yun et al. [21] presented an action-decision network that anticipates the object's action from the tracking sequence. Recurrent neural networks (RNNs) exploit the strong correlation between video frames [19,20,22]. The object feature was compared against changing temporal features in multiple spatial LSTM cells in [22], and multi-directional RNNs were used to calculate a spatial confidence map in [19], with evident background suppression in the results. The authors of [20] utilized the Faster R-CNN to detect all probable object candidates in the local search area, and a three-stage LSTM cascade decided whether the tracker should update its location based on the search region. A coarse tracker and a fine tracker [12,15,22] were utilized to execute a coarse-to-fine search in [15]. The object location was predicted from a combined response map based on the divided parts of the object in [18]. The Siamese network, which learns a similarity response between the template and the search region, was proposed in [16].
In addition, various DL-based tracking systems incorporate correlation filters (CFs) [12,17,18,19,22]; combining the CF and DL-based methods can improve the accuracy of discriminative tracking. In short, DL-based discriminative tracking methods perform well, but they are not practical on a UAV onboard platform due to the enormous computation required.

Generic Method
To enhance the object features in the generic approach, a filter is designed to reinforce the discriminative object characteristics; the object location is determined by the filter response, and the filter coefficients are updated every frame. Discriminative correlation filters (DCFs) serve as the framework for object tracking. In [23], the CF processed 256 × 256 images for object tracking and achieved good results on a single-core 2.4 GHz CPU platform. In [24], spatially regularized correlation filters (SRDCF) were proposed to eliminate the boundary effect. Fu et al. [25] fed images into a background-aware correlation filter (BACF) at different levels to obtain multiple features. Shi et al. [26] proposed an effective visual object-tracking algorithm based on locally adaptive regression kernels, implemented with two support vector machines. In [27], the Software de Reconhecimento de Cor e Contorno Ativo (SRCCA, a color-recognition and active-contour system) is presented as a target recognition and tracking system that uses the hue, saturation, and value color space for color-based object identification in an active contour model. The Kalman filter was used for preliminary tracking in [28], and the results were refined using a saliency map for local detection. Choi et al. [29] created a visual tracking algorithm for attention-modulated disintegration and integration, in which a target object is fragmented into various structural characteristic components, such as sizes, colors, and forms. The dual regularized correlation filter (DRCF) presented in [30] mitigates the boundary effect more effectively. The time-slot-based distillation algorithm (TSD) was introduced by Li et al. [31] to improve the BACF.
The log-Gabor filter is utilized to encode texture information during tracking in [32], and boundary distortion may be mitigated via saliency-guided feature selection. In conclusion, the DCF-based tracking system achieves a good balance between speed and accuracy, and the methods described in [24][25][26][27][28] achieved real-time UAV tracking. However, once the target is lost, tracking in subsequent frames may be severely affected. Accuracy is also affected by object deformation, rapid motion, illumination variation, and background complexity.

Proposed Method
Figure 1 depicts a schematic flowchart of the proposed method, which comprises three parts: feature extraction, feature matching, and background removal. The feature extraction part retrieves the target object's characteristics from each current frame: the RGB pixel values are transformed to HSV color space and then divided into numerous non-overlapping blocks. In addition, the object patches are segmented, and each patch's values are combined with a GMM to obtain the hue and saturation (µ, σ) values in four groups. The GMM enhances feature extraction dynamically, and the K-means++ algorithm is used to acquire the center points of the Gaussian distributions before analyzing the image features. The obtained features improve precision under object deformation, illumination variation, rapid motion, and visual obstruction. The features of the object are extracted from the first frame to evaluate the object characteristics, and the block patches are then segmented for the next operation. Before background subtraction, the object features are compared, and the matching score between the current frame and the previous frame is calculated. If similar characteristics are identified, the object information of the current frame is sent to the background removal section; otherwise, the search window is expanded, and the target object is searched for again within the input window. The characteristics of the target object and its background patches are compared to determine whether each patch belongs to the target object. The segmented object image patches are scanned in four directions to eliminate the background from the object window: half the height from top to bottom and from bottom to top, and similarly half the width from left to right and from right to left. The target object's coordinates, width, height, and patch characteristic values, such as dimensions, colors, and forms, are then updated before proceeding to the next frame.

The Function of the Input Window
Initially, the object window is located within the input window, as shown in Figure 2a. The object's upper-left coordinate (Xobj, Yobj) is taken from the input window, and the object window records the object's height (Hobj) and width (Wobj). The characteristic values of the target object and its surrounding pixels are retrieved only from the input window and object window to decrease computational complexity; the object's information is determined in the initial frame. Because the target object moves, the search window is enlarged by 72 pixels on each side of the object window, as shown in Figure 2b, to locate the target object in the current frame.

Patch Segmentation
The object window is captured using the input window, and a 12-pixel margin is placed around it, as seen in Figure 3. Both the object window and the enlarged region are divided into non-overlapping 6 × 6-pixel patches, which fall into two groups: the object patches (P_O) and the outer background patches (P_OB). The height and width of the new window are (Hobj + 24) and (Wobj + 24), respectively. To minimize the complexity and computational cost of object tracking and to speed up performance, the RGB color model is converted to HSV, and only the hue and saturation channels are processed.
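To make the segmentation step concrete, the following sketch converts RGB pixels to hue/saturation and splits a window into non-overlapping 6 × 6 patches. The paper's implementation is C++/OpenCV; this Python version with hypothetical helper names is only a minimal illustration (hue and saturation are kept in the 0..1 range here, not OpenCV's 0..179/0..255).

```python
import colorsys

PATCH = 6  # non-overlapping 6x6-pixel patches, as in the paper

def rgb_to_hs(pixel):
    """Convert an (R, G, B) pixel in 0..255 to (hue, saturation) in 0..1.
    Only H and S are kept, since the paper discards V to save computation."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return (h, s)

def split_into_patches(image):
    """Split a 2-D grid of RGB pixels into non-overlapping PATCH x PATCH
    blocks of (hue, saturation) values. `image[y][x]` is an (R, G, B) tuple."""
    height, width = len(image), len(image[0])
    patches = {}
    for py in range(height // PATCH):
        for px in range(width // PATCH):
            block = [rgb_to_hs(image[py * PATCH + dy][px * PATCH + dx])
                     for dy in range(PATCH) for dx in range(PATCH)]
            patches[(px, py)] = block  # keyed by patch grid coordinate
    return patches
```

A 12 × 12-pixel window, for example, yields a 2 × 2 grid of patches with 36 (hue, saturation) pairs each.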

PCGF Extraction of K means++ and GMM
After converting to the HSV color gamut and determining the patch colors, the next step is to use GMM clusters to represent the feature values of each patch; these are named patch color group features (PCGFs). The hue and saturation values of each patch are then captured, which requires initial center values to be defined before the analysis. The initial center values are determined using the K-means++ algorithm.
The K-means method occasionally produces poor clusters because of excessively random initial settings. To solve this problem, Arthur et al. [33] maximized the distance between each group's initial center locations. We use K-means++ to obtain four cluster center points (K0, K1, K2, K3) in each patch, choosing each new center at the maximum distance from the existing centers. The detailed procedure is as follows. As illustrated in Figure 4a, the HSV histogram is computed first, then one index with a non-zero value is chosen as the initial cluster center point (K0). After the initial cluster center point K0 is found, the second, third, and fourth cluster points (K1, K2, K3) are determined; K1 results from calculating the distances between the non-zero indices and the center point K0. After determining the center values, the closest distance (Dn) between each pixel's hue and saturation value and the i-th group is calculated using Equation (1) over the non-zero index values. Figure 4b depicts the calculated distances, center points, and center values (the distance between two locations is denoted by d_i). The Dn of each pixel is used to determine the new center point of a group: a higher Dn value is more likely to become the new center value K_i. The patch pixels are divided into four groups, so Equations (1) and (2) are performed four times to calculate the center values of the four groups. These four groups provide the initial conditions for the GMM.
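The center-selection procedure described above can be sketched as follows. The paper picks a random non-zero histogram bin for K0 and favours values with a large nearest-centre distance D_n; this sketch takes the first value as K0 and the farthest point each round so that it is reproducible, which is an assumption, not the paper's exact sampling rule.

```python
def kmeanspp_centers(values, k=4):
    """Pick k initial cluster centres from 1-D feature values (e.g. the
    non-zero hue histogram bins of a patch), farthest-point style."""
    centers = [values[0]]  # K0 (the paper chooses a random non-zero bin)
    while len(centers) < k:
        # D_n: distance from each value to its nearest existing centre
        dn = [min(abs(v - c) for c in centers) for v in values]
        # the value with the largest D_n becomes the next centre
        centers.append(values[dn.index(max(dn))])
    return sorted(centers)
```

For a histogram with non-zero bins at 0, 1, 2, 10, 11, 20, 21, and 30, the four centres spread out across the value range rather than clustering near one bin.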
GMMs [34] are prevalent in background subtraction and pattern recognition. Since an image's distribution cannot be described by a single Gaussian distribution, a GMM enables a finer analysis of the data distribution and improves precision. The GMM consists of multiple Gaussian models and is expressed as follows:

p(x) = Σ_{k=1}^{K} w_k φ(x|θ_k), (3)

where x represents the input vector data, K denotes the total number of single Gaussian models (four are used to represent the characteristic values of a single block), k is the group index of a single Gaussian model, w_k represents the probability of an input belonging to the k-th single Gaussian model, and φ(x|θ_k) denotes the Gaussian probability density of the k-th model [θ_k = (µ_k, σ²_k)]. In this study, the color blocks of the input image are grouped: the K-means++ algorithm is first employed to identify the group center values of a color block, and the GMM is used to assign each pixel of the color block to a specific group. To reduce the calculation complexity, the optimization equations proposed by Lin et al. [34], as expressed in Equations (4)-(6), are applied.
Here, α_{k,t} = α_w/ω_{k,t}, where ω_{k,t} represents the probability of the k-th single Gaussian model in the t-th frame, µ_{k,t} represents the mean of the single Gaussian model, σ²_{k,t} represents its variance, and α_w represents the learning rate. Figure 5b shows the hue values computed using the K-means++ method and the GMM for one of the patches seen in Figure 5a. A total of four Gaussian distributions are found, and the characteristic values of the patch's hue are stated in Equation (7), where m and n are the patch's index values. Both hue and saturation values are required for each patch. In the first frame, only the characteristic values of the object window and background window are extracted; in subsequent frames, the characteristic values of the whole search window are extracted.
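A running-average GMM update in the spirit of Equations (4)-(6) can be sketched as below. The matching rule, the 2.5-sigma gate, and the component-replacement step are common conventions for online GMMs and are assumptions here, not the exact formulas of Lin et al. [34].

```python
def gmm_update(components, x, alpha=0.05, match_sigma=2.5):
    """One online update step of a small GMM over a scalar feature x.
    Each component is a dict with 'w' (weight), 'mu' (mean), 'var' (variance)."""
    # find the component whose mean is closest to x, gated by match_sigma stddevs
    best = min(components, key=lambda c: abs(x - c['mu']))
    matched = abs(x - best['mu']) <= match_sigma * best['var'] ** 0.5
    for c in components:
        hit = 1.0 if (matched and c is best) else 0.0
        # running-average weight update with learning rate alpha
        c['w'] = (1 - alpha) * c['w'] + alpha * hit
        if hit:
            rho = alpha / max(c['w'], 1e-9)  # per-component learning rate
            c['mu'] += rho * (x - c['mu'])
            c['var'] += rho * ((x - c['mu']) ** 2 - c['var'])
    if not matched:
        # replace the lowest-weight component with a new one centred at x
        worst = min(components, key=lambda c: c['w'])
        worst.update(w=alpha, mu=float(x), var=100.0)
    # renormalise weights so they sum to one
    total = sum(c['w'] for c in components)
    for c in components:
        c['w'] /= total
    return components
```

Feeding the hue values of a patch through this update, component by component, yields the four (µ, σ) pairs that form the patch's PCGF.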

Background Patch Removal from the Object Window
After defining the characteristic values (P_{m,n}) of every patch, the next step is removing the background patches from the object window. The proposed algorithm extracts the patches and derives a threshold from the background window for every vertical and horizontal line of patches. The threshold values are calculated from the differences between the background characteristic values of P_OB (i.e., thr(m, n) = |P_{m,n} − P_{m+1,n}|), as depicted in Figure 6a,b. The window is split and scanned in four directions: as indicated in Figure 6c,d, half the height is scanned from top to bottom and from bottom to top simultaneously, and likewise half the width from left to right and from right to left. Figure 7 illustrates patch acquisition for the upper half of the object. The upper half of the object patches is scanned and labelled according to Equation (8), where (m, n) is the uppermost threshold (thr) coordinate when the upper half is scanned, and k represents the displacement. The thr value is compared with the characteristics of the patches in the upper half of the object (H_obj). Label_{m,n+k} is marked as 1 if the deviation between any characteristic value of the patch and the threshold is greater than the threshold; otherwise, it is set to 0. To reduce computation, once Label_{m,n+k−1} is determined to be an object patch, the remaining blocks require no further assessment and are directly identified as object patches.
Label_{m,n+k} = { 1, if |thr_{m,n} − P_{m,n+k}| > thr_{m,n} or Label_{m,n+k−1} == 1; 0, otherwise } (8)

Figure 8b shows each column of patches after scanning and computing the upper half, with 1 and 0 patches indicated in white and black, respectively. After finishing the upper- and lower-part scanning, the two vertical scanning masks are combined with an "OR" operation, as shown in Figure 8d. The left and right halves of the patches are scanned in the same way, and an "OR" operation produces the horizontal scanning mask (Figure 8g). Finally, the entire mask is obtained by performing an "AND" operation on the vertical and horizontal scan results; Figure 8h shows the mask acquired by the four-directional scanning.
The object's characteristic values are updated for the next frame to keep monitoring the object's location. Figure 8a depicts the original frame, and Figure 8i depicts the background removal mask combined with the image generated from the scanning results. The object's upper-left coordinate, width, and height obtained in this way define the object frame.
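The four-direction scan and mask combination can be sketched as follows. For simplicity this version uses one scalar deviation per patch and a single threshold, whereas the paper keeps per-patch thresholds derived from P_OB; the label-propagation rule follows Equation (8).

```python
def directional_mask(diff, thr):
    """Background-removal mask from a grid of per-patch deviations
    `diff[row][col]`. Half of each column is scanned top-down and half
    bottom-up (likewise left-right / right-left for rows); the vertical
    masks are OR-ed, the horizontal masks are OR-ed, and the two results
    are AND-ed into the final object mask."""
    rows, cols = len(diff), len(diff[0])

    def scan_vertical(row_order):
        # once a patch exceeds thr, every later patch in the scan inherits
        # the object label (the shortcut described for Eq. (8))
        mask = [[0] * cols for _ in range(rows)]
        for c in range(cols):
            seen = 0
            for r in row_order:
                seen = 1 if (diff[r][c] > thr or seen) else 0
                mask[r][c] = seen
        return mask

    def scan_horizontal(col_order):
        mask = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            seen = 0
            for c in col_order:
                seen = 1 if (diff[r][c] > thr or seen) else 0
                mask[r][c] = seen
        return mask

    top = scan_vertical(range((rows + 1) // 2))                  # up -> down, upper half
    bottom = scan_vertical(range(rows - 1, rows // 2 - 1, -1))   # down -> up, lower half
    left = scan_horizontal(range((cols + 1) // 2))               # left -> right
    right = scan_horizontal(range(cols - 1, cols // 2 - 1, -1))  # right -> left
    vert = [[top[r][c] | bottom[r][c] for c in range(cols)] for r in range(rows)]
    horiz = [[left[r][c] | right[r][c] for c in range(cols)] for r in range(rows)]
    return [[vert[r][c] & horiz[r][c] for c in range(cols)] for r in range(rows)]
```

A patch survives only when both the vertical and the horizontal scans flag it, which trims background strips on all four sides of the object window.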

Feature Matching Based on PCGF
After all the characteristic values in the search window are defined, all patch characteristics within the window are compared with the previous frame to determine the similarity mask (Figure 9). Equation (9) is used to find the coordinate that minimizes the difference between the search window and the object window in Equation (10). Specifically, OW_SET represents the patch set of the object window in the previous frame, SW_SET indicates the search window set, (Q, V) are the search window coordinates, and (m, n) is the block scan coordinate.
After identifying the search coordinate (MatchPos), the feature matching score is calculated by Equation (11). Specifically, DiffMAX is the maximum difference of the color value (set to 255 in this paper), and NS represents the number of patches defined as object in the previous frame. If the feature matching score is lower than 40%, the detection is considered invalid, and the searching system requires additional assessment; otherwise, the system enters the background-removal block to update the upper-left coordinate, width, height, and patch characteristics of the object in the current frame.
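The search and scoring steps can be sketched as below. Patch grids are flattened to one scalar feature per patch for brevity, and the score formula (average per-patch similarity against DiffMAX) is an assumed reading of Equation (11), not a verbatim reproduction.

```python
DIFF_MAX = 255  # maximum colour difference, as set in the paper

def match_position(prev_patches, search_patches, search_w, search_h, obj_w, obj_h):
    """Slide the previous-frame object patch grid over the search-window
    patch grid and return the offset (Q, V) with the minimal total
    difference, in the spirit of Eqs. (9)-(10). Grids are dicts keyed by
    (col, row) patch coordinates holding scalar features."""
    best = None
    for q in range(search_w - obj_w + 1):
        for v in range(search_h - obj_h + 1):
            total = sum(abs(search_patches[(q + m, v + n)] - prev_patches[(m, n)])
                        for m in range(obj_w) for n in range(obj_h))
            if best is None or total < best[0]:
                best = (total, (q, v))
    return best[1]

def match_score(prev_patches, search_patches, pos, obj_w, obj_h):
    """Average per-patch similarity at the matched position, in percent;
    scores below 40% would flag the detection as invalid."""
    q, v = pos
    ns = obj_w * obj_h  # NS: number of object patches from the previous frame
    sim = sum(1.0 - abs(search_patches[(q + m, v + n)] - prev_patches[(m, n)]) / DIFF_MAX
              for m in range(obj_w) for n in range(obj_h))
    return 100.0 * sim / ns
```

A perfect match returns a score of 100%, and the 40% threshold then decides between the background-removal path and the extended search.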

Object Out of Range (Boundary Area)
If the feature matching score is insufficient (<40%), the object may be out of the boundary area, and the searching system assesses whether this is the case. While the object is out of bounds, it may reappear from any edge of the boundary area. Therefore, the search window switches to the boundary area, and its height and width are determined from the object window, as shown in Figure 10.

Object Occlusion
If the feature matching score is low but the object is still inside the window, the object may be obstructed by other objects, as illustrated in Figure 11b. The search window then expands to twice the size of the object window to increase the search range, and the frame maintains its existing specifications until the object is found. Figure 11. (a) The search window extends 72 px from the object when it is unobscured; (b) object occlusion. The yellow box represents the last object label that was detected successfully; the blue box represents the enlarged search window used when the PCGF does not detect the object.
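The window-selection logic of this and the previous subsection can be condensed into one decision function. The function name and tuple return shape are hypothetical; the margins (72 px around the object, a doubled window under occlusion) and the 40% threshold come from the text.

```python
MATCH_THRESHOLD = 40.0  # percent; below this the detection is invalid

def next_search_window(score, obj_w, obj_h, inside_frame):
    """Pick the next frame's search strategy from the feature matching
    score. Returns (mode, search_width, search_height)."""
    if score >= MATCH_THRESHOLD:
        # normal tracking: object window plus a 72-pixel margin on each side
        return ('track', obj_w + 2 * 72, obj_h + 2 * 72)
    if inside_frame:
        # occlusion: the search window grows to twice the object window
        return ('occluded', 2 * obj_w, 2 * obj_h)
    # object left the frame: scan the boundary area for re-entry,
    # sized from the object window
    return ('boundary', obj_w, obj_h)
```

This keeps the per-frame search region as small as the tracking state allows, which is where most of the method's computational savings come from.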

Experimental Results and Analysis
The method is written in C++ with the OpenCV library to assess the experimental results. The code is then ported from the x86-64 platform to the ARM architecture to evaluate the embedded system's speed and accuracy. The accuracy of several algorithms is assessed in Section 4.2, and their execution efficiency is evaluated in Section 4.3.

Measurement Results
The tracking performance on multiple test videos for UAVs is compared with various algorithms. The UAV123 dataset by Mueller et al. [35] is used for this experiment, including its most difficult patterns. Figure 12a,b depicts the precise tracking of a fast-moving and deforming object. Figure 12c shows the method searching for the pictured object after it leaves the window; when the object re-enters the window, the algorithm correctly reacquires it, as seen in Figure 12d. Figure 13 depicts a case in which the tracked object is covered by a tree. Frame #1399 is the final frame in which the PCGF can follow the object, as seen in Figure 13a. Because the object is covered by the tree in #1400-1446, the matching score is less than 40, and the PCGF expands the search window to rediscover the object, as illustrated in Figure 13b. The matching score in #1446 exceeds 40, so the object label can be re-marked, as illustrated in Figure 13c. Figure 14 shows the system correctly tracking a person dressed in the same color as the backdrop (grass), which enhances the object's resemblance to the background. Figure 15 shows the algorithm correctly identifying an object that is partially obscured by leaves and crosses the boundary between sunlight and shade, causing partial occlusion and illumination changes. Furthermore, we compared the results of the PCGF with some other experiments. This work makes a fair comparison with several state-of-the-art tracking algorithms, including three DL algorithms, C2FT [14], PBBAT [17], and ADNet [20], and one generic algorithm, SCT [29]. The five test patterns and the results are shown in Figure 16.
All of the algorithms detect objects well against the simple backgrounds in person1, person7, and bike1. However, in more complex scenery, PBBAT [17] and SCT [29] tracked poorly and lost the object location in wakeboard4 #0417 because of the complicated waves. In car18, the car moved rapidly; most algorithms still framed the target correctly, but with a significant error in the size of the bounding box. Figure 16. Examples of the UAV dataset tracking results. The five columns show the challenging image sequences; from left to right: person1, person7, wakeboard4, car18, and bike1.

Algorithm Precision Assessment
Wu et al. [36] utilized Equation (12) to define and compute the center point location error: the ground truth is defined by x_gt and y_gt in each frame, and the center position of the predicted object window is defined by x_rs and y_rs. To evaluate how accurately an object is tracked, the overlapping rate is calculated by Equation (13), where Win_gt represents the ground-truth object frame and Win_rs the tracking object frame; a higher overlapping rate indicates that the tracking result is closer to the ground truth.
OverlapRate = |Win_rs ∩ Win_gt| / |Win_rs ∪ Win_gt| (13)

After defining the center point location error and overlap rate of each frame, the precision rate is calculated using Equation (14), and the success rate using Equation (15). The label value FT indicates whether the center point location error and overlap rate of a frame exceed their threshold values: if so, FT is set to 1; otherwise, FT is set to 0.
SuccessRate(OverlapThr) = Σ FT(OverlapRate, OverlapThr) / TotalFrame (15)

In terms of precision and success, the proposed approach has been compared with C2FT [14], PBBAT [17], ADNet [20], and SCT [29]. One-pass evaluation (OPE) was used in this experiment to compute the accuracy over 12 distinct patterns and various threshold levels, with the findings shown in Figure 17; the y-axis shows the accuracy and the x-axis the threshold value. Except for person18, person20, car18, and wakeboard5, the object center error between the proposed PCGF's predicted results and the ground truth supplied by UAV123 is less than 50 pixels. The plots of person20, car18, and wakeboard5 reveal that the precision is not close to 1 for any of the compared algorithms, even with the threshold set to 50 pixels; the object center error exceeds 50 pixels in some frames because the background is complicated and the filming angle of the UAV changes continuously. Table 1 shows the precision of the five algorithms when the threshold is set to 20, the same evaluation setting as in [20,29]. For relatively simple patterns such as person1, person7, and person16, no significant precision differences were detected between the algorithms; however, significant differences were observed for more complex patterns such as person18, person20, car18, and boat3. The center position predicted by SCT [29] shows a large gap from the other algorithms, with the same behavior as in frame 417 of wakeboard4 in Figure 16. In the average rating, although PBBAT [17], a deep learning method, shows the highest average precision (marked in green in Table 1), real-time operation is difficult because of its advanced training procedure and highly complex calculations.
However, the proposed generic method achieved the second-highest average precision (marked in blue in Table 1), and it can run in real time in the UAV object tracking application with far less calculation. Figure 18 illustrates the success plot of OPE, which investigates the overlapping rates of the algorithms; x represents the threshold value of the overlapping rate, and y represents the success rate. The result shows that in all patterns there is an intersection between the PCGF result and the UAV123 ground truth. The algorithms reveal significantly different curves in boat3, boat7, wakeboard2, and wakeboard4, and the success rate of SCT [29] does not reach 1 even when the threshold is set to a small value. The overlapping rate is measured by the areas under the curves (AUCs), calculated using Equation (16). The total AUCs are displayed in Table 2; it should be mentioned that the proposed algorithm yields good results, written in green in the table, which are better than those of the deep learning-based PBBAT [17] method. For the overall evaluation of the average precision and the average AUC score with the threshold set to 20, our algorithm reveals strong performance in object tracking. The result indicates that our approach achieves results very close to PBBAT [17], a DL method whose computational cost is much higher than that of our proposed approach.
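The evaluation metrics in Equations (12), (13), and (15) can be sketched directly. Boxes are (x, y, w, h) tuples with (x, y) the upper-left corner; the function names are illustrative.

```python
def center_error(gt, rs):
    """Euclidean distance between the ground-truth and result box
    centres (Eq. (12))."""
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    rx, ry = rs[0] + rs[2] / 2, rs[1] + rs[3] / 2
    return ((gx - rx) ** 2 + (gy - ry) ** 2) ** 0.5

def overlap_rate(gt, rs):
    """Intersection-over-union of the two boxes (Eq. (13))."""
    ix = max(0, min(gt[0] + gt[2], rs[0] + rs[2]) - max(gt[0], rs[0]))
    iy = max(0, min(gt[1] + gt[3], rs[1] + rs[3]) - max(gt[1], rs[1]))
    inter = ix * iy
    union = gt[2] * gt[3] + rs[2] * rs[3] - inter
    return inter / union if union else 0.0

def success_rate(overlaps, thr):
    """Fraction of frames whose overlap rate exceeds thr (Eq. (15));
    sweeping thr over [0, 1] traces the success plot, and the area
    under that curve is the AUC score."""
    return sum(1 for o in overlaps if o > thr) / len(overlaps)
```

Two 10 × 10 boxes offset by half their width, for instance, give a centre error of 5 px and an overlap rate of 1/3.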

Algorithm Speed Assessment
The proposed method was tested on an i5-6400 CPU, which has a 14 nm lithography process, four cores and four threads, a maximum clock speed of 3.3 GHz, and a TDP of 65 W. The PCGF is implemented in C++ with the OpenCV library and achieves 32.5 FPS on this CPU platform. The work can also be carried out on the Raspberry Pi 4, which has a low-power BCM2711 SoC with a quad-core ARM Cortex-A72. The speeds of ADNet [20] and the other compared methods on a high-end GPU platform are listed in Table 3. By contrast, the proposed PCGF algorithm achieves 32.5 FPS on the CPU platform and 17 FPS on the Raspberry Pi 4. Regarding power consumption, the algorithms in [17,20,29] require a GPU accelerator to achieve their reported speeds, which adds 65 W (GTX 650) of power consumption over a CPU-only platform. In addition, the Raspberry Pi 4 requires only 6.4 W under a stress benchmark, saving about 97% of the power consumption of a desktop computer with a 300 W power supply. It is therefore the better solution for running the PCGF on the UAV onboard platform for discriminative tracking.

Discussion
In terms of precision comparability, the PCGF average result is displayed in Table 1. Achieving 100 percent precision is difficult because the UAV123 [35] dataset contains challenging cases involving object occlusion, boundaries, visibility, and so on. In addition, for fields that demand higher precision, such as defense, the PCGF can be further combined with a re-identification or recognition function to reach a higher standard. There remain opportunities for further research in the following areas: (1) to reduce the impact of lighting effects on the PCGF, an additional process is needed to convert the pixels into the HSV color gamut; (2) when the object and its neighboring area have similar colors, the object and background feature values share similar characteristics; (3) the proposed technique incorporates several conditional clauses that limit the speedup obtainable from parallel processing; (4) when the UAV is close to the object, the number of PCGFs increases, affecting the FPS because of the computational cost.

Conclusions
In conclusion, a novel generic algorithm for a discriminative tracking system has been developed. The target object features are represented by PCGFs, which encode the object and its location. The object location is predicted by scanning the search window and calculating the PCGF matching score. Thereby, object pixels and background pixels are distinguished successfully, with strong resistance to object deformation, rotation, and distortion. The proposed technique considerably reduces the computational complexity and achieves real-time processing speed on a CPU platform without a GPU accelerator. Lastly, we implemented the proposed technique on a Raspberry Pi 4 as a discriminative tracking system for UAV applications.

Data Availability Statement: Publicly available datasets (UAV123 [35]) were analyzed in this study. The data can be found here: https://cemse.kaust.edu.sa/ivul/uav123 (accessed on 8 July 2021).

Conflicts of Interest:
The authors declare no conflict of interest.