Element-Weighted Neutrosophic Correlation Coefficient and Its Application in Improving CAMShift Tracker in RGBD Video

Neutrosophic set (NS) theory is a branch of philosophy dealing with the origin, nature, and scope of neutralities. Many correlation coefficients and similarity measures have been proposed in the neutrosophic domain. In this work, considering that the neutrosophic elements T (Truth), I (Indeterminacy), and F (Falsity) may contribute differently, a method of element-weighted neutrosophic correlation coefficient is proposed and applied to improve the CAMShift tracker in RGBD (RGB-Depth) video. The concept of object seeds is proposed and employed for extracting the object region and calculating the depth back-projection. Each candidate seed is represented in the single-valued neutrosophic set (SVNS) domain via three membership functions, T, I, and F. The element-weighted neutrosophic correlation coefficient is then applied to select robust object seeds by fusing three kinds of criteria. Moreover, the proposed correlation coefficient is applied to estimate a robust back-projection by fusing information from the color and depth domains. Finally, for the scale adaption problem, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed to identify the scale. Under challenging factors like fast motion, blur, illumination variation, deformation, and camera jitter, the experimental results show that the improved CAMShift tracker performs well.


Introduction
A neutrosophic set (NS) [1] is suitable for dealing with problems involving indeterminate information. A neutrosophic set is characterized independently via three membership functions, T, I, and F. These functions on the universal set X are real standard or nonstandard subsets of ]−0, 1+[. The non-standard unit interval makes it difficult to introduce NS theory into science and engineering areas. The single-valued neutrosophic set (SVNS) [2] was proposed to handle this problem: its membership functions are restricted to the standard real unit interval [0, 1].
Until now, NS has been successfully applied in many fields [3] such as medical diagnosis [4], image segmentation [5][6][7][8][9], skeleton extraction [10], and object tracking [11,12]. Several similarity measures or multicriteria decision-making methods [13][14][15][16][17][18][19][20] have been employed to handle neutrosophic problems. Decision-making can be regarded as a problem-solving activity terminating in a solution deemed satisfactory. It has been applied to residential house garage location selection [18], element and material selection [19], and sustainable market valuation of buildings [20]. For the application of image segmentation, several criteria in the NS domain were usually proposed for calculating a specific neutrosophic image [5][6][7][8][9]. The correlation coefficient between SVNSs [17] was applied for calculating a neutrosophic score-based image, and a robust threshold was estimated by employing OTSU's method [9]. In [11], two criteria were proposed in the color and depth domains. The information fusion problem was converted into a multicriteria decision-making issue, and the single-valued neutrosophic cross-entropy was employed to tackle it [11]. For the neutrosophic theory-based MeanShift tracker [12], by taking into consideration the background information and appearance changes between frames, two kinds of criteria were considered: the object feature similarity and the background feature similarity. The SVNS correlation coefficient [17] was applied for calculating the weighted histogram, which was then used to enhance the traditional MeanShift tracker. Besides the fields mentioned above, NS theory has also been introduced into clustering algorithms such as c-means [21]. While NS-based correlation coefficients have been widely used for solving engineering issues, the weights of the three membership functions are usually treated equally. However, the different contributions of the T, I, and F elements should be taken into account in decision-making. In this work, a method of element-weighted neutrosophic correlation coefficient is proposed, and we apply it to tracking a visual object in RGBD (RGB-Depth) video.
Visual object tracking is still an open issue [22][23][24], and its robustness still falls short of the demands of vision applications like surveillance, traffic monitoring, video indexing, and auto-driving. For a tracking task, challenges like background clutter, fast motion, illumination variation, scale variation, motion blur, and camera jitter may arise during the tracking procedure. Two broad approaches are used to handle these challenges. One is utilizing robust features. The color feature is employed by the MeanShift [25] and CAMShift [26] trackers, due to its robustness under challenges like deformation, blur, and rotation. Both trackers are highly efficient [27]. However, when the surroundings have a similar color, they may easily drift from the target. To deal with this problem, the Cross-Bin metric [28], scale-invariant feature transform (SIFT) [29], and texture features [30] were introduced into mean shift-based trackers, and better performance was achieved. The other approach is to train robust models, such as multiple instance learning [31,32] and compressive sensing [33]-based tracking. Besides the kinds of trackers mentioned above, the local-global tracker (LGT) [34], incremental visual tracker (IVT) [35], and tracking-learning-detection (TLD) [36] also perform well.
Recently, because depth information provides another dimension for tackling the object tracking problem, some RGBD-based trackers [11,[37][38][39]] have been proposed. Most of these algorithms are oriented to specific targets [37][38][39]; only a few category-free RGBD trackers have been proposed. An improved CAMShift tracker was proposed using the neutrosophic decision-making method in [11], but the indeterminate information was only considered in the information fusion phase. As mentioned above, the CAMShift tracker performs well, with high efficiency, if a discriminative feature can be employed. In this work, based on the CAMShift framework, we focus on tackling the visual tracking problem under challenges like fast motion, blur, illumination variation, deformation, and camera jitter, but without serious occlusion. Even without occlusion, tracking an object remains very challenging for both RGB and RGBD trackers. For the CAMShift tracker, calculating a robust back-projection is one of the most important issues for tracking a target. Indeterminate information always exists in the CAMShift process, for instance when estimating the likelihood probability map, as well as when localizing the target. In this work, we utilize the element-weighted neutrosophic correlation coefficient to handle decision-making problems in the presence of indeterminate information.
This work makes four main contributions. First, a method of element-weighted neutrosophic correlation coefficient is proposed. Second, three criteria are proposed for object seed selection, and the corresponding membership functions, T, I, and F, are given. Third, the proposed correlation coefficient is applied for estimating a robust back-projection by fusing information from the color and depth domains. Finally, for the scale adaption problem, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed to identify the scale.
The remainder of this paper is organized as follows. In Section 2, the element-weighted neutrosophic correlation coefficient is presented. In Section 3, the main steps and basic flow of the improved CAMShift visual tracker are given first, and the details are illustrated in the following subsections. Experimental evaluations and discussions are presented in Section 4, and Section 5 concludes the paper.

Neutrosophic Correlation Coefficient
Let A = {A_1, A_2, ..., A_m} be a set of alternatives and C = {C_1, C_2, ..., C_n} be a set of criteria. Each alternative A_i is characterized in the SVNS domain as

A_i = {⟨C_j, T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i)⟩ | C_j ∈ C}, (1)

where T_{C_j}(A_i) indicates the degree to which the alternative A_i satisfies the criterion C_j; I_{C_j}(A_i) denotes the indeterminacy degree to which A_i satisfies or does not satisfy C_j; and F_{C_j}(A_i) denotes the degree to which A_i does not satisfy C_j. The correlation coefficient between two alternatives A_i and A_j under the single-valued neutrosophic environment is defined as [17]

S(A_i, A_j) = (1/n) ∑_{k=1}^{n} [T_{C_k}(A_i)T_{C_k}(A_j) + I_{C_k}(A_i)I_{C_k}(A_j) + F_{C_k}(A_i)F_{C_k}(A_j)] / [√(T_{C_k}(A_i)² + I_{C_k}(A_i)² + F_{C_k}(A_i)²) · √(T_{C_k}(A_j)² + I_{C_k}(A_j)² + F_{C_k}(A_j)²)]. (2)

Considering the contribution of each criterion, the weighted correlation coefficient between A_i and A_j is defined by

S_w(A_i, A_j) = ∑_{k=1}^{n} w_k · [T_{C_k}(A_i)T_{C_k}(A_j) + I_{C_k}(A_i)I_{C_k}(A_j) + F_{C_k}(A_i)F_{C_k}(A_j)] / [√(T_{C_k}(A_i)² + I_{C_k}(A_i)² + F_{C_k}(A_i)²) · √(T_{C_k}(A_j)² + I_{C_k}(A_j)² + F_{C_k}(A_j)²)], (3)

where w_k ∈ [0, 1] and ∑_{k=1}^{n} w_k = 1. Then the weighted correlation coefficient between any alternative A_i and the ideal alternative A* = ⟨1, 0, 0⟩ can be calculated by

S_w(A_i, A*) = ∑_{k=1}^{n} w_k · T_{C_k}(A_i) / √(T_{C_k}(A_i)² + I_{C_k}(A_i)² + F_{C_k}(A_i)²). (4)
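As a concrete illustration, the weighted correlation coefficient against the ideal alternative can be sketched in a few lines of Python. The function names and sample values below are illustrative, not part of the proposed method:

```python
import math

def svns_corr(a, b):
    # Cosine-style correlation between two SVNS triples a = (T, I, F), b = (T, I, F)
    num = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den

def weighted_corr(A_i, A_j, w):
    # Criterion-weighted correlation coefficient of Equation (3):
    # A_i, A_j hold one (T, I, F) triple per criterion; the weights w sum to 1.
    return sum(wk * svns_corr(ai, aj) for wk, ai, aj in zip(w, A_i, A_j))

# Correlation of a two-criteria alternative with the ideal A* = <1, 0, 0>
ideal = [(1.0, 0.0, 0.0)] * 2
A = [(0.8, 0.1, 0.1), (0.6, 0.3, 0.2)]
score = weighted_corr(A, ideal, [0.5, 0.5])
```

A larger score indicates an alternative closer to the ideal one under the weighted criteria.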

Element-Weighted Neutrosophic Correlation Coefficient
As seen in Equation (3), only the contribution of each criterion is considered. However, in some engineering applications, it is better to provide specific weights for the T, I, and F elements of an alternative A_i under the criterion C_j. The element-weighted correlation coefficient between A_i and A_j under a single criterion is defined by

S_e(A_i, A_j) = (α T_i T_j + β I_i I_j + γ F_i F_j) / [√(α T_i² + β I_i² + γ F_i²) · √(α T_j² + β I_j² + γ F_j²)], (5)

where α, β, γ ∈ [0, 1], α + β + γ = 1, and (T_i, I_i, F_i) abbreviates (T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i)). S_e(A_i, A_j) is actually the cosine of the angle between the vectors (√α T_i, √β I_i, √γ F_i) and (√α T_j, √β I_j, √γ F_j). Hence, it is easy to verify that S_e satisfies the following properties: 0 ≤ S_e(A_i, A_j) ≤ 1; S_e(A_i, A_j) = S_e(A_j, A_i); and S_e(A_i, A_j) = 1 when the two weighted vectors are parallel, in particular when A_i = A_j. Then the element-criterion-weighted correlation coefficient between A_i and the ideal alternative A* is defined as

S_ec(A_i, A*) = ∑_{k=1}^{n} w_k · (α T_k T*_k + β I_k I*_k + γ F_k F*_k) / [√(α T_k² + β I_k² + γ F_k²) · √(α T*_k² + β I*_k² + γ F*_k²)]. (6)
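In code, S_e reduces to an ordinary cosine similarity between element-weighted vectors. The sketch below is a minimal illustration, using the paper's setting α = 0.5, β = γ = 0.25 as defaults:

```python
import math

# alpha, beta, gamma weight the T, I, F elements (alpha + beta + gamma = 1);
# the defaults mirror the paper's setting of 0.5, 0.25, 0.25.
ALPHA, BETA, GAMMA = 0.5, 0.25, 0.25

def elem_weighted_corr(a, b, al=ALPHA, be=BETA, ga=GAMMA):
    # S_e(A_i, A_j): cosine between the element-weighted vectors
    # (sqrt(al)*T, sqrt(be)*I, sqrt(ga)*F) of the two (T, I, F) triples.
    u = (math.sqrt(al) * a[0], math.sqrt(be) * a[1], math.sqrt(ga) * a[2])
    v = (math.sqrt(al) * b[0], math.sqrt(be) * b[1], math.sqrt(ga) * b[2])
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return num / den
```

Because it is a cosine, self-similarity is exactly 1 and the measure is symmetric, matching the properties listed above.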

Improved CAMShift Visual Tracker Based on the Neutrosophic Theory
The algorithmic details of the proposed tracker are presented in this section.
A rectangle bounding box [22,23] is always employed for the representation of the target location when dealing with the visual tracking problem.For a visual tracker, the critical task is to estimate the corresponding bounding box in each frame.
Algorithm 1, given in Table 1, illustrates the basic flow of the proposed algorithm. Details of each main step are given in the following subsections.

Selecting Object Seeds
Assuming the tracker can calculate an adequate bounding box, and letting R be the extracted object region, it is reasonable to assume that pixels located in both R and the bounding box with smaller depth values are more likely to be part of the target.
To select robust object seeds, the pixels located in both R and the bounding box are first sorted in ascending order of depth value. Several pixels are then selected as the candidate object seed set S by sampling the sorted pixels with a fixed sampling space. Considering that background regions may exist in the corresponding area, only the top 50 percent of pixels in the sorted pixel set are sampled.
Taking the proposition that depth is a critical criterion for judging an object seed, T_D, I_D, and F_D represent the degrees to which the proposition is true, indeterminate, and false, respectively. They are defined in terms of the candidate ranks and the local depth variation, where r_max is the lowest rank in S, and r_si is the rank of the i-th candidate seed S_i; ROI_si is the set of pixels located in the square area centered at S_i, where the side length of the square should be set to an odd value; D(ROI_si) is the depth value set corresponding to ROI_si; and var(x) is the standard variance function.
As there may be other objects with a depth similar to the tracked target, the color similarity criterion is also considered in this work to overcome this problem. The corresponding three membership functions T_C, I_C, and F_C are defined analogously, where P_c is the back-projection calculated in the color domain; P_c(S_i) is the value of the back-projection at S_i; and P_c(ROI_si) is the value set corresponding to ROI_si in P_c. Besides considering the depth and color features separately, the fused color-depth information is also taken into account. The corresponding three membership functions T_DC, I_DC, and F_DC for the fused color-depth criterion are defined likewise, where P is the fused back-projection; P(S_i) is the value of the back-projection at S_i; and P(ROI_si) is the value set corresponding to ROI_si in P.
By substituting the corresponding T, I, and F under the three proposed criteria into Equation (6), the reliability probability pr_si of the seed S_i can be calculated, where w_D, w_C, w_DC ∈ [0, 1] are the corresponding criterion weights and w_D + w_C + w_DC = 1. Assuming the ideal alternative under all three criteria is the same, A* = ⟨1, 0, 0⟩, Equation (16) can be simplified accordingly (Equation (17)). After calculating the reliability probabilities of all seeds in S, the first N seeds ranked by pr_si are selected as the object seed set OS.
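The seed-selection step can be sketched as follows. The (T, I, F) triples per criterion are assumed to be already computed, the simplified score follows Equation (17) with A* = ⟨1, 0, 0⟩, and the data structures are illustrative:

```python
import math

W = {"D": 0.3, "C": 0.4, "DC": 0.3}   # criterion weights (the paper's setting)
AL, BE, GA = 0.5, 0.25, 0.25          # element weights alpha, beta, gamma

def seed_reliability(tif_by_crit):
    # tif_by_crit maps criterion name -> (T, I, F) for one candidate seed.
    # Each term is the element-weighted correlation with A* = <1, 0, 0>.
    total = 0.0
    for crit, (t, i, f) in tif_by_crit.items():
        den = math.sqrt(AL * t * t + BE * i * i + GA * f * f) * math.sqrt(AL)
        total += W[crit] * (AL * t) / den
    return total

def select_object_seeds(candidates, n=6):
    # Keep the n candidates with the highest reliability pr_si.
    return sorted(candidates, key=seed_reliability, reverse=True)[:n]
```

A seed with T = 1, I = F = 0 under every criterion attains the maximum reliability of 1.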

Extracting Object
The critical task for extracting the object is to determine the extracted object region R.
For each frame, the image is segmented into regions in the depth domain. In this work, the fast depth segmentation method [40] is employed. Suppose there exist M regions in the t-th frame; C_i represents the i-th region, and DC_i is the depth value at the centroid of C_i.
Suppose OS_i is the i-th object seed in the previous frame and B is the pixel set located in the area of the bounding box calculated by the tracker in the current frame. The extracted region R is defined in Equation (18), where DOS_i is the depth value of OS_i; T is a threshold value; and ∪ and ∩ are the set union and intersection operations.
All the regions C_k belonging to R construct the candidate object region RC.
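A minimal sketch of the region-extraction test, under one plausible reading of Equation (18): a depth segment joins R when its centroid depth lies within the threshold T of some previous object seed, and the result is intersected with the bounding-box pixel set B. The data structures here are illustrative:

```python
T_DEPTH = 50.0  # mm, the paper's setting for the threshold T

def build_candidate_region(regions, seed_depths, box_pixels):
    # regions: list of (pixel_set, centroid_depth) from depth segmentation;
    # seed_depths: depths DOS_i of the previous frame's object seeds;
    # box_pixels: set of pixel coordinates inside the current bounding box B.
    rc = set()
    for pixels, dc in regions:
        # Keep the segment if its centroid depth is near any object seed.
        if any(abs(dc - d) < T_DEPTH for d in seed_depths):
            rc |= pixels
    return rc & box_pixels  # intersect with the bounding-box pixel set
```

A segment far from every seed in depth (e.g. background behind the target) is discarded even if it overlaps the bounding box.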

Calculating the Fused Back-Projection
Back-projection is a probability distribution map with the same size as the input image. Each pixel value of the back-projection indicates the likelihood that the corresponding pixel belongs to the tracked object. Considering that a discriminative feature is one of the most important factors for a visual tracker, both color and depth information are employed for calculating the fused back-projection.
In [11], two criteria were proposed in both feature domains. For the near region similarity criterion, the corresponding neutrosophic elements are defined in terms of the target and candidate feature vectors, where qc_u is the u-th bin of the feature vector for the target in the color domain. It is first calculated in the first frame by

qc_u = C ∑_{i=1}^{n} δ[b_c(x_i) − u], (22)

where {x_i}_{i=1...n} are the pixels located in the extracted object region R; b_c is the transformation function in the color domain, and the function b_c : R² → {1, ..., m_c} associates to the pixel at location x_i the index b_c(x_i) of its bin in the quantized feature space; δ is the Kronecker delta function; C is the normalization constant derived by imposing the condition ∑_{u=1}^{m_c} qc_u = 1; pc_u is the u-th bin of the feature vector corresponding to the extracted object region R in the current frame; and p̄c_u corresponds to the feature vector in the annular region near the target, as defined in [11].
Similarly, for the near region similarity criterion, the corresponding neutrosophic elements in the depth domain can be computed analogously, where qd_u, pd_u, and p̄d_u are the corresponding feature vectors in the depth domain, calculated like qc_u, pc_u, and p̄c_u.
By applying the far region similarity criterion [11], the related membership functions are defined analogously, where T_Cf, I_Cf, and F_Cf correspond to the color domain, and T_Df, I_Df, and F_Df are the functions in the depth domain. As defined in [11], p̃c_u and p̃d_u are the feature vectors in the annular region far from the target in the color and depth domains, respectively.
As discussed in [11], the back-projection in the color domain is defined as P_C(x) = qc_{b_c(x)}, where b_c is the transformation function defined in Equation (22), and x is the pixel coordinate on the image plane. Using the assumption that the tracked target moves with a relatively low speed, the back-projection corresponding to the depth domain is defined in [11] in terms of d(x, S), where d(x, S) is the minimal depth distance between the pixel x and the previous seed set S; MAXD is the maximum depth displacement of the target between adjacent frames; and 2B_pre is the pixel set covered by a bounding box of twice the size of the previous object location, but with the same center. By considering the robustness of P_C and P_D, the fused back-projection P is calculated as a combination weighted by rc (Equation (27)), where rc is computed from the element-weighted correlation coefficients in the two domains. The ideal alternative under the near and far region similarity criteria is the same, A* = ⟨1, 0, 0⟩. The element-weighted correlation coefficient in the color domain is defined as

ns_C = w_Cn · S_e(A_Cn, A*) + w_Cf · S_e(A_Cf, A*), (29)

where A_Cn and A_Cf are the alternatives under the near and far region similarity criteria in the color domain, and w_Cn, w_Cf ∈ [0, 1] are the corresponding criterion weights with w_Cn + w_Cf = 1. Similarly, the element-weighted correlation coefficient in the depth domain is defined as

ns_D = w_Dn · S_e(A_Dn, A*) + w_Df · S_e(A_Df, A*), (30)

where w_Dn, w_Df ∈ [0, 1] are the corresponding criterion weights and w_Dn + w_Df = 1. The maps P_C and P_D used in Equation (27) are the enhanced versions computed by Equations (31) and (32), where T_D, T_C ∈ [0, 1] are the thresholds for filtering the noise in the color or depth back-projection.
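The fusion step can be sketched as below. The exact form of rc in Equation (28) is not reproduced here; this sketch assumes the plausible reading rc = ns_C / (ns_C + ns_D) and combines the thresholded maps convexly:

```python
def fuse_backprojections(p_c, p_d, ns_c, ns_d, t_c=0.1, t_d=0.1):
    # p_c, p_d: color and depth back-projections (sequences of values in [0, 1]);
    # ns_c, ns_d: element-weighted correlation coefficients of the two domains;
    # t_c, t_d: noise-filtering thresholds T_C, T_D (0.1 is the paper's setting).
    rc = ns_c / (ns_c + ns_d)  # assumed reliability weight of the color domain
    fused = []
    for c, d in zip(p_c, p_d):
        c = c if c > t_c else 0.0   # suppress color noise below T_C
        d = d if d > t_d else 0.0   # suppress depth noise below T_D
        fused.append(rc * c + (1.0 - rc) * d)
    return fused
```

When the two domains are equally reliable (ns_c = ns_d), the fused map is simply the average of the thresholded color and depth maps.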
As seen in Equation (31), the information in P_D is employed to enhance the color back-projection P_C, and vice versa for Equation (32).

Scale Adaption
In this work, the bounding box of the tracked object is first determined by the traditional CAMShift algorithm [26]. For each frame, the bounding box B_pre in the previous frame is employed as the start location of the mean shift process. The current tracking location (x_c, y_c) can be calculated by

x_c = M_10/M_00, y_c = M_01/M_00,

where M_10 = ∑_{x∈B} x_1 P(x), M_01 = ∑_{x∈B} x_2 P(x), and M_00 = ∑_{x∈B} P(x), with x = (x_1, x_2). Then the size of the bounding box is s = 2√(M_00/256). For convenience, this bounding box is called the initial bounding box below. Getting an adequate scale for the bounding box is very important: if the scale fails to fit the object for a few frames, the tracker may finally drift from the target. Considering that the size of the initial bounding box may be disturbed by imprecision in the back-projection, a method in the neutrosophic domain is introduced into the scale identification process.
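The moment computation above can be sketched directly. This is a plain-Python illustration, reading the window size as s = 2√(M00/256), as in the standard CAMShift formulation for an 8-bit probability map:

```python
import math

def camshift_step(p, box):
    # One mean-shift centroid step over back-projection p (a list of rows)
    # within the search window box = (x0, y0, w, h).
    x0, y0, w, h = box
    m00 = m10 = m01 = 0.0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            v = p[y][x]
            m00 += v        # zeroth-order moment M00
            m10 += x * v    # first-order moment M10 (horizontal)
            m01 += y * v    # first-order moment M01 (vertical)
    s = 2.0 * math.sqrt(m00 / 256.0)   # window size derived from M00
    return (m10 / m00, m01 / m00), s   # new centre and size
```

On a uniform probability patch the centroid is simply the window centre, which makes the formula easy to sanity-check.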
In this work, the color likelihood probability between the candidate object area and the object area is employed as the truth membership, and the color likelihood probability between the candidate object area and the background area is employed as the indeterminacy membership. For the reducing-scale alternative, the truth, indeterminacy, and falsity membership functions are defined in Equations (34)-(36), where pc_rsu is the u-th bin of the feature vector in the initial bounding box with a reduced scale, and bc_rsu corresponds to the feature vector in the annular region near the scale-reduced bounding box. It must be emphasized that all the pixels taken into consideration must be located in the candidate object region RC in the current frame. By substituting Equations (34)-(36) into Equation (6) with the assumption that the ideal alternative is A* = ⟨1, 0, 0⟩, the probability pr_rs for reducing the scale is obtained (Equation (37)). Similarly, for the expanding-scale alternative, the truth, indeterminacy, and falsity membership functions are defined analogously, where pc_exsu and bc_exsu have similar meanings to pc_rsu and bc_rsu, but with an expanded scale for the initial bounding box. The probability pr_exs for expanding the scale is then obtained (Equation (41)). Finally, the scale of the initial bounding box is updated accordingly, where λ_0 ∈ (0, 1) is the step value for scale adaption; it should be set to a relatively small value and is set to 0.04 in this work. The scaling factor s > 1 is employed to guard against noise in the color domain and is set to 1.1 here.
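A hedged sketch of the scale-update decision follows. The comparison below is one plausible reading of how pr_rs, pr_exs, the noise-guard factor s > 1, and the step λ0 interact; the paper's exact update rule is assumed, not reproduced:

```python
LAMBDA0 = 0.04   # scale-adaption step lambda_0 (the paper's setting)
S_FACTOR = 1.1   # noise-guard scaling factor s > 1 (the paper's setting)

def update_scale(width, height, pr_rs, pr_exs):
    # pr_rs, pr_exs: probabilities for the reducing / expanding alternatives.
    if pr_rs > S_FACTOR * pr_exs:       # reducing clearly wins: shrink
        k = 1.0 - LAMBDA0
    elif pr_exs > S_FACTOR * pr_rs:     # expanding clearly wins: grow
        k = 1.0 + LAMBDA0
    else:                               # ambiguous: keep the current scale
        k = 1.0
    return width * k, height * k
```

Requiring one alternative to beat the other by the factor s prevents small, noise-driven differences in the color likelihoods from jittering the scale.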
After the scale identification process, a bounding box with the same center as the initial bounding box, but with the tuned width and height, is employed as the location of the target in the current frame. The method proposed in [11] for updating the feature vector of the target is employed here.

Experiment Results and Analysis
Several challenging video sequences captured with a Kinect sensor are employed. Both color and depth information are provided in each frame. As mentioned at the beginning, challenging factors like fast motion, blur, deformation, illumination variation, and camera jitter are considered, and the selected testing sequences are all without the serious occlusion challenge.
For comparison, the NeutRGBD [11] algorithm is employed. Like the tracker proposed here (NeutRGBDs), NeutRGBD is also a tracker using the CAMShift framework, but it employs different strategies for object region extraction, information fusion, and scale identification. In addition, we implemented a neutrosophic CAMShift tracker based on the tangent correlation coefficient [4] in the SVNS domain, which we call NeutRGBDst. The only difference between NeutRGBDs and NeutRGBDst is the correlation model, which allows us to ensure that the selection of the correlation model is the cause of any performance difference.
To gauge absolute performance, four other trackers, compressive tracking (CT) [33], LGT [34], IVT [35], and TLD [36], are selected. The tracking-by-detection scheme is employed by the three trackers other than LGT. For the LGT tracker, features like color, shape, and apparent local motion are introduced into the local-layer updating process, and local patches are applied for representing the target's geometric deformation in the local layer.

Setting Parameters
For the proposed NeutRGBDs, the fixed sampling space for selecting candidate object seeds is set to 0.02. Because only the top 50% of pixels in the sorted pixel set are sampled, there are 26 candidate seeds for each frame. After evaluating the neutrosophic criteria, N = 6 seeds are finally selected as object seeds. In order to emphasize the color information to some extent, w_D, w_C, and w_DC in Equation (17) are set to 0.3, 0.4, and 0.3, respectively. It is not advisable to set one or two of these three parameters to a very small value, because the information in the corresponding feature domain may be wrongly discarded. The threshold T, defined in Equation (18), which decides the accuracy of R, is set to 50 mm. If T is set to a large enough value, the whole image region will be added to R, while a too-small value will lead to an incomplete object region. The accuracy of the region extraction result may influence the tracking result to some extent. According to the attributes of the testing sequences and the displacement of the target between adjacent frames, the parameter MAXD in Equation (26) is set to 70 mm. The weights w_Cn, w_Cf, w_Dn, and w_Df in Equations (29) and (30) are all set equally to 0.5. A relatively low value should be given to the parameters T_D and T_C defined in Equations (31) and (32); otherwise, most useful information will be wrongly filtered out. In this work, both parameters are set to 0.1. The element weights α, β, and γ are set to 0.5, 0.25, and 0.25, respectively, in Equations (17), (29), (30), (37), and (41) to emphasize the truth element. All of these parameter values were chosen by hand-tuning and are kept constant for all experiments.

Evaluation Criteria
Both the center position error and the success ratio are considered. The location error metric is employed for plotting the center position error curve. The Euclidean distance between the center of the tracked target and that of the manually labeled ground truth is used to calculate the center position error in each frame.
By setting an overlap threshold r, defined as the minimum acceptable overlap ratio, one can decide whether an output is correct or not. The success ratio R is calculated by the following formula:

R = (1/N) ∑_{i=1}^{N} u(s_i − r),

where N is the total number of frames, s_i is the overlap score, r is the corresponding threshold, and u(·) equals 1 for a non-negative argument and 0 otherwise.
A robust tracker will earn a higher value of R. The overlap score can be calculated as

s_i = |ROI_Ti ∩ ROI_Gi| / |ROI_Ti ∪ ROI_Gi|,

where ROI_Ti is the region covered by the target bounding box in the i-th frame, and ROI_Gi is the region covered by the corresponding ground truth bounding box.
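The two evaluation measures can be sketched as follows, assuming axis-aligned (x, y, w, h) boxes for the overlap computation:

```python
def overlap_score(box_t, box_g):
    # Intersection-over-union of two axis-aligned (x, y, w, h) boxes.
    x1 = max(box_t[0], box_g[0])
    y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union

def success_ratio(scores, r):
    # Fraction of frames whose overlap score exceeds the threshold r.
    return sum(1 for s in scores if s > r) / len(scores)
```

Sweeping r from 0 to 1 and plotting success_ratio yields the success plots shown in Figures 5-8.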

Tracking Results
Several screen captures from the testing sequences are given in Figures 1-4. Success and center position error plots for each testing sequence are shown in Figures 5-8. A more detailed discussion follows.
Wr_no1 sequence: This sequence highlights the challenges of rolling, blur, fast motion, and appearance change. As shown in Figure 1, all the trackers perform well until frame #24. However, the bounding box calculated by the NeutRGBD tracker has a relatively small scale. As seen in frames #50 and #56, only the NeutRGBDs tracker produces an adequate scale for the bounding box. As shown in frames #56, #75, and #150, a relatively small scale is estimated by the NeutRGBDst tracker. The CT, LGT, and IVT trackers fail to track the rabbit by frame #75 because of wrongly estimated scales, and the TLD tracker fails due to the serious appearance change when the person tries to turn the rabbit back. As shown in Figure 5a, the NeutRGBDs tracker has a good success ratio across different overlap thresholds. Due to its good performance on the scale adaption problem, the center location of the tracked object is closer to the ground truth in most frames, as seen in Figure 5b. In summary, over the whole tracking process, the NeutRGBDs tracker has the best performance.
Toy_no sequence: Challenges like blur, fast motion, and rotation are included in this sequence. As seen in frame #6, the CT, IVT, and TLD trackers have already failed due to the fast motion of the toy. For the LGT tracker, an improper scale is estimated because the update of the local patches cannot follow such a rapid change. For the NeutRGBD tracker, because the toy covers a relatively wide range of depth values and the object seeds only cover a small range, the extracted object region sometimes covers only parts of the toy. As seen in Figure 6, on account of this factor, the center location of the target produced by the NeutRGBD tracker is less stable than that of the NeutRGBDs tracker. Thanks to the seed selection scheme, the NeutRGBDst tracker performs well on this sequence. Unlike the NeutRGBDs tracker, NeutRGBDst weights T, I, and F equally when calculating the neutrosophic correlation coefficient. As seen in Figure 2, the scale produced by the NeutRGBDs tracker is the best. As shown in Figure 6a, both the NeutRGBDs and NeutRGBDst trackers perform well, and the NeutRGBDs tracker performs better when the overlap threshold is set to nearly 0.72.

Tracking Results
Several screen captures for the testing sequences are given in Figures 1-4.Success and center position error plots of each testing sequence are shown in Figures 5-8.A more detailed discussion is described in the following.Wr_no1 sequence: This sequence highlights the challenges of rolling, blur, fast motion, and appearance change.As shown in Figure 1, all the trackers perform well until frame #24.However, the bounding box calculated by the NeutRGBD tracker has a relatively small scale.As seen in frames #50 and #56, only the NeutRGBDs tracker produces an adequate scale for the bounding box.As shown in frames #56, #75, and #150, a relatively small scale is estimated by the NeutRGBDst tracker.The CT, LGT, and IVT trackers failed to track the rabbit in frame #75 because of the wrongly estimated scale, and the TLD tracker failed due to the challenge of serious appearance change when the person tried to turn the rabbit back.As shown in Figure 5a, the NeutRGBDs tracker has a good success ratio when different overlap thresholds are selected.Due to the good performance when dealing with the scale adaption problem, the center location of the tracked object is closer to the ground truth in most frames, as seen in Figure 5b.In summary, during the whole tracking process, the NeutRGBDs tracker has the best performance.Toy_no sequence: The challenges like blur, fast motion, and rotation are included in this sequence.As seen in frame #6, the CT, IVT, and TLD trackers have already failed due to the fast motion of the toy.For the LGT tracker, an improper scale is estimated on account of the update of the local patches cannot follow such a rapid change.For the NeutRGBD tracker, due to the fact that the toy covers a relatively wide range of depth information, and the object seeds only cover a small range, the extracted object region sometimes only covers parts of the toy.As seen in Figure 6, on account of this factor, the center location of the target produced by the NeutRGBD tracker 
is less stable than that of the NeutRGBDs tracker.Thanks to the seed selection scheme, the NeutRGBDst tracker performs well on this sequence.Unlike the NeutRGBDs tracker, when calculating a neutrosophic correlation coefficient, T, I, and F are treated as the same weight for NeutRGBDst.As seen in Figure 2, the scale produced by the NeutRGBDs tracker performs the best.As shown in Figure 6a, both the NeutRGBDs and NeutRGBDst tracker perform well, and the NeutRGBDs tracker performs better when the overlap threshold is set to nearly 0.72.Zball_no1 sequence: Challenges of illumination variation, rapid motion and camera jitter existed in this sequence.Due to the challenge of appearance change and rapid motion, the TLD tracker fails soon, as seen in Figure 3.The IVT tracker also fails due to the similar surroundings, especially for the wood floor.Though a relatively large scale of the bounding box is estimated by the CT and LGT trackers, both of them can localize the ball properly before frame #65.With the challenges of similar surroundings, rapid motion, and camera jitter, the CT tracker has already failed before frame #91.As shown in Figure 7b, when judging the center position error evaluation criteria, owing to the information fusion in the neutrosophic domain, the NeutRGBDs, NeutRGBDst, and NeutRGBD tracker all perform well.Thanks to the scale adaption and seeds selection strategy, the NeutRGBDs tracker has a more appropriate scale than the other two, as seen in Figures 3 and 7a.
Hand_no_occ sequence: This presents the challenges of illumination variation, deformation, out-of-plane rotation, and similar surroundings.As shown in frame #2 in Figure 4, a large background region is chosen as the object area at the phase of tracker initialization for the CT, LGT, IVT, and TLD tracker.All three trackers except LGT soon fail due to the weak initialization and the out-plane rotation challenge.The LGT tracker performs well throughout this sequence mainly due to the application of the scheme of apparent local motion.However, it is frequently disturbed by the similar surroundings, especially for regions with similar color and displacement.As seen in Figure 8b, a more accurate center has been produced by the NeutRGBDs tracker since frame #150.Although all three neutrosophic-based information fusion trackers perform well in this sequence, the NeutRGBDs tracker can produce a more accurate bounding box, as seen in Figures 4 and 8.
Toy_no sequence: Since the toy covers a relatively wide range of depth information and the object seeds only cover a small range, the extracted object region sometimes covers only parts of the toy. As seen in Figure 6, on account of this factor, the center location of the target produced by the NeutRGBD tracker is less stable than that of the NeutRGBDs tracker. Thanks to the seed selection scheme, the NeutRGBDst tracker performs well on this sequence. Unlike the NeutRGBDs tracker, the NeutRGBDst tracker treats T, I, and F with equal weight when calculating the neutrosophic correlation coefficient. As seen in Figure 2, the scale produced by the NeutRGBDs tracker is the best. As shown in Figure 6a, both the NeutRGBDs and NeutRGBDst trackers perform well, and the NeutRGBDs tracker performs better when the overlap threshold is set to nearly 0.72.

Zball_no1 sequence: This sequence presents the challenges of illumination variation, rapid motion, and camera jitter. Due to the challenges of appearance change and rapid motion, the TLD tracker soon fails, as seen in Figure 3. The IVT tracker also fails due to the similar surroundings.
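The element-weighted coefficient that distinguishes the NeutRGBDs tracker from the NeutRGBDst tracker can be sketched as follows. This is a minimal illustration modeled on the standard SVNS correlation coefficient; the exact formula, the weight values, and the seed names/values below are assumptions for demonstration, not taken from the paper.

```python
import math

def weighted_neutrosophic_corr(a, b, w=(0.5, 0.25, 0.25)):
    """Element-weighted correlation between two SVNS samples.

    a, b: sequences of (T, I, F) triples in [0, 1].
    w: illustrative weights for the T, I, and F elements; the truth
       element is emphasized, as the NeutRGBDs tracker does.
       Setting w = (1/3, 1/3, 1/3) recovers equal treatment of T, I,
       and F, as in the NeutRGBDst tracker.
    """
    num = den_a = den_b = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(a, b):
        num += w[0] * ta * tb + w[1] * ia * ib + w[2] * fa * fb
        den_a += w[0] * ta * ta + w[1] * ia * ia + w[2] * fa * fa
        den_b += w[0] * tb * tb + w[1] * ib * ib + w[2] * fb * fb
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (math.sqrt(den_a) * math.sqrt(den_b))

# Seed selection: score each candidate seed (one (T, I, F) triple per
# criterion: depth, color, fused) against the ideal alternative
# (T=1, I=0, F=0) and keep the best-correlated seeds.
ideal = [(1.0, 0.0, 0.0)] * 3
candidates = {  # hypothetical membership values
    "seed_a": [(0.90, 0.10, 0.10), (0.80, 0.20, 0.10), (0.85, 0.10, 0.05)],
    "seed_b": [(0.40, 0.50, 0.60), (0.30, 0.60, 0.50), (0.35, 0.50, 0.55)],
}
scores = {k: weighted_neutrosophic_corr(v, ideal) for k, v in candidates.items()}
```

A seed with high truth and low indeterminacy/falsity memberships correlates strongly with the ideal alternative, so thresholding or ranking `scores` yields the robust object seeds.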

Discussion
From the above illustrations of the tracking results, we see that a more accurate bounding box can be estimated when the element-weighted neutrosophic correlation coefficient is introduced into the CAMShift framework. Firstly, for the NeutRGBD tracker, only the depth information is considered for seed selection. A more robust seed selection scheme is applied by the NeutRGBDs tracker: each seed is judged by using the T, I, and F elements in the neutrosophic domain, and the depth, color, and fused information are all taken into consideration. In addition, the truth element is emphasized in each criterion for the NeutRGBDs tracker. The object seeds play an essential role in the procedure of the NeutRGBD, NeutRGBDs, and NeutRGBDst trackers. Robust object seeds help the tracker obtain a more accurate object region, as well as a more robust back-projection in the depth domain. Secondly, compared to the NeutRGBD tracker, the NeutRGBDs and NeutRGBDst trackers keep more useful information from both the color and depth domains in the final back-projection. Such a back-projection provides a more discriminative feature when there are surroundings with color or depth similar to the target. Finally, for the NeutRGBDs and NeutRGBDst trackers, a scale identification process is first proposed in the neutrosophic domain, and this method contributes a lot when the CAMShift scheme fails to estimate an adequate scale. The only difference between the NeutRGBDs and NeutRGBDst trackers is the calculation of the correlation coefficient: the tangent correlation coefficient employed by the NeutRGBDst tracker treats the T, I, and F elements equally, whereas the NeutRGBDs tracker weights them unequally. As the above analysis shows, this element weighting is the main reason the NeutRGBDs tracker consistently produces a more robust target bounding box.
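The second point above, keeping useful information from both domains in the final back-projection, can be sketched as a reliability-weighted fusion of the two likelihood maps. The per-pixel weighting rule below is an illustrative assumption, not the paper's exact formula; the idea is only that the domain whose correlation score (e.g. against an ideal SVNS sample) is higher contributes more.

```python
def fuse_back_projections(color_bp, depth_bp, c_color, c_depth):
    """Fuse color and depth back-projections into one likelihood map.

    color_bp, depth_bp: 2-D lists of per-pixel likelihoods in [0, 1].
    c_color, c_depth: correlation scores rating how reliable each
    domain currently is.  The weighted average below is a sketch,
    not the paper's fusion formula.
    """
    total = c_color + c_depth
    wc, wd = (0.5, 0.5) if total == 0 else (c_color / total, c_depth / total)
    return [
        [wc * c + wd * d for c, d in zip(crow, drow)]
        for crow, drow in zip(color_bp, depth_bp)
    ]

# Hypothetical 2x2 back-projections and reliability scores.
color_bp = [[0.9, 0.2], [0.1, 0.8]]
depth_bp = [[0.7, 0.1], [0.3, 0.9]]
fused = fuse_back_projections(color_bp, depth_bp, c_color=0.8, c_depth=0.4)
```

When surroundings share the target's color, `c_color` drops and the depth map dominates the fused likelihood, and vice versa, which is what makes the fused back-projection more discriminative than either domain alone.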

Conclusions
A method of element-weighted neutrosophic correlation coefficient is proposed and successfully applied to improve the CAMShift tracker in RGBD video. The experimental results have revealed its robustness. For the selection of robust object seeds, three kinds of criteria are proposed, and each candidate seed is represented in the SVNS domain via three membership functions, T, I, and F. These seeds are then employed for extracting the object region and calculating the depth back-projection. Furthermore, the proposed neutrosophic correlation coefficient is applied for fusing the likelihood probability in both the color and depth domains. Finally, in order to modify the scale of the bounding box, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed for the identification of the scale. Only challenges without serious occlusion are considered in this work; tackling the occlusion problem through the RGBD information will be our primary mission in the future.
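The scale identification step summarized above can be sketched as follows: each of the two scale alternatives is expressed as an SVNS triple and the one correlating best with the ideal alternative (T=1, I=0, F=0) is chosen. The alternative names and membership values are hypothetical; the paper's actual membership functions are not reproduced here.

```python
import math

def svns_corr(x, ideal=(1.0, 0.0, 0.0)):
    """Correlation of one SVNS triple (T, I, F) with the ideal alternative."""
    num = sum(a * b for a, b in zip(x, ideal))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in ideal))
    return 0.0 if den == 0 else num / den

# Two hypothetical scale alternatives in the SVNS domain: how strongly
# the evidence supports each scale (T), how indeterminate it is (I),
# and how strongly it is contradicted (F).
alternatives = {
    "camshift_scale": (0.55, 0.35, 0.40),  # scale from the CAMShift step
    "previous_scale": (0.80, 0.15, 0.10),  # keep the previous frame's scale
}
chosen = max(alternatives, key=lambda k: svns_corr(alternatives[k]))
```

With these illustrative values the CAMShift estimate is poorly supported, so the correlation test falls back to the previous scale, which is the behavior that rescues the tracker when CAMShift fails to estimate an adequate scale.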

Figure 1. Performance on the "wr_no1" sequence by seven trackers.

Wr_no1 sequence: This sequence highlights the challenges of rolling, blur, fast motion, and appearance change. As shown in Figure 1, all the trackers perform well until frame #24. However, the bounding box calculated by the NeutRGBD tracker has a relatively small scale. As seen in frames #50 and #56, only the NeutRGBDs tracker produces an adequate scale for the bounding box. As shown in frames #56, #75, and #150, a relatively small scale is estimated by the NeutRGBDst tracker. The CT, LGT, and IVT trackers fail to track the rabbit in frame #75 because of the wrongly estimated scale, and the TLD tracker fails due to the serious appearance change when the person tries to turn the rabbit back. As shown in Figure 5a, the NeutRGBDs tracker has a good success ratio across different overlap thresholds. Due to its good performance on the scale adaption problem, the center location of the tracked object is closer to the ground truth in most frames, as seen in Figure 5b. In summary, the NeutRGBDs tracker has the best performance throughout the whole tracking process.

Figure 5. The quantitative plots for each tracker of the wr_no1 sequence: (a) Success plots; (b) center position error plots.

Figure 6. The quantitative plots for each tracker of the toy_no sequence: (a) Success plots; (b) center position error plots.

Figure 8. The quantitative plots for each tracker of the hand_no_occ sequence: (a) Success plots; (b) center position error plots.

Table 1. Basic flow of the NeutRGBDs tracker.