Figure 1.
Tracking speed-accuracy plot of the same correlation filter tracker based on different features on a hyperspectral surveillance video (HSSV) dataset. The upper right corner indicates the best performance in terms of both standard and robust accuracy. The proposed FSSF algorithm achieves the best accuracy at a faster speed.
Figure 2.
Spectral signatures of various materials measured at a center pixel of each object. (a) Target objects and various background materials. From left to right and top to bottom: box, camera, bottle, glass, car, electric car, airplane, boat, building, tree, human, and road. (b) Reflectance of the various targets over the 680-960 nm wavelength range.
Figure 3.
Sample images and their spectral reflectance. (a) Images taken in various conditions (normal, deformation, object in shadow, object in light, and background clutter). (b) Spectral profiles of the object in different states (normal, deformation, object in shadow, and object in light). (c) Spectral signatures of facial skin and hair of the different subjects in the background clutter image of (a).
Figure 4.
The scatter-plot visualization representations of different objects generated for the HSI and RGB datasets using t-SNE. (a) Sample images of the dataset (airplane, bicycle, boat, and person). (b) Visualization of the HSI dataset. (c) Visualization of the RGB dataset. The x axis and y axis represent the two feature values of the data in two-dimensional space, respectively. There are four kinds of objects, each of which is represented by a particular color.
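For readers who want to reproduce this kind of embedding, scatter plots of this form can be generated with scikit-learn's t-SNE. The snippet below is a minimal sketch on synthetic stand-in data; the four-cluster layout and the 25-band feature dimension are assumptions for illustration, not the paper's actual pixel features.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in: 60 samples with 25 spectral features each, drawn
# from four separated "object" clusters (airplane, bicycle, boat, person).
rng = np.random.default_rng(0)
n_per_class, n_bands = 15, 25
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(n_per_class, n_bands))
               for c in range(4)])
labels = np.repeat(np.arange(4), n_per_class)

# Embed the 25-D feature vectors into 2-D for the scatter plot; each
# class can then be drawn in its own color, as in the figure.
embedding = TSNE(n_components=2, perplexity=10.0,
                 init="random", random_state=0).fit_transform(X)
```

The resulting `embedding` has one 2-D point per sample, which is exactly what the (b) and (c) panels plot.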
Figure 5.
The scatter-plot visualization representations of the HSI and RGB datasets with the challenge of deformation using t-SNE. (a) Sample images of the dataset with deformation (normal and deformation). The target deforms as the face moves. (b) Visualization of the HSI dataset. (c) Visualization of the RGB dataset. The x axis and y axis represent the two feature values of the data in two-dimensional space, respectively. There are two states (normal and deformation) of the same object in the two datasets, each of which is represented by a particular color.
Figure 6.
The scatter-plot visualization representations of the HSI and RGB datasets with the challenge of illumination variation using t-SNE. (a) Sample images of the dataset with illumination variation (object in light and object in shadow). The electric car is subjected to light changes during driving. (b) Visualization of the HSI dataset. (c) Visualization of the RGB dataset. The x axis and y axis represent the two feature values of the data in two-dimensional space, respectively. There are two states (light and shadow) of the same object in two datasets, each of which is represented by a particular color.
Figure 7.
The scatter-plot visualization representations of the HSI and RGB datasets with the challenge of background clutter using t-SNE. (a) Sample images of the dataset with background clutter (from left to right: object1 and object2). The two objects are similar in visual appearance. (b) Visualization of the HSI dataset. (c) Visualization of the RGB dataset. The x axis and y axis represent the two feature values of the data in two-dimensional space, respectively. There are two kinds of objects in the two datasets, each of which is represented by a particular color.
Figure 8.
The initialization (purple box) and updating (blue box) process of the proposed real-time spatial-spectral convolution (RSSC) kernel. In frame t − 1, the RSSC kernels are initialized using the search region of interest centered at the object position and the ground-truth bounding box of the object. For the new frame t, spatial-spectral features are extracted using the initialized RSSC kernel to estimate the object position in frame t. Then, the RSSC kernels are updated using the search region of interest and the bounding box centered at the newly estimated position. For computational convenience, the numerator and denominator of the RSSC kernel are updated separately. The FFT and inverse-FFT blocks denote the fast Fourier transform and its inverse, respectively.
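The separate numerator/denominator update shown in the figure follows the usual frequency-domain correlation-filter recipe. The sketch below illustrates that recipe with a generic MOSSE-style single-channel filter in NumPy; it is not the paper's exact RSSC formulation, and the learning rate `eta`, regularizer `lam`, and delta-function label are assumptions.

```python
import numpy as np

def init_filter(x, y, lam=1e-2):
    """Initialize numerator A and denominator B of a correlation
    filter from a training patch x and its response label y."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    A = Y * np.conj(X)             # numerator
    B = X * np.conj(X) + lam       # denominator (with regularizer)
    return A, B

def track(A, B, z):
    """Correlate the filter with a search patch z; return the
    (row, col) of the response peak as the estimated position."""
    response = np.real(np.fft.ifft2((A / B) * np.fft.fft2(z)))
    return tuple(int(i) for i in
                 np.unravel_index(np.argmax(response), response.shape))

def update(A, B, x, y, eta=0.02, lam=1e-2):
    """Running-average update of the numerator and denominator,
    done separately as in the figure."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    A = (1 - eta) * A + eta * (Y * np.conj(X))
    B = (1 - eta) * B + eta * (X * np.conj(X) + lam)
    return A, B

# Toy usage: the filter should re-detect the pattern it was trained on.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))
y = np.zeros((32, 32)); y[16, 16] = 1.0   # delta label at the center
A, B = init_filter(x, y)
pos = track(A, B, x)                      # -> (16, 16)
A, B = update(A, B, x, y)
```

In a multi-band setting like the one described in the figure, such an update would run once per spectral band, which is what keeps the per-frame cost low.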
Figure 9.
(a) Visualization of the correlation coefficient matrix. (b) Relative entropy of each band relative to the first band.
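Both panels can be computed directly from the band images: the correlation coefficient matrix from the flattened bands, and the relative entropy as the KL divergence of each band's intensity histogram against the first band's. A minimal NumPy sketch follows; the 64-bin histogram and the toy cube are assumptions for illustration.

```python
import numpy as np

def band_statistics(cube, n_bins=64, eps=1e-12):
    """cube: (H, W, B) hyperspectral image.
    Returns the B x B band correlation coefficient matrix and the
    relative entropy (KL divergence) of each band's intensity
    histogram relative to the first band's histogram."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B)
    corr = np.corrcoef(flat, rowvar=False)            # (B, B)

    lo, hi = cube.min(), cube.max()
    hists = np.stack([np.histogram(flat[:, b], bins=n_bins,
                                   range=(lo, hi))[0] for b in range(B)])
    p = hists / hists.sum(axis=1, keepdims=True) + eps
    kl = np.sum(p * np.log(p / p[0]), axis=1)         # KL(band_b || band_0)
    return corr, kl

# Toy cube: 25 strongly correlated bands, as in the figure.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 16))
cube = np.stack([base + 0.05 * rng.normal(size=(16, 16))
                 for _ in range(25)], axis=-1)
corr, kl = band_statistics(cube)   # kl[0] == 0 by construction
```

High off-diagonal correlations and small relative entropies indicate redundant bands, which motivates working on grouped sub-HSIs rather than all bands independently.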
Figure 10.
Visualization of the spatial-spectral feature maps extracted from different sub-HSIs. Activations are shown for two frames from the challenging car sequence with deformation (left). The spatial-spectral features (right) are extracted from each sub-HSI. Notice that although the appearance of the object changes significantly, discriminative features can still be extracted even when the background changes dramatically.
Figure 11.
Illustration of a set of 25 bands of an HSI. The 25 bands are ordered in ascending order from left to right and top to bottom, and their center wavelengths are 682.27 nm, 696.83 nm, 721.13 nm, 735.04 nm, 747.12 nm, 760.76 nm, 772.28 nm, 784.81 nm, 796.46 nm, 808.64 nm, 827.73 nm, 839.48 nm, 849.40 nm, 860.49 nm, 870.95 nm, 881.21 nm, 889.97 nm, 898.79 nm, 913.30 nm, 921.13 nm, 929.13 nm, 936.64 nm, 944.55 nm, 950.50 nm, 957.04 nm, respectively.
Figure 12.
Example sequences with different tracking objects of the HSSV dataset. From top to bottom: airplane, boat, pedestrian, electric car, bicycle, car.
Figure 13.
Comparison results for all SSCF trackers and their baseline trackers under three initialization strategies: one-pass evaluation (OPE), temporal robustness evaluation (TRE), and spatial robustness evaluation (SRE). (a) Precision and success plots for OPE. (b) Precision and success plots for SRE. (c) Precision and success plots for TRE. The legends of the precision and success plots report the precision scores at a threshold of 20 pixels and the area-under-the-curve (AUC) scores, respectively.
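The two legend quantities follow the standard OTB-style protocol: precision is the fraction of frames whose center location error falls within 20 pixels, and the success score is the area under the curve of success rate versus IoU threshold. A minimal sketch of both, on made-up per-frame results:

```python
import numpy as np

def precision_at(center_errors, threshold=20.0):
    """Fraction of frames whose center location error is within
    `threshold` pixels (the value reported in the precision legend)."""
    return float(np.mean(np.asarray(center_errors) <= threshold))

def success_auc(overlaps, n_thresholds=101):
    """Area under the success curve: mean success rate over IoU
    thresholds sampled uniformly on [0, 1] (the AUC legend value)."""
    overlaps = np.asarray(overlaps)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

# Toy per-frame tracking results.
errors = [3.0, 8.0, 15.0, 25.0, 40.0]   # center location errors (px)
ious   = [0.9, 0.8, 0.6, 0.4, 0.1]      # bounding-box overlaps
prec = precision_at(errors)   # 3 of 5 frames within 20 px -> 0.6
auc = success_auc(ious)
```

TRE and SRE apply the same two metrics, but over runs restarted from different temporal segments and spatially perturbed initial boxes, respectively.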
Figure 14.
Success plots over eight tracking attributes, including (a) background clutter (24), (b) deformation (18), (c) illumination variation (20), (d) low resolution (27), (e) occlusion (36), (f) out-of-plane rotation (7), (g) out of view (4), (h) scale variation (37). The values in parentheses indicate the number of sequences associated with each attribute. The legend reports the area-under-the-curve score.
Figure 15.
Qualitative results on our hyperspectral video compared with traditional video on some challenging sequences (electriccar, double5, airplane9, human4). The results of the SSCF tracker and the baseline tracker are shown as green and red boxes, respectively.
Figure 16.
Precision and success plots of different features on the HSSV dataset under three initialization strategies: one-pass evaluation (OPE), temporal robustness evaluation (TRE), and spatial robustness evaluation (SRE). (a) Precision and success plots for OPE. (b) Precision and success plots for SRE. (c) Precision and success plots for TRE. The legends of the precision and success plots report the precision scores at a threshold of 20 pixels and the area-under-the-curve scores, respectively. The fps of the trackers under the three initialization strategies is also shown in the legend.
Figure 17.
Success plots over six tracking attributes, including (a) low resolution (27), (b) occlusion (36), (c) scale variation (37), (d) fast motion (9), (e) background clutter (24), (f) deformation (18). The values in parentheses indicate the number of sequences associated with each attribute. The legend reports the area-under-the-curve score.
Figure 18.
Comparison results with hyperspectral trackers. (a) Precision plot. (b) Success plot. The legends of the precision and success plots report the precision scores at a threshold of 20 pixels and the area-under-the-curve (AUC) scores, respectively.
Table 1.
Mean overlap precision (OP) of our SSCF trackers and their corresponding baseline trackers.
| | SS_STRCF | DeepSTRCF | STRCF | SS_ECO | DeepECO | ECO | SS_fDSST | fDSST | SS_CN | CN |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean OP | 0.775 | 0.719 | 0.680 | 0.829 | 0.748 | 0.592 | 0.704 | 0.453 | 0.463 | 0.395 |
Table 2.
FPS of our SSCF trackers and their corresponding baseline trackers.
| | SS_STRCF | DeepSTRCF | STRCF | SS_ECO | DeepECO | ECO | SS_fDSST | fDSST | SS_CN | CN |
|---|---|---|---|---|---|---|---|---|---|---|
| FPS | 23.64 (CPU) | 5.73 (GPU) | 32.11 | 46.68 (CPU) | 11.87 (GPU) | 67.58 | 45.8985 | 220.30 | 126.17 | 981.94 |
Table 3.
FPS comparison between the spatial features and the spatial-spectral feature with the same tracker.
| | FSSF (spatial-spectral) | DeepFeature (spatial) | HOG (spatial) | Color (spatial) |
|---|---|---|---|---|
| FPS | 23.64 | 5.73 (GPU) | 32.11 | 24.0113 |
Table 4.
Mean DP (MDP) and mean OP (MOP) at different thresholds, and fps, of FSSF versus DeepFeature. MDP(20) denotes the mean DP at a pixel distance < 20; MOP(0.5) denotes the mean OP at IoU > 0.5.
| Method | MDP(20) | MDP(15) | MDP(10) | MDP(5) | MOP(0.5) | MOP(0.6) | MOP(0.7) | MOP(0.8) | fps |
|---|---|---|---|---|---|---|---|---|---|
| FSSF | 0.620 | 0.558 | 0.457 | 0.265 | 0.471 | 0.395 | 0.298 | 0.178 | 46.68 |
| DeepFeature | 0.594 | 0.510 | 0.378 | 0.178 | 0.385 | 0.303 | 0.214 | 0.116 | 1.23 |
Table 5.
Attribute-based comparison with DeepFeature in terms of mean OP (at IoU > 0.5). The best results are shown in bold. Our FSSF ranks first on 8 of 11 attributes: low resolution, background clutter, occlusion, out-of-plane rotation, in-plane rotation, fast motion, scale variation, and deformation.
| Attribute | FSSF | DeepFeature |
|---|---|---|
| Illumination variation | 0.520 | **0.530** |
| Scale variation | **0.419** | 0.388 |
| Occlusion | **0.442** | 0.373 |
| Deformation | **0.506** | 0.468 |
| Motion blur | 0.444 | **0.473** |
| Fast motion | **0.349** | 0.315 |
| In-plane rotation | **0.392** | 0.317 |
| Out-of-plane rotation | **0.378** | 0.369 |
| Out-of-view | 0.357 | **0.372** |
| Background clutter | **0.439** | 0.409 |
| Low resolution | **0.450** | 0.287 |
Table 6.
Mean OP and DP metrics and fps of SSCF and the hyperspectral trackers.
| | SS_ECO | MHT | DeepHKCF | HLT |
|---|---|---|---|---|
| Mean OP | 0.520 | 0.506 | 0.444 | 0.349 |
| Mean DP | 0.829 | 0.788 | 0.375 | 0.110 |
| FPS | 46.68 | 1.34 | 49.87 | 1.58 |
Table 7.
Attribute-based comparison with hyperspectral trackers in terms of AUC. The best results are shown in bold.
| Attribute | SS_ECO | MHT | DeepHKCF | HLT |
|---|---|---|---|---|
| Illumination variation | **0.658** | 0.578 | 0.289 | 0.147 |
| Scale variation | **0.618** | 0.607 | 0.387 | 0.146 |
| Occlusion | **0.630** | 0.577 | 0.391 | 0.152 |
| Deformation | **0.704** | 0.676 | 0.395 | 0.129 |
| Motion blur | **0.641** | 0.555 | 0.434 | 0.087 |
| Fast motion | **0.580** | 0.474 | 0.389 | 0.126 |
| In-plane rotation | **0.596** | 0.591 | 0.479 | 0.178 |
| Out-of-plane rotation | **0.623** | 0.586 | 0.437 | 0.076 |
| Out-of-view | **0.574** | 0.407 | 0.419 | 0.158 |
| Background clutter | **0.607** | 0.568 | 0.362 | 0.151 |
| Low resolution | **0.680** | 0.623 | 0.388 | 0.105 |