
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In video analytics, robust observation detection is very important as the content of videos varies widely, especially for tracking implementations. In contrast to the image processing field, the problems of blurring, moderate deformation, low-illumination surroundings, illumination change and homogeneous texture are commonly encountered in video analytics. Patch-Based Observation Detection (PBOD) is developed to improve detection robustness in complex scenes by fusing both feature- and template-based recognition methods. While feature-based detectors are more distinctive, matching between frames is best achieved by a collection of points, as in template-based detectors. Two methods of PBOD, the deterministic and probabilistic approaches, have been tested to find the best mode of detection. Both algorithms start by building comparison vectors at each detected point of interest. The vectors are matched to build candidate patches based on their respective coordinates. For the deterministic method, patch matching is done in a 2-level test where threshold-based position and size smoothing are applied to the patch with the highest correlation value. For the second approach, patch matching is done probabilistically by modelling the histograms of the patches with Poisson distributions for both the RGB and HSV colour models. Then, maximum likelihood is applied for position smoothing while a Bayesian approach is applied for size smoothing. The results show that probabilistic PBOD outperforms the deterministic approach, with an average distance error of 10.03% compared with 21.03%. This algorithm is best implemented as a complement to other, simpler detection methods due to its heavy processing requirements.

Obtaining the correct observation for track maintenance is a very challenging task. Tracking accuracy is highly dependent on accurate observation. Accurate observation detection and association are two crucial factors in building good trackers, especially in people counting and behaviour analysis systems. Even for a global positioning system [

Generally, there are two major approaches to obtain the measurement input, either by detection with recognition or detection without recognition. Recognition in this case means we know in the first place to which track a particular observation belongs. Foreground segmentation and optical flow are two methods of obtaining measurement input without recognizing the tracked object. Those algorithms work by detecting moving pixels without knowing the identity of the tracked object, and the detected foreground blob such as from [

Since illumination change is hard to model with a single colour model, we implemented two colour models for PBOD, RGB and HSV, where the latter is heavily used during illumination change. For histogram matching, we explored five histogram similarity models: correlation [

This paper is organized into 6 sections. A literature review will be presented in Section 2. Details of the algorithms are fully explained in Section 3, where each subsection explains in detail the methods to obtain points of interest, generate candidate patches, find the best patch, perform position alignment and adjust the patch's size. In Section 4, the pseudo-codes of both deterministic and probabilistic PBOD are given for more clarity. Simulation results and discussions are presented in Section 5. The conclusions are given in the last section to emphasize the performance difference.

Since we are focusing on obtaining an observation through recognition, the two most common methods of object recognition are the template- and feature-based methods. The feature-based approach usually recognizes the object by obtaining a match based on a feature descriptor. The template-based approach uses the shape or a collection of pixels to find a match. The major trade-off between the two approaches lies in the distinctiveness and generalization properties. Most feature-based approaches have a high distinctiveness property but a low generalization property. This explains why feature-based approaches do not work well for blurred images but perform exceptionally well on richly textured objects. On the other hand, the template-based approach has a low distinctiveness property but is very good at generalizing object detection: even blurred and non-rigid objects can still be recognized reliably.

Template-based recognition is an approach that requires a database or a collection of possible templates to be built before any matching can be performed. This method is used in many license plate recognition systems [

One of the most cited papers regarding point-based detectors is Scale Invariant Feature Transform (SIFT) [

Another method of obtaining tracking observation is by applying a histogram-based method. Histogram-based tracking algorithms [

In general, a kernel-based algorithm performs well for single object tracking [

Both methods were first introduced in [

A point of interest is used to obtain the location at which to generate vector descriptors. These descriptors are matched between frames to build possible patches. The original location of the object, or the original patch in the first frame, is initialized by the user. The importance of this user-defined patch is that it serves as the reference for building the statistical data used in the matching and smoothing procedures, in particular the reference histograms. Moreover, the size of the previous frame patch indicates the object's size. Let s_w and s_h denote the width and height of the patch.

Then, a vector descriptor V^{t,x,y} is generated for frame t at each point of interest (x, y).

Each of the vector components is sorted from the lowest to the highest value. The reason for sorting is to account for the rotation of the object, while the use of colour differences allows the algorithm to find good points of interest even during an illumination change. This algorithm does not otherwise have rotation-invariant abilities, since all matching is done based on the colour histogram. The reason for using 4-connected neighbourhood data instead of an 8-connected neighbourhood is to produce as many candidate patches as possible in the early stage, which will later be filtered by the subsequent processes. Each vector from the initial frame is compared with each vector in the next frame. The decision rule accepts a match when the vector difference is below the threshold τ_1, which was found from experiments to be optimal in the range 11 to 13.
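The descriptor construction and matching just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frame layout (a list of rows of RGB tuples), the function names and the exact difference measure are assumptions, with the match threshold default picked from the 11 to 13 range reported above.

```python
# Sketch of the point-of-interest descriptor: for each candidate pixel,
# take the colour differences to its 4-connected neighbours, sort them,
# and match descriptors between frames by a thresholded difference.

def build_descriptor(frame, x, y):
    """Sorted colour differences between (x, y) and its 4-neighbours."""
    cx = frame[y][x]
    diffs = []
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= ny < len(frame) and 0 <= nx < len(frame[0]):
            n = frame[ny][nx]
            diffs.append(sum(abs(a - b) for a, b in zip(cx, n)))
    return sorted(diffs)  # sorting gives a degree of rotation tolerance

def match(d1, d2, tau=12):
    """Decision rule: accept when the total descriptor difference < tau."""
    if len(d1) != len(d2):
        return False
    return sum(abs(a - b) for a, b in zip(d1, d2)) < tau
```

Border pixels simply contribute shorter descriptors here; the paper handles out-of-frame components separately during patch matching.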

All the matched vectors are candidates for locations at which patches are built. Patches for the second frame are generated around the location of the matched vector in the first frame with respect to the original bounding box.

A subsequent test for distinguishing overlapping patches is performed after all patches have been assigned a location and size. This is needed because different matched features may lie close to each other. If the difference between two patches is small, they should be smoothed into one. This reduces the calculation burden by reducing the number of patches. Moreover, most of the small differences occur because of “noise” in the patch generation process. The decision rule D_3 declares patches overlapping when their position difference is below τ_a and their size difference is below τ_2% of the original patch size.

The new combined patch location, (x_L, y_L), is computed from all overlapping patches that D_3 detects. Comprehensive pseudo-code for both the points-of-interest and candidate patch generation is given in Algorithm 1.
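A greedy version of this overlap smoothing can be sketched as below. The merge thresholds `pos_tol` (pixels) and `size_tol` (fraction) are hypothetical stand-ins for the paper's τ_a and τ_2, and taking the combined location as the mean of the merged centres is an assumption.

```python
# Merge patches whose centres and sizes nearly coincide into one patch.
# Each merged entry accumulates [sum_x, sum_y, w, h, count] so the
# combined location is the running mean of the merged centres.

def smooth_patches(patches, pos_tol=5.0, size_tol=0.1):
    """patches: list of (cx, cy, w, h); returns a reduced list."""
    merged = []
    for cx, cy, w, h in patches:
        placed = False
        for m in merged:
            mx, my = m[0] / m[4], m[1] / m[4]
            if (abs(cx - mx) <= pos_tol and abs(cy - my) <= pos_tol
                    and abs(w - m[2]) <= size_tol * m[2]
                    and abs(h - m[3]) <= size_tol * m[3]):
                m[0] += cx
                m[1] += cy
                m[4] += 1
                placed = True
                break
        if not placed:
            merged.append([cx, cy, w, h, 1])
    return [(m[0] / m[4], m[1] / m[4], m[2], m[3]) for m in merged]
```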

Algorithm 1: Points of interest and candidate patch generation.
(a) Build the sorted colour-difference vector at each candidate point and apply the decision rule to accept the point of interest or not.
(b) Match the accepted vectors between consecutive frames (a match is declared when the vector difference is below the threshold τ_1).
(c) Build a candidate patch around each matched vector; the original size of each patch is the same as the final patch size of the previous frame.
(d) Smooth out overlapping patches by combining redundant patches whose centroid difference is below τ_a and whose width w and height h differ by less than τ_2% of the original patch size.

Patch matching is performed to find the patch where the object most likely resides. The match is done by comparing the histograms of the previous and current frame patches. Two colour models are considered, RGB and HSV,

where

After the number of patches has been finalized, the histogram correlation (ρ_C) between the current frame patches and the previous frame patch is used to identify the object. The test is divided into two levels, where the first level is used to obtain the match under normal illumination, while the second-level test is initiated when an illumination change is detected. The first-level test depends on the RGB colour space while a 1-dimensional hue histogram is used for the second level. Let N_b denote the number of histogram bins and H_i the histogram of the i-th patch. If the denominator of the correlation (ρ_C) is zero, it signifies a very low correlation value, indicating that an illumination change has occurred or no match is found.

For a 1-dimensional histogram:

\rho_C = \frac{\sum_{i}\left(H_1(i)-\bar{H}_1\right)\left(H_2(i)-\bar{H}_2\right)}{\sqrt{\sum_{i}\left(H_1(i)-\bar{H}_1\right)^2 \sum_{i}\left(H_2(i)-\bar{H}_2\right)^2}}

where

\bar{H}_k = \frac{1}{N_b}\sum_{i} H_k(i)

and for a 3-dimensional histogram:

\rho_C = \frac{\sum_{r,g,b}\left(H_1(r,g,b)-\bar{H}_1\right)\left(H_2(r,g,b)-\bar{H}_2\right)}{\sqrt{\sum_{r,g,b}\left(H_1(r,g,b)-\bar{H}_1\right)^2 \sum_{r,g,b}\left(H_2(r,g,b)-\bar{H}_2\right)^2}}

where

\bar{H}_k = \frac{1}{N_b^3}\sum_{r,g,b} H_k(r,g,b)
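The 1-dimensional histogram correlation can be written compactly in code; a minimal sketch, with the zero-denominator case mapped to a correlation of zero as described in the text. The 3-dimensional case is identical with the bins flattened.

```python
import math

# Normalised histogram correlation between two equal-length histograms.
def hist_correlation(h1, h2):
    n = len(h1)
    m1 = sum(h1) / n
    m2 = sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1)
                    * sum((b - m2) ** 2 for b in h2))
    # A zero denominator (a flat histogram) is treated as a very low
    # correlation: illumination change or no match.
    return 0.0 if den == 0 else num / den
```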

Since some of the matched vectors are found near the border of the image, certain patches may have regions with components outside the frame. In this situation, we set out-of-bound components to be low (black), which consequently increases the probability of detecting occlusion. The patch with the highest correlation is accepted as the match if its correlation exceeds τ_3; otherwise the second-level test is initiated. The optimal value for τ_3 is found from extensive simulations to be around 0.73.

For the second-level test, both the previous and current frames are transformed from the RGB to the HSV colour model. Only the hue channel is utilized, since by the previous assumption the illumination has changed. The reason for utilizing only hue information under illumination change is its stability compared with other colour information. The maximum hue correlation is compared with τ_4 in order to determine whether the object still resides in the frame. Let D_4 represent the label of detecting the object.
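The two-level decision can be sketched as a small function over precomputed per-patch correlations. The function name and the return convention are assumptions; the default thresholds follow the values reported in the text (τ_3 around 0.73, τ_4 around 0.7).

```python
# Two-level matching: accept the best RGB match when it clears tau_3,
# otherwise fall back to the hue histogram and compare against tau_4
# to decide whether the object is still in the frame.

def two_level_match(rgb_corrs, hue_corrs, tau3=0.73, tau4=0.7):
    """Return (patch_index, level), or (None, 2) if the object is lost."""
    best = max(range(len(rgb_corrs)), key=rgb_corrs.__getitem__)
    if rgb_corrs[best] >= tau3:
        return best, 1                      # first-level (RGB) match
    best_h = max(range(len(hue_corrs)), key=hue_corrs.__getitem__)
    if hue_corrs[best_h] >= tau4:
        return best_h, 2                    # second-level (hue) match
    return None, 2                          # object likely out of frame
```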

For the probabilistic approach, histogram matching is done by modelling the relationship between two histograms as a Poisson distribution as in

For a 1-dimensional histogram:

P\left(H^{t} \mid H^{t-1}\right) = \prod_{i=1}^{N_b} \frac{e^{-H^{t-1}(i)}\, H^{t-1}(i)^{H^{t}(i)}}{H^{t}(i)!}

and for a 3-dimensional histogram:

P\left(H^{t} \mid H^{t-1}\right) = \prod_{r,g,b} \frac{e^{-H^{t-1}(r,g,b)}\, H^{t-1}(r,g,b)^{H^{t}(r,g,b)}}{H^{t}(r,g,b)!}

A maximum likelihood approach is used to find the matched patch for both colour models, where D_5 denotes the matched patch.
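One natural reading of this Poisson model, sketched below, treats each bin count of the current patch as Poisson with mean equal to the matching bin of the reference patch, and selects the patch maximising the total log-likelihood. This is an illustration of the technique, not the paper's exact formulation; the epsilon guard for empty reference bins is an added assumption.

```python
import math

# Poisson log-likelihood of a candidate histogram given a reference one.
def poisson_loglik(ref_hist, cand_hist, eps=1e-9):
    ll = 0.0
    for lam, k in zip(ref_hist, cand_hist):
        lam = max(lam, eps)                 # guard empty reference bins
        # log of the Poisson pmf: k*log(lam) - lam - log(k!)
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll

# Maximum-likelihood patch selection over a list of candidate histograms.
def best_patch(ref_hist, candidates):
    return max(range(len(candidates)),
               key=lambda i: poisson_loglik(ref_hist, candidates[i]))
```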

There are two candidates for the most likely patch. The decision to choose the hue colour model over RGB is made by using a Neyman–Pearson hypothesis test, where the null hypothesis H_0 corresponds to the RGB match and the alternative hypothesis H_1 to the hue match. Let λ_1 represent the threshold for the Neyman–Pearson hypothesis test. If the test favours H_0, the RGB patch is chosen; if H_1 is chosen, the hue patch is selected. The resulting patch D_6 from the test will be the final matched patch.

Position smoothing is used to adjust the patch's centroid to precisely fit the object's centroid. Sometimes, the calculated patch is slightly misaligned with the tracked object. This error is prevalent during illumination change and low ambient illumination. The adjustment is divided into two cases depending on whether an illumination change has been detected. Let τ_5 denote a weight factor that takes values in [0, 1].

The translation test for adjusting the patch location is performed in four directions, with the step size determined by τ_5.

Every correlation (ρ_C) of the four new patches and the original patch's (D_6) correlation (ρ_C^current) are compared. The new patch location is selected based on the maximum correlation ρ_C^max among them. If the original patch correlation is the maximum, the position remains the same. If any of the new patches' correlations is the maximum, the detected patch is shifted toward that corresponding direction. The new maximum correlation is reset when the patch is moved. The procedure is repeated until the maximum correlation among the new patches is less than the original patch correlation. ρ_C^max is stored for later use during the shrinkage and expansion test.

For the case where an illumination change has been detected, the same translation test is applied using the hue histogram.

In the probabilistic approach, the same four new candidate patches (leftward, rightward, upward and downward shifts of D_6) are created for adjusting the patch position. Let D_10 denote the output of position smoothing.

For each iteration, the pivot position is reinitialized by letting D_6 = D_10, so that all four new translated patches for the next iteration are built around D_10. The algorithm is iterated until the estimated patch position remains the same, as shown by the decision rule D_5.

This section focuses on adjusting the size of the patch so that it provides a good fit to the tracked object. Generally, the apparent size of the object becomes bigger as it moves closer to the camera and smaller as it moves away. However, size increment and decrement between consecutive frames should not be very large. Based on this assumption, we limit the scale change for size smoothing by at most a factor of

We first consider the case where no illumination change is detected.

A test to determine the size pattern is performed to find out whether the object is expanding or shrinking. Here, only shrinkage patterns are considered. RGB histograms are generated for all new shrinkage patterns. Then, the histogram sizes are normalized before correlations between the new patches and the anchor patch D_8 are calculated. Let N_β1 and N_β2 denote the number of pixels inside the patches from the previous and current frames respectively. Each histogram bin value is normalized by scaling from N_β2 to N_β1.

The average correlation among the channels for each patch is calculated. A weighted correlation, which blends the maximum correlation ρ_C^max and the average correlation with weight τ_6, is then computed for each patch. The weighted correlation is compared with the maximum correlation of the anchor patch D_8 obtained from the location smoothing to determine the size pattern.
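The weighted-correlation decision can be sketched as follows; the function names are illustrative, and the default τ_6 of 0.5 follows the parameter table. Returning `None` when no candidate beats the anchor is an assumed convention for "size unchanged".

```python
# Weighted correlation: blend the maximum and average per-channel
# correlations with weight tau_6, then declare a size change only if
# some candidate's blended score beats the anchor patch's correlation.

def weighted_score(channel_corrs, tau6=0.5):
    return (tau6 * max(channel_corrs)
            + (1 - tau6) * sum(channel_corrs) / len(channel_corrs))

def size_pattern(candidate_corrs, anchor_corr, tau6=0.5):
    """candidate_corrs: per-candidate lists of per-channel correlations."""
    scores = [weighted_score(c, tau6) for c in candidate_corrs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > anchor_corr else None
```

Blending in the channel average reduces dependence on the maximum alone, which may come from a noisy or blurred channel.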

Algorithm 2: Deterministic PBOD (recoverable steps).
(a) Compute the correlation for all patches based on RGB space; initiate the second-level hue test when an illumination change is detected.
(b) Select the matched patch and perform position smoothing; set D_8 = D_6 when the new maximum correlation ρ^max exceeds the original correlation ρ^ori.
(c) Build the histograms of either the shrinkage or the expansion pattern, and normalize the histogram sizes for fairer histogram matching.
(d) Shrink a side when its correlation ρ^i falls below ρ^max minus the margin (D_13 = 1, otherwise D_13 = 0); expand a side when its correlation ρ^j exceeds ρ^max plus the margin (D_14 = 1, otherwise D_14 = 0).

For the shrinkage pattern, the patches used are built around the anchor patch D_8.

For the case where an illumination change is detected, two predefined parameters, ε_1 and ε_2, are used. Both parameters are predefined values, as it is hard to find a good closed-form expression for them due to the complexity of the scene when the illumination changes. ε_1 is applied during the test for determining the size pattern, while ε_2 is applied during the shrinkage and expansion test. The algorithm stops when there is no size change or the iteration count has exceeded two. Full pseudo-code for the deterministic approach is given in Algorithm 2.

For the probabilistic approach, D_10 is used as the pivot point for creating all the new patches. A Bayesian approach is used to decide the final patch size from among the nine patches, including the original patch D_10.

Since

The value of _{p}_{p}_{p}

Two sets of prior probabilities are used. These depend on whether the size of the detected object inclines towards expansion or shrinkage. The selection of a suitable prior probability is very important, as the likelihood of shrinkage is usually large even when the object expands. Thus, we apply lower prior probabilities to shrinkage candidates if the size is increasing. In order to determine which set of prior probabilities to use, a Neyman–Pearson hypothesis test is again implemented, where H_0 and H_1 represent the expansion and shrinkage hypotheses. Only eight candidate patches are used (four shrinkage patches + four expansion patches) for this test, where the same Poisson distribution as before is used. The maximum probability among the expansion patches represents the H_0 probability, while the maximum probability among the shrinkage patches represents the H_1 probability.

Let λ_2 be the threshold for the Neyman–Pearson test.

After the prior probability is obtained, the posterior of each of the nine patches is computed and the size is updated according to the D_9 decision rule. Each side's size is altered depending on whether the new posterior exceeds the original size posterior. D_9 equal to one indicates that the size is updated, while D_9 equal to zero indicates that the size remains constant.

Probabilistic PBOD will follow the same rules as deterministic PBOD, which allows each side to be independently updated as shown in

The accuracy and effectiveness of PBOD were validated rigorously; the simulations are divided into three subsections:

Histogram matching performance

Deterministic and probabilistic PBOD

Probabilistic PBOD, Kernel tracker and SIFT-based tracker.

Pseudo-code for probabilistic PBOD (recoverable steps):
(a) Compute the Poisson likelihoods for all patches in both RGB and hue space.
(b) Find the maximum likelihood for both colour spaces.
(c) Apply the Neyman–Pearson test to decide between the RGB and HSV colour spaces; if H_1 is favoured, select the second-level (hue) match, otherwise the first-level (RGB) match, giving D_10 = D_6.
(d) Perform probabilistic position smoothing and Bayesian size smoothing around the selected patch D_10.

We have selected 150 image pairs from various videos from YouTube, which contain challenging scenes between two consecutive frames. Some of the challenges that reduce the accuracy and precision of the tracker are illumination changes, shadows, non-rigid objects, blur and partial occlusion. The size of the frame varies from 320 × 240 to 960 × 720. The target object is not just a human, but also includes books, animals, balls and many more, for both indoor and outdoor environments. However, only one object is tracked each time, since we limit the algorithm to single object tracking. Our tracked object varies in size from frame to frame and from video to video, in which the smallest size is 30 × 6 and the largest is 341 × 365. Generally, a bigger tracked object tends to perform better due to the smaller number of candidate patches after patch smoothing. Bigger objects also tend to overlap with the ground truth after the patch matching process, which is later fine-tuned by position and size smoothing. For a small object, the possibility of overlap is smaller, which diminishes the advantage of the position and size smoothing processes. ε_1 and ε_2 should be within [0.00005, 0.0005] based on our repeated simulations, while τ_1 and τ_2 between 0.6 and 0.75 give good results. τ_3 is used to decide whether to continue the test in HSV space or stop at RGB space, while τ_4 is the threshold indicating that the object is already out of the frame. A correlation value of 0.7 and above gives good results for both tests, where a less stringent value favours RGB space while a more stringent value favours HSV space. The step size is determined by τ_5, and we use 0.1 of the tracked object's scale; a smaller step size gives better accuracy but increases the total number of iterations. τ_6 is used to weight the contributions of the maximum correlation and the average correlation in the size smoothing procedures of deterministic PBOD. We reduce the dependency on the maximum correlation alone, as it may come from a noisy or blurred patch, by adding the averaging component; 0.5 is chosen as it gives a good balance between both components. We also analyse the effect of the histogram size (N_b).

The algorithm performance is measured by calculating the Euclidean distance between the simulated patch centroid (x_sim, y_sim) and the ground-truth centroid (x_truth, y_truth).
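The evaluation metric can be sketched directly; normalising by the patch diagonal to obtain the relative error e_D is an assumption here, since the exact normalisation is not spelled out above.

```python
import math

# Euclidean distance between the simulated and ground-truth centroids.
def centroid_error(sim, truth):
    return math.hypot(sim[0] - truth[0], sim[1] - truth[1])

# Distance error relative to the patch size, as a percentage
# (normalised by the patch diagonal; an assumed convention).
def relative_error(sim, truth, patch_w, patch_h):
    return 100.0 * centroid_error(sim, truth) / math.hypot(patch_w, patch_h)
```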

In terms of average processing speed, the method by Yan

In this subsection, we demonstrate that the most appropriate histogram matching for PBOD uses Poisson modelling. A partial version of the probabilistic PBOD, without position and size smoothing, is used to validate the best scheme for histogram matching. Five methods have been tested on 150 image pairs under various image conditions, including illumination changes, shadows, non-rigid objects and homogeneous textures. Those five methods are:

Correlation distance.

Poisson distance.

Chi-square distance.

Intersection distance.

Bhattacharyya distance.

Correlation distance is based on Bradski and Kaehler, where H(i) denotes the value of the i-th histogram bin and N_b is the number of bins.

For a 1-dimensional histogram:

and for a 3-dimensional histogram:

Intersection distance is based on the work by Swain and Ballard [

For a 1-dimensional histogram:

\rho_I = \sum_{i=1}^{N_b} \min\left(H_1(i), H_2(i)\right)

and for a 3-dimensional histogram:

\rho_I = \sum_{r,g,b} \min\left(H_1(r,g,b), H_2(r,g,b)\right)

Early formulation of Bhattacharyya distance can be traced back to [

For a 1-dimensional histogram:

\rho_B = \sqrt{1 - \frac{\sum_{i}\sqrt{H_1(i)\,H_2(i)}}{\sqrt{\sum_{i} H_1(i) \sum_{i} H_2(i)}}}

and for a 3-dimensional histogram:

\rho_B = \sqrt{1 - \frac{\sum_{r,g,b}\sqrt{H_1(r,g,b)\,H_2(r,g,b)}}{\sqrt{\sum_{r,g,b} H_1(r,g,b) \sum_{r,g,b} H_2(r,g,b)}}}
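The remaining distances compared in this subsection (chi-square, intersection, Bhattacharyya) are small enough to sketch directly for 1-dimensional histograms; the 3-dimensional forms just flatten the bins first. These follow the standard OpenCV-style definitions, which may differ in minor conventions from the paper's exact formulas.

```python
import math

# Chi-square distance: smaller is more similar.
def chi_square(h1, h2):
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a > 0)

# Intersection: larger is more similar (overlapping histogram mass).
def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

# Bhattacharyya distance: 0 for identical normalised histograms.
def bhattacharyya(h1, h2):
    s1, s2 = sum(h1), sum(h2)
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc / math.sqrt(s1 * s2)))
```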

The distance error relative to the patch size, e_D, is also calculated to show the magnitude of the distance error compared with the object size. The average distance error among the methods is shown in

This subsection is intended to show that probabilistic PBOD performs better than deterministic PBOD. 120 image pairs are used to verify the performance difference. Again,

Moreover, no error of more than 99 pixels has been observed for the probabilistic PBOD, while there are nine such image pairs for the deterministic PBOD. The average distance error relative to the output patch size is given in

In this section, we compare probabilistic PBOD with two other trackers,

The main reason for our better performance compared with the method by Yan

In this paper, we have shown that probabilistic PBOD works better compared with the deterministic approach in obtaining observation for single object tracking. Probabilistic PBOD registered 48.33% detection with less than 10 pixels error while the deterministic approach only achieved 42.50%. Both PBODs work well in challenging scenes, especially for the problems of low image sharpness, moderate deformation, illumination change, blur, size variation and homogeneous texture, by fusing feature- and template-based approaches. Probabilistic PBOD also performs better than kernel tracker by Yan

Block diagram of deterministic and probabilistic PBODs.

Neighbourhood pattern used for vector generation.

Examples of constructing new patches between the frames. The bounding boxes are aligned with respect to the matched vectors in the first frame. (

Example of several patches combination. (

Procedures for selecting the right patch. (

Patches coordination for location smoothing (

Sample output of position and size smoothing algorithm (

Patterns for shrinkage patch (

Patterns for expansion patch (

Example of the patch expansion (

Cumulative distribution of error distance among the histogram matching methods.

Cumulative distribution of error distance between probabilistic and deterministic PBOD.

Deterministic and probabilistic PBOD under illumination change: (

Deterministic and probabilistic PBOD for blur object: (

Generating centroid for the SIFT-based tracker (

Cumulative distribution of error distance between probabilistic PBOD, Kernel tracker and SIFT-based tracker.

Parameters used by our algorithm.

Parameter | Value |
---|---|
ε_1 | 0.0001 |
ε_2 | 0.0001 |
τ_1 | 0.6 |
τ_2 | 0.6 |
τ_3 | 0.7 |
τ_4 | 0.7 |
τ_5 | 0.1 |
τ_6 | 0.5 |
N_b | 50 |

Comparison of the average distance error among histogram matching methods: A: correlation, B: chi-square, C: intersection, D: Bhattacharyya and E: Poisson, for various histogram sizes.

 | A | B | C | D | E |
---|---|---|---|---|---|
e_D (%) | 26.51 | 22.35 | 26.88 | 22.37 | 19.16 |
e_D (%) | 28.62 | 19.93 | 20.87 | 20.17 | 16.36 |
e_D (%) | 21.11 | 17.27 | 17.84 | 18.04 | 15.02 |

Big-O notation for the algorithms.

Deterministic PBOD | 1 ^{3}) + 7 ^{2}) |

Probabilistic PBOD | 2 ^{4}) |

Yan |
3 ^{2}) + 8 |

SIFT based approach | ^{2}) + 2 ^{2}) |

Centroid distance for histogram matching methods: A: correlation, B: chi-square, C: intersection, D: Bhattacharyya and E: Poisson.

Distance (pixels) | A | B | C | D | E |
---|---|---|---|---|---|

0–9 | 50 | 50 | 50 | 50 | 55 |

10–19 | 46 | 55 | 53 | 56 | 51 |

20–29 | 19 | 19 | 17 | 18 | 21 |

30–39 | 9 | 7 | 7 | 7 | 8 |

40–49 | 10 | 8 | 9 | 8 | 7 |

50–59 | 6 | 7 | 6 | 6 | 7 |

60–69 | 2 | 0 | 4 | 0 | 0 |

70–79 | 3 | 2 | 2 | 3 | 1 |

80–89 | 0 | 0 | 0 | 0 | 0 |

90–99 | 2 | 1 | 1 | 1 | 0 |

>99 | 3 | 1 | 1 | 1 | 0 |

Comparison of the centroid distance between deterministic and probabilistic PBOD.

Distance (pixels) | Deterministic PBOD | Probabilistic PBOD |
---|---|---|

0–9 | 51 | 58 |

10–19 | 31 | 43 |

20–29 | 14 | 12 |

30–39 | 4 | 2 |

40–49 | 2 | 2 |

50–59 | 4 | 2 |

60–69 | 0 | 1 |

70–79 | 2 | 0 |

80–89 | 2 | 0 |

90–99 | 1 | 0 |

>99 | 9 | 0 |

Comparison of the average distance error between deterministic and probabilistic PBOD.

 | Deterministic PBOD | Probabilistic PBOD |
---|---|---|
e_D (%) | 21.03 | 10.03 |

Comparison of the centroid distance among Probabilistic PBOD, kernel tracker and SIFT-based tracker.

Distance (pixels) | Probabilistic PBOD | Kernel tracker | SIFT-based tracker |
---|---|---|---|

0-9 | 52 | 45 | 40 |

10-19 | 30 | 34 | 19 |

20-29 | 19 | 24 | 9 |

30-39 | 13 | 6 | 3 |

40-49 | 3 | 2 | 6 |

50-59 | 0 | 5 | 8 |

60-69 | 1 | 1 | 2 |

70-79 | 0 | 0 | 0 |

80-89 | 0 | 0 | 2 |

90-99 | 0 | 0 | 2 |

>99 | 2 | 3 | 30 |

Comparison of the average distance error among the PBOD, Kernel Tracker and SIFT-based tracker.

 | Probabilistic PBOD | Kernel tracker | SIFT-based tracker |
---|---|---|---|
Average e_D (%) | 10.03 | 15.88 | 35.93 |