Efficient Pedestrian Detection at Nighttime Using a Thermal Camera

Most of the commercial nighttime pedestrian detection (PD) methods reported previously utilized the histogram of oriented gradient (HOG) or the local binary pattern (LBP) as the feature and the support vector machine (SVM) as the classifier using thermal camera images. In this paper, we propose a new feature called the thermal-position-intensity-histogram of oriented gradient (TPIHOG or TπHOG) and develop a new combination of the TπHOG and the additive kernel SVM (AKSVM) for efficient nighttime pedestrian detection. The proposed TπHOG includes detailed information on gradient location; therefore, it has more distinctive power than the HOG. The AKSVM performs better than the linear SVM in terms of detection performance, while it is much faster than other kernel SVMs. The combined TπHOG-AKSVM showed effective nighttime PD performance with fast computational time. The proposed method was experimentally tested with the KAIST pedestrian dataset and showed better performance than other conventional methods.


Introduction
For the commercialization of the advanced driver assistance system (ADAS), the most important factors are reliability and robustness, and pedestrian detection (PD) is certainly one of the ADAS functions that require high reliability and robustness. For a robust and reliable PD, reasonable performance even in the nighttime is important because more than half of pedestrian-related accidents occur in the nighttime, even though the volume of traffic is much less than in the daytime [1,2].
For effective nighttime PD, most studies have used a thermal camera sensor because it visualizes objects using their infrared (IR) heat signature and does not depend on lighting conditions. Among the several types of thermal cameras, the far infrared (FIR) sensor is commonly used for nighttime PD because the thermal radiation from pedestrians peaks in the FIR spectrum [3]. Compared with visible images, FIR images are robust against illumination variation but are significantly affected by weather because FIR sensors capture temperature changes in the output images. For example, pedestrians appear brighter than the background on cold days, while they appear darker on hot days [1]. Furthermore, FIR images contain only a single channel of intensity information; thus, they are not as detailed as visible images.
Similar to PD using visible images, PD using FIR images consists of two steps: feature extraction and classification. In the feature extraction step, the features developed for daytime PD can also be used for nighttime PD. For example, the local binary pattern (LBP) [4] and its variations, such as the HOG-LBP [1,5], center-symmetric LBP (CSLBP) [6], and oriented CSLBP (OCSLBP) [7], were also proposed as daytime PD features. However, the LBP-based features have only orientation information of pixel intensity; therefore, they are sensitive to lighting conditions. On the other hand, some methods use the shape of pedestrians as features. Dai et al. [8] utilized the joint shape and appearance cue to find the exact locations of pedestrians. Wang et al. [9] extracted features using a shape descriptor, and Zhao et al. [3] proposed the shape distribution histogram (SDH). These shape-based features used only pixel intensity information and employed background subtraction methods for fixed camera images. Therefore, shape-based features are not suitable for the vehicle environment, where the complex background is not fixed.
As a robust feature for pedestrian detection, the histogram of oriented gradient (HOG) [10] is one of the most popular PD features, and several variations of it have been proposed [11]. Co-occurrence HOG (CoHOG) is one of the extensions of the HOG, and it utilizes pairs of orientations for computing the histogram feature [12]. Andavarapu et al. [13] proposed the weighted CoHOG (W-CoHOG), which takes the gradient magnitude into account when extracting the CoHOG. Spatio-temporal HOG (SPHOG), which contains motion information, was proposed for image sequences from a fixed camera [14]. Scattered difference of directional gradient (SDDG), which extracts local gradient information along a certain direction, was also proposed for IR images [15]. Kim et al. proposed the position-intensity HOG (PIHOG or πHOG), which includes not only the HOG but also detailed position and intensity information for vehicle detection [16]. These HOG-based features either utilize only the gradient information of color images or do not consider the thermal intensity information, which is an important cue for nighttime pedestrian detection.
To address these problems of conventional features, we propose a thermal position intensity HOG (TPIHOG or TπHOG). The TπHOG is the extended version of πHOG and it is applied for pedestrian detection in nighttime. Unlike the πHOG, the proposed TπHOG has thermal intensity information and can be computed more simply than πHOG.
With respect to classification, the linear support vector machine (linear SVM) is widely used as the classifier in many studies, such as [17][18][19], because it is fast and has reasonably good performance. The kernel SVM has better classification performance than the linear SVM but requires a longer computation time owing to kernel expansion [20][21][22]. However, the additive kernel SVM (AKSVM) has better performance than the linear SVM and also has a classification speed comparable with that of the linear SVM [20,23,24]. Recently, deep learning has also been applied to object detection systems. Kim [29]. However, deep learning based methods require a large dataset with clean annotations and a training procedure that takes a long time to converge [30]. In addition, a GPU is necessary for training deep networks, which makes these methods unsuitable for implementing an autonomous system because such a system needs to be an embedded system [31].
In this study, we propose a combination of the TπHOG and the AKSVM (TπHOG-AKSVM) to improve nighttime PD performance compared with conventional methods.
The remainder of this paper is organized as follows. In Section 2, some background about the πHOG is presented. Details of the proposed TπHOG-AKSVM are presented in Section 3. Section 4 presents experimental results and discussions, and conclusions are drawn in Section 5.

Position Intensity Histogram of Oriented Gradient (πHOG)
The HOG is defined as the histogram of the gradient magnitude sums over the gradient orientations in a cell, and it is widely used as an effective feature for PD and vehicle detection (VD). The HOG feature, however, has the limitation that the information on the gradient position within the cell is lost and the pixel intensity information is not used. The πHOG was recently proposed to address this problem, and it shows better detection performance than the HOG for vehicle detection [16]. The πHOG contains not only the HOG but also additional information about gradient Position and pixel Intensity. The πHOG consists of three parts: the Position (P) part, the Intensity (I) part, and the conventional HOG.
The P part of the πHOG is extracted by computing the average position of each orientation bin. That is, let $\theta(x, y, c)$ denote the orientation of the gradient at position $(x, y)$ of the $c$-th cell, and let $B(x, y, c) \in \{1, \dots, T\}$ denote the orientation bin into which $\theta(x, y, c)$ is quantized, where $T$ is the number of bins. Then the averages of the $x$ and $y$ positions of the $d$-th bin ($d \in \{1, \dots, T\}$) in the $c$-th cell are defined by
$$M^c_{x,d} = \frac{\sum_{x=1}^{c_s}\sum_{y=1}^{c_s} x \, I\big(B(x, y, c) = d\big)}{\sum_{x=1}^{c_s}\sum_{y=1}^{c_s} I\big(B(x, y, c) = d\big)}, \qquad M^c_{y,d} = \frac{\sum_{x=1}^{c_s}\sum_{y=1}^{c_s} y \, I\big(B(x, y, c) = d\big)}{\sum_{x=1}^{c_s}\sum_{y=1}^{c_s} I\big(B(x, y, c) = d\big)}$$
where $c_s$ denotes the cell size and $I(\cdot)$ is the binary function which returns 1 if the input argument is true and 0 if the argument is false. Then the P part of the $c$-th cell in the πHOG is $[M^c_x, M^c_y]$, where $M^c_x = [M^c_{x,1}, \dots, M^c_{x,T}]$ and $M^c_y = [M^c_{y,1}, \dots, M^c_{y,T}]$.
The I part of the πHOG is defined in terms of the pixel intensity of vehicle images. Vehicles come in a variety of shapes and sizes, but all types of vehicles have low intensity values in some common areas, such as the tires and the bottom of the vehicle. Using this knowledge, the intensity invariant region (IIR) was proposed in [16]. The IIR is defined as the region of pixels in which the corresponding standard deviation is relatively low. The I part is then defined as the summation of the standard normal deviate values in the IIR. A detailed procedure for extracting the I part of the πHOG is explained as follows.
Let $V^+ = \{\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_{N_v}\}$ be a set of positive vehicle images, where $N_v$ is the number of training images in $V^+$ and all the images in $V^+$ are resized to the same size and aligned. Each vehicle image is written as $\mathbf{s} = [s_1, s_2, \dots, s_N]^T \in \mathbb{R}^N$, where $s_j$ is the intensity value of the $j$-th pixel of $\mathbf{s}$ and $N$ is the image size. The mean and standard deviation of the vehicle images are defined as
$$\mathbf{m} = \frac{1}{N_v}\sum_{i=1}^{N_v} \mathbf{s}_i, \qquad \boldsymbol{\sigma} = \sqrt{\frac{1}{N_v}\sum_{i=1}^{N_v} (\mathbf{s}_i - \mathbf{m}) \bullet (\mathbf{s}_i - \mathbf{m})} \qquad (4)$$
where $\bullet$ denotes component-wise multiplication and the square root is taken component-wise. In Equation (4), a low $\sigma$ means that the corresponding region has similar intensity values over all types of vehicles, including sedans, trucks, and sport utility vehicles (SUVs). Therefore, the region with low standard deviation $\sigma$ can be used as a distinctive cue for classifying vehicles. This region was defined as the IIR, and a new feature was extracted from the IIR [19]. To determine the IIR, the values of $\sigma$ are divided into $M$ intervals and a binary mask $U_k$ ($k = 1, 2, \dots, M$) is constructed for each interval using the interval boundary $\xi_k$. Finally, the I part of the πHOG is the feature computed over the IIR, masked by $U_k$, in the standard normal deviate image $\mathbf{z}$. That is, if a test image $\mathbf{s} \in \mathbb{R}^N$ is given, then $\mathbf{z}$ is computed by
$$\mathbf{z} = (\mathbf{s} - \mathbf{m}) \oslash \boldsymbol{\sigma}$$
where $\oslash$ denotes component-wise division, and the I part is obtained by summing the values of $\mathbf{z}$ inside each mask $U_k$. Figure 1 shows an example of computing the I part using 4 IIR masks from testing images.
Finally, the πHOG is defined as a concatenation of the three parts: the HOG, the P part, and the I part.

Proposed Method
Figure 2 shows some examples of pedestrians in thermal images. As shown in the figure, pedestrian detection using a thermal sensor is quite different from PD using a visible sensor owing to the characteristics of thermal images. In thermal images, pedestrians appear brighter than the background, and the images contain no color information, only silhouettes. In addition, pedestrian intensities vary with the weather because thermal sensors visualize the thermal radiation emitted by the objects in the scene. Therefore, it is important to extract features that can reliably capture the pedestrian silhouette under various weather conditions in thermal images.

In previous works [1,5], the HOG was popularly used for nighttime pedestrian detection because it captures the appearance of pedestrians by stacking gradient information. However, in this paper, a new feature called the TπHOG is proposed to improve nighttime PD performance of the HOG. The proposed TπHOG is based on the πHOG [16] and it is developed so that the TπHOG has more distinctiveness than the HOG when thermal sensors are used. The TπHOG includes not only thermal gradient information but also its locations and thermal intensities. The TπHOG is not a simple application of the πHOG to thermal images, but it is redesigned to handle PD problems in thermal images.
In addition, instead of the linear SVM, the additive kernel support vector machine (AKSVM) is used as a classifier to enhance the detection performance, as well as the detection time.

Thermal Position Intensity Histogram of Oriented Gradient (TπHOG)
The πHOG takes longer to compute and is thus computationally more expensive than the HOG, since the πHOG requires additional pixel-wise computation to obtain the mean of the pixel locations. This pixel-wise computation is not suitable for commercial PD, which requires real-time operation. Thus, in the proposed TπHOG, a cell-wise approach is adopted instead of the pixel-wise approach to reduce the computational time. Since the cell values have already been computed when extracting the HOG, less computation is required to compute the TπHOG than the conventional πHOG.
Furthermore, unlike the I part based on the IIR in [16], the I part in this paper is enlarged so that it has the same size as an orientation channel of the HOG. This is because the I part in the original work [16] used only 4 values from the 4 IIR masks as features and thus had a relatively small effect on the PD performance compared with the P part or the HOG part.
The TπHOG consists of four parts: the T part, the Position (P) part, the Intensity (I) part, and the conventional HOG. For the conventional HOG, we used the HOG of [32], which has 27 gradient channels (18 signed orientations and 9 unsigned orientations) and 4 gradient energy channels using different normalization methods. A detailed description of these four parts is presented in the following subsections.

T Channel Part
For the first part of the TπHOG, we used the T channel proposed in [33]. The T channel is defined as an aggregated version of a thermal image. For example, given 64 × 32 IR images and 4 × 4 cell size, the T channel has 16 × 8 cells and the value in each cell is the sum of pixel intensities within the cell. Figure 3 shows an example of an IR image and its T channel.
Unlike in visible images, pedestrians have higher pixel intensity values than the background in IR images. Thus, the T channel, which consists of aggregations of IR intensities, can play an important role in distinguishing pedestrians from other objects.
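The aggregation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `t_channel`, the (rows, columns) array layout, and the requirement that the image size be a multiple of the cell size are choices made here, not details taken from [33].

```python
import numpy as np

def t_channel(ir_image: np.ndarray, cell_size: int = 4) -> np.ndarray:
    """Sum the pixel intensities inside each non-overlapping cell (assumed layout: rows x cols)."""
    height, width = ir_image.shape
    if height % cell_size or width % cell_size:
        raise ValueError("image size must be a multiple of the cell size")
    rows, cols = height // cell_size, width // cell_size
    # View the image as (rows, cell, cols, cell) and sum over the two cell axes.
    cells = ir_image.reshape(rows, cell_size, cols, cell_size)
    return cells.sum(axis=(1, 3))

# A 64 x 32 ROI with 4 x 4 cells gives a 16 x 8 T channel.
roi = np.ones((64, 32), dtype=np.float64)
t = t_channel(roi)
```

With an all-ones ROI every cell sums 16 pixels, which makes the cell-wise aggregation easy to check by hand.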

Position Part
In the P part of the TπHOG, unlike in the πHOG, the cell locations of the gradients are used instead of the pixel locations. In the feature implementation, the HOG consists of multiple orientation channels, and each channel contains the bin value for the corresponding orientation of a cell histogram. Figure 4 shows an example of a HOG that has 16 × 8 cells with 9 gradient orientations. In Figure 4, the values in each channel denote the bin values of the cell histogram for the corresponding orientation. For example, the first channel of the HOG contains the first bin value of the cell histogram. In computing the P part of the TπHOG, we divide the HOG cells into several blocks as shown in Figure 5.
The P part is defined as the $(x, y)$ location at which each orientation component exists in a block. Assuming that $HOG(B, x, y, d)$ is the value of the cell located at $(x, y)$ of the $B$-th block in the $d$-th orientation channel, the P part is defined by
$$P^B_d = \big[M^B_{x,d}, \, M^B_{y,d}\big]$$
where
$$M^B_{x,d} = \frac{\sum_{(x,y)} x \, I\big(HOG(B, x, y, d) > \tau_d\big)}{\sum_{(x,y)} I\big(HOG(B, x, y, d) > \tau_d\big)}, \qquad M^B_{y,d} = \frac{\sum_{(x,y)} y \, I\big(HOG(B, x, y, d) > \tau_d\big)}{\sum_{(x,y)} I\big(HOG(B, x, y, d) > \tau_d\big)}$$
the sums run over the cells of the block, $I(\cdot)$ is the binary function that returns 1 if its argument is true and 0 otherwise, and $\tau_d$ is the threshold for each orientation. Figure 6 shows an example of the computation of the two values of $P^3_2$, the 3rd block in the second orientation channel, when the block size is 4 × 4 cells with 9 orientations.
In Figure 6, only the cells with values that exceed the threshold $\tau_2$ are used to compute the P part $P^3_2$ of the TπHOG. Similarly, the P part contains the location information of each orientation channel, and it is computed by concatenating the vectors $P_d = [P^1_d, \dots, P^8_d]$ over all orientation channels. For a better understanding of the P part, the HOG and the P part are visualized in Figures 7-9. Figure 7 shows examples of pedestrian IR images taken from the KAIST pedestrian dataset [33]. The average HOG channels for the training pedestrian data are visualized in Figure 8. The average channels are computed using 64 × 32 cropped IR images of 2244 pedestrians. In the figure, the first two rows show the HOGs for the 18 signed orientations, while the third row shows the HOGs for the 9 unsigned orientations.
In Figure 8, the closer the color of a cell is to red, the higher the corresponding HOG value. As shown in the figure, pedestrians have specific cell parts that have relatively high values for each orientation channel. The conventional HOG uses only these values as the feature, whereas the TπHOG additionally uses the cell locations of the orientations for nighttime PD.
In this paper, the P part of the TπHOG is extracted from non-overlapping blocks, each comprising 4 × 4 cells. For example, assuming the HOG has 31 channels (18 signed orientations, 9 unsigned orientations, and 4 different normalizations), the size of the HOG is 16 × 8 × 31 cells and it has 4 × 2 × 31 = 248 blocks. The P part $P^B_o = [M^B_{x,o}, M^B_{y,o}]$ of the TπHOG is extracted from each block, so the P part contributes 496 additional values. Figures 9 and 10 visualize the average $P = [P_1, \dots, P_9]$ of the TπHOG for pedestrians and non-pedestrians, respectively. In Figures 9 and 10, the average P parts are computed from 2244 pedestrian images and 5000 non-pedestrian images, respectively. All the images are 64 × 32 cropped IR images.
As shown in Figures 9 and 10, the average P parts of the pedestrians are concentrated on a couple of cell locations in each block. In particular, the average P parts of the pedestrians are mostly larger than those of the non-pedestrians, except for the unsigned orientations of 0° and 100°. Conversely, the average P parts of the non-pedestrians usually have lower values than those of the pedestrians. Further, at the orientations of 20° and 160°, the P parts of the non-pedestrians have values close to 0 and they are not included in the computation of the P parts. This difference between the two classes provides the TπHOG with more discriminatory power for robust PD than the HOG.
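To make the block-wise computation concrete, the following NumPy sketch averages the in-block coordinates of the cells whose HOG value exceeds the per-channel threshold. It is a minimal illustration, not the authors' implementation: the function name `p_part`, the 1-based in-block coordinates, the (rows, columns, channels) layout, and the choice of 0 for blocks with no cell above the threshold are all assumptions made here.

```python
import numpy as np

def p_part(hog: np.ndarray, tau: np.ndarray, block: int = 4) -> np.ndarray:
    """Mean (x, y) cell position, per block and channel, of the cells with HOG value > tau[d]."""
    rows, cols, channels = hog.shape
    n_by, n_bx = rows // block, cols // block
    # In-block cell coordinates counted from 1 (assumed convention).
    ys, xs = np.mgrid[1:block + 1, 1:block + 1]
    out = np.zeros((n_by, n_bx, channels, 2))
    for by in range(n_by):
        for bx in range(n_bx):
            cells = hog[by * block:(by + 1) * block, bx * block:(bx + 1) * block, :]
            for d in range(channels):
                mask = cells[:, :, d] > tau[d]
                if mask.any():  # blocks without any cell above the threshold stay at 0
                    out[by, bx, d, 0] = xs[mask].mean()
                    out[by, bx, d, 1] = ys[mask].mean()
    return out.reshape(-1)

# 16 x 8 cells with 31 channels: 4 x 2 x 31 = 248 blocks, two values each.
hog = np.zeros((16, 8, 31))
hog[0, 1, 0] = 1.0   # cell at x = 2, y = 1 of the first block, channel 0
hog[2, 3, 0] = 1.0   # cell at x = 4, y = 3 of the first block, channel 0
p = p_part(hog, tau=np.full(31, 0.5))
```

For this toy input the first block of channel 0 yields the mean position (3, 2), and the vector has the 496 entries mentioned in the text.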

Intensity Part
The conventional I part of the πHOG is defined as a partial pixel-wise sum of the standard normal deviate image [16] within the IIR. Thus, evaluating the I part requires pixel-wise computation, which is computationally expensive for real-time applications. Furthermore, the conventional I part in [16] has only 4 dimensions, which is very short compared with the 3968 (16 × 8 × 31) dimensions of the HOG, and it therefore has a minimal effect on the PD performance. In this paper, a modified version of the I part is developed for PD in thermal images. Rather than using IIR masks, the new I part is computed directly as a standard normal deviate image with respect to the set of T channels computed from the training pedestrian data (64 × 32 cropped IR images of pedestrians). Therefore, the feature length of the I part is the same as that of the T channel. That is, let $T^+ = \{T_1, T_2, \dots, T_{N_p}\}$ be the T channel set of pedestrian images, where the superscript '+' denotes the positive pedestrian samples, $N_p$ is the number of T channels in $T^+$, and $T \in \mathbb{R}^{16 \times 8}$. Let $M_T$ and $\sigma_T$ be the mean and standard deviation of $T^+$, respectively. The I part for a testing T channel $s_T$ is then computed as the standard normal deviate of $s_T$ with respect to $M_T$ and $\sigma_T$. The computation of the I part from both pedestrian and non-pedestrian testing images is summarized in Figure 11.
Figure 11. Example of computing I part from testing images.
Figure 12 shows the average I parts for pedestrians and non-pedestrians. The images are 64 × 32 cropped IR images from the testing set of the KAIST pedestrian dataset. As shown in Figure 12, most of the average I parts for pedestrian images are less than 1. In particular, the parts corresponding to the lower body are less than 0.7. On the other hand, the average I parts for non-pedestrians are generally larger than 0.7, and the parts corresponding to the upper body have values larger than 1. This difference provides the I part with strong discriminatory power between pedestrians and non-pedestrians. Further, the extraction of the I part is a cell-wise computation and does not require the additional computation of building a histogram; therefore, the proposed I part is computationally more efficient than that of the conventional πHOG [16].
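A cell-wise standard normal deviate of this kind can be sketched as follows. The sketch rests on several assumptions: the function names are hypothetical, the population standard deviation is used, a small `eps` guards against division by zero, and the deviate is left signed. The source equation is not recoverable from the text, so whether the original takes a magnitude is not known.

```python
import numpy as np

def fit_t_statistics(train_t: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Cell-wise mean and standard deviation over pedestrian T channels of shape (N_p, 16, 8)."""
    return train_t.mean(axis=0), train_t.std(axis=0)

def i_part(test_t: np.ndarray, mean_t: np.ndarray, std_t: np.ndarray,
           eps: float = 1e-12) -> np.ndarray:
    """Signed standard normal deviate of a test T channel (eps is an added safeguard)."""
    return (test_t - mean_t) / (std_t + eps)

# Two toy training T channels (all 0 and all 2) give mean 1 and standard deviation 1.
train = np.stack([np.zeros((16, 8)), np.full((16, 8), 2.0)])
mean_t, std_t = fit_t_statistics(train)
feature = i_part(np.full((16, 8), 3.0), mean_t, std_t)
```

The result has the same 16 × 8 layout as the T channel, which is why the I part adds as many values as the T channel does.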

Additive Kernel SVM (AKSVM)
The SVM is one of the most popular binary classifiers used for object detection in computer vision. Given the training set $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{L}$ with $L$ samples, where $x^{(i)} \in \mathbb{R}^N$ and $y^{(i)} \in \{-1, +1\}$, the SVM is trained to classify the input data. If the input is mapped to a higher dimensional feature space by $\phi(\cdot)$, then the decision function of the SVM is defined by
$$f(x) = w \cdot \phi(x) + b \qquad (11)$$
where $\phi(x) \in \mathbb{R}^D$, $D \gg N$, $w \in \mathbb{R}^D$ is the weight, and $b \in \mathbb{R}$ is the bias. The SVM is trained by finding the optimal $w$ and $b$ that maximize the margin between the two classes. It can also be trained in dual space using the kernel trick with $\kappa(x^{(i)}, x^{(j)}) = \phi(x^{(i)}) \cdot \phi(x^{(j)}) \in \mathbb{R}$. After training the SVM in dual space, the decision function (11) can be evaluated by
$$f(x) = \sum_{i \in \mathcal{S}_v} \alpha^{(i)} y^{(i)} \kappa(x^{(i)}, x) + b \qquad (15)$$
where $\mathcal{S}_v$ denotes the set of support vectors. If the $\kappa(\cdot)$ of Equation (15) is nonlinear, the kernel SVM performs better than the linear SVM in classification. However, it requires high computation and memory resources owing to the kernel computation with its support vectors for every test sample.
In contrast, the additive kernels (AK) enable fast computation of the decision function while maintaining the robust performance of the kernel SVM [23,24]. The AK is defined as a kernel that can be decomposed into a summation of dimension-wise components, and it is represented by
$$\kappa(x, z) = \sum_{n=1}^{N} \kappa_n(x_n, z_n)$$
where $x = \{x_1, x_2, \dots, x_N\} \in \mathbb{R}^N$ and $z = \{z_1, z_2, \dots, z_N\} \in \mathbb{R}^N$. Various AKs have been reported, including the linear kernel $\kappa_{LIN}$, the intersection kernel $\kappa_{IK}$, the generalized intersection kernel $\kappa_{GIK}$, and the $\chi^2$ kernel $\kappa_{\chi^2}$, defined as
$$\kappa_{LIN}(x, z) = \sum_{n=1}^{N} x_n z_n, \quad \kappa_{IK}(x, z) = \sum_{n=1}^{N} \min(x_n, z_n), \quad \kappa_{GIK}(x, z) = \sum_{n=1}^{N} \min\big(|x_n|^{\gamma}, |z_n|^{\gamma}\big), \quad \kappa_{\chi^2}(x, z) = \sum_{n=1}^{N} \frac{2 x_n z_n}{x_n + z_n}$$
where $\gamma > 0$ is the exponent of the generalized intersection kernel. The decision function of the SVM in Equation (15) with the AK can be represented by
$$f(x) = \sum_{i \in \mathcal{S}_v} \alpha^{(i)} y^{(i)} \sum_{n=1}^{N} \kappa_n\big(x^{(i)}_n, x_n\big) + b = \sum_{n=1}^{N} h_n(x_n) + b$$
where
$$h_n(x_n) = \sum_{i \in \mathcal{S}_v} \alpha^{(i)} y^{(i)} \kappa_n\big(x^{(i)}_n, x_n\big) \qquad (20)$$
and $h_n(x_n)$ is a one-dimensional function of $x_n \in \mathbb{R}$. In Equation (20), $\alpha^{(i)}$ and $y^{(i)}$ are given; therefore, the output of $h_n(\cdot)$ can be pre-computed for all possible inputs $x_n$, and the computed output values are stored in a look-up table (LUT) for each $h_n(\cdot)$. Assuming that $LUT_n \in \mathbb{R}^{N_L}$ is the LUT that consists of $N_L$ values sampled from $h_n(\cdot)$, $h_n(x_n)$ of Equation (20) can be simply approximated by the entry of $LUT_n$ that corresponds to $x_n$, where $s = 1/N_L$ is the sampling interval on $x_n$. Figure 13 shows an example of retrieving the value of $h_n(x_n)$ from $LUT_n$. In Figure 13, the size of $LUT_n$ is $N_L = 25$ and the values of $LUT_n$ are sampled from $h_n(\cdot)$ with the sampling interval $s = 0.04$.
Figure 13. Example of retrieving value of $h_n(x_n)$ from $LUT_n$ for $x_n = 0.85$.
Therefore, using the LUTs of $h_n(\cdot)$, the decision function can be evaluated simply as the summation of the values taken from the $LUT_n$, without kernel computation.
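The look-up-table evaluation can be illustrated with the intersection kernel. The sketch below is a hedged example rather than the implementation of [23,24]: it assumes features scaled to [0, 1], stores N_L + 1 samples per dimension so that both end points are covered, rounds to the nearest sample, and uses hypothetical names (`build_luts`, `decision_lut`, `decision_exact`) and toy support vectors.

```python
import numpy as np

def build_luts(support_vectors: np.ndarray, coeffs: np.ndarray, n_samples: int) -> np.ndarray:
    """Tabulate h_n on the grid k / n_samples for the intersection kernel.

    support_vectors: (L, N) array; coeffs: (L,) array holding alpha_i * y_i.
    Returns an array of shape (N, n_samples + 1).
    """
    grid = np.arange(n_samples + 1) / n_samples
    # min(x_n^(i), grid_k) for every support vector i, dimension n, and sample k.
    k = np.minimum(support_vectors[:, :, None], grid[None, None, :])
    return np.einsum("i,ink->nk", coeffs, k)

def decision_lut(x: np.ndarray, luts: np.ndarray, bias: float) -> float:
    """Approximate f(x) by summing one table entry per dimension (nearest-sample rounding)."""
    n_samples = luts.shape[1] - 1
    idx = np.clip(np.rint(x * n_samples).astype(int), 0, n_samples)
    return float(luts[np.arange(x.size), idx].sum() + bias)

def decision_exact(x: np.ndarray, support_vectors: np.ndarray,
                   coeffs: np.ndarray, bias: float) -> float:
    """Reference evaluation with explicit kernel computation against every support vector."""
    return float(coeffs @ np.minimum(support_vectors, x[None, :]).sum(axis=1) + bias)

sv = np.array([[0.2, 0.9, 0.4], [0.7, 0.1, 0.6]])   # toy support vectors
coeffs = np.array([1.5, -0.5])                      # toy alpha_i * y_i values
bias = -0.3
luts = build_luts(sv, coeffs, n_samples=100)
x = np.array([0.25, 0.5, 0.75])                     # lies exactly on the sampling grid
score_lut = decision_lut(x, luts, bias)
score_exact = decision_exact(x, sv, coeffs, bias)
```

Because the test point lies on the sampling grid, the table-based score agrees with the exact kernel evaluation; off-grid inputs incur a quantization error that shrinks as the table grows. The cost per test sample is N table reads, independent of the number of support vectors.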

TπHOG-AKSVM for Nighttime PD
This subsection explains how the AKSVM is combined with the TπHOG for nighttime PD. The test process of the combined TπHOG-AKSVM is summarized in Figure 14. As shown in the figure, the HOG and the T channel are first extracted from the input IR image. Then the P and I parts are extracted from the HOG and the T channel, respectively. All these features are vectorized, and the TπHOG is completed by concatenating the vectorized features (T channel, P part, I part, HOG). Then, the score of each component of the TπHOG is read from the LUT, and the total score of the TπHOG is computed by summing the scores of the component features. Finally, if the total score is larger than 0, the input image is classified as a pedestrian; otherwise, it is classified as a non-pedestrian.
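As a rough check of the feature layout, the sizes given in the preceding subsections (16 × 8 cells, 31 HOG channels, 4 × 2 blocks with two position values each) can be added up. The total of 4720 values is derived here from those sizes and is not stated in the source; the function names are hypothetical.

```python
import numpy as np

CELLS = (16, 8)        # HOG / T channel grid for a 64 x 32 ROI with 4 x 4 cells
HOG_CHANNELS = 31      # 18 signed + 9 unsigned orientations + 4 energy channels
BLOCKS = (4, 2)        # non-overlapping blocks of 4 x 4 cells

def tpihog_lengths() -> dict:
    """Length of each vectorized part, derived from the sizes quoted in the text."""
    t_len = CELLS[0] * CELLS[1]
    return {
        "T": t_len,
        "P": BLOCKS[0] * BLOCKS[1] * HOG_CHANNELS * 2,
        "I": t_len,
        "HOG": t_len * HOG_CHANNELS,
    }

def assemble_tpihog(t_channel, p_part, i_part, hog) -> np.ndarray:
    """Concatenate the vectorized parts in the order T channel, P part, I part, HOG."""
    return np.concatenate([np.ravel(t_channel), np.ravel(p_part),
                           np.ravel(i_part), np.ravel(hog)])

def is_pedestrian(total_score: float) -> bool:
    """Decision rule from the text: a total score above 0 means pedestrian."""
    return total_score > 0

lengths = tpihog_lengths()
feature = assemble_tpihog(np.zeros(CELLS), np.zeros(lengths["P"]),
                          np.zeros(CELLS), np.zeros(CELLS + (HOG_CHANNELS,)))
```

The resulting vector is what a per-dimension LUT scorer would consume, one table per entry.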

Experimental Results
In this section, the proposed method is applied to the KAIST pedestrian dataset [33] and its performance is compared with other conventional methods. The KAIST pedestrian dataset consists of a number of pairs of visible-thermal images that are aligned at an image size of 640 × 512. The dataset images were taken by both visible and thermal (FIR) sensors in the daytime and nighttime at three locations (Campus, Road, Downtown). In this experiment, we use the nighttime images of the KAIST dataset for training and testing. In the nighttime dataset, there are 838 training images with 1122 annotations and 797 test images.
In this experiment, we set the size of the ROI image to 64 × 32 pixels, the cell size of the HOG to 4 × 4 pixels, and the block size of the TπHOG to 4 × 4 cells. The AKSVM classifiers are trained with the TπHOGs of the training set using the LibSVM MATLAB toolbox [24,34]. For the LUTs of the AKSVM, we used $N_L = 100$ and $s = 0.01$. Piotr's Computer Vision Toolbox [35] is also used for feature extraction and testing.
For testing, a sliding window approach is employed to detect pedestrians at various scales. The step size is fixed to the cell size and the scale ratio is set to 1.09 (1/0.91), which results in 8 scales per octave. As in [36,37], we narrow down the search area by restricting the y-coordinate of the center of the search window to lie between the 210th and 355th pixels along the y-axis. Figure 15 shows an example of our sliding window approach for pedestrian detection. In the figure, the yellow boxes denote the search windows, the green boxes are the detection results, and the red lines denote the boundaries of the search region within which the y-coordinate of the center of the search window is restricted.
We compare the detection performance of the proposed method with other conventional methods [10,19] using the ROC curve and the log-average miss rate. The ROC curve shows the detection performance by plotting the miss rate against the false positives per image (FPPI). The lower the ROC curve, the better the detection performance. The log-average miss rate (MR) is the average of the miss rate over the FPPI range [10^-2, 10^0] on the log scale. For comparison, we choose HOG-LinearSVM as the baseline and ACF-T-THOG [33] as the state of the art. ACF-T-THOG utilizes pairs of nighttime visible-thermal images and extracts the ACF from the visible images and the T-THOG from the thermal images.
Except for ACF-T-THOG, all SVM-based classifiers (LinearSVM, AKSVM) are trained with thermal images.
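As an illustration of the evaluation metric, the following Python sketch computes a log-average miss rate from a sampled miss rate versus FPPI curve. It follows the common benchmark convention of nine reference points evenly spaced in log scale over [10^-2, 10^0] and a geometric mean of the sampled miss rates; the number of points, the handling of curves that end early, and the function name are assumptions rather than details stated in the text.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Average the miss rate in log space at FPPI reference points in [1e-2, 1e0].

    fppi, miss_rate: sequences describing one operating point per detection threshold.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    order = np.argsort(fppi)
    fppi, miss_rate = fppi[order], miss_rate[order]

    refs = np.logspace(-2.0, 0.0, num_points)
    # If the curve has no point at or below a reference FPPI, count a miss rate of 1.
    sampled = np.ones(num_points)
    for i, ref in enumerate(refs):
        below = np.nonzero(fppi <= ref)[0]
        if below.size:
            sampled[i] = miss_rate[below[-1]]

    # Geometric mean; the floor avoids log(0) for a perfect detector.
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```

A lower value indicates a better detector, which is consistent with reading the ROC curves from top to bottom.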
Then, the effect of the cell size on detection performance is analyzed. The ROC curves of the TπHOG with AKSVM are plotted in Figure 16 for varying cell sizes. The intersection kernel κ_IK in Equation (16) is used. The resulting log-average MR and time per frame are summarized in Table 1. In this experiment, we set the number of blocks for the P part to 8 blocks per orientation layer, as in Figure 5.
As shown in Table 1 and Figure 16, the smaller the cell size of the TπHOG, the better the detection performance. TπHOG-IKSVM-cell8 takes a detection time comparable to that of ACF-T-THOG, but its log-average MR is about 30% higher than those of ACF-T-THOG and HOG-LinearSVM. On the other hand, TπHOG-IKSVM-cell2 demonstrates the best performance among all the competing methods, but it is about 20 times slower than TπHOG-IKSVM-cell4. Considering the trade-off between detection performance (log-average MR) and detection time, the recommended cell size is 4.
To compare the discriminating power of the feature, we plot the ROC curves for several feature combinations with IKSVM in Figure 17. Table 2 presents the detailed information of the feature-classifier combinations used in experiments.
As shown in Figure 17, THOG-LinearSVM performs better than HOG-LinearSVM by 0.08% in terms of the log-average MR, and THOG-IKSVM shows better detection performance than HOG-IKSVM by 1.2%. This result indicates that adding the T channel to the HOG improves detection performance over the conventional HOG. To examine the effect of the additive kernel, we measure the detection performance of THOG-IKSVM. As shown in Figure 17 and Table 2, THOG-IKSVM improves on THOG-LinearSVM, indicating that the additive kernel significantly improves the detection performance of the SVM, by 4.98% over the linear kernel. Finally, we measure the detection performance using the TPHOG and the TπHOG in connection with the AKSVM. The TPHOG, which adds the P part to the THOG, demonstrates a 0.92% enhancement over THOG-IKSVM. The TπHOG, which adds the I part to the TPHOG, improves on the TPHOG by 0.4% and outperforms all other methods. These experimental results show that the TπHOG-AKSVM enhances the conventional method in terms of both the feature and the classifier.
In terms of detection speed, TπHOG-IKSVM takes 1.06 s more computational time than HOG-IKSVM. Specifically, adding the P part takes 0.8 s more than the HOG, while the I part and the T channel require an additional 0.1 s compared with the HOG. Figure 18 shows the ROC curves for the TπHOG with four different additive kernels: the linear kernel, the intersection kernel κ_IK, the generalized intersection kernel κ_GIK, and the χ² kernel κ_χ². The computational times of the TπHOG-AKSVMs in Figure 16 are the same as that of TπHOG-IKSVM in Table 2 because they all use a lookup table of the same size in this experiment.
As shown in Figure 18, all types of TπHOG-AKSVMs perform better than HOG-LinearSVM and TπHOG-LinearSVM. Among the AKSVMs, TπHOG-IKSVM and TπHOG-GIKSVM demonstrate a significant improvement over TπHOG-LinearSVM, and they also perform better than ACF-T-THOG. Figure 19 compares examples of the detection results of TπHOG-IKSVM and ACF-T-THOG. As shown in Figure 19, ACF-T-THOG generates many false positives on vertical objects such as headlights or buildings. In contrast, the proposed method shows good detection results with no false positives.
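For reference, the four kernels compared in Figure 18 can be sketched in Python using their commonly cited textbook forms. These forms are assumptions and may differ in detail from the paper's own equations; in particular, the exponent `alpha` of the generalized intersection kernel and the stabilizer `eps` are hypothetical parameters introduced only for this illustration. Each kernel is a sum of per-dimension terms, which is the property that allows the per-component lookup table.

```python
import numpy as np

def linear_kernel(x, y):
    """k(x, y) = sum_d x_d * y_d."""
    return float(np.sum(x * y))

def intersection_kernel(x, y):
    """k(x, y) = sum_d min(x_d, y_d), for non-negative histogram features."""
    return float(np.sum(np.minimum(x, y)))

def generalized_intersection_kernel(x, y, alpha=0.5):
    """k(x, y) = sum_d min(|x_d|^alpha, |y_d|^alpha); alpha is an assumed parameter."""
    return float(np.sum(np.minimum(np.abs(x) ** alpha, np.abs(y) ** alpha)))

def chi2_kernel(x, y, eps=1e-12):
    """k(x, y) = sum_d 2 * x_d * y_d / (x_d + y_d); eps guards against 0/0."""
    return float(np.sum(2.0 * x * y / (x + y + eps)))
```

Because every kernel above decomposes over dimensions, swapping one for another changes only the table contents, not the table size.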

Conclusions
In this paper, a novel nighttime pedestrian detection method using a thermal camera has been proposed. A new feature named TπHOG was developed and combined with the AKSVM. The proposed TπHOG has more discriminative power than the HOG because it uses not only the gradient information but also the cell location of the gradient for each orientation channel. The proposed method was applied to the KAIST pedestrian dataset, and the experimental results show that the TπHOG is more distinctive than the HOG as a feature and that the TπHOG-AKSVM outperforms other conventional methods for nighttime pedestrian detection.