Figure 1. Example images and ground truth labels for the university dataset.
  
 
Figure 2. Example images and ground truth labels for the beach dataset.
  
 
Figure 3. Example images and ground truth labels for the shore dataset.
  
 
Figure 4. Visualization of the area of interest for surveillance in the beach dataset.
  
 
Figure 5. The network architecture of the proposed person detection method.
  
 
Figure 6. Training loss (logarithmic scale) over the first 20 epochs for each dataset. (*) indicates that temporal input is used.
  
 
Figure 7. Detection results for the university dataset: (a) original infrared image, (b) ground truth, (c) simple thresholding, (d) adaptive thresholding, (e) background subtraction, (f) K-means clustering, (g) baseline convolutional neural network (CNN), (h) our method, and (i) our method*.
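Figure 7 compares the network with several classical baselines whose parameter settings are not listed here. For readers unfamiliar with them, the following is a minimal pure-Python sketch of textbook versions of three of those baselines, assuming single-channel infrared intensities stored as nested lists. The function names and the parameters `t`, `radius` and `offset` are hypothetical and are not the settings used to produce the figures; K-means clustering is omitted.

```python
from statistics import mean

Image = list[list[float]]  # single-channel intensities, row-major (assumed layout)
Mask = list[list[int]]     # 1 = foreground (person), 0 = background


def simple_threshold(img: Image, t: float) -> Mask:
    """Global threshold: a pixel is foreground if its intensity exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in img]


def adaptive_threshold(img: Image, radius: int, offset: float) -> Mask:
    """Local threshold: compare each pixel with the mean of its square
    neighbourhood (clipped at the image border) plus a fixed offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = 1 if img[y][x] > mean(window) + offset else 0
    return out


def background_subtraction(frames: list[Image], t: float) -> Mask:
    """Use the per-pixel mean of the earlier frames (at least one is required)
    as the background model and flag pixels of the last frame that deviate
    from it by more than t."""
    *history, current = frames
    h, w = len(current), len(current[0])
    return [[1 if abs(current[y][x] - mean(f[y][x] for f in history)) > t else 0
             for x in range(w)]
            for y in range(h)]
```

On a toy 3 × 3 frame with one warm pixel in the centre, all three functions isolate that pixel; on real infrared scenes they respond very differently to background clutter, which is what Figure 7 illustrates.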
  
 
Figure 8. Detection results for the beach dataset: (a) original infrared image, (b) ground truth, (c) simple thresholding, (d) adaptive thresholding, (e) background subtraction, (f) K-means clustering, (g) baseline CNN, (h) our method, and (i) our method*.
  
 
Figure 9. Detection results for the shore dataset: (a) original infrared image, (b) ground truth, (c) simple thresholding, (d) adaptive thresholding, (e) background subtraction, (f) K-means clustering, (g) baseline CNN, (h) our method, and (i) our method*.
  
 
Figure 10. Detection results over time for the beach dataset. The top row shows the original infrared image; the middle row shows the ground truth; the bottom row shows the detection results based on our proposed method.
  
 
Figure 11. Precision-recall curve of pixel-wise detection results. Yellow boxes indicate the detection threshold at each point on the curve.
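Each point on a precision-recall curve like the one in Figure 11 corresponds to one detection threshold applied to the network's per-pixel output. A minimal sketch of that sweep, assuming the per-pixel scores and binary ground-truth labels have been flattened into lists and that a pixel counts as detected when its score is at least the threshold; the function name and the convention of reporting precision 1.0 when nothing is detected are assumptions of this sketch.

```python
def precision_recall_points(scores, labels, thresholds):
    """Sweep detection thresholds over flattened per-pixel scores.

    A pixel counts as detected when its score is >= the threshold. Returns a
    list of (threshold, precision, recall) tuples. Precision is reported as
    1.0 when nothing is detected (a convention chosen for this sketch).
    """
    points = []
    for t in thresholds:
        tp = fp = fn = 0
        for s, y in zip(scores, labels):
            if s >= t and y:
                tp += 1          # person pixel above threshold
            elif s >= t:
                fp += 1          # background pixel above threshold
            elif y:
                fn += 1          # person pixel below threshold
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points
```

Lowering the threshold moves along the curve toward higher recall and, typically, lower precision; the threshold labels in Figure 11 mark where each operating point sits on that trade-off.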
  
 
Table 1. Details of each dataset.
  
 
  
      
| Dataset | University | Beach | Shore |
|---|---|---|---|
| Number of images | 120 | 160 | 540 |
| Resolution (pixels) | 1364 × 1024 | 1920 × 1080 | 1920 × 1080 |
| Duration | 120 s | 40 s | 180 s |
| Sampling rate | 1 Hz | 4 Hz | 3 Hz |
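The image counts in Table 1 are consistent with sampling each recording uniformly over its full duration (number of images = duration × sampling rate). A quick arithmetic check using only the values in the table:

```python
# (duration in s, sampling rate in Hz) for each dataset, copied from Table 1.
recordings = {
    "University": (120, 1),
    "Beach": (40, 4),
    "Shore": (180, 3),
}

# Frames captured = seconds recorded × frames per second.
image_counts = {name: duration * rate for name, (duration, rate) in recordings.items()}
print(image_counts)                # {'University': 120, 'Beach': 160, 'Shore': 540}
print(sum(image_counts.values()))  # 820
```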
      
 
  
    
  
  
Table 2. Pixel-level detection score for the university dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.040 | 0.871 | 0.076 |
| Adaptive thresholding | 0.083 | 0.312 | 0.131 |
| Background subtraction | 0.534 | 0.951 | 0.684 |
| K-means clustering | 0.037 | 0.966 | 0.071 |
| Baseline CNN | 0.503 | 0.901 | 0.646 |
| Our method | 0.726 | 0.919 | 0.811 |
| Our method* | 0.759 | 0.948 | 0.843 |
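Tables 2, 4 and 6 report pixel-level precision, recall and F1-score. A minimal sketch of these metrics under their standard definitions, assuming the detector output and the ground truth are equally sized binary masks (1 = person pixel) stored as nested lists; the function names are illustrative. Because the F1-score is the harmonic mean of precision and recall, it can also be recomputed from the rounded table entries, although the result may differ from the table in the last digit.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pixel_scores(pred, truth):
    """Pixel-wise precision, recall and F1-score for two binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # person pixel correctly detected
            elif p:
                fp += 1      # background pixel reported as person
            elif t:
                fn += 1      # person pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)
```

For example, `round(f1_from(0.726, 0.919), 3)` gives 0.811, matching the "Our method" row of Table 2.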
      
 
  
    
  
  
Table 3. Object-level detection score for the university dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.084 | 0.909 | 0.153 |
| Adaptive thresholding | 0.068 | 0.932 | 0.128 |
| Background subtraction | 0.506 | 1.000 | 0.672 |
| K-means clustering | 0.066 | 0.886 | 0.122 |
| Baseline CNN | 0.976 | 0.909 | 0.941 |
| Our method | 1.000 | 0.909 | 0.952 |
| Our method* | 1.000 | 0.932 | 0.965 |
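Tables 3, 5 and 7 report object-level scores, but the rule for matching detected objects to ground-truth objects is not stated in this section. The sketch below uses one plausible rule, assumed purely for illustration: objects are 4-connected foreground regions, a detected region counts as correct if it overlaps any ground-truth region, and a ground-truth region counts as found if any detected region overlaps it. The function names are hypothetical.

```python
from collections import deque


def components(mask):
    """Label 4-connected foreground regions; returns a list of sets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, region = deque([(y, x)]), set()
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    region.add((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def object_scores(pred, truth):
    """Object-level precision, recall and F1-score under the assumed overlap rule."""
    pred_objs, true_objs = components(pred), components(truth)
    correct = sum(1 for p in pred_objs if any(p & t for t in true_objs))
    found = sum(1 for t in true_objs if any(t & p for p in pred_objs))
    precision = correct / len(pred_objs) if pred_objs else 0.0
    recall = found / len(true_objs) if true_objs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under a rule like this, a few large false regions cost as much as many small ones, which is one reason object-level and pixel-level scores for the same method can diverge sharply (compare the baseline CNN rows of Tables 2 and 3).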
      
 
  
    
  
  
Table 4. Pixel-level detection score for the beach dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.013 | 0.971 | 0.025 |
| Adaptive thresholding | 0.802 | 0.601 | 0.687 |
| Background subtraction | 0.214 | 0.590 | 0.314 |
| K-means clustering | 0.687 | 0.725 | 0.705 |
| Baseline CNN | 0.510 | 0.471 | 0.490 |
| Our method | 0.750 | 0.796 | 0.772 |
| Our method* | 0.645 | 0.760 | 0.698 |
      
 
  
    
  
  
Table 5. Object-level detection score for the beach dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.131 | 0.953 | 0.230 |
| Adaptive thresholding | 0.769 | 0.836 | 0.801 |
| Background subtraction | 0.313 | 0.661 | 0.425 |
| K-means clustering | 0.622 | 0.684 | 0.652 |
| Baseline CNN | 0.971 | 0.585 | 0.730 |
| Our method | 1.000 | 0.877 | 0.935 |
| Our method* | 0.961 | 0.860 | 0.907 |
      
 
  
    
  
  
Table 6. Pixel-level detection score for the shore dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.383 | 0.801 | 0.518 |
| Adaptive thresholding | 0.254 | 0.320 | 0.283 |
| Background subtraction | 0.293 | 0.551 | 0.383 |
| K-means clustering | 0.302 | 0.649 | 0.412 |
| Baseline CNN | 0.478 | 0.569 | 0.520 |
| Our method | 0.711 | 0.914 | 0.800 |
| Our method* | 0.652 | 0.933 | 0.768 |
      
 
  
    
  
  
Table 7. Object-level detection score for the shore dataset.
  
 
  
      
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Simple thresholding | 0.459 | 0.834 | 0.593 |
| Adaptive thresholding | 0.425 | 0.788 | 0.552 |
| Background subtraction | 0.269 | 0.532 | 0.357 |
| K-means clustering | 0.349 | 0.819 | 0.490 |
| Baseline CNN | 0.974 | 0.569 | 0.718 |
| Our method | 0.977 | 0.813 | 0.888 |
| Our method* | 0.948 | 0.741 | 0.832 |
      
 
  
    
  
  
Table 8. Cross-comparison of F1-scores for different train/test configurations. Each cell shows the pixel-level F1-score on the left and the object-level F1-score on the right. (*) indicates that temporal input is used.
  
 
  
      
| Train dataset | Test dataset: University | Test dataset: Beach |
|---|---|---|
| University | 0.811 / 0.952 | 0.436 / 0.749 |
| University* | 0.843 / 0.965 | 0.278 / 0.439 |
| Beach | 0.294 / 0.383 | 0.772 / 0.935 |
| Beach* | 0.089 / 0.354 | 0.698 / 0.907 |
| University + Beach | 0.779 / 0.901 | 0.435 / 0.702 |
| University + Beach* | 0.618 / 0.782 | 0.604 / 0.825 |
      
 
  
    
  
  
Table 9. Computation time per frame for each method.
  
 
  
      
| Method | Dataset #1 Processing Time (s) | Dataset #2 Processing Time (s) | Dataset #3 Processing Time (s) |
|---|---|---|---|
| Simple thresholding | 0.001 | <0.001 | <0.001 |
| Adaptive thresholding | 0.008 | <0.001 | 0.003 |
| Background subtraction | 0.007 | 0.001 | 0.007 |
| K-means clustering | 1.947 | 0.327 | 2.36 |
| Baseline CNN | 0.148 | 0.102 | 0.068 |
| Our method | 0.126 | 0.084 | 0.061 |
| Our method* | 0.156 | 0.088 | 0.070 |
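Table 9 gives seconds per frame; inverting those values gives throughput in frames per second, which is easier to compare with the capture sampling rates in Table 1. A short worked conversion of the two proposed-method rows, rounded to one decimal (the dictionary and function names are illustrative):

```python
# Seconds per frame for the proposed method, copied from Table 9
# (columns: Dataset #1, Dataset #2, Dataset #3).
per_frame_seconds = {
    "Our method": (0.126, 0.084, 0.061),
    "Our method*": (0.156, 0.088, 0.070),
}


def frames_per_second(table):
    """Convert per-frame processing time to throughput, rounded to 0.1 frame/s."""
    return {name: [round(1 / t, 1) for t in times] for name, times in table.items()}


print(frames_per_second(per_frame_seconds))
# {'Our method': [7.9, 11.9, 16.4], 'Our method*': [6.4, 11.4, 14.3]}
```

Even the slowest entry (0.156 s, about 6.4 frames/s) is faster than the highest sampling rate in Table 1 (4 Hz), although absolute timings depend on the hardware used.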