Classification of Infrared Objects in Manifold Space Using Kullback-Leibler Divergence of Gaussian Distributions of Image Points

Abstract: Infrared image recognition technology can work day and night and has a long detection distance. However, infrared objects provide little prior information, and external factors in real-world environments easily interfere with them. Infrared object classification is therefore a very challenging research area. Manifold learning can be used to improve the classification accuracy of infrared images in the manifold space. In this article, we propose a novel manifold learning algorithm for infrared object detection and classification. First, a manifold space is constructed with each pixel of the infrared object image as a dimension, so that infrared images are represented as data points in this space. Next, we model the probability distribution of the infrared data points in the manifold space with a Gaussian distribution. Then, based on this Gaussian distribution, the distribution characteristics of the infrared image data points in the low-dimensional space are derived. The proposed algorithm uses the Kullback-Leibler (KL) divergence to minimize the loss function between the two symmetrized distributions and finally completes the classification in the low-dimensional manifold space. The efficiency of the algorithm is validated on two public infrared image data sets. The experiments show that the proposed method achieves a 97.46% classification accuracy and competitive speed on the analyzed data sets.


Introduction
Feature detection and matching are the basis of many image processing applications in the computer vision domain [1][2][3][4][5] and elsewhere [6]. Infrared small object detection is a focus of ongoing research in numerous areas, such as aircraft tracking [7], ship detection [8], 3D scene reconstruction [9] and video surveillance [10]. Infrared small object recognition in difficult environments, such as those with a complex background and object clutter or those with low illumination, is highly important and remains a difficult task in infrared search and tracking systems [11]. Unlike visible-light images, infrared images carry no color information, and their luminance is determined by the thermal radiation of the object and background. Moreover, small infrared objects lack texture features due to the long sensing distance [12]. As a result, common object tracking methods based on visible features cannot distinguish the object from its background in the infrared image [13].
Manifold learning assumes that the infrared object images to be classified are distributed as a set of points on the manifold space [28][29][30][31]. The purpose of manifold learning is to put forward a representation method for mapping manifold data point sets to a low-dimensional space [32]. The prior knowledge of the low-dimensional manifold of an image can be effectively used for the image reconstruction method, as has been demonstrated for the computer tomography images in [33]. The current infrared image classification methods in manifold space mainly map high-dimensional infrared object data point sets to low-dimensional space, and then complete classification of infrared object data points. Knowing the intrinsic structure of data, efficient manifold-based image classification methods can be constructed [34].
The infrared object classification method on the manifold exploits the property that the manifold space can be regarded locally as a small patch of Euclidean space [35]. It attempts to obtain the distribution information of the infrared object data point set over the entire manifold from all the low-dimensional local maps. Most infrared object classification methods on the manifold are built around describing the relationship between points of the data point set in the high-dimensional manifold space [36]. In order to better describe the local relationship of infrared object data points on high-dimensional manifolds, initial research focused on the topological relationships between local region data points. The local linear embedding (LLE) method was proposed to describe the local data points in the manifold space by measuring the Euclidean distance between data points [37]. LLE assumes that each point can be represented by its surrounding points; the distance between the surrounding data points and the object point within the neighborhood is used as the weight. However, LLE cannot reconstruct a high-dimensional manifold data point set well when the data points are unevenly distributed. In order to further describe the distance relationship between the data points of the infrared object image on the manifold, the isometric feature mapping (ISOMAP) method [38] introduced the concept of geodesic distance. The idea of this method is to construct a feature neighborhood map of infrared object image data points, which can represent the local information of the infrared object image data point set in a high-dimensional manifold space. However, this method is more suitable for scenarios where the manifold space of infrared image data points is relatively flat, and the calculation cost of computing the optimal route in the neighborhood of the infrared object image data point set is relatively high [39].
The Laplacian eigenmaps (LE) algorithm [40] takes a different approach from using the Euclidean distance between points on the manifold: LE expresses the local relationship of the infrared object image data point set in the manifold space through graph theory. However, LE does not perform well when the data points on the manifold are far apart.
Image classification using non-Euclidean manifolds such as the Grassmann manifold and the Symmetric Positive Definite (SPD) manifold [41] is becoming increasingly attractive. The weights of the iterative manifold embedding (IME) layer are learned by an unsupervised strategy, which has been used to analyze the intrinsic manifolds of data sets with missing data [42]. The distribution of image data in multi-view manifold space can be captured by a Multi-view Generative Adversarial Network (GAN), which can map the shape and view manifolds into a lower-dimensional latent space [43]. The Wasserstein-driven low-dimensional manifold model (W-LDMM) can be used for noise estimation, image denoising and noisy image inpainting tasks [44]. On the other hand, Riemannian manifolds can be used to visualize geometric transformations in images [45], which has many useful applications for image augmentation aiming to improve classification accuracy on small data sets. Pixels corresponding to different image classes tend to be segmented better on the Riemannian manifold than in the spectral space, since image points mapped from the Gaussian probability distribution are radially distributed on the skewed surface of the Riemannian manifold [46]. However, due to the specific characteristics of Riemannian manifolds, traditional machine learning methods often fail on them [47], which motivates the exploration of new feature extraction and classification methods.
Currently, there are still many problems to be addressed in infrared object classification, such as the occlusion of objects, changes of object position and changes of illumination [48]. Improving the accuracy of infrared object classification when the infrared object data set has only a few samples (the "small data" problem [49]) and lacks prior knowledge has become an important research direction for manifold learning in infrared object classification.
The novelty and contribution of this paper are outlined as follows. We propose a novel manifold learning algorithm for infrared object detection and classification that uses the Kullback-Leibler (KL) divergence to minimize the loss function between two symmetrized distributions of points (the distribution in the manifold space and the distribution of the points of the infrared image) and finally completes the classification in the low-dimensional space.

Materials and Methods
To improve the classification accuracy of infrared objects, this manuscript proposes a manifold learning method for classification. The proposed algorithm (see Figure 1) constructs a high-dimensional manifold space by using the infrared object image pixels as the dimensions of the manifold space. The constructed high-dimensional manifold is then mapped into a low-dimensional space. By describing the local probability distribution information of the infrared image data points, the reduced-dimensional data set can accurately retain the high-dimensional information. Finally, the difference between the probability distribution of the data point set in the manifold space and the probability distribution in the low-dimensional space is minimized using the KL divergence.

Construction of High-Dimensional Manifold Space
Each pixel of the infrared object image serves as one dimension of the manifold space, forming a high-dimensional manifold space. The infrared objects of the various categories appear as data points in this manifold space. The infrared object image manifold constructed in this paper is a differential manifold, whose local space can be treated as Euclidean space. The definition of this high-dimensional infrared object manifold is as follows.
If every point p in M has an open neighborhood U ∋ p that is homeomorphic to an open subset of the Euclidean space R^n, then M is an n-dimensional topological manifold. Let M be an n-dimensional manifold and A = {(U_α, φ_α)}, α ∈ I, be a set of coordinate charts. A is a differential structure on M when the following conditions are met: {U_α : α ∈ I} forms an open cover of M, and each mapping φ_α : U_α → φ_α(U_α) ⊂ R^n is a homeomorphism. If U_α ∩ U_β ≠ ∅, and the transition mapping φ_β ∘ φ_α^{-1} : φ_α(U_α ∩ U_β) → φ_β(U_α ∩ U_β) and its inverse mapping are r times differentiable, then (U_α, φ_α) is compatible with (U_β, φ_β).

Low-Dimensional Mapping and Classification Method of Infrared Object Manifold Space
After the high-dimensional manifold space is constructed, the dimension of the manifold space is still high, and it needs to be reduced to complete the classification of infrared objects. In order to improve efficiency, we adopt a dimensionality reduction method for the manifold space. The Gaussian distribution is used to describe the distribution characteristics of the infrared object image data point set in the high-dimensional manifold space, and the Student's t-distribution is used to describe the distribution characteristics of the infrared object image data point set after dimensionality reduction in the low-dimensional space. Finally, the difference between the two distributions is minimized to complete the infrared object classification process. The steps consist mainly of describing the distribution information of the infrared object image data points in the two spaces and then minimizing the difference between the two distributions.

Projection of Infrared Object Image Data Points
The infrared object image data point set is projected onto a plane by random projection. On this plane, all data point sets show a discrete distribution. The infrared object image data set in the n-dimensional manifold space can be defined by Equations (1)-(3).
Since the infrared object image data points are projected onto a two-dimensional plane, when calculating the distance between these points and their neighboring points, the distance between two points x_i and x_j can be defined by the Euclidean distance:

d(x_i, x_j) = ||x_i - x_j|| = ( Σ_k (x_{i,k} - x_{j,k})² )^{1/2}.
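This projection and distance computation can be sketched as follows; the data sizes, the Gaussian projection matrix, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: 20 infrared object images, flattened 40 x 40 px
# patches, i.e. points in a 1600-dimensional manifold space.
X = rng.random((20, 1600))

# Random projection onto a two-dimensional plane.
P = rng.normal(size=(1600, 2)) / np.sqrt(1600)
X2 = X @ P                      # projected data points, shape (20, 2)

def euclidean(a, b):
    """Euclidean distance d(x_i, x_j) between two projected points."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

d01 = euclidean(X2[0], X2[1])   # distance between the first two points
```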

Construction of the KNN Map of the Infrared Object Image Data Points
The k-Nearest Neighbor (KNN) map of the infrared object image data points can be constructed by the above steps, and the obtained KNN map initially describes the local features of the manifold data point set. The local feature information of the infrared object image data in the manifold space should retain the same local characteristics after being projected into the low-dimensional space. For infrared object image data points in the high-dimensional space, the KNN maps can be used to represent the weighted graphs from the surrounding data points to the object data points, where a weight is the Euclidean distance between points. We assume that a normal distribution describes the distribution probability of these neighboring points. For example, consider two infrared object image data points x_i and x_j in the high-dimensional manifold space, where x_i is the center of the Gaussian distribution.
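A brute-force version of this KNN graph construction can be sketched as follows; the array shapes and function names are illustrative assumptions.

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force k-nearest-neighbor graph: for every point, the indices
    of its k closest points by Euclidean distance."""
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(D, np.inf)      # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]

rng = np.random.default_rng(1)
X = rng.random((8, 3))               # 8 toy data points in 3 dimensions
nbrs = knn_graph(X, 3)               # indices of the 3 nearest neighbors
```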
Point x_i selects x_j as its nearest neighbor with probability p_{j|i}. The probability p_{j|i} of a neighboring point decreases with its distance from x_i and can be expressed as

p_{j|i} = exp(-||x_i - x_j||² / 2σ_i²) / Σ_{k≠i} exp(-||x_i - x_k||² / 2σ_i²),

where σ_i is the variance of the Gaussian distribution with x_i as the center point. In order to avoid the crowding problem when the Gaussian distribution is projected into the two-dimensional space, the Student's t-distribution is used to describe the local relationship of the infrared object image data points in the low-dimensional space. The t-distribution in the two-dimensional space is derived from the Gaussian distribution of the infrared object image data point set in the manifold space. Assuming that the probability distribution of the neighborhood near a point of the infrared object image data set in the high-dimensional space can be represented by the normal distribution N(μ, σ²), the mean of the t-distribution in the low-dimensional space can be derived by

x̄ = (1/k) Σ_{j=1}^{k} x_j,

where the x_j are the infrared object image data points in the KNN neighborhood and x̄ is their mean. Similarly, the variance of these points can be derived from the following formula:

s² = (1/(k-1)) Σ_{j=1}^{k} (x_j - x̄)².
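The Gaussian conditional probabilities above can be sketched as follows, assuming fixed per-point bandwidths sigma (a simplification; the bandwidth choice interacts with the degree of confusion discussed later).

```python
import numpy as np

def conditional_probabilities(X, sigma):
    """Gaussian conditional probabilities p_{j|i} for a point set X (n x d).
    sigma holds one bandwidth per point; the diagonal is zeroed because a
    point is not its own neighbor."""
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-D / (2.0 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
X = rng.random((10, 5))
P = conditional_probabilities(X, np.ones(10))   # each row sums to 1
```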
The distribution of the infrared object image data points in the two-dimensional space is determined by the variance and standard deviation of the normal distribution in the manifold space and by the number of data points. Therefore, the t-distribution random variable can be constructed by

t = (x̄ - μ) / (s / √k).

In this way, the probability distribution of the infrared object image data points in the two-dimensional space can be constructed according to the t-distribution:

q_{j|i} = (1 + ||y_i - y_j||²)^{-1} / Σ_{k≠i} (1 + ||y_i - y_k||²)^{-1},

where q_{j|i} is the probability distribution of the infrared object image data points in the 2D space and the y_i are the low-dimensional images of the points x_i. In order to maintain the symmetry of the two probability distributions, a uniform symmetrical distance function is introduced:

p_{ij} = (p_{j|i} + p_{i|j}) / 2n.
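The low-dimensional t-distribution similarities and the symmetrization step can be sketched as follows; the toy data and function names are illustrative assumptions.

```python
import numpy as np

def t_similarities(Y):
    """Student t (one degree of freedom) similarities for embedded points
    Y (n x 2), normalized over all pairs."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + D)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def symmetrize(P):
    """Uniform symmetrical distribution p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    n = P.shape[0]
    return (P + P.T) / (2.0 * n)

rng = np.random.default_rng(3)
Y = rng.normal(size=(10, 2))
Q = t_similarities(Y)

# Symmetrizing a toy conditional-probability matrix (rows sum to 1).
A = np.array([[0.0, 0.6, 0.4], [0.5, 0.0, 0.5], [0.3, 0.7, 0.0]])
S = symmetrize(A)
```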

Dimensionality Reduction
Since the high-dimensional distribution p_{j|i} and the low-dimensional distribution q_{j|i} should be as similar as possible, the KL divergence is used as a loss function to minimize the difference between these two distributions. The process of dimension reduction and classification is converted into the process of finding the minimum of the loss function

C = Σ_i KL(P_i || Q_i) = Σ_i Σ_j p_{j|i} log( p_{j|i} / q_{j|i} ).

In order to control the overfitting that occurs due to the small number of samples, a degree of confusion (perplexity) is also set. The degree of confusion can be defined by

Perp(P_i) = 2^{H(P_i)},

where H(P_i) is the Shannon entropy of P_i, defined as

H(P_i) = -Σ_j p_{j|i} log₂ p_{j|i}.

The degree of confusion changes in proportion to the entropy. With this property, the value of the entropy can be changed by adjusting the degree of confusion.
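The loss function and the degree of confusion can be written down directly; the eps guard and the variable names are my own assumptions, not from the paper.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q): the loss minimized between the high- and
    low-dimensional distributions."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

def degree_of_confusion(p_row, eps=1e-12):
    """Perplexity 2^{H(P_i)}, where H is the Shannon entropy of one row
    of conditional probabilities."""
    h = -float(np.sum(p_row * np.log2(p_row + eps)))
    return 2.0 ** h

p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.4, 0.3, 0.2, 0.1])
loss = kl_divergence(p, q)     # strictly positive because p != q
```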
The next step is to construct the objective function. The positive samples of the infrared object image data points projected into the 2D space should be clustered, and the negative samples should be placed far away from the positive samples. The weight between the points can be defined by Equation (14), where y_i and y_j represent two points in the low-dimensional space. A binary edge between these two points has a weight value of 1, and p(e_{ij} = 1) represents the probability that this edge exists. As the distance between y_i and y_j decreases, p(e_{ij} = 1) increases. In practical applications, the weighted expression p(e_{ij} = w_{ij}) = p(e_{ij} = 1)^{w_{ij}} is used. The positive sample set in the infrared object image data point set is defined as E, and the negative sample set is defined as Ē. These sets are obtained from the KNN graph as follows.
where γ is the weight of the negative samples. The optimization process can be understood as maximizing the probability of the weighted edges of the positive samples in the KNN graph and minimizing the probability of the weighted edges of the negative samples. For convenience of calculation, the above optimization formula can be transformed accordingly. The negative samples increase the computational complexity, and it is not easy to directly use gradient descent for training. Therefore, a negative sampling algorithm is adopted in this paper: a negative sample set is formed from a randomly selected number of infrared object image data points that conform to a noise distribution P_n(x). The objective function is expressed by Equation (18).
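A sketch of this negative-sampling objective follows; the edge-probability form p(e_ij = 1) = 1 / (1 + ||y_i - y_j||²), the uniform noise distribution, and the default values of gamma and the number of negative samples are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def edge_prob(yi, yj):
    """Probability that a binary edge exists between two embedded points;
    it grows as the points move closer together."""
    return 1.0 / (1.0 + float(np.sum((yi - yj) ** 2)))

def sampled_objective(Y, pos_edges, n_neg=5, gamma=7.0):
    """Negative-sampling objective: reward positive (KNN) edges and
    penalize randomly drawn negative edges, weighted by gamma."""
    n = Y.shape[0]
    total = 0.0
    for i, j in pos_edges:
        total += np.log(edge_prob(Y[i], Y[j]))
        for k in rng.integers(0, n, size=n_neg):  # uniform noise distribution
            if k != i and k != j:
                total += gamma * np.log(1.0 - edge_prob(Y[i], Y[k]) + 1e-12)
    return float(total)

Y = rng.normal(size=(10, 2))
obj = sampled_objective(Y, [(0, 1), (2, 3)])   # a sum of log-probabilities
```

In a full implementation this objective would be maximized by stochastic gradient steps over sampled edges rather than evaluated in closed form.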

Classification in Infrared Object Manifold Space
Finally, the low-dimensional representation of the manifold space and the classification results for the different categories of objects are obtained. First, we find the distance between the input infrared object image and each point set in the two-dimensional space. Then, the set of points with the smallest distance from the input object image determines the category to which the object image belongs.
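This nearest-point-set decision rule can be sketched as follows; the mean-distance criterion and the toy clusters below are illustrative assumptions.

```python
import numpy as np

def classify(query, class_points):
    """Assign an embedded query point to the category whose 2D point set
    lies closest (smallest mean Euclidean distance)."""
    best_label, best_dist = None, np.inf
    for label, pts in class_points.items():
        d = float(np.mean(np.sqrt(np.sum((pts - query) ** 2, axis=1))))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy clusters standing in for two embedded infrared object categories.
clusters = {
    "quadrocopter": np.array([[0.0, 0.0], [0.1, 0.1]]),
    "horse": np.array([[5.0, 5.0], [5.1, 4.9]]),
}
label = classify(np.array([0.05, 0.02]), clusters)   # -> "quadrocopter"
```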
The problems of vanishing and exploding gradients still occur during training. This problem can be addressed by converting each weighted edge between two points into a number of binary edges proportional to its weight. However, when many edges with large weights are converted into binary edges, the calculation cost becomes higher. Therefore, we randomly sample among these transformed binary edges to reduce the computational cost. When optimizing the loss function, we use the asynchronous stochastic gradient descent (ASGD) algorithm [50] to improve the execution performance.

Illustration of the Method Stages
To illustrate the different stages of the method operation, we used the thermal infrared image "ambassador_morning" from the CSIR-CSIO Moving Object Thermal Infrared Imagery Dataset (MOTIID) [51]. Figure 2 shows the probability distributions calculated for different classes of image data points and the constructed KNN tree.

Figure 2. Illustration of the construction of the k-Nearest Neighbor (KNN) tree (right) using probability distributions of image points (left) for the "ambassador_morning" image. The red color shows a tree node corresponding to the tracked infrared object.

Figure 3 illustrates the low-dimensional embedding of the "ambassador_morning" image into a two-dimensional manifold and the corresponding tracking result. Note that the sets of closely related image points form clusters in the manifold space, which correspond to particular areas of the target infrared image such as the tracked object.

Figure 3. Illustration of the embedding of the image points into a two-dimensional manifold (left) and the corresponding tracked infrared object (right) for the "ambassador_morning" image. The red color outlines the cluster of a closely related set of image points in the two-dimensional manifold and the tracked object.

Experimental Verification
The hardware platform used in this experiment is a Microsoft Surface Pro laptop with an Intel Core M3-7Y30 1.61 GHz CPU and 4 GB RAM.
The experiments in this paper used the following infrared object image data sets.
- Two infrared video sequences named "8_quadrocopter1" and "8_horse" in the open-source LTIR data set (v1.0) of the Computer Vision Laboratory of Linköping University [52].
- The infrared video sequence named "data1" in the data set for dim-small object detection and tracking of aircraft in infrared image sequences of the ATR Key Laboratory of the National University of Defense Technology [7].
- An infrared video sequence named "6a" in the OSU Color-Thermal Database data set [53].
- The four infrared image sequences of the CSIR-CSIO Moving Object Thermal Infrared Imagery Dataset (MOTIID) named "ambassador_morning", "auto_partially_occluded", "bike_far" and "dog_evening" [51].
The characteristics of the image sequences are summarized in Table 1. Eight kinds of objects are selected from the eight infrared object image data sets, with 20 samples of each object. These infrared object images are resized to 40 × 40 px, and the corresponding category label data is added. The data set information is shown in Table 2, while examples of the images are shown in Figure 4.

After reducing the dimensionality of the manifold space of the infrared object images, the different categories of infrared object image data points are represented as point sets in the low-dimensional space. The sets of points with different label numbers in Figure 5 represent the different categories of infrared object images. The different categories of infrared objects have been effectively classified, with obvious gaps between categories. No misclassifications were observed, partly because the small number of samples was well separated in the manifold space. The classification results are clear because the characteristics of the different infrared object types differ considerably from each other, and because the number of infrared object categories is small, so the classification results do not overlap. The background changes of categories 1, 2, 4, 5, 7 and 8 are not pronounced, and the differences between objects and background are more obvious; these six categories of infrared object images have similar image distribution characteristics, so their point sets lie closer together in the figure. The point sets of the car (category 3) and the dog (category 6) are far from the point sets of the other categories, because the pixel values of these two categories of infrared objects change greatly during movement.
We compare the proposed algorithm with three image classification algorithms: a convolutional neural network (CNN) [14], a multi-class Support Vector Machine (SVM) from LIBSVM [54] and the multi-label lazy KNN (ML-KNN) [55], in terms of operation speed and classification accuracy. The comparison of the classification results of the different algorithms is shown in Figure 6. As we can see from Figure 6, the CNN performs worse than the proposed algorithm because of its complex structure and the small number of data set samples, which does not allow the network to be trained effectively. Although the SVM is faster in finding classification hyperplanes in the high-dimensional space, it cannot accurately divide the infrared object image data points; as a result, its accuracy is lower than that of the proposed algorithm. The ML-KNN algorithm is more accurate, but it takes more time to calculate the result compared to the algorithm proposed in this paper.
The results are summarized in Table 3 using typical classification assessment metrics [56], where FPR is the False Positive Rate and AUC is the Area under the Receiver Operating Characteristic (ROC) Curve.

Discussion and Final Remarks
We have developed and implemented an infrared object classification method for infrared images with mainly static backgrounds, i.e., under the condition that there is little movement in the background. For this type of image, our method achieved a high accuracy of 97.46%, which exceeded the accuracy of state-of-the-art object classifiers such as CNN, SVM and ML-KNN. The algorithm proposed in this paper establishes a high-dimensional manifold space of infrared object images and classifies the different categories of infrared objects within it. In particular, the proposed method can successfully perform infrared object classification even when only a small number of images is available for training, a setting in which deep learning networks cannot be trained effectively. Our experiments verify that the proposed algorithm can effectively classify different categories of infrared objects in the manifold space.
The achieved results demonstrate that our method could already be used in several applications, such as infrared security cameras. Building on the concepts validated in this paper, we plan to develop new methods for infrared object tracking in images with dynamic backgrounds and cluttered object space, focusing on applications such as autonomous driving and military systems (such as those described in [57,58]).