1. Introduction
Optical remote sensing (ORS) image classification has long been a key focus in image processing, playing an important role in land cover monitoring, ecological environment monitoring, and many other fields [1,2]. With the improvement of the spatial resolution of ORS images, the distribution of land features is becoming increasingly complex, which poses difficulties and challenges for high-accuracy classification. Although deep learning is the most widely researched approach to ORS image classification, its dependence on the quantity and quality of samples severely restricts its generalization ability [3,4]. In practical applications, sufficient high-quality samples are often unavailable. Therefore, unsupervised image classification still holds significant research value [5,6].
In unsupervised image classification, the key lies in accurately characterizing the relationship between pixels and classes [7]. The most commonly used models are based on distance measures and probability measures. Distance measures achieve classification by describing the degree of dissimilarity between pixels and classes; fuzzy C-means (FCM), defined on the Euclidean distance, is one of the most representative algorithms [8]. However, the Euclidean distance can only describe data with a spherical distribution and is ineffective for data with complex distributions [9]. Considering the significant spatial interaction between pixels in an image, some studies introduce neighborhood pixel constraints to improve classification accuracy [10,11], such as FCM_S, EnFCM, and FLICM. To explore the deeper influence of neighborhood effects, Wu and Wu [12] proposed the master-slave hierarchy local information driven FCM (MSHLICM), extending the first-order neighborhood to the second order and improving noise resistance and robustness. Because the anti-noise performance of pixel-based algorithms is limited, object-based methods have been vigorously developed [13]; they group similar pixels into objects and then perform classification with the superpixel as the basic unit. Lei et al. [14] proposed the superpixel-based fast FCM (SF_FCM), which defines a multiscale morphological gradient reconstruction operation to obtain superpixels and then performs FCM on the superpixel histogram. SF_FCM improves noise resistance but relies heavily on the quality of the superpixel segmentation. To increase the flexibility of superpixels, Li et al. [15] proposed a fully fuzzy Voronoi tessellation FCM algorithm (FVT-FCM), which extends superpixels to fuzzy superpixels and optimizes the fuzzy superpixel segmentation according to the objective function; FVT-FCM maintains high noise resistance and detail preservation. However, the distance measures above are sensitive to noise, so probability measures that describe the statistical characteristics of classes have been studied [16]. Based on the scalability of FCM, Chatzis and Varvarigou [17] proposed HMRF-FCM, which assumes that the spectra of pixels follow a Gaussian distribution and uses the probability density function to describe the degree to which a pixel belongs to a class. To adapt to different complex distributions, the finite mixture model (FMM), which models the statistical distribution of data as a linear combination of components from the same distribution family, has been developed for image classification [18], such as the Gaussian mixture model (GMM) [19], the Gamma mixture model (GaMM) [20], and the Student's-t mixture model (SMM) [21]. Shi and Li [22] proposed a hierarchical mixture model image segmentation method (HMM), in which the components of the mixture model can be selected based on the statistical distribution characteristics of the image. Zhao et al. [23] proved that a GMM can approximate any complex distribution; therefore, the GMM can be used to model the general HMM, known as HGMM, which obtains segmentation results by optimizing the parameters of the hybrid model within a Bayesian framework. HGMM provides an effective approach for modeling complex image features, but its parameter optimization is inefficient.
To overcome the dependence of deep learning on samples, a series of unsupervised deep learning methods have been studied, such as Deepcluster [3], SWAV [4], DINO [24], PiCIE [5], and Diffseg [25]. Deepcluster first combined clustering with deep learning: it performs K-means clustering on the features produced by the convnet and updates the network weights by predicting the cluster assignments as pseudo-labels with a discriminative loss. To avoid the computational cost of pairwise feature comparison, Caron et al. [4] proposed SWAV, a new paradigm that compares cluster assignments, allowing contrast between different image views without relying on explicit pairwise feature comparisons. Furthermore, DINO exploited the Vision Transformer's global perception to further improve classification accuracy. However, these three algorithms perform image-level classification and are not suitable for pixel-level classification of remote sensing images. Although PiCIE extends unsupervised deep learning to the pixel level, it still requires a large number of unlabeled samples for training. To dispense with samples entirely, Tian et al. [25] proposed Diffseg, which introduces a simple yet effective iterative merging process based on the KL divergence among attention maps, merging them into valid segmentation masks. Diffseg requires no training or language dependency to extract quality pixel-level classifications for arbitrary images.
The above methods have certain limitations in fine classification at the spatial level. To overcome this issue and describe complex ORS image features more effectively, an ORS image classification algorithm based on quantum statistics (QS) is proposed in this paper. The classification process of complex images is described with reference to quantum physics. The fundamental particles of quantum systems include bosons and fermions. Bosons are not exclusive, so multiple bosons can occupy the same state; this can lead to system collapse and, when applied to image classification, to class loss. Following the Pauli exclusion principle, each quantum state can be occupied by at most one fermion, which is consistent with image classification, where a pixel can belong to only one class. Therefore, fermions and their theory serve as the modeling foundation of this article. First, each pixel in the image is regarded as a fermion. The negative logarithm of the probability distribution followed by the pixel spectra is used to describe the energy of the level at which a fermion sits. The membership relationship between pixels and classes is then defined via the Fermi-Dirac statistical distribution, which depicts the complex physical process of which energy level a fermion occupies. This modeling approach based on quantum statistics can describe complex situations better than distance and probability measures, and it does not rely on neighborhood effects. Meanwhile, it avoids the curse of dimensionality, i.e., the phenomenon that the higher the data dimension, the less valid distance measures become and the lower the accuracy. Then, the cost function for classification is modeled by the free energy, a physical quantity that describes whether a system is in thermal equilibrium in terms of energy, temperature, and entropy. Finally, the model parameters are optimized with the simulated annealing algorithm under the criterion of minimizing the free energy.
The main contributions of this paper are summarized as follows:
- (1)
This paper systematically proposes a classification method based on quantum statistics, combining quantum physics theory with image processing theory. It establishes a one-to-one correspondence between quantum systems and the image classification process.
- (2)
The Fermi-Dirac-based membership describes complex classification processes better than distance or probability measures and yields fine classification results.
- (3)
Because memberships are derived from level occupation probabilities rather than distance measures, the method overcomes the curse of dimensionality on high-dimensional data.
This paper is organized as follows:
Section 2 reviews the Fermi-Dirac distribution theory.
Section 3 introduces the proposed algorithm.
Section 4 discusses the feasibility and effectiveness of the proposed algorithm through multispectral and hyperspectral image classification experiments.
Section 5 discusses the impact of model parameters, spectral dimensional robustness, time complexity and remaining challenges.
Section 6 is the conclusion.
2. Fermi-Dirac Distribution
The Fermi-Dirac distribution describes the occupation probability of fermions at energy levels in a state of thermal equilibrium [26]:

$$\bar{n}_l = \frac{1}{e^{(\varepsilon_l - \mu)/kT} + 1} \quad (1)$$

where $\bar{n}_l$ is the average number of particles occupying energy level $l$, $\varepsilon_l$ is the energy of energy level $l$, $\mu$ is the chemical potential, $k$ is the Boltzmann constant ($k = 1.380649 \times 10^{-23}$ J/K), and $T$ is the absolute temperature. The Fermi-Dirac distribution is derived from the Pauli exclusion principle, which stipulates that each quantum state can be occupied by at most one fermion. Therefore, $n_l = 0$ or $n_l = 1$, and $0 \le \bar{n}_l \le 1$.
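As a quick numerical illustration of Equation (1) (a sketch with illustrative values, not taken from the paper; $k$ and $T$ are set to 1 and 0.5 for readability), the occupation probability equals 1/2 exactly when $\varepsilon_l = \mu$, and approaches a step function as $T \to 0$:

```python
import math

def fermi_dirac(eps, mu, k=1.0, T=1.0):
    """Average occupation of an energy level with energy eps (Equation (1))."""
    return 1.0 / (math.exp((eps - mu) / (k * T)) + 1.0)

# At eps == mu the occupation is exactly 1/2, independent of T.
print(fermi_dirac(2.0, 2.0, T=0.5))   # 0.5

# As T decreases, levels below mu fill up and levels above mu empty out,
# approaching the zero-temperature step function.
print(fermi_dirac(1.0, 2.0, T=0.01))  # close to 1
print(fermi_dirac(3.0, 2.0, T=0.01))  # close to 0
```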
Given a system of mutually independent particles that can be in different energy levels, the total energy $E$ and the total number of particles $N$ are

$$E = \sum_l n_l \varepsilon_l, \qquad N = \sum_l n_l \quad (2)$$

When the particles are fermions, the total energy of the system can be expressed as

$$E = \sum_l \bar{n}_l \varepsilon_l = \sum_l \frac{\varepsilon_l}{e^{(\varepsilon_l - \mu)/kT} + 1} \quad (3)$$

The entropy of the fermion system is

$$S = -k \sum_l \left[\, \bar{n}_l \ln \bar{n}_l + (1 - \bar{n}_l) \ln (1 - \bar{n}_l) \,\right] \quad (4)$$
Free energy is an important physical quantity for judging whether a system is in an equilibrium state: the smaller the free energy, the closer the system is to equilibrium. It can be defined in different ways depending on the physical conditions. When temperature, volume, and particle number remain constant, it is defined as the Helmholtz free energy,

$$F = E - TS \quad (5)$$

When temperature, pressure, and particle number remain constant, it is defined as the Gibbs free energy,

$$G = E - TS + PV \quad (6)$$

where $P$ is the pressure and $V$ is the volume. When the number of particles is variable, it is defined as the grand potential,

$$\Omega = E - TS - \mu N \quad (7)$$
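To make Equations (2)-(5) concrete, the following sketch (illustrative level energies, with $k = T = 1$ assumed) evaluates the occupations, total energy, entropy, and Helmholtz free energy of a small fermion system:

```python
import math

def occupations(energies, mu, k=1.0, T=1.0):
    """Fermi-Dirac occupation of each level, Equation (1)."""
    return [1.0 / (math.exp((e - mu) / (k * T)) + 1.0) for e in energies]

def helmholtz(energies, mu, k=1.0, T=1.0):
    """Helmholtz free energy F = E - TS, Equations (3)-(5)."""
    n = occupations(energies, mu, k, T)
    E = sum(ni * ei for ni, ei in zip(n, energies))            # Equation (3)
    S = -k * sum(ni * math.log(ni) + (1 - ni) * math.log(1 - ni)
                 for ni in n)                                  # Equation (4)
    return E - T * S                                           # Equation (5)

levels = [0.5, 1.0, 2.0, 4.0]   # illustrative level energies
print(helmholtz(levels, mu=1.5))
```

Since the entropy of a partially occupied fermion system is positive, the free energy is always below the total energy at $T > 0$.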
3. Methods
Given an image I = {Ii, i = 1, …, n}, where Ii = (Ii1, Ii2, …, Iid) is a spectral vector, i is the index of pixels, n is the total number of pixels, and d is the dimension of the image. Image classification is the process of assigning a class label to each pixel, i.e., L = {Li, i = 1, …, n}, where Li is the class label of pixel i, Li ∈ {1, …, l, …, m}, l is the index of classes, and m is the number of classes.
3.1. Classification Model
The spectral vectors of pixels within the same class are assumed to follow independent and identical multivariate Gaussian distributions, since the Gaussian distribution is the most commonly used distribution and its parameters are convenient to optimize:

$$p(I_i \mid \theta_l) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_l|^{1/2}} \exp\!\left(-\frac{1}{2}\,(I_i - \mu_l)^{T}\,\Sigma_l^{-1}\,(I_i - \mu_l)\right) \quad (8)$$

where $\theta_l = \{\mu_l, \Sigma_l\}$, and $\mu_l$ and $\Sigma_l$ are the mean and covariance of the Gaussian distribution of class $l$.
If each pixel is regarded as a fermion, then the image constitutes a multiparticle system. To convert image features into energies in the quantum system, the negative logarithm of the Gaussian distribution is taken. Therefore, the energy of the level occupied by a fermion can be expressed as

$$\varepsilon_{il} = -\ln p(I_i \mid \theta_l) \quad (9)$$

According to Section 2, fermion $i$ occupying energy level $l$ follows the Fermi-Dirac distribution; pixel $i$ belonging to class $l$ is equivalent to fermion $i$ occupying energy level $l$. To strictly satisfy the condition $\sum_{l=1}^{m} \bar{n}_{il} = 1$, the Fermi-Dirac distribution is normalized. The membership $\bar{n}_{il}$ is then modeled according to Equation (1),

$$\bar{n}_{il} = \frac{\left(e^{\beta(\varepsilon_{il} - \alpha_i)} + 1\right)^{-1}}{\sum_{l'=1}^{m} \left(e^{\beta(\varepsilon_{il'} - \alpha_i)} + 1\right)^{-1}} \quad (10)$$

where $\beta = 1/kT$ and $\alpha_i$ is the chemical potential of pixel $i$.
Therefore, the total energy of the image classification process is modeled according to Equation (3),

$$E = \sum_{i=1}^{n} \sum_{l=1}^{m} \bar{n}_{il}\, \varepsilon_{il} \quad (11)$$

The entropy is modeled according to Equation (4),

$$S = -k \sum_{i=1}^{n} \sum_{l=1}^{m} \left[\, \bar{n}_{il} \ln \bar{n}_{il} + (1 - \bar{n}_{il}) \ln (1 - \bar{n}_{il}) \,\right] \quad (12)$$

Since the total number of pixels in an image remains unchanged, this paper selects the Helmholtz free energy (Equation (5)) to define the cost function for image classification,

$$J(\alpha, \theta) = E - TS \quad (13)$$

The parameter solution corresponding to the minimum cost function is

$$(\hat{\alpha}, \hat{\theta}) = \arg\min_{\alpha,\, \theta} J(\alpha, \theta) \quad (14)$$

Then, the image classification result is

$$L_i = \arg\max_{l} \bar{n}_{il} \quad (15)$$
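A minimal sketch of the membership and cost computations of this section (illustrative 1-D two-class data with hand-picked parameters, $k = T = \beta = 1$ assumed; the actual algorithm optimizes $\alpha$ and $\theta$ as described in Section 3.2):

```python
import numpy as np

def memberships(pixels, means, variances, alpha, beta=1.0):
    """Normalized Fermi-Dirac memberships for 1-D data (Equations (9)-(10))."""
    pixels = pixels[:, None]                                   # shape (n, 1)
    # Level energy = negative log of the Gaussian likelihood, Equation (9).
    eps = 0.5 * np.log(2 * np.pi * variances) + (pixels - means) ** 2 / (2 * variances)
    occ = 1.0 / (np.exp(beta * (eps - alpha[:, None])) + 1.0)  # FD occupation
    return occ / occ.sum(axis=1, keepdims=True), eps

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0, 1, 50), rng.normal(8, 1, 50)])
means, variances = np.array([0.0, 8.0]), np.array([1.0, 1.0])
alpha = np.zeros(len(pixels))                                  # chemical potentials

n_bar, eps = memberships(pixels, means, variances, alpha)
nb = np.clip(n_bar, 1e-12, 1 - 1e-12)                          # guard against log(0)
E = np.sum(n_bar * eps)                                        # Equation (11)
S = -np.sum(nb * np.log(nb) + (1 - nb) * np.log(1 - nb))       # Equation (12), k = 1
J = E - 1.0 * S                                                # Equation (13), T = 1
labels = np.argmax(n_bar, axis=1)                              # Equation (15)
print(J, labels[:5], labels[-5:])
```

With well-separated classes, the memberships are near one-hot and the `argmax` labels recover the two clusters.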
3.2. Parameter Solution
For the Gaussian distribution parameters $\theta_l = \{\mu_l, \Sigma_l\}$, the solution can be obtained by the derivative method, i.e.,

$$\mu_l(t) = \frac{\sum_{i=1}^{n} \bar{n}_{il}(t)\, I_i}{\sum_{i=1}^{n} \bar{n}_{il}(t)} \quad (16)$$

$$\Sigma_l(t) = \frac{\sum_{i=1}^{n} \bar{n}_{il}(t)\, \left(I_i - \mu_l(t)\right)\left(I_i - \mu_l(t)\right)^{T}}{\sum_{i=1}^{n} \bar{n}_{il}(t)} \quad (17)$$

where $t$ is the index of iteration.
For the parameter $\alpha$, the simulated annealing algorithm is used to estimate the optimal solution. Assume that the state of $\alpha$ in the $t$-th iteration is $\alpha(t) = \{\alpha_i(t), i = 1, \ldots, n\}$. After being disturbed, its state is $\alpha^* = \alpha(t) + \Delta\alpha$, where $\Delta\alpha$ follows a normal distribution with mean 0 and variance $\sigma^2$, i.e., $\Delta\alpha \sim N(0, \sigma^2)$. Then, the Helmholtz free energy of the entire system changes from $J(\alpha(t), \theta(t))$ to $J(\alpha^*, \theta(t))$. According to the Metropolis criterion, the acceptance probability of $\alpha(t)$ becoming $\alpha^*$ is

$$P = \min\left\{1,\; \exp\!\left(-\frac{J(\alpha^*, \theta(t)) - J(\alpha(t), \theta(t))}{kT(t)}\right)\right\} \quad (18)$$

If accepted, $\alpha(t+1) = \alpha^*$; otherwise, $\alpha(t+1) = \alpha(t)$.
For the temperature parameter $T$, the simulated annealing cooling schedule of Equation (19) is used to update the parameter, where $T(0)$ is the initial value.
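The $\alpha$ update can be sketched as a standard Metropolis step. This is a hedged illustration: a toy quadratic cost stands in for the free energy $J$, and a geometric cooling factor is assumed for the schedule; the paper's exact acceptance rule and cooling schedule are Equations (18) and (19).

```python
import numpy as np

def anneal_step(alpha, cost, T, rng, sigma=0.5):
    """One Metropolis perturb-and-accept step for the chemical potentials."""
    alpha_star = alpha + rng.normal(0.0, sigma, size=alpha.shape)  # disturb alpha
    dJ = cost(alpha_star) - cost(alpha)
    # Downhill moves are always accepted; uphill moves with prob. exp(-dJ / T).
    if dJ <= 0 or rng.random() < np.exp(-dJ / T):
        return alpha_star
    return alpha

# Toy quadratic cost standing in for the free energy J(alpha, theta).
cost = lambda a: float(np.sum((a - 3.0) ** 2))
rng = np.random.default_rng(1)
alpha, T = np.zeros(4), 5.0
for _ in range(2000):
    alpha = anneal_step(alpha, cost, T, rng)
    T *= 0.995        # assumed geometric cooling; the paper uses Equation (19)
print(cost(alpha))
```

As the temperature drops, uphill moves become increasingly unlikely, so the chain settles near the cost minimum.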
3.3. Parameter Initialization
There are seven parameters that need to be initialized, i.e., (1) The number of classes m, (2) Normal distribution parameters σ2, (3) Boltzmann constant k, (4) The initial mean of Gaussian distribution μl(0), (5) The initial covariance of Gaussian distribution Σl(0), (6) The initial temperature T(0), (7) The initial chemical potential α(0).
For $m$, it is set manually based on the image. For $\sigma^2$, it is set based on experience. For $k$, owing to the difference between the image system and the physical system, the tabulated physical value cannot be used directly; based on experience, $0 < k < 2$ in this paper. The mean and covariance of the Gaussian distribution are calculated from an initial random classification result $L(0) = \{L_i(0), i = 1, \ldots, n\}$,

$$\mu_l(0) = \frac{1}{n_l(0)} \sum_{i:\, L_i(0) = l} I_i \quad (20)$$

$$\Sigma_l(0) = \frac{1}{n_l(0)} \sum_{i:\, L_i(0) = l} \left(I_i - \mu_l(0)\right)\left(I_i - \mu_l(0)\right)^{T} \quad (21)$$

where $n_l(0)$ is the total number of pixels within class $l$. The parameters $\alpha(0)$ and $T(0)$ satisfy the condition shown in Equation (22), where $\varepsilon_i(0)$ is calculated according to Equation (9).
3.4. Summary of the Proposed QS Model
The correspondence between image classification and quantum systems is shown in
Table 1. The process of the proposed QS algorithm can be summarized as follows.
S1 Parameter initialization, including m, σ2, k, μl(0), Σl(0), T(0), α(0);
S2 Calculating Gaussian distribution parameter θl(t) = {μl(t), Σl(t)} by Equations (16) and (17);
S3 Calculating the total energy of the image classification system E(α(t), θ(t)) by Equation (11);
S4 Calculating the entropy of an image classification system S(α(t), θ(t)) by Equation (12);
S5 Calculating the cost function J(α(t), θ(t)) by Equation (13);
S6 Updating parameter α(t+1) by Equation (18);
S7 Updating parameter T(t+1) by Equation (19);
S8 Repeating S2-S7 until the objective function converges, i.e., |J(α(t+1), θ(t+1)) − J(α(t), θ(t))| < ξ, where ξ is a small positive threshold, or until t reaches the iteration limit. Then, the classification result is obtained by Equation (15).
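The S1-S8 loop reduces to the following control-flow skeleton (a hedged sketch with toy stand-in update functions; the real updates are Equations (16)-(19) and the real cost is Equation (13)):

```python
def run_qs(update_theta, free_energy, update_alpha, cool,
           alpha, theta, T, xi=1e-6, max_iter=100):
    """Iterate S2-S7 until |J(t+1) - J(t)| < xi or the iteration limit (S8)."""
    J_prev = float("inf")
    for t in range(max_iter):
        theta = update_theta(alpha, theta)       # S2, Equations (16)-(17)
        J = free_energy(alpha, theta, T)         # S3-S5, Equations (11)-(13)
        if abs(J_prev - J) < xi:                 # S8 convergence test
            break
        J_prev = J
        alpha = update_alpha(alpha, theta, T)    # S6, Equation (18)
        T = cool(T)                              # S7, Equation (19)
    return alpha, theta

# Toy 1-D stand-ins: a quadratic "free energy" that the loop drives down.
alpha, theta = run_qs(
    update_theta=lambda a, th: 0.5 * (th + a),   # pull theta toward alpha
    free_energy=lambda a, th, T: (a - th) ** 2 + T,
    update_alpha=lambda a, th, T: a - 0.1 * (a - th),
    cool=lambda T: 0.9 * T,
    alpha=4.0, theta=0.0, T=1.0)
print(alpha, theta)
```

With these stand-ins the two parameters contract toward each other each iteration, mirroring how the real loop alternates the θ and α updates until the free energy stabilizes.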
4. Results
Extensive experiments are conducted on multispectral and hyperspectral images using FCM [8], MSHLICM [12], SF_FCM [14], FVT-FCM [15], HGMM [22], Diffseg [25], and the proposed QS algorithm. Their characteristics are listed in Table 2. The overall accuracy (OA) and the Kappa coefficient (Ka) are used to quantitatively evaluate the effectiveness of the proposed QS algorithm, where OA is the proportion of correctly classified samples and Ka is a statistical measure that accounts for chance agreement between the classifier's predictions and the ground truth.
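The two evaluation metrics can be computed from a confusion matrix as follows (a standard sketch with illustrative labels: OA is the trace ratio, and Ka compares the observed agreement with the agreement expected by chance):

```python
import numpy as np

def oa_kappa(y_true, y_pred, m):
    """Overall accuracy and Kappa from label arrays with classes 0..m-1."""
    C = np.zeros((m, m))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1                             # confusion matrix
    n = C.sum()
    po = np.trace(C) / n                         # observed agreement = OA
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n ** 2   # chance agreement
    return po, (po - pe) / (1 - pe)              # Ka

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2])
oa, ka = oa_kappa(y_true, y_pred, 3)
print(oa, ka)   # 0.875 and the corresponding Kappa
```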
(1) Multi-spectral images:
Figure 1(a1) is clipped from the GF1 satellite remote sensing image. The spectral bands include R, G, B, and NIR. The spatial resolution is resampled to 2 m.
Figure 1(a2) is clipped from the IKONOS satellite image. The spectral bands include R, G, and B. The spatial resolution is 1 m.
Figure 1(b1,b2) is the corresponding ground truth.
(2) Hyperspectral images:
Figure 1(a3) is clipped from the Houston 2018 dataset. Its hyperspectral data was captured using an ITRES CASI 1500 in 48 bands with a spectral range of 380–1050 nm at a 1 m ground sampling distance.
Figure 1(a4) is clipped from the Pavia University dataset. It was captured using 103 bands with a spectral range of 430–860 nm at 1.3 m spatial resolution.
Figure 1(b3,b4) is the corresponding ground truth.
4.1. Multi-Spectral Image Classification
Figure 2a–g show the representative classification results on GF1 with FCM, MSHLICM, SF_FCM, FVT-FCM, HGMM, Diffseg, and the proposed QS algorithm. Enlarged views of representative areas are shown in Figure 3, where Figure 3(a1,b1) is the ground truth, Figure 3(a2–a8) show the classification results of the various algorithms in area 1, and Figure 3(b2–b8) show those in area 2. The classification results of FCM are relatively fine, but contain many errors, such as artificial surfaces misclassified as ocean. MSHLICM effectively classifies ocean, land under construction, and forest, benefiting from their smooth or regular textures; however, the master-slave hierarchy of local information blurs the boundaries of land features, making it difficult to distinguish low vegetation from artificial surfaces in complex situations. SF_FCM uses superpixels as the basic unit and has a strong spatial constraint capability, but it can only classify large areas and cannot achieve fine classification of small targets. FVT-FCM uses fuzzy superpixels as the basic unit, which describes the boundaries of ground objects more flexibly than SF_FCM; it can recognize slender artificial surfaces such as roads, but its fixed initialization parameters cannot adapt to the global scope. HGMM uses probability measures to model the similarity between pixels and classes, which greatly improves classification performance compared with the preceding distance measures, yet there are still many misclassifications, such as between ocean and artificial surfaces. Although Diffseg introduces deep learning, it still cannot recognize low vegetation. The proposed QS algorithm, modeled on quantum theory, can readily describe complex scenes and achieve fine classification of complex images.
Figure 4a–g show the representative classification results on IKONOS with FCM, MSHLICM, SF_FCM, FVT-FCM, HGMM, Diffseg, and the proposed QS algorithm. Enlarged views of representative areas are shown in Figure 5, where Figure 5(a1,b1) is the ground truth, Figure 5(a2–a8) show the classification results of the various algorithms in area 1, and Figure 5(b2–b8) show those in area 2. FCM confuses buildings and roads. Owing to the strong neighborhood effect, the classification results of MSHLICM and SF_FCM deviate significantly from the ground truth. FVT-FCM performs well, but its classification boundaries are too rough. HGMM is also slightly affected by neighborhood constraints, making it difficult to distinguish fine boundaries. Diffseg behaves similarly to MSHLICM; the road is absorbed by the surrounding classes. By contrast, the proposed QS algorithm obtains high-quality classification results.
To further evaluate the effectiveness of the proposed method quantitatively, the median and the median absolute deviation of OA and Ka are listed in Table 3, and the corresponding box plots are shown in Figure 6. FCM is generally stable at around 60%. Although MSHLICM has higher accuracy than FCM, its Ka is lower and can drop to 0.2, as shown in Figure 6(b2). SF_FCM has a small accuracy deviation. The accuracies of FVT-FCM and HGMM improve significantly, but their deviations are relatively large. The accuracy of Diffseg varies significantly across images: 73.55% on IKONOS but only 54.73% on GF1. The proposed QS algorithm remains stable above 75% with minimal deviation.
4.2. Hyperspectral Image Classification
To verify the effectiveness of the proposed method on high-dimensional data, hyperspectral images are also tested. Figure 7a–g show the classification results of the Houston 2018 hyperspectral image with FCM, MSHLICM, SF_FCM, FVT-FCM, HGMM, Diffseg, and the proposed QS algorithm, where the background does not participate in classification. Enlarged views of representative areas are shown in Figure 8, where Figure 8(a1,b1) is the ground truth, Figure 8(a2–a8) show the classification results of the various algorithms in area 1, and Figure 8(b2–b8) show those in area 2. FCM, MSHLICM, SF_FCM, and FVT-FCM, all modeled on the Euclidean distance, rely to varying degrees on spatial proximity to classify because of interference from high-dimensional features; SF_FCM is affected most severely. HGMM, modeled with a Gaussian mixture distribution, overcomes this problem to some extent, but it still misclassifies non-residential buildings and roads. Diffseg automatically learns features by deep learning and partially overcomes the curse of dimensionality. The proposed QS algorithm, modeled on the Fermi-Dirac distribution, can effectively overcome the problems caused by high-dimensional data and obtains better classification results, as shown in Figure 7g and Figure 8(a8,b8).
Figure 9a–g show the classification results of the Pavia University hyperspectral image with FCM, MSHLICM, SF_FCM, FVT-FCM, HGMM, Diffseg, and the proposed QS algorithm, where the background does not participate in classification. Enlarged views of representative areas are shown in Figure 10, where Figure 10(a1,b1) is the ground truth, Figure 10(a2–a8) show the classification results of the various algorithms in area 1, and Figure 10(b2–b8) show those in area 2. FCM, MSHLICM, SF_FCM, and FVT-FCM all exhibit, to varying degrees, classification driven by spatial distance because of the ineffectiveness of distance measures in high-dimensional space; a large number of self-blocking bricks are classified as meadows. The results of HGMM are improved, but it also cannot distinguish self-blocking bricks from meadows. Diffseg achieves outstanding results, benefiting from deep learning. The proposed method likewise obtains the result closest to the ground truth.
Table 4 presents the quantitative evaluation of the hyperspectral image classification, and Figure 11 shows the corresponding box plots. Although MSHLICM and SF_FCM achieve higher accuracy than FCM and FVT-FCM, classes are often missing from their results; in this situation, the accuracy is dominated by the categories with larger areas, creating an illusion of inflated accuracy. The accuracy of HGMM remains around 70%. Diffseg overcomes the curse of dimensionality, with an accuracy of up to 95.87%. The proposed QS obtains results similar to Diffseg on the Houston 2018 image, with an OA of 81.88% and a Ka of 0.73. Although the accuracy of the proposed QS is lower than that of Diffseg on the Pavia University image, it is still higher than those of the other algorithms.