Data Field Modeling and Spectral-Spatial Feature Fusion for Hyperspectral Data Classification

Classification is a significant subject in hyperspectral remote sensing image processing. This study proposes a spectral-spatial feature fusion algorithm for the classification of hyperspectral images (HSI). Unlike existing spectral-spatial classification methods, the influences and interactions of the surroundings on each measured pixel were taken into consideration in this paper. Data field theory was employed as the mathematical realization of the field theory concept in physics, and both the spectral and spatial domains of the HSI were considered as data fields; the inherent dependency of interacting pixels was thereby modeled. Using data field modeling, the spatial and spectral features were transformed into a unified radiation form and further fused into a new feature by using a linear model. In contrast to current spectral-spatial classification methods, which usually simply stack the spectral and spatial features together, the proposed method builds the inner connection between the spectral and spatial features and explores the hidden information that contributes to classification. Therefore, new information is included for classification. The final classification result was obtained using a random forest (RF) classifier. The proposed method was tested on two well-known standard hyperspectral datasets, the University of Pavia and Indian Pines. The experimental results demonstrate that the proposed method achieves higher classification accuracies than the traditional approaches.


Introduction
With the development of imaging instruments in the past few years, hyperspectral data processing has become increasingly important in many fields [1][2][3][4][5]. As data tools with high spectral resolution, hyperspectral sensors usually utilize hundreds of spectral channels to describe spectral signatures. Generally, the primary purpose of hyperspectral image (HSI) processing is to analyze and recognize the spectral data acquired by hyperspectral sensors. It is established that different materials have distinct reflectance spectral signatures. Thus, reflectance spectra are widely used for material recognition and image analysis [6].
However, while the high dimensionality of HSI supports accurate descriptions of spectral signatures, it leads to some theoretical and practical problems, particularly the curse of dimensionality. In classification problems, classification accuracy is not positively correlated with the dimensionality of the input data; classification is usually most accurate with a particular number of features, as has been demonstrated in References [7][8][9]. Hence, feature extraction and dimensionality reduction techniques are important and indispensable in high-dimensional data classification and analysis. Based on the information known in advance, feature extraction (FE) techniques are generally categorized into unsupervised and supervised methods. Unsupervised FE techniques, e.g., principal component

Data Field Modeling
Data fields are the mathematical expression of field theory in physics. Data fields establish models in which the data can be seen as a whole by studying the interactions of the data. To describe the relationships between data, the data are treated as radiation sources within the data field. Thus, the radiation effect can be used to mathematically describe the data interaction. With this approach, the property of a vector point is determined not only by its location in the data space, but also by the other surrounding data in the data field, owing to the radiation effect. In this paper, both the spectral and spatial domains of an HSI are considered as data fields. Thus, the recognition and identification of a pixel in an HSI do not depend only on its position in the spectral space (that is, on its spectral signature), but also take into account its interactions with the other pixels in the HSI.
In this paper, we define the radiation intensity as a function of a distance measurement. This function is called the radiation function, and is mathematically expressed as:

E = E_0 e^(-ρd²) (1)

where d denotes the distance to the radiation source, E is the radiation intensity at d, ρ is a radiation factor, and E_0 indicates the initial energy. Both the Mahalanobis and Euclidean distances are employed as distance measurements in this paper; we term the Mahalanobis distance d_M and the Euclidean distance d_E. When d is small, the points in a data pair interact with each other intensively. In contrast, when d is large, the e^(-ρd²) term tends toward zero and the interaction is negligible. The radiation function allows us to establish connections between the data in data fields and to describe the interactions between data pairs as radiation intensities. Suppose x = [x_ϕ, x_ω]^T is a feature vector that corresponds to a pixel in an HSI; here, x_ϕ represents the spectral feature extracted by supervised FE techniques, and x_ω denotes the spatial structural feature. In the following description, symbols related to the spectral space are denoted by the suffix ϕ and those related to the spatial space by ω. Thus, a pixel in an HSI corresponds to a feature vector x_ϕ in the spectral feature space R_ϕ and a feature vector x_ω in the spatial feature space R_ω. In this paper, both R_ϕ and R_ω are considered data fields; thus, a data point receives radiation in both R_ϕ and R_ω. Furthermore, we suppose that all the data have a unit initial radiation energy, i.e., E_0 = 1, when data field modeling is performed in both the spectral and spatial domains.
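As a concrete illustration, Equation (1) can be evaluated numerically. The following is a minimal sketch assuming NumPy; the values of ρ, the spectral vectors, and the covariance matrix are arbitrary examples, not values from the paper.

```python
import numpy as np

def radiation(d, rho=1.0, E0=1.0):
    """Radiation intensity E = E0 * exp(-rho * d^2) at distance d (Eq. 1)."""
    return E0 * np.exp(-rho * np.asarray(d) ** 2)

# Two hypothetical spectral vectors for a data pair.
x = np.array([0.2, 0.4, 0.6])
y = np.array([0.2, 0.4, 0.9])

# Euclidean distance d_E.
d_E = np.linalg.norm(x - y)

# Mahalanobis distance d_M, here with an identity covariance for illustration,
# in which case it coincides with the Euclidean distance.
S_inv = np.linalg.inv(np.eye(3))
diff = x - y
d_M = np.sqrt(diff @ S_inv @ diff)

E = radiation(d_E, rho=2.0)  # intensity received across distance d_E
```

With E_0 = 1, the intensity is 1 at zero distance and decays rapidly as d grows, matching the behavior described above.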
Suppose a training sample set {(x_i, u_i)}, i = 1, ..., N, where x_i = [x_iϕ, x_iω]^T denotes an input pattern, u_i ∈ {1, ..., L} denotes its class label, and N and L are the numbers of training samples and classes, respectively. For a training sample x_i with label u_i = l, two subsets of the training set are defined. The first subset contains all the training samples that have the same label as x_i; we term this subset the Same Class Subset. The other subset contains all the training samples with class labels different from that of x_i, and is called the Different Class Subset. We suppose that a given training sample x_i receives radiation from its k-nearest training samples in every class. Collecting these nearest neighbors, a coefficient vector and a patch matrix X_i^e are defined for each training sample, together with a matrix L_i constructed with the diagonalization operation diag(·). Equation (10) can then be reduced to a quadratic form in the fusing weight, where tr(·) is the trace operator. Furthermore, when all the training samples are taken into account, the objective becomes

min_α [tr(AGA^T)α² + tr(AGB^T + BGA^T)α + tr(BGB^T)] (13)

where G = Σ_{i=1}^{N} λ_i X_i^e L_i X_i^{eT}, and the weight coefficient α can be uniquely determined. Hence, the spectral-spatial relationship is described and hidden information is explored. It can be seen from Equation (13) that the weight coefficient training is actually an additional information extraction operation. In other words, the most discriminative features among the spectral and spatial features are extracted by introducing α in this procedure.
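Because the objective in Equation (13) is a scalar quadratic in α with a positive leading coefficient, its unique minimizer has the closed form α = -tr(AGB^T + BGA^T) / (2 tr(AGA^T)). The sketch below illustrates this with randomly generated stand-ins for A, B, and G (the actual matrices depend on the training data, so these are purely hypothetical).

```python
import numpy as np

# Hypothetical small matrices standing in for A, B, and G of Equation (13).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))
M = rng.standard_normal((6, 6))
G = M @ M.T  # symmetric positive semi-definite, mimicking G = sum_i lambda_i X_i^e L_i X_i^eT

# Coefficients of the quadratic a*alpha^2 + b*alpha + c from Equation (13).
a = np.trace(A @ G @ A.T)
b = np.trace(A @ G @ B.T + B @ G @ A.T)
c = np.trace(B @ G @ B.T)

alpha = -b / (2 * a)  # unique minimizer of the quadratic
```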
The implementation scheme of the proposed algorithm for hyperspectral imagery classification is shown in Figure 1. As shown, the data field modeling operation is implemented in both the spectral space and the image spatial domain. Based on the prior information provided by the training set, which consists of spectral information, local spatial information, and label information, the spectral features can be obtained by supervised FE techniques. The spatial structural features can be extracted by spatial feature extraction algorithms, such as EMP, EAP, and EMAP. The data field modeling operation is carried out in the two spaces, and then the DFRF is built. The feature fusion with local information is then performed: this process fuses the spectral and spatial features into an FDFRF and learns the fusing weight coefficient. For an unlabeled test pixel, we extract the spectral and spatial features, fuse them into an FDFRF based on data field modeling, and finally implement the classification with classifiers.

Figure 1. The implementation scheme of the proposed algorithm.
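The end-to-end flow described above can be sketched as follows. This is only an illustrative pipeline assuming scikit-learn and SciPy: PCA stands in for the supervised FE stage, a mean filter stands in for the EMAP-style spatial feature extraction, and a fixed weight stands in for the learned fusing coefficient, so it is a structural sketch rather than the paper's exact method.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

def classify(cube, train_mask, labels, n_spectral=3, alpha=0.5):
    """Sketch of the Figure 1 pipeline for an (h, w, bands) HSI cube."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)

    # Stage 1: spectral features (PCA as a stand-in for supervised FE).
    spectral = PCA(n_components=n_spectral).fit_transform(flat)

    # Stage 2: spatial structural features (mean filter as an EMAP stand-in).
    spatial = np.stack(
        [uniform_filter(spectral[:, i].reshape(h, w), size=5).ravel()
         for i in range(n_spectral)], axis=1)

    # Stage 3: linear fusion of the two radiation-form features.
    fused = alpha * spectral + (1 - alpha) * spatial

    # Stage 4: classification by a random forest.
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(fused[train_mask.ravel()], labels)
    return rf.predict(fused).reshape(h, w)
```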

Experiments and Results
Two standard datasets, the Reflective Optics Systems Imaging Spectrometer (ROSIS-03) University of Pavia dataset and the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) Indian Pines dataset, which are frequently used in research, were used in this study.
The first test dataset is a hyperspectral dataset collected from the University of Pavia, Italy, by the ROSIS-03 airborne instrument. In this dataset, nine classes of interest were considered in the image scene. This dataset, which is composed of 103 bands of 610 × 340 pixels, provides a high spatial resolution of 1.3 m/pixel. The training and test sets were composed of 3909 and 42,788 samples, respectively. The number of training and test samples is shown in Table 1.
The Indian Pines dataset is a standard test dataset acquired in 1992 using the AVIRIS sensor. The data consists of 145 × 145 pixels with a medium spatial resolution of about 20 m/pixel. In this test case, the spectral channels in the atmosphere absorption bands were removed, so 200 data channels were used. Sixteen classes of interest were considered. For this dataset, a total of 695 pixels and 9671 pixels were used to make up the training and test sets, respectively. The number of available test and training samples is displayed in Table 2.

The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.

Table 1. Number of training and test samples for the University of Pavia dataset.

Labels  Name          Training  Test
1       Asphalt       548       6631
2       Meadow        540       18,649
3       Gravel        392       2099
4       Trees         524       3064
5       Metal Sheets  256       1345
6       Bare Soil     532       5029
7       Bitumen       375       1330
8       Bricks        514       3682
9       Shadows       231       947

Number of Samples
Trees, The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Number of Samples
Asphalt, The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Number of Samples
Bitumen, The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.  The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.  The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.  The details of the training and tes maintain consistency with previous res other state-of-the-art approaches. We a as in the previous studies. Each metho were identical to those used in the p presented in Figure 2.   The details of the training and test sets of the two datasets are given in References [27,36]. 
To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.  The details of the training and test sets of the two datasets are given in References [27,36]. To maintain consistency with previous results, we used the same size training and test sets adopted by other state-of-the-art approaches. We also adopted samples with precisely the same spatial locations as in the previous studies. Each method was executed only once because the samples that we used were identical to those used in the previous studies. False color images of the two datasets are presented in Figure 2.  he details of the training and test sets of the two datasets are given in References [27,36]. To in consistency with previous results, we used the same size training and test sets adopted by tate-of-the-art approaches. We also adopted samples with precisely the same spatial locations he previous studies. Each method was executed only once because the samples that we used dentical to those used in the previous studies. False color images of the two datasets are ted in Figure 2. details of the training and test sets of the two datasets are given in References [27,36]. To consistency with previous results, we used the same size training and test sets adopted by te-of-the-art approaches. We also adopted samples with precisely the same spatial locations previous studies. Each method was executed only once because the samples that we used ntical to those used in the previous studies. False color images of the two datasets are d in Figure 2. 
The details of the training and test sets of the two maintain consistency with previous results, we used th other state-of-the-art approaches. We also adopted sam as in the previous studies. Each method was executed were identical to those used in the previous studies. presented in Figure 2.  of the training and test sets of the two datasets are given in References [27,36]. To ency with previous results, we used the same size training and test sets adopted by -art approaches. We also adopted samples with precisely the same spatial locations s studies. Each method was executed only once because the samples that we used those used in the previous studies. False color images of the two datasets are re 2. training and test sets of the two datasets are given in References [27,36]. To ith previous results, we used the same size training and test sets adopted by pproaches. We also adopted samples with precisely the same spatial locations dies. Each method was executed only once because the samples that we used e used in the previous studies. False color images of the two datasets are r representation and corresponding ground truth of (a,b) ROSIS-03 University of ing and test sets of the two datasets are given in References [27,36]. To previous results, we used the same size training and test sets adopted by aches. We also adopted samples with precisely the same spatial locations Each method was executed only once because the samples that we used ed in the previous studies. False color images of the two datasets are nd test sets of the two datasets are given in References [27,36]. To us results, we used the same size training and test sets adopted by . We also adopted samples with precisely the same spatial locations method was executed only once because the samples that we used the previous studies. 
False color images of the two datasets are tion and corresponding ground truth of (a,b) ROSIS-03 University of Grass-pasture-mowed, 16,2146 The details of the training and test sets of the two datasets are given in maintain consistency with previous results, we used the same size training an other state-of-the-art approaches. We also adopted samples with precisely the as in the previous studies. Each method was executed only once because the were identical to those used in the previous studies. False color images of presented in Figure 2.  The details of the training and test sets of the two datasets are given in Re maintain consistency with previous results, we used the same size training and other state-of-the-art approaches. We also adopted samples with precisely the sa as in the previous studies. Each method was executed only once because the sa were identical to those used in the previous studies. False color images of th presented in Figure 2.  The details of the training and test sets of the two datasets are given in Refe maintain consistency with previous results, we used the same size training and tes other state-of-the-art approaches. We also adopted samples with precisely the sam as in the previous studies. Each method was executed only once because the sam were identical to those used in the previous studies. False color images of the presented in Figure 2.  The details of the training and test sets of the two datasets are given in Referenc maintain consistency with previous results, we used the same size training and test se other state-of-the-art approaches. We also adopted samples with precisely the same sp as in the previous studies. Each method was executed only once because the samples were identical to those used in the previous studies. False color images of the two presented in Figure 2.  
The details of the training and test sets of the two datasets are given in Reference maintain consistency with previous results, we used the same size training and test sets other state-of-the-art approaches. We also adopted samples with precisely the same spat as in the previous studies. Each method was executed only once because the samples t were identical to those used in the previous studies. False color images of the two presented in Figure 2.  The details of the training and test sets of the two datasets are given in References maintain consistency with previous results, we used the same size training and test sets a other state-of-the-art approaches. We also adopted samples with precisely the same spatia as in the previous studies. Each method was executed only once because the samples th were identical to those used in the previous studies. False color images of the two d presented in Figure 2.  The details of the training and test sets of the two datasets are given in References [2 maintain consistency with previous results, we used the same size training and test sets ad other state-of-the-art approaches. We also adopted samples with precisely the same spatial as in the previous studies. Each method was executed only once because the samples that were identical to those used in the previous studies. False color images of the two dat presented in Figure 2.   Training  Test  548  6631  540  18,649  392  2099  524  3064  256  1345  532  5029  375  1330  514  3682  231 947

Experimental Setup
In all the experimental datasets, the spectral-spatial classification method η_n proposed in Reference [27], known as AUTOMATIC, was employed for comparison, where n denotes the FE approach used. Here, the HSI data were first transformed by the FE approach; the spectral feature x_ϕ was the output of this step. Next, the spatial feature x_ω was obtained by EMAP and the FE approach. Finally, x_ϕ and x_ω were stacked together for classification. DAFE and DBFE were employed for supervised FE. DAFE is often applied to dimensionality reduction and feature extraction in the pattern recognition field. In DAFE, the class centers and the covariance matrix of each class are calculated from the training samples. As a parametric method, DAFE achieves satisfactory performance if the data approximately follow a normal distribution. DBFE extracts both discriminantly informative and discriminantly redundant features from the decision boundary: using the decision boundary feature matrix, the decision boundary is described and features are extracted. For example, η_DA denotes that the raw data were first transformed by DAFE, EMAP was then performed on the baseline images obtained by DAFE, and finally the spectral features extracted by DAFE and the spatial features obtained by EMAP were stacked together.
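The stacking step of these comparison methods amounts to a per-pixel concatenation of the two feature blocks. A sketch with hypothetical feature arrays (the dimensions are arbitrary examples):

```python
import numpy as np

# Hypothetical feature arrays: one row per pixel.
x_phi = np.random.default_rng(1).random((100, 8))     # spectral features (e.g., DAFE output)
x_omega = np.random.default_rng(2).random((100, 20))  # spatial features (e.g., EMAP output)

# Baseline "stacked" representation: simple column-wise concatenation.
stacked = np.hstack([x_phi, x_omega])
```

This is precisely the step the proposed method replaces with data-field-based fusion.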
It should be emphasized that the features containing more than 99% of the cumulative eigenvalues were selected when DAFE and DBFE were employed in the following experiments. The classification results obtained using only the spectral information are reported for comparison; we use DA and DB to indicate the spectral information extracted by DAFE and DBFE, respectively. The EMAP methods were also employed to demonstrate the superiority of the proposed algorithm: DA_p and DB_p denote the EMAPs generated based on the features extracted by DAFE and DBFE, respectively. The EMAP-based classification methods proposed in References [25,26], denoted by GA and SUnSAL, respectively, were also employed, as were the recent state-of-the-art spectral-spatial classification approaches MH [29] and LBP [30]. For the MH approach, the hypotheses for prediction were generated using the manually selected spectral-band partitions, as suggested in Reference [29]. In the LBP method, the criterion of linear prediction error (LPE) [37] was used for spectral band selection, and LBP features were extracted on the selected bands. The LBP features and selected spectral bands were then fused at the feature level and processed by the classifier. To make our methods fully comparable with the reference techniques, the thresholds and parameter values used in this experimental setup were selected from References [15,27].
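The 99% cumulative-eigenvalue criterion can be implemented as a small helper. The function below is a sketch of that selection rule, not the exact routine used in the experiments:

```python
import numpy as np

def n_features_for_ratio(eigvals, ratio=0.99):
    """Smallest number of leading eigenvalues whose cumulative sum
    reaches `ratio` of the total (the 99% criterion used for DAFE/DBFE)."""
    v = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    cum = np.cumsum(v) / v.sum()
    return int(np.searchsorted(cum, ratio) + 1)
```

For instance, with eigenvalues [10, 1, 0.1, 0.01], the two leading components already carry over 99% of the total, so two features would be retained.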
The term F_n signifies our proposed method, where n denotes the FE approach. In our proposed method, the spectral feature x_ϕ and the spatial feature x_ω were fused into an FDFRF. In the experiments, we set k = 5, i.e., five NNs in each class were considered in the data field modeling. The features extracted by all the methods were analyzed by an RF classifier. In all the experiments, the number of trees was set to 200, as suggested in References [15,35,36], in order to achieve a trade-off between the classification performance and the time cost of the learning phase. The method performances were evaluated by three measurements: the overall accuracy (OA), the average accuracy (AA), and the Kappa coefficient (κ). However, in order to avoid unnecessary redundancy, the experimental results and comparisons below are analyzed based on OA only. As shown in Tables 3 and 4, the results of our experiments with the two datasets show that feature fusion based on data field theory can improve classification accuracy compared to the reference methods. The classification results acquired by the proposed method on the two datasets are shown in detail in Figures 3 and 4. For the University of Pavia dataset, the data field feature fusion resulted in significantly improved classification accuracy.
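The three measurements can be computed directly from the predicted and reference labels; the following is a sketch assuming scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

def evaluate(y_true, y_pred):
    """OA, AA, and kappa: the three measurements used in the experiments."""
    oa = accuracy_score(y_true, y_pred)                 # overall accuracy
    cm = confusion_matrix(y_true, y_pred)
    per_class = np.diag(cm) / cm.sum(axis=1)            # recall per class
    aa = per_class.mean()                               # average accuracy
    kappa = cohen_kappa_score(y_true, y_pred)           # Kappa coefficient
    return oa, aa, kappa
```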


Class Number of Samples
Meadows, The details of the training and test sets of the two datasets are given in References [27,36] maintain consistency with previous results, we used the same size training and test sets adopted other state-of-the-art approaches. We also adopted samples with precisely the same spatial locat as in the previous studies. Each method was executed only once because the samples that we u were identical to those used in the previous studies. False color images of the two datasets presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Class Number of Samples
Bricks, The details of the training and test sets of the two datasets are given in References [27,36]. T maintain consistency with previous results, we used the same size training and test sets adopted b other state-of-the-art approaches. We also adopted samples with precisely the same spatial location as in the previous studies. Each method was executed only once because the samples that we use were identical to those used in the previous studies. False color images of the two datasets a presented in Figure 2.  The term ℱ signifies our proposed method. The FE approach is denoted by n. The spectral feature and spatial feature were fused into FDFRF in our proposed method. In the experiments, we set 5 k  , i.e., five NNs in each class were considered in the data field modeling. The features extracted by all the methods were analyzed by an RF classifier. In all the experiments, the number of trees was set to 200, as suggested in References [15,35,36], in order to achieve a trade-off between the classification performance and time cost for the learning phase. The method performances were evaluated by three measurements: the overall accuracy (OA), the average accuracy (AA), and the Kappa coefficient ( ). However, in order to avoid unnecessary redundancy in the following, the experimental results and comparison will only be analyzed based on OA. Tables 3 and 4, the results of our experiments with the two datasets show that feature fusion based on the data field theory can improve classification accuracy compared to the reference methods. The classification results acquired by the proposed method on the two datasets by the proposed method are shown in detail in Figures 3 and 4.   For the University of Pavia dataset, the data field feature fusion resulted in significantly improved classification accuracy. As can be observed from  etails of the training and test sets of the two datasets are given in References [27,36]. 
To nsistency with previous results, we used the same size training and test sets adopted by of-the-art approaches. We also adopted samples with precisely the same spatial locations revious studies. Each method was executed only once because the samples that we used ical to those used in the previous studies. False color images of the two datasets are n Figure 2. ails of the training and test sets of the two datasets are given in References [27,36]. To sistency with previous results, we used the same size training and test sets adopted by f-the-art approaches. We also adopted samples with precisely the same spatial locations vious studies. Each method was executed only once because the samples that we used al to those used in the previous studies. False color images of the two datasets are Figure 2.

As shown in
. The details of the training and test sets of the two da maintain consistency with previous results, we used the sa other state-of-the-art approaches. We also adopted sample as in the previous studies. Each method was executed onl were identical to those used in the previous studies. Fa presented in Figure 2.  Shadows 231 Corn-mintill, Sensors 2016, 16,2146 The details of the training and test sets of the two dataset maintain consistency with previous results, we used the same s other state-of-the-art approaches. We also adopted samples wit as in the previous studies. Each method was executed only on were identical to those used in the previous studies. False c presented in Figure 2.  aining and test sets of the two datasets are given in References [27,36]. To h previous results, we used the same size training and test sets adopted by roaches. We also adopted samples with precisely the same spatial locations s. Each method was executed only once because the samples that we used used in the previous studies. False color images of the two datasets are g and test sets of the two datasets are given in References [27,36]. To evious results, we used the same size training and test sets adopted by hes. We also adopted samples with precisely the same spatial locations ach method was executed only once because the samples that we used in the previous studies. False color images of the two datasets are Grass-pasture-mowed, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in R maintain consistency with previous results, we used the same size training and other state-of-the-art approaches. We also adopted samples with precisely the s as in the previous studies. Each method was executed only once because the s were identical to those used in the previous studies. False color images of presented in Figure 2.   
Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947 Hay-windrowed, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in Re maintain consistency with previous results, we used the same size training and t other state-of-the-art approaches. We also adopted samples with precisely the sam as in the previous studies. Each method was executed only once because the sa were identical to those used in the previous studies. False color images of th presented in Figure 2.   Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947 Oats, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in Refe maintain consistency with previous results, we used the same size training and tes other state-of-the-art approaches. We also adopted samples with precisely the sam as in the previous studies. Each method was executed only once because the sam were identical to those used in the previous studies. False color images of the presented in Figure 2.   Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947 Soybean-notill, 7 of 17 sets of the two datasets are given in References [27,36]. To lts, we used the same size training and test sets adopted by adopted samples with precisely the same spatial locations was executed only once because the samples that we used vious studies. 
False color images of the two datasets are corresponding ground truth of (a,b) ROSIS-03 University of  Training  Test  sphalt  548  6631  eadow  540  18,649  ravel  392  2099  rees  524  3064  al Sheets  256  1345  re Soil  532  5029  tumen  375  1330  ricks  514  3682  adows  231  947 Soybean-mintill, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in Referenc maintain consistency with previous results, we used the same size training and test se other state-of-the-art approaches. We also adopted samples with precisely the same sp as in the previous studies. Each method was executed only once because the samples were identical to those used in the previous studies. False color images of the two presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Class Number of Samples
Soybean-clean, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in Reference maintain consistency with previous results, we used the same size training and test sets other state-of-the-art approaches. We also adopted samples with precisely the same spat as in the previous studies. Each method was executed only once because the samples t were identical to those used in the previous studies. False color images of the two presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Class Number of Samples
Wheat, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in References maintain consistency with previous results, we used the same size training and test sets a other state-of-the-art approaches. We also adopted samples with precisely the same spatia as in the previous studies. Each method was executed only once because the samples th were identical to those used in the previous studies. False color images of the two d presented in Figure 2.   Labels  Name  Training  Test  1  Asphalt  548  6631  2  Meadow  540  18,649  3  Gravel  392  2099  4  Trees  524  3064  5  Metal Sheets  256  1345  6  Bare Soil  532  5029  7  Bitumen  375  1330  8  Bricks  514  3682  9 Shadows 231 947

Class Number of Samples
Woods, Sensors 2016, 16,2146 The details of the training and test sets of the two datasets are given in References [2 maintain consistency with previous results, we used the same size training and test sets ad other state-of-the-art approaches. We also adopted samples with precisely the same spatial as in the previous studies. Each method was executed only once because the samples that were identical to those used in the previous studies. False color images of the two dat presented in Figure 2.   Training  Test  548  6631  540  18,649  392  2099  524  3064  256  1345  532  5029  375  1330  514  3682  231 947

Number of Samples
Stone-steel-towers.
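The classification stage (an RF with 200 trees applied to the fused features) can be sketched as below. This is a minimal illustration, not the paper's implementation: the data are synthetic stand-ins for the FDFRFs, and only the `n_estimators=200` setting mirrors the configuration reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the fused data-field features (FDFRFs):
# 9 classes and 45 features, as in the University of Pavia experiments.
X_train = rng.normal(size=(90, 45))
y_train = np.repeat(np.arange(9), 10)
X_test = rng.normal(size=(30, 45))

# 200 trees, following the setting suggested in References [15,35,36].
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
print(pred.shape)  # one predicted class label per test pixel
```

In practice, `X_train` and `X_test` would be the FDFRFs computed from the training and test pixels of the hyperspectral scene.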
For the University of Pavia dataset, the data field feature fusion resulted in significantly improved classification accuracy. As can be observed from Table 3, F_DB outperformed the other methods with an OA of 99.4%. F_DA achieved 19%, 4.1%, and 13.4% improvements in OA over DA, DA_P, and η_DA, respectively. Compared with the corresponding reference DB, DB_P, and η_DB methods, F_DB improved the OA by 20.5%, 3.4%, and 2.6%, respectively. It is also important to emphasize that η_DB exhibited excellent classification performance with an OA of 96.8%; in comparison, F_DA and F_DB achieved small improvements in OA of 2.1% and 2.6%, respectively. Although these improvements appear modest in terms of OA, more than 65.6% and 81.2% of the test samples misclassified by η_DB were corrected by F_DA and F_DB, respectively. We can therefore conclude that the proposed method effectively improved the classification performance.
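The conversion between the OA gains and the fractions of corrected misclassifications quoted above follows from a simple error-rate calculation, assuming the improved method introduces no new errors on samples the reference method already classified correctly:

```python
def fraction_corrected(oa_ref, oa_new):
    # Share of the reference method's misclassified samples that the improved
    # method fixes, assuming no new errors on previously correct samples.
    err_ref, err_new = 1.0 - oa_ref, 1.0 - oa_new
    return (err_ref - err_new) / err_ref

# eta_DB (OA 96.8%) vs. F_DA (OA 98.9%) and F_DB (OA 99.4%):
print(round(100 * fraction_corrected(0.968, 0.989), 1))  # ~65.6
print(round(100 * fraction_corrected(0.968, 0.994), 1))  # ~81.2
```

These values reproduce the 65.6% and 81.2% correction rates reported for the Pavia test case.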
Compared with the results reported in Table 3, it is easy to deduce that DBFE outperforms DAFE. The primary reason may be that DAFE is not full rank, so some discriminative spectral information was lost. It should be noted that the classification performance of AUTOMATIC, which stacks the spectral and spatial features together, was affected by the different FE approaches: the OA resulting from η_DB is 11.3% higher than that of η_DA. Compared to the EMAP approaches, AUTOMATIC improved the classification accuracy when DBFE was employed; however, its accuracy decreased when DAFE was used. The proposed method is much more robust with respect to the choice of the FE technique, and its classification results always remained at a high level when different FE approaches were used. This is because our method further fuses the extracted spectral and spatial features, so the useful information that lies in the spectral-spatial relationship and can contribute to the classification was included.
Compared with the employed state-of-the-art HSI classification methods, the proposed method also achieved competitive classification performance in this test case. F_DB achieved the best classification results in terms of OA, AA, and the κ value. As can be observed from the classification results, F_DB achieved approximately 3.3%, 1.3%, 0.6%, and 0.2% improvements in OA over GA, SUnSAL, LBP, and MH, respectively. Though the OA improvements seem very small, almost 84.6%, 68.4%, 50%, and 25% of the samples misclassified by these methods were corrected, respectively. Moreover, F_DA also produced a satisfactory classification performance with an OA of 98.9%. Although the MH approach reported a higher classification accuracy with an OA of 99.2%, F_DA is competitive because it performed better than all the other reference methods.
In contrast to the University of Pavia dataset, the low spatial resolution of the Indian Pines dataset, which leads to more mixed pixels, makes the classification task more complex. For this test case, the HSI classification results obtained by further feature fusion were generally better than those of the corresponding compared methods. For example, F_DA achieved 33.3%, 5.7%, and 3.5% improvements in OA over DA, DA_P, and η_DA, respectively, and F_DB improved the OA of DB, DB_P, and η_DB by 31.9%, 5.4%, and 11.9%, respectively. The best accuracy was obtained by F_DA, which achieved an OA of 96.8%. It should be noted that the reference methods exhibited acceptable classification accuracies; in contrast, F_DA achieved the best performance in 11 classes, and F_DB performed better than all the reference methods in 11 classes. As the results in Table 4 show, DAFE performs better than DBFE in terms of OA, AA, and the Kappa coefficient. A possible reason is that the presence of pixels with mixed spectra leaves the number of features extracted by DBFE insufficient to discriminate the samples in different classes.
In the Indian Pines dataset, the results also indicate that the AUTOMATIC approach is affected by the different FE methods; our method avoids this problem by using data field modeling and further feature fusion. As can be observed from the classification results reported in Table 4, the state-of-the-art spectral-spatial methods improved the classification more significantly than the spectral-based methods DA and DB in this test case. This may be because the spectral information is less dominant here, so introducing spatial information effectively contributes to solving the classification problem. As with the Pavia University dataset, our method obtained competitive results in comparison to the other state-of-the-art methods. The best classification result was obtained by F_DA with an OA of 96.8%, and the misclassification rates decreased by approximately 48.4%, 46.6%, 52.2%, and 25.6% compared to GA, SUnSAL, LBP, and MH, respectively. Moreover, F_DB also performed competitively, with better classification accuracies than all the other reference methods except MH.
As Equation (5) shows, the feature number (i.e., the dimensionality of the FDFRFs) in our method is determined by the number of classes and the number of NNs used in the data field modeling. The feature numbers of our method were 45 and 80 for the Pavia University and Indian Pines datasets, respectively. The proposed method can be seen as an advancement of the AUTOMATIC approach; accordingly, the feature numbers of the proposed method and AUTOMATIC are listed in Table 5. It can be seen from Table 5 that the proposed method achieved better classification results with acceptable feature numbers. Compared to the EMAP reference methods, our method effectively reduced the feature numbers and improved classification accuracy. Moreover, F_DB (consisting of 45 features) performed better than η_DB, which consisted of 59 features, in the Pavia University dataset. In the Indian Pines dataset, the proposed method also showed superior classification performance over the AUTOMATIC approaches with an acceptable feature number.
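The reported feature numbers can be checked directly. Assuming, as Equation (5) implies, one radiation value per class per nearest neighbor, the FDFRF dimensionality is the product of the two quantities:

```python
def fdfrf_dim(n_classes, k):
    # Dimensionality of the fused data-field radiation feature (FDFRF):
    # one radiation value per class per nearest neighbor (k = 5 here).
    return n_classes * k

print(fdfrf_dim(9, 5))   # University of Pavia (9 classes): 45 features
print(fdfrf_dim(16, 5))  # Indian Pines (16 classes): 80 features
```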
Finally, we compare the computational complexity of the classification methods. As an example, the processing times (in seconds) of the methods with the Indian Pines dataset are shown in Table 6. All experiments were implemented using MATLAB on an Intel Core i5 CPU with 3.2 GHz and 4 GB of RAM. As can be seen in Table 6, the DAFE-based methods have an obvious advantage in computational time compared to DBFE-based approaches because DAFE is faster than DBFE. The computational costs of data field-based methods are higher than those of the corresponding AUTOMATIC approaches owing to the burden of building FDFRFs. Compared to the other methods, our method achieved superior classification performances at the cost of greater computational complexity and time consumption. However, the speed of our method could be improved by using time-efficient feature extraction approaches and parallel computing techniques.

Parameters
In this section, two important parameters used in the presented algorithm are discussed. First, the radiation factors used in the radiation function are analyzed, and an adaptive method for determining the radiation factor is put forward. Second, the relationship between the algorithm performance and k, the number of NNs used in the data field modeling, is discussed.
As shown in Equation (1), the radiation intensity is jointly determined by the distance measurement d and the radiation factor ρ. Radiation factors determine the character of the radiation effects in data fields or, for simplicity, the range of the data radiation domain. The distance measurement can lose meaning when ρ is extremely small or large: the data interact strongly when ρ is very small, whereas the interactions between data are negligible if ρ is very large. Additionally, as before, we use different radiation factors in different spaces and classes when calculating radiation intensities. In this study, the values of the radiation factors were determined by the training samples. For a given training sample (x_i, l), the training set can be divided into two parts, as mentioned in Section 2. The vector mean value of the Same Class Subset spectral features is denoted by x_{i,ϕ}, which can be considered the center of class l in the spectral feature space. It is desirable for the training samples in the same class and in different classes to have radiations that are as strong and as weak as possible, respectively. The radiation factor is therefore selected from two quantities: d_{+,ϕ}, the mean value of the distances between x_{i,ϕ} and the samples in the Same Class Subset, and d_{−,ϕ}, the mean value of the distances from x_{i,ϕ} to the samples in the Different Class Subset.
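A sketch of this adaptive selection is given below. Two pieces are assumptions rather than the paper's exact formulas: the Gaussian radiation form is a common data-field potential standing in for Equation (1), and the geometric mean of d_{+,ϕ} and d_{−,ϕ} stands in for the paper's exact selection rule for ρ:

```python
import numpy as np

def radiation(d, rho):
    # Gaussian-style radiation intensity; a common data-field potential,
    # used here as a stand-in for the paper's Equation (1).
    return np.exp(-(d / rho) ** 2)

def adaptive_rho(same_class, diff_class):
    # same_class / diff_class: spectral features of the Same Class Subset
    # and Different Class Subset for a training sample (x_i, l).
    center = same_class.mean(axis=0)                        # x_{i,phi}
    d_plus = np.linalg.norm(same_class - center, axis=1).mean()
    d_minus = np.linalg.norm(diff_class - center, axis=1).mean()
    # Stand-in selection rule: place rho between d+ and d- so that the
    # same-class radiation is strong and the different-class one is weak.
    return np.sqrt(d_plus * d_minus)
```

With ρ chosen this way, radiation(d_{+,ϕ}, ρ) is large while radiation(d_{−,ϕ}, ρ) is small whenever the two subsets are well separated.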
Therefore, ρ_ϕ^l (i.e., the radiation factor of (x_i, l) in the spectral domain data field) can be adaptively determined by the training samples. The radiation factor of (x_i, l) in the spatial domain data field, denoted by ρ_ω^l, can be determined in the same way.
In our proposed method, the number of NNs k is the most important parameter in determining the data field modeling accuracy and classification performance. The influence of k on the algorithm performance, measured by OA, can be observed in Figure 5. Note that OA increases with k. However, the classification performance decreases when k > 5 and k > 10 in the Indian Pines dataset and Pavia University dataset, respectively. This is because a large k leads to a higher-dimensional FDFRF, which may cause the Hughes phenomenon. Moreover, a large k also incurs a higher computation cost. Based on our experimental results, it is reasonable to set k = 5, which avoids the Hughes phenomenon and achieves a good trade-off between classification performance and computation cost.
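The trade-off described above can be encoded as a simple selection rule. The rule below (smallest k whose OA is within a tolerance of the best OA) and the OA values are illustrative assumptions, not the paper's procedure:

```python
def choose_k(oa_by_k, tol=0.005):
    # Smallest k whose OA is within `tol` of the best OA: favors a low
    # FDFRF dimensionality (avoiding the Hughes phenomenon) and low cost.
    best = max(oa_by_k.values())
    return min(k for k, oa in oa_by_k.items() if oa >= best - tol)

# Hypothetical OA-versus-k curve shaped like Figure 5 (values illustrative):
oa = {1: 0.912, 3: 0.951, 5: 0.968, 10: 0.966, 15: 0.959}
print(choose_k(oa))  # -> 5
```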

Experiments Using Reduced Training Samples
As can be observed in Tables 1 and 2, a large training set with 3909 training samples was used in the University of Pavia test case, and a relatively small training set was employed for the Indian Pines dataset, with 15 or 50 training pixels per class. In order to further validate the classification performance with a small training sample size, an additional experiment was performed using the Pavia University dataset with a reduced number of training samples. In this experiment, 30 training samples per class were randomly selected from the provided 3909 training samples to form the small training sample set. Table 7 reports the classification OA, AA, κ value, and individual class accuracies achieved by the different approaches. The classification maps acquired by our proposed method using the small training sample size are shown in Figure 6. As can be observed in Table 7, F_DB and LBP achieved the best classification performance in terms of OA, with an OA of approximately 96.6%. However, LBP performed better in terms of the AA and κ value and obtained the smallest degradation in OA. The reason might be that the LBP approach can extract detailed local image characteristics, such as corners, edges, and knots; hence, it is more efficient and robust in describing spatial features than the EMAP-based methods, particularly in the small training sample size case. F_DA also demonstrated a competitive performance with the small training sample size. Compared with all the reference methods except LBP and F_DB, F_DA obtained higher classification accuracies. Therefore, it can be concluded that our proposed technique can achieve satisfactory classification results with limited training data.
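The reduced-training-set construction (30 randomly selected training samples per class) can be sketched as follows; the class sizes reused here are the Pavia training counts from the sample table:

```python
import numpy as np

def subsample_per_class(y_train, n_per_class=30, seed=0):
    # Randomly keep n_per_class training samples from each class,
    # mirroring the reduced set drawn from the 3909 Pavia training samples.
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(y_train):
        pool = np.flatnonzero(y_train == c)
        keep.extend(rng.choice(pool, size=n_per_class, replace=False))
    return np.sort(np.array(keep))

# Nine Pavia classes with unequal sizes -> 9 * 30 = 270 retained samples.
labels = np.repeat(np.arange(9), [548, 540, 392, 524, 256, 532, 375, 514, 231])
print(len(subsample_per_class(labels)))  # 270
```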

Discussion
The experimental results demonstrate that feature fusion can further improve classification performance. Compared to the reference methods, which simply fuse the extracted features via vector stacking, the proposed method further fuses the spectral and spatial information through the introduction of data field theory. A relationship between the spectral and spatial features was built, and previously hidden information was explored. It can be concluded from our results that our method fuses the spectral and spatial features in a reasonable and effective way. Furthermore, the proposed method is robust to the choice of FE approach, which is also desirable.
Two standard hyperspectral datasets were employed to measure the efficacy of our proposed method. The two test cases represent two typical types of classification problems. The Pavia University dataset covers an urban area with both high spectral and spatial resolution and is a typical urban classification problem. The Indian Pines dataset, with relatively low spatial resolution, represents agricultural land-cover problems. The experimental results obtained on both datasets demonstrate that our proposed method is generally applicable to different classification problems.
A subject for future investigation is the optimization of data field modeling based on the imaging mechanism. The fusion model used in this paper is a linearly weighted additive model; a more refined and effective model will be studied in future research. Another subject that deserves further research is the adaptive selection of the number of NNs used in the data field modeling.
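The linearly weighted additive fusion mentioned above can be written in one line. The convex-combination form with a single trained weight w is an assumption about the model's exact shape, shown only to make the fusion step concrete:

```python
import numpy as np

def linear_fusion(f_spectral, f_spatial, w):
    # Linearly weighted additive fusion of the spectral- and spatial-domain
    # radiation features; w is the trained weight coefficient (training
    # procedure not shown, and the convex form is an assumption).
    return w * f_spectral + (1.0 - w) * f_spatial
```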

Conclusions
In this study, a feature fusion method based on data field theory was proposed to carry out the supervised classification of HSI. As a mathematical realization of the field theory concept in physics, data field theory was employed to establish data field modeling in HSI. Both the spectral and spatial domains were considered data fields. The fusion weight coefficient was trained based on the data field modeling. Thus, a relationship between the spectral and spatial features was constructed, and the two features were fused into a discriminative FDFRF. The weight coefficient training procedure was a further feature extraction process. The relationship between the spectral and spatial information was explored, and the method was shown to achieve improved classification performance.