Figure 1.
Overall framework of the proposed MSST network for HSI classification. The MSST network comprises a token generator, a transformer encoder, and a classifier. The transformer encoder features token fusion self-attention, cross-covariance attention, and a feedforward network.
Figure 2.
Processing details for the spatial–spectral token generator.
Figure 3.
Structures of the self-attention (a) and token fusion self-attention (b) mechanisms. In self-attention, the attention computation involves initially mapping tokens into Q, K, and V, followed by utilizing them to calculate the attention output. In TFSA, tokens are first fused at multiple scales, followed by mapping them into multi-scale Q, K, and V representations. Using Q, K, and V of corresponding scales, attention outputs are computed for different scales. Finally, these multiple attention outputs are summed.
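The multi-scale computation described in the caption can be sketched as follows. This is an illustrative NumPy mock-up, not the paper's implementation: the average-pooling fusion, the random per-scale projections, and the token/embedding sizes are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Standard scaled dot-product attention: map tokens to Q, K, V,
    # then weight V by softmax(Q K^T / sqrt(d)).
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def token_fusion_self_attention(tokens, scales, rng):
    # Illustrative TFSA-style sketch: fuse neighbouring tokens at several
    # scales (here by average pooling along the token axis), compute
    # attention per scale with its own Q/K/V projections, and sum the
    # per-scale attention outputs.
    n, d = tokens.shape
    out = np.zeros((n, d))
    for s in scales:
        m = n // s * s
        fused = tokens[:m].reshape(-1, s, d).mean(axis=1)   # pool groups of s tokens
        fused = np.repeat(fused, s, axis=0)                 # restore n tokens
        if m < n:
            fused = np.vstack([fused, tokens[m:]])
        Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        out += self_attention(fused, Wq, Wk, Wv)
    return out

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))   # 8 tokens, 16-dim embedding
y = token_fusion_self_attention(tokens, scales=[1, 2, 4], rng=rng)
print(y.shape)  # (8, 16)
```

Scale 1 reduces to ordinary self-attention, so the summed output augments rather than replaces the single-scale result.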
Figure 4.
Illustration of a transformer encoder.
Figure 5.
Trento dataset: (a) false color and (b) ground truth.
Figure 6.
Pavia University dataset: (a) false color and (b) ground truth.
Figure 7.
Houston 2013 dataset: (a) false color and (b) ground truth.
Figure 8.
Classification maps for the Trento dataset. (a) False color, (b) ground truth, (c–l): SVM (OA = 82.12%), RF (OA = 84.01%), 3D-CNN (OA = 93.20%), G2C-3DCNN (OA = 96.92%), HybridSN (OA = 96.35%), SSRN (OA = 96.75%), ViT (OA = 95.27%), SpecFormer (OA = 95.99%), SSFTT (OA = 98.45%), MSST (OA = 98.83%).
Figure 9.
Classification maps for the Pavia University dataset. (a) False color, (b) ground truth, (c–l): SVM (OA = 82.19%), RF (OA = 80.04%), 3D-CNN (OA = 90.86%), G2C-3DCNN (OA = 96.73%), HybridSN (OA = 94.08%), SSRN (OA = 97.14%), ViT (OA = 93.95%), SpecFormer (OA = 94.85%), SSFTT (OA = 97.57%), MSST (OA = 97.85%).
Figure 10.
Classification maps for the Houston 2013 dataset. (a) False color, (b) ground truth, (c–l): SVM (OA = 81.02%), RF (OA = 83.14%), 3D-CNN (OA = 82.03%), G2C-3DCNN (OA = 87.02%), HybridSN (OA = 85.51%), SSRN (OA = 89.59%), ViT (OA = 83.32%), SpecFormer (OA = 84.07%), SSFTT (OA = 89.48%), MSST (OA = 90.29%).
Figure 11.
Effect of patch size and number of tokens on the OA. (a) Trento. (b) Pavia University. (c) Houston 2013.
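The patch size studied in Figure 11 controls the spatial window extracted around each query pixel before tokenization. A generic HSI patching routine looks like the sketch below; the zero padding at image borders and the toy cube dimensions are assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def extract_patch(cube, row, col, patch_size):
    # Extract a patch_size x patch_size spatial window (all bands) centred
    # on the query pixel (row, col), zero-padding at the image borders so
    # every pixel yields a full-size patch.
    r = patch_size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="constant")
    return padded[row:row + patch_size, col:col + patch_size, :]

cube = np.random.default_rng(1).random((10, 12, 30))  # toy HSI: 10x12 pixels, 30 bands
patch = extract_patch(cube, row=0, col=0, patch_size=7)
print(patch.shape)  # (7, 7, 30)
```

Larger patches give each token more spatial context but also admit more pixels from neighbouring classes, which is the trade-off Figure 11 probes.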
Figure 12.
The effect of the number of attention heads on overall accuracy.
Figure 13.
The effect of token fusion patterns on overall accuracy.
Figure 14.
OA of MSST for different numbers of training samples. (a) Trento. (b) Pavia University. (c) Houston 2013.
Figure 15.
Graphical visualizations in 2D of the features extracted by the proposed MSST using t-SNE. (a) Trento. (b) Pavia University. (c) Houston 2013.
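Embeddings like those in Figure 15 can be produced with off-the-shelf t-SNE. The sketch below uses scikit-learn on randomly generated stand-in features; the sample count, feature dimension, and class count are placeholders rather than the actual MSST feature shapes.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for extracted MSST features: 60 samples,
# 32-dim features, 3 classes with shifted means.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 32)) for c in range(3)])
labels = np.repeat(np.arange(3), 20)

# Project to 2-D for visualization; perplexity must be below n_samples.
emb = TSNE(n_components=2, perplexity=10, init="pca",
           random_state=0).fit_transform(feats)
print(emb.shape)  # (60, 2)
```

Scattering `emb` colored by `labels` (e.g., with matplotlib) then gives the kind of cluster plot shown in the figure.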
Table 1.
Details of classes in the Trento dataset and the numbers of samples used for training and testing.
Class No. | Color | Class Name | Test | Train |
---|---|---|---|---|
1 | MidnightBlue | Apple trees | 3994 | 40 |
2 | Blue | Buildings | 2874 | 29 |
3 | LawnGreen | Ground | 474 | 5 |
4 | Yellow | Woods | 9032 | 91 |
5 | Red | Vineyard | 10,396 | 105 |
6 | FireBrick | Roads | 3142 | 32 |
Table 2.
Details of classes in the Pavia University dataset and the numbers of samples used for training and testing.
Class No. | Color | Class Name | Test | Train |
---|---|---|---|---|
1 | Blue | Asphalt | 1249 | 13 |
2 | Green | Meadows | 201 | 3 |
3 | Cyan | Gravel | 607 | 7 |
4 | ForestGreen | Trees | 148 | 2 |
5 | Magenta | Painted metal sheets | 1750 | 18 |
6 | SaddleBrown | Bare Soil | 357 | 4 |
7 | Purple | Bitumen | 4984 | 51 |
8 | Red | Self-Blocking Bricks | 6310 | 64 |
9 | Yellow | Shadows | 394 | 4 |
Table 3.
Details of classes in the Houston 2013 dataset and the numbers of samples used for training and testing.
Class No. | Color | Class Name | Test | Train |
---|---|---|---|---|
1 | SaddleBrown | Healthy Grass | 13,901 | 140 |
2 | Blue | Stressed Grass | 3477 | 35 |
3 | Orange | Synthetic Grass | 21,603 | 218 |
4 | Green | Trees | 161,653 | 1632 |
5 | Orchid | Soil | 6156 | 62 |
6 | SkyBlue | Water | 44,111 | 446 |
7 | MintGreen | Residential | 23,862 | 241 |
8 | CoolGray | Commercial | 4013 | 41 |
9 | Yellow | Road | 10,711 | 108 |
10 | BananaYellow | Highway | 12,270 | 124 |
11 | Magenta | Railway | 10,905 | 110 |
12 | BlueViolet | Parking Lot 1 | 8864 | 90 |
13 | DodgerBlue | Parking Lot 2 | 22,282 | 225 |
14 | Linen | Tennis Court | 7282 | 74 |
15 | Red | Running Track | 4000 | 40 |
Table 4.
Classification performance obtained by different methods for the Pavia University dataset (best results are bolded).
Class | SVM | RF | 3D-CNN | G2C-3DCNN | HybridSN | SSRN | ViT | SpecFormer | SSFTT | Ours |
---|---|---|---|---|---|---|---|---|---|---|
1 | 89.00 | 92.92 | 96.92 | 96.73 | 94.01 | 98.08 | 93.12 | 96.98 | 97.36 | 96.38 |
2 | 94.41 | 97.39 | 99.76 | 99.81 | 99.49 | 99.63 | 99.80 | 99.98 | 99.88 | 99.90 |
3 | 49.47 | 23.00 | 56.59 | 82.44 | 75.31 | 84.46 | 60.73 | 71.90 | 90.18 | 90.42 |
4 | 83.02 | 79.95 | 84.37 | 89.02 | 87.90 | 88.33 | 92.71 | 84.31 | 92.05 | 93.68 |
5 | 98.80 | 99.77 | 99.62 | 99.85 | 100.00 | 98.87 | 99.55 | 99.77 | 99.10 | 99.22 |
6 | 62.88 | 27.70 | 85.46 | 98.57 | 93.63 | 100.00 | 94.92 | 97.33 | 99.84 | 100.00 |
7 | 27.79 | 33.64 | 68.03 | 97.27 | 90.51 | 94.76 | 79.57 | 90.05 | 100.00 | 100.00 |
8 | 68.20 | 80.43 | 85.87 | 91.08 | 82.28 | 97.06 | 88.34 | 88.29 | 92.37 | 95.35 |
9 | 73.32 | 71.78 | 37.99 | 99.68 | 94.34 | 84.20 | 91.14 | 75.88 | 90.50 | 91.21 |
OA (%) | 82.19 ± 0.47 | 80.04 ± 0.65 | 90.86 ± 0.41 | 96.73 ± 0.24 | 94.08 ± 0.40 | 97.14 ± 0.23 | 93.95 ± 0.54 | 94.85 ± 0.10 | 97.57 ± 0.08 | 97.85 ± 0.24 |
AA (%) | 71.88 ± 0.66 | 67.40 ± 0.74 | 79.40 ± 0.26 | 94.94 ± 0.21 | 90.83 ± 0.32 | 93.93 ± 0.18 | 88.88 ± 0.61 | 89.39 ± 0.25 | 95.70 ± 0.20 | 96.24 ± 0.05 |
K × 100 | 75.94 ± 0.42 | 72.50 ± 0.50 | 87.70 ± 0.14 | 95.65 ± 0.15 | 92.11 ± 0.18 | 96.21 ± 0.06 | 91.93 ± 0.51 | 93.12 ± 0.28 | 96.78 ± 0.33 | 97.16 ± 0.37 |
Table 5.
Classification performance obtained by different methods for the Trento dataset (best results are bolded).
Class | SVM | RF | 3D-CNN | G2C-3DCNN | HybridSN | SSRN | ViT | SpecFormer | SSFTT | Ours |
---|---|---|---|---|---|---|---|---|---|---|
1 | 78.14 | 68.70 | 97.65 | 98.87 | 99.70 | 99.17 | 96.52 | 98.97 | 99.67 | 99.65 |
2 | 59.19 | 64.65 | 66.74 | 82.67 | 85.32 | 88.45 | 84.06 | 90.22 | 94.33 | 95.72 |
3 | 31.01 | 36.71 | 49.16 | 83.33 | 81.86 | 46.62 | 36.50 | 22.36 | 57.38 | 81.22 |
4 | 95.26 | 94.39 | 99.09 | 99.92 | 99.03 | 100.00 | 99.75 | 100.00 | 99.98 | 99.98 |
5 | 85.58 | 90.22 | 99.89 | 100.00 | 99.72 | 100.00 | 99.99 | 99.95 | 99.97 | 100.00 |
6 | 66.65 | 77.94 | 79.28 | 90.07 | 85.55 | 88.77 | 84.28 | 83.96 | 97.45 | 96.12 |
OA (%) | 82.12 ± 1.02 | 84.01 ± 0.74 | 93.20 ± 0.37 | 96.92 ± 0.17 | 96.35 ± 0.38 | 96.75 ± 0.10 | 95.27 ± 0.41 | 95.99 ± 0.24 | 98.45 ± 0.08 | 98.83 ± 0.38 |
AA (%) | 69.30 ± 0.84 | 72.10 ± 0.58 | 81.97 ± 0.32 | 92.58 ± 0.34 | 91.86 ± 0.15 | 87.17 ± 0.07 | 83.52 ± 0.55 | 82.58 ± 0.10 | 91.46 ± 0.15 | 95.45 ± 0.20 |
K × 100 | 76.00 ± 0.80 | 78.43 ± 0.66 | 90.86 ± 0.24 | 95.88 ± 0.20 | 95.13 ± 0.26 | 95.66 ± 0.07 | 93.66 ± 0.34 | 94.64 ± 0.14 | 97.93 ± 0.18 | 98.44 ± 0.18 |
Table 6.
Classification performance obtained by different methods for the Houston 2013 dataset (best results are bolded).
Class | SVM | RF | 3D-CNN | G2C-3DCNN | HybridSN | SSRN | ViT | SpecFormer | SSFTT | Ours |
---|---|---|---|---|---|---|---|---|---|---|
1 | 96.40 | 96.32 | 89.01 | 86.62 | 98.01 | 86.03 | 84.49 | 87.87 | 86.47 | 89.12 |
2 | 87.57 | 96.04 | 82.37 | 91.67 | 95.63 | 90.28 | 95.63 | 93.16 | 95.90 | 92.85 |
3 | 99.62 | 99.75 | 97.78 | 97.97 | 94.54 | 97.20 | 95.55 | 94.16 | 95.93 | 95.43 |
4 | 90.57 | 90.25 | 88.49 | 89.05 | 89.69 | 93.76 | 83.21 | 89.49 | 81.14 | 91.13 |
5 | 97.59 | 99.46 | 99.77 | 100.00 | 99.30 | 100.00 | 99.46 | 99.77 | 100.00 | 100.00 |
6 | 45.24 | 61.01 | 83.63 | 83.33 | 84.82 | 86.31 | 83.33 | 61.76 | 86.31 | 86.31 |
7 | 81.66 | 81.66 | 63.72 | 67.28 | 74.81 | 79.95 | 75.63 | 67.45 | 74.06 | 72.35 |
8 | 76.57 | 68.43 | 67.31 | 72.46 | 65.75 | 70.00 | 80.97 | 77.54 | 78.06 | 76.87 |
9 | 68.88 | 79.27 | 64.72 | 77.26 | 77.52 | 88.43 | 79.53 | 76.16 | 80.25 | 79.40 |
10 | 78.23 | 79.50 | 94.90 | 99.08 | 96.95 | 100.00 | 99.72 | 98.98 | 100.00 | 99.57 |
11 | 79.94 | 87.87 | 76.68 | 87.29 | 74.39 | 88.00 | 64.45 | 83.39 | 95.48 | 98.06 |
12 | 69.89 | 68.90 | 90.04 | 94.49 | 75.76 | 91.95 | 94.70 | 93.71 | 98.16 | 98.09 |
13 | 23.80 | 12.78 | 50.32 | 65.50 | 68.05 | 81.63 | 31.79 | 39.54 | 73.00 | 80.35 |
14 | 86.61 | 96.46 | 90.16 | 99.61 | 100.00 | 99.02 | 68.31 | 70.97 | 100.00 | 100.00 |
15 | 96.08 | 97.09 | 99.69 | 99.37 | 100.00 | 99.37 | 84.05 | 84.69 | 100.00 | 100.00 |
OA (%) | 81.02 ± 0.34 | 83.14 ± 0.27 | 82.03 ± 0.21 | 87.02 ± 0.48 | 85.51 ± 0.16 | 89.59 ± 0.41 | 83.32 ± 0.58 | 84.07 ± 0.15 | 89.48 ± 0.10 | 90.29 ± 0.12 |
AA (%) | 78.58 ± 0.62 | 80.99 ± 0.29 | 82.57 ± 0.18 | 87.40 ± 0.30 | 86.35 ± 0.29 | 90.10 ± 0.64 | 81.39 ± 0.60 | 81.24 ± 0.08 | 89.65 ± 0.07 | 90.63 ± 0.08 |
K × 100 | 79.45 ± 0.33 | 81.74 ± 0.53 | 80.56 ± 0.15 | 85.96 ± 0.23 | 84.34 ± 0.08 | 88.75 ± 0.28 | 81.94 ± 0.52 | 82.74 ± 0.22 | 88.62 ± 0.28 | 89.50 ± 0.27 |
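The OA, AA, and K × 100 metrics reported in Tables 4–6 follow the standard definitions and can be computed from a confusion matrix; the sketch below uses a small illustrative matrix, not data from this study.

```python
import numpy as np

def classification_metrics(conf):
    # conf[i, j] = number of test samples of true class i predicted as class j.
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    diag = np.diag(conf)
    oa = diag.sum() / n                                      # overall accuracy
    aa = (diag / conf.sum(axis=1)).mean()                    # average per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                             # Cohen's kappa
    return 100 * oa, 100 * aa, 100 * kappa                   # OA (%), AA (%), K x 100

# Toy 3-class confusion matrix for illustration only.
conf = np.array([[50,  2,  0],
                 [ 3, 45,  2],
                 [ 1,  1, 48]])
oa, aa, k100 = classification_metrics(conf)
print(round(oa, 2), round(aa, 2), round(k100, 2))  # 94.08 94.05 91.11
```

AA penalizes poor accuracy on small classes that OA can mask, and kappa discounts agreement expected by chance, which is why all three are reported side by side.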
Table 7.
Training and test times of the comparison methods and the proposed method on the three datasets.
Methods | Train (s): Trento | Train (s): PU | Train (s): Houston 2013 | Test (s): Trento | Test (s): PU | Test (s): Houston 2013 |
---|---|---|---|---|---|---|
3D-CNN | 127.99 | 174.96 | 103.97 | 2.65 | 3.12 | 1.53 |
G2C-3DCNN | 122.41 | 192.26 | 101.98 | 2.51 | 4.37 | 1.34 |
HybridSN | 124.03 | 163.80 | 95.05 | 2.22 | 3.84 | 1.58 |
SSRN | 165.22 | 200.15 | 122.02 | 3.37 | 4.54 | 2.23 |
ViT | 188.06 | 236.09 | 145.92 | 4.56 | 6.12 | 3.40 |
SpecFormer | 187.24 | 227.10 | 142.25 | 4.01 | 5.71 | 2.90 |
SSFTT | 145.59 | 183.85 | 116.71 | 3.22 | 3.79 | 2.17 |
MSST | 185.72 | 222.32 | 123.47 | 4.12 | 5.44 | 2.94 |
Table 8.
Details of the settings for the five fusion modes of TFSA.
Pattern 1 | Pattern 2 | Pattern 3 | Pattern 4 | Pattern 5 |
---|---|---|---|---|
| | | | |
Table 9.
Ablation study results of the main components on three datasets (best results are bolded).
Case | SSTG | TFSA | CCA | Houston 2013 OA (%) | Trento OA (%) | PU OA (%) |
---|---|---|---|---|---|---|
1 | × | × | × | 83.49 | 95.45 | 93.15 |
2 | √ | × | × | 88.85 | 97.05 | 96.22 |
3 | × | √ | × | 89.65 | 97.49 | 95.57 |
4 | √ | √ | × | 90.01 | 98.68 | 97.80 |
5 | × | √ | √ | 89.46 | 97.62 | 96.85 |
6 | √ | √ | √ | 90.29 | 98.83 | 97.85 |
Table 10.
Ablation study results of the detailed elements of the SSTG on three datasets (best results are bolded).
Case | Main Branch | Query Pixel Branch | Houston 2013 OA (%) | Trento OA (%) | PU OA (%) |
---|---|---|---|---|---|
1 | √ | × | 89.67 | 98.81 | 97.44 |
2 | × | √ | 84.18 | 93.73 | 94.62 |
3 | √ | √ | 90.29 | 98.83 | 97.85 |