Figure 1.
The overall architecture of SAFS. SABlocks are stacked in a sequential architecture, while ‘feature jump concatenation’ is employed to increase feature interaction and avoid possible information loss.
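The sequential stacking with ‘feature jump concatenation’ described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `sa_block` is a hypothetical stand-in (the caption does not specify the SABlock's internals), and here it simply reweights each feature.

```python
import numpy as np

def sa_block(x, rng):
    # Hypothetical stand-in for a SABlock: the caption does not specify its
    # internals, so here it simply reweights each feature column.
    w = rng.random(x.shape[1])
    return x * w

def safs_sequential(x, n_blocks=3, seed=0):
    """Sequential stacking with 'feature jump concatenation': each later
    block receives the previous block's output concatenated with the
    original input, so information in the raw features is not lost."""
    rng = np.random.default_rng(seed)
    out = sa_block(x, rng)
    for _ in range(n_blocks - 1):
        jumped = np.concatenate([out, x], axis=1)  # jump concatenation
        out = sa_block(jumped, rng)
    return out

x = np.ones((4, 6))
y = safs_sequential(x)
print(y.shape)  # (4, 18): the last block saw reweighted + original features
```

The skip concatenation is why the feature dimension grows with depth: each block always sees the unmodified input alongside the processed features.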
Figure 2.
The two key designs in SAFS. The left part shows the basic building block (SABlock); the right part shows the basic structure used to stack two SABlocks.
Figure 3.
The overall architecture of SAFS-Pa. SABlocks are stacked in a parallel architecture: each SABlock votes for feature weights, and the average of the votes is used as the final result.
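The parallel vote-and-average scheme of SAFS-Pa can be sketched as below. This is an assumption-laden illustration: `sa_block_vote` is a hypothetical stand-in for a SABlock's weight output, since the caption only states that blocks vote and the votes are averaged.

```python
import numpy as np

def sa_block_vote(x, rng):
    # Hypothetical stand-in: each SABlock produces one weight per feature.
    return rng.random(x.shape[1])

def safs_pa(x, n_blocks=3, seed=0):
    """Parallel stacking: every SABlock sees the same input and votes on
    feature weights; the average vote is taken as the final importance."""
    rng = np.random.default_rng(seed)
    votes = np.stack([sa_block_vote(x, rng) for _ in range(n_blocks)])
    return votes.mean(axis=0)

x = np.ones((8, 5))
w = safs_pa(x)
print(w.shape)  # one averaged weight per feature: (5,)
```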
Figure 4.
Performance at different TopK levels on (a) SVHN, (b) CIFAR10, (c) GAS, and (d) ISOLET. TopK denotes the important features selected by the respective baselines.
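A TopK level simply keeps the highest-weighted fraction of features before training the downstream classifier. As a hedged sketch (the helper `select_topk` and its signature are illustrative, not the paper's code):

```python
import numpy as np

def select_topk(X, weights, ratio):
    """Keep the highest-weighted fraction of features (hypothetical helper;
    the exact selection code used in the paper is not shown here)."""
    k = max(1, int(round(X.shape[1] * ratio)))
    idx = np.argsort(weights)[::-1][:k]  # indices of the k largest weights
    return X[:, idx], idx

X = np.arange(20, dtype=float).reshape(2, 10)
weights = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5, 0.0])
X_sel, idx = select_topk(X, weights, 0.3)   # TopK level of 30%
print(sorted(idx.tolist()))  # [1, 3, 5]
```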
Figure 5.
Weight distributions from different layers on MNIST (Tabular); the larger the feature weight, the more important the feature.
Figure 6.
Performance under different scenarios: (a–c) Few-shot learning with high-dimensional features (), showing performance at different TopK levels; (d–f) Robustness evaluation with 30% randomly masked data on datasets of varying scales: large (SVHN), medium (ISOLET), and small (DNA).
Figure 7.
Robustness analysis under different perturbation scenarios: (a) Performance with varying feature perturbations using Top-3% features; (b) Performance with 0%, 5%, and 10% label noise using Top-1% features. Average performance across all datasets is reported.
Figure 8.
Feature selection performance in two specialized tasks: (a) impact of changing task complexity on feature selection effectiveness; (b) feature selection applied to transfer learning scenarios. Both plots use the ISOLET dataset, with number of classes on the x-axis and Micro-F1 score on the y-axis.
Figure 9.
Performance comparison of different stacked layer architectures on datasets of varying scales: (a) smaller-scale ISOLET dataset (617 dimensions); (b) larger-scale SVHN dataset (3072 dimensions). For clearer visualization, a slight horizontal offset was applied to the curves.
Figure 10.
Parameter sensitivity analysis: (a) model performance with varying batch sizes (); (b) model performance with varying inertia parameters (). Results are shown for different parameter ratios to demonstrate sensitivity.
Figure 11.
Performance comparison across multiple datasets at different TopK levels: (1st row) SVHN, CIFAR10, MNIST; (2nd row) ISOLET, HAR, GAS; (3rd row) DNA, SATIMAGE, SEGMENT. The plots demonstrate the feature selection effectiveness across different data scales and domains.
Figure 12.
Feature selection performance in the high-dimensional regime (), where the number of features exceeds the number of samples: (1st row) ISOLET, CIFAR10, SVHN; (2nd row) HAR, MNIST, DNA. Results show comparative performance at different TopK levels (D = dimensions).
Figure 13.
Wind power forecasting using regression models with different FS methods. The x-axis is the time step and the y-axis is the wind power (kW). For SAFS, the TOP-1 feature is Torque, while TOP-2 and TOP-3 are Power Factor and Pitch Demand Baseline Degree, respectively.
Table 1.
Notation description.
| Notation | Description |
|---|---|
| n | The number of samples |
| m | The number of features |
| g | Inertia parameter |
| s | Stack parameter |
|  | The i-th sample |
|  | The i-th label |
|  | The bias vector |
|  | The j-th feature of the i-th sample |
| K | The number of selected features |
|  | Distribution of the dataset |
|  | Original data matrix |
|  | The label set corresponding to the data matrix |
|  | Hidden units in a neural network |
|  | The weight vector of features |
|  | The selected features |
|  | The i-th feature |
|  | The batch-wise inputs |
|  | Low-dimensional embeddings |
|  | Trainable weight matrix |
|  | Loss function |
Table 2.
Datasets description.
| Datasets | Features | TopK | Classes | Samples | Domains |
|---|---|---|---|---|---|
| Chiar. | 12,625 | 3% (390) | 4 | 127 | Medical |
| SVHN (Tabular) | 3072 | 3% (92) | 10 | 10,000 | Picture |
| CIFAR10 (Tabular) | 3072 | 3% (92) | 10 | 10,000 | Picture |
| Gravier | 2905 | 3% (90) | 2 | 168 | Medical |
| Alon | 2000 | 3% (60) | 2 | 62 | Medical |
| MNIST (Tabular) | 784 | 3% (24) | 10 | 8000 | Picture |
| ISOLET | 618 | 3% (18) | 26 | 2600 | Speech |
| HAR | 561 | 3% (16) | 6 | 3000 | Physics |
| DNA | 180 | 3% (5) | 3 | 450 | Biology |
| GAS | 128 | 3% (5) | 6 | 6000 | Chemistry |
| SATIMAGE (SAT.) | 37 | 5 | 6 | 600 | Physics |
| SEGMENT (SEG.) | 19 | 5 | 7 | 1400 | Picture |
Table 3.
Average performance (Micro-F1↑) over ten runs with the LightGBM classifier. ‘OT’ means overtime (more than 24 h of computation on EPYC 7552 ×2, 192 cores); ‘−’ means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.
| Algorithm | SEG. | SAT. | GAS | DNA | HAR | ISOLET | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiar. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LASSO | 86.95 | 74.77 | 90.21 | 68.89 | 80.57 | 66.82 | 54.31 | 72.45 | 73.72 | 22.55 | 27.23 | 82.56 |
| RFE | 93.09 | 72.88 | 94.85 | 31.25 | 83.24 | 55.86 | 40.75 | 66.84 | 68.43 | OT | OT | OT |
| RF | 96.50 | 83.44 | 89.37 | 87.48 | 93.78 | 76.73 | 61.34 | 81.05 | 82.15 | 51.96 | 36.50 | 82.82 |
| XGB | 90.52 | 83.88 | 95.37 | 82.74 | 93.35 | 68.38 | 61.78 | 84.73 | 87.25 | 57.24 | 41.43 | 85.64 |
| CCM | 95.12 | 82.56 | 93.63 | 64.59 | 82.77 | 59.12 | 42.71 | 80.05 | 75.68 | 45.52 | 41.34 | − |
| FIR | 91.73 | 73.88 | 93.61 | 43.33 | 76.72 | 63.28 | 41.18 | 80.00 | 68.23 | 47.40 | 41.24 | 73.84 |
| AFS | 95.98 | 82.09 | 96.21 | 72.67 | 91.56 | 75.20 | 62.80 | 80.00 | 80.88 | 55.89 | 38.16 | 86.92 |
| SANs | 94.04 | 79.11 | 94.37 | 61.03 | 88.77 | 68.79 | 36.86 | 84.31 | 71.37 | 45.22 | 39.84 | 71.28 |
| FM | 95.24 | 80.45 | 97.01 | 83.91 | 89.94 | 75.88 | 63.67 | 78.02 | 77.57 | 57.92 | 42.87 | 78.85 |
| STG | 96.33 | 81.77 | 95.57 | 79.55 | 94.06 | 77.94 | 62.44 | 83.15 | 80.78 | 56.30 | 40.61 | 83.68 |
| NeuroFS | 94.34 | 82.54 | 86.11 | 70.37 | 94.95 | 60.59 | 55.61 | 78.59 | 80.26 | 45.90 | 42.42 | 87.91 |
| A-SFS | 96.17 | 80.04 | 94.35 | 80.59 | 92.44 | 69.25 | 64.16 | 75.78 | 72.55 | 42.14 | 38.94 | 74.36 |
| SEFS | 95.59 | 83.42 | 96.94 | 76.13 | − | 68.07 | 61.77 | 78.64 | 75.61 | 53.76 | 39.99 | − |
| SAFS-Pa | 96.42 | 84.91 | 97.14 | 87.64 | 93.86 | 74.00 | 64.71 | 85.38 | 85.58 | 54.40 | 41.66 | 88.23 |
| SAFS | 96.75 | 84.33 | 97.99 | 89.33 | 96.17 | 80.84 | 62.64 | 86.92 | 87.55 | 60.61 | 43.86 | 88.46 |
Table 4.
Average performance (Micro-F1↑) over ten runs with the CatBoost classifier. ‘OT’ means overtime (more than 24 h of computation on EPYC 7552 ×2, 192 cores); ‘−’ means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.
| Algorithm | SEG. | SAT. | GAS | DNA | HAR | ISOLET | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiar. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LASSO | 87.88 | 75.50 | 90.68 | 69.85 | 80.51 | 68.20 | 56.85 | 78.42 | 73.92 | 25.68 | 29.29 | 78.68 |
| RFE | 93.83 | 77.89 | 95.87 | 46.67 | 85.58 | 63.72 | 43.68 | OT | OT | OT | OT | OT |
| RF | 97.42 | 84.38 | 89.50 | 87.11 | 93.81 | 77.29 | 63.22 | 83.15 | 82.54 | 58.01 | 39.27 | 84.47 |
| XGB | 90.69 | 83.72 | 95.40 | 82.44 | 92.67 | 69.23 | 64.25 | 85.79 | 89.21 | 63.38 | 43.88 | 89.73 |
| CCM | 96.12 | 82.67 | 94.05 | 68.51 | 89.83 | 67.01 | 56.10 | 84.21 | 76.07 | 52.04 | 43.97 | − |
| FIR | 94.04 | 82.22 | 92.24 | 40.74 | 76.83 | 65.33 | 38.23 | 83.15 | 74.51 | 55.82 | 43.76 | 69.74 |
| AFS | 95.02 | 78.67 | 96.38 | 77.89 | 91.80 | 75.96 | 65.10 | 83.07 | 80.51 | 58.39 | 42.56 | 85.38 |
| SANs | 93.66 | 79.55 | 94.30 | 59.40 | 89.11 | 70.71 | 30.32 | 85.36 | 76.47 | 54.26 | 41.27 | 72.30 |
| FM | 95.33 | 80.76 | 97.12 | 83.47 | 90.45 | 76.33 | 64.02 | 81.14 | 77.73 | 59.26 | 44.56 | 76.30 |
| STG | 97.09 | 83.77 | 96.44 | 86.22 | 91.20 | 75.35 | 65.11 | 86.88 | 80.78 | 59.75 | 43.22 | 81.57 |
| NeuroFS | 95.28 | 83.03 | 88.49 | 82.05 | 95.22 | 58.47 | 57.64 | 82.87 | 84.26 | 46.62 | 43.58 | 86.67 |
| A-SFS | 95.57 | 83.55 | 94.35 | 80.51 | 87.73 | 76.64 | 65.99 | 76.38 | 74.07 | 56.29 | 42.23 | 76.92 |
| SEFS | 96.08 | 82.79 | 96.67 | 83.33 | − | 77.02 | 65.42 | 80.29 | 77.89 | 54.78 | 43.01 | − |
| SAFS-Pa | 96.96 | 84.50 | 97.12 | 86.22 | 94.43 | 75.44 | 66.63 | 86.15 | 84.61 | 60.74 | 45.29 | 85.76 |
| SAFS | 97.17 | 82.62 | 98.18 | 89.11 | 95.95 | 81.82 | 65.33 | 87.38 | 88.11 | 66.86 | 46.89 | 89.69 |
Table 5.
Relevant feature discovery results for synthetic datasets with 20 features. The best and second-best results are highlighted in bold and underline, respectively.
| Dataset | E1 | E2 | E3 | E4 | E5 | E6 |
|---|---|---|---|---|---|---|
| Metrics (%) | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 | TPR/F1 |
| XGB | 100/66.7 | 100/66.7 | 96.7/64.4 | 85.7/63.2 | 85.7/63.2 | 88.9/64.0 |
| RF | 100/66.7 | 100/66.7 | 100/66.7 | 71.4/58.8 | 74.3/59.6 | 76.5/60.4 |
| AFS | 100/66.7 | 100/66.7 | 96.7/64.4 | 46.7/48.0 | 57.2/53.3 | 46.7/48.0 |
| STG | 100/66.7 | 100/66.7 | 100/66.7 | 71.4/58.8 | 61.9/55.2 | 85.2/62.9 |
| NeuroFS | 100/66.7 | 100/66.7 | 96.7/64.4 | 45.6/47.6 | 65.6/57.2 | 52.5/51.6 |
| SAFS | 100/66.7 | 100/66.7 | 100/66.7 | 98.6/65.6 | 86.7/63.5 | 85.2/62.9 |
Table 6.
Ablation studies. SAFS-s: remove the stack architecture and perform FS with only one SABlock; SAFS-bn: remove the BN layer of the SABlock; SAFS-i: remove the inertia-based weight update strategy; SAFS-c: remove the feature skip connection. The best and second-best results are highlighted in bold and underline, respectively.
| Datasets | SAFS | SAFS-s | SAFS-bn | SAFS-i | SAFS-c |
|---|---|---|---|---|---|
| Chiar. | 88.46 ± 6.70 | 86.92 ± 7.33 | 70.38 ± 14.49 | 87.27 ± 7.52 | 86.00 ± 7.13 |
| SVHN | 60.61 ± 0.74 | 57.03 ± 3.18 | 57.82 ± 3.23 | 59.40 ± 1.06 | 59.47 ± 1.43 |
| ISOLET | 80.84 ± 2.16 | 73.01 ± 3.68 | 65.73 ± 0.66 | 78.20 ± 2.68 | 79.09 ± 2.29 |
| GAS | 97.99 ± 0.39 | 96.83 ± 0.72 | 93.95 ± 2.07 | 97.08 ± 0.81 | 97.65 ± 0.59 |
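The ablations above attribute part of SAFS's performance to the inertia-based weight update. As a rough, hypothetical sketch (the captions give only the name and the inertia parameter g from Table 1; the exact rule in SAFS may differ), such an update blends the running feature weights with each batch's estimate:

```python
import numpy as np

def inertia_update(w_old, w_batch, g):
    # Exponential-moving-average-style update controlled by the inertia
    # parameter g (Table 1); illustrative only, not the paper's exact rule.
    return g * w_old + (1.0 - g) * w_batch

w = np.zeros(4)                      # running feature weights
for _ in range(3):                   # three identical batch estimates
    w = inertia_update(w, np.ones(4), g=0.5)
print(w)  # [0.875 0.875 0.875 0.875]: smoothed approach to the estimate
```

The inertia term damps batch-to-batch noise in the weight estimates, which is consistent with the drop observed for SAFS-i in Table 6.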
Table 7.
Performance comparison with different numbers of stacked layers. The best and second-best results are highlighted in bold and underline, respectively.
| # of Layer | HAR | ISOLET | SVHN | Chiar. |
|---|---|---|---|---|
| 1 | 94.85 ± 0.81 | 73.01 ± 3.68 | 54.68 ± 1.61 | 86.92 ± 7.33 |
| 2 | 95.92 ± 0.99 | 81.94 ± 1.71 | 55.62 ± 1.78 | 88.23 ± 8.03 |
| 3(Ours) | 96.17 ± 0.83 | 80.84 ± 2.16 | 60.61 ± 0.74 | 88.46 ± 6.70 |
| 5 | 96.08 ± 0.83 | 80.34 ± 1.13 | 59.64 ± 0.63 | 90.00 ± 7.73 |
| 10 | 95.71 ± 0.95 | 80.63 ± 2.05 | 59.54 ± 0.79 | 88.84 ± 7.38 |
| 20 | 95.63 ± 0.64 | 77.36 ± 3.03 | 59.69 ± 0.86 | 87.30 ± 8.25 |
Table 8.
Per-iteration computation time (in seconds).
| Dataset | (Sam./Dim.) | FIR | AFS | SANs | STG | SAFS |
|---|---|---|---|---|---|---|
| SVHN | (10,000, 3072) | 0.2128 | 0.0474 | 3.2687 | 0.1534 | 0.0748 |
| MNIST | (8000, 784) | 0.0710 | 0.0233 | 0.8126 | 0.0474 | 0.0316 |
| GAS | (6000, 128) | 0.0121 | 0.0073 | 0.1778 | 0.0111 | 0.0118 |
Table 9.
Average performance (Micro-F1↑) over ten runs with the LightGBM classifier. ‘OT’ means overtime (more than 24 h of computation on EPYC 7552 ×2, 192 cores); ‘−’ means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.
| Algorithm | SEGMENT | SATIMAGE | GAS | DNA | HAR | ISOLET |
|---|---|---|---|---|---|---|
| LASSO | 86.95 | 74.77 | 90.21 | 68.89 | 80.57 | 66.82 |
| RFE | 93.09 | 72.88 | 94.85 | 31.25 | 83.24 | 55.86 |
| RF | 96.50 | 83.44 | 89.37 | 87.48 | 93.78 | 76.73 |
| XGB | 90.52 | 83.88 | 95.37 | 82.74 | 93.35 | 68.38 |
| CCM | 95.12 | 82.56 | 93.63 | 64.59 | 82.77 | 59.12 |
| FIR | 91.73 | 73.88 | 93.61 | 43.33 | 76.72 | 63.28 |
| AFS | 95.98 | 82.09 | 96.21 | 72.67 | 91.56 | 75.20 |
| SANs | 94.04 | 79.11 | 94.37 | 61.03 | 88.77 | 68.79 |
| FM | 95.24 | 80.45 | 97.01 | 83.91 | 89.94 | 75.88 |
| STG | 96.33 | 81.77 | 95.57 | 79.55 | 94.06 | 77.94 |
| NeuroFS | 94.34 | 82.54 | 86.11 | 70.37 | 94.95 | 60.59 |
| A-SFS | 96.17 | 80.04 | 94.35 | 80.59 | 92.44 | 69.25 |
| SEFS | 95.59 | 83.42 | 96.94 | 76.13 | − | 68.07 |
| SAFS-Pa | 96.42 | 84.91 | 97.14 | 87.64 | 93.86 | 74.00 |
| SAFS | 96.75 | 84.33 | 97.99 | 89.33 | 96.17 | 80.84 |

| Algorithm | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiaretti |
|---|---|---|---|---|---|---|
| LASSO | 54.31 | 72.45 | 73.72 | 22.55 | 27.23 | 82.56 |
| RFE | 40.75 | 66.84 | 68.43 | OT | OT | OT |
| RF | 61.34 | 81.05 | 82.15 | 51.96 | 36.50 | 82.82 |
| XGB | 61.78 | 84.73 | 87.25 | 57.24 | 41.43 | 85.64 |
| CCM | 42.71 | 80.05 | 75.68 | 45.52 | 41.34 | − |
| FIR | 41.18 | 80.00 | 68.23 | 47.40 | 41.24 | 73.84 |
| AFS | 62.80 | 80.00 | 80.88 | 55.89 | 38.16 | 86.92 |
| SANs | 36.86 | 84.31 | 71.37 | 45.22 | 39.84 | 71.28 |
| FM | 63.67 | 78.02 | 77.57 | 57.92 | 42.87 | 78.85 |
| STG | 62.44 | 83.15 | 80.78 | 56.30 | 40.61 | 83.68 |
| NeuroFS | 55.61 | 78.59 | 80.26 | 45.90 | 42.42 | 87.91 |
| A-SFS | 64.16 | 75.78 | 72.55 | 42.14 | 38.94 | 74.36 |
| SEFS | 61.77 | 78.64 | 75.61 | 53.76 | 39.99 | − |
| SAFS-Pa | 64.71 | 85.38 | 85.58 | 54.40 | 41.66 | 88.23 |
| SAFS | 62.64 | 86.92 | 87.55 | 60.61 | 43.86 | 88.46 |
Table 10.
Average performance (Micro-F1↑) over ten runs with the CatBoost classifier. ‘OT’ means overtime (more than 24 h of computation on EPYC 7552 ×2, 192 cores); ‘−’ means no result because an internal error occurred. The best and second-best results are highlighted in bold and with underline, respectively.
| Algorithm | SEGMENT | SATIMAGE | GAS | DNA | HAR | ISOLET |
|---|---|---|---|---|---|---|
| LASSO | 87.88 | 75.50 | 90.68 | 69.85 | 80.51 | 68.20 |
| RFE | 93.83 | 77.89 | 95.87 | 46.67 | 85.58 | 63.72 |
| RF | 97.42 | 84.38 | 89.50 | 87.11 | 93.81 | 77.29 |
| XGB | 90.69 | 83.72 | 95.40 | 82.44 | 92.67 | 69.23 |
| CCM | 96.12 | 82.67 | 94.05 | 68.51 | 89.83 | 67.01 |
| FIR | 94.04 | 82.22 | 92.24 | 40.74 | 76.83 | 65.33 |
| AFS | 95.02 | 78.67 | 96.38 | 77.89 | 91.80 | 75.96 |
| SANs | 93.66 | 79.55 | 94.30 | 59.40 | 89.11 | 70.71 |
| STG | 97.09 | 83.77 | 96.44 | 86.22 | 91.20 | 75.35 |
| FM | 95.33 | 80.76 | 97.12 | 83.47 | 90.45 | 76.33 |
| NeuroFS | 95.28 | 83.03 | 88.49 | 82.05 | 95.22 | 58.47 |
| A-SFS | 95.57 | 83.55 | 94.35 | 80.51 | 87.73 | 76.64 |
| SEFS | 96.08 | 82.79 | 96.67 | 83.33 | − | 77.02 |
| SAFS-Pa | 96.96 | 84.50 | 97.12 | 86.22 | 94.43 | 75.44 |
| SAFS | 97.17 | 82.62 | 98.18 | 89.11 | 95.95 | 81.82 |

| Algorithm | MNIST | Alon | Gravier | SVHN | CIFAR10 | Chiaretti |
|---|---|---|---|---|---|---|
| LASSO | 56.85 | 78.42 | 73.92 | 25.68 | 29.29 | 78.68 |
| RFE | 43.68 | OT | OT | OT | OT | OT |
| RF | 63.22 | 83.15 | 82.54 | 58.01 | 39.27 | 84.47 |
| XGB | 64.25 | 85.79 | 89.21 | 63.38 | 43.88 | 89.73 |
| CCM | 56.10 | 84.21 | 76.07 | 52.04 | 43.97 | − |
| FIR | 38.23 | 83.15 | 74.51 | 55.82 | 43.76 | 69.74 |
| AFS | 65.10 | 83.07 | 80.51 | 58.39 | 42.56 | 85.38 |
| SANs | 30.32 | 85.36 | 76.47 | 54.26 | 41.27 | 72.30 |
| STG | 65.11 | 86.88 | 80.78 | 59.75 | 43.22 | 81.57 |
| FM | 64.02 | 81.14 | 77.73 | 59.26 | 44.56 | 76.30 |
| NeuroFS | 57.64 | 82.87 | 84.26 | 46.62 | 43.58 | 86.67 |
| A-SFS | 65.99 | 76.38 | 74.07 | 56.29 | 42.23 | 76.92 |
| SEFS | 65.42 | 80.29 | 77.89 | 54.78 | 43.01 | − |
| SAFS-Pa | 66.63 | 86.15 | 84.61 | 60.74 | 45.29 | 85.76 |
| SAFS | 65.33 | 87.38 | 88.11 | 66.86 | 46.89 | 89.69 |
Table 11.
Ablation studies. SAFS-s: remove the stack architecture and perform FS with only one SABlock; SAFS-bn: remove the BN layer of the SABlock; SAFS-i: remove the inertia-based weight update strategy; SAFS-c: remove the feature skip connection. The best and second-best results are highlighted in bold and underline, respectively.
| Datasets | SAFS | SAFS-s | SAFS-bn | SAFS-i | SAFS-c |
|---|---|---|---|---|---|
| SVHN | 60.61 ± 0.74 | 57.03 ± 3.18 | 57.82 ± 3.23 | 59.40 ± 1.06 | 59.47 ± 1.43 |
| CIFAR10 | 43.86 ± 1.31 | 41.48 ± 1.29 | 41.39 ± 1.76 | 43.61 ± 0.90 | 43.57 ± 1.33 |
| MNIST | 62.64 ± 2.37 | 64.04 ± 2.77 | 59.98 ± 4.12 | 63.65 ± 2.05 | 62.38 ± 1.74 |
| ISOLET | 80.84 ± 2.16 | 73.01 ± 3.68 | 65.73 ± 0.66 | 78.20 ± 2.68 | 79.09 ± 2.29 |
| HAR | 96.17 ± 0.83 | 94.85 ± 0.81 | 82.25 ± 5.84 | 94.25 ± 1.80 | 95.95 ± 0.92 |
| DNA | 89.33 ± 3.41 | 85.67 ± 6.67 | 87.22 ± 3.10 | 88.88 ± 3.44 | 88.11 ± 4.95 |
| GAS | 97.99 ± 0.39 | 96.83 ± 0.72 | 93.95 ± 2.07 | 97.08 ± 0.81 | 97.65 ± 0.59 |
| SATIMAGE | 84.33 ± 3.06 | 84.67 ± 2.42 | 81.33 ± 7.09 | 83.08 ± 3.07 | 82.37 ± 2.93 |
| SEGMENT | 96.75 ± 1.37 | 96.43 ± 0.71 | 96.64 ± 1.10 | 96.04 ± 1.86 | 96.17 ± 1.31 |
Table 12.
Description of the wind power dataset.
| Statistics | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| Power (kW) | 1052.86 | 1083.11 | −42.19 | 33.75 | 588.81 | 2218.21 | 2777.19 |