This section evaluates the performance of PFS integrated with KANs and MLPs on the fundamental computer vision task of image classification. We conduct comprehensive comparisons with fully connected networks and CNNs in terms of accuracy, parameter efficiency, and training time. To thoroughly assess the generalizability of PFS, experiments are performed not only on standard benchmarks such as MNIST and CIFAR-10 but also on more challenging datasets, including Fashion-MNIST and CIFAR-100, ensuring a robust evaluation across datasets of varying difficulty.
3.1. Model Architectures
In this study, we designed multiple architectures to evaluate the feature extraction capabilities of PFS against mainstream CNN-based approaches, as well as its compatibility with KANs and classical MLPs. The tested architectures include the following (a minimal structural sketch of the PFS pairing appears after the list):
PFSKAN: PFS combined with KANs;
PFSMLP: PFS integrated with MLPs;
Standalone KAN and MLP: one-layer implementations for baseline comparison;
ConvMLP: traditional convolutional network with an MLP head;
ConvKAN: convolutional network paired with KANs.
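To make these pairings concrete, the sketch below shows how a PFS front end can feed an MLP head (a KAN head is attached analogously). It assumes PFS reduces each image to a fixed set of selected pixel indices; the PFSFrontEnd name, the placeholder index set, and the layer widths are illustrative, not the exact implementation used in this work.

```python
import torch
import torch.nn as nn

class PFSFrontEnd(nn.Module):
    """Gathers a fixed set of selected pixels from each flattened image.
    `selected_idx` stands in for the PFS-selected pixel index set."""
    def __init__(self, selected_idx: torch.Tensor):
        super().__init__()
        self.register_buffer("selected_idx", selected_idx)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.flatten(start_dim=1)        # (B, C*H*W)
        return x[:, self.selected_idx]    # (B, n_selected)

# Illustrative PFSMLP for 28 x 28 grayscale input:
# PFS front end followed by a small MLP classification head.
selected_idx = torch.arange(0, 784, 4)    # placeholder: every 4th pixel
pfs_mlp = nn.Sequential(
    PFSFrontEnd(selected_idx),
    nn.Linear(selected_idx.numel(), 128),
    nn.ReLU(),
    nn.Linear(128, 10),                   # 10-class output
)
```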
Given that MNIST contains low-resolution (28 × 28) single-channel grayscale images, we employed small and medium models, which were subsequently applied to Fashion-MNIST to maintain experimental consistency. In contrast, the CIFAR datasets, while also low-resolution (32 × 32), consist of three-channel RGB color images. We therefore introduced large models to adequately process the increased visual complexity. Models of varying scales (small, medium, and large) were uniformly applied to both CIFAR-10 and CIFAR-100 to ensure consistent evaluation conditions.
All models in this work maintain lightweight architectures, with parameter counts in the million range. The terms “small”, “medium”, and “large” are used in a relative sense within this scale. The detailed model architectures are illustrated in Figure 10. The hyperparameters of all models were tuned through iterative experiments, referencing prior work [25], though further optimization remains possible.
3.2. Datasets
The four datasets used in this study are all classic image classification benchmarks. Since PFS was derived from KAN training on MNIST and CIFAR-10, it is essential to evaluate its performance not only on these source datasets but also to extend validation to Fashion-MNIST and CIFAR-100 to assess its generalization capability.
Both the MNIST and Fashion-MNIST datasets consist of 70,000 grayscale images each, with a standardized split of 60,000 training samples and 10,000 test samples [31,32]. All images share an identical resolution of 28 × 28 pixels and are categorized into 10 classes. While MNIST contains handwritten digits (0–9), Fashion-MNIST comprises apparel categories, resulting in greater visual complexity and marginally higher classification difficulty compared to MNIST.
The CIFAR-10 dataset consists of 60,000 color images, including 50,000 training images and 10,000 test images, divided into 10 categories [33]. Each image is a 32 × 32 RGB representation, exhibiting greater channel depth and more complex visual content than MNIST.
The CIFAR-100 dataset also contains 60,000 RGB images (50,000 for training and 10,000 for testing) at 32 × 32 resolution, organized hierarchically into 20 superclasses with 5 fine-grained subclasses each [33]. For model consistency (maintaining a 10-class output), we selected the following 10 subclasses: dolphin, sunflower, bottle, orange, table, butterfly, lion, crab, snake, and bicycle, yielding a 6000-image subset (5000 for training and 1000 for testing), denoted CIFAR-100_10. All experiments used the default dataset splits.
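For reference, the subset can be reconstructed along the following lines with torchvision; the helper name and the ToTensor transform are our assumptions, while the ten class names and the resulting sample counts follow the description above.

```python
import torchvision
from torch.utils.data import Subset

CLASSES_10 = ["dolphin", "sunflower", "bottle", "orange", "table",
              "butterfly", "lion", "crab", "snake", "bicycle"]

def cifar100_10(root: str = "./data", train: bool = True) -> Subset:
    """Builds the 10-class CIFAR-100 subset (CIFAR-100_10)."""
    full = torchvision.datasets.CIFAR100(
        root, train=train, download=True,
        transform=torchvision.transforms.ToTensor())
    # Map each kept fine-label to a new index in [0, 9].
    keep = {full.class_to_idx[c]: new for new, c in enumerate(CLASSES_10)}
    idx = [i for i, t in enumerate(full.targets) if t in keep]
    full.targets = [keep.get(t, -1) for t in full.targets]  # remap labels
    return Subset(full, idx)

train_set = cifar100_10(train=True)   # 10 x 500 = 5000 images
test_set = cifar100_10(train=False)   # 10 x 100 = 1000 images
```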
3.3. Experimental Setup
All experiments were conducted on a single NVIDIA RTX 4090 GPU (24 GB) and 24 CPU cores, using Python 3.9 with Pykan and PyTorch 2.2 as the primary frameworks. The computational environment utilized CUDA 11.8 for GPU acceleration, with auxiliary libraries including OpenCV 4.11, NumPy 1.24, Scikit-learn 1.11, Matplotlib 3.6, and Seaborn 0.13 for image processing, data preprocessing, and visualization. Each model was trained independently under identical computational conditions to ensure fairness.
We employed the AdamW optimizer, with hyperparameters initially set according to the default values suggested in [25]. A grid search was then conducted on the validation set to fine-tune the initial learning rate. As shown in Figure 11, five candidate values (1 × 10⁻⁴, 5 × 10⁻⁴, 1 × 10⁻³, 5 × 10⁻³, and 1 × 10⁻²) were evaluated. The results demonstrate that the default setting of 1 × 10⁻³ yields the best accuracy. In addition, the gamma value of an exponential learning rate scheduler was tested using the same grid search strategy over the values 0.75, 0.8, 0.85, and 0.9. As presented in Figure 12, the setting of 0.8 effectively mitigates convergence oscillations, which is consistent with expectations.
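In PyTorch terms, the tuned configuration corresponds roughly to the sketch below; the placeholder model and the epoch count are illustrative, not part of the reported setup.

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # placeholder; any architecture above applies

# AdamW with the grid-searched initial learning rate of 1e-3
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Exponential decay with gamma = 0.8, stepped once per epoch
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)

for epoch in range(30):  # illustrative epoch count
    # ... per-batch forward/backward passes and optimizer.step() go here ...
    scheduler.step()     # lr <- lr * 0.8 at the end of each epoch
```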
In the KAN model, a spline order of three was adopted, which is commonly used and sufficient for fitting smooth curves. The grid size was set to five: as shown in Table 3, validation tests with grid sizes of five, seven, and nine showed no significant accuracy improvement from larger grids, while the parameter count increased substantially, since the number of parameters in a KAN grows linearly with the grid size. Hence, a grid size of five was chosen to balance performance and model complexity.
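For reference, a KAN with these settings can be instantiated through the pykan interface roughly as follows; the layer widths and seed are illustrative placeholders, not the exact configuration used here.

```python
from kan import KAN

# Spline order k = 3 and grid size 5, as selected via Table 3;
# the [input, hidden, output] widths are illustrative only.
model = KAN(width=[196, 64, 10], grid=5, k=3, seed=0)
```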
To further reduce the complexity of the KAN model, pruning techniques were applied. The loss function combined cross-entropy with L1 regularization, as defined in Equation (5).
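A minimal sketch of this objective is given below, assuming the L1 penalty is taken over the model's trainable parameters with a hypothetical weight lam; the exact form and coefficient follow Equation (5).

```python
import torch
import torch.nn.functional as F

def pruning_loss(logits, targets, model, lam=1e-4):
    """Cross-entropy plus L1 regularization, in the spirit of Equation (5).
    lam is a hypothetical regularization weight, not the paper's value."""
    ce = F.cross_entropy(logits, targets)
    l1 = sum(p.abs().sum() for p in model.parameters() if p.requires_grad)
    return ce + lam * l1
```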
3.4. Results
The experiments evaluated the performance of the proposed method and baseline models on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100_10 datasets using multiple metrics: accuracy, precision, recall, and F1 score, as well as parameter count, training time, and evaluation time.
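These classification metrics can be computed with scikit-learn, which is already part of the software stack described in Section 3.3; macro averaging for the multi-class precision, recall, and F1 is our assumption.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def classification_metrics(y_true, y_pred):
    """Returns accuracy, precision, recall, and F1 for multi-class labels."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return acc, prec, rec, f1
```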
For the MNIST and Fashion-MNIST datasets, where small and medium models were employed, the classification results are summarized in Table 4. PFSKAN (Medium) achieved the highest performance across all four metrics on the MNIST dataset, reaching 99.12%. On the Fashion-MNIST dataset, PFSKAN (Medium) obtained 90.10% accuracy, slightly lower than ConvMLP (Medium)’s 90.12%, but outperformed it on the other three metrics (all 90.10% vs. 90.09%). Notably, ConvMLP (Medium) required 2.58 times more parameters than PFSKAN (Medium).
To facilitate a more intuitive comparison of accuracy across models with similar parameter scales, Figure 13 presents the results using a color-coding scheme in which each method is represented by a specific hue, with lighter shades indicating smaller model sizes. The results demonstrate that PFSKAN models (blue) consistently outperformed comparable PFSMLP (cyan), ConvMLP (yellow), and ConvKAN (magenta) models, with the only exception being PFSKAN (Medium) (dark blue), which showed slightly lower performance than ConvMLP (Medium) (dark yellow) on Fashion-MNIST.
The parameter efficiency analysis, shown in Figure 14, reveals that PFSKAN (Medium) not only used fewer parameters but also achieved higher accuracy than a one-layer KAN. The PFSKAN (Small) model required only one-third of the parameters of a one-layer KAN while maintaining comparable accuracy (within 0.1%). Although PFSMLP (Small) showed slightly lower accuracy than a one-layer MLP, it used less than one-sixth of the parameters. Furthermore, PFSKANs consistently demonstrated both smaller parameter sizes and higher accuracy compared to ConvMLPs and ConvKANs of equivalent scale, indicating that PFS can effectively extract features and significantly reduce input dimensionality.
Moreover, PFS appears to be better suited to KAN than to MLP. This may be attributed to two factors: (1) PFS features are selected by KAN, and (2) KAN’s higher nonlinearity makes it better equipped to handle pixel-level features. Even though PFS showed slightly less compatibility with MLP, it still maintained high parameter efficiency. For instance, while ConvMLP (Medium) achieved 1.09% and 2.27% higher accuracy than PFSMLP (Medium) on MNIST and Fashion-MNIST, respectively, it required 4.81 times more parameters. Training and evaluation times for all models are shown in Table 5, where PFSKANs consistently consume less time than ConvKANs and ConvMLPs of comparable model size.
Overall, PFS demonstrated significantly higher parameter efficiency than convolutional methods on grayscale images, benefiting from its superior dimensionality reduction capability. The consistent performance patterns between MNIST and Fashion-MNIST further validate the generalizability of PFS.
Our experimental evaluation on the CIFAR-10 and CIFAR-100_10 datasets incorporated large-scale models while maintaining lightweight architectures. As evidenced in Table 6, the proposed PFSKAN (Large) achieved superior performance, with 78.06% accuracy on CIFAR-10 and 58.92% on CIFAR-100_10, despite the inherent challenges of these more complex color image datasets. The results demonstrate a clear performance hierarchy across model scales. While PFSKANs consistently outperformed the alternatives on CIFAR-10, as shown on the left of Figure 15, only the medium and large PFSKANs maintained this advantage on the more challenging CIFAR-100_10, as shown on the right of Figure 15. This scaling behavior, coupled with the poor performance of the one-layer KAN, underscores the necessity of deeper and larger architectures for effective feature extraction in complex visual domains.
From a parameter efficiency perspective, shown in Figure 16, PFSKAN (Small) achieved accuracy comparable to the one-layer KAN (within 0.8%) while using a mere 27.8% of the parameters, demonstrating remarkable architectural efficiency. Meanwhile, PFSKANs maintained higher accuracy with fewer parameters than ConvKANs and ConvMLPs of comparable model size. PFSMLPs showed slightly lower accuracy than the other models while remaining highly parameter-efficient. This demonstrates that, on moderately complex color image datasets, PFS is more effective than convolutional methods at reducing input dimensionality and, again, shows better compatibility with KAN.
It is worth noting that the advantages of PFSKANs in both accuracy and parameter efficiency become increasingly pronounced with model scale, as PFSKAN (Large) shows a substantially better parameter-accuracy tradeoff than its ConvMLP and ConvKAN counterparts. The framework’s computational efficiency is equally noteworthy: PFSKAN (Large) required 37–49% shorter training times than equivalently sized convolutional baselines, as shown in Table 7, while maintaining competitive evaluation speeds. This aligns with the need for larger and deeper models with higher parameter capacity to achieve improved accuracy on complex color images.
The overall performance degradation on CIFAR-100_10 compared to CIFAR-10 is mainly due to each class containing only one-tenth as many training samples. Nonetheless, the relative advantages of PFS remain consistent across datasets, further supporting its generalizability.
These results collectively validate PFS’s superior dimensionality reduction capability and its particular synergy with KAN architectures, while demonstrating consistent generalization across datasets of varying complexity. The maintained performance advantage on CIFAR-100_10 further confirms the robustness of PFSKAN to data scarcity conditions.
For higher-resolution image datasets such as ImageNet, the key pixels with the highest contribution selected by KAN are still primarily edge points. However, classification accuracy using only these points does not always surpass that of convolutional networks. When the contribution threshold is lowered, the second-level key pixels include some internal object pixels and a small number of external background pixels; the interpretability and feature extraction methods for these pixels remain under investigation. Hence, the proposed PFSKAN method, which relies mainly on edge features, is effective only for low-resolution images. To achieve higher accuracy than more complex convolutional architectures, such as GoogLeNet and ResNet, the hierarchical key pixel extraction strategy must be further refined.