4.1. Ablation Experiment
To assess the contribution of each component in MCFCTransformer-DD, a set of ablation experiments was conducted (see
Table 1). Starting from a plain Transformer baseline, we progressively integrated multiscale convolution (MCTransformer), frequency-domain enhancement (MCFTransformer), dual-branch self-attention (MCFCTransformer), and finally the dendritic discriminator (MCFCTransformer-DD).
The ablation experiments follow an incremental design. Compared with the Transformer baseline, adding multiscale convolutions (MCTransformer) raises accuracy from 84.00% to 87.40%, showing that multiscale receptive fields provide the first substantial gain. Adding frequency-domain enhancement (MCFTransformer) further increases accuracy from 87.40% to 89.30%, confirming that explicit modeling of periodic and harmonic components steadily improves discrimination. Introducing the dual-branch self-attention module (MCFCTransformer) lifts accuracy from 89.30% to 90.50%, indicating that cross-channel information exchange is especially effective for fine-grained distinctions. Finally, adding the dendritic discriminator (MCFCTransformer-DD) raises accuracy from 90.50% to 94.50%; in this step, the F1 score increases from 90.67% to 94.65%, and AUROC increases from 0.99 to 1.00. Overall, the metrics exhibit a monotonic upward trend as modules are stacked, and the dendritic discriminator provides the largest single performance gain (+4.00 percentage points in accuracy).
These results show that MCFCTransformer-DD does not tolerate arbitrary simplification: peak performance depends on all components operating together. Across the progressively stacked configurations, the model consistently delivers strong results, validating the soundness of the overall architecture. Multiscale convolutions, frequency-domain enhancement, dual-branch self-attention, and the dendritic discriminator each contribute significantly to overall effectiveness; omitting any one of them leads to a measurable degradation.
To quantify the stability of the reported gains, we additionally perform repeated runs and summarize performance with 95% confidence intervals (CI), as reported in
Table 2. We also conduct a targeted removal study to isolate the effect of spectral modeling. This complementary evaluation validates that the improvement introduced by frequency-domain enhancement is consistent across repeated trials and is not driven by a single random split or initialization.
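The 95% CI reported in Table 2 can be obtained from the run-level scores as mean ± t·s/√n, with t taken from the Student distribution at n − 1 degrees of freedom. A minimal standard-library sketch; the accuracy values below are placeholders, not the actual runs from Table 2:

```python
from math import sqrt
from statistics import mean, stdev

def ci95(scores, t_crit=2.776):
    """Mean and 95% CI half-width for repeated-run scores.

    t_crit defaults to the two-sided t value for df = 4 (i.e. 5 runs);
    replace it when summarizing a different number of runs.
    """
    m = mean(scores)
    half = t_crit * stdev(scores) / sqrt(len(scores))
    return m, half

# Hypothetical accuracies from 5 repeated runs (placeholders).
runs = [0.944, 0.946, 0.943, 0.947, 0.945]
m, h = ci95(runs)
print(f"accuracy = {m:.4f} +/- {h:.4f}")
```

Reporting the half-width alongside the mean makes it easy to check whether the gain from frequency-domain enhancement exceeds the run-to-run variability.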
4.2. Comparative Experiments
To validate the effectiveness of MCFCTransformer-DD, we select a backpropagation neural network (BP), a genetic algorithm–optimized BP (GA-BP), CNN-LSTM, BiTCN-AT, and MCNN-Transformer as representative comparison models. BP, as a traditional feedforward baseline, is simple and easy to implement, but for high-dimensional time series and fine-grained differences, it is sensitive to initialization and prone to overfitting. GA-BP uses a genetic algorithm to globally optimize weights and thresholds, which can improve fitting and generalization to some extent, but its overall capability remains bounded by the expressive power of the base network. CNN-LSTM strikes a balance between convolutional feature extraction and sequence modeling; however, its gains tend to diminish under strong domain shift, and it places high demands on training stability. BiTCN-AT strengthens dependency modeling through channel attention and bidirectional temporal convolution and has been shown to be effective in bearing-related scenarios, but it is sensitive to hyperparameters and its transferability requires further validation. MCNN-Transformer is a recent deep time series paradigm that combines multi-channel convolutions with a Transformer; it is robust to noise and uncertainty, but cross-domain reuse depends heavily on tuning and calibration.
Model performance is evaluated using three established metrics, namely accuracy, F1 score, and the area under the ROC curve (AUROC). Accuracy quantifies overall predictive correctness; the F1 score measures performance under class imbalance and near-boundary cases; AUROC summarizes the model’s discriminative ability. ROC analysis visualizes the relationship between sensitivity and specificity across thresholds, and the confusion matrix details the class-wise distribution of predictions. Together, these metrics provide a comprehensive assessment of model performance.
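Accuracy and the (macro-averaged) F1 score used above have short closed-form definitions. The following standard-library sketch implements both on toy labels, not the paper's data, to make the computation explicit:

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly matching predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels for illustration only.
y_true = [1, 2, 3, 3, 2, 1]
y_pred = [1, 2, 3, 2, 2, 1]
print(accuracy(y_true, y_pred))   # 5 of 6 correct
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, which is why F1 is sensitive to the class-imbalance and near-boundary cases mentioned above even when overall accuracy is high.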
Based on the results in
Table 3, the models show clear differences in classification performance. The traditional BP neural network performs relatively poorly (accuracy 85.40%, F1 85.50%, AUROC 0.91). GA-BP improves all three metrics (accuracy 87.30%, F1 87.42%, AUROC 0.92), indicating that optimizing initial weights and thresholds can enhance BP’s classification ability. CNN-LSTM further strengthens performance, achieving an accuracy of 90.10%, an F1 score of 90.19%, and an AUROC of 0.98, which highlights the benefit of combining convolutional feature extraction with sequence modeling. MCNN-Transformer and BiTCN-AT are slightly lower than CNN-LSTM in accuracy and F1, but both reach AUROC 0.98, showing strong overall discriminative capacity. In contrast, the proposed MCFCTransformer-DD attains the best results across all metrics (accuracy 94.50%, F1 94.65%, AUROC 1.00), demonstrating clear advantages in feature extraction and spatiotemporal dependency modeling and yielding more accurate and robust classification.
We further benchmark two widely used lightweight baselines for tabular classification, a multilayer perceptron (MLP) and a support vector machine (SVM). This comparison clarifies the performance margin between conventional low-capacity models and the proposed architecture under the same feature setting.
Table 4 reports mean results with 95% CI. While MLP and SVM achieve reasonable performance, they consistently underperform MCFCTransformer-DD across all metrics. This indicates that the proposed design yields stronger representation and decision capacity even when the input dimension is limited.
To verify the correspondence between predicted and true labels across the five classes, we analyze the confusion matrix, as shown in
Figure 4.
In the confusion matrix, the diagonal elements indicate correctly classified samples, while the off-diagonal elements indicate misclassifications.
Figure 4 summarizes the confusion matrices of the six models on the test set. The traditional BP and GA-BP models show strong diagonal responses mainly for classes 1, 2, 4, and 5, but they almost fail to recognize class 3: their errors concentrate in column 5, forming a clear cluster of false positives. With the deep models CNN-LSTM, MCNN-Transformer, and BiTCN-AT, the diagonals strengthen overall, yet class 3 remains the principal weak link, most often being predicted as class 5 and, in a few cases, as class 4. The proposed MCFCTransformer-DD attains high recognition across all five classes, although small confusions between adjacent classes persist. Likely causes include high similarity between adjacent classes in time- and frequency-domain patterns, sparse boundary samples, and slight class imbalance, all of which can locally shift the decision boundary. Overall, the model still exhibits strong feature capture and stable classification performance.
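The matrices in Figure 4 follow the usual convention (rows = true class, columns = predicted class), so a column-5 error cluster means many samples are falsely predicted as class 5. A minimal sketch of the construction, using toy labels rather than the actual test set:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

# Toy example echoing the class-3 -> class-5 confusion pattern.
y_true = [1, 2, 3, 3, 4, 5]
y_pred = [1, 2, 5, 3, 4, 5]
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4, 5])
for row in cm:
    print(row)
```

Per-class recall is the diagonal entry divided by its row sum, which is how the class-3 weakness discussed above is quantified.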
In addition, to better present the model’s overall classification performance and discriminative capability, we evaluate with ROC curves and the quantitative AUC metric, as shown in
Figure 5.
For BP and GA-BP, the ROC curves are highly jagged. The AUCs for class 3 and class 5 are 0.784 and 0.863 for BP, and 0.810 and 0.895 for GA-BP. These patterns indicate dispersed score distributions, less smooth decision boundaries, and rapid saturation at low false positive rates. CNN-LSTM, BiTCN-AT, and MCNN-Transformer show smoothly rising ROC curves in the low false positive region. Their AUCs for class 3 reach 0.957, 0.952, and 0.944, and for class 5 reach 0.972, 0.970, and 0.969. This shows that adding multiscale convolutions and attention significantly strengthens discrimination for easily confused classes.
MCFCTransformer-DD’s five one-vs-rest ROC curves cling tightly to the upper-left corner of the plot, with AUCs of 1.000, 1.000, 0.996, 0.989, and 0.998. These results indicate that the model maintains a high true positive rate across a wide range of thresholds. MCFCTransformer-DD therefore sustains the strongest discriminative power among all models. This advantage arises from progressive architectural optimization, as confirmed by the ablation experiments. On top of the Transformer backbone, multiscale convolution and frequency-domain enhancement capture cross-scale patterns in the workload sequences. The dual-branch self-attention module strengthens the coupling between temporal and spectral cues. The dendritic discriminator further refines the final decisions. The performance gains at each stage accumulate and yield high accuracy together with improved robustness and interpretability.
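The per-class AUCs above are one-vs-rest: each class is treated as positive against all others, and the AUC equals the probability that a random positive sample is scored above a random negative one (the Mann-Whitney relation, with ties counted as one half). A minimal sketch on hypothetical scores, not the model's actual outputs:

```python
def binary_auc(labels, scores):
    """AUC via the Mann-Whitney statistic: P(positive score > negative
    score), counting ties as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ovr_auc(y_true, prob, classes):
    """One-vs-rest AUC per class; prob[i][k] is sample i's score for
    classes[k]."""
    return {c: binary_auc([int(t == c) for t in y_true],
                          [row[k] for row in prob])
            for k, c in enumerate(classes)}

# Toy two-class example (scores are hypothetical softmax outputs).
y_true = [1, 1, 2, 2]
prob = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]]
aucs = ovr_auc(y_true, prob, classes=[1, 2])
print(aucs)
```

Because this statistic depends only on the ranking of scores, it is insensitive to the threshold choice, which is why it complements the threshold-dependent confusion matrix.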
To complement the in-distribution evaluation, we further report performance under distribution shifts and perturbations. These tests provide evidence that the decision boundary remains stable under changes in arrival patterns, workload fluctuations, and noise characteristics. Under these more challenging conditions, AUROC decreases to around 0.95, which is consistent with increased task difficulty and indicates that the high in-distribution AUROC is not dependent on fragile separability.
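One simple form of such a perturbation test is to re-evaluate the trained model after injecting zero-mean Gaussian noise into the input features and compare against the clean score. The sketch below uses a hypothetical stand-in classifier and toy data, not the actual model or workload traces:

```python
import random

def evaluate(model, X, y):
    """Plain accuracy of a callable classifier on (X, y)."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def evaluate_under_noise(model, X, y, sigma, seed=0):
    """Re-evaluate after adding zero-mean Gaussian noise (std = sigma)
    to every feature; a seeded RNG keeps the perturbation reproducible."""
    rng = random.Random(seed)
    X_noisy = [[v + rng.gauss(0.0, sigma) for v in x] for x in X]
    return evaluate(model, X_noisy, y)

# Hypothetical threshold classifier and toy data for illustration.
model = lambda x: int(x[0] > 0.5)
X = [[0.9], [0.1], [0.8], [0.2]]
y = [1, 0, 1, 0]
clean = evaluate(model, X, y)
noisy = evaluate_under_noise(model, X, y, sigma=0.05)
print(clean, noisy)
```

Sweeping sigma (or resampling arrival patterns) and plotting the resulting AUROC gives a robustness curve; a gradual decline toward roughly 0.95, as reported above, indicates a stable rather than fragile decision boundary.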