4.1. Ablation Experiment
To assess the contribution of each component in MCFCTransformer-DD, a set of ablation experiments was conducted (see
Table 1). Starting from a plain Transformer baseline, we progressively integrated multiscale convolution (MCTransformer), frequency-domain enhancement (MCFTransformer), dual-branch self-attention (MCFCTransformer), and finally the dendritic discriminator (MCFCTransformer-DD).
The ablation experiments follow an incremental design. Compared with the Transformer baseline, adding multiscale convolutions (MCTransformer) raises accuracy from 84.00% to 87.40%, showing that multiscale receptive fields provide the first substantial gain. Adding frequency-domain enhancement (MCFTransformer) further increases accuracy from 87.40% to 89.30%, confirming that explicit modeling of periodic and harmonic components steadily improves discrimination. Introducing the dual-branch self-attention module (MCFCTransformer) lifts accuracy from 89.30% to 90.50%, indicating that cross-channel information exchange is especially effective for fine-grained distinctions. Finally, adding the dendritic discriminator (MCFCTransformer-DD) raises accuracy from 90.50% to 94.50%; in this step, the F1 score increases from 90.67% to 94.65%, and AUROC increases from 0.99 to 1.00. Overall, the metrics exhibit a monotonic upward trend as modules are stacked, and the dendritic discriminator provides the largest single performance gain (+4.00 percentage points in accuracy).
These results show that MCFCTransformer-DD does not tolerate arbitrary simplification: peak performance depends on all components operating together. Across the progressively stacked configurations, the model consistently delivers strong results, validating the soundness of the overall architecture. Multiscale convolutions, frequency-domain enhancement, dual-branch self-attention, and the dendritic discriminator each contribute significantly to overall effectiveness; omitting any one of them leads to a measurable degradation.
To quantify the stability of the reported gains, we additionally perform repeated runs and summarize performance with 95% confidence intervals (CI), as reported in
Table 2. We also conduct a targeted removal study to isolate the effect of spectral modeling. This complementary evaluation validates that the improvement introduced by frequency-domain enhancement is consistent across repeated trials and is not driven by a single random split or initialization.
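The 95% CI reported in Table 2 can be obtained from the run-level scores as mean ± t·s/√n, with t taken from the Student distribution at n − 1 degrees of freedom. A minimal standard-library sketch; the accuracy values below are placeholders, not the actual runs from Table 2:

```python
from math import sqrt
from statistics import mean, stdev

def ci95(scores, t_crit=2.776):
    """Mean and 95% CI half-width for repeated-run scores.

    t_crit defaults to the two-sided t value for df = 4 (i.e. 5 runs);
    replace it when summarizing a different number of runs.
    """
    m = mean(scores)
    half = t_crit * stdev(scores) / sqrt(len(scores))
    return m, half

# Hypothetical accuracies from 5 repeated runs (placeholders).
runs = [0.944, 0.946, 0.943, 0.947, 0.945]
m, h = ci95(runs)
print(f"accuracy = {m:.4f} +/- {h:.4f}")
```

Reporting the half-width alongside the mean makes it easy to check whether the gain from frequency-domain enhancement exceeds the run-to-run variability.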
4.2. Comparative Experiments
To validate the effectiveness of MCFCTransformer-DD, we select a backpropagation neural network (BP), a genetic algorithm–optimized BP (GA-BP), CNN-LSTM, BiTCN-AT, and MCNN-Transformer as representative comparison models. BP, as a traditional feedforward baseline, is simple and easy to implement, but for high-dimensional time series and fine-grained differences, it is sensitive to initialization and prone to overfitting. GA-BP uses a genetic algorithm to globally optimize weights and thresholds, which can improve fitting and generalization to some extent, but its overall capability remains bounded by the expressive power of the base network. CNN-LSTM strikes a balance between convolutional feature extraction and sequence modeling; however, its gains tend to diminish under strong domain shift, and it places high demands on training stability. BiTCN-AT strengthens dependency modeling through channel attention and bidirectional temporal convolution and has been shown to be effective in bearing-related scenarios, but it is sensitive to hyperparameters and its transferability requires further validation. MCNN-Transformer is a recent deep time series paradigm that combines multi-channel convolutions with a Transformer; it is robust to noise and uncertainty, but cross-domain reuse depends heavily on tuning and calibration.
Model performance is evaluated using three established metrics, namely accuracy, F1 score, and the area under the ROC curve (AUROC). Accuracy quantifies overall predictive correctness; the F1 score measures performance under class imbalance and near-boundary cases; AUROC summarizes the model’s discriminative ability. ROC analysis visualizes the relationship between sensitivity and specificity across thresholds, and the confusion matrix details the class-wise distribution of predictions. Together, these metrics provide a comprehensive assessment of model performance.
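Accuracy and the (macro-averaged) F1 score used above have short closed-form definitions. The following standard-library sketch implements both on toy labels, not the paper's data, to make the computation explicit:

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly matching predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels for illustration only.
y_true = [1, 2, 3, 3, 2, 1]
y_pred = [1, 2, 3, 2, 2, 1]
print(accuracy(y_true, y_pred))   # 5 of 6 correct
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, which is why F1 is sensitive to the class-imbalance and near-boundary cases mentioned above even when overall accuracy is high.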
Based on the results in
Table 3, the models show clear differences in classification performance. The traditional BP neural network performs relatively poorly (accuracy 85.40%, F1 85.50%, AUROC 0.91). GA-BP improves all three metrics (accuracy 87.30%, F1 87.42%, AUROC 0.92), indicating that optimizing initial weights and thresholds can enhance BP’s classification ability. CNN-LSTM further strengthens performance, achieving an accuracy of 90.10%, an F1 score of 90.19%, and an AUROC of 0.98, which highlights the benefit of combining convolutional feature extraction with sequence modeling. MCNN-Transformer and BiTCN-AT are slightly lower than CNN-LSTM in accuracy and F1, but both reach AUROC 0.98, showing strong overall discriminative capacity. In contrast, the proposed MCFCTransformer-DD attains the best results across all metrics (accuracy 94.50%, F1 94.65%, AUROC 1.00), demonstrating clear advantages in feature extraction and spatiotemporal dependency modeling and yielding more accurate and robust classification.
We further benchmark two widely used lightweight baselines for tabular classification, a multilayer perceptron (MLP) and a support vector machine (SVM). This comparison clarifies the performance margin between conventional low-capacity models and the proposed architecture under the same feature setting.
Table 4 reports mean results with 95% CI. While MLP and SVM achieve reasonable performance, they consistently underperform MCFCTransformer-DD across all metrics. This indicates that the proposed design yields stronger representation and decision capacity even when the input dimension is limited.
To verify the correspondence between predicted and true labels across the five classes, we analyze the confusion matrix, as shown in
Figure 4.
In the confusion matrix, the diagonal elements indicate correctly classified samples, while the off-diagonal elements indicate misclassifications.
Figure 4 summarizes the confusion matrices of the six models on the test set. The traditional BP and GA-BP models show strong diagonal responses mainly for classes 1, 2, 4, and 5, but they almost fail to recognize class 3: their errors concentrate in column 5, forming a clear cluster of false positives. With the deep models CNN-LSTM, MCNN-Transformer, and BiTCN-AT, the diagonals strengthen overall, yet class 3 remains the principal weak link, most often being predicted as class 5 and, in a few cases, as class 4. The proposed MCFCTransformer-DD attains high recognition across all five classes, although small confusions between adjacent classes persist. Likely causes include high similarity between adjacent classes in time- and frequency-domain patterns, sparse boundary samples, and slight class imbalance, all of which can locally shift the decision boundary. Overall, the model still exhibits strong feature capture and stable classification performance.
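The matrices in Figure 4 follow the usual convention (rows = true class, columns = predicted class), so a column-5 error cluster means many samples are falsely predicted as class 5. A minimal sketch of the construction, using toy labels rather than the actual test set:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

# Toy example echoing the class-3 -> class-5 confusion pattern.
y_true = [1, 2, 3, 3, 4, 5]
y_pred = [1, 2, 5, 3, 4, 5]
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4, 5])
for row in cm:
    print(row)
```

Per-class recall is the diagonal entry divided by its row sum, which is how the class-3 weakness discussed above is quantified.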
In addition, to better present the model’s overall classification performance and discriminative capability, we evaluate with ROC curves and the quantitative AUC metric, as shown in
Figure 5.
For BP and GA-BP, the ROC curves are highly jagged. The AUCs for class 3 and class 5 are 0.784 and 0.863 for BP, and 0.810 and 0.895 for GA-BP. These patterns indicate dispersed score distributions, less smooth decision boundaries, and rapid saturation at low false positive rates. CNN-LSTM, BiTCN-AT, and MCNN-Transformer show smoothly rising ROC curves in the low false positive region. Their AUCs for class 3 reach 0.957, 0.952, and 0.944, and for class 5 reach 0.972, 0.970, and 0.969. This shows that adding multiscale convolutions and attention significantly strengthens discrimination for easily confused classes.
MCFCTransformer-DD’s five one-vs-rest ROC curves cling tightly to the upper-left corner of the plot, with AUCs of 1.000, 1.000, 0.996, 0.989, and 0.998. These results indicate that the model maintains a high true positive rate across a wide range of thresholds. MCFCTransformer-DD therefore sustains the strongest discriminative power among all models. This advantage arises from progressive architectural optimization, as confirmed by the ablation experiments. On top of the Transformer backbone, multiscale convolution and frequency-domain enhancement capture cross-scale patterns in the workload sequences. The dual-branch self-attention module strengthens the coupling between temporal and spectral cues. The dendritic discriminator further refines the final decisions. The performance gains at each stage accumulate and yield high accuracy together with improved robustness and interpretability.
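The per-class AUCs above are one-vs-rest: each class is treated as positive against all others, and the AUC equals the probability that a random positive sample is scored above a random negative one (the Mann-Whitney relation, with ties counted as one half). A minimal sketch on hypothetical scores, not the model's actual outputs:

```python
def binary_auc(labels, scores):
    """AUC via the Mann-Whitney statistic: P(positive score > negative
    score), counting ties as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ovr_auc(y_true, prob, classes):
    """One-vs-rest AUC per class; prob[i][k] is sample i's score for
    classes[k]."""
    return {c: binary_auc([int(t == c) for t in y_true],
                          [row[k] for row in prob])
            for k, c in enumerate(classes)}

# Toy two-class example (scores are hypothetical softmax outputs).
y_true = [1, 1, 2, 2]
prob = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]]
aucs = ovr_auc(y_true, prob, classes=[1, 2])
print(aucs)
```

Because this statistic depends only on the ranking of scores, it is insensitive to the threshold choice, which is why it complements the threshold-dependent confusion matrix.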
To complement the in-distribution evaluation, we further report performance under distribution shifts and perturbations. These tests provide evidence that the decision boundary remains stable under changes in arrival patterns, workload fluctuations, and noise characteristics. Under these more challenging conditions, AUROC decreases to around 0.95, which is consistent with increased task difficulty and indicates that the high in-distribution AUROC is not dependent on fragile separability.
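One simple form of such a perturbation test is to re-evaluate the trained model after injecting zero-mean Gaussian noise into the input features and compare against the clean score. The sketch below uses a hypothetical stand-in classifier and toy data, not the actual model or workload traces:

```python
import random

def evaluate(model, X, y):
    """Plain accuracy of a callable classifier on (X, y)."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def evaluate_under_noise(model, X, y, sigma, seed=0):
    """Re-evaluate after adding zero-mean Gaussian noise (std = sigma)
    to every feature; a seeded RNG keeps the perturbation reproducible."""
    rng = random.Random(seed)
    X_noisy = [[v + rng.gauss(0.0, sigma) for v in x] for x in X]
    return evaluate(model, X_noisy, y)

# Hypothetical threshold classifier and toy data for illustration.
model = lambda x: int(x[0] > 0.5)
X = [[0.9], [0.1], [0.8], [0.2]]
y = [1, 0, 1, 0]
clean = evaluate(model, X, y)
noisy = evaluate_under_noise(model, X, y, sigma=0.05)
print(clean, noisy)
```

Sweeping sigma (or resampling arrival patterns) and plotting the resulting AUROC gives a robustness curve; a gradual decline toward roughly 0.95, as reported above, indicates a stable rather than fragile decision boundary.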