ORACIL: Conflict-Graph-Based Order-Robust Analytic Class-Incremental Learning

Wang, Guanjie; Sun, Hongyu; Li, Wanjia; Dong, Yanhua

doi:10.3390/electronics15132941



College of Computer Science, Jilin Normal University, Siping 136000, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(13), 2941; https://doi.org/10.3390/electronics15132941 (registering DOI)

Submission received: 1 June 2026 / Revised: 29 June 2026 / Accepted: 2 July 2026 / Published: 5 July 2026


Abstract

Class-incremental learning allows a model to continuously acquire new classes from sequentially arriving data while preserving its ability to recognize previously learned ones, which is essential for improving adaptability and supporting long-term evolution. However, the class arrival order is inherently random, and highly similar classes may appear consecutively, which intensifies catastrophic forgetting. Although replay-based methods can effectively alleviate this problem, they usually require storing or accessing historical raw samples, which introduces additional data-retention and storage burdens. To address these challenges, this paper proposes ORACIL, an Order-Robust Analytic Class-Incremental Learning framework. First, ORACIL constructs a conflict graph based on class centroids and dynamically partitions newly arriving classes into multiple low-similarity groups, thereby reducing inter-class interference and mitigating forgetting. Second, for each class group, it trains an analytic incremental classification head and performs recursive closed-form updates for the analytic heads using current-stage data and accumulated second-order statistics, without replaying raw historical samples. For group recognition, ORACIL uses feature-derived distance representations rather than raw historical images, making the incremental process raw-sample-free with respect to original image replay. Third, during inference, the group probabilities generated by the group-recognition router are softly fused with the class scores produced by each analytic head, and the class with the highest fused probability is selected as the final prediction. Extensive experiments on CIFAR-100, CUB200, and OmniBenchmark demonstrate the effectiveness of ORACIL. Without replaying historical images, ORACIL achieves final-phase average forgetting rates of 0.16%, 0.77%, and 1.04%, and final-phase accuracies of 95.77%, 93.86%, and 88.12%, respectively. 
In addition, the MOPD and AOPD results show that ORACIL maintains strong robustness under different class arrival orders.

Keywords: class-incremental learning; analytic learning; catastrophic forgetting; order robustness; raw-sample-free learning; replay-free learning

1. Introduction

Class-incremental learning (CIL) is a learning paradigm in which classes arrive progressively and the model continuously learns new classes while retaining knowledge of previously learned ones [1]. In recent years, CIL has played an increasingly important role in fields such as robotic automation systems [2,3], wireless communications [4], and healthcare and diagnostic detection [5,6]. As the demand for model adaptability continues to grow, incremental learning is gradually becoming a key enabler for continual model learning, and its application scope and impact are also steadily expanding.

However, the advantages of CIL come at a cost. When learning new tasks, the model tends to lose its ability to recognize previously learned classes, which is the most critical challenge faced by CIL, namely catastrophic forgetting [7]. At present, the main approaches for addressing catastrophic forgetting include replay-based methods, regularization-based methods, and parameter-isolation methods [8]. In particular, replay-based methods have achieved promising performance in mitigating catastrophic forgetting [9]. However, the strong performance of these methods usually comes at the expense of storing or accessing historical raw samples, which introduces additional storage overhead and data-retention concerns [10]. Although some regularization-based methods do not require access to historical data, since they only impose regularization on the loss function, their accuracy and forgetting performance still cannot compete with replay-based methods.

It should be emphasized that the above solutions largely overlook the order in which classes arrive. Recent studies have shown that, even when the data and training protocol remain unchanged, merely changing the presentation order of classes can substantially affect the final accuracy and forgetting rate. This effect becomes particularly severe when highly similar classes arrive consecutively. In such cases, inter-class interference and bias accumulation are significantly intensified, leading to a clear increase in performance variance across different class orders. In contrast, dispersing highly similar classes across different incremental phases can effectively alleviate this issue and improve stability under different class orders [11,12].

In summary, existing CIL techniques either depend on historical raw samples or are highly sensitive to class arrival order. When similar classes are introduced consecutively, catastrophic forgetting becomes even more severe. To address both raw-image replay dependence and class-order sensitivity, this paper proposes ORACIL. Specifically, ORACIL first constructs a conflict graph based on class centroids and adaptive thresholds, and then dynamically partitions the incremental data at each phase into multiple low-similarity class groups through a dynamic grouping strategy. Next, a group-recognition router is trained to enhance model robustness. This router consists of three base classifiers, namely random forest (RF), k-nearest neighbors (KNN), and LightGBM (LGBM), combined through soft voting. Finally, within each group, analytic closed-form solutions and recursive updates are used to replace backpropagation-based fine-tuning. In this way, the analytic heads can be updated using current-phase data and accumulated feature autocorrelation matrices, without replaying raw historical samples during classifier updating. Meanwhile, the group-recognition router uses feature-derived distance representations rather than raw historical images. As a result, ORACIL avoids replaying raw historical images during analytic-head updating while effectively mitigating catastrophic forgetting.

Therefore, the contributions of this paper can be summarized as follows:

(1): We propose ORACIL, which constructs a conflict graph based on class centroids and adaptive thresholds, and then performs dynamic grouping on the conflict graph to partition newly arriving data into multiple low-similarity class groups. This strategy effectively reduces the impact of class arrival order on catastrophic forgetting. In addition, an analytic incremental classification head is trained within each class group. Recursive closed-form updates are performed for the analytic heads using current-phase data and accumulated second-order statistics, without replaying raw historical samples during analytic-head updating. In this way, the proposed method strikes a balance between robustness to class arrival order and raw-sample-free incremental learning.
(2): We propose a group-recognition router (GRR), which obtains the probability that a sample belongs to each class group by weighted fusion of the outputs from three base classifiers, namely RF, KNN, and LGBM. During inference, these probabilities are further fused with the class scores produced by the analytic head of each group. The class corresponding to the maximum fused probability is taken as the final prediction result. This design preserves the contributions of multiple potential groups and improves the robustness of the model. It should be noted that the GRR is updated in a phase-wise manner using feature-derived distance representations, whereas the analytic heads are updated recursively. Therefore, the current GRR is not designed as a strict sample-by-sample online learner.
(3): Extensive experiments are conducted on CIFAR-100, CUB200, and OmniBenchmark under multiple class orders. The results show that, without replaying raw historical samples, ORACIL achieves high stability and low forgetting rates.

2. Related Work

2.1. Research Progress in Class-Incremental Learning

Class-incremental learning (CIL) aims to enable a model to continuously learn new classes in dynamic environments while retaining its ability to recognize previously learned classes [13]. To alleviate catastrophic forgetting, existing solutions can generally be divided into three categories: replay-based methods, regularization-based methods, and parameter-isolation methods.

Replay-based methods mitigate catastrophic forgetting by storing or generating samples from previously learned classes and jointly training them with data from new classes when learning a new task. Among them, storage-based replay methods [14,15,16] require direct access to historical raw samples, which introduces additional storage overhead and data-retention concerns. In contrast, GAN-based methods [17,18] alleviate catastrophic forgetting by generating samples of previously learned classes and training them jointly with new data; however, their performance depends heavily on the quality of the generated samples.

Regularization-based methods introduce additional constraints into the loss function and preserve knowledge of previously learned classes by restricting large changes in important model parameters or features [19,20,21]. These methods do not access historical raw samples during incremental updating. However, in large-scale incremental scenarios, such parameter constraints gradually become less effective, making it difficult for these methods to achieve accuracy and forgetting performance comparable to those of replay-based methods.

Parameter-isolation methods allocate independent parameter subspaces to different tasks through dynamic expansion or masking mechanisms [22,23,24]. These methods can avoid forgetting to a certain extent. However, the continual growth of the network structure and the additional task prediction module limit their practicality. Although their dependence on historical data is relatively low, model resource consumption and inference overhead increase as the number of task phases grows. Moreover, the strong isolation across tasks restricts knowledge transfer.

2.2. Research Progress in Class-Incremental Learning from the Perspective of Class Order

In addition to the conventional forgetting problem, recent studies have gradually recognized the significant impact of class arrival order on the performance of CIL. In real-world scenarios, the order in which classes arrive is often uncontrollable, and highly similar classes may appear consecutively. Because the model is highly sensitive to class order, this can lead to substantial performance fluctuations.

To alleviate this issue, existing studies have mainly focused on improving order robustness through structural adjustments. For example, parameter decomposition methods [25] can introduce scalable parameters, making them effective for handling continually increasing tasks and enhancing the learning capacity of the network. Low-rank approximation methods [26] can also be used to apply low-rank perturbations to the Hessian matrix, thereby alleviating catastrophic forgetting. Although these methods have achieved success on specific tasks, they focus more on local changes in the network and do not fundamentally resolve this issue.

Recent studies have shown that inter-task similarity significantly affects the generalization ability and resistance to forgetting of incremental learning. Specifically, task order plays a significant role in model performance. When the similarity between tasks is relatively low, the model tends to exhibit stronger generalization ability and better resistance to forgetting [11,12]. This finding provides a new direction for improving order robustness.

2.3. Research Progress in Analytic Learning

Analytic learning, also known as pseudoinverse learning [27], aims to circumvent several inherent limitations of backpropagation (BP), such as gradient vanishing, gradient explosion, and iterative divergence. It emphasizes directly obtaining closed-form solutions through matrix inversion, thereby avoiding the backpropagation procedure required by conventional methods.

In its early form, analytic learning was mainly applied to shallow models. For example, radial basis function networks [28] first perform kernel transformation during training and then directly estimate the network weights by least squares (LS). Owing to its simplicity and efficiency, this strategy has been widely used in shallow models. With the further development of neural network research, multilayer analytic learning [29,30] decomposes nonlinear network training into multiple linear sub-models. This makes the learning of layer-wise weights more efficient and avoids the complicated gradient propagation process across layers in conventional deep learning.

However, the main bottleneck of traditional analytic learning is that it requires all data to be jointly involved in weight learning at one time, which can easily lead to excessive memory consumption. To overcome this limitation, researchers proposed replacing joint solving with recursive solving [31]. This improvement significantly reduces memory requirements while still accurately reproducing the results of joint training. The introduction of recursive solving enables analytic learning to train effectively on sequentially arriving data without increasing the memory burden.

In summary, existing studies have mainly focused on alleviating catastrophic forgetting, improving order robustness, and reducing dependence on raw-sample replay. However, a single method can usually achieve a breakthrough in only one of these aspects. From the perspective of electronic visual recognition systems, class-incremental models are often expected to adapt to newly arriving classes while reducing dependence on raw historical images. Therefore, this paper proposes ORACIL, which reduces order sensitivity through a graph-driven grouping strategy and realizes raw-sample-free incremental learning through analytic updates and feature-derived group recognition, thereby achieving a balance among order robustness, forgetting resistance, and replay-free incremental learning.

3. Method

Overview: As shown in Figure 1, the proposed framework consists of two parts: the training stage and the inference stage.

Training stage: First, features are extracted through the backbone network, and a conflict graph is constructed based on class centroids. Then, the dynamic grouping module partitions classes into low-similarity class groups to alleviate order sensitivity. Next, the group-recognition router estimates the probability that a sample belongs to each class group according to its prototype-distance representation. Finally, the analytic incremental classification head within each group is recursively updated without replaying raw historical samples, enabling replay-free analytic-head updating.
Inference stage: The router first outputs the probability distribution of the sample over all class groups. These probabilities are then fused with the predictions of the analytic head from each group through weighted soft fusion. The category corresponding to the maximum response is taken as the final prediction result.

3.1. Problem Setting

In a $K$-phase class-incremental learning scenario, the model sequentially receives training data at phases $k = 1, 2, \dots, K$, and the classes in different phases are mutually exclusive. The training and testing data at phase $k$ are denoted by $D_{k}^{train} \sim \{X_{k}^{train}, Y_{k}^{train}\}$ and $D_{k}^{test} \sim \{X_{k}^{test}, Y_{k}^{test}\}$, respectively. Here, the sample tensor $X_{k}^{(\cdot)} \in \mathbb{R}^{N_{k} \times w \times h \times c}$, and the training labels satisfy $Y_{k}^{train} \in \mathbb{R}^{N_{k} \times d_{yk}}$, where $d_{yk} = |C_{k}|$ denotes the total number of classes at phase $k$ and $C_{k}$ is the class set of that phase. After the completion of phase $k$, the model is required to perform recognition over the observed class set $C_{1:k}$, where $j$ in Equation (1) is only used as the historical phase index.

$$C_{1:k} = \bigcup_{j=1}^{k} C_{j} \tag{1}$$

Therefore, the dimension of the test labels is $|C_{1:k}|$, namely, $Y_{k}^{test} \in \mathbb{R}^{N_{k}^{*} \times |C_{1:k}|}$.

3.2. Dynamic Grouping Based on the Conflict Graph

As shown in Figure 1, the model first uses a pre-trained backbone network to extract the feature representation of a sample $x$, denoted as $f(x) \in \mathbb{R}^{d}$. During the incremental process, the backbone network remains frozen and serves only as a fixed feature extractor. To improve the effectiveness of the subsequent analytic head, we further perform feature expansion. Specifically, when the input parameter $M > 0$, the input feature is first projected through a random mapping and then transformed by a ReLU nonlinearity to obtain a high-dimensional feature:

$$h(x) = \sigma(f(x) W_{rand}) \in \mathbb{R}^{M} \tag{2}$$

where $\sigma = \mathrm{ReLU}$, and $W_{rand} \in \mathbb{R}^{d \times M}$ is a fixed random mapping matrix. If random mapping is not enabled, then $h(x) = f(x)$. In implementation, to maintain consistency of feature distributions across different tasks, we apply $L_{2}$ normalization to $h(x)$. This provides a unified input space for subsequent analytic learning and the training of the group-recognition router.
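The feature expansion in Equation (2) can be illustrated with a minimal NumPy sketch. The function name `expand_features`, the Gaussian initialization of the random mapping, and the small constant guarding the division are assumptions made for illustration; the text only states that the mapping matrix is fixed and random.

```python
import numpy as np

def expand_features(feats, M=0, seed=0, w_rand=None):
    """Map backbone features f(x) to analytic features h(x), then L2-normalize.

    feats: array of shape (n, d) holding frozen backbone features.
    M:     target dimension; M <= 0 disables the random mapping (h(x) = f(x)).
    """
    feats = np.asarray(feats, dtype=np.float64)
    if M > 0:
        if w_rand is None:
            # Assumption: Gaussian entries drawn once from a fixed seed.
            rng = np.random.default_rng(seed)
            w_rand = rng.standard_normal((feats.shape[1], M))
        h = np.maximum(feats @ w_rand, 0.0)  # ReLU(f(x) W_rand), Equation (2)
    else:
        h = feats
    # Row-wise L2 normalization; the floor avoids division by zero.
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norms, 1e-12)
```

Because the seed is fixed, the same mapping matrix is regenerated at every phase, which is one simple way to keep the analytic space consistent across phases.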

For each new class $c$ at phase $k$, the class centroid $\mu_{c}$ is computed as the mean feature of its samples:

$$\mu_{c} = \frac{1}{|D_{k,c}|} \sum_{x \in D_{k,c}} f(x) \tag{3}$$

where $D_{k,c}$ denotes the set of samples with label $c$ in phase $k$. In this paper, cosine similarity is adopted as the similarity measure. The average cosine similarity threshold for class $c$ is defined as follows:

$$\tau_{c} = \frac{1}{|D_{k,c}|} \sum_{x \in D_{k,c}} \cos(f(x), \mu_{c}) \tag{4}$$

When a new class $c$ at phase $k$ attempts to join an existing group, it can be assigned to that group only if the group capacity has not reached its maximum and the similarity between $c$ and every old class $c'$ within that group does not exceed the threshold. Specifically, if $\cos(\mu_{c}, \mu_{c'}) \leq \min(\tau_{c}, \tau_{c'})$, the two classes are regarded as dissimilar; when this holds for every $c'$ in the group, the new class $c$ is assigned to that group.

For the remaining new classes that cannot be merged into existing groups, each class is treated as a vertex. A threshold-based criterion is then used to determine whether two classes are similar. If they are similar, an edge is established between the corresponding vertices, thereby constructing the conflict graph. A greedy maximum independent set decomposition strategy is applied to the conflict graph to partition it into several independent subsets, each of which corresponds to a newly created group. In this way, a final mapping from classes to groups is obtained. This design ensures larger inter-class differences within each group, leading to more stable incremental learning and improved robustness.
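The conflict-graph construction and greedy decomposition described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it covers only the new classes that could not join an existing group, the function names are hypothetical, and visiting low-degree vertices first is an assumed heuristic that the text does not specify.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two centroid vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def build_conflict_graph(centroids, taus):
    """Add an edge (i, j) when cos(mu_i, mu_j) > min(tau_i, tau_j), i.e. similar classes."""
    n = len(centroids)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(centroids[i], centroids[j]) > min(taus[i], taus[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_independent_groups(adj, max_size=None):
    """Repeatedly peel off a greedy independent set; each set becomes one new group."""
    remaining = set(adj)
    groups = []
    while remaining:
        group = []
        # Assumption: low-degree vertices first, ties broken by class index.
        for v in sorted(remaining, key=lambda u: (len(adj[u] & remaining), u)):
            if max_size is not None and len(group) >= max_size:
                break
            if all(v not in adj[u] for u in group):
                group.append(v)
        groups.append(sorted(group))
        remaining -= set(group)
    return groups

# Toy example: classes 0/1 are near-duplicates, and so are classes 2/3.
centroids = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
taus = [0.9, 0.9, 0.9, 0.9]
adj = build_conflict_graph(centroids, taus)
groups = greedy_independent_groups(adj)  # [[0, 2], [1, 3]]
```

In the toy example, each similar pair is split across two groups, so no group contains two classes joined by a conflict edge.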

3.3. Group-Recognition Router (GRR)

After dynamically grouping newly arriving classes, the model needs to estimate the probability that a sample belongs to each group during inference. To this end, this paper introduces a group-recognition router. Its input is a prototype-distance representation vector constructed based on class prototypes. For a sample $x$, the distance from $x$ to each class prototype is denoted by $d_{j}(x)$, and these distances are arranged according to the class order to form the input vector, as defined in Equation (5):

$$d_{j}(x) = \begin{cases} \|f(x) - \mu_{j}\|_{2}, & j \in C_{1:k} \\ +\infty, & j \notin C_{1:k} \end{cases} \tag{5}$$

Its label is the group ID to which the sample belongs, denoted by $g(x) \in \{1, \dots, |G|\}$. Based on this feature representation, three base classifiers, namely KNN, RF, and LGBM, are trained separately at the end of each incremental phase. Unlike the analytic heads, which are recursively updated through closed-form updates, the current GRR is a phase-wise routing module. When new groups emerge, the router is retrained using the available feature-derived distance representations. Therefore, the GRR should not be interpreted as a sample-by-sample streaming learner. Each classifier outputs a group probability distribution, and these outputs are fused through weighted soft combination, as shown in Equation (6):

$$p(g|x) = \alpha\, p_{KNN}(g|x) + \beta\, p_{RF}(g|x) + \gamma\, p_{LGBM}(g|x) \tag{6}$$

where $\alpha$, $\beta$, and $\gamma$ are non-negative fusion coefficients satisfying $\alpha + \beta + \gamma = 1$. These coefficients are selected using a validation split constructed from the accumulated feature-derived distance representations used for router training. At each incremental phase, the available router training representations are divided into 80% training data and 20% validation data by stratified sampling according to the group labels, with the random seed fixed to 1993. The KNN, RF, and LGBM classifiers are trained on the training split, and their probability outputs on the validation split are used to select the fusion weights. Specifically, the fusion coefficients are selected by grid search under the constraints $\alpha, \beta, \gamma \geq 0$ and $\alpha + \beta + \gamma = 1$. The search step is set to 0.01, and the weight combination that achieves the highest group-recognition accuracy on the validation split is adopted. No test samples are used in this process. This validation-based selection requires maintaining accumulated feature-derived distance representations for router training. The resulting storage and update overheads are analyzed in Section 4.2.8 and Appendix A, Table A1, Table A2 and Table A3.
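The grid search over the fusion coefficients in Equation (6) might look like the sketch below. It assumes the three base classifiers have already produced validation probability matrices of shape (n, number of groups); the function name `select_fusion_weights` and the rule of keeping the first best combination on ties are illustrative assumptions, not details given in the text.

```python
import numpy as np

def select_fusion_weights(p_knn, p_rf, p_lgbm, y_val, step=0.01):
    """Search (alpha, beta, gamma) on the simplex for the best validation accuracy."""
    n_steps = int(round(1.0 / step))
    best_acc, best_w = -1.0, (1.0, 0.0, 0.0)
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            # Integer grid indices keep gamma non-negative and the sum equal to 1.
            alpha, beta = i * step, j * step
            gamma = (n_steps - i - j) * step
            fused = alpha * p_knn + beta * p_rf + gamma * p_lgbm  # Equation (6)
            acc = float(np.mean(fused.argmax(axis=1) == y_val))
            if acc > best_acc:  # assumption: first best combination wins ties
                best_acc, best_w = acc, (alpha, beta, gamma)
    return best_acc, best_w

# Toy validation split with two groups: KNN is right, RF is uninformative, LGBM is wrong.
y_val = np.array([0, 1, 0, 1])
p_knn = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]])
p_rf = np.full((4, 2), 0.5)
p_lgbm = p_knn[:, ::-1].copy()
acc, weights = select_fusion_weights(p_knn, p_rf, p_lgbm, y_val)
```

With a step of 0.01 the simplex contains 5151 weight combinations, so an exhaustive search is inexpensive compared with training the base classifiers.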

It should be emphasized that the router uses feature-derived distance representations rather than raw historical images. Therefore, ORACIL avoids raw-sample replay in the router training process. However, these feature-derived descriptors should not be regarded as private or secure representations. They may still encode information about the training distribution and may be vulnerable to inversion, reconstruction, or membership inference attacks. Thus, ORACIL should be understood as a raw-sample-free class-incremental learning framework rather than a formal privacy-preserving method such as differential privacy. In other words, ORACIL avoids raw-image replay but is not a memory-free framework. The three base classifiers output probability vectors over all groups, and the final fused probability $p(g|x)$ is obtained by weighted summation. During inference, these probabilities are used as soft gating weights and are combined with the outputs of the analytic heads for weighted prediction.

3.4. Incremental Analytic Heads and Weight Update

After completing dynamic grouping and training the group-recognition router, ORACIL independently trains a linear analytic head for each class group $g$ to discriminate between the classes within that group. Let the matrix of samples routed to class group $g$ in the current phase, represented in the analytic space, be denoted by $X_{g} \in \mathbb{R}^{n_{g} \times M}$, where $n_{g}$ is the number of samples used to update group $g$ in the current phase. Its corresponding one-hot labels are denoted by $Y_{g} \in \mathbb{R}^{n_{g} \times |G_{g}|}$, where $|G_{g}|$ represents the number of classes that have appeared in this group, and the column order is consistent with the global class order. The parameters of the analytic head include the weight matrix $W_{g} \in \mathbb{R}^{M \times |G_{g}|}$ and the inverse regularized feature autocorrelation matrix $R_{g} \in \mathbb{R}^{M \times M}$. To obtain its analytic closed-form solution, this paper adopts an $L_{2}$-regularized least-squares objective, as shown in Equation (7):

$$\min_{W_{g}} \|Y_{g} - X_{g} W_{g}\|_{F}^{2} + \gamma \|W_{g}\|_{F}^{2} \tag{7}$$

Its static closed-form solution is

$$W_{g} = (X_{g}^{\top} X_{g} + \gamma I)^{-1} X_{g}^{\top} Y_{g} \tag{8}$$

However, in the class-incremental learning scenario, every arrival of new samples would require recomputing the matrix inverse. This leads to high computational cost and makes it difficult to efficiently accumulate historical information. To avoid repeated matrix inversion, this paper initializes

$$R_{g}^{(0)} = \gamma^{-1} I_{M} \tag{9}$$

as the state matrix, which represents the inverse of the regularized feature autocorrelation matrix of the current analytic head for group $g$. As new samples arrive during the incremental process, $R_{g}$ is recursively maintained to approximately represent the inverse of the accumulated matrix:

$$R_{g} \approx \Bigl(\gamma I + \sum X^{\top} X\Bigr)^{-1} \tag{10}$$

where $\sum X^{\top} X$ denotes the accumulated feature term of all samples up to the current phase. To derive the analytic recursive updates of $R_{g}$ and $W_{g}$, let the input and label of the current phase be $(X_{g}, Y_{g})$, and treat $X_{g}^{\top} X_{g}$ as an incremental correction term to the previously accumulated matrix. According to the Woodbury matrix identity in Equation (11),

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1} \tag{11}$$

the incremental inversion process can be transformed into a recursive update. To avoid numerical instability, a small constant term $\varepsilon I$ is further introduced into the formula for regularization correction, where $\varepsilon \approx 10^{-8}$.

Assume that the input samples at phase $k$ are denoted by $(X_{g}^{(k)}, Y_{g}^{(k)})$. The recursive update process of the state matrix $R_{g}^{(k)}$ and the weight matrix $W_{g}^{(k)}$ is given in Equations (12)–(14).

$$K_{g}^{(k)} = \bigl(I_{n_{g}^{(k)}} + X_{g}^{(k)} R_{g}^{(k)} X_{g}^{(k)\top} + \varepsilon I\bigr)^{-1} \tag{12}$$

$$R_{g}^{(k+1)} = R_{g}^{(k)} - R_{g}^{(k)} X_{g}^{(k)\top} K_{g}^{(k)} X_{g}^{(k)} R_{g}^{(k)} \tag{13}$$

$$W_{g}^{(k+1)} = W_{g}^{(k)} + R_{g}^{(k+1)} X_{g}^{(k)\top} \bigl(Y_{g}^{(k)} - X_{g}^{(k)} W_{g}^{(k)}\bigr) \tag{14}$$

Here, $R_{g}^{(k)}$ and $W_{g}^{(k)}$ denote the state matrix and weight matrix before incorporating the samples of phase $k$, while $R_{g}^{(k+1)}$ and $W_{g}^{(k+1)}$ denote the updated matrices after incorporating the current-phase samples. Therefore, Equation (14) uses the updated state matrix $R_{g}^{(k+1)}$ to correct the residual $Y_{g}^{(k)} - X_{g}^{(k)} W_{g}^{(k)}$, which is consistent with the recursive least-squares update derived from the Woodbury identity.
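As a short check of this consistency, the derivation below sketches how Equations (13) and (14) follow from Equation (11) when $\varepsilon = 0$. The shorthand $A_{k}$ is introduced here only for illustration, and the last three lines assume that $W_{g}^{(k)}$ is already the regularized least-squares solution on the data accumulated before phase $k$, so that $A_{k} W_{g}^{(k)}$ equals the accumulated term $\sum X^{\top} Y$.

```latex
\begin{align*}
A_{k} &:= \bigl(R_{g}^{(k)}\bigr)^{-1}, \qquad
U = X_{g}^{(k)\top}, \quad C = I, \quad V = X_{g}^{(k)} \\
R_{g}^{(k+1)} &= \bigl(A_{k} + X_{g}^{(k)\top} X_{g}^{(k)}\bigr)^{-1} \\
  &= R_{g}^{(k)} - R_{g}^{(k)} X_{g}^{(k)\top}
     \bigl(I + X_{g}^{(k)} R_{g}^{(k)} X_{g}^{(k)\top}\bigr)^{-1}
     X_{g}^{(k)} R_{g}^{(k)} \\
W_{g}^{(k+1)} &= R_{g}^{(k+1)} \bigl(A_{k} W_{g}^{(k)} + X_{g}^{(k)\top} Y_{g}^{(k)}\bigr) \\
  &= R_{g}^{(k+1)} \Bigl(\bigl[(R_{g}^{(k+1)})^{-1} - X_{g}^{(k)\top} X_{g}^{(k)}\bigr] W_{g}^{(k)}
     + X_{g}^{(k)\top} Y_{g}^{(k)}\Bigr) \\
  &= W_{g}^{(k)} + R_{g}^{(k+1)} X_{g}^{(k)\top}
     \bigl(Y_{g}^{(k)} - X_{g}^{(k)} W_{g}^{(k)}\bigr)
\end{align*}
```

The third line matches Equation (13) with $K_{g}^{(k)}$ taken from Equation (12) at $\varepsilon = 0$, and the final line is Equation (14).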

Equation (13) gives the Woodbury-based update of $R_{g}$. Equation (14) performs a one-shot analytic update of $W_{g}^{(k)}$ using the current residual $Y_{g}^{(k)} - X_{g}^{(k)} W_{g}^{(k)}$. When new classes appear in group $g$ at a new phase, it is only necessary to append zero columns to $W_{g}$ along the column dimension and align the columns of $Y_{g}$ according to the class order. After that, Equations (12)–(14) can be directly applied. During the recursive update of each analytic head, only the state matrix $R_{g}$ and the weight matrix $W_{g}$ are retained, and no raw historical samples are replayed for classifier updating. This design avoids replaying raw historical images during analytic-head updating. Nevertheless, the retained analytic-head statistics and feature-derived representations are not intended to provide formal privacy guarantees.
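The recursive update in Equations (9) and (12)–(14) can be sketched in NumPy as follows. The class name `AnalyticHead` and its interface are hypothetical; the sketch assumes the caller supplies one-hot labels whose columns already follow the group's class order, with new classes appended on the right.

```python
import numpy as np

class AnalyticHead:
    """Recursive ridge-regression head that stores only R_g and W_g."""

    def __init__(self, dim, gamma=1.0, eps=1e-8):
        self.R = np.eye(dim) / gamma   # Equation (9): R^(0) = I / gamma
        self.W = np.zeros((dim, 0))    # no classes seen yet
        self.eps = eps

    def update(self, X, Y):
        """X: (n, dim) analytic features; Y: (n, classes seen so far) one-hot labels."""
        # Append zero columns for classes that appear for the first time.
        extra = Y.shape[1] - self.W.shape[1]
        if extra > 0:
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], extra))])
        n = X.shape[0]
        K = np.linalg.inv((1.0 + self.eps) * np.eye(n) + X @ self.R @ X.T)  # Eq. (12)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R                      # Eq. (13)
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)                    # Eq. (14)

# Two phases: classes {0, 1} first, then a new class 2 joins the same group.
rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((20, 8)), rng.standard_normal((15, 8))
Y1 = np.eye(2)[rng.integers(0, 2, 20)]
Y2 = np.eye(3)[np.full(15, 2)]
head = AnalyticHead(dim=8, gamma=1.0)
head.update(X1, Y1)
head.update(X2, Y2)

# Joint (batch) solution of Equation (8) on all data, for comparison.
X_all = np.vstack([X1, X2])
Y_all = np.vstack([np.hstack([Y1, np.zeros((20, 1))]), Y2])
W_batch = np.linalg.solve(X_all.T @ X_all + 1.0 * np.eye(8), X_all.T @ Y_all)
```

Up to the small perturbation introduced by the $\varepsilon I$ term, the recursively updated `head.W` agrees with `W_batch`, even though the second update never accesses the phase-1 samples.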

3.5. Inference

During inference, for a test sample $x$, the probability $p(g|x)$ of belonging to each class group is first obtained by Equation (6). Then, according to Equation (2), the pre-trained feature of sample $x$ is mapped into the analytic space to obtain the analytic feature $h(x)$. For each class group $g$, let the weight matrix of its analytic head be $W_{g}$. The within-group logits vector is then defined as

$$z_{g}(x) = h(x)^{\top} W_{g} \in \mathbb{R}^{|G_{g}|} \tag{15}$$

Next, according to the mapping between within-group classes and global classes, the logits $z_{g}(x)$ produced by each group-specific analytic head are mapped back to the global label space. ORACIL then uses $p(g|x)$ as a soft gating weight and accumulates the weighted logits from all groups to obtain the global logits, as shown in Equation (16).

$$Z(c|x) = \sum_{g : c \in G_{g}} p(g|x) \, [z_{g}(x)]_{\mathrm{local}_{g}(c)} \tag{16}$$

where $\mathrm{local}_{g}(c)$ denotes the local index of class $c$ in group $g$. If $c \notin G_{g}$, the contribution of group $g$ to $Z(c|x)$ is set to 0. The final predicted class is determined by the maximum value in the global logits, as shown in Equation (17):

$$\hat{y}(x) = \arg\max_{c \in C_{1:k}} Z(c|x) \tag{17}$$

This probability-weighted strategy differs from conventional hard routing. When group prediction is uncertain, it preserves the contributions of multiple potential groups, thereby improving the robustness and stability of the model.

4. Experimental Results

4.1. Experimental Setup

4.1.1. Datasets

To systematically evaluate the effectiveness of ORACIL, we conduct experiments on three datasets: CIFAR-100 [32], CUB200 [33] and OmniBenchmark [34]. These datasets are complementary in terms of image resolution, category granularity, and domain diversity. CIFAR-100 is used to evaluate incremental learning stability in low-resolution visual scenarios. CUB200 is used to assess the robustness of ORACIL under fine-grained and visually similar categories. OmniBenchmark is used to examine the generalization ability of the model in more diverse cross-domain settings. To ensure reproducibility, all three datasets are evaluated under a 10-phase class-incremental protocol, including one base phase and nine incremental phases. In each phase, only the training samples of newly introduced classes are available for model updating, while evaluation is performed on all classes observed so far. The details of the three datasets are described below.

CIFAR-100 contains 100 classes and 60,000 low-resolution color images with an original resolution of $32 \times 32$ pixels. Each class contains 600 images, including 500 training images and 100 test images, following the official train/test split. In our incremental setting, CIFAR-100 is divided into 10 phases. The base phase contains 10 classes, and each of the following nine incremental phases introduces 10 new classes. Therefore, after class-order remapping, the class indices are organized into the ranges 0–9, 10–19, …, and 90–99.

CUB200 is a fine-grained bird classification dataset containing 11,788 images from 200 bird species, with 5994 images for training and 5794 images for testing. Each image is provided with category annotation information. Following the same train/test split used in the pre-trained model-based class-incremental learning protocol, CUB200 is divided into 10 phases. The base phase contains 20 classes, and each subsequent incremental phase introduces 20 new classes. Therefore, the full 200-class learning process is completed after 10 phases.

OmniBenchmark contains 300 classes and 1,074,346 images, covering a wide range of visual categories. Compared with CIFAR-100 and CUB200, OmniBenchmark has greater category diversity and more complex cross-domain visual variations, making it suitable for evaluating the generalization ability of incremental learning methods in diverse visual scenarios. Following the same dataset split used in the pre-trained model-based class-incremental learning protocol, OmniBenchmark is also divided into 10 phases. The base phase contains 30 classes, and each of the following nine incremental phases introduces 30 new classes. Thus, all 300 classes are observed after the final phase. The dataset splits and incremental settings used in the experiments are summarized in Table 1.

4.1.2. Compared Baselines

Since ORACIL adopts a pre-trained model for feature extraction, we compare it with representative class-incremental learning methods under the same pre-trained backbone setting to avoid confounding factors introduced by different feature extractors. The baseline results are obtained using two widely used class-incremental learning toolboxes, namely PILOT [35] and PyCIL [36]. Specifically, from the PILOT toolbox, we evaluate L2P [37], CODA-Prompt [38], FOSTER [39], DualPrompt [40], MEMO [24], and iCaRL [41]. From the PyCIL toolbox, we evaluate SimpleCIL [42], WA [43], BiC [44], DER++ [45] and FeTrIL [46]. In addition, GDDSG [47] is included as a recent graph-based comparison method. These methods cover different class-incremental learning paradigms, including prompt-based tuning, replay-based learning, bias correction, exemplar-free learning, and graph-based modeling. For fairness, all baseline methods and ORACIL use the same ViT-B/16 pre-trained model, the same input resolution, and the same data processing pipeline. Hyperparameters related to memory and other method-specific settings follow the default configurations released in the public GitHub repositories of the corresponding methods. Here, GitHub is used only as the web-based repository hosting platform for accessing the released source code and configurations, rather than as software used in our experiments. No additional external data are introduced.

4.1.3. Training Details and Metrics

The code of this paper is implemented in PyTorch 1.8.0 under Ubuntu 20.04 LTS, and all experiments are conducted on an NVIDIA Tesla V100S 32 GB GPU (NVIDIA Corporation, Santa Clara, CA, USA). Unless otherwise specified, the class order is generated by applying a NumPy random permutation to the natural class index list of each dataset with seed 1993. The generated class order is fixed for the whole incremental process and shared by all compared methods. Other random states related to model initialization and training are controlled by the implementation.
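As a minimal sketch of the class-order protocol described above, the snippet below permutes the natural class indices with seed 1993 and cuts the result into equal phases. The helper name `make_phases` is hypothetical, and seeding through `np.random.RandomState` is an assumption; the released code may seed NumPy in a different way.

```python
import numpy as np

def make_phases(num_classes, num_phases, seed=1993):
    """Permute class indices with a fixed seed and split them into equal phases."""
    rng = np.random.RandomState(seed)          # assumption: legacy seeded generator
    order = rng.permutation(num_classes)       # fixed class order shared by all methods
    per_phase = num_classes // num_phases      # e.g., 100 classes / 10 phases = 10
    return [order[i * per_phase:(i + 1) * per_phase].tolist()
            for i in range(num_phases)]

# Phase layouts for the three datasets (one base phase plus nine incremental phases).
cifar_phases = make_phases(100, 10)   # 10 classes per phase
cub_phases = make_phases(200, 10)     # 20 classes per phase
omni_phases = make_phases(300, 10)    # 30 classes per phase
```

Because the generator is re-created from the same seed on every call, all compared methods would see the same class order for a given dataset.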

The final selected fusion weights of the group-recognition router are reported in Table 2. Since the GRR is updated phase by phase, the reported values correspond to the final incremental phase of each dataset. The same validation-based selection procedure is applied at all previous phases.
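One way such a validation-based selection could look is sketched below. It assumes that the router combines the probability outputs of several base classifiers (the paper mentions KNN, RF, and LGBM) through convex weights, and that the weights are chosen by a coarse grid search on validation accuracy. The function name `select_fusion_weights`, the grid step, and the synthetic probabilities are all illustrative assumptions, not the released procedure.

```python
import itertools
import numpy as np

def select_fusion_weights(probs, labels, steps=10):
    """Grid-search convex weights over router outputs by validation accuracy.

    probs:  list of (n_samples, n_groups) probability arrays, one per router.
    labels: (n_samples,) ground-truth group indices of the validation samples.
    """
    best_w, best_acc = None, -1.0
    for combo in itertools.product(range(steps + 1), repeat=len(probs)):
        if sum(combo) != steps:                 # keep only weights that sum to 1
            continue
        w = np.array(combo) / steps
        fused = sum(wi * p for wi, p in zip(w, probs))
        acc = float(np.mean(fused.argmax(axis=1) == labels))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy validation set: 3 routers, 200 samples, 4 groups (synthetic data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=200)
probs = []
for noise in (0.5, 1.0, 2.0):                   # routers of decreasing quality
    logits = 2.0 * np.eye(4)[labels] + noise * rng.normal(size=(200, 4))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs.append(e / e.sum(axis=1, keepdims=True))

weights, val_acc = select_fusion_weights(probs, labels)
```

Since the grid contains the corner points where a single router receives weight 1, the selected combination is never worse on the validation set than the best individual router.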

The pre-trained model uses the ImageNet-21K pre-trained weights of ViT-B/16. In this paper, we use the term “raw-sample-free” to indicate that a method does not store or replay raw historical images during incremental updating. This term should not be interpreted as a formal privacy guarantee. ORACIL still retains model parameters, class prototypes, analytic-head statistics, router parameters, and feature-derived distance representations for group recognition.

In this paper, the evaluation metrics include the average accuracy $A_{N}$ and the average forgetting rate $F_{N}$ at each phase during the incremental process. Assume that there are $N$ incremental phases in total. After completing phase $t$, let the test accuracy on the classes of phase $k$ be denoted by $a_{t,k}$ ($1 \leq k \leq t \leq N$). Then, the average accuracy and average forgetting rate at phase $N$ are defined as follows:

$$A_{N} = \frac{1}{N} \sum_{k=1}^{N} a_{N,k} \tag{18}$$

$$F_{N} = \frac{1}{N-1} \sum_{k=1}^{N-1} \left( \max_{t \in \{k, \dots, N\}} a_{t,k} - a_{N,k} \right) \tag{19}$$

$F_{N}$ represents the average gap between the best historical performance and the final performance over all previous phases. A smaller $F_{N}$ indicates less forgetting and stronger stability.
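The two definitions can be checked on a small accuracy matrix. The sketch below assumes the accuracies are stored as a lower-triangular array `acc[t, k]` (0-indexed); the function names and the toy numbers are illustrative and are not results from the paper.

```python
import numpy as np

def average_accuracy(acc):
    """Eq. (18): mean accuracy over all phases' classes after the final phase."""
    acc = np.asarray(acc, dtype=float)
    return float(acc[-1].mean())

def average_forgetting(acc):
    """Eq. (19): mean gap between the best and the final accuracy of phases 1..N-1."""
    acc = np.asarray(acc, dtype=float)
    n = acc.shape[0]
    gaps = []
    for k in range(n - 1):                      # phases k = 1..N-1 (0-indexed here)
        best = acc[k:, k].max()                 # max over t in {k, ..., N}
        gaps.append(best - acc[-1, k])
    return float(np.mean(gaps))

# acc[t, k] is the accuracy on phase-k classes after training phase t;
# entries with k > t are undefined and stored as NaN.
nan = float("nan")
acc = [[90.0, nan, nan],
       [85.0, 80.0, nan],
       [80.0, 78.0, 70.0]]
A_N = average_accuracy(acc)     # (80 + 78 + 70) / 3
F_N = average_forgetting(acc)   # ((90 - 80) + (80 - 78)) / 2
```

In this toy case, the first phase loses 10 points and the second loses 2 points, so the average forgetting is 6 points while the final average accuracy is 76%.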

To measure the impact of class arrival order on model performance, this paper further introduces two metrics, namely AOPD and MOPD. Assume that there are $R$ different class arrival orders, indexed by $r = 1, \dots, R$. First, under order $r$, the phase-average accuracy after completing phase $t$ is defined as

$$A_{t}^{(r)} = \frac{1}{t} \sum_{k=1}^{t} a_{t,k}^{(r)} \tag{20}$$

The order performance disparity at phase $t$, denoted by OPD, is defined as

$$\mathrm{OPD}_{t} = \max_{1 \leq r \leq R} A_{t}^{(r)} - \min_{1 \leq r \leq R} A_{t}^{(r)} \tag{21}$$

The average order performance disparity over the whole incremental process, denoted by AOPD, is defined as

$$\mathrm{AOPD} = \frac{1}{N} \sum_{t=1}^{N} \mathrm{OPD}_{t} \tag{22}$$

The maximum order performance disparity over all phases, denoted by MOPD, is defined as

$$\mathrm{MOPD} = \max_{1 \leq t \leq N} \mathrm{OPD}_{t} \tag{23}$$
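As a small worked example of Equations (21) to (23), the sketch below computes the three order-disparity quantities from a table of phase-average accuracies. The function name `order_disparity` and the toy numbers are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def order_disparity(phase_acc):
    """phase_acc[r, t] holds A_t^{(r)}: phase-average accuracy of order r after phase t."""
    phase_acc = np.asarray(phase_acc, dtype=float)
    opd = phase_acc.max(axis=0) - phase_acc.min(axis=0)   # Eq. (21), one value per phase
    return opd, float(opd.mean()), float(opd.max())        # OPD_t, AOPD (22), MOPD (23)

# Two hypothetical class orders over three phases.
phase_acc = [[90.0, 85.0, 80.0],
             [92.0, 81.0, 79.0]]
opd, aopd, mopd = order_disparity(phase_acc)
```

Here the per-phase disparities are 2, 4, and 1 points, so AOPD is 7/3 and MOPD is 4. A method can therefore have a low AOPD and still show a high MOPD when its order sensitivity is concentrated in a single phase.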

4.2. Performance Evaluation

4.2.1. Results of ORACIL

Figure 2 presents the phase-wise accuracy comparison between ORACIL and representative class-incremental learning methods on CIFAR-100, CUB200, and OmniBenchmark. The compared methods cover both raw-sample-free approaches that avoid storing or replaying historical images, such as L2P, CODA-Prompt, DualPrompt, SimpleCIL, FeTrIL, and GDDSG, and replay-based approaches that rely on exemplar memory or replay buffers, including FOSTER, MEMO, iCaRL, WA, BiC, and DER++. The symbols √ and × indicate whether each method avoids raw historical image replay.

As the number of incremental phases increases, most baseline methods show a continuous decline in accuracy, reflecting the accumulation of catastrophic forgetting when new classes are introduced sequentially. This degradation is particularly evident for DER++, SimpleCIL, DualPrompt, L2P, and CODA-Prompt on CUB200 and OmniBenchmark. Although replay-based methods such as iCaRL, WA, and BiC alleviate forgetting to a certain extent, their accuracy still gradually decreases in later phases. In contrast, ORACIL maintains a more stable accuracy trend across the incremental process and achieves the highest final-phase accuracy on all three datasets without relying on raw historical image replay.

On CIFAR-100, ORACIL starts with relatively lower accuracy in the first two phases, but its performance rapidly improves and remains stable above 95% throughout the later phases. This indicates that ORACIL can effectively adapt to the incremental class order and preserve discriminative ability over long-term updates. On CUB200, a fine-grained recognition dataset with highly similar categories, ORACIL gradually surpasses the competing methods and maintains nearly 94% accuracy at the final phase, showing its advantage in mitigating interference among visually similar classes. On OmniBenchmark, where the data distribution is more diverse and cross-domain class variations are larger, ORACIL also exhibits the most stable trend and achieves the best final-phase performance among all compared methods.

These results demonstrate that ORACIL provides strong long-term incremental learning ability in challenging class-incremental scenarios. The conflict-graph-based dynamic grouping strategy helps reduce interference caused by the consecutive arrival of similar or conflicting classes, while the analytic closed-form update mechanism enables efficient model adaptation without iterative replay-based retraining. As a result, ORACIL achieves a favorable balance between stability, resistance to forgetting, and classification performance across different incremental learning scenarios.

Figure 3 further compares ORACIL with representative raw-sample-free class-incremental learning methods, including L2P, CODA-Prompt, DualPrompt, GDDSG, SimpleCIL, and FeTrIL. These methods avoid storing or replaying raw historical images, making them directly comparable under a raw-sample-free class-incremental learning setting. Overall, most baseline methods exhibit a continuous decline in accuracy as the number of incremental phases increases, indicating that long-term forgetting remains a challenging issue even without considering replay-based methods. In contrast, ORACIL maintains a much more stable trend across the incremental process and achieves the best final-phase performance on all three datasets.

On CIFAR-100, several baselines start with higher initial accuracy, especially CODA-Prompt and FeTrIL. However, their performance gradually decreases as more classes are introduced. ORACIL starts with relatively lower accuracy in the first two phases, but it quickly improves and remains stable above 95% in the later phases, achieving the highest accuracy in the final phase. On CUB200, the advantage of ORACIL becomes more evident as the incremental process proceeds. Although methods such as SimpleCIL, DualPrompt, and FeTrIL perform competitively in the early phases, they show clear performance degradation in later phases, while ORACIL maintains around 94% accuracy and achieves the best final-phase result. On OmniBenchmark, where the class distribution is more diverse and cross-domain variations are larger, most raw-sample-free baselines suffer from substantial accuracy drops. In contrast, ORACIL keeps a relatively stable trajectory and obtains the highest final-phase accuracy among all compared raw-sample-free methods.

These results demonstrate that ORACIL has stronger long-term stability and resistance to forgetting than existing raw-sample-free baselines. The conflict-graph-based dynamic grouping strategy helps reduce interference among newly introduced and previously learned classes, while the group-recognition router and analytic incremental updating strategy enable effective adaptation during continuous class expansion. Consequently, ORACIL maintains stable performance across different datasets without storing or replaying raw historical images.

Figure 4 presents the phase-wise accuracy comparison among replay-based class-incremental learning methods, including FOSTER, MEMO, iCaRL, WA, BiC, and DER++, on CIFAR-100, CUB200, and OmniBenchmark. These methods rely on exemplar memory or replay buffers to preserve historical information during incremental learning. Compared with the raw-sample-free methods shown in Figure 3, replay-based methods can explicitly reuse historical samples or stored experience to mitigate catastrophic forgetting. However, their performance still varies significantly across datasets and incremental phases.

On CIFAR-100, most replay-based methods start from high initial accuracy and gradually degrade as the incremental process proceeds. FOSTER maintains the best final-phase accuracy among the replay-based baselines, while iCaRL, WA, and BiC show similar decreasing trends. MEMO exhibits a moderate decline, and DER++ suffers from the most pronounced performance drop in the later phases. On CUB200, iCaRL and MEMO show relatively strong long-term performance, whereas FOSTER and DER++ degrade more rapidly as new fine-grained classes are introduced. This suggests that exemplar replay alone may be insufficient when inter-class visual similarity is high. On OmniBenchmark, which contains more diverse visual domains, FOSTER achieves the most stable performance among the replay-based methods, while DER++ again experiences a substantial accuracy decrease across phases.

These results provide a complementary reference for evaluating ORACIL. Although replay-based methods can access historical samples, their accuracy still declines under long-term incremental learning, especially on fine-grained or diverse datasets. In contrast, the results in Figure 2 and Figure 3 show that ORACIL achieves competitive or superior final-phase performance without replaying raw historical images. This indicates that the proposed conflict-graph-based grouping, group-recognition routing, and analytic incremental updating mechanisms can effectively improve long-term stability and resistance to forgetting without relying on exemplar memory.

Table 3, Table 4 and Table 5 report the phase-wise accuracy and final average forgetting (AF) of different class-incremental learning methods on CIFAR-100, CUB200, and OmniBenchmark, respectively. The column “No Raw Replay” indicates whether a method avoids storing or replaying raw historical images during incremental learning. A check mark denotes raw-sample-free methods, while a cross mark denotes methods that rely on raw historical images, exemplars, or replay buffers. AF measures the final average forgetting rate, where a lower value indicates better retention of previously learned knowledge.

Across the three datasets, ORACIL consistently achieves the best final-phase accuracy and the lowest AF. On CIFAR-100, ORACIL reaches 95.77% accuracy at the final phase with an AF of only 0.16%, outperforming both raw-sample-free methods and replay-based baselines. Although several methods, such as CODA-Prompt, iCaRL, WA, BiC, and DER++, obtain higher accuracy in the initial phase, their performance gradually declines as more classes are introduced. In contrast, ORACIL improves after the early phases and remains highly stable throughout the later incremental stages.

On CUB200, which contains fine-grained categories with high visual similarity, most methods suffer from more evident performance degradation. Prompt-based methods such as L2P, CODA-Prompt, and DualPrompt drop substantially from the initial to the final phase, while replay-based methods such as iCaRL, WA, BiC, and MEMO also show noticeable forgetting. ORACIL achieves the highest final accuracy of 93.86% and the lowest AF of 0.77%, indicating that the proposed conflict-aware grouping strategy is effective in reducing interference among visually similar classes.

On OmniBenchmark, the performance gap becomes more pronounced due to the more diverse and cross-domain data distribution. Many baselines experience significant accuracy degradation, especially DER++, L2P, CODA-Prompt, DualPrompt, and several replay-based methods. ORACIL maintains a stable accuracy trend and achieves the best final accuracy of 88.12% with the lowest AF of 1.04%. Compared with GDDSG, which also avoids raw replay and shows relatively low forgetting, ORACIL achieves both lower AF and higher final-phase accuracy, demonstrating stronger long-term incremental learning ability.

Overall, Table 3, Table 4 and Table 5 show that ORACIL does not necessarily obtain the highest accuracy at the initial phase, but it consistently exhibits stronger stability as the incremental process proceeds. This behavior is expected because early phases provide limited class observations, making class relationship estimation and group assignment less reliable. As more classes are observed, the conflict graph, group-recognition router, and analytic updating mechanism become more effective, enabling ORACIL to reduce inter-class interference and preserve previously learned knowledge. These results demonstrate that ORACIL achieves a favorable balance between classification accuracy, forgetting resistance, and raw-sample-free incremental learning.

4.2.2. Robustness to Class Order

As shown in Figure 5, on the CIFAR-100 dataset, GDDSG achieves the lowest values on both metrics, indicating that it is almost unaffected by class order. ORACIL obtains a relatively low AOPD of 1.78, but its MOPD reaches 4.9, which is noticeably higher than that of the other methods. This suggests that its order sensitivity is mainly concentrated in a few specific phases rather than reflected as persistent fluctuations throughout the whole process. Both APD [25] and its variant APDfix show relatively high values on this dataset, while HALRP [26] lies in the middle range.

On the fine-grained CUB200 dataset, ORACIL achieves the lowest MOPD and AOPD, with values of 7.72 and 3.22, respectively. Both values are lower than those of GDDSG, indicating that ORACIL is more robust under both average and worst-case conditions. On the more challenging OmniBenchmark dataset, which exhibits larger distribution discrepancies, ORACIL again achieves the best performance, with an MOPD of 10.7 and an AOPD of 4.7. These values are lower than GDDSG’s 11.2 and 7.6, respectively, with a particularly notable reduction in the average disparity.

Overall, ORACIL significantly improves order robustness on CUB200 and OmniBenchmark. Although its MOPD on CIFAR-100 is relatively high due to a single-phase peak, its overall average disparity remains small.

Figure 6 presents the average accuracy of ORACIL on the three datasets when trained under five different class arrival orders. The accuracy ranges on CIFAR-100, CUB200, and OmniBenchmark are 94.46–95.74%, 93.02–95.73%, and 87.25–91.57%, respectively, corresponding to spreads of 1.28, 2.71, and 4.32 percentage points. The stable performance under multiple order settings indicates that ORACIL is robust to class order; that is, the overall model performance does not fluctuate severely when the class arrival order changes. This robustness mainly benefits from the dynamic grouping mechanism: during the incremental process, it constructs a conflict graph and adjusts the class-group structure so that newly arriving classes are assigned to class groups with lower similarity.

4.2.3. Ablation Results

Table 6 reports the ablation results of the three key designs in ORACIL, including the analytic head, the recursive update mechanism, and the dynamic grouping with the group-recognition router. The comparison reveals a clear progressive effect among these components. When only the analytic head is retained, the model fails to maintain discriminative knowledge across incremental phases. Although the analytic head provides a closed-form classifier, it is not sufficient by itself to handle continuously expanding class sets. This is reflected by the extremely low final accuracy and the very high final forgetting on all three datasets. Therefore, the analytic head should be viewed as the computational basis of ORACIL rather than a complete solution to class-incremental learning.

After introducing the recursive update mechanism, the performance improves substantially. This indicates that recursively updating the analytic solution with newly observed data is critical for preserving previously learned decision boundaries while incorporating new classes. Compared with the analytic-head-only variant, this setting greatly reduces final forgetting and raises the final accuracy across CIFAR-100, CUB200, and OmniBenchmark. These results confirm that the recursive update mechanism is the core factor that enables ORACIL to perform raw-sample-free incremental learning without repeatedly retraining the classifier from scratch.
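To illustrate why accumulated second-order statistics are enough to avoid replay, the sketch below keeps only a regularized Gram matrix and a feature-label cross-correlation matrix and re-solves a ridge regression after each phase. This accumulate-and-solve form is an assumed simplification for illustration; the class name `AnalyticHead`, the regularizer `gamma`, and the synthetic features are hypothetical, and the exact recursion used by ORACIL is the one given in the method section.

```python
import numpy as np

class AnalyticHead:
    """Ridge-regression head updated from accumulated second-order statistics."""

    def __init__(self, dim, gamma=1.0):
        self.G = gamma * np.eye(dim)        # regularized Gram matrix: sum X^T X + gamma * I
        self.Q = np.zeros((dim, 0))         # cross-correlation: sum X^T Y
        self.W = np.zeros((dim, 0))         # closed-form classifier weights

    def update(self, X, y, num_classes):
        """Absorb one phase of features X (n, dim) and labels y without old samples."""
        grow = num_classes - self.Q.shape[1]
        if grow > 0:                        # add zero columns for newly arrived classes
            self.Q = np.hstack([self.Q, np.zeros((self.Q.shape[0], grow))])
        Y = np.eye(num_classes)[y]          # one-hot targets in the expanded label space
        self.G += X.T @ X
        self.Q += X.T @ Y
        self.W = np.linalg.solve(self.G, self.Q)
        return self.W

# Two phases of synthetic features: classes {0, 1} first, then {2, 3}.
rng = np.random.default_rng(0)
X1, y1 = rng.normal(size=(40, 8)), rng.integers(0, 2, size=40)
X2, y2 = rng.normal(size=(40, 8)), rng.integers(2, 4, size=40)

head = AnalyticHead(dim=8, gamma=1.0)
head.update(X1, y1, num_classes=2)
W_incremental = head.update(X2, y2, num_classes=4)

# Reference: joint ridge regression over all data at once.
X_all, Y_all = np.vstack([X1, X2]), np.eye(4)[np.concatenate([y1, y2])]
W_joint = np.linalg.solve(X_all.T @ X_all + np.eye(8), X_all.T @ Y_all)
```

In this sketch the phase-by-phase weights coincide with the joint solution, which is the property that lets a closed-form head incorporate new classes without revisiting raw historical samples.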

The full model further incorporates dynamic grouping and the group-recognition router. This component does not merely add another classifier; rather, it changes how class interference is controlled during incremental learning. By assigning classes into low-conflict groups and routing test samples through soft group probabilities, ORACIL reduces the competition among visually similar or order-sensitive classes. As a result, the full model achieves the best final accuracy on all three datasets and further suppresses final forgetting. The improvement is especially meaningful because it shows that recursive updates alone can preserve knowledge, but dynamic grouping is needed to organize the expanding class space more effectively.
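The soft routing step can be sketched as follows, assuming that the fused score of a class is the product of its group probability and the softmax-normalized score of the corresponding analytic head. The product rule, the function name `fuse_and_predict`, and the toy scores are assumptions made for illustration; the exact fusion rule is the one defined in the method section.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_predict(group_probs, head_scores, group_classes):
    """Weight each head's class probabilities by its group probability, then argmax.

    group_probs:   (n, G) router probabilities per sample.
    head_scores:   list of G arrays, head_scores[g] has shape (n, len(group_classes[g])).
    group_classes: list of G lists with the global class ids handled by each head.
    """
    n = group_probs.shape[0]
    num_classes = sum(len(c) for c in group_classes)
    fused = np.zeros((n, num_classes))
    for g, classes in enumerate(group_classes):
        # assumption: product of group probability and within-group class probability
        fused[:, classes] = group_probs[:, [g]] * softmax(head_scores[g])
    return fused.argmax(axis=1), fused

# Toy example: 2 groups, 4 classes, 2 samples (synthetic scores).
group_classes = [[0, 2], [1, 3]]
group_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
head_scores = [np.array([[2.0, 0.0], [0.0, 1.0]]),
               np.array([[0.0, 3.0], [1.5, 0.0]])]
pred, fused = fuse_and_predict(group_probs, head_scores, group_classes)
```

Under this rule, a confident head in an unlikely group is down-weighted rather than discarded, so a single routing mistake does not force a wrong class prediction.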

The AOPD and MOPD metrics provide additional insight into order robustness. Although the full model does not always produce the lowest AOPD or MOPD, it achieves a much better balance between final accuracy, forgetting, and class-order stability. In particular, the recursive-update-only variant can sometimes show lower order-difference values because its overall performance is lower and less variable, but this does not indicate stronger practical performance. The full ORACIL framework maintains high accuracy while keeping forgetting at a low level, demonstrating that the proposed components are complementary: the analytic head provides an efficient closed-form classifier, the recursive update enables continual adaptation without raw-sample replay, and the dynamic grouping router improves robustness by reducing inter-group interference.

4.2.4. Analysis of Class Group Counts

In Figure 7, the horizontal axis represents the incremental phase, and the vertical axis represents the number of class groups at that phase. Different datasets exhibit different grouping trends as the incremental process proceeds. Among them, OmniBenchmark shows the most significant increase in the number of groups. This indirectly indicates that class differences in its cross-domain data are much larger and that the inter-class similarity relationships are more complex. In contrast, the increases on CIFAR-100 and CUB200 are more gradual, with the final numbers of groups reaching 12 and 13, respectively. This suggests that, on datasets with relatively higher similarity, the model is more inclined to merge newly arriving classes into existing groups, thereby maintaining a relatively small number of groups. Overall, these results indicate that the proposed dynamic assignment strategy and conflict graph adapt the group structure to the characteristics of each dataset.

As shown in Figure 8, the confusion matrices at all phases exhibit a banded pattern. Most routing errors occur between groups with close indices, which may reflect local ambiguity in the learned prototype-distance representation rather than severe global misrouting. As the incremental process proceeds, the number of groups gradually increases from 4 to 12. During this process, the increase in error remains very small, suggesting that the routing features stay stable throughout training and can provide reliable sample assignment for subsequent within-group analytic classification. The average routing accuracy is approximately 97.05%, with a standard deviation of about 0.42, and the fluctuation between the maximum and minimum values is only 1.36%. Overall, the high accuracy of the router ensures that the multi-group structure does not become a performance bottleneck, and it also indirectly verifies the effectiveness of dynamic grouping and group-routing identification based on the conflict graph.

4.2.5. Analysis of Different Backbones

Table 7 compares ORACIL on the three datasets under three pre-trained backbones, namely ViT-B/16, ResNet-152, and ResNet-50, in terms of the number of groups ($\chi$), average accuracy ($A_{N}$), and average forgetting ($F_{N}$). The choice of backbone has a certain impact on the overall performance, while the general trend remains consistent. ViT-B/16 achieves the best overall performance, which we attribute to its stronger global feature modeling capability: stronger global feature representations enable more accurate estimation of inter-class feature similarity and more reliable conflict graphs, thereby improving the overall effectiveness of dynamic grouping and analytic updates. The results thus indicate that ORACIL can be combined with different pre-trained backbone networks, although its performance still depends on the quality of the extracted features. Nevertheless, the relatively low forgetting rates obtained with ResNet-152 and ResNet-50 show that the proposed dynamic grouping and analytic incremental updating mechanisms are not restricted to a specific backbone and can still maintain stable incremental learning behavior across different architectures.

4.2.6. Effect of Similarity Metrics on Conflict-Graph Construction

Table 8 reports the influence of different similarity metrics on conflict-graph construction. The results show that cosine similarity achieves the best overall trade-off among final accuracy, forgetting, and the number of generated groups. Compared with cosine similarity, Euclidean and Mahalanobis distances produce substantially more groups, especially on CIFAR-100 and OmniBenchmark. This indicates that distance-based metrics make the conflict graph denser and lead to more fragmented class partitions.

This phenomenon is closely related to the feature space used by ORACIL. The features extracted by the pre-trained backbone are normalized before analytic learning and routing; therefore, angular relationships between class prototypes are more stable than absolute distance magnitudes. Cosine similarity focuses on the directional relationship between class prototypes and is less affected by feature norms. In contrast, Euclidean distance is sensitive to feature scale, while Mahalanobis distance additionally depends on the estimation of intra-class variance. In high-dimensional incremental settings, such variance estimation can be unstable because each phase provides only limited samples for newly arriving classes. As a result, Euclidean and Mahalanobis distances may overestimate conflicts between classes and split them into too many groups.
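A minimal sketch of cosine-based conflict-graph grouping is given below. It assumes a single scalar threshold `tau` and a greedy first-fit assignment, in which a class joins the first group that contains no conflicting class; both choices, as well as the function names, are illustrative assumptions rather than the exact partitioning procedure of ORACIL.

```python
import numpy as np

def cosine_similarity_matrix(prototypes):
    """Pairwise cosine similarity between class prototypes (rows)."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return p @ p.T

def greedy_grouping(prototypes, tau):
    """Place each class in the first group that holds no conflicting class.

    Two classes conflict when their prototype cosine similarity exceeds tau.
    """
    sim = cosine_similarity_matrix(np.asarray(prototypes, dtype=float))
    conflict = sim > tau
    np.fill_diagonal(conflict, False)
    groups, assignment = [], []
    for c in range(len(sim)):
        for g, members in enumerate(groups):
            if not conflict[c, members].any():   # no similar class already in group g
                members.append(c)
                assignment.append(g)
                break
        else:                                    # every group has a conflict: open a new one
            groups.append([c])
            assignment.append(len(groups) - 1)
    return assignment, groups

# Classes 0 and 1 point in nearly the same direction; class 2 is orthogonal to class 0.
prototypes = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
assignment, groups = greedy_grouping(prototypes, tau=0.8)
```

The two similar classes end up in different groups, while the dissimilar class shares a group with class 0. A metric that flags more pairs as conflicting would add edges to this graph and open more groups, which matches the fragmentation discussed above.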

Although finer grouping can reduce within-group class interference, excessive fragmentation increases the difficulty of group recognition and multi-head fusion. The group-recognition router must distinguish more groups, and prediction errors at the group level can propagate to the final class prediction. Therefore, the larger number of groups generated by Euclidean and Mahalanobis distances leads to lower accuracy and higher forgetting. These results justify the choice of cosine similarity as the default metric for conflict-graph construction in ORACIL.

We did not include a separately learned metric in the main comparison because a separately trained metric would require an extra supervised metric-learning stage, additional training objectives, and extra hyperparameter tuning. This would introduce a new optimization module beyond the current analytic and raw-sample-free framework. Therefore, this study focuses on representative non-cosine metrics, namely Euclidean and Mahalanobis distances, while learned metric construction is left as a possible direction for future work.

4.2.7. Sensitivity Analysis of Threshold Strategies in Conflict-Graph Construction

Table 9 presents the sensitivity analysis of different threshold strategies used in conflict-graph construction. The threshold strategy directly affects the density of the conflict graph and thus determines the granularity of the generated class groups. A stricter threshold tends to introduce more conflict edges and produces more groups, whereas a looser threshold merges more classes into the same group.

The proposed mean-similarity threshold achieves the most stable overall trade-off across the three datasets. It adaptively estimates the compactness of each class by averaging the cosine similarities between samples and their corresponding class prototype. Therefore, the threshold is determined by the intra-class distribution rather than by a manually selected global value. As shown in Table 9, mean similarity obtains competitive or superior final accuracy on all datasets while maintaining a moderate number of groups.
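The mean-similarity statistic described above can be sketched as follows. The function name `mean_similarity_threshold` and the synthetic classes are hypothetical, and how the per-class values are turned into conflict edges is not covered by this snippet.

```python
import numpy as np

def mean_similarity_threshold(features, prototype):
    """Average cosine similarity between a class's samples and its prototype."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return float((f @ p).mean())

# Synthetic classes: samples scattered around a common direction.
rng = np.random.default_rng(0)
center = np.array([1.0, 0.0, 0.0, 0.0])
tight = center + 0.05 * rng.normal(size=(100, 4))   # compact class
loose = center + 0.50 * rng.normal(size=(100, 4))   # dispersed class

tau_tight = mean_similarity_threshold(tight, tight.mean(axis=0))
tau_loose = mean_similarity_threshold(loose, loose.mean(axis=0))
```

A compact class yields a value close to 1, whereas a dispersed class yields a lower value, so the statistic follows the intra-class distribution without a manually chosen global constant.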

Although the fixed threshold of 0.8 achieves the highest accuracy on CIFAR-100, it relies on a manually specified hyperparameter and does not consistently outperform mean similarity on CUB200 and OmniBenchmark. This indicates that a fixed global threshold may be dataset-dependent. Median similarity provides a robust alternative against outliers, but it yields higher forgetting on CUB200 and OmniBenchmark, suggesting that ignoring the overall intra-class distribution may weaken the stability of conflict-graph construction.

The adaptive threshold generates more groups, especially on CIFAR-100, where the number of groups increases from 12 to 21. This indicates that the adaptive strategy makes the conflict graph denser and leads to over-fragmented class partitions. Excessive fragmentation increases the difficulty of group recognition and multi-head fusion, which can reduce final accuracy and increase forgetting.

Overall, these results show that ORACIL is reasonably robust to different threshold strategies, but the mean-similarity threshold provides the best balance among accuracy, forgetting, group compactness, and hyperparameter-free adaptability. Therefore, we use mean similarity as the default threshold strategy in ORACIL.

4.2.8. Final-Phase Runtime and Storage Overhead of GRR

Table 10 summarizes the final-phase runtime and storage overhead of the group-recognition router (GRR) on the three datasets. The detailed phase-wise results are provided in Appendix A, where Table A1, Table A2 and Table A3 report the per-phase overheads for CIFAR-100, CUB200, and OmniBenchmark, respectively.

As shown in Table 10, the final number of groups varies across datasets, reflecting different levels of class similarity and grouping complexity. CIFAR-100 and CUB200 produce 12 and 13 final groups, respectively, while OmniBenchmark produces 40 final groups due to its more diverse cross-domain class distribution. The stored distance representations remain moderate in size, ranging from 7.19 MB on CUB200 to 102.65 MB on OmniBenchmark. It should be noted that these stored representations are prototype-distance vectors derived from extracted features rather than raw training images. ORACIL thus avoids storing historical raw samples while still retaining sufficient group-discriminative information for GRR training. However, these representations still constitute an additional memory component, so ORACIL should be regarded as raw-sample-free rather than strictly memory-free.
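A prototype-distance representation of this kind might be computed as in the sketch below. The choice of Euclidean distance between L2-normalized vectors, the function name `prototype_distance_features`, and the random stand-in arrays are assumptions for illustration only.

```python
import numpy as np

def prototype_distance_features(features, prototypes):
    """Map each feature vector to its distances from all class prototypes."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    # assumption: Euclidean distance between L2-normalized vectors
    return np.linalg.norm(f[:, None, :] - p[None, :, :], axis=2)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16))      # stand-in for backbone features
prototypes = rng.normal(size=(3, 16))    # stand-in for class prototypes
dist = prototype_distance_features(features, prototypes)
size_mb = dist.astype(np.float32).nbytes / 2**20   # storage if kept as float32
```

In this sketch, each retained sample costs one value per prototype, so the stored volume depends on the number of samples and prototypes rather than on the image resolution.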

The GRR update time increases with the number of final groups and the amount of stored distance representations. OmniBenchmark requires the longest update time because it has the largest number of final groups and the largest stored representation set. However, this cost is incurred only during the update stage rather than during every inference step. In contrast, the analytic-head update time remains very small across all datasets, demonstrating the efficiency of the recursive analytic update mechanism. Moreover, the inference time remains at the millisecond level, indicating that the additional routing and multi-head fusion process does not introduce a significant inference bottleneck.

Overall, the results indicate that the overhead introduced by the current GRR is mainly concentrated in the phase-wise router update stage, while inference remains lightweight under the reported task-wise incremental protocol. More detailed phase-wise overhead trends are reported in Table A1, Table A2 and Table A3 in Appendix A.

5. Conclusions

This paper proposed ORACIL, an order-robust analytic class-incremental learning framework for raw-sample-free incremental recognition. The core idea is to reduce class-order sensitivity by constructing a conflict graph based on inter-class similarity and dynamically assigning newly arriving classes to low-similarity groups. Within each group, ORACIL trains an analytic classification head and updates it through closed-form recursive learning, so updating the analytic heads does not require replaying raw historical images. For group recognition, ORACIL uses feature-derived distance representations rather than raw historical images, and the group-recognition router (GRR) provides soft group-level weights during inference. These weights are fused with the class scores from the group-specific analytic heads to produce the final prediction. Experiments on CIFAR-100, CUB200, and OmniBenchmark show that ORACIL reaches final-phase accuracies of 95.77%, 93.86%, and 88.12% with final-phase average forgetting of 0.16%, 0.77%, and 1.04%, respectively, and improves robustness to different class arrival orders. The results also indicate that ORACIL can be applied to different backbone architectures, although its performance still depends on the quality of the extracted features. Overall, ORACIL provides a favorable balance among accuracy, forgetting resistance, and order robustness without replaying raw historical images.

We emphasize that raw-sample-free learning should not be interpreted as formal privacy-preserving learning. ORACIL still retains class prototypes, analytic-head statistics, router parameters, and feature-derived distance representations, which may contain information about the training distribution and may be vulnerable to inversion, reconstruction, or membership inference attacks. Therefore, ORACIL reduces the dependence on raw-image replay but does not provide cryptographic privacy, differential privacy, or certified protection against information leakage.

A limitation of the current framework is that the GRR is not recursively updated in the same way as the analytic heads. Because the router combines KNN, RF, and LGBM, it is updated in a phase-wise manner and requires accumulated feature-derived distance representations for training and for validation-based fusion-weight selection. Future work will investigate privacy-enhanced extensions for the retained feature-derived representations and more resource-aware GRR designs, such as bounded-memory router training, prototype-only routing, and online-compatible lightweight routers.
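
The closed-form recursive update of the analytic heads summarized above can be illustrated with a generic recursive ridge-regression sketch. This is a minimal sketch under stated assumptions, not ORACIL's released implementation: the function names `init_head` and `update_head`, the regularization constant `gamma`, and the zero-padding of the weight matrix for newly arriving classes are illustrative choices. It shows that an update using only current-phase features and the accumulated inverse autocorrelation matrix `R` reproduces the batch ridge solution without revisiting earlier data.

```python
import numpy as np

def init_head(X, Y, gamma=1.0):
    """Fit a ridge-regression head on the first phase.

    X: (n, d) features; Y: (n, c) one-hot labels.
    Returns W of shape (d, c) and R = (X^T X + gamma * I)^-1.
    """
    d = X.shape[1]
    R = np.linalg.inv(X.T @ X + gamma * np.eye(d))
    W = R @ X.T @ Y
    return W, R

def update_head(W, R, X_new, Y_new):
    """Recursive update from current-phase data and accumulated statistics R.

    Y_new must be one-hot over all classes seen so far (hypothetical convention).
    """
    # Zero-pad W so that it has one column per class seen so far.
    extra = Y_new.shape[1] - W.shape[1]
    if extra > 0:
        W = np.hstack([W, np.zeros((W.shape[0], extra))])
    # Woodbury identity: update the inverse without touching old samples.
    n = X_new.shape[0]
    K = np.linalg.inv(np.eye(n) + X_new @ R @ X_new.T)
    R = R - R @ X_new.T @ K @ X_new @ R
    # Correct the old weights by the residual on the new data.
    W = W + R @ X_new.T @ (Y_new - X_new @ W)
    return W, R
```

In this sketch the only state carried between phases is `W` and `R`, which corresponds to the "accumulated second-order statistics" mentioned in the abstract; how ORACIL stores and shares these statistics across group-specific heads is not shown here.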

Author Contributions

Conceptualization, G.W., H.S. and Y.D.; methodology, G.W. and H.S.; software, G.W. and W.L.; validation, G.W., W.L., H.S. and Y.D.; formal analysis, G.W. and W.L.; investigation, G.W. and W.L.; resources, H.S. and Y.D.; data curation, G.W. and W.L.; writing—original draft preparation, G.W.; writing—review and editing, H.S., Y.D. and W.L.; visualization, G.W.; supervision, H.S. and Y.D.; project administration, H.S. and Y.D.; funding acquisition, H.S. and Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Development Project of the Department of Education of Jilin Province, grant number JJKH20250945KJ; and the Jilin Province Science and Technology Development Plan Project—Youth Growth Science and Technology Plan Project, grant number 20220508038RC.

Data Availability Statement

The datasets used in this study are publicly available. CIFAR-100 can be accessed from the official CIFAR repository at https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 27 May 2026). The CUB-200-2011 dataset follows the version used by Zhou et al. [42] and is available at https://drive.google.com/file/d/1XbUpnWpJPnItt5zQ6sHJnsjPncnNLvWb/view?usp=sharing (accessed on 27 May 2026). The OmniBenchmark dataset also follows the version used by Zhou et al. [42] and is available at https://drive.google.com/file/d/1AbCP3zBMtv_TDXJypOCnOgX8hJmvJm3u/view?usp=sharing (accessed on 27 May 2026). The source code of ORACIL is publicly available at https://github.com/gjwang224/ORACIL (accessed on 1 July 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Phase-wise Runtime and Storage Overhead of the Group-Recognition Router on CIFAR-100.

Dataset	Phase	Observed Classes	Groups	Distance Storage (MB)	GRR Update Time (s)	Analytic Head Update Time (s)	Router Inference Time (ms)	Analytic Fusion Time (ms)	Inference Time (ms)
CIFAR-100	0	10	3	1.91	3.58	2.85	0.18	0.00	0.19
CIFAR-100	1	20	3	3.81	7.26	0.24	0.18	0.00	0.19
CIFAR-100	2	30	4	5.72	12.48	0.24	0.18	0.00	0.19
CIFAR-100	3	40	5	7.63	18.67	0.24	0.20	0.00	0.21
CIFAR-100	4	50	6	9.54	25.65	0.25	0.23	0.00	0.23
CIFAR-100	5	60	7	11.44	31.16	0.23	0.24	0.00	0.24
CIFAR-100	6	70	8	13.35	40.60	0.25	0.24	0.00	0.24
CIFAR-100	7	80	11	15.26	53.99	0.27	0.26	0.01	0.27
CIFAR-100	8	90	11	17.17	63.21	0.26	0.27	0.01	0.27
CIFAR-100	9	100	12	19.07	72.54	0.25	0.28	0.01	0.29

Table A2. Phase-wise Runtime and Storage Overhead of the Group-Recognition Router on CUB200.

Dataset	Phase	Observed Classes	Groups	Distance Storage (MB)	GRR Update Time (s)	Analytic Head Update Time (s)	Router Inference Time (ms)	Analytic Fusion Time (ms)	Inference Time (ms)
CUB200	0	20	3	0.72	1.17	0.08	0.35	0.01	0.35
CUB200	1	40	5	1.45	2.36	0.13	0.59	0.01	0.60
CUB200	2	60	5	2.16	3.92	0.09	0.46	0.02	0.48
CUB200	3	80	6	2.89	7.80	0.11	0.35	0.02	0.37
CUB200	4	100	6	3.58	10.45	0.09	0.36	0.01	0.37
CUB200	5	120	7	4.31	12.76	0.09	0.32	0.02	0.34
CUB200	6	140	9	5.04	18.05	0.12	0.51	0.02	0.53
CUB200	7	160	11	5.74	21.93	0.12	0.43	0.04	0.47
CUB200	8	180	12	6.47	31.68	0.13	0.44	0.03	0.47
CUB200	9	200	13	7.19	37.95	0.11	0.51	0.03	0.53

Table A3. Phase-wise Runtime and Storage Overhead of the Group-Recognition Router on OmniBenchmark.

Dataset	Phase	Observed Classes	Groups	Distance Storage (MB)	GRR Update Time (s)	Analytic Head Update Time (s)	Router Inference Time (ms)	Analytic Fusion Time (ms)	Inference Time (ms)
OmniBenchmark	0	30	6	10.24	10.84	0.70	0.34	0.01	0.34
OmniBenchmark	1	60	7	20.51	24.61	0.50	0.32	0.01	0.33
OmniBenchmark	2	90	11	30.52	43.45	0.63	0.37	0.01	0.38
OmniBenchmark	3	120	14	40.95	85.68	0.53	0.44	0.01	0.46
OmniBenchmark	4	150	18	51.40	118.64	0.55	0.45	0.02	0.47
OmniBenchmark	5	180	23	61.55	163.34	0.56	0.49	0.02	0.51
OmniBenchmark	6	210	29	71.66	205.10	0.56	0.48	0.02	0.50
OmniBenchmark	7	240	36	82.31	263.77	0.66	0.51	0.03	0.54
OmniBenchmark	8	270	38	92.79	306.46	0.57	0.56	0.03	0.59
OmniBenchmark	9	300	40	102.65	336.31	0.50	0.57	0.03	0.60

Table A1, Table A2 and Table A3 report the phase-wise runtime and storage overheads of the group-recognition router (GRR) on CIFAR-100, CUB200, and OmniBenchmark, respectively. They complement Table 10 by showing how the GRR overhead scales across all incremental phases as the number of observed classes and groups increases.

As the number of observed classes increases, the storage required for distance representations grows steadily on all datasets. This is expected because more observed classes introduce more feature-derived prototype-distance representations for GRR training. Nevertheless, the storage overhead remains moderate, peaking at 102.65 MB in the final OmniBenchmark phase, since ORACIL stores distance representations rather than historical raw images.

The GRR update time is the dominant update-stage cost and increases with the number of observed classes and groups. This trend is most evident on OmniBenchmark, where the final number of observed classes and groups is the largest. In contrast, the analytic-head update time remains consistently small across all phases, confirming the efficiency of the recursive analytic update mechanism.

During inference, the total inference time remains at the millisecond level on all datasets. The router inference time accounts for most of the inference cost, whereas the analytic fusion time is nearly negligible. These results indicate that the proposed GRR introduces manageable phase-wise update overhead while preserving efficient inference throughout the incremental learning process. Therefore, the phase-wise results in Table A1, Table A2 and Table A3 further support the final-phase summary reported in Table 10.
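
To make these trends concrete, the final-phase values in Table A1, Table A2 and Table A3 can be reduced to two simple ratios: distance storage per observed class, and GRR update time relative to analytic-head update time. The snippet below is an illustrative calculation only; the dictionary layout and the names `final_phase` and `summarize` are assumptions, and the numbers are copied from the tables above.

```python
# Final-phase values copied from Tables A1-A3:
# (observed classes, distance storage in MB, GRR update in s, analytic-head update in s)
final_phase = {
    "CIFAR-100": (100, 19.07, 72.54, 0.25),
    "CUB200": (200, 7.19, 37.95, 0.11),
    "OmniBenchmark": (300, 102.65, 336.31, 0.50),
}

def summarize(rows):
    """Return storage per class and the GRR-to-head update-time ratio per dataset."""
    summary = {}
    for name, (classes, storage_mb, grr_s, head_s) in rows.items():
        summary[name] = {
            "storage_mb_per_class": storage_mb / classes,
            "grr_to_head_ratio": grr_s / head_s,
        }
    return summary

summary = summarize(final_phase)
for name, s in summary.items():
    print(f"{name}: {s['storage_mb_per_class']:.3f} MB/class, "
          f"GRR update is {s['grr_to_head_ratio']:.0f}x the analytic-head update")
```

On these reported values, the GRR update takes roughly 290, 345, and 673 times as long as the analytic-head update at the final phase, which is the sense in which the router dominates the update-stage cost.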

References

1. Goswami, D.; Liu, Y.; Twardowski, B.O.; van de Weijer, J. Fecam: Exploiting the heterogeneity of class distributions in exemplar-free continual learning. Adv. Neural Inf. Process. Syst. 2023, 36, 6582–6595.
2. Scucchia, M.; Maltoni, D. Continual Learning of Regions for Efficient Robot Localization on Large Maps. In IEEE Transactions on Robotics; IEEE: New York, NY, USA, 2025.
3. Wang, G.; Liu, Z.; Zhang, X.; Chen, Y.; Zhang, Y.; Zhu, J.; Jiao, L. PID: A Parameter-Efficient Isolation Domain-Incremental Learning Framework for Signal Modulation Classification. In IEEE Transactions on Neural Networks and Learning Systems; IEEE: New York, NY, USA, 2025.
4. Li, Z.; Zhang, X.; Guo, Y.; Cai, S.; Lao, M. PENCIL: Prototype-ENhanced ComposItional Learning for Class-Incremental Hand Gesture Recognition. In IEEE Transactions on Consumer Electronics; IEEE: New York, NY, USA, 2025.
5. Hasan, F.; Naeem, A.; Malik, H.; Naqvi, R.A.; Loh, W.-K. Blockchain-enabled federated learning with capsule network and incremental extreme learning machines for gastrointestinal bleeding detection in wireless capsule endoscopy. Eng. Appl. Artif. Intell. 2025, 162, 112745.
6. Feng, J.; Qiu, H.; Zhao, L.; Gu, C.; Qin, H.; Xu, Y. Balancing Learning Plasticity and Memory Stability: A parameter space strategy for class-incremental learning. Neural Netw. 2025, 190, 107755.
7. Belouadah, E.; Popescu, A.; Kanellos, I. A comprehensive study of class incremental learning algorithms for visual tasks. Neural Netw. 2021, 135, 38–54.
8. Zhou, D.-W.; Wang, Q.-W.; Qi, Z.-H.; Ye, H.-J.; Zhan, D.-C.; Liu, Z. Class-Incremental Learning: A Survey. In IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: New York, NY, USA, 2024; Volume 46, pp. 9851–9873.
9. Liu, Y.; Schiele, B.; Sun, Q. RMM: Reinforced memory management for class-incremental learning. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS '21); Curran Associates Inc.: Red Hook, NY, USA, 2021; pp. 3478–3490.
10. Delange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; Tuytelaars, T. A continual learning survey: Defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2021; in press.
11. Lin, S.; Ju, P.; Liang, Y.; Shroff, N. Theory on forgetting and generalization of continual learning. In International Conference on Machine Learning; PMLR: New York, NY, USA, 2023; pp. 21078–21100.
12. Wu, T.; Li, X.; Li, Y.-F.; Haffari, G.; Qi, G.; Zhu, Y.; Xu, G. Curriculum-meta learning for order-robust continual relation extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 10363–10369.
13. Dohare, S.; Hernandez-Garcia, J.F.; Lan, Q.; Rahman, P.; Mahmood, A.R.; Sutton, R.S. Loss of plasticity in deep continual learning. Nature 2024, 632, 768–774.
14. Cha, H.; Lee, J.; Shin, J. Co2l: Contrastive continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 November 2021; pp. 9516–9525.
15. Riemer, M.; Cases, I.; Ajemian, R.; Liu, M.; Rish, I.; Tu, Y.; Tesauro, G. Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
16. Wang, F.-Y.; Zhou, D.-W.; Ye, H.-J.; Zhan, D.-C. Foster: Feature boosting and compression for class-incremental learning. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 398–414.
17. Shin, H.; Lee, J.K.; Kim, J.; Kim, J. Continual learning with deep generative replay. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS '17); Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 2994–3003.
18. Zhu, K.; Zhai, W.; Cao, Y.; Luo, J.; Zha, Z. Self-sustaining representation expansion for non-exemplar class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 9296–9305.
19. Aljundi, R.; Babiloni, F.; Elhoseiny, M.; Rohrbach, M.; Tuytelaars, T. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 139–154.
20. Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
21. Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947.
22. Hurtado, J.; Raymond, A.; Soto, A. Optimizing reusable knowledge for continual learning via metalearning. Adv. Neural Inf. Process. Syst. 2021, 34, 14150–14162.
23. Yan, S.; Xie, J.; He, X. Der: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 16–25 June 2021; pp. 3014–3023.
24. Zhou, D.-W.; Wang, Q.-W.; Ye, H.-J.; Zhan, D.-C. A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning. In Proceedings of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
25. Yoon, J.; Kim, S.; Yang, E.; Hwang, S.J. Scalable and Order-Robust Continual Learning with Additive Parameter Decomposition. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 26 April–1 May 2020.
26. Li, J.; Lai, Y.; Wang, R.; Shui, C.; Sahoo, S.; Ling, C.X.; Yang, S.; Wang, B.; Gagne, C.; Zhou, F. Hessian aware low-rank perturbation for order-robust continual learning. In IEEE Transactions on Knowledge and Data Engineering; IEEE: New York, NY, USA, 2024.
27. Guo, P.; Lyu, M.R. A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data. Neurocomputing 2004, 56, 101–121.
28. Park, J.; Sandberg, I.W. Universal approximation using radial-basis-function networks. Neural Comput. 1991, 3, 246–257.
29. Toh, K.-A. Learning from the kernel and the range space. In Proceedings of the 17th 2018 IEEE Conference on Computer and Information Science; IEEE: New York, NY, USA, 2018; pp. 417–422.
30. Wang, X.; Zhang, T.; Wang, R. Noniterative deep learning: Incorporating restricted boltzmann machine into multilayer random weight neural networks. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1299–1308.
31. Zhuang, H.; Lin, Z.; Toh, K.-A. Blockwise recursive Moore-Penrose inverse for network learning. In IEEE Transactions on Systems, Man, and Cybernetics: Systems; IEEE: New York, NY, USA, 2021; pp. 1–14.
32. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; Department of Computer Science, University of Toronto: Toronto, ON, Canada, 2009; Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 1 July 2026).
33. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset: CNS-TR-2011-001; California Institute of Technology: Pasadena, CA, USA, 2011.
34. Zhang, Y.; Yin, Z.; Shao, J.; Liu, Z. Benchmarking omni-vision representation through the lens of visual realms. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 594–611.
35. Sun, H.-L.; Zhou, D.-W.; Zhan, D.-C.; Ye, H.-J. Pilot: A pre-trained model-based continual learning toolbox. arXiv 2025, arXiv:2309.07117.
36. Zhou, D.W.; Wang, F.Y.; Ye, H.J.; Zhan, D.C. PyCIL: A Python toolbox for class-incremental learning. Sci. China Inf. Sci. 2023, 66, 197101.
37. Wang, Z.; Zhang, Z.; Lee, C.-Y.; Zhang, H.; Sun, R.; Ren, X.; Su, G.; Perot, V.; Dy, J.; Pfister, T. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 139–149.
38. Smith, J.S.; Karlinsky, L.; Gutta, V.; Cascante-Bonilla, P.; Kim, D.; Arbelle, A.; Panda, R.; Feris, R.; Kira, Z. Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11909–11919.
39. Wang, F.Y.; Zhou, D.W.; Ye, H.J.; Zhan, D.C. FOSTER: Feature Boosting and Compression for Class-Incremental Learning. In Computer Vision – ECCV 2022; Lecture Notes in Computer Science; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; Volume 13685.
40. Wang, Z.; Zhang, Z.; Ebrahimi, S.; Sun, R.; Zhang, H.; Lee, C.-Y.; Ren, X.; Su, G.; Perot, V.; Dy, J.; et al. Dualprompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 631–648.
41. Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. iCaRL: Incremental Classifier and Representation Learning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5533–5542.
42. Zhou, D.-W.; Ye, H.-J.; Zhan, D.-C.; Liu, Z. Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need. Int. J. Comput. Vis. 2024, 133, 1012–1032.
43. Zhao, B.; Xiao, X.; Gan, G.; Zhang, B.; Xia, S.-T. Maintaining Discrimination and Fairness in Class Incremental Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13205–13214.
44. Wu, Y.; Chen, Y.; Wang, L.; Ye, Y.; Liu, Z.; Guo, Y.; Fu, Y. Large Scale Incremental Learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 374–382.
45. Buzzega, P.; Boschini, M.; Porrello, A.; Abati, D.; Calderara, S. Dark experience for general continual learning: A strong, simple baseline. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20); Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 15920–15930.
46. Petit, G.; Popescu, A.; Schindler, H.; Picard, D.; Delezoide, B. FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 3900–3909.
47. Lai, G.; Li, Y.; Wang, X.; Zhang, J.; Li, T.; Yang, X. Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 4894–4904.

Figure 1. The overall training pipeline of ORACIL.

Figure 2. Performance comparison between ORACIL and other class-incremental learning methods on (a) CIFAR-100, (b) CUB200, and (c) OmniBenchmark. The horizontal axis denotes the incremental phase, and the vertical axis denotes classification accuracy. √ marks methods that avoid storing or replaying raw historical images, and × marks methods that rely on raw historical images or exemplars.

Figure 3. Performance comparison between ORACIL and representative raw-sample-free class-incremental learning methods on (a) CIFAR-100, (b) CUB200, and (c) OmniBenchmark. The horizontal axis denotes the incremental phase, and the vertical axis denotes classification accuracy. All compared methods avoid storing or replaying raw historical images, as indicated by √.

Figure 4. Performance comparison among replay-based class-incremental learning methods (FOSTER, MEMO, iCaRL, WA, BiC, and DER++) on (a) CIFAR-100, (b) CUB200, and (c) OmniBenchmark. The horizontal axis denotes the incremental phase, and the vertical axis denotes classification accuracy. × indicates that a method relies on raw historical images or exemplars.

Figure 5. Comparison of order robustness. Five methods are compared on three datasets. The blue bars denote the maximum order performance disparity (MOPD), and the orange bars denote the average order performance disparity (AOPD) over the whole incremental process, both measured in percentage points.

Figure 6. Accuracy comparison of ORACIL on the CIFAR-100, CUB200, and OmniBenchmark datasets under five random class arrival orders, denoted as Order 1 to Order 5. Bars of different colors represent different class arrival orders, and the values above the bars indicate the final accuracy under the corresponding order.

Figure 7. Trends in the number of groups across incremental phases on different datasets. The horizontal axis denotes the incremental phase, and the vertical axis denotes the number of groups.

Figure 8. Confusion matrices of the group-recognition router on the CIFAR-100 dataset as the incremental phase progresses. In each subfigure, rows denote predicted group IDs, columns denote ground-truth group IDs, and the color scale indicates the number of samples. Router accuracy per subfigure: (a) 98%; (b) 97.05%; (c) 97.53%; (d) 96.73%; (e) 97.2%; (f) 96.9%; (g) 97.08%; (h) 96.71%; (i) 96.64%; (j) 96.64%.
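
For readers unfamiliar with the order-robustness metrics shown in Figure 5, the sketch below computes them from a matrix of per-phase accuracies obtained under several class orders. It assumes the common convention that the order performance disparity at a phase is the gap between the best and worst accuracy across orders, with MOPD its maximum and AOPD its mean over phases; the paper's own definition is given in its metrics section and may differ in detail. The function name `order_disparity` and the accuracy values are made up for illustration.

```python
import numpy as np

def order_disparity(acc):
    """Compute (MOPD, AOPD) in percentage points.

    acc: array-like of shape (num_orders, num_phases) with accuracies in percent.
    """
    acc = np.asarray(acc, dtype=float)
    # Per-phase gap between the best and worst class order.
    opd = acc.max(axis=0) - acc.min(axis=0)
    return float(opd.max()), float(opd.mean())

# Toy accuracies for three class orders over three phases (not from the paper).
toy_acc = [
    [95.0, 94.0, 93.0],
    [96.0, 92.0, 93.5],
    [95.5, 93.0, 92.5],
]
mopd, aopd = order_disparity(toy_acc)
print(f"MOPD = {mopd:.2f}, AOPD = {aopd:.2f}")
```

Lower values of both quantities mean that accuracy changes less when the class arrival order changes.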

Table 1. Dataset splits and incremental settings used in the experiments.

Dataset	Total Classes	Total Phases	Base Phase	Incremental Phase
CIFAR-100	100	10	10 classes	10 classes/phase
CUB200	200	10	20 classes	20 classes/phase
OmniBenchmark	300	10	30 classes	30 classes/phase

Table 2. Selected fusion weights of the group-recognition router at the final incremental phase on different datasets.

Dataset	Validation Split	α for KNN	β for RF	γ for LGBM
CIFAR-100	80/20 stratified	0.71	0.01	0.28
CUB200	80/20 stratified	0.68	0.25	0.07
OmniBenchmark	80/20 stratified	0.5	0.5	0
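
The weights in Table 2 sum to one on each dataset, which suggests a convex combination of the three routers' group probabilities. The following is a minimal sketch under that assumption, together with one plausible form of the soft fusion with analytic-head scores described in the abstract. The function names, the softmax normalization of head scores, and the data layout (`head_scores`, `head_classes`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def fuse_router_probs(p_knn, p_rf, p_lgbm, alpha, beta, gamma):
    """Convex combination of per-router group probabilities, each of shape (n, G)."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-6, "weights are assumed to sum to 1"
    return alpha * p_knn + beta * p_rf + gamma * p_lgbm

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_fusion_predict(group_probs, head_scores, head_classes):
    """Weight each head's class distribution by its group probability.

    group_probs: (n, G) fused router output.
    head_scores: list of G arrays; head_scores[g] has shape (n, C_g).
    head_classes: list of G integer arrays with the global class IDs of each head
                  (groups are assumed to be disjoint).
    Returns the predicted global class ID per sample, shape (n,).
    """
    n = group_probs.shape[0]
    num_classes = 1 + max(int(c.max()) for c in head_classes)
    fused = np.zeros((n, num_classes))
    for g, (scores, classes) in enumerate(zip(head_scores, head_classes)):
        fused[:, classes] = group_probs[:, [g]] * softmax(scores)
    return fused.argmax(axis=1)
```

With the CIFAR-100 weights from Table 2, a call would look like `fuse_router_probs(p_knn, p_rf, p_lgbm, 0.71, 0.01, 0.28)`, where the three probability arrays come from whatever KNN, RF, and LGBM routers have been fitted. Because the group probabilities enter as soft weights, a sample is not lost outright when the router's top group is wrong, provided the correct head scores it confidently.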

Table 3. Comparison of phase-wise accuracy (%) and final average forgetting (AF,%) of different methods on the CIFAR-100 dataset. “No Raw Replay” indicates whether the method avoids storing or replaying raw historical images. √ indicates that the method does not store or replay raw historical images, whereas × indicates that the method relies on raw historical images or exemplars. ↓ indicates that a lower value is better.

Model	No Raw Replay	Phase: 0	1	2	3	4	5	6	7	8	9	AF↓
L2p	√	97.40	96.10	95.13	93.94	92.87	92.00	91.25	90.50	89.86	89.31	6.87
CODA-Prompt	√	99.40	97.92	96.88	95.80	94.82	93.75	92.87	92.15	91.57	91.02	5.49
FOSTER	×	98.80	97.80	96.79	95.61	94.63	93.94	93.41	92.74	92.07	91.61	1.63
DualPrompt	√	96.40	94.92	93.75	92.58	91.34	90.34	89.56	88.70	87.94	87.38	7.27
MEMO	×	98.50	96.85	95.69	94.32	92.97	91.62	90.62	89.42	88.14	86.97	21.31
GDDSG	√	96.39	95.24	95.30	94.82	94.69	92.71	94.05	93.36	92.23	93.29	0.72
SimpleCIL	√	93.80	92.95	92.33	91.47	90.51	89.54	88.81	87.95	87.19	86.52	6.61
iCaRL	×	99.20	98.02	97.04	96.19	95.17	94.18	93.29	92.10	90.69	89.37	22.37
WA	×	98.90	97.82	97.07	96.36	95.42	94.43	93.56	92.33	91.15	90.09	18.09
BiC	×	98.40	97.62	96.88	96.10	95.14	94.14	93.28	92.02	90.76	89.69	18.59
DER++	×	99.10	97.77	96.17	94.31	91.82	89.73	88.02	85.82	83.71	81.65	39.19
FeTrIL	√	98.70	97.07	95.96	94.73	93.50	92.44	91.61	90.75	89.87	89.18	11.57
ORACIL	√	93.90	93.00	96.27	95.83	96.42	95.93	96.01	95.71	95.81	95.77	0.16

Table 4. Comparison of phase-wise accuracy (%) and final average forgetting (AF,%) of different methods on the CUB200 dataset. “No Raw Replay” indicates whether the method avoids storing or replaying raw historical images. √ indicates that the method does not store or replay raw historical images, whereas × indicates that the method relies on raw historical images or exemplars. ↓ indicates that a lower value is better.

Model	No Raw Replay	Phase: 0	1	2	3	4	5	6	7	8	9	AF↓
L2p	√	98.79	92.45	89.46	87.39	84.99	82.70	81.02	79.59	78.44	77.28	13.55
CODA-Prompt	√	96.60	93.33	89.48	87.26	86.08	85.05	84.08	83.28	82.47	81.85	7.45
FOSTER	×	97.98	92.00	89.54	86.28	84.57	83.65	83.04	82.34	81.72	81.33	1.39
DualPrompt	√	98.79	94.04	91.44	88.90	86.84	84.98	83.49	82.10	80.84	79.67	10.75
MEMO	×	96.76	96.30	95.41	94.39	93.52	92.75	92.11	91.40	90.82	90.25	7.48
GDDSG	√	92.30	91.72	91.22	93.01	93.28	93.12	91.75	91.36	92.01	92.55	1.01
SimpleCIL	√	97.57	96.47	94.87	93.35	92.45	91.30	90.21	89.27	88.46	87.65	7.66
iCaRL	×	98.38	96.98	95.50	94.21	93.30	92.46	91.72	90.92	90.36	89.76	11.07
WA	×	97.57	95.59	94.03	92.61	91.46	90.44	89.39	88.58	87.90	87.15	11.18
BiC	×	96.36	95.91	93.95	92.39	91.25	90.32	89.32	88.53	87.89	87.27	9.39
DER++	×	97.17	94.07	89.69	85.88	83.65	80.96	79.27	77.16	74.95	72.52	50.55
FeTrIL	√	95.95	94.01	92.43	90.95	90.03	89.12	88.38	87.70	87.19	86.75	4.34
ORACIL	√	89.88	92.01	92.71	93.68	93.58	94.25	92.66	93.79	93.81	93.86	0.77

Table 5. Comparison of phase-wise accuracy (%) and final average forgetting (AF,%) of different methods on the OmniBenchmark dataset. “No Raw Replay” indicates whether the method avoids storing or replaying raw historical images. √ indicates that the method does not store or replay raw historical images, whereas × indicates that the method relies on raw historical images or exemplars. ↓ indicates that a lower value is better.

Model	No Raw Replay	Phase: 0	1	2	3	4	5	6	7	8	9	AF↓
L2p	√	89.50	85.82	83.33	81.01	79.35	77.59	76.24	74.95	73.70	72.60	17.35
CODA-Prompt	√	90.13	87.04	79.26	75.38	73.84	73.30	72.86	72.22	71.57	70.98	13.67
FOSTER	×	93.67	93.16	91.44	90.19	88.87	87.75	86.80	85.85	84.89	84.16	4.39
DualPrompt	√	88.67	85.41	83.41	81.67	80.15	78.64	77.18	75.67	74.35	73.27	15.64
MEMO	×	93.17	90.99	89.49	87.67	85.82	84.09	82.58	80.98	79.61	78.40	26.38
GDDSG	√	85.66	87.32	88.43	87.97	87.47	87.08	85.77	85.21	85.92	86.78	1.07
SimpleCIL	√	88.17	87.08	85.64	84.09	82.73	81.37	80.21	79.11	78.13	77.33	8.85
iCaRL	×	92.50	92.49	90.56	87.86	85.65	83.55	81.85	79.88	78.11	76.36	37.13
WA	×	94.50	92.99	90.80	88.09	85.63	83.32	81.31	79.16	77.23	75.44	37.68
BiC	×	93.17	91.45	89.35	87.25	85.52	83.88	82.56	81.10	79.88	78.80	24.73
DER++	×	95.00	91.58	86.89	82.23	77.91	74.16	70.9	67.30	64.61	61.91	63.91
FeTrIL	√	94.33	91.24	88.88	86.96	85.26	83.76	82.43	81.23	80.18	79.29	14.74
ORACIL	√	84.67	88.41	89.82	88.60	89.28	88.45	87.25	86.63	87.41	88.12	1.04

Table 6. Ablation study of analytic heads, recursive updates, and the dynamic grouping router in ORACIL. √ indicates that the corresponding module is selected or enabled.

Dataset	Analytic Head	Recursive Update	Dynamic Grouping+Router	$A_{N}$	$F_{N}$	AOPD	MOPD
CIFAR-100	√			9.75	97.49	1.03	3.2
CIFAR-100	√	√		87.96	4.46	2.21	3.50
CIFAR-100	√	√	√	95.77	0.16	1.78	4.9
CUB200	√			9.81	97.44	1.4	4.5
CUB200	√	√		88.25	4.39	3.78	7.57
CUB200	√	√	√	93.86	0.77	3.22	7.72
OmniBenchmark	√			9.44	94.37	1.46	4.82
OmniBenchmark	√	√		76.97	7.82	4.36	8.09
OmniBenchmark	√	√	√	88.12	1.04	4.7	10.7

Table 7. Performance of ORACIL with different pre-trained backbones on different datasets. ↑ indicates that a higher value is better, whereas ↓ indicates that a lower value is better.

Dataset	Metric	ViT-B/16	ResNet-152	ResNet-50
CIFAR-100	χ (↓)	12	23	43
CIFAR-100	$A_{N}$ (↑)	95.77%	89.02%	83.61%
CIFAR-100	$F_{N}$ (↓)	0.16%	0.44%	0.73%
CUB200	χ (↓)	13	29	39
CUB200	$A_{N}$ (↑)	93.86%	74.12%	73.74%
CUB200	$F_{N}$ (↓)	0.77%	1.37%	1.89%
OmniBenchmark	χ (↓)	40	39	39
OmniBenchmark	$A_{N}$ (↑)	88.12%	77.78%	78.75%
OmniBenchmark	$F_{N}$ (↓)	1.04%	1.37%	2.67%

Table 8. Effect of Similarity Metrics on Conflict-Graph Construction.

Dataset	Group Metric	$A_{N}$	$F_{N}$	Number of Groups
CIFAR-100	Cosine similarity	95.77	0.16	12
CIFAR-100	Euclidean distance	94.39	0.48	64
CIFAR-100	Mahalanobis distance	93.90	0.77	70
CUB200	Cosine similarity	93.86	0.77	13
CUB200	Euclidean distance	93.46	0.84	19
CUB200	Mahalanobis distance	93.21	1.28	15
OmniBenchmark	Cosine similarity	88.12	1.04	40
OmniBenchmark	Euclidean distance	86.71	1.44	84
OmniBenchmark	Mahalanobis distance	86.28	1.35	67

Table 9. Sensitivity Analysis of Threshold Strategies in Conflict-Graph Construction.

Dataset	Threshold Strategy	$A_{N}$	$F_{N}$	Number of Groups
CIFAR-100	Mean similarity	95.77	0.16	12
CIFAR-100	Median similarity	95.83	0.27	11
CIFAR-100	Fixed threshold 0.8	97.05	0.25	10
CIFAR-100	Adaptive threshold	94.48	0.48	21
CUB200	Mean similarity	93.86	0.77	13
CUB200	Median similarity	93.27	0.89	12
CUB200	Fixed threshold 0.8	93.15	0.82	11
CUB200	Adaptive threshold	93.35	0.55	16
OmniBenchmark	Mean similarity	88.12	1.04	40
OmniBenchmark	Median similarity	88.11	1.37	40
OmniBenchmark	Fixed threshold 0.8	88.01	1.07	36
OmniBenchmark	Adaptive threshold	87.80	1.31	42
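
As a rough illustration of the configuration that Tables 8 and 9 list first (cosine similarity with a mean-similarity threshold), the sketch below builds a conflict graph over class centroids and places each class in the first group with which it has no conflict. The greedy first-fit assignment, the function name `group_classes`, and the toy centroids are assumptions made for illustration; the paper's own grouping procedure is defined in Section 3.2, operates incrementally on newly arriving classes, and may differ.

```python
import numpy as np

def group_classes(centroids):
    """Greedy grouping on a cosine-similarity conflict graph.

    centroids: (C, d) array with one non-zero mean feature vector per class, C >= 2.
    Returns a list of groups, each a list of class indices.
    """
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Threshold: mean similarity over distinct class pairs.
    upper = np.triu_indices(len(centroids), k=1)
    threshold = sim[upper].mean()
    # Two classes conflict when they are more similar than the threshold.
    conflict = sim > threshold
    np.fill_diagonal(conflict, False)

    groups = []
    for c in range(len(centroids)):
        for members in groups:
            # Join the first group that contains no conflicting class.
            if not conflict[c, members].any():
                members.append(c)
                break
        else:
            groups.append([c])
    return groups

# Toy centroids: classes 0 and 1 are near-duplicates, as are classes 2 and 3.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(group_classes(toy))
```

In the toy example the two near-duplicate pairs are split across groups, so each group's head only has to separate dissimilar classes. A lower threshold creates more conflicts and therefore more groups, which is one way to read the group counts reported in the two tables.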

Table 10. Final-Phase Runtime and Storage Overhead of the Group-Recognition Router.

Dataset	Final Groups	Stored Distance Representations (MB)	GRR Update Time (s)	Analytic Head Update Time (s)	Inference Time per Image (ms)
CIFAR-100	12	19.07	72.54	0.25	0.29
CUB200	13	7.19	37.95	0.11	0.53
OmniBenchmark	40	102.65	336.31	0.50	0.60

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, G.; Sun, H.; Li, W.; Dong, Y. ORACIL: Conflict-Graph-Based Order-Robust Analytic Class-Incremental Learning. Electronics 2026, 15, 2941. https://doi.org/10.3390/electronics15132941

