Article

A Gumbel-Based Selection Data-Driven Evolutionary Algorithm and Its Application to Chinese Text-Based Cheating Official Accounts Mining

1 School of Culture and Tourism, Kaifeng University, Kaifeng 475003, China
2 College of Artificial Intelligence, Nankai University, Tianjin 300350, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 643; https://doi.org/10.3390/a18100643
Submission received: 1 September 2025 / Revised: 4 October 2025 / Accepted: 10 October 2025 / Published: 12 October 2025
(This article belongs to the Special Issue Evolutionary and Swarm Computing for Emerging Applications)

Abstract

Data-driven evolutionary algorithms (DDEAs) are essential computational intelligence methods for solving expensive optimization problems (EOPs). The management of surrogate models for fitness predictions, particularly the selection and integration of multiple models, is key to their success. However, how to select and integrate models to obtain accurate predictions remains a challenging issue. This paper proposes a novel Gumbel-based selection DDEA named GBS-DDEA, which innovates in both model selection and model integration. First, a Gumbel-based selection (GBS) strategy is proposed to probabilistically choose surrogate models. GBS employs the Gumbel distribution to strike a balance between exploiting high-accuracy models and exploring others, providing a more principled and robust selection strategy than conventional probability sampling. Second, a ranking-based weighting ensemble (RBWE) strategy is developed. Instead of relying on absolute error metrics that can be sensitive to outliers, RBWE assigns integration weights based on the models’ relative performance rankings, leading to a more stable and reliable ensemble prediction. Comprehensive experiments on various benchmark problems and a Chinese text-based cheating official accounts mining problem demonstrate that GBS-DDEA consistently outperforms several state-of-the-art DDEAs, confirming the effectiveness and superiority of the proposed dual-strategy approach.

1. Introduction

In recent decades, data-driven evolutionary algorithms (DDEAs), as a significant subfield of evolutionary computation (EC), have shown remarkable effectiveness in tackling complex optimization problems through the use of surrogate models [1,2,3]. Conventional evolutionary algorithms (EAs) rely heavily on fitness evaluations (FEs) to identify promising solutions. However, their efficiency deteriorates drastically when FEs are scarce or computationally expensive, which is typical in many real-world applications. Such problems are commonly referred to as expensive optimization problems (EOPs), where each FE may require substantial computational or monetary costs [4,5,6]. To alleviate this limitation, DDEAs construct surrogate models trained from previously evaluated solutions to approximate FEs, thereby facilitating efficient search under restricted evaluation budgets. In scenarios constrained by time, computational resources, or data accessibility, obtaining additional FEs during optimization is often impractical or even infeasible [7,8,9,10]. In such cases, offline DDEAs are particularly appealing, as they rely solely on existing data to build surrogates and conduct the entire optimization process through surrogate-based evaluations. Thus, DDEAs offer a cost-effective and practical alternative to traditional EAs for solving EOPs. Nevertheless, their effectiveness is fundamentally dependent on the accurate design and exploitation of surrogate models, which remains a central challenge in their further advancement.
To date, efforts to advance DDEAs have largely progressed along two major directions. The first seeks to improve the quality and availability of data, as both are crucial for the accuracy and reliability of surrogate models. For instance, data noise can severely degrade surrogate performance, making preprocessing a necessary step [7]. When dealing with complex data structures, advanced learning techniques can reveal latent patterns and enhance model robustness. A persistent challenge in DDEAs is the small size of the evaluated dataset, as larger datasets generally yield more accurate and generalizable surrogates [11,12]. Consequently, extensive research has focused on expanding data availability and optimizing the use of limited data. Representative strategies include local smoothing methods and synthetic data generation techniques [13,14], which aim to augment datasets and enhance the learning process.
The second direction emphasizes the construction of more accurate surrogate models and the development of effective model management strategies (MMSs) to regulate their use. A variety of surrogate modeling techniques have been investigated, including Gaussian process [15], decision trees [16], and radial basis function neural networks (RBFNNs) [17]. Ensemble learning has also proven effective, as aggregating multiple base models often enhances predictive performance [18,19,20]. At the same time, MMSs play a pivotal role in data-driven evolutionary optimization (DDEO), particularly under data-limited conditions. Such strategies dynamically manage model usage during optimization, such as sample selection [20,21] and knowledge transfer across related tasks [22,23,24]. Together, these approaches aim to maximize data utility, reduce overfitting, and improve the optimization performance of DDEAs under constrained evaluation budgets.
Despite substantial progress, achieving an effective balance between prediction accuracy and generalization remains a central challenge in the design of DDEAs, which limits their practical applicability to EOPs. Selecting and integrating multiple models to enhance prediction accuracy and generalization ability remains a significant and challenging issue. To address this issue, this paper introduces a Gumbel-based selection DDEA (GBS-DDEA) that improves the robustness and efficiency of surrogate-assisted optimization. The proposed framework integrates two key strategies. First, a Gumbel-based selection (GBS) strategy is developed to probabilistically select promising surrogate models based on the Gumbel distribution, thereby reducing the risk of overfitting inherent in deterministic or greedy selection methods. Second, a ranking-based weighting ensemble (RBWE) method is designed to aggregate the selected models, where the ensemble weights are determined by relative performance rankings rather than absolute error values, leading to more stable and reliable fitness approximations.
The main contributions of this work are summarized as follows:
First, in model selection, the GBS strategy is proposed, which introduces the Gumbel distribution into accuracy scores to probabilistically select models. This provides a trade-off between surrogate accuracy and generalization capability.
Second, in model integration, the RBWE strategy is proposed, in which model weights are assigned according to relative ranking performance rather than absolute error values. This design enhances robustness against noisy or biased evaluations, improving stability in fitness approximation.
Third, in algorithm design, by integrating GBS and RBWE, a new algorithm, GBS-DDEA, is developed to tackle EOPs under strict evaluation budgets. The proposed method offers a principled and effective approach to surrogate management, improving optimization efficiency in complex problem scenarios.
Comprehensive experiments are conducted on well-established benchmark problems with varying dimensionalities. The results show that the proposed GBS-DDEA significantly outperforms cutting-edge DDEAs across different scenarios. This confirms the effectiveness and robustness of GBS-DDEA in handling EOPs.
The remainder of this paper is organized as follows: Section 2 provides a brief overview of related work on DDEAs. Section 3 details the proposed GBS-DDEA algorithm, including its core components (the GBS strategy and RBWE strategy) and overall framework. Section 4 presents the experimental setup, benchmark problems, and comparative results analysis. Finally, Section 5 concludes the paper and discusses potential future research directions.

2. Background and Related Work of DDEA

2.1. DDEA

In general, the fundamental principle of DDEAs is to exploit available data to reduce the number of required FEs by employing surrogate models, thereby guiding the evolutionary search more efficiently [25]. By constructing accurate surrogates from previously evaluated solutions, DDEAs can substitute costly real evaluations with surrogate predictions, substantially lowering computational and financial overhead. Consequently, DDEAs typically achieve superior performance over conventional EAs in solving expensive and computationally intensive optimization problems. However, the effectiveness of DDEAs heavily depends on striking a balance between surrogate accuracy and generalization ability. That is, overly accurate but narrowly focused surrogates may lead to overfitting, whereas overly generalized surrogates risk providing misleading fitness estimates. To address this challenge, this paper introduces a GBS-DDEA, which provides a principled framework to enhance both predictive accuracy and generalization in surrogate-assisted evolutionary optimization.
From an architectural standpoint, a typical DDEA framework is composed of two core modules: the Surrogate Model Management (SMM) module and the Data-Driven Evolutionary Optimization (DDEO) module. The SMM is dedicated to building, maintaining, and adaptively updating surrogate models via MMSs to provide reliable approximations of the true objective function. Meanwhile, the DDEO leverages these surrogates within the evolutionary process to guide solution generation, evaluation, and selection [19]. Importantly, the SMM can iteratively refine the surrogate models by incorporating newly collected information and feedback from the DDEO, thereby improving their accuracy and adaptability over time.
Depending on whether new real FEs can be performed during optimization, DDEAs are generally categorized into online and offline variants. In online DDEAs, a restricted number of real FEs are allowed throughout the evolutionary process. The information and knowledge from these evaluations can be exploited to progressively enhance surrogate accuracy, making online approaches particularly effective under limited evaluation budgets. By contrast, offline DDEAs construct surrogates entirely from pre-existing data and do not permit any additional real evaluations during the search. This offline paradigm is typically applied in scenarios where evaluations are prohibitively expensive or practically impossible to obtain.
Although online and offline DDEAs differ in their operational mechanisms, they share the same overarching goal: to alleviate dependence on costly evaluations by fully exploiting available data to effectively guide the optimization process.

2.2. Related Work on Enhanced DDEAs

Numerous strategies have been proposed to enhance DDEAs [4]. Given this work’s focus on offline data-driven optimization scenarios, the following review concentrates on the relevant offline DDEA literature, highlighting the differences and drawbacks of existing methods.
Offline DDEAs are restricted to a static, pre-collected dataset and cannot perform new FEs during optimization. This constraint makes their performance critically dependent on the efficient exploitation of the available information. Both the volume and the fidelity of this static dataset are paramount, as they directly govern surrogate model accuracy and, ultimately, the success of the optimization [4]. As a result, the inherent limitations of a fixed dataset can significantly bias results. For instance, Lin et al. [26] demonstrated that elevated noise levels in the data considerably degrade DDEAs’ performance. To mitigate such data quality issues, including noise, incompleteness, and class imbalance, specialized preprocessing techniques are often investigated and employed. For example, a local regression method has been applied to denoise data in the context of blast furnace optimization [8].
To reduce the computational overhead associated with large-scale or redundant datasets, techniques such as clustering and data mining are frequently employed. The work by Wang et al. [9] in trauma system design utilized clustering to achieve a 90% reduction in computational cost. Conversely, in data-scarce scenarios where building accurate surrogate models is challenging, synthetic data generation presents a viable alternative. For instance, Li et al. [14] developed a localized generation method that enhanced surrogate accuracy with promising outcomes. Data perturbation strategies [13] similarly serve to create multiple diverse and useful data subsets. In addition, Huang et al. [20] propose a TT-DDEA framework, which leverages semi-supervised learning to augment the training dataset. A critical caveat, however, is that synthetic data can introduce inaccuracies if the generative model is not precisely aligned with the underlying problem landscape [14].
Beyond data-centric methods, advancements in surrogate model accuracy have been driven by sophisticated learning techniques. A prominent direction is ensemble learning, which integrates multiple base models, either homogeneous or heterogeneous, to enhance predictive robustness. For example, the DDEA-SE framework [19] employs a bagging strategy, constructing thousands of radial basis function neural networks (RBFNNs) and selectively aggregating hundreds for fitness approximation. Conversely, BDDEA [14] utilizes a boosting approach to train and integrate models sequentially, thereby incrementally refining predictive accuracy. To enhance adaptability, a dynamic ensemble method proposed in [27] selects the most reliable surrogate model based on the immediate optimization context. Other strategies leverage diversity in base learners. The SRK-DDEA algorithm [11], for instance, integrates multiple RBFNNs with distinct radial basis functions and employs surrogate-assisted sorting to rank solutions. These efforts highlight the crucial role of ensemble design and learner diversity in enhancing surrogate performance and overall optimization efficacy.
In addition, other innovative paradigms are emerging. Huang et al. [28] developed CL-DDEA, which fuses contrastive learning with ensemble techniques to build surrogates that more effectively capture inter-individual relationships. Similarly, Hao et al. [29] investigated a relation-based model to circumvent expensive function evaluations, while Sun et al. [30] explored the application of symbolic regression for surrogate modeling in DDEO.
In contrast to the aforementioned techniques, the proposed GBS-DDEA introduces a novel GBS and an RBWE for model selection and combination. This approach is designed to achieve a superior balance between predictive accuracy and generalization capability.

3. The Proposed GBS-DDEA

3.1. The Framework of GBS-DDEA

Figure 1 presents the overall framework of GBS-DDEA, which consists of two key components: the DDEO process and the GBS-based evaluation module. The DDEO process follows a conventional EC workflow comprising initialization, variation, FE, and selection operations. As a result, GBS-DDEA remains broadly compatible with diverse EC algorithms and can readily incorporate various optimizers. The major distinction from standard EC frameworks lies in its evaluation process: instead of relying on computationally expensive FEs, GBS-DDEA employs a Gumbel selection-based ensemble model to predict fitness values. This surrogate strategy is designed to improve predictive reliability and provide more effective guidance for the evolutionary search, thereby reducing the dependence on costly FEs.
The main novelty of GBS-DDEA resides in its evaluation module, which integrates GBS and RBWE strategies. This module first trains multiple surrogate models on the available evaluated data and then estimates their prediction errors. These errors are converted into selection probabilities, which guide the stochastic selection of models for ensemble prediction. Unlike conventional ensemble approaches that typically adopt greedy or deterministic rules, the proposed GBS strategy introduces controlled randomness into model selection. This randomness enhances robustness and mitigates overfitting, thereby strengthening the overall performance of the surrogate. Supported by this surrogate-driven evaluation, the DDEO process iteratively evolves the population until the termination criterion is met, at which point the final solution(s) are produced. The following sections provide a detailed description of model training, the GBS strategy, and the RBWE strategy.

3.2. Model Training

As ensemble-based approaches have shown strong potential in addressing EOPs, the proposed GBS-DDEA employs a multiple-model ensemble for fitness prediction. The key new component distinguishing GBS-DDEA from other ensemble-based DDEAs lies in its GBS strategy. Instead of relying on deterministic or greedy rules, this strategy assigns higher selection probabilities to models with lower prediction errors, thereby increasing both predictive accuracy and generalization capability.
To implement this, a total of R surrogate models is constructed during the initialization phase of GBS-DDEA. The procedure is outlined in Algorithm 1. Specifically, for each surrogate model, a training subset is generated through randomized sampling of the original evaluated dataset. For each data point, a random value is drawn from a uniform distribution over [0, 1]. If the sampled value is less than or equal to 0.5, the data point is included in the training subset of the corresponding model.
Considering the trade-off between computational efficiency and predictive capability, RBFNNs are adopted as the base learners. Each training subset is used to train an RBFNN, resulting in R distinct prediction models for subsequent ensemble use. Upon completion of training, the predictive quality of each model is assessed by computing its mean absolute error (MAE) over the entire original dataset. For model i, the corresponding MAE, denoted as ai, is given by:
a_i = \frac{1}{|Q|} \sum_{x \in Q} \left| f(x) - U_i(x) \right| \quad (1)
where Q denotes the original dataset, f(x) represents the true fitness of solution x, and Ui(x) is the fitness predicted by model i.
Note that the exclusive use of RBFNNs in the ensemble can influence both the diversity and generalization of the surrogate model pool. Since all base learners are of the same type, this homogeneity can limit inter-model diversity, even though such diversity is typically beneficial for ensemble robustness. However, the proposed GBS-DDEA compensates for potential diversity issues through randomized sampling during training and probabilistic model selection via the Gumbel-based strategy, promoting varied model choices over iterations. Furthermore, using RBFNNs, known for their strong approximation capabilities, helps enhance overall prediction accuracy and generalization, especially when the models are trained on different data subsets. The ensemble strategy of weighting models based on relative rankings rather than absolute errors also bolsters stability and robustness, ensuring that the combined prediction benefits from the strengths of individual RBFNNs despite their homogeneity. In summary, while relying solely on RBFNNs might reduce intrinsic diversity, the random sampling of training data, probabilistic model selection, and ranking-based weighting collectively mitigate this effect, enabling the ensemble to maintain good generalization and predictive performance.
Algorithm 1: Model Training
Input:   Q—the dataset for training the model;
              R—the number of initial models.
Output: U1, U2, …, UR—the R trained models;
1: Begin
2:       For i = 1 to R Do
3:               Initialize an empty set E;
4:               For every x in Q Do
5:                     Sample r in [0, 1];
6:                     If the sampled r ≤ 0.5 Then
7:                              Add x into E;
8:                     End If
9:               End For
10:             Train Ui with E;
11:             Calculate the error of Ui on Q as ai;
12: End For
13: End
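As a concrete illustration of Algorithm 1, the following Python sketch trains R surrogates on random half-subsets of the offline data and scores each one by its MAE on the full dataset (Equation (1)). It is a minimal sketch under stated assumptions: scipy's RBFInterpolator is used as a stand-in for the RBFNN base learner, and the function names and toy objective are illustrative rather than part of the original implementation.

import numpy as np
from scipy.interpolate import RBFInterpolator  # stand-in for an RBFNN base learner

def train_surrogates(X, y, R, rng):
    """Algorithm 1 (sketch): train R surrogates on random subsets of (X, y)
    and return the models together with their MAE on the full offline dataset."""
    models, errors = [], []
    for _ in range(R):
        mask = rng.random(len(X)) <= 0.5           # each point is kept with probability 0.5
        while mask.sum() <= X.shape[1]:            # guard: enough points to fit the model
            mask = rng.random(len(X)) <= 0.5
        model = RBFInterpolator(X[mask], y[mask])
        a_i = np.mean(np.abs(y - model(X)))        # Equation (1): MAE over the whole dataset
        models.append(model)
        errors.append(a_i)
    return models, np.array(errors)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 10
    X = rng.uniform(-5.12, 5.12, size=(11 * D, D))  # offline dataset of 11*D evaluated points
    y = np.sum(X ** 2, axis=1)                      # toy objective standing in for an expensive FE
    models, errors = train_surrogates(X, y, R=20, rng=rng)
    print("mean surrogate MAE:", errors.mean())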

3.3. GBS

The GBS strategy is introduced to identify a subset of models that achieves both high accuracy and sufficient diversity in ensemble prediction. The adoption of the GBS strategy in this work is primarily motivated by its ability to balance exploration and exploitation in surrogate model selection, which is crucial for improving surrogate-assisted optimization, especially under limited evaluation budgets. The pseudo-code of GBS is shown in Algorithm 2. First, unlike deterministic or greedy methods that may prematurely favor certain models, GBS introduces a probabilistic mechanism rooted in the Gumbel distribution. This allows for a controlled random exploration of models based on their estimated prediction errors, thus enhancing robustness and reducing the risk of overfitting. Second, GBS explicitly considers the models’ prediction errors, assigning selection probabilities that favor more accurate models but still retain a non-zero chance for less accurate ones. This balance ensures that promising models are leveraged while maintaining diversity among models, which is essential for generalization in surrogate modeling. In summary, the Gumbel-based selection strategy was chosen for its principled, probabilistic approach to model selection, which effectively balances accuracy and diversity in surrogate-based evolutionary optimization.
The GBS jointly considers two essential aspects: the reliability of each individual model and the diversity of the selected models. Model reliability is assessed by the prediction error ai, which provides a direct measure of accuracy. To promote diversity, GBS employs a probabilistic sampling scheme that biases the selection process toward models with smaller errors, while still granting models with larger errors a nonzero probability of inclusion. In this way, the method achieves a balance between exploiting accurate predictors and exploring diverse alternatives. The probability of selecting model i, denoted as pi, is given by
p_i = v_i + \mathrm{Gumbel}(0, 1) \quad (2)
where Gumbel(0,1) is a random variable following the standard Gumbel distribution. vi is the normalized value based on the prediction error, which can be calculated as:
v_i = -\log a_i \quad (3)
Consequently, a larger prediction error ai results in a smaller score vi and hence a lower chance of selection, thereby favoring models with higher accuracy.
Based on pi, the L models with the largest pi are selected for ensemble prediction. Furthermore, by incorporating the Gumbel distribution into the sampling process, the perturbed score pi ensures that accurate models are more likely to be chosen, while less accurate models still retain a nonzero chance of selection. This mechanism strikes a balance between exploiting reliable predictors and maintaining the generalization ability of the ensemble.
Algorithm 2: GBS
Input:     U1, U2, …, UR—the trained models;
                a1, a2, …, aR—the model prediction errors;
                L—the number of selected models;
Output:     I—the set containing the selected model index;
1: Begin
2:        For j = 1 to R Do
3:               Use Equation (2) to calculate pj;
4:        End For
5:        Select the L models with the largest pj and store their indices in I;
6: End
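A minimal NumPy sketch of this selection step is given below: the negative-log error scores are perturbed with standard Gumbel noise (Equations (2) and (3)), and the L models with the largest perturbed scores are kept. The variable names and the toy error values are illustrative assumptions.

import numpy as np

def gumbel_based_selection(errors, L, rng):
    """Algorithm 2 (sketch): probabilistically select L model indices.
    More accurate models (smaller MAE) receive larger scores, but every
    model keeps a nonzero chance of selection via the Gumbel perturbation."""
    v = -np.log(np.asarray(errors, dtype=float))          # Equation (3)
    p = v + rng.gumbel(loc=0.0, scale=1.0, size=len(v))   # Equation (2)
    return np.argsort(p)[::-1][:L]                        # indices of the L largest p_i

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errors = rng.uniform(0.1, 2.0, size=2000)             # pretend MAEs of R = 2000 surrogates
    selected = gumbel_based_selection(errors, L=300, rng=rng)
    # the selected models are more accurate on average, yet the choice stays stochastic
    print(errors[selected].mean(), "<", errors.mean())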

3.4. RBWE

The RBWE method integrates the predictions of the selected models to generate an ensemble output, while explicitly accounting for the accuracy of each individual model via ranking. For a given input x and a selected subset of models I, the ensemble prediction H(x) is formulated as
H(x) = \sum_{i \in I} w_i U_i(x) \quad (4)
where the wi can be obtained with the ranking weights ti, as:
w_i = \frac{t_i}{\sum_{j \in I} t_j} \quad (5)
where the ti is computed as:
t_i = L - \mathrm{rank}(i \mid I) + 1 \quad (6)
where rank(i|I) is the ranking of model i in I based on its prediction error ai; the smaller the ai, the smaller the rank(i|I). It is important to note that only the ranking values ti associated with the models in I are used in the computation of wi. This weighting scheme ensures that models with higher confidence have a greater influence on the ensemble prediction, thereby enhancing both the accuracy and the robustness of the overall outcome.
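The sketch below illustrates one way to realize Equations (4)-(6): the selected models are ranked by their MAE, linear ranking weights are normalized, and the ensemble output is the weighted sum of the individual predictions. The callable-model interface and helper names are assumptions made for illustration.

import numpy as np

def rbwe_predict(x, models, errors, selected):
    """RBWE (sketch): rank-weighted ensemble prediction H(x) over the selected models."""
    sel_errors = np.asarray(errors)[selected]
    ranks = np.empty(len(selected), dtype=int)
    ranks[np.argsort(sel_errors)] = np.arange(1, len(selected) + 1)  # rank 1 = smallest error
    t = len(selected) - ranks + 1              # Equation (6): t_i = L - rank(i|I) + 1
    w = t / t.sum()                            # Equation (5): normalized ranking weights
    preds = np.array([models[i](x) for i in selected])               # U_i(x)
    return float(np.dot(w, preds))             # Equation (4): H(x) = sum_i w_i U_i(x)

if __name__ == "__main__":
    # toy models: each "model" is a callable returning a scalar prediction
    models = [lambda x, b=b: float(np.sum(x ** 2)) + b for b in np.linspace(-1.0, 1.0, 5)]
    errors = [0.9, 0.2, 0.5, 1.4, 0.7]
    print(rbwe_predict(np.ones(10), models, errors, selected=[0, 1, 2]))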

3.5. The Entire GBS-DDEA

This section presents the complete GBS-DDEA, with its pseudocode outlined in Algorithm 3. Similar to conventional DDEAs, the GBS-DDEA consists of several key stages. The process begins with initialization, which includes both population generation and surrogate model construction. Specifically, R surrogate models are built from the initially evaluated dataset, as detailed in Algorithm 1, and the prediction error ai of each model is computed using the original data to support subsequent model selection via GBS. After initialization, the algorithm enters the DDEO phase, which iteratively executes three steps: (1) model selection using GBS, (2) offspring generation through variation operators (e.g., crossover and mutation), and (3) surrogate-assisted fitness evaluation and selection. In this paper, the crossover and mutation use the simulated binary crossover operator and the polynomial mutation operator, respectively. These steps are repeated until the termination criterion is satisfied. In each generation, L models are probabilistically chosen using GBS to form an ensemble, which is then employed to estimate the fitness of all individuals in the population. Based on these predictions, the most promising individuals are selected to advance to the next generation. Finally, once the termination condition is met, the algorithm outputs the best solution as predicted by the surrogate ensemble.
From the above, the GBS-DDEA incorporates two measures to prevent surrogate models from overfitting the offline dataset: First, during the initial model training phase, each surrogate model is trained on a randomly sampled subset of data points, where each data point is included in the training subset with a probability of 0.5. This randomized sampling introduces variability among the training datasets, promoting diversity and reducing the likelihood of overfitting to specific data patterns. Second, the GBS strategy probabilistically favors models with lower prediction errors, while incorporating controlled randomness. This stochastic selection process prevents over-reliance on potentially overfitted models and encourages exploration of models that may generalize better. These measures jointly act to mitigate overfitting and enhance the generalization capability of the surrogate ensemble.
Algorithm 3: GBS-DDEA
Input:       Q—the evaluated dataset;
                  R—the number of initial models;
                  L—the number of selected models for predictions;
Output:    xbest—the best solution;
1: Begin
2:     Get R models via Algorithm 1;
3:     Perform the initialization of population;
4:     While the algorithm does not meet the stop criteria Do
5:         L models are selected from R models via GBS;
6:         The predicted fitness of individuals is updated as Equation (4);
7:         Perform crossover and mutation to generate new individuals;
8:         The fitness of new individuals is predicted with Equation (4);
9:         The old and new individuals are combined, and the better individuals among
            them based on predicted fitness are selected to form a new population;
10:       The best solution in the new population is marked as xbest;
11:     End While
12: The xbest is output;
13: End
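For readers who prefer executable pseudocode, the condensed sketch below ties Algorithms 1-3 together in a single self-contained loop. It is a sketch under simplifying assumptions: scipy's RBFInterpolator stands in for the RBFNN surrogates, and uniform crossover with Gaussian mutation replaces the simulated binary crossover and polynomial mutation operators used in the paper; parameter values and names are illustrative.

import numpy as np
from scipy.interpolate import RBFInterpolator

def run_gbs_ddea(f, D, lb, ub, R=50, L=10, pop=30, gens=40, seed=0):
    rng = np.random.default_rng(seed)
    # ---- offline data and surrogate pool (Algorithm 1) ----
    X = rng.uniform(lb, ub, size=(11 * D, D))
    y = np.array([f(x) for x in X])
    models, errors = [], []
    for _ in range(R):
        mask = rng.random(len(X)) <= 0.5
        models.append(RBFInterpolator(X[mask], y[mask]))       # RBF stand-in for an RBFNN
        errors.append(np.mean(np.abs(y - models[-1](X))))      # Equation (1)
    errors = np.array(errors)

    def select_models():                                        # GBS (Algorithm 2)
        p = -np.log(errors) + rng.gumbel(size=R)                # Equations (2) and (3)
        return np.argsort(p)[::-1][:L]

    def ensemble(P, sel):                                       # RBWE (Equations (4)-(6))
        ranks = np.empty(L, dtype=int)
        ranks[np.argsort(errors[sel])] = np.arange(1, L + 1)
        w = (L - ranks + 1) / np.sum(L - ranks + 1)
        return np.sum([wi * models[i](P) for wi, i in zip(w, sel)], axis=0)

    # ---- surrogate-assisted evolution (Algorithm 3) ----
    P = rng.uniform(lb, ub, size=(pop, D))
    for _ in range(gens):
        sel = select_models()
        fit = ensemble(P, sel)
        mates = P[rng.integers(pop, size=pop)]
        cross = rng.random((pop, D)) < 0.5                      # uniform crossover (simplification)
        Q = np.where(cross, mates, P) + rng.normal(0.0, 0.1 * (ub - lb), (pop, D))
        Q = np.clip(Q, lb, ub)                                  # Gaussian mutation (simplification)
        both = np.vstack([P, Q])
        both_fit = np.concatenate([fit, ensemble(Q, sel)])
        P = both[np.argsort(both_fit)[:pop]]                    # keep the predicted-best individuals
    return P[0]                                                 # x_best under the surrogate ensemble

if __name__ == "__main__":
    best = run_gbs_ddea(lambda x: np.sum(x ** 2), D=10, lb=-5.12, ub=5.12)
    print("true fitness of returned solution:", np.sum(best ** 2))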

4. Experimental Studies

4.1. Experiment Setup

In this study, five well-established benchmark problems are employed to comprehensively evaluate the performance of the proposed GBS-DDEA. The test functions, denoted as TF1–TF5 in Table 1, are widely adopted in the DDEA literature due to their diverse landscape properties and varying levels of complexity [19]. Specifically, TF1 (Ellipsoid) is a unimodal function used to examine an algorithm’s efficiency in converging toward the global optimum. In contrast, TF2 (Rosenbrock), TF3 (Ackley), TF4 (Griewank), and TF5 (Rastrigin) are multimodal functions that pose substantial challenges due to their numerous local optima. Collectively, these functions encompass a broad spectrum of search landscapes, providing a rigorous assessment of GBS-DDEA’s exploration and exploitation capabilities.
In accordance with established practice, each benchmark function is tested under multiple dimensional settings (i.e., 10, 30, 50, and 100 variables). The global optimum of all functions is predefined and fixed, thereby ensuring consistency and fairness in performance evaluation.
To ensure fairness and reproducibility, the experimental settings are designed as follows. Latin hypercube sampling [31] is first employed to generate 11 × D data points across the entire search space, which serve as the initial real-evaluation dataset for each DDEA. Surrogate models are then constructed on this dataset to guide the evolutionary search toward promising regions. Under this offline data-driven setting, no algorithm is permitted to conduct additional real FEs beyond the initial 11 × D samples, reflecting practical scenarios with limited evaluation resources. The experimental environment utilizes a computer server with two CPUs, specifically Intel(R) Xeon(R) W5-3423, and 256 GB of RAM.
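As a small, hedged illustration of this data-generation step, the snippet below draws 11 × D Latin hypercube samples with scipy's quasi-Monte Carlo module and scales them to assumed search bounds; the bounds and the stand-in objective are illustrative, not the benchmark definitions.

import numpy as np
from scipy.stats import qmc

D = 30
lb, ub = -5.12, 5.12                        # assumed bounds for illustration
sampler = qmc.LatinHypercube(d=D, seed=0)
unit = sampler.random(n=11 * D)             # 11*D points in the unit hypercube
X = qmc.scale(unit, [lb] * D, [ub] * D)     # scale to the search space
y = np.sum(X ** 2, axis=1)                  # stand-in for the expensive objective
print(X.shape, y.shape)                     # (330, 30) (330,)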
To reduce stochastic variability, each algorithm is independently executed 25 times on every problem instance, and the mean results are reported. Statistical significance is assessed using the Wilcoxon rank-sum test at a significance level of α = 0.05. For ease of interpretation, the outcomes are annotated with three symbols: “+” indicates that GBS-DDEA significantly outperforms the compared DDEA, “≈” denotes no significant difference, and “–” represents significantly worse performance. These notations provide a concise and intuitive summary of the experimental comparisons, as detailed in the subsequent sections.

4.2. Compared Advanced Algorithms

To rigorously evaluate the effectiveness of the GBS-DDEA, seven state-of-the-art DDEAs are selected for comparison: DDEA-SE [19], BDDEA [14], DDEA-PES [13], MFITS [32], PS-GA [33], ELDR-SAHO [34], and TS-SADE [35]. These algorithms are representative of ensemble-based DDEAs that integrate multiple surrogate models, each employing distinct strategies for model selection. They therefore serve as strong baselines for benchmarking the performance of GBS-DDEA, which introduces the novel GBS strategy and RBWE-based ensemble predictions.
For a fair and consistent comparison, all competing algorithms are implemented using their official or publicly released codebases, thereby minimizing discrepancies due to implementation details. Moreover, GBS-DDEA employs the same evolutionary operators as those used in the baseline algorithms, ensuring that performance gains can be directly attributed to the proposed GBS and RBWE components rather than differences in evolutionary search operators. Additionally, the number of available initial surrogate models in all algorithms is uniformly set to 2000.

4.3. Comparison Study with DDEAs

The results in Table 2 compare GBS-DDEA with seven representative DDEA variants, namely DDEA-SE, BDDEA, DDEA-PES, MFITS, PS-GA, ELDR-SAHO, and TS-SADE, across four dimensional scenarios and five test functions. Overall, GBS-DDEA demonstrates competitive and generally superior performance.
As shown in Table 2, GBS-DDEA achieves competitive or superior results across most test functions and dimensions when compared with the seven representative DDEA variants. At low dimensions (D = 10), GBS-DDEA demonstrates comparable performance to most algorithms, with only minor differences, indicating robustness in simpler problem spaces. As dimensionality increases to D = 30 and D = 50, the advantages of GBS-DDEA become more evident. In these settings, it consistently outperforms BDDEA, DDEA-PES, and several advanced competitors such as MFITS and PS-GA, highlighting its enhanced adaptability to complex problems.
The superiority of GBS-DDEA is particularly clear in high-dimensional scenarios (D = 100). While methods like DDEA-SE and DDEA-PES exhibit significant performance degradation, GBS-DDEA consistently yields stable results, often securing the best or near-best outcomes. This robustness suggests that the Gumbel-based selection strategy effectively balances exploration and exploitation, preventing premature convergence while sustaining scalability.
In addition, although certain algorithms, such as MFITS and ELDR-SAHO, perform competitively on selected functions, GBS-DDEA achieves a more balanced performance across all test functions, avoiding the instability observed in some variants.
In summary, the analysis confirms that GBS-DDEA delivers more stable and reliable performance across various dimensions compared to existing DDEA variants. Its ability to maintain competitiveness in low-dimensional settings and to outperform other methods in higher dimensions shows its effectiveness and adaptability.

4.4. Component Analysis of GBS-DDEA

To further investigate the effectiveness of the proposed GBS and RBWE in the GBS-DDEA, Table 3 reports the results of GBS-DDEA and its two variants: GBS-DDEA-noG (without GBS) and GBS-DDEA-noR (without RBWE). The comparisons reveal that both GBS and RBWE play crucial roles in enhancing the performance of GBS-DDEA.
For the 10D case, GBS-DDEA achieves competitive or superior results against both variants. In particular, the absence of GBS (GBS-DDEA-noG) leads to clear performance degradation on TF2 and TF3, while the removal of RBWE (GBS-DDEA-noR) results in worse performance on TF1 and TF2. Although GBS-DDEA-noG slightly outperforms GBS-DDEA on TF4, the overall trend indicates that both GBS and RBWE contribute positively.
When the dimensionality increases to 30D and 50D, the advantage of incorporating GBS and RBWE becomes more pronounced. GBS-DDEA-noG consistently underperforms on TF2, TF3, and TF5, while GBS-DDEA-noR exhibits inferior results on TF2 and TF3, confirming that greedy selection and simple averaging are less effective strategies. Notably, GBS-DDEA maintains more balanced and stable performance across all test functions.
In the 100D case, the importance of GBS is further validated, as GBS-DDEA-noG performs worse on three functions (TF2, TF3, TF5). Similarly, removing RBWE results in inferior or equivalent outcomes, with GBS-DDEA delivering the most accurate solutions overall.
Although GBS-DDEA-noR shows slightly better average results on some problems, it does not significantly outperform GBS-DDEA according to the Wilcoxon rank-sum test. Instead, GBS-DDEA significantly outperforms GBS-DDEA-noR on two problems in the small-scale (10D) case and on one problem in the large-scale (100D) case, and performs similarly on the remaining problems. That is, GBS-DDEA-noR does not show any significantly better results than GBS-DDEA. This verifies the effectiveness of the RBWE component in GBS-DDEA.
In summary, the analysis demonstrates that both GBS and RBWE contribute positively to the excellent performance of GBS-DDEA, and removing either of them will degrade the algorithm’s performance. Their integration enables GBS-DDEA to consistently achieve superior performance across a range of problem scales and complexities.

4.5. Parameter Study of GBS-DDEA

First, to investigate the parameter sensitivity of the selected model number L in GBS-DDEA, we compared the original GBS-DDEA (L = 300) with four variants using L = 100, L = 200, L = 250, and L = 400. The results of GBS-DDEA variants are reported in Table 4.
Overall, the results show that the performance of GBS-DDEA is robust to variations in L, with most comparisons yielding statistically equivalent results (“≈”). Specifically, in the 10D and 30D cases, all variants achieve comparable performance across the five test functions, with only occasional differences. For example, GBS-DDEA (L = 250) achieves a worse performance on TF2 in 10D, while GBS-DDEA (L = 100) and GBS-DDEA (L = 400) perform better on individual functions in 30D.
In higher dimensions (50D and 100D), the influence of L becomes more noticeable but remains limited. For instance, GBS-DDEA (L = 200) and GBS-DDEA (L = 250) achieve worse performance on TF1 in 50D and TF4 in 100D, while other settings remain comparable. Importantly, no consistent trend suggests that larger or smaller L values lead to systematic improvements.
With the above, a detailed theoretical analysis of how L influences the bias-variance trade-off in ensemble prediction is also given herein. For the bias component, increased L, interpreted as enlarging the ensemble size or model diversity, generally reduces the bias of the ensemble predictor because aggregating multiple models tends to average out individual model biases, especially if the models are diverse and imperfect. For the variance component, conversely, increasing L can have complex effects on variance. When models are sufficiently diverse and independent, a larger L reduces the variance of the ensemble estimate due to averaging. However, if L grows too large and the models become more correlated (e.g., all trained on similar data), the variance reduction diminishes, and the risks of overfitting can escalate. The choice of L thus balances between reducing bias via greater model diversity (larger L) and avoiding overfitting and increased variance due to overly complex or highly correlated models.
Taken together, these results indicate that the choice of L does not significantly affect the optimization performance of GBS-DDEA, and the default setting of L = 300 offers a good balance between computational cost and search effectiveness. This robustness enhances the practicality of GBS-DDEA, since the algorithm does not rely on fine-tuned L values to achieve strong performance across different problem dimensions.
Second, to investigate the parameter sensitivity of the initial model number R in GBS-DDEA, we compared the original GBS-DDEA (R = 2000) with four variants using R = 1000, R = 1500, R = 2500, and R = 3000.
Table 5 presents the results of GBS-DDEA under different values of the initial model number R. Overall, the performance of GBS-DDEA is relatively stable across a wide range of R values, indicating that the algorithm is not highly sensitive to this parameter. At low dimension (D = 10), all variants achieve comparable results on the five test functions, with only minor fluctuations, and no statistically significant improvements or deteriorations are observed. As the dimensionality increases to D = 30 and D = 50, several cases show slight improvements when R = 1500 or R = 2500, particularly on TF4, where positive differences are observed. However, the improvements are marginal and not consistent across all functions.
For the most challenging case of D = 100, the algorithm again demonstrates robustness, with all variants producing similar results. In this setting, a slight advantage appears for R = 1000 and R = 2500 on TF4 and TF5, but the differences remain limited. Notably, no variant leads to performance deterioration compared to the baseline R = 2000. These findings suggest that GBS-DDEA maintains stable search ability even when the number of initial models varies substantially. In practice, this robustness implies that the default setting R = 2000 provides a good trade-off between computational efficiency and solution quality, while smaller or larger values of R do not significantly compromise performance.

4.6. Case Study of GBS-DDEA on Cheating Official Accounts Mining

To further evaluate the effectiveness of the proposed GBS-DDEA, it is applied to the task of mining cheating official accounts (COAs). With the rapid growth of social media platforms [36,37,38,39], users increasingly rely on official accounts for information. However, this trend also enables malicious groups to create COAs that mimic legitimate organizations to support black industry chains [40,41,42,43]. These accounts generate profits through activities such as posting fake reviews, inflating traffic, advertising, and conducting scams. Consequently, the detection of COAs plays a crucial role in the supervision and governance of online platform ecosystems.
Recently, heterogeneous graph transformer (HGT) models have demonstrated promising results in COA detection combined with Chinese content analysis [44,45]. Nevertheless, the performance of HGT models heavily depends on their hyperparameter settings. Since evaluating each hyperparameter configuration requires extensive training and substantial computational resources, hyperparameter optimization for HGT constitutes a typical EOP, making it a suitable case for testing GBS-DDEA.
In this study, we adopt a dataset provided by WeChat, which contains 32,021 legitimate official accounts and 5391 COAs. The optimization objective is to identify the optimal hyperparameter configuration of HGT that maximizes classification performance. The search space for hyperparameters is summarized in Table 6. For the model input, we first perform word segmentation and map the resulting words into vectors with the aid of a self-constructed corpus. Subsequently, TextCNN [46] is employed to aggregate multiple word vectors into a sentence-level vector, which serves as the feature representation. In this way, the textual content of nodes is effectively embedded as features. To initialize the DDEAs, 30 hyperparameter candidates are generated using Latin hypercube sampling (LHS). Performance is evaluated using both micro-F1 and macro-F1 metrics. Comparative results for GBS-DDEA and three baseline DDEAs are presented in Table 7, with the default HGT configuration serving as an additional reference baseline. The computational cost of training and updating the 2000 surrogate models is 4.82 s on average over 20 independent runs.
The experimental results in Table 7 show that the proposed GBS-DDEA achieves the best performance among all compared algorithms. Specifically, the micro-F1 score of GBS-DDEA-HGT reaches 0.94, which outperforms the baseline HGT (0.88) and the other three surrogate-assisted DDEAs (0.92–0.93). More importantly, GBS-DDEA also achieves the highest macro-F1 score (0.51), which is a more balanced indicator for handling class imbalance. Since COAs represent the minority class in the dataset, the improvement in macro-F1 demonstrates the stronger ability of GBS-DDEA to enhance the classification performance for rare but critical samples.
These results confirm the effectiveness of the proposed model management strategy. By integrating GBS with the RBWE ensemble mechanism, GBS-DDEA improves the generalization ability of surrogate models, leading to more reliable hyperparameter optimization. The improved optimization process not only finds better hyperparameter configurations but also reduces the risk of overfitting, a key challenge in surrogate-assisted optimization.
Overall, the case study on COA mining highlights that GBS-DDEA is well-suited for solving EOPs in real-world applications. It provides both higher accuracy and better robustness compared with existing DDEAs, thus demonstrating strong potential for deployment in large-scale, high-cost machine learning tasks.

4.7. Computational Efficiency of GBS-DDEA

This section further investigates the computational efficiency of GBS-DDEA. To this end, we compare the time costs and optimization performance of a traditional EA and the proposed GBS-DDEA on the same optimization problem. Specifically, the traditional EA is executed with 3000 FEs, and both the final optimization results and time costs are recorded. The GBS-DDEA is then run until it achieves a comparable optimization result, after which its time cost is measured. The ratio of these two time costs is used to compute the speedup of GBS-DDEA over the traditional EA. It is important to note that both algorithms adopt the same crossover and mutation operators. The key distinction is that, rather than relying on computationally expensive FEs, GBS-DDEA leverages a Gumbel selection–based ensemble model to predict fitness values. To emulate different levels of evaluation cost, benchmark functions are assigned with varying time delays.
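In other words, denoting by T_EA the wall-clock time of the traditional EA over its 3000 FEs and by T_GBS-DDEA the time GBS-DDEA needs to reach a comparable result (symbols introduced here only for illustration), the reported speedup is

\text{speedup} = \frac{T_{\mathrm{EA}}}{T_{\mathrm{GBS\text{-}DDEA}}}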
The resulting speedups for TF1 and TF5 are presented in Figure 2. As shown, for both functions, the speedup increases with the cost per evaluation and eventually converges to approximately 25. This shows that GBS-DDEA achieves substantial acceleration compared to the traditional EA, particularly when evaluations are highly time-consuming. These results validate the time efficiency of the proposed GBS-DDEA.

4.8. Discussion on Limitations

Although the experiments show the effectiveness of the GBS-DDEA, there are still some limitations. First, since the GBS-DDEA relies on evaluated data to construct the initial surrogate, it may be susceptible to noise in the evaluation data or highly dynamic fitness landscapes, potentially impacting its stability and robustness. Second, the algorithm assumes the availability of a reasonably sized initial dataset for surrogate modeling. In scenarios with very limited initial data or highly costly evaluations, the surrogate models might not be sufficiently accurate initially, potentially impacting convergence. Therefore, researchers should consider these aspects and potential limitations when applying GBS-DDEA to real-world problems, especially where data is limited and evaluations are extremely costly.

5. Conclusions

This paper tackles the overfitting challenge of surrogate models trained on limited evaluated data, which remains a fundamental limitation of DDEAs. To enhance the generalization capability of surrogate-assisted optimization, we propose a novel GBS-DDEA algorithm that incorporates a GBS mechanism and an RBWE strategy into surrogate model management. By probabilistically selecting models based on both accuracy and diversity through GBS, and integrating them via RBWE, the proposed method effectively alleviates overfitting while improving the reliability of fitness approximation.
Comprehensive experiments on a suite of EOPs and a COAs mining problem demonstrate that GBS-DDEA consistently outperforms or achieves performance on par with several state-of-the-art DDEAs. Furthermore, ablation studies confirm the critical contribution of the RBWE strategy in improving surrogate generalization and guiding the evolutionary search toward superior solutions.
In conclusion, GBS-DDEA offers a straightforward yet effective enhancement to surrogate-based optimization frameworks, laying a promising foundation for future advancements in DDEA research. Potential directions for future work include adaptive ensemble updating, normalization or scaling techniques, knowledge transfer across tasks, and the integration of uncertainty quantification, with the aim of further strengthening optimization robustness under conditions of limited or noisy data.

Author Contributions

Conceptualization, J.Y. and J.-Y.L.; methodology, J.-Y.L.; software, J.Y. and J.-Y.L.; validation, J.Y.; formal analysis, J.Y. and J.-Y.L.; investigation, J.Y.; resources, J.-Y.L.; data curation, J.Y.; writing—original draft preparation, J.Y.; writing—review and editing, J.-Y.L.; visualization, J.-Y.L.; supervision, J.-Y.L.; project administration, J.-Y.L.; funding acquisition, J.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62406152, in part by the Tianjin Top Scientist Studio Project under Grant 24JRRCRC00030, in part by the Natural Science Foundation of Tianjin under Grant 24JCQNJC02100, in part by the Tianjin Belt and Road Joint Laboratory under Grant 24PTLYHZ00250, in part by the Fundamental Research Funds for the Central Universities, Nankai University (078-63251088).

Data Availability Statement

The data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DDEA      Data-driven evolutionary algorithm
DDEO      Data-driven evolutionary optimization
DE        Differential evolution
EA        Evolutionary algorithm
EC        Evolutionary computation
EOP       Expensive optimization problem
FE        Fitness evaluation
GBS       Gumbel-based selection
GBS-DDEA  Gumbel-based selection data-driven evolutionary algorithm
MMS       Model management strategy
RBFNN     Radial basis function neural network
RBWE      Ranking-based weighting ensemble
SMM       Surrogate model management

References

  1. Winter, R.; Milatz, B.; Blank, J.; van Stein, N.; Bäck, T.; Deb, K. Parallel Multi-Objective Optimization for Expensive and Inexpensive Objectives and Constraints. Swarm Evol. Comput. 2024, 86, 101508. [Google Scholar] [CrossRef]
  2. Yang, Q.-T.; Li, J.-Y.; Zhan, Z.-H.; Jiang, Y.; Jin, Y.; Zhang, J. A Hierarchical and Ensemble Surrogate-Assisted Evolutionary Algorithm With Model Reduction for Expensive Many-Objective Optimization. IEEE Trans. Evol. Comput. 2024. [Google Scholar] [CrossRef]
  3. Espinosa, R.; Jiménez, F.; Palma, J. Surrogate-Assisted Multi-Objective Evolutionary Feature Selection of Generation-Based Fixed Evolution Control for Time Series Forecasting with LSTM Networks. Swarm Evol. Comput. 2024, 88, 101587. [Google Scholar] [CrossRef]
  4. Li, J.-Y.; Zhan, Z.-H.; Zhang, J. Evolutionary Computation for Expensive Optimization: A Survey. Mach. Intell. Res. 2022, 19, 3–23. [Google Scholar] [CrossRef]
  5. Luong, N.H.; Phan, Q.M.; Vo, A.; Pham, T.N.; Bui, D.T. Lightweight Multi-Objective Evolutionary Neural Architecture Search with Low-Cost Proxy Metrics. Inf. Sci. 2024, 655, 119856. [Google Scholar] [CrossRef]
  6. Wang, H.; Jin, Y. A Random Forest Assisted Evolutionary Algorithm for Data-Driven Constrained Multi-Objective Combinatorial Optimization of Trauma Systems. IEEE Trans. Cybern. 2020, 50, 536–549. [Google Scholar] [CrossRef] [PubMed]
  7. Jin, Y.; Wang, H.; Chugh, T.; Guo, D.; Miettinen, K. Data-Driven Evolutionary Optimization: An Overview and Case Studies. IEEE Trans. Evol. Comput. 2019, 23, 442–458. [Google Scholar] [CrossRef]
  8. Guo, Z.; Lin, S.; Suo, R.; Zhang, X. An Offline Weighted-Bagging Data-Driven Evolutionary Algorithm with Data Generation Based on Clustering. Mathematics 2023, 11, 431. [Google Scholar] [CrossRef]
  9. Horaguchi, Y.; Nakata, M. High-Dimensional Expensive Multiobjective Optimization Using a Surrogate-Assisted Multifactorial Evolutionary Algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference, Malaga, Spain, 14–18 July 2025; ACM: New York, NY, USA, 2025; pp. 572–580. [Google Scholar]
  10. Li, J.-Y.; Zhan, Z.-H.; Xu, J.; Kwong, S.; Zhang, J. Surrogate-Assisted Hybrid-Model Estimation of Distribution Algorithm for Mixed-Variable Hyperparameters Optimization in Convolutional Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 2338–2352. [Google Scholar] [CrossRef]
  11. Huang, P.; Wang, H.; Ma, W. Stochastic Ranking for Offline Data-Driven Evolutionary Optimization Using Radial Basis Function Networks with Multiple Kernels. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2050–2057. [Google Scholar]
  12. He, C.; Zhang, Y.; Gong, D.; Ji, X. A Review of Surrogate-Assisted Evolutionary Algorithms for Expensive Optimization Problems. Expert Syst. Appl. 2023, 217, 119495. [Google Scholar] [CrossRef]
  13. Li, J.-Y.; Zhan, Z.-H.; Wang, H.; Zhang, J. Data-Driven Evolutionary Algorithm With Perturbation-Based Ensemble Surrogates. IEEE Trans. Cybern. 2020, 51, 3925–3937. [Google Scholar] [CrossRef] [PubMed]
  14. Li, J.Y.; Zhan, Z.H.; Wang, C.; Jin, H.; Zhang, J. Boosting Data-Driven Evolutionary Algorithm with Localized Data Generation. IEEE Trans. Evol. Comput. 2020, 24, 923–937. [Google Scholar] [CrossRef]
  15. Petelin, D.; Filipič, B.; Kocijan, J. Optimization of Gaussian Process Models with Evolutionary Algorithms. In Adaptive and Natural Computing Algorithms. ICANNGA 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 420–429. [Google Scholar]
  16. Ikeguchi, T.; Nishihara, K.; Kawauchi, Y.; Koguma, Y.; Nakata, M. A Surrogate-Assisted Memetic Algorithm for Permutation-Based Combinatorial Optimization Problems. Swarm Evol. Comput. 2025, 98, 102060. [Google Scholar] [CrossRef]
  17. Sun, C.; Jin, Y.; Cheng, R.; Ding, J.; Zeng, J. Surrogate-Assisted Cooperative Swarm Optimization of High-Dimensional Expensive Problems. IEEE Trans. Evol. Comput. 2017, 21, 644–660. [Google Scholar] [CrossRef]
  18. Zhang, M.; Li, H.; Pan, S.; Lyu, J.; Ling, S.; Su, S. Convolutional Neural Networks-Based Lung Nodule Classification: A Surrogate-Assisted Evolutionary Algorithm for Hyperparameter Optimization. IEEE Trans. Evol. Comput. 2021, 25, 869–882. [Google Scholar] [CrossRef]
  19. Wang, H.; Jin, Y.; Sun, C.; Doherty, J. Offline Data-Driven Evolutionary Optimization Using Selective Surrogate Ensembles. IEEE Trans. Evol. Comput. 2018, 23, 203–216. [Google Scholar] [CrossRef]
  20. Huang, P.; Wang, H.; Jin, Y. Offline Data-Driven Evolutionary Optimization Based on Tri-Training. Swarm Evol. Comput. 2021, 60, 100800. [Google Scholar] [CrossRef]
  21. Haftka, R.T.; Villanueva, D.; Chaudhuri, A. Parallel Surrogate-Assisted Global Optimization with Expensive Functions—A Survey. Struct. Multidiscip. Optim. 2016, 54, 3–13. [Google Scholar] [CrossRef]
  22. Ardeh, M.A.; Mei, Y.; Zhang, M. A GPHH with Surrogate-Assisted Knowledge Transfer for Uncertain Capacitated Arc Routing Problem. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2786–2793. [Google Scholar]
  23. Min, A.T.W.; Ong, Y.-S.; Gupta, A.; Goh, C.-K. Multiproblem Surrogates: Transfer Evolutionary Multiobjective Optimization of Computationally Expensive Problems. IEEE Trans. Evol. Comput. 2019, 23, 15–28. [Google Scholar] [CrossRef]
  24. Russo, I.L.S.; Barbosa, H.J.C. A Multitasking Surrogate-Assisted Differential Evolution Method for Solving Bi-Level Optimization Problems. In Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 18–23 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar]
  25. Tian, J.; Tan, Y.; Zeng, J.; Sun, C.; Jin, Y. Multiobjective Infill Criterion Driven Gaussian Process-Assisted Particle Swarm Optimization of High-Dimensional Expensive Problems. IEEE Trans. Evol. Comput. 2019, 23, 459–472. [Google Scholar] [CrossRef]
  26. Lin, D.; Huang, H.; Li, X.; Gong, Y. Empirical Study of Data-Driven Evolutionary Algorithms in Noisy Environments. Mathematics 2022, 10, 943. [Google Scholar] [CrossRef]
  27. Yu, M.; Li, X.; Liang, J. A Dynamic Surrogate-Assisted Evolutionary Algorithm Framework for Expensive Structural Optimization. Struct. Multidiscip. Optim. 2020, 61, 711–729. [Google Scholar] [CrossRef]
  28. Huang, H.; Gong, Y. Contrastive Learning: An Alternative Surrogate for Offline Data-Driven Evolutionary Computation. IEEE Trans. Evol. Comput. 2023, 27, 370–384. [Google Scholar] [CrossRef]
  29. Hao, H.; Zhang, X.; Zhou, A. Expensive Optimization via Relation. IEEE Trans. Evol. Comput. 2025. [Google Scholar] [CrossRef]
  30. Sun, Y.H.; Huang, T.; Zhong, J.H.; Zhang, J.; Gong, Y.J. Symbolic Regression-Assisted Offline Data-Driven Evolutionary Computation. IEEE Trans. Evol. Comput. 2024, 29, 2158–2172. [Google Scholar] [CrossRef]
  31. Stein, M. Large Sample Properties of Simulations Using Latin Hypercube Sampling. Technometrics 1987, 29, 143–151. [Google Scholar] [CrossRef]
  32. Kenny, A.; Ray, T.; Singh, H.K. An Iterative Two-Stage Multifidelity Optimization Algorithm for Computationally Expensive Problems. IEEE Trans. Evol. Comput. 2023, 27, 520–534. [Google Scholar] [CrossRef]
  33. Gharavian, V.; Rahnamayan, S.; Bidgoli, A.A.; Makrehchi, M. A Pairwise Surrogate Model Using GNN for Evolutionary Optimization. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oahu, HI, USA, 1–4 October 2023; pp. 3996–4002. [Google Scholar] [CrossRef]
  34. Harada, T. A Pairwise Ranking Estimation Model for Surrogate-Assisted Evolutionary Algorithms. Complex Intell. Syst. 2023, 9, 6875–6890. [Google Scholar] [CrossRef]
  35. Kumar, A.; Das, S.; Snášel, V. Efficient Three-Stage Surrogate-Assisted Differential Evolution for Expensive Optimization Problems. Swarm Evol. Comput. 2025, 98, 102093. [Google Scholar] [CrossRef]
  36. Cheng, X.; Fu, S.; de Vreede, G.-J. Understanding Trust Influencing Factors in Social Media Communication: A Qualitative Study. Int. J. Inf. Manag. 2017, 37, 25–35. [Google Scholar] [CrossRef]
  37. Zhao, N.; Li, H. How Can Social Commerce Be Boosted? The Impact of Consumer Behaviors on the Information Dissemination Mechanism in a Social Commerce Network. Electron. Commer. Res. 2020, 20, 833–856. [Google Scholar] [CrossRef]
  38. Che, H.; Pan, B.; Leung, M.-F.; Cao, Y.; Yan, Z. Tensor Factorization With Sparse and Graph Regularization for Fake News Detection on Social Networks. IEEE Trans. Comput. Soc. Syst. 2024, 11, 4888–4898. [Google Scholar] [CrossRef]
  39. Pan, B.; Li, C.; Che, H.; Leung, M.-F.; Yu, K. Low-Rank Tensor Regularized Graph Fuzzy Learning for Multi-View Data Processing. IEEE Trans. Consum. Electron. 2024, 70, 2925–2938. [Google Scholar] [CrossRef]
  40. Berrondo-Otermin, M.; Sarasa-Cabezuelo, A. Application of Artificial Intelligence Techniques to Detect Fake News: A Review. Electronics 2023, 12, 5041. [Google Scholar] [CrossRef]
  41. Mridha, M.F.; Keya, A.J.; Hamid, M.A.; Monowar, M.M.; Rahman, M.S. A Comprehensive Review on Fake News Detection With Deep Learning. IEEE Access 2021, 9, 156151–156170. [Google Scholar] [CrossRef]
  42. Aïmeur, E.; Amri, S.; Brassard, G. Fake News, Disinformation and Misinformation in Social Media: A Review. Soc. Netw. Anal. Min. 2023, 13, 30. [Google Scholar] [CrossRef] [PubMed]
  43. Ramesh, J.V.N.; Gupta, S.; Quraishi, A.; Dutta, A.K.; Sinha, K.P.; Rao, G.S.N.; Sherkuziyeva, N.; Nimma, D.; Patni, J.C. Unified Fake News Detection Based on IoST-Driven Joint Detection Models. IEEE Trans. Comput. Soc. Syst. 2025. [Google Scholar] [CrossRef]
  44. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Heterogeneous Graph Transformer. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; ACM: New York, NY, USA, 2020; pp. 2704–2710. [Google Scholar]
  45. Phan, H.T.; Nguyen, V.D.; Nguyen, N.T. MulGCN: MultiGraph Convolutional Network for Aspect-Level Sentiment Analysis. IEEE Access 2025, 13, 26304–26317. [Google Scholar] [CrossRef]
  46. Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882. [Google Scholar] [CrossRef]
Figure 1. The overall framework of GBS-DDEA.
Figure 2. The speedup of GBS-DDEA.
Table 1. The properties of the TF1 to TF5 problems.
Test Function ID | Function | Number of Variables | Optimum
TF1 | Ellipsoid | {10, 30, 50, 100} | 0
TF2 | Rosenbrock | {10, 30, 50, 100} | 0
TF3 | Ackley | {10, 30, 50, 100} | 0
TF4 | Griewank | {10, 30, 50, 100} | 0
TF5 | Rastrigin | {10, 30, 50, 100} | 0
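For reference, the following minimal Python sketch gives the commonly used definitions of the five benchmark functions listed in Table 1; the exact variants and search ranges used in the experiments may differ, so this sketch is illustrative only.

import numpy as np

def ellipsoid(x):
    # Weighted sphere variant: sum_i i * x_i^2, minimum 0 at the origin.
    i = np.arange(1, x.size + 1)
    return float(np.sum(i * x**2))

def rosenbrock(x):
    # sum_i [100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2], minimum 0 at (1, ..., 1).
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2))

def ackley(x):
    # -20 exp(-0.2 sqrt(mean x^2)) - exp(mean cos(2 pi x)) + 20 + e, minimum 0 at the origin.
    d = x.size
    return float(-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
                 - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def griewank(x):
    # 1 + sum x_i^2 / 4000 - prod cos(x_i / sqrt(i)), minimum 0 at the origin.
    i = np.arange(1, x.size + 1)
    return float(1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))))

def rastrigin(x):
    # 10 D + sum [x_i^2 - 10 cos(2 pi x_i)], minimum 0 at the origin.
    return float(10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))

# Example: evaluate TF1-TF5 at the origin for D = 30 (all return 0 except Rosenbrock,
# whose optimum lies at the all-ones point).
x0 = np.zeros(30)
print([f(x0) for f in (ellipsoid, rosenbrock, ackley, griewank, rastrigin)])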
Table 2. The results of the GBS-DDEA and other DDEAs.
D | ID | GBS-DDEA | DDEA-SE | BDDEA | DDEA-PES
10 | TF1 | 1.09 ± 5.25 × 10^−1 | 9.94 × 10^−1 ± 4.68 × 10^−1 (≈) | 1.11 ± 4.04 × 10^−1 (≈) | 1.36 ± 6.52 × 10^−1 (≈)
10 | TF2 | 2.93 × 10^1 ± 3.29 | 2.80 × 10^1 ± 5.31 (≈) | 3.44 × 10^1 ± 7.46 (+) | 2.99 × 10^1 ± 7.73 (≈)
10 | TF3 | 5.91 ± 8.03 × 10^−1 | 5.99 ± 7.92 × 10^−1 (≈) | 6.67 ± 6.44 × 10^−1 (+) | 6.50 ± 1.44 (≈)
10 | TF4 | 1.24 ± 1.02 × 10^−1 | 1.30 ± 1.39 × 10^−1 (+) | 1.26 ± 1.23 × 10^−1 (≈) | 1.33 ± 1.32 × 10^−1 (+)
10 | TF5 | 6.81 × 10^1 ± 2.04 × 10^1 | 6.08 × 10^1 ± 1.82 × 10^1 (≈) | 6.94 × 10^1 ± 2.68 × 10^1 (≈) | 5.87 × 10^1 ± 1.79 × 10^1 (≈)
10 | +/≈/− | NA | 1/4/0 | 2/3/0 | 1/4/0
30 | TF1 | 4.26 ± 9.55 × 10^−1 | 4.09 ± 1.37 (≈) | 7.02 ± 2.21 (+) | 5.31 ± 1.39 (+)
30 | TF2 | 5.97 × 10^1 ± 6.54 | 5.71 × 10^1 ± 4.79 (≈) | 6.96 × 10^1 ± 8.22 (+) | 7.00 × 10^1 ± 8.61 (+)
30 | TF3 | 5.17 ± 2.78 × 10^−1 | 4.85 ± 5.24 × 10^−1 (−) | 5.65 ± 6.85 × 10^−1 (+) | 5.48 ± 3.62 × 10^−1 (+)
30 | TF4 | 1.15 ± 5.50 × 10^−2 | 1.26 ± 7.83 × 10^−2 (+) | 1.38 ± 1.14 × 10^−1 (+) | 1.26 ± 9.47 × 10^−2 (+)
30 | TF5 | 1.16 × 10^2 ± 2.38 × 10^1 | 1.14 × 10^2 ± 2.72 × 10^1 (≈) | 1.53 × 10^2 ± 4.09 × 10^1 (+) | 1.46 × 10^2 ± 3.10 × 10^1 (+)
30 | +/≈/− | NA | 1/3/1 | 5/0/0 | 5/0/0
50 | TF1 | 9.33 ± 1.37 | 1.35 × 10^1 ± 4.74 (+) | 1.33 × 10^1 ± 3.18 (+) | 1.48 × 10^1 ± 4.48 (+)
50 | TF2 | 8.52 × 10^1 ± 3.06 | 8.24 × 10^1 ± 4.09 (−) | 9.88 × 10^1 ± 9.88 (+) | 1.09 × 10^2 ± 1.12 × 10^1 (+)
50 | TF3 | 4.42 ± 2.21 × 10^−1 | 4.90 ± 3.02 × 10^−1 (+) | 4.79 ± 3.50 × 10^−1 (+) | 4.90 ± 4.17 × 10^−1 (+)
50 | TF4 | 1.18 ± 3.60 × 10^−2 | 1.89 ± 2.11 × 10^−1 (+) | 1.42 ± 8.12 × 10^−2 (+) | 1.34 ± 9.61 × 10^−2 (+)
50 | TF5 | 1.54 × 10^2 ± 2.80 × 10^1 | 1.78 × 10^2 ± 3.17 × 10^1 (+) | 1.98 × 10^2 ± 3.06 × 10^1 (+) | 2.38 × 10^2 ± 5.01 × 10^1 (+)
50 | +/≈/− | NA | 4/0/1 | 5/0/0 | 5/0/0
100 | TF1 | 8.27 × 10^1 ± 7.74 × 10^1 | 2.84 × 10^2 ± 7.87 × 10^1 (+) | 5.57 × 10^1 ± 1.25 × 10^1 (≈) | 4.17 × 10^2 ± 3.55 × 10^2 (+)
100 | TF2 | 1.89 × 10^2 ± 2.38 × 10^1 | 2.41 × 10^2 ± 2.60 × 10^1 (+) | 1.94 × 10^2 ± 2.01 × 10^1 (≈) | 4.99 × 10^2 ± 1.93 × 10^2 (+)
100 | TF3 | 4.22 ± 2.43 × 10^−1 | 7.16 ± 6.64 × 10^−1 (+) | 4.69 ± 2.59 × 10^−1 (+) | 5.17 ± 6.21 × 10^−1 (+)
100 | TF4 | 1.56 ± 1.70 × 10^−1 | 1.84 × 10^1 ± 1.83 (+) | 1.85 ± 2.36 × 10^−1 (+) | 3.83 ± 2.01 (+)
100 | TF5 | 4.66 × 10^2 ± 9.26 × 10^1 | 7.79 × 10^2 ± 8.59 × 10^1 (+) | 4.30 × 10^2 ± 1.45 × 10^2 (≈) | 9.15 × 10^2 ± 1.28 × 10^2 (+)
100 | +/≈/− | NA | 5/0/0 | 2/3/0 | 5/0/0
D | ID | MFITS | PS-GA | ELDR-SAHO | TS-SADE
10 | TF1 | 1.18 ± 6.28 × 10^−1 (≈) | 1.17 ± 7.19 × 10^−1 (≈) | 1.19 ± 4.82 × 10^−1 (≈) | 1.31 ± 5.18 × 10^−1 (≈)
10 | TF2 | 2.94 × 10^1 ± 6.75 (≈) | 3.59 × 10^1 ± 7.46 (+) | 3.43 × 10^1 ± 7.40 (+) | 3.01 × 10^1 ± 6.66 (≈)
10 | TF3 | 6.51 ± 4.24 (≈) | 6.70 ± 3.39 × 10^−1 (+) | 6.60 ± 6.07 × 10^−1 (+) | 6.47 ± 1.79 (≈)
10 | TF4 | 1.26 ± 2.21 × 10^−1 (+) | 1.45 ± 1.54 × 10^−1 (≈) | 1.28 ± 1.68 × 10^−1 (≈) | 1.32 ± 1.80 × 10^−1 (+)
10 | TF5 | 5.83 × 10^1 ± 1.75 × 10^1 (≈) | 6.34 × 10^1 ± 2.15 × 10^1 (≈) | 6.96 × 10^1 ± 2.50 × 10^1 (≈) | 5.98 × 10^1 ± 1.59 × 10^1 (≈)
10 | +/≈/− | 1/4/0 | 2/3/0 | 2/3/0 | 1/4/0
30 | TF1 | 5.31 ± 1.60 (+) | 7.05 ± 2.15 (+) | 7.06 ± 3.21 (+) | 5.38 ± 2.05 (+)
30 | TF2 | 6.92 × 10^1 ± 7.58 (+) | 6.87 × 10^1 ± 8.62 (+) | 6.96 × 10^1 ± 7.22 (+) | 7.03 × 10^1 ± 9.03 (+)
30 | TF3 | 5.71 ± 3.06 × 10^−1 (+) | 5.13 ± 6.92 × 10^−1 (+) | 5.65 ± 6.85 × 10^−1 (+) | 5.56 ± 3.85 × 10^−1 (+)
30 | TF4 | 1.28 ± 8.05 × 10^−2 (+) | 1.38 ± 2.18 × 10^−1 (+) | 1.40 ± 5.71 × 10^−1 (+) | 1.34 ± 8.85 × 10^−2 (+)
30 | TF5 | 1.43 × 10^2 ± 3.39 × 10^1 (+) | 1.63 × 10^2 ± 4.37 × 10^1 (+) | 1.50 × 10^2 ± 3.28 × 10^1 (+) | 1.76 × 10^2 ± 3.09 × 10^1 (+)
30 | +/≈/− | 5/0/0 | 5/0/0 | 5/0/0 | 5/0/0
50 | TF1 | 1.50 × 10^1 ± 3.60 (+) | 1.63 × 10^1 ± 1.37 (+) | 1.60 × 10^1 ± 2.53 (+) | 1.70 × 10^1 ± 4.41 (+)
50 | TF2 | 1.11 × 10^2 ± 2.15 × 10^1 (+) | 9.48 × 10^1 ± 8.08 (+) | 9.86 × 10^1 ± 9.77 (+) | 1.11 × 10^2 ± 1.66 × 10^1 (+)
50 | TF3 | 4.91 ± 3.60 × 10^−1 (+) | 4.76 ± 4.88 × 10^−1 (+) | 4.71 ± 4.11 × 10^−1 (+) | 4.76 ± 3.85 × 10^−1 (+)
50 | TF4 | 1.45 ± 8.15 × 10^−2 (+) | 1.44 ± 7.57 × 10^−2 (+) | 1.58 ± 7.69 × 10^−2 (+) | 1.51 ± 8.83 × 10^−2 (+)
50 | TF5 | 2.58 × 10^2 ± 4.41 × 10^1 (+) | 1.64 × 10^2 ± 6.68 × 10^1 (+) | 1.79 × 10^2 ± 2.93 × 10^1 (+) | 2.15 × 10^2 ± 4.50 × 10^1 (+)
50 | +/≈/− | 5/0/0 | 5/0/0 | 5/0/0 | 5/0/0
100 | TF1 | 4.18 × 10^2 ± 3.41 × 10^2 (+) | 5.62 × 10^1 ± 2.21 × 10^1 (≈) | 5.58 × 10^1 ± 1.25 × 10^1 (≈) | 5.60 × 10^1 ± 4.50 × 10^2 (≈)
100 | TF2 | 4.85 × 10^2 ± 2.10 × 10^2 (+) | 1.58 × 10^2 ± 3.15 × 10^1 (≈) | 1.96 × 10^2 ± 1.57 × 10^1 (≈) | 1.97 × 10^2 ± 1.93 × 10^2 (≈)
100 | TF3 | 5.23 ± 5.85 × 10^−1 (+) | 4.72 ± 4.62 × 10^−1 (+) | 4.70 ± 3.12 × 10^−1 (+) | 4.27 ± 4.51 × 10^−1 (≈)
100 | TF4 | 3.90 ± 2.08 (+) | 1.82 ± 3.35 × 10^−1 (+) | 1.84 ± 2.81 × 10^−1 (+) | 1.88 ± 2.22 (+)
100 | TF5 | 9.44 × 10^2 ± 2.02 × 10^2 (+) | 4.34 × 10^2 ± 2.13 × 10^2 (≈) | 4.50 × 10^2 ± 2.01 × 10^2 (≈) | 4.56 × 10^2 ± 1.33 × 10^2 (≈)
100 | +/≈/− | 5/0/0 | 2/3/0 | 2/3/0 | 1/4/0
Table 3. Comparisons between the GBS-DDEA, GBS-DDEA-noG, and GBS-DDEA-noR.
D | ID | GBS-DDEA | GBS-DDEA-noG | GBS-DDEA-noR
10 | TF1 | 1.09 ± 5.25 × 10^−1 | 1.03 ± 4.47 × 10^−1 (≈) | 1.25 ± 2.94 × 10^−1 (+)
10 | TF2 | 2.93 × 10^1 ± 3.29 | 3.59 × 10^1 ± 7.26 (+) | 3.64 × 10^1 ± 6.34 (+)
10 | TF3 | 5.91 ± 8.03 × 10^−1 | 6.92 ± 7.85 × 10^−1 (+) | 5.66 ± 6.43 × 10^−1 (≈)
10 | TF4 | 1.24 ± 1.02 × 10^−1 | 1.17 ± 4.65 × 10^−2 (−) | 1.35 ± 2.01 × 10^−1 (≈)
10 | TF5 | 6.81 × 10^1 ± 2.04 × 10^1 | 8.11 × 10^1 ± 1.82 × 10^1 (≈) | 5.42 × 10^1 ± 1.62 × 10^1 (≈)
10 | +/≈/− | NA | 2/2/1 | 2/3/0
30 | TF1 | 4.26 ± 9.55 × 10^−1 | 5.84 ± 1.95 (≈) | 4.30 ± 9.62 × 10^−1 (≈)
30 | TF2 | 5.97 × 10^1 ± 6.54 | 6.35 × 10^1 ± 5.58 (≈) | 6.53 × 10^1 ± 3.89 (+)
30 | TF3 | 5.17 ± 2.78 × 10^−1 | 5.34 ± 4.50 × 10^−1 (≈) | 5.74 ± 5.08 × 10^−1 (+)
30 | TF4 | 1.15 ± 5.50 × 10^−2 | 1.25 ± 7.31 × 10^−2 (+) | 1.20 ± 2.45 × 10^−2 (+)
30 | TF5 | 1.16 × 10^2 ± 2.38 × 10^1 | 1.54 × 10^2 ± 2.02 × 10^1 (+) | 1.20 × 10^1 ± 1.60 × 10^1 (+)
30 | +/≈/− | NA | 2/3/0 | 4/1/0
50 | TF1 | 9.33 ± 1.37 | 1.26 × 10^1 ± 3.68 (+) | 1.13 × 10^1 ± 2.28 (+)
50 | TF2 | 8.52 × 10^1 ± 3.06 | 8.98 × 10^1 ± 6.99 (+) | 9.07 × 10^1 ± 2.81 (+)
50 | TF3 | 4.42 ± 2.21 × 10^−1 | 4.91 ± 3.51 × 10^−1 (+) | 5.29 ± 2.41 × 10^−1 (+)
50 | TF4 | 1.18 ± 3.60 × 10^−2 | 1.20 ± 4.79 × 10^−2 (≈) | 1.22 ± 7.44 × 10^−2 (≈)
50 | TF5 | 1.54 × 10^2 ± 2.80 × 10^1 | 1.98 × 10^2 ± 3.89 × 10^1 (+) | 1.37 × 10^2 ± 3.44 × 10^1 (≈)
50 | +/≈/− | NA | 4/1/0 | 3/2/0
100 | TF1 | 8.27 × 10^1 ± 7.74 × 10^1 | 9.13 × 10^1 ± 3.08 × 10^1 (≈) | 5.81 × 10^1 ± 1.96 × 10^1 (≈)
100 | TF2 | 1.89 × 10^2 ± 2.38 × 10^1 | 2.22 × 10^2 ± 4.33 × 10^1 (+) | 1.79 × 10^2 ± 2.25 × 10^1 (≈)
100 | TF3 | 4.22 ± 2.43 × 10^−1 | 4.64 ± 3.03 × 10^−1 (+) | 4.11 ± 2.78 × 10^−1 (≈)
100 | TF4 | 1.56 ± 1.70 × 10^−1 | 1.50 ± 9.73 × 10^−2 (≈) | 1.94 ± 6.89 × 10^−1 (+)
100 | TF5 | 4.66 × 10^2 ± 9.26 × 10^1 | 7.16 × 10^2 ± 2.64 × 10^2 (+) | 4.29 × 10^2 ± 1.34 × 10^2 (≈)
100 | +/≈/− | NA | 3/2/0 | 1/4/0
Table 4. Comparisons between the GBS-DDEA variants with different numbers of selected models.
D | ID | GBS-DDEA (L = 300) | GBS-DDEA (L = 100) | GBS-DDEA (L = 200) | GBS-DDEA (L = 250) | GBS-DDEA (L = 400)
10 | TF1 | 1.09 ± 5.25 × 10^−1 | 1.04 ± 5.62 × 10^−1 (≈) | 8.62 × 10^−1 ± 3.10 × 10^−1 (≈) | 9.26 × 10^−1 ± 3.23 × 10^−1 (≈) | 9.47 × 10^−1 ± 3.52 × 10^−1 (≈)
10 | TF2 | 2.93 × 10^1 ± 3.29 | 3.22 × 10^1 ± 6.05 (≈) | 2.76 × 10^1 ± 5.35 (≈) | 3.37 × 10^1 ± 5.80 (+) | 3.14 × 10^1 ± 6.44 (≈)
10 | TF3 | 5.91 ± 8.03 × 10^−1 | 6.14 ± 9.72 × 10^−1 (≈) | 6.08 ± 1.22 (≈) | 6.40 ± 1.13 (≈) | 6.05 ± 1.12 (≈)
10 | TF4 | 1.24 ± 1.02 × 10^−1 | 1.26 ± 1.17 × 10^−1 (≈) | 1.19 ± 1.10 × 10^−1 (≈) | 1.19 ± 9.51 × 10^−2 (≈) | 1.29 ± 1.25 × 10^−1 (≈)
10 | TF5 | 6.81 × 10^1 ± 2.04 × 10^1 | 7.29 × 10^1 ± 1.77 × 10^1 (≈) | 6.41 × 10^1 ± 1.48 × 10^1 (≈) | 6.54 × 10^1 ± 2.09 × 10^1 (≈) | 6.99 × 10^1 ± 2.25 × 10^1 (≈)
10 | +/≈/− | NA | 0/5/0 | 0/5/0 | 1/4/0 | 0/5/0
30 | TF1 | 4.26 ± 9.55 × 10^−1 | 4.80 ± 1.84 (≈) | 3.82 ± 1.04 (≈) | 4.88 ± 1.21 (≈) | 4.77 ± 1.68 (≈)
30 | TF2 | 5.97 × 10^1 ± 6.54 | 5.38 × 10^1 ± 8.92 (−) | 6.07 × 10^1 ± 3.90 (≈) | 6.48 × 10^1 ± 1.13 × 10^1 (≈) | 5.90 × 10^1 ± 4.63 (≈)
30 | TF3 | 5.17 ± 2.78 × 10^−1 | 4.93 ± 4.38 × 10^−1 (≈) | 5.35 ± 7.04 × 10^−1 (≈) | 5.25 ± 6.06 × 10^−1 (≈) | 4.91 ± 2.07 × 10^−1 (−)
30 | TF4 | 1.15 ± 5.50 × 10^−2 | 1.20 ± 3.24 × 10^−2 (+) | 1.20 ± 5.63 × 10^−2 (+) | 1.23 ± 6.47 × 10^−2 (+) | 1.17 ± 6.08 × 10^−2 (≈)
30 | TF5 | 1.16 × 10^2 ± 2.38 × 10^1 | 1.06 × 10^2 ± 2.58 × 10^1 (≈) | 1.26 × 10^2 ± 2.47 × 10^1 (≈) | 1.24 × 10^2 ± 2.61 × 10^1 (≈) | 1.25 × 10^2 ± 3.72 × 10^1 (≈)
30 | +/≈/− | NA | 1/3/1 | 1/4/0 | 1/4/0 | 0/4/1
50 | TF1 | 9.33 ± 1.37 | 1.06 × 10^1 ± 2.36 (≈) | 1.21 × 10^1 ± 4.27 (+) | 1.32 × 10^1 ± 2.39 (+) | 8.80 ± 3.06 (≈)
50 | TF2 | 8.52 × 10^1 ± 3.06 | 8.84 × 10^1 ± 5.34 (≈) | 9.11 × 10^1 ± 1.04 × 10^1 (≈) | 8.51 × 10^1 ± 3.99 (≈) | 8.45 × 10^1 ± 4.70 (≈)
50 | TF3 | 4.42 ± 2.21 × 10^−1 | 4.42 ± 3.21 × 10^−1 (≈) | 4.43 ± 2.30 × 10^−1 (≈) | 4.48 ± 2.53 × 10^−1 (≈) | 4.44 ± 3.20 × 10^−1 (≈)
50 | TF4 | 1.18 ± 3.60 × 10^−2 | 1.21 ± 5.07 × 10^−2 (≈) | 1.22 ± 6.77 × 10^−2 (≈) | 1.20 ± 5.71 × 10^−2 (≈) | 1.17 ± 5.23 × 10^−2 (≈)
50 | TF5 | 1.54 × 10^2 ± 2.80 × 10^1 | 1.50 × 10^2 ± 1.90 × 10^1 (≈) | 1.61 × 10^2 ± 3.65 × 10^1 (≈) | 1.61 × 10^2 ± 2.87 × 10^1 (≈) | 1.52 × 10^2 ± 2.38 × 10^1 (≈)
50 | +/≈/− | NA | 0/5/0 | 1/4/0 | 1/4/0 | 0/5/0
100 | TF1 | 8.27 × 10^1 ± 7.74 × 10^1 | 8.05 × 10^1 ± 3.77 × 10^1 (≈) | 6.82 × 10^1 ± 1.39 × 10^1 (≈) | 6.52 × 10^1 ± 2.13 × 10^1 (≈) | 5.65 × 10^1 ± 1.18 × 10^1 (≈)
100 | TF2 | 1.89 × 10^2 ± 2.38 × 10^1 | 1.96 × 10^2 ± 4.46 × 10^1 (≈) | 1.85 × 10^2 ± 1.15 × 10^1 (≈) | 1.86 × 10^2 ± 2.58 × 10^1 (≈) | 1.73 × 10^2 ± 1.80 × 10^1 (≈)
100 | TF3 | 4.22 ± 2.43 × 10^−1 | 4.34 ± 2.30 × 10^−1 (≈) | 4.34 ± 2.51 × 10^−1 (≈) | 4.32 ± 2.46 × 10^−1 (≈) | 4.26 ± 2.37 × 10^−1 (≈)
100 | TF4 | 1.56 ± 1.70 × 10^−1 | 1.84 ± 3.39 × 10^−1 (+) | 1.71 ± 3.29 × 10^−1 (≈) | 1.76 ± 4.09 × 10^−1 (+) | 1.74 ± 2.98 × 10^−1 (+)
100 | TF5 | 4.66 × 10^2 ± 9.26 × 10^1 | 4.61 × 10^2 ± 9.33 × 10^1 (≈) | 4.85 × 10^2 ± 1.42 × 10^2 (≈) | 5.29 × 10^2 ± 1.06 × 10^2 (≈) | 4.98 × 10^2 ± 1.43 × 10^2 (≈)
100 | +/≈/− | NA | 1/4/0 | 0/5/0 | 1/4/0 | 1/4/0
Table 5. Comparisons between the GBS-DDEA variants with different numbers of initial models.
D | ID | GBS-DDEA (R = 2000) | GBS-DDEA (R = 1000) | GBS-DDEA (R = 1500) | GBS-DDEA (R = 2500) | GBS-DDEA (R = 3000)
10 | TF1 | 1.09 ± 5.25 × 10^−1 | 8.96 × 10^−1 ± 2.94 × 10^−1 (≈) | 8.37 × 10^−1 ± 2.89 × 10^−1 (≈) | 9.08 × 10^−1 ± 3.86 × 10^−1 (≈) | 9.27 × 10^−1 ± 3.63 × 10^−1 (≈)
10 | TF2 | 2.93 × 10^1 ± 3.29 | 3.37 × 10^1 ± 1.19 × 10^1 (≈) | 2.74 × 10^1 ± 5.02 (≈) | 2.51 × 10^1 ± 4.76 (≈) | 2.99 × 10^1 ± 5.10 (≈)
10 | TF3 | 5.91 ± 8.03 × 10^−1 | 5.55 ± 8.36 × 10^−1 (≈) | 6.29 ± 8.72 × 10^−1 (≈) | 6.10 ± 1.06 (≈) | 6.41 ± 5.07 × 10^−1 (≈)
10 | TF4 | 1.24 ± 1.02 × 10^−1 | 1.27 ± 1.12 × 10^−1 (≈) | 1.32 ± 1.76 × 10^−1 (≈) | 1.25 ± 1.12 × 10^−1 (≈) | 1.25 ± 1.31 × 10^−1 (≈)
10 | TF5 | 6.81 × 10^1 ± 2.04 × 10^1 | 6.24 × 10^1 ± 1.94 × 10^1 (≈) | 7.20 × 10^1 ± 2.22 × 10^1 (≈) | 6.39 × 10^1 ± 1.23 × 10^1 (≈) | 6.24 × 10^1 ± 1.91 × 10^1 (≈)
10 | +/≈/− | NA | 0/5/0 | 0/5/0 | 0/5/0 | 0/5/0
30 | TF1 | 4.26 ± 9.55 × 10^−1 | 3.73 ± 1.29 (≈) | 4.09 ± 1.73 (≈) | 3.91 ± 8.45 × 10^−1 (≈) | 4.00 ± 8.68 × 10^−1 (≈)
30 | TF2 | 5.97 × 10^1 ± 6.54 | 5.95 × 10^1 ± 5.07 (≈) | 5.99 × 10^1 ± 4.92 (≈) | 5.96 × 10^1 ± 4.81 (≈) | 5.94 × 10^1 ± 7.15 (≈)
30 | TF3 | 5.17 ± 2.78 × 10^−1 | 4.49 ± 4.71 × 10^−1 (≈) | 4.55 ± 3.54 × 10^−1 (≈) | 4.87 ± 3.89 × 10^−1 (≈) | 4.74 ± 5.96 × 10^−1 (≈)
30 | TF4 | 1.15 ± 5.50 × 10^−2 | 1.25 ± 4.68 × 10^−2 (+) | 1.22 ± 7.47 × 10^−2 (+) | 1.18 ± 4.50 × 10^−2 (+) | 1.19 ± 7.98 × 10^−2 (≈)
30 | TF5 | 1.16 × 10^2 ± 2.38 × 10^1 | 1.14 × 10^2 ± 3.11 × 10^1 (≈) | 1.14 × 10^2 ± 1.97 × 10^1 (≈) | 1.16 × 10^2 ± 2.51 × 10^1 (≈) | 1.04 × 10^2 ± 1.98 × 10^1 (≈)
30 | +/≈/− | NA | 1/4/0 | 1/4/0 | 1/4/0 | 0/5/0
50 | TF1 | 9.33 ± 1.37 | 9.24 ± 1.92 (≈) | 8.93 ± 3.05 (≈) | 1.01 × 10^1 ± 2.61 (≈) | 8.67 ± 1.92 (≈)
50 | TF2 | 8.52 × 10^1 ± 3.06 | 8.29 × 10^1 ± 5.09 (≈) | 8.88 × 10^1 ± 4.94 (≈) | 8.63 × 10^1 ± 3.54 (≈) | 8.12 × 10^1 ± 6.25 (≈)
50 | TF3 | 4.42 ± 2.21 × 10^−1 | 4.34 ± 3.16 × 10^−1 (≈) | 4.32 ± 3.68 × 10^−1 (≈) | 4.28 ± 3.67 × 10^−1 (≈) | 4.47 ± 2.22 × 10^−1 (≈)
50 | TF4 | 1.18 ± 3.60 × 10^−2 | 1.21 ± 5.77 × 10^−2 (≈) | 1.18 ± 3.78 × 10^−2 (≈) | 1.24 ± 1.05 × 10^−1 (+) | 1.18 ± 4.25 × 10^−2 (≈)
50 | TF5 | 1.54 × 10^2 ± 2.80 × 10^1 | 1.54 × 10^2 ± 2.98 × 10^1 (≈) | 1.55 × 10^2 ± 3.82 × 10^1 (≈) | 1.52 × 10^2 ± 3.18 × 10^1 (≈) | 1.56 × 10^2 ± 3.42 × 10^1 (≈)
50 | +/≈/− | NA | 0/5/0 | 0/5/0 | 1/4/0 | 0/5/0
100 | TF1 | 8.27 × 10^1 ± 7.74 × 10^1 | 6.13 × 10^1 ± 1.71 × 10^1 (≈) | 5.03 × 10^1 ± 1.29 × 10^1 (≈) | 6.46 × 10^1 ± 2.57 × 10^1 (≈) | 6.11 × 10^1 ± 3.50 × 10^1 (≈)
100 | TF2 | 1.89 × 10^2 ± 2.38 × 10^1 | 1.90 × 10^2 ± 3.77 × 10^1 (≈) | 1.75 × 10^2 ± 2.81 × 10^1 (≈) | 1.75 × 10^2 ± 2.42 × 10^1 (≈) | 1.74 × 10^2 ± 2.79 × 10^1 (≈)
100 | TF3 | 4.22 ± 2.43 × 10^−1 | 4.25 ± 2.55 × 10^−1 (≈) | 4.10 ± 2.24 × 10^−1 (≈) | 4.11 ± 2.69 × 10^−1 (≈) | 4.39 ± 1.89 × 10^−1 (≈)
100 | TF4 | 1.56 ± 1.70 × 10^−1 | 2.02 ± 5.92 × 10^−1 (+) | 1.91 ± 7.18 × 10^−1 (≈) | 1.86 ± 6.74 × 10^−1 (≈) | 2.04 ± 9.61 × 10^−1 (≈)
100 | TF5 | 4.66 × 10^2 ± 9.26 × 10^1 | 4.52 × 10^2 ± 1.47 × 10^2 (≈) | 4.56 × 10^2 ± 1.74 × 10^2 (≈) | 4.13 × 10^2 ± 1.32 × 10^2 (≈) | 4.63 × 10^2 ± 1.81 × 10^2 (≈)
100 | +/≈/− | NA | 1/4/0 | 0/5/0 | 0/5/0 | 0/5/0
Table 6. The hyperparameters and their search ranges.
Hyperparameter | Search Range
Dropout rate | [0, 1]
Learning rate | [1 × 10^−6, 1]
Text embedding dimension | [32, 1024]
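As an illustration of how the search space in Table 6 can be handled by a data-driven evolutionary optimizer, the following minimal Python sketch encodes the three hyperparameters as a bounded real-valued decision vector and decodes a candidate solution. The log-scale treatment of the learning rate, the rounding of the embedding dimension, and all identifiers are assumptions of this sketch rather than the implementation used in the paper.

import numpy as np

# Illustrative bounds taken from Table 6: dropout rate, log10(learning rate), embedding dimension.
LOWER = np.array([0.0, np.log10(1e-6), 32.0])
UPPER = np.array([1.0, np.log10(1.0), 1024.0])

def decode(x):
    # Map a decision vector within [LOWER, UPPER] to a hyperparameter dictionary.
    x = np.clip(x, LOWER, UPPER)
    return {
        "dropout": float(x[0]),
        "learning_rate": float(10.0 ** x[1]),   # searched on a log scale in this sketch
        "embedding_dim": int(round(x[2])),
    }

# Example: a random candidate drawn uniformly from the search ranges.
rng = np.random.default_rng(0)
candidate = LOWER + rng.random(3) * (UPPER - LOWER)
print(decode(candidate))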
Table 7. Comparisons of HGT with different DDEAs for COA mining.
Method | Micro F1 | Macro F1
HGT | 0.88 | 0.43
DDEA-SE-HGT | 0.93 | 0.48
BDDEA-HGT | 0.92 | 0.47
DDEA-PES-HGT | 0.92 | 0.46
GBS-DDEA-HGT | 0.94 | 0.51
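Table 7 reports Micro F1 and Macro F1. For reference, the following minimal Python sketch shows how these two metrics are conventionally computed with scikit-learn; the toy labels are illustrative and unrelated to the cheating official accounts dataset.

from sklearn.metrics import f1_score

# Toy labels for a multi-class account-classification task (values are illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 0, 1, 0, 2, 2, 1, 0]

micro = f1_score(y_true, y_pred, average="micro")  # aggregates counts over all instances
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1 scores
print(f"Micro F1 = {micro:.2f}, Macro F1 = {macro:.2f}")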
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
