A Robust Hybrid Metaheuristic Framework for Training Support Vector Machines

Nejjar, Khalid; Jebari, Khalid; Rekiek, Siham

doi:10.3390/a19010070


by Khalid Nejjar †, Khalid Jebari *,† and Siham Rekiek †

IABL, FSTT, Abdelmalek Essaadi University, Tetouan 93000, Morocco

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Algorithms 2026, 19(1), 70; https://doi.org/10.3390/a19010070

Submission received: 13 December 2025 / Revised: 4 January 2026 / Accepted: 8 January 2026 / Published: 13 January 2026

(This article belongs to the Special Issue Data Sensing Techniques and Processing Algorithms for Smart and Sustainable Agriculture)


Abstract

Support Vector Machines (SVMs) are widely used in critical decision-making applications, such as precision agriculture, due to their strong theoretical foundations and their ability to construct an optimal separating hyperplane in high-dimensional spaces. However, the effectiveness of SVMs is highly dependent on the efficiency of the optimization algorithm used to solve their underlying dual problem, which is often complex and constrained. Classical solvers, such as Sequential Minimal Optimization (SMO) and Stochastic Gradient Descent (SGD), present inherent limitations: SMO ensures numerical stability but lacks scalability and is sensitive to heuristics, while SGD scales well but suffers from unstable convergence and limited suitability for nonlinear kernels. To address these challenges, this study proposes a novel hybrid optimization framework based on Open Competency Optimization and Particle Swarm Optimization (OCO–PSO) to enhance the training of SVMs. The proposed approach combines the global exploration capability of PSO with the adaptive competency-based learning mechanism of OCO, enabling efficient exploration of the solution space, avoidance of local minima, and strict enforcement of dual constraints on the Lagrange multipliers. Across multiple datasets spanning medical (diabetes), agricultural yield, signal processing (sonar and ionosphere), and imbalanced synthetic data, the proposed OCO-PSO–SVM consistently outperforms classical SVM solvers (SMO and SGD) as well as widely used classifiers, including decision trees and random forests, in terms of accuracy, macro-F1-score, Matthews correlation coefficient (MCC), and ROC-AUC. On the Ionosphere dataset, OCO-PSO achieves an accuracy of 95.71%, an F1-score of 0.954, and an MCC of 0.908, matching the accuracy of random forest while offering superior interpretability through its kernel-based structure. In addition, the proposed method yields a sparser model with only 66 support vectors compared to 71 for standard SVC (a reduction of approximately 7%), while strictly satisfying the dual constraints with a near-zero violation of $1.3 \times 10^{-3}$. Notably, the optimal hyperparameters identified by OCO-PSO ($C = 2$, $γ \approx 0.062$) differ substantially from those obtained via Bayesian optimization for SVC ($C = 10$, $γ \approx 0.012$), indicating that the proposed approach explores alternative yet equally effective regions of the hypothesis space. The statistical significance and robustness of these improvements are confirmed through extensive validation using 1000 bootstrap replications, paired Student’s t-tests, Wilcoxon signed-rank tests, and Holm–Bonferroni correction. These results demonstrate that the proposed metaheuristic hybrid optimization framework constitutes a reliable, interpretable, and scalable alternative for training SVMs in complex and high-dimensional classification tasks.

Keywords:

support vector machine; metaheuristics; particle swarm optimization; open competency optimization; Lagrange multipliers; sequential minimal optimization; yield crop prediction

1. Introduction

Agriculture plays a vital role in sustainable human livelihoods, rural development, and global food security, particularly in regions exposed to increasing climatic variability and limited natural resources. According to the Food and Agriculture Organization of the United Nations (FAO), global food demand is expected to increase by nearly 60% by 2050 to meet the needs of an estimated population of 9.3 billion [1]. In this context, improving the reliability, accuracy, and scalability of predictive models has become a critical requirement for decision-making in precision agriculture. Recent advances in machine learning (ML) and deep learning (DL) have significantly transformed agricultural analytics by enabling data-driven modeling of complex, nonlinear interactions between climate conditions, soil characteristics, and crop physiology [2,3,4]. Consequently, there is a growing demand for robust classification models capable of handling heterogeneous, high-dimensional, and noisy data while maintaining interpretability and computational efficiency.

1.1. Motivation and Incitement

Modern agricultural systems increasingly rely on precise and automated decision-support tools for tasks such as crop yield prediction, disease detection, water stress monitoring, and species identification [5,6]. Traditional yield forecasting approaches, which are largely based on expert knowledge and limited environmental indicators, fail to capture the complex and nonlinear dependencies inherent in agricultural ecosystems [4]. While deep learning models offer powerful representational capabilities, their deployment is often constrained by data availability, computational cost, and limited interpretability, particularly in resource-constrained agricultural settings. These limitations motivate the continued investigation of robust, theoretically grounded alternatives such as Support Vector Machines (SVMs).

1.2. The Relevant Literature

Among supervised learning methods, SVMs and random forests (RFs) remain widely adopted due to their strong generalization ability and robustness to overfitting, whereas deep learning architectures such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have demonstrated superior performance in modeling spatial and temporal agricultural data [2,3]. Shawon et al. [4] reported that linear regression, RF, and gradient boosting trees are among the most frequently used techniques for crop yield forecasting. In contrast, Meghraoui et al. [2] highlighted the effectiveness of CNNs and LSTMs in exploiting spatiotemporal information.

SVMs are particularly attractive due to their solid theoretical foundations and effectiveness in high-dimensional and nonlinear classification tasks [7]. However, their performance is strongly influenced by the optimization strategy used to solve the underlying convex problem [8]. Classical solvers such as Sequential Minimal Optimization (SMO) and Stochastic Gradient Descent (SGD) [9,10] suffer from scalability and stability issues when applied to complex, noisy, or large-scale datasets [11]. To address these challenges, metaheuristic-based approaches have been investigated. Early work by Paquet and Engelbrecht [12] demonstrated the feasibility of Particle Swarm Optimization (PSO) for SVM training, while Dias and Rocha Neto [13] showed that simulated annealing in the dual space can produce near-optimal solutions with limited Karush–Kuhn–Tucker (KKT) violations. Nevertheless, most existing studies focus primarily on hyperparameter optimization rather than on directly addressing the constrained dual SVM problem [14].

Particle Swarm Optimization (PSO) has been widely investigated as an efficient metaheuristic for solving complex and nonconvex optimization problems. Since its original formulation, numerous PSO variants have been proposed to enhance convergence speed, maintain population diversity, and avoid premature stagnation. Among these developments, hybrid and multi-swarm PSO algorithms have attracted particular attention. For example, Hybrid Multi-Swarm PSO (HMSPSO) coordinates multiple interacting swarms to improve global exploration and robustness in complex search spaces [15]. Other approaches incorporate evolutionary operators inspired by genetic algorithms, such as crossover and mutation, to enhance diversity and balance exploration and exploitation. A comprehensive review by Engelbrecht [16] highlights the effectiveness of PSO variants with crossover mechanisms and analyzes their empirical convergence behavior. Related hybrid frameworks, including RCGA-PSO, CBHPSO, PSO with mutation, and cooperative swarm strategies, further demonstrate the flexibility of PSO in addressing constrained and high-dimensional optimization problems.

Beyond PSO-based methods, other evolutionary algorithms have been successfully combined with Support Vector Machines (SVMs) to improve classification performance through enhanced parameter optimization. Genetic Algorithms (GA) have been extensively used to optimize SVM hyperparameters and feature subsets due to their strong global search capability and robustness to local optima. Similarly, Differential Evolution (DE) has been applied to SVM optimization, showing competitive performance thanks to its simple structure, fast convergence, and effectiveness in continuous parameter spaces. These evolutionary approaches highlight the importance of advanced optimization strategies in improving SVM training, particularly when solving the dual optimization problem under constraints.

Motivated by these advances, the present work proposes a novel hybrid framework that combines Particle Swarm Optimization with Open Competency Optimization (OCO). Unlike existing hybrid PSO approaches, the proposed OCO–PSO framework introduces a competency-driven learning mechanism that adaptively guides the swarm while explicitly enforcing the dual constraints of the SVM formulation. This design aims to enhance convergence stability, avoid local minima, and improve solution quality, thereby positioning the proposed method within the broader family of evolutionary and swarm-based optimization techniques for SVM training.

1.3. Major Research Gaps

Despite these advances, several important limitations remain unresolved. First, existing SVM solvers struggle to jointly ensure scalability, numerical stability, and strict satisfaction of dual constraints. Second, metaheuristic approaches are often employed as black-box optimizers, without explicitly exploiting the structure of the SVM dual formulation. Third, limited attention has been devoted to achieving sparse and well-distributed dual solutions that improve interpretability and robustness, particularly for imbalanced and heterogeneous agricultural datasets.

1.4. Contributions and Organization of the Paper

In this work, we propose the OCO-PSO hybrid optimization framework to enhance SVM training. Compared to classical SVM solvers such as SMO and SGD, and to alternative classifiers such as decision trees and random forests, OCO-PSO offers several advantages: it efficiently explores the solution space, avoids local minima, and strictly enforces dual constraints on Lagrange multipliers. These features result in improved predictive performance, sparser models, and better interpretability. The main limitation is the higher computational cost during training due to constrained global optimization, although prediction remains highly efficient. By explicitly addressing these strengths and trade-offs, our approach provides a robust alternative for classification tasks even with small- to medium-sized datasets.

To address the aforementioned challenges, this paper proposes a novel hybrid optimization framework, termed OCO-PSO, which combines Particle Swarm Optimization (PSO) with Open Competency Optimization (OCO) for solving the dual SVM problem. In this framework, OCO is reinterpreted as a diversification operator that introduces controlled perturbations to the Lagrange multiplier vector, thereby enhancing exploration while preserving feasibility. The proposed method enforces the equality constraint $\sum_{i} α_{i} y_{i} = 0$ at each iteration and corrects numerical drift through orthogonal projection, resulting in reduced KKT violations. In addition, the SVM hyperparameters ($C$ and $γ$) are jointly optimized using Bayesian optimization.

Extensive experiments conducted on five heterogeneous datasets from medical, signal processing, agricultural, and imbalanced synthetic domains demonstrate that OCO-PSO consistently outperforms classical SVM solvers and widely used classifiers in terms of accuracy, robustness, and model parsimony. In particular, for crop yield prediction, the proposed approach achieves an accuracy of 89.17% and an MCC of 0.786, while producing significantly sparser models and maintaining near-zero constraint violations. These improvements are statistically validated, confirming the reliability and practical relevance of the proposed framework.

The remainder of this paper is organized as follows. Section 2 reviews related work on SVM optimization. Section 3 presents the proposed OCO-PSO algorithm. Section 4 and Section 5 describe the experimental setup and report the results. Section 6 discusses the findings, and Section 7 concludes the paper and outlines future research directions.

2. Support Vector Machines

Before presenting the SVM formulation, we give a brief overview of the proposed methodological framework. The objective of this study is to enhance SVM learning by addressing the limitations of the classical solvers commonly used for the dual formulation. The choice of the SVM as the cornerstone of our framework is motivated by several fundamental principles:

SVMs are grounded in solid mathematical foundations derived from statistical learning theory. Their ability to determine the optimal separating hyperplane by maximizing the margin between classes constitutes an elegant approach that promotes strong generalization capabilities, even when training data are limited.
Their robustness to overfitting is particularly noteworthy, especially in high-dimensional spaces where the number of features may exceed the number of samples. Moreover, SVMs can efficiently handle nonlinear problems through the use of kernel functions, which implicitly project data into higher-dimensional spaces without incurring expensive explicit computations. This property is especially valuable for processing heterogeneous data typically encountered in agricultural applications.
In agricultural contexts, historical yield data are often limited in size but rich in explanatory variables. SVMs maintain strong predictive performance even with small training samples, in contrast to methods that require large amounts of data, such as deep neural networks.
Agricultural data are frequently affected by noise (e.g., sensor errors or sporadic extreme weather conditions). The SVM regularization parameter C provides a principled mechanism for controlling sensitivity to outliers, which is essential for producing reliable predictions.

However, the effectiveness of SVMs strongly depends on the stability and constraint-handling capability of the underlying optimization process. In this context, we propose a hybrid optimization framework based on Open Competency Optimization (OCO) and Particle Swarm Optimization (PSO), designed to directly solve the SVM dual problem while strictly enforcing both box and equality constraints on the Lagrange multipliers. The global exploration capability of PSO is combined with the adaptive learning mechanisms of OCO to enhance convergence stability, mitigate premature convergence to local minima, and efficiently explore the hypothesis space. Within this framework, the SVM serves as the central decision model, whose performance is enhanced through the proposed OCO-PSO optimization strategy, rather than acting as a standalone generic classifier.

Support Vector Machines (SVMs) are a well-established family of supervised learning algorithms, predominantly employed for classification and regression. Introduced by Vapnik and Cortes [17], an SVM seeks the optimal separating hyperplane that maximizes the margin between the data classes. The underlying theory naturally leads to a quadratic optimization problem, which can be formulated in two equivalent ways: the primal (direct) form and the dual form obtained by Lagrangian duality.

2.1. Primal Formulation of SVM

Consider a labeled training dataset $\{(x_{i}, y_{i})\}_{i = 1}^{n}$, where $x_{i} \in \mathbb{R}^{d}$ and $y_{i} \in \{- 1, + 1\}$. The primal formulation aims to determine the parameters $(w, b)$ of a decision boundary $w \cdot x + b = 0$. The objective is to maximize the margin while accounting for classification errors using slack variables. This objective is formalized as the following soft-margin constrained optimization problem:

\begin{aligned} \min_{w, b, ξ} \quad & \frac{1}{2} {∥ w ∥}^{2} + C \sum_{i = 1}^{n} ξ_{i} \\ \text{subject to} \quad & y_{i} (w \cdot x_{i} + b) \geq 1 - ξ_{i}, \quad \forall i, \\ & ξ_{i} \geq 0, \quad \forall i, \end{aligned}

(1)

where:

${∥ w ∥}^{2}$ : The squared norm of the hyperplane's normal vector. Because the margin width equals $2 / ∥ w ∥$, minimizing this term maximizes the margin.
$ξ_{i}$ : Slack variables added to allow misclassification of difficult or noisy data points (soft-margin approach).
$C > 0$ : The regularization parameter that balances the trade-off between margin maximization and misclassification penalties.
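As a rough illustration of Equation (1), the sketch below evaluates the soft-margin primal objective for a given $(w, b)$ by setting each slack variable to its smallest feasible value, $ξ_{i} = \max (0, 1 - y_{i} (w \cdot x_{i} + b))$. The function name `primal_objective` and the toy data are assumptions made for illustration and do not come from the original study.

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Soft-margin primal objective of Equation (1), using the smallest feasible slacks."""
    margins = y * (X @ w + b)                 # y_i (w . x_i + b)
    slack = np.maximum(0.0, 1.0 - margins)    # xi_i = max(0, 1 - margin_i)
    return 0.5 * float(w @ w) + C * float(slack.sum())

# Toy data (hypothetical): the third point lies on the wrong side of the boundary.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
value = primal_objective(w, 0.0, X, y, C=1.0)
```

In this toy case only the third point needs a nonzero slack, so the objective is the margin term plus $C$ times that single slack; a larger $C$ would penalize the violation more heavily.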

2.2. Dual Formulation and Kernel Trick

To avoid the computational complexity of the primal problem, particularly in high-dimensional spaces, it is generally transformed using Lagrange multipliers $α_{i} \geq 0$. Assuming strong duality and satisfaction of the Karush–Kuhn–Tucker (KKT) conditions, this leads to the dual formulation:

\begin{aligned} \max_{α} \quad & \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i, j = 1}^{n} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j}) \\ \text{subject to} \quad & \sum_{i = 1}^{n} α_{i} y_{i} = 0, \\ & 0 \leq α_{i} \leq C, \quad \forall i . \end{aligned}

(2)

This dual formulation offers two major advantages:

Parsimony and Computational Efficiency: Only data points for which $α_{i} > 0$ (the support vectors) contribute to defining the decision boundary. This property makes the model naturally sparse and considerably reduces its dependence on the size of the full dataset.
Flexibility via the Kernel Trick: The inner product $x_{i} \cdot x_{j}$ can be replaced by a kernel function $K (x_{i}, x_{j})$ , such as the radial basis function (RBF) or polynomial kernels. This kernel trick allows SVMs to efficiently learn complex and nonlinear decision boundaries by implicitly mapping the data into a high-dimensional feature space.
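To make Equation (2) concrete, the following minimal sketch evaluates the dual objective for an RBF kernel and checks the box and equality constraints. The helper names (`rbf_kernel`, `dual_objective`, `is_feasible`), the tolerance, and the two-point example are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))   # guard against tiny negatives

def dual_objective(alpha, y, K):
    """Value of the dual objective in Equation (2), which is to be maximized."""
    ay = alpha * y
    return float(alpha.sum() - 0.5 * ay @ K @ ay)

def is_feasible(alpha, y, C, tol=1e-8):
    """Check 0 <= alpha_i <= C and sum_i alpha_i y_i = 0 up to a tolerance."""
    in_box = np.all(alpha >= -tol) and np.all(alpha <= C + tol)
    return bool(in_box and abs(alpha @ y) <= tol)

# Two-point toy problem (hypothetical data).
X = np.array([[0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, -1.0])
K = rbf_kernel(X, gamma=1.0)
alpha = np.array([0.5, 0.5])
obj = dual_objective(alpha, y, K)
```

Only the Gram matrix depends on the kernel, so swapping `rbf_kernel` for another valid kernel leaves the objective and the feasibility check unchanged.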

2.3. Critical Review of Dominant Training Methods for SVMs

Although SMO and SGD represent the dominant paradigms for SVM training, neither method provides a fully satisfactory solution when confronted with large-scale, nonlinear, or heterogeneous datasets. Their respective strengths are offset by fundamental limitations that restrict their applicability in complex real-world scenarios.

2.3.1. Sequential Minimal Optimization (SMO)

SMO, first detailed by Platt [9], greatly improved SVM training by solving the dual optimization problem through a series of analytically solvable sub-problems, each involving only two Lagrange multipliers. This development removed the need for large external quadratic programming (QP) solvers and underpins LIBSVM-type implementations.

The efficiency of SMO relies heavily on heuristics for selecting the multiplier pair $α_{i}$ and $α_{j}$. This choice is mainly directed by two criteria:

Violation of KKT Conditions: Multipliers associated with examples that are misclassified or are too close to the decision boundary should be given the highest priority as these are the instances which probably require margin refinement the most.
Maximizing the Step Size: The second multiplier is chosen such that it maximizes the move towards the optimum; very often, this is performed by picking the instance of the class opposite to that of the first ( $y_{i} \neq y_{j}$ ) for the second.

While SMO guarantees convergence and numerical stability, its dependence on heuristics makes it sensitive to dataset characteristics.
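The first selection criterion can be illustrated with a per-sample KKT violation score. The sketch below assumes the decision function $f (x_{i}) = \sum_{j} α_{j} y_{j} K (x_{i}, x_{j}) + b$ with a known bias $b$; the function name `kkt_violation` and the tolerance are hypothetical, and actual SMO implementations use more elaborate working-set rules.

```python
import numpy as np

def kkt_violation(alpha, y, K, b, C, tol=1e-8):
    """Per-sample KKT violation for the soft-margin dual (zero means satisfied)."""
    margins = y * (K @ (alpha * y) + b)        # y_i f(x_i)
    viol = np.zeros_like(alpha)
    can_increase = alpha < C - tol             # alpha_i < C requires y_i f(x_i) >= 1
    can_decrease = alpha > tol                 # alpha_i > 0 requires y_i f(x_i) <= 1
    viol[can_increase] = np.maximum(viol[can_increase], 1.0 - margins[can_increase])
    viol[can_decrease] = np.maximum(viol[can_decrease], margins[can_decrease] - 1.0)
    return viol

# Linear-kernel toy problem with points x = +1 and x = -1 (hypothetical data).
K = np.array([[1.0, -1.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0])
v_opt = kkt_violation(np.array([0.5, 0.5]), y, K, b=0.0, C=1.0)
v_zero = kkt_violation(np.zeros(2), y, K, b=0.0, C=1.0)
```

A working-set heuristic in this spirit would pick the index with the largest score as the first multiplier of the pair.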

2.3.2. Stochastic Gradient Descent (SGD)

In contrast to the dual SMO approaches and decomposition methods, SGD directly addresses the primal formulation of the SVM optimization problem (Equation (1)). It minimizes the objective function through iterative updates based on randomly sampled instances (or mini-batches) from the dataset.

The Pegasos (Primal Estimated sub-GrAdient SOlver for SVM) algorithm [18] has proven particularly effective for large-scale SVM training because its computational complexity depends primarily on the data dimension D rather than the dataset size N. The algorithm updates the weight vector $w$ iteratively:

w_{t + 1} = w_{t} - η_{t} \nabla J (w_{t}; x_{i}, y_{i})

(3)

where $η_{t}$ is a learning rate determined by a predefined schedule. SGD is highly effective for large-scale linear classification but faces significant challenges when applied to nonlinear kernels, as the weight vector $w$ cannot be explicitly represented in infinite-dimensional feature spaces.
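For the linear case, the update in Equation (3) can be sketched in the style of Pegasos, with the step size $η_{t} = 1 / (λ t)$ and a hinge-loss subgradient. This is a minimal sketch under assumptions: it omits the bias term, uses a regularization strength `lam` in place of $C$, and the function name, default values, and toy data are illustrative only.

```python
import numpy as np

def pegasos(X, y, lam=0.1, n_iters=2000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM without bias."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(len(y))                  # draw one training example
        eta = 1.0 / (lam * t)                     # step size eta_t = 1 / (lambda t)
        violated = y[i] * (X[i] @ w) < 1.0        # is the hinge loss active?
        w = (1.0 - eta * lam) * w                 # shrinkage from the regularizer
        if violated:
            w = w + eta * y[i] * X[i]             # subgradient step on the hinge term
    return w

# Linearly separable toy data (hypothetical).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, -1.0 + 2.0, -1.0, -1.0])
w = pegasos(X, y)
pred = np.sign(X @ w)
```

Each step touches a single example and a vector of length $d$, which is why the cost per iteration does not grow with the dataset size; with a nonlinear kernel no such explicit `w` is available.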

2.3.3. Analysis and Critical Discussion

Whilst Support Vector Machines are theoretically well-founded and widely used, their practical performance is strongly influenced by the efficiency and robustness of the underlying optimization algorithms. Among existing approaches, Sequential Minimal Optimization (SMO) and Stochastic Gradient Descent (SGD) remain the most commonly adopted paradigms. However, a closer examination reveals that neither method provides a fully satisfactory solution when confronted with large-scale, nonlinear, or heterogeneous datasets. Their respective strengths are offset by inherent limitations that restrict their applicability in complex real-world scenarios:

Sequential Minimal Optimization (SMO) significantly improved SVM training by decomposing the dual optimization problem into a sequence of analytically solvable sub-problems involving only two Lagrange multipliers. This strategy eliminates the need for large-scale quadratic programming solvers and has enabled efficient implementations, such as LIBSVM.
Despite its numerical stability and guaranteed convergence, SMO suffers from several well-documented limitations. First, its scalability is severely constrained for large datasets, as the number of iterations can grow super-linearly with the training size. Second, the convergence speed is highly sensitive to the working set selection heuristics, which are problem-dependent and may lead to suboptimal convergence behavior [19]. Third, SMO performance degrades significantly in the presence of dense kernel matrices or low-sparsity solutions, where a large number of support vectors must be maintained. As a result, while SMO is effective for small- to medium-sized problems, its efficiency and robustness diminish in high-dimensional or large-scale settings.
Stochastic Gradient Descent (SGD) operates directly on the primal SVM formulation (Equation (1)) by performing iterative updates based on randomly sampled data points or mini-batches. Algorithms such as Pegasos have demonstrated excellent scalability, as their computational complexity depends primarily on the data dimensionality rather than the dataset size.
However, SGD exhibits critical limitations in the context of kernel-based SVMs. Most notably, it suffers from the so-called “curse of kernelization” [20]: when nonlinear kernels are employed, all support vectors must be updated at each iteration, effectively nullifying the computational advantages of stochastic optimization. Furthermore, SGD is prone to oscillatory convergence behavior near the optimum due to the inherent variance in stochastic gradients [21], requiring careful tuning of learning rate schedules and stopping criteria. These issues limit the reliability and robustness of SGD for nonlinear, high-precision SVM training.

2.3.4. Critical Analysis and Research Gap

The above analysis highlights a fundamental trade-off in existing SVM training methods between numerical stability, scalability, and flexibility. While SMO prioritizes stability and exact constraint satisfaction, it struggles with scalability and heuristic sensitivity. Conversely, SGD offers scalability but sacrifices robustness and kernel compatibility.

More importantly, most existing approaches focus primarily on convergence speed, often overlooking other critical aspects of model quality such as sparsity, constraint satisfaction, and interpretability, properties that are particularly important in regulated domains such as agriculture and healthcare. Although metaheuristic optimization methods have been widely explored for hyperparameter tuning and feature selection, their use as direct solvers of the dual SVM optimization problem remains underexplored. This is mainly due to the difficulty of enforcing strict equality constraints, such as $\sum_{i = 1}^{n} α_{i} y_{i} = 0$, within population-based optimization frameworks.

This gap motivates the exploration of specialized hybrid metaheuristic solvers capable of navigating the high-dimensional space of Lagrange multipliers while explicitly enforcing SVM dual constraints. By integrating global search capabilities with adaptive learning mechanisms, such approaches have the potential to overcome the limitations of classical solvers and provide more robust, interpretable, and flexible SVM training strategies [22].

3. Proposed Hybrid Approach: The OCO-PSO Model

The optimization of the Lagrange multipliers ($α_{i}$) in the SVM context is a constrained quadratic programming problem, usually solved by sequential methods such as SMO. We propose OCO-PSO, a novel hybrid metaheuristic that synergistically integrates the global search capabilities of Particle Swarm Optimization with the diversity management mechanisms of Open Competency Optimization [23] and paired analytical update strategies that preserve constraint feasibility.

The OCO-PSO framework incorporates three fundamental mechanisms to address the dual optimization structure while maintaining KKT compliance:

1. Constraint-Preserving Paired Updates: The particles are updated by a paired mechanism operating on instances of opposite classes. For each selected pair $(j_{1}, j_{2})$ with $y_{j_{1}} \neq y_{j_{2}}$ , one component follows standard PSO dynamics within analytically derived feasible bounds, while its counterpart is adjusted to maintain the equality constraint $\sum_{i = 1}^{n} α_{i} y_{i} = 0$ .
2. Analytical Boundary Computation: The feasible regions for paired updates are determined analytically from the geometry of the constraints, ensuring the feasibility of the trajectory without penalty-based mechanisms.
3. Adaptive Diversity Control via OCO: The periodic application of OCO operators [23] provides competition-based mutation and crossover strategies that maintain population diversity and prevent premature convergence in high-dimensional dual spaces, while constraint management is handled independently through the paired update mechanism.

3.1. Formulation of the Optimization Problem

The OCO-PSO model aims to minimize the negative of the SVM dual objective defined in Equation (2), which leads to the following minimization problem:

\min_{α} L (α) = \min_{α \in {[0, C]}^{n}} \left[ - \sum_{i = 1}^{n} α_{i} + \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j}) \right],

(4)

Compliance with the box and equality constraints of Equation (2) is not ensured by penalties in the objective function but by mechanisms integrated into the iterative process:

Paired Update of Opposite Instances $(j_{1}, j_{2})$ : The iteration is performed by selecting pairs of instances where $y_{j_{1}} \neq y_{j_{2}}$ .
Adaptive determination of boundaries $[L, H]$ during paired updates $(j_{1}, j_{2})$ is based on constraint geometry.
Orthogonal projection onto the hyperplane $\sum_{i} α_{i} y_{i} = 0$ , guaranteeing the equality constraint.
Clipping in the interval $[0, C]$ with each particle update.
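The last two mechanisms can be sketched as follows. The orthogonal projection onto the hyperplane $\sum_{i} α_{i} y_{i} = 0$ has the closed form $α - \frac{α \cdot y}{y \cdot y} y$; alternating it with clipping, as done in `project_and_clip`, is an assumption of this sketch (the source does not specify how the two steps are interleaved), and both function names are hypothetical.

```python
import numpy as np

def project_onto_equality(alpha, y):
    """Orthogonal projection of alpha onto the hyperplane {a : a . y = 0}."""
    return alpha - (alpha @ y) / (y @ y) * y

def project_and_clip(alpha, y, C, n_rounds=50):
    """Alternate projection and clipping to [0, C]; a heuristic repair step."""
    for _ in range(n_rounds):
        alpha = np.clip(project_onto_equality(alpha, y), 0.0, C)
    return alpha

# Toy vector that violates the equality constraint (hypothetical values).
y = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.9, 0.1, 0.4])
projected = project_onto_equality(alpha, y)
repaired = project_and_clip(alpha, y, C=1.0)
```

Clipping after a projection can reintroduce a small equality violation, which is why the sketch repeats the two steps; the box constraint always holds on return because clipping is applied last.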

3.2. Algorithmic Framework

The OCO-PSO scheme orchestrates three complementary search mechanisms (Algorithm 1): global exploration driven by PSO with constraint-preserving updates, diversity management based on OCO, and local exploitation of paired coordinates.

Algorithm 1 OCO-PSO Framework

Require: Training set $\{(x_{i}, y_{i})\}_{i = 1}^{n}$, regularization parameter C, kernel K
Ensure: Optimal multipliers $α^{*}$
1: Initialize swarm: $α_{i} \leftarrow U (0, C)$, $v_{i} \leftarrow 0$, ${pbest}_{i} \leftarrow α_{i}$, $gbest \leftarrow \arg \min_{i} L (α_{i})$
2: for $t = 1$ to $T_{max}$ do
3:     Update inertia: $w \leftarrow w_{max} - (w_{max} - w_{min}) \cdot t / T_{max}$
4:     for each particle i do
5:         PairedCoordinateExploitation$(α_{i}, v_{i})$ ▹ Algorithm 2 (constraint handling)
6:         $α_{i} \leftarrow clip (α_{i}, 0, C)$
7:         if $L (α_{i}) < L ({pbest}_{i})$ then
8:             ${pbest}_{i} \leftarrow α_{i}$
9:         end if
10:         if $L (α_{i}) < L (gbest)$ then
11:             $gbest \leftarrow α_{i}$
12:         end if
13:     end for
14:     if $t \bmod Δ_{OCO} = 0$ then
15:         for each particle i do
16:             ApplyOCODiversification$(α_{i})$ ▹ Algorithm 3 (diversity management)
17:         end for
18:         Replace worst particle with $gbest$ (elitism)
19:     end if
20: end for
21: return $gbest$

The algorithmic workflow clearly separates concerns (Figure 1): constraint management is handled by the paired-coordinate mechanism (lines 5–6), while population diversity is maintained through periodic OCO interventions (lines 14–19). This modular design allows for independent adjustment of the exploration–exploitation balance and diversity preservation.

3.3. PSO Dynamics with Paired-Coordinate Constraint Management

Each particle represents a complete Lagrange multiplier vector $α$ in the dual SVM formulation. The velocity $v_{i}$ is updated using the standard PSO formula [24]

v_{i} (t + 1) = w (t) v_{i} (t) + c_{1} r_{1} ⊙ ({pbest}_{i} - α_{i} (t)) + c_{2} r_{2} ⊙ (gbest - α_{i} (t)),

(5)

where $w (t)$ denotes the time-varying inertia weight, $c_{1}, c_{2}$ are acceleration coefficients, and $r_{1}, r_{2} \sim U {(0, 1)}^{n}$ provide stochastic perturbations.
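A minimal sketch of Equation (5), together with the linearly decreasing inertia of line 3 in Algorithm 1, is given below. The default values ($w_{max} = 0.9$, $w_{min} = 0.4$, $c_{1} = c_{2} = 2.0$) are common PSO choices assumed here for illustration; the source does not state the settings used, and the function names are hypothetical.

```python
import numpy as np

def inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max

def pso_velocity(v, alpha, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """Velocity update of Equation (5) with element-wise random factors r1, r2."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.uniform(0.0, 1.0, size=alpha.shape)
    r2 = rng.uniform(0.0, 1.0, size=alpha.shape)
    return w * v + c1 * r1 * (pbest - alpha) + c2 * r2 * (gbest - alpha)

# Toy particle (hypothetical values).
rng = np.random.default_rng(0)
alpha = np.array([0.2, 0.8, 0.5])
v = np.array([0.1, -0.1, 0.0])
v_new = pso_velocity(v, alpha, pbest=alpha, gbest=alpha, w=inertia(50, 100), rng=rng)
```

When a particle already coincides with both its personal best and the global best, the two attraction terms vanish and only the inertia term remains, so the velocity simply decays by the factor $w (t)$.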

To intensify exploitation while respecting dual constraints, we integrate a paired-coordinate update mechanism (Algorithm 2) that operates on the constraint manifold. At each micro-iteration, two indices $j_{1}, j_{2}$ corresponding to opposite-class instances are randomly selected. The feasible bounds for $α_{j_{2}}$ are derived from the constraint geometry. Denoting the linear constraint $α_{j_{1}} y_{j_{1}} + α_{j_{2}} y_{j_{2}} = γ$ (where $γ$ remains constant during the paired update), the feasible interval $[L, H]$ is determined as

\begin{cases} L = \max (0, α_{j_{2}} - α_{j_{1}}), \quad H = \min (C, C + α_{j_{2}} - α_{j_{1}}), & \text{if } y_{j_{1}} \neq y_{j_{2}}, \\ L = \max (0, α_{j_{1}} + α_{j_{2}} - C), \quad H = \min (C, α_{j_{1}} + α_{j_{2}}), & \text{if } y_{j_{1}} = y_{j_{2}} . \end{cases}

(6)

These bounds ensure that any update to $α_{j_{2}}$ within $[L, H]$ admits a corresponding feasible value for $α_{j_{1}}$ that satisfies both the box constraints and the linear dependency. After applying the PSO velocity update to $α_{j_{2}}$ and clipping to $[L, H]$, we compute $α_{j_{1}}$ analytically to preserve equality constraint satisfaction:

α_{j_{1}}^{new} = α_{j_{1}} + y_{j_{1}} y_{j_{2}} (α_{j_{2}} - α_{j_{2}}^{new}) .

(7)

This analytical adjustment ensures exact constraint satisfaction throughout the search trajectory, effectively decomposing the n-dimensional constrained problem into a sequence of two-dimensional sub-problems on the constraint manifold.
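A short check, using only $y_{j}^{2} = 1$, shows why the update in Equation (7) leaves the pairwise sum, and therefore the full equality constraint, unchanged:

```latex
\begin{aligned}
y_{j_1} \alpha_{j_1}^{new} + y_{j_2} \alpha_{j_2}^{new}
  &= y_{j_1} \alpha_{j_1} + y_{j_1}^{2} y_{j_2} (\alpha_{j_2} - \alpha_{j_2}^{new}) + y_{j_2} \alpha_{j_2}^{new} \\
  &= y_{j_1} \alpha_{j_1} + y_{j_2} \alpha_{j_2} = \gamma .
\end{aligned}
```

As an illustrative numerical case (values chosen here, not taken from the paper), let $C = 1$, $α_{j_{1}} = 0.3$, $α_{j_{2}} = 0.6$, $y_{j_{1}} = + 1$ and $y_{j_{2}} = - 1$. Equation (6) gives $L = \max (0, 0.3) = 0.3$ and $H = \min (1, 1.3) = 1$. Choosing $α_{j_{2}}^{new} = 0.8$ yields $α_{j_{1}}^{new} = 0.3 - (0.6 - 0.8) = 0.5$, so that $α_{j_{1}} - α_{j_{2}} = - 0.3$ both before and after the update, and both multipliers remain in $[0, C]$.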

Algorithm 2 Paired-Coordinate Local Exploitation

Require: $α$, $v$, $y$, K, C
1: for $k = 1$ to $N_{local}$ do
2:   Sample $j_{1}, j_{2} \sim Uniform (\{1, \dots, n\})$, $j_{1} \neq j_{2}$, $y_{j_{1}} \neq y_{j_{2}}$
3:   Compute feasible bounds $[L, H]$ via Equation (6)
4:   $v_{j_{2}} \leftarrow w \cdot v_{j_{2}} + c_{1} r_{1} ({pbest}_{j_{2}} - α_{j_{2}}) + c_{2} r_{2} ({gbest}_{j_{2}} - α_{j_{2}})$
5:   $α_{j_{2}}^{new} \leftarrow clip (α_{j_{2}} + v_{j_{2}}, L, H)$
6:   $α_{j_{1}}^{new} \leftarrow α_{j_{1}} + y_{j_{1}} \cdot y_{j_{2}} \cdot (α_{j_{2}} - α_{j_{2}}^{new})$ ▹ Constraint preservation
7: end for
8: return $α$
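The core of Algorithm 2 (steps 3, 5, and 6) can be illustrated with a short sketch. The helper names `feasible_interval` and `paired_update` are hypothetical, and `step` stands in for the velocity component $v_{j_{2}}$ computed in step 4; this is an illustration under those assumptions rather than the authors' code.

```python
def feasible_interval(a1, a2, y1, y2, C):
    """Bounds [L, H] for the second multiplier, following Equation (6)."""
    if y1 != y2:
        return max(0.0, a2 - a1), min(C, C + a2 - a1)
    return max(0.0, a1 + a2 - C), min(C, a1 + a2)

def paired_update(alpha, y, j1, j2, step, C):
    """Move alpha[j2] by `step`, clip to [L, H], then repair alpha[j1] via Equation (7)."""
    L, H = feasible_interval(alpha[j1], alpha[j2], y[j1], y[j2], C)
    a2_new = min(max(alpha[j2] + step, L), H)
    a1_new = alpha[j1] + y[j1] * y[j2] * (alpha[j2] - a2_new)
    out = list(alpha)
    out[j1], out[j2] = a1_new, a2_new
    return out

y = [1, -1, 1, -1]
alpha = [0.5, 0.25, 0.0, 0.25]  # satisfies sum(alpha_i * y_i) = 0
# A deliberately oversized step is clipped to H = 0.75; alpha[0] absorbs the change
updated = paired_update(alpha, y, j1=0, j2=1, step=2.0, C=1.0)
```

In this toy case the result is `[1.0, 0.75, 0.0, 0.25]`: both touched multipliers stay inside $[0, C]$ and the weighted sum $\sum_{i} α_{i} y_{i}$ remains zero, which is the property Equation (7) is designed to guarantee.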

3.4. Open Competency Optimization (OCO)

The OCO strategy, triggered periodically, improves swarm diversity and facilitates convergence toward valid support vector configurations through three complementary learning mechanisms (Algorithm 3). All operations manipulate pairs $(i, j)$ of Lagrange multipliers with opposing labels ($y_{i} \neq y_{j}$) to preserve the equality constraint $\sum_{i = 1}^{n} α_{i} y_{i} = 0$.

Algorithm 3 OCO Learning Strategy

Require: Particle $α$, velocity $v$, swarm $A$, global best $α^{gbest}$, pair set $P$, parameter C
Ensure: $α^{new}$, $v^{new}$
1: Select learning mechanism based on particle fitness rank and preset probabilities
2: if Self-Learning selected then
3:   if random() < 0.5 then
4:     SelectiveActivation($α$) ▹ Section 3.4.1
5:   else
6:     EnergyRedistribution($α$) ▹ Section 3.4.1
7:   end if
8: else if Neighbor-Learning selected then
9:   Sample random peer $α_{2} \in A ∖ \{α\}$
10:   PairwiseBlendingCrossover($α, α_{2}$) ▹ Section 3.4.2
11: else if Leadership-Learning selected then
12:   Compute distance $D \leftarrow {∥ α - α^{gbest} ∥}_{2}$
13:   AdaptiveLeaderCrossover($α, α^{gbest}, D$) ▹ Section 3.4.3
14: end if
15: ProjectAndClip($α$) ▹ Section 3.4.4

3.4.1. Self-Learning (Mutation)

This intelligent mutation diversifies particle exploration through two complementary procedures applied with probability $P_{self}$:

Selective Activation identifies inactive pairs $(i, j)$, where $α_{i}, α_{j} < ε_{low}$, and randomly activates two of them. Their activity levels are reset to a uniform random value u drawn from a small fraction of the constant C (e.g., $u \sim U (0.01 C, 0.1 C)$), and independent Gaussian noise $N (0, σ_{v}^{2})$ is added to $v_{i}$ and $v_{j}$.
Energy Redistribution transfers activity conservatively from the richest active pair $(i_{S}, j_{S})$ to a random inactive pair $(i_{T}, j_{T})$ using

$Δ α = \min (0.2 α_{i_{S}}, 0.2 α_{j_{S}})$

(8)

Velocities are adjusted accordingly to maintain dynamic balance.

A cooldown mechanism is used to prevent immediate reactivation.
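The transfer of Equation (8) can be sketched as follows. The text does not spell out how $Δ α$ is applied to each component, so this sketch assumes that both members of the source pair lose $Δ α$ and both members of the target pair gain it; the function name `energy_redistribution` is hypothetical. Under that assumption, the equality constraint is preserved because each pair contains one positive and one negative label.

```python
def energy_redistribution(alpha, source, target):
    """Move a fraction of activity from a source pair to a target pair (Equation (8))."""
    i_s, j_s = source
    i_t, j_t = target
    delta = min(0.2 * alpha[i_s], 0.2 * alpha[j_s])
    out = list(alpha)
    for k in (i_s, j_s):
        out[k] -= delta  # assumption: both members of the source pair give up delta
    for k in (i_t, j_t):
        out[k] += delta  # assumption: both members of the target pair receive delta
    return out, delta

y = [1, -1, 1, -1]            # each pair (0, 1) and (2, 3) has opposing labels
alpha = [0.5, 0.5, 0.0, 0.0]  # pair (0, 1) is active, pair (2, 3) is inactive
new_alpha, delta = energy_redistribution(alpha, source=(0, 1), target=(2, 3))
```

Because the source pair keeps 80% of its smaller multiplier, a transfer of this form cannot push a source component below zero.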

3.4.2. Neighbor Learning (Peer Crossover)

Particle $α_{1}$ exchanges information with a randomly selected peer $α_{2}$ through pairwise blending. With probability $P_{crossover} \approx 0.7$, crossover is applied, in which case each pair $(i, j)$ is blended with a 50% chance:

$α_{child, i} = w α_{2, i} + (1 - w) α_{1, i}, \quad w \sim U (0.3, 0.7)$

(9)

The Energy Redistribution procedure of Section 3.4.1 introduces a conservative transfer of activity (or “energy”) between components. The set of all pairs P is divided into two subsets, active pairs ($P_{active}$) and inactive pairs ($P_{inactive}$), based on their total activity level $(α_{i} + α_{j})$. If both subsets contain elements, the procedure selects the richest source pair $(i_{S}, j_{S})$ from $P_{active}$ and a random target pair $(i_{T}, j_{T})$ from $P_{inactive}$. The transferable amount of activity $Δ α$ is computed as in Equation (8).

At the same time, the corresponding velocities v are also updated to preserve dynamic inertia: the source components are slowed down by $δ$, while the target components get an equivalent boost of $δ$.

3.4.3. Leadership Interaction (Leader Crossover)

Particles are guided toward the global best $α^{gbest}$ with adaptive blending. The blend factor depends on the distance $D = ∥ α - α^{gbest} ∥$:

$w \sim \begin{cases} U (0.6, 0.9) & \text{if } D > τ_{D} \text{ (strong attraction, threshold } τ_{D} = 0.1 \text{)} \\ U (0.2, 0.4) & \text{otherwise (refinement)} \end{cases}$

(10)

With probability $P_{crossover} \approx 0.8$, leader crossover is applied. In that case, each pair $(i, j) \in P$ with opposing labels is blended toward the leader with a probability of $0.9$, where $ω$ denotes the blend factor w drawn in Equation (10):

$α_{child, i} = ω \cdot α_{i}^{gbest} + (1 - ω) \cdot α_{i}, \quad α_{child, j} = ω \cdot α_{j}^{gbest} + (1 - ω) \cdot α_{j} .$

(11)

The remaining $10 \%$ of pairs retain their original values to preserve diversity.
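The distance-dependent blending of Equations (10) and (11) could be sketched as below. The function name `leader_crossover` is hypothetical, the outer gate $P_{crossover} \approx 0.8$ is omitted for brevity, and drawing a single blend factor per particle (rather than per pair) is an assumption, since the text does not specify it.

```python
import random

def leader_crossover(alpha, gbest, pairs, tau_d=0.1, p_pair=0.9, rng=random):
    """Blend opposite-label pairs of `alpha` toward `gbest` (Equations (10)-(11))."""
    dist = sum((a - g) ** 2 for a, g in zip(alpha, gbest)) ** 0.5
    # Far from the leader: strong attraction; close to it: gentle refinement
    lo, hi = (0.6, 0.9) if dist > tau_d else (0.2, 0.4)
    w = rng.uniform(lo, hi)  # assumption: one blend factor per particle
    child = list(alpha)
    for i, j in pairs:
        if rng.random() < p_pair:  # each pair is blended with probability 0.9
            child[i] = w * gbest[i] + (1 - w) * alpha[i]
            child[j] = w * gbest[j] + (1 - w) * alpha[j]
    return child, w

rng = random.Random(0)
child, w = leader_crossover([0.0, 0.0], [1.0, 1.0], pairs=[(0, 1)], rng=rng)
```

Blending alone does not guarantee $\sum_{i} α_{i} y_{i} = 0$ for the child, which is consistent with the projection step that Algorithm 3 applies afterwards.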

3.4.4. Constraint Enforcement

After applying one of these mechanisms, the new position is immediately subjected to orthogonal projection and clipping on [0, C] to respect the constraints of the dual problem. A cooldown system is used to prevent the immediate reactivation of pairs that did not lead to an improvement in objective function after mutation.
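One way to realize the projection-and-clipping step is to alternate an orthogonal projection onto the hyperplane $\sum_{i} α_{i} y_{i} = 0$ with clipping to $[0, C]$. This is a sketch under that assumption; the text does not say whether the projection is applied once or iterated, and the name `project_and_clip` with its parameters `n_rounds` and `tol` is hypothetical.

```python
def project_and_clip(alpha, y, C, n_rounds=50, tol=1e-9):
    """Alternate hyperplane projection and box clipping until both constraints hold."""
    a = list(alpha)
    n = len(a)
    for _ in range(n_rounds):
        residual = sum(ai * yi for ai, yi in zip(a, y))
        if abs(residual) < tol and all(0.0 <= ai <= C for ai in a):
            break
        # Orthogonal projection onto {a : sum(a_i * y_i) = 0}; ||y||^2 = n for labels in {-1, +1}
        a = [ai - residual * yi / n for ai, yi in zip(a, y)]
        # Clipping restores the box constraint but may reintroduce a small residual
        a = [min(max(ai, 0.0), C) for ai in a]
    return a

y = [1, -1, -1]
projected = project_and_clip([0.8, 0.2, 0.4], y, C=1.0)
```

Alternating the two projections yields a feasible point when one exists, although not necessarily the closest one to the input; if the round limit is reached, the box constraint holds exactly and the equality constraint only approximately.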

4. Experimental Studies

In this section, we evaluate the performance of the OCO-PSO algorithm as a dual SVM solver and classifier. We compare it with two popular SVM implementations, SVC and SGDClassifier, to assess the efficiency of the optimization. We also compare it with two non-SVM methods, decision tree and random forest, to benchmark it against the wider family of machine learning approaches. The experimental design has four key purposes.

Predictive Performance: Accuracy, F1-score, ROC-AUC, and Matthews correlation coefficient (MCC) were computed on several datasets.
Optimality Verification: Check whether the KKT conditions are satisfied to verify the optimality of the solution.
Model Sparsity: Compare the number of support vectors with that of SVC to assess sparsity and generalization.
Reproducibility: Guarantee transparency through controlled random seeds, standardized dataset partitioning, and consistent parameter initialization.

4.1. Dataset Composition

The empirical study was performed on five datasets of differing structural complexity to investigate the robustness of OCO-PSO under varying dimensionality, sample size, class imbalance, and decision boundary complexity. The main properties of each dataset are listed in Table 1.

4.1.1. Diabetes Dataset

The diabetes dataset [25] from the UCI machine learning repository predicts the occurrence of diabetes in Pima Indian women based on eight clinical characteristics: pregnancies, blood glucose, arterial blood pressure, skinfold thickness, insulin levels, BMI, diabetes pedigree, and age. After the stratified split, the training set contains 614 instances (400 negative, 214 positive; imbalance ratio, 1.87:1) and the test set has 154 instances. With a minority class share of roughly 35% and moderate dimensionality, this dataset provides a moderately imbalanced testbed for evaluating nonlinear feature interactions.

4.1.2. Synthetic Agricultural Yield Dataset

The synthetic agricultural yield dataset [26] (Kaggle) simulates agricultural conditions with 599 instances and 6 features: soil quality, sunny days, rainfall, seed variety, fertilizer amount, and irrigation schedule. Originally designed for regression, we convert the continuous yield target into binary classes using the mean threshold ($\bar{Y}$): Class 1 (High Yield) for $Yield \geq \bar{Y}$ and Class 0 (Low Yield) otherwise.
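The mean-threshold conversion described above amounts to a one-line rule. The sketch below uses made-up yield values and a hypothetical helper name `binarize_by_mean`; it is only meant to make the labelling convention explicit.

```python
from statistics import mean

def binarize_by_mean(yields):
    """Label 1 (High Yield) when a value is at least the mean, else 0 (Low Yield)."""
    threshold = mean(yields)
    return [1 if v >= threshold else 0 for v in yields], threshold

# Toy values for illustration only; they are not taken from the dataset
labels, y_bar = binarize_by_mean([2.0, 4.0, 6.0, 8.0])
```

Here the mean is 5.0, so the first two samples fall in Class 0 and the last two in Class 1.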

4.1.3. Sonar Dataset

The sonar dataset [27] (UCI) distinguishes sonar returns from metal cylinders versus rocks using 60 spectral features from frequency band energy measurements (208 samples total: 111 metal, 97 rock). With a high feature-to-sample ratio ($p / n \approx 0.29$), this dataset tests OCO-PSO’s generalization capacity under high-dimensional, low-sample conditions with complex spectral interactions.

4.1.4. Ionosphere Dataset

The ionosphere dataset [28] (UCI) is based on radar signals that detect ionospheric anomalies; it contains 351 samples with 34 continuous features from pulse sequences. Binary classification separates the “good” echoes (structured ionospheric layer) from the “bad” echoes (unstructured/noisy signals). Its low dimensionality relative to the sample size, its correlated features, and the absence of missing values make it well suited to assessing the robustness of the OCO-PSO algorithm on low-dimensional features with nonlinear separability.

4.1.5. Imbalanced Dataset

The imbalanced_data dataset is generated synthetically using scikit-learn’s make_classification [29], with 500 samples and 2 informative features. It presents a significant class imbalance: 90% in Class 0 and only 10% in Class 1. The low dimensionality makes it possible to visualize decision boundaries and expose the behavior of margins even when imbalance is extreme.

4.2. Preprocessing Pipeline and Reproducibility

All datasets were processed through a standardized pipeline to avoid data leakage and ensure reproducibility. The pipeline follows common machine learning practice and is fully automated: stratified splitting before any transformation, strict separation of the fitting and transformation phases, and exhaustive logging (Table 2).

1. Stratified Train/Test Split: Each dataset was divided into training (80%) and test (20%) sets using scikit-learn’s train_test_split with stratification to preserve the class distribution. A fixed random seed (random_state = 42) is used to guarantee deterministic results. All subsequent preprocessing steps are fitted exclusively on the training data and then applied to the test data to prevent information leakage.

2. Missing Value Imputation: A tiered strategy based on the percentage of missing data:

Features with >50% missing values: removed;
Features with 15–50% missing: KNN imputation (numeric) or mode (categorical);
Features with <15% missing: median (numeric) or mode (categorical).

All imputers are fitted on training data only.

3. Categorical Encoding: Low-cardinality features (≤10 unique values) are one-hot encoded with consistent column alignment. High-cardinality variables use label encoding fitted on the training data, with unseen test categories mapped to the most frequent training label.

4. Class Imbalance Handling: When the proportion of the minority class falls below 20%, resampling techniques (random undersampling, oversampling, or SMOTE) are applied only to the training data, thus preserving the original test distribution.

5. Feature Standardization: Numeric features are standardized using StandardScaler to achieve zero mean and unit variance, improving numerical stability.

6. Multicollinearity Reduction: Feature pairs with correlation $| r | > 0.9$ are identified in the training data, and one feature from each pair is removed from both the training and test sets.

7. Reproducibility: The pipeline is implemented as a modular class (ML Preprocessing Pipeline) with serialization of all fitted transformers, processed datasets, and comprehensive metadata, allowing exact reproduction across environments and runs.

Table 2. Final characteristics of the datasets after preprocessing. The reported values correspond to the final structure and transformations applied to each dataset. The row Minority (%) refers to the proportion of the underrepresented class after preprocessing.

Property	Diabetes	Ionosphere	Sonar	Imbalance_Data	Agric_Yield_Data
Final Dim. (Train/Test)	614 × 8/154 × 8	280 × 34/71 × 34	166 × 55/42 × 55	718 × 2/100 × 2	479 × 6/120 × 6
Minority (%)	34.8	36.1	46.4	50.0	47.2
Standardization	StandardScaler	StandardScaler	StandardScaler	StandardScaler	StandardScaler
Balancing Method	None	None	None	SMOTE	None
Features Removed	0	5	5	0	0
Type	Real	Real	Real	Synthetic	Real

4.3. Model Configurations and Evaluation Protocol

4.3.1. OCO-PSO Model

The proposed model combines PSO optimization with OCO refinement to train an SVM classifier. Hyperparameters were tuned using Bayesian optimization (BayesSearchCV) with 10-fold cross-validation over the following search spaces:

Regularization: $C \in [1, 10]$ ;
RBF kernel: $γ \in [10^{- 3}, 1]$ (log-uniform);
Swarm size: $n_{particles} \in [10, 150]$ ;
Iterations: max_iter $\in [50, 150]$.

KKT condition compliance was verified post-training by checking the dual equality constraint, $| \sum_{i} α_{i} y_{i} | \approx 0$.

4.3.2. Baseline Models

All baselines were optimized using Bayesian search with 10-fold cross-validation, accuracy scoring, and parallel execution (n_jobs = -1).

SVC: RBF kernel with $C \in [1, 10]$, $γ \in [10^{- 3}, 1]$ (log-uniform).
SGDClassifier (max_iter = 1000):
- $α \in [10^{- 6}, 10^{- 1}]$ (log-uniform);
- $η_{0} \in [10^{- 3}, 1]$ (log-uniform);
- learning_rate $\in$ {constant, optimal, invscaling, adaptive};
- loss $\in$ {hinge, log_loss, modified_huber}.
Decision Tree:
- max_depth $\in [3, 20]$, min_samples_split $\in [2, 20]$, min_samples_leaf $\in [1, 10]$;
- criterion $\in$ {gini, log_loss};
- max_features $\in$ {sqrt, log2, None}.
Random Forest:
- n_estimators $\in [50, 200]$, max_depth $\in [3, 20]$;
- min_samples_split $\in [2, 10]$, min_samples_leaf $\in [1, 5]$;
- max_features $\in$ {sqrt, log2}.

All searches used n_iter = 40 (configurable) and random_state = 42.

4.3.3. Evaluation Metrics

Model performance was estimated on held-out test sets using the following:

Performance measures: accuracy, precision, recall, F1-score (macro/weighted), MCC, and ROC-AUC;
Model attributes: number of support vectors (for the OCO-PSO and SVC models) and the equality-constraint (KKT) violation $| \sum_{i} α_{i} y_{i} |$;
Computational efficiency: training time and inference speed.

4.3.4. Reproducibility

All experiments are made reproducible through the following:

Fixed random seeds (random_state = 42) across all components;
Deterministic preprocessing pipeline with stratified splitting;
Automated logging of metrics, optimal hyperparameters, and runtime;
Independent execution per dataset with isolated training/evaluation.

5. Results

We compare OCO-PSO with four baseline methods (SVC, SGDClassifier, decision tree, and random forest) on datasets from different domains: diabetes (medical data), sonar and ionosphere (signal processing), agricultural yield, and imbalanced synthetic data (imbalanced_data). Performance is evaluated in terms of accuracy, macro-F1-score, MCC, and ROC-AUC, with a focus on class balance and fairness. Computational efficiency (training and inference time), model sparsity, and the stability of the optimization solution (violation of the KKT condition) are also analyzed.

5.1. Performance Metrics

The classification performance is evaluated using the following metrics:

Accuracy (%): the proportion of correctly classified samples over the total number of samples:

$Accuracy = \frac{T P + T N}{T P + T N + F P + F N} \times 100,$

(12)

where $T P$ , $T N$ , $F P$ , and $F N$ denote true positives, true negatives, false positives, and false negatives, respectively.
Precision: the fraction of correctly predicted positive samples among all predicted positives:

$Precision = \frac{T P}{T P + F P} .$

(13)
Recall (Sensitivity): the fraction of correctly predicted positive samples among all actual positives:

$Recall = \frac{T P}{T P + F N} .$

(14)
F1-Macro: the unweighted mean of F1-scores computed for each class independently:

$F1-Macro = \frac{1}{C} \sum_{i = 1}^{C} \frac{2 \cdot {Precision}_{i} \cdot {Recall}_{i}}{{Precision}_{i} + {Recall}_{i}},$

(15)

where C is the number of classes.
Matthews Correlation Coefficient (MCC): a balanced measure of classification quality that accounts for all four confusion matrix categories:

$MCC = \frac{T P \cdot T N - F P \cdot F N}{\sqrt{(T P + F P) (T P + F N) (T N + F P) (T N + F N)}} .$

(16)
ROC-AUC (Receiver Operating Characteristic–Area Under Curve): measures the classifier’s ability to discriminate between positive and negative classes:

$ROC-AUC = \int_{0}^{1} T P R (F P R) d F P R,$

(17)

where $T P R$ is the true positive rate and $F P R$ is the false positive rate.
Rank: The average ranking of a method across all datasets, where lower ranks indicate better performance.
Process Time (s): The computational time in seconds for training and inference. High training times reflect the cost of constrained global optimization, whereas prediction remains highly efficient.
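For a binary problem, Equations (12) to (16) can be evaluated directly from the four confusion matrix counts. The sketch below uses a hypothetical function name, `binary_metrics`, and toy counts; it omits guards against zero denominators for brevity.

```python
from math import sqrt

def binary_metrics(tp, tn, fp, fn):
    """Accuracy (%), precision, recall, macro-F1, and MCC from confusion counts."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Per-class F1: the negative class is scored by treating negatives as the target
    prec_neg = tn / (tn + fn)
    rec_neg = tn / (tn + fp)
    f1_pos = 2 * precision * recall / (precision + recall)
    f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg)
    f1_macro = (f1_pos + f1_neg) / 2  # unweighted mean over the two classes
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1_macro": f1_macro, "mcc": mcc}

# Toy counts for illustration only
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, accuracy is 85%, while the MCC is close to 0.70, which illustrates how the MCC penalizes the asymmetry between false positives and false negatives more than accuracy does.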

5.2. Ionosphere Dataset

On the ionosphere dataset, OCO-PSO achieves an accuracy of 95.71%, an F1-score of 0.954, and an MCC of 0.908, performing as well as random forest in terms of accuracy while offering superior interpretability through its kernel-based structure. The model retains 66 support vectors compared to 71 for SVC (a reduction of 7%), and the constraint violation is close to zero ($1.3 \times 10^{- 3}$). The best hyperparameters ($C = 2, γ \approx 0.062$) differ significantly from the Bayesian-optimized ones of SVC ($C = 10, γ \approx 0.012$), indicating that OCO-PSO reaches a different yet equally effective region of the hypothesis space.

Training takes 3.7 min (142 particles, 90 iterations), while inference requires just 6.4 ms. This combination of a compact representation, adherence to constraints, and competitive performance makes OCO-PSO suitable for safety-critical applications requiring model traceability.

5.3. Hyperparameter Settings and Sensitivity

The hyperparameters of the PSO and OCO mechanisms were set to ensure a controlled trade-off between exploration, exploitation, and computational cost. Swarm parameters, including the number of particles and iterations, were optimized simultaneously with the SVM hyperparameters C and $γ$ using Bayesian optimization, validated via 10-fold cross-validation to maximize model accuracy. Particle velocities are updated with an inertia coefficient w that decreases linearly from 0.95 to 0.3, shifting the search from exploration to exploitation. Cognitive and social acceleration coefficients are both set to $c_{1} = c_{2} = 2.0$, while velocity changes are capped at $V_{max} = 0.25 \times C$ to ensure numerical stability. Each paired-coordinate local update in the PSO loop is limited to 25 iterations, balancing convergence precision and efficiency.
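The inertia schedule and velocity cap described above could be sketched as follows. The exact interpolation endpoints are an assumption (the sketch reaches the final value at the last iteration), and the names `inertia` and `clamp_velocity` are hypothetical.

```python
def inertia(t, max_iter, w_start=0.95, w_end=0.3):
    """Linearly decreasing inertia from w_start (t = 0) to w_end (t = max_iter - 1)."""
    frac = t / (max_iter - 1) if max_iter > 1 else 1.0
    return w_start + (w_end - w_start) * frac

def clamp_velocity(v, C, ratio=0.25):
    """Cap every velocity component at V_max = ratio * C in absolute value."""
    v_max = ratio * C
    return [min(max(vk, -v_max), v_max) for vk in v]

w_first = inertia(0, 100)    # start of the run: exploration
w_last = inertia(99, 100)    # end of the run: exploitation
capped = clamp_velocity([3.0, -3.0, 0.1], C=2.0)
```

Tying the cap to C keeps the largest possible step proportional to the width of the box constraint $[0, C]$, so a single update cannot traverse more than a quarter of the feasible range.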

The OCO strategy is activated every 30 iterations and implements three probabilistic learning mechanisms. Self-learning is guided by a rank-dependent mutation probability $P_{self} \in [0.3, 0.9]$, complemented by a 20-iteration cooldown period and Gaussian velocity noise ($σ = 0.01$) to prevent stagnation. Peer learning relies on a crossover probability $P_{peer} = 0.7$ and adaptive mixing $w \in [0.3, 0.7]$, while leadership interaction uses $P_{leader} = 0.8$ with distance-dependent blending ($w \in [0.2, 0.4]$ for close particles, $w \in [0.6, 0.9]$ for distant ones) and a threshold $τ_{D} = 0.1$ to toggle between attraction to the global best and local refinement. Soft elitism rescues the worst particle, and all updates respect the SVM dual equality constraints with a KKT tolerance of $ϵ = 10^{- 6}$ and a stagnation-based stopping criterion after 15 iterations without improvement.

Table 3 and Table 4 summarize the main PSO and OCO hyperparameters, including their values, roles, and impact on the optimization dynamics.

5.4. Diabetes Dataset

The OCO-PSO model reaches an accuracy of 73.28% and an F1-macro of 0.720, surpassing SVC (F1 = 0.658), random forest (F1 = 0.652), and decision tree (F1 = 0.635) (Table 5). Although the SGDClassifier achieves a slightly higher accuracy (74.14%) (Table 6), OCO-PSO demonstrates a superior class balance with an MCC of 0.454 versus 0.404 for SGD, an essential property for medical applications.

The model exhibits strong parsimony with only 107 support vectors compared to 260 for SVC (59% reduction) and a near-zero constraint violation ($2.5 \times 10^{- 4}$). Optimized hyperparameters ($C = 6, γ = 3.28 \times 10^{- 3}$) reflect a fine-scale RBF kernel adapted to the structure of the local clinical data.

Training requires 24 min (122 particles, 110 iterations) and inference 65 ms (Table 7). The ROC-AUC of 0.816 remains competitive with SGD’s 0.833, demonstrating efficient capture of nonlinear boundaries without compromising discrimination power.

Table 5. Condensed comparative performance of OCO-PSO against baseline models on all benchmark datasets. Best results per dataset are highlighted in bold.

Dataset	Model	Accuracy	F1-Macro	MCC	Rank
Imbalance_data	OCO-PSO	0.980	0.939	0.885	1
	SVC	0.930	0.850	0.737	2
	SGDClassifier	0.900	0.804	0.667	4
	DecisionTree	0.960	0.889	0.778	3
	RandomForest	0.960	0.889	0.778	3
Sonar	OCO-PSO	0.857	0.857	0.718	2
	SVC	0.833	0.831	0.671	3
	SGDClassifier	0.738	0.725	0.496	4
	DecisionTree	0.690	0.686	0.379	5
	RandomForest	0.857	0.852	0.742	1
Ionosphere	OCO-PSO	0.957	0.954	0.908	1
	SVC	0.943	0.938	0.876	2
	SGDClassifier	0.886	0.876	0.751	4
	DecisionTree	0.886	0.870	0.748	5
	RandomForest	0.957	0.954	0.908	1
Diabetes	OCO-PSO	0.733	0.720	0.454	1
	SVC	0.707	0.658	0.323	4
	SGDClassifier	0.741	0.698	0.404	2
	DecisionTree	0.681	0.635	0.274	5
	RandomForest	0.707	0.652	0.317	3
agric_yield_data	OCO-PSO	0.892	0.890	0.786	1
	SVC	0.875	0.874	0.750	2
	SGDClassifier	0.875	0.874	0.749	3
	DecisionTree	0.800	0.799	0.598	5
	RandomForest	0.850	0.847	0.702	4

Table 6. Summary of best performance achieved across all benchmark datasets. Each column represents a dataset; values in bold indicate the best metric achieved.

Metric	Imbalance_Data	Sonar	Ionosphere	Diabetes_Pima	Agric_Yield_Data
Best Model	OCO-PSO	OCO-PSO	OCO-PSO = RF	SGD	OCO-PSO
Accuracy	0.980	0.857	0.957	0.792	0.892
F1-Macro	0.939	0.857	0.954	0.774	0.890
MCC	0.885	RF: 0.742 > OCO: 0.718	0.908	0.548	0.786

Note: OCO-PSO ranks #1 on 3/5 datasets and #2 on 2/5 datasets.

Table 7. Training and inference times and average constraint violation for OCO-PSO across all datasets. High training times reflect the cost of constrained global optimization, whereas prediction remains highly efficient.

Dataset	Train Time (s)	Pred. Time (s)	Constraint Violation
Imbalance_data	168.67	0.0022	0.00275
Sonar	160.68	0.0000	0.09461
Ionosphere	225.5	0.0064	0.00133
diabetes_pima	1443.2	0.0040	0.00067
agric_yield_data	932.33	0.0000	0.01043

5.5. Sonar Dataset

OCO-PSO yields an accuracy of 85.71% and an F1-score of 0.857 (Table 5), slightly outperforming random forest (F1 = 0.852) and outperforming SVC (F1 = 0.831), SGD (F1 = 0.725), and the decision tree (F1 = 0.686) by wider margins (Table 6). The MCC of 0.718 indicates a well-balanced classification; it is slightly lower than that of random forest (0.742, Table 5), but OCO-PSO retains the kernel interpretability offered by explicit support vectors.

The model is parsimonious, retaining 84 support vectors compared to 132 for SVC (a 36% reduction) despite the rather large number of features. Training requires 2.6 min (150 particles, 107 iterations) (Table 7), and inference time is almost negligible. The constraint violation of 0.095 is higher than on the other datasets (Table 7), which suggests a more difficult compromise between global convergence and strict constraint satisfaction in high-dimensional spaces.

The best hyperparameters ($C = 10$, $γ \approx 0.028$) correspond to a strong penalty on margin violations and a kernel scale adapted to the local distance structure. OCO-PSO shows that combining OCO with global optimization can lead to efficient and parsimonious solutions, even in high dimensions, achieving performance comparable to ensemble methods while preserving an explicit support vector representation for enhanced interpretability.

5.6. Agricultural Yield Dataset

OCO-PSO achieved the best test accuracy (89.17%), with an F1-macro score of 0.890 (Table 5), outperforming all reference methods: SVC (0.874), SGD (0.874), random forest (0.847), and decision tree (0.799) (Table 6). The balanced precision (0.899) and recall (0.887) demonstrate its robust performance despite a moderate class imbalance (226:253 training examples).

The MCC of 0.786 surpasses that of SVC (0.750), SGD (0.749), and random forest (0.702), which is crucial in the agricultural context. The model thus offers a better balance between classes than these methods, an essential condition for decision support in agriculture. With only 93 support vectors compared to 124 for SVC (25% fewer), it is easier to interpret. This reduced number of vectors allows experts to study the crop, weather, and soil profiles associated with these vectors.

The constraint violation remains small (0.0104) and lies between the values obtained for diabetes and sonar (Table 7), indicating the robust correctness of the scheme. Inference is practically instantaneous, and training takes 15.5 min (119 particles, 117 iterations) (Table 7). The best parameters ($C = 8$, $γ \approx 0.0021$) pair a large penalty parameter with an RBF kernel adapted to modeling the local nonlinear yield threshold. In particular, the optimal $γ$ for SVC is about 24 times larger (0.0497), suggesting that OCO-PSO searches different regions of the solution space with a preference for a smoother decision boundary, which generalizes better on this agricultural dataset.

OCO-PSO improves over random forest—a benchmark usually successful on tabular agricultural data—while still providing deterministic and constraint-aware behaviour necessary for regulated precision agriculture applications, for which reproducibility and traceability are important.

5.7. Imbalanced Dataset

OCO-PSO attains 98% accuracy, an F1-macro of 0.939, and MCC = 0.885 (Table 5), exceeding all the baseline methods: SVC (F1 = 0.850, MCC = 0.737), SGD (F1 = 0.804, MCC = 0.667), decision tree (F1 ≈ 0.889, MCC ≈ 0.778), and random forest (F1 ≈ 0.889, MCC = 0.778) (Table 6). The method preserves a high precision (0.989) and recall (0.90) on the minority class, effectively limiting false positives. The model is very sparse: it uses only 70 support vectors (versus 239 for SVC) to represent the decision function. The low constraint violation ($2.75 \times 10^{- 3}$) indicates that the optimization is stable and convergent.

With a configuration of 71 particles and 60 iterations, OCO-PSO training converges in just 2.8 min (Table 7). This configuration keeps training time short without degrading generalization. The optimal hyperparameters ($C = 3$, $γ \approx 0.57$) balance the locality of the patterns captured by the kernel (controlled by $γ$) against model complexity (determined by C), which supports the generalization capability of the model.

The OCO-PSO model achieves a strong ROC-AUC score of 0.962. While this is slightly lower than the random forest’s score of 0.991, the random forest’s lower MCC (0.778 versus 0.885) suggests that it may suffer from over-optimistic performance (overfitting) on the test set. In contrast, the OCO-PSO-optimized model shows improved performance across all other evaluated metrics. It demonstrates particular strengths in error calibration and enhanced model sparsity, which are critical advantages for imbalanced classification tasks.

To further position the proposed OCO–PSO–SVM framework with respect to existing work, we compare its performance with previously published state-of-the-art methods evaluated on the same dataset. In particular, two recent studies reported classification accuracies of 0.87 and 0.74, respectively [30,31]. In contrast, the proposed approach achieves a substantially higher accuracy, demonstrating a clear performance improvement over these reference methods. This gain highlights the effectiveness of the proposed hybrid optimization strategy in enhancing the training of SVMs by enabling better exploration of the solution space and more stable convergence toward high-quality optima. Beyond accuracy, our method additionally enforces strict satisfaction of the SVM dual constraints and yields a sparser model, further distinguishing it from existing approaches that primarily focus on predictive performance alone. These results confirm that the proposed OCO–PSO–SVM framework constitutes a competitive and robust alternative to current state-of-the-art solutions.

5.8. Comparison with PSO

Table 8 reports a direct comparison between the proposed OCO–PSO framework and a standard PSO-based SVM solver under identical experimental settings. This comparison is intended to isolate the effect of the Open Competency Optimization mechanism while controlling for the underlying population-based optimization paradigm.

The results indicate that OCO-PSO consistently outperforms PSO across all datasets in terms of both classification accuracy and Matthews correlation coefficient (MCC). On the agricultural yield dataset, OCO-PSO improves accuracy from 0.83 to 0.89, while the MCC increases substantially from 0.67 to 0.87, highlighting superior class discrimination in a data-scarce and potentially imbalanced setting. Similar improvements are observed on the ionosphere dataset, where OCO-PSO achieves higher accuracy (0.95 vs. 0.91) and MCC (0.91 vs. 0.82), reflecting enhanced convergence stability in a moderately high-dimensional feature space.

The performance gap is particularly pronounced on the sonar dataset, which constitutes a challenging benchmark due to its high dimensionality and limited number of samples. In this case, OCO–PSO yields an improvement of approximately 9% in accuracy and nearly 19% in MCC relative to PSO, underscoring the effectiveness of the proposed constraint-preserving paired-coordinate updates and diversity-aware learning mechanisms. Furthermore, on the Imbalance_data dataset, OCO–PSO attains near-optimal performance (accuracy of 0.98 and MCC of 0.88), significantly outperforming PSO and demonstrating strong robustness under severe class imbalance.

Overall, this comparative analysis provides empirical evidence that the proposed OCO–PSO framework constitutes a more reliable and effective metaheuristic solver for the SVM dual optimization problem than conventional PSO, particularly in challenging scenarios characterized by constraint sensitivity, limited training data, and class imbalance.

5.9. Statistical Analysis

This section presents a statistical analysis of the proposed OCO-PSO model in relation to SVC, SGD, decision trees, and random forests. The analysis follows best practices for statistical validation in machine learning by integrating three complementary components: nonparametric bootstrap distributions, paired Student’s t-tests with Holm–Bonferroni correction, and Wilcoxon signed-rank tests.

5.9.1. Methodological Framework

Bootstrap Distribution Analysis ( $n = 1000$ )

A nonparametric bootstrap with 1000 resamples was applied to calculate the empirical distributions of accuracy, F1-score, MCC, and ROC-AUC. These provide robust estimates of central tendency, variance, and model stability under sampling uncertainty. The resulting box plots (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6) convey the spread and quartiles of each metric and emphasize the stability of OCO-PSO across the different datasets.
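A nonparametric bootstrap of a test-set metric can be sketched as below. It assumes that the resampling is performed over test-set predictions, which the text does not state explicitly; the function name `bootstrap_accuracy` and the toy labels are hypothetical.

```python
import random

def bootstrap_accuracy(y_true, y_pred, n_resamples=1000, seed=42):
    """Empirical distribution of accuracy over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    return scores

# Toy labels for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
scores = sorted(bootstrap_accuracy(y_true, y_pred))
ci_low, ci_high = scores[24], scores[974]  # approximate 95% percentile interval
```

The sorted scores can be summarized by quartiles for box plots or by a percentile interval, as in the last line; the same loop applies to any metric computed from paired labels and predictions.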

Significance Tests and Correction for Multiple Comparisons

To analyze the difference in performance between OCO-PSO and the reference models, we performed

Paired t-tests of mean performance differences;
The Wilcoxon signed-rank test as a robust, distribution-free analogue.

Corrections for multiple comparisons were made using the Holm–Bonferroni procedure, with 16 total comparisons (4 metrics × 4 models). The level of significance was $α = 0.05$. Holm-adjusted p-values ($p_{Holm}$) are presented together with the unadjusted ones in Table 9, Table 10, Table 11, Table 12 and Table 13.
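The Holm–Bonferroni step-down adjustment can be written in a few lines. The sketch below uses a hypothetical function name, `holm_adjust`, and three made-up p-values rather than the 16 comparisons of the study.

```python
def holm_adjust(p_values):
    """Holm-adjusted p-values, returned in the original order of the inputs."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[k])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[k] = running_max
    return adjusted

# Made-up p-values for illustration only
adj = holm_adjust([0.01, 0.04, 0.03])
```

The smallest p-value is multiplied by the number of comparisons, the next by one fewer, and so on; a comparison is declared significant when its adjusted value stays below the chosen level (0.05 in the study).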

Measuring the Size of the Effect

The effect size of the observed differences was expressed as Cohen’s d for paired samples, interpreted as follows: $|d| < 0.2$ (negligible), $0.2$–$0.5$ (low), $0.5$–$0.8$ (medium), and $≥ 0.8$ (high). Each performance table (Table 9, Table 10, Table 11, Table 12 and Table 13) also reports the effect size of OCO-PSO relative to each reference model.
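As a worked illustration, the sketch below computes a paired Cohen's d and maps it to the bands above. It assumes the variant that divides the mean difference by the standard deviation of the differences; the text does not state which paired variant was used, and the scores are hypothetical.

```python
import statistics


def cohens_d_paired(a, b):
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def effect_label(d):
    """Map |d| to the qualitative bands used in the text."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "low"
    if size < 0.8:
        return "medium"
    return "high"


# Hypothetical per-resample scores in percent, for illustration only.
model_a = [90, 85, 88, 92, 87]
model_b = [86, 84, 83, 85, 88]

d = cohens_d_paired(model_a, model_b)
print(f"d = {d:.3f} ({effect_label(d)})")
```

The sign of d follows the order of subtraction, so the band is determined from the absolute value.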

5.9.2. Analysis Results

Dataset 1: Diabetes

Table 9 lists the significant differences between OCO-PSO and the reference models. OCO-PSO is competitive but performs worse than SGD with respect to accuracy and ROC-AUC ($d = -0.57$ to $-1.03$; $p_{Holm} < 0.001$). It also does not improve ROC-AUC with respect to either the nearest neighbor ($Δ = +0.0752$, $d = +1.94$) or the decision tree. The bootstrap distributions (Figure 2) show that, despite the mixed performance on this dataset, OCO-PSO remains a stable algorithm.

Table 9. Significant differences in performance metrics for various models relative to the baseline OCO-PSO (dataset: diabetes).

Metric	Model	$Δ_{mean}$	Cohen’s d	p-Value	$p_{Holm}$	Sig.
Precision	SGD	−0.0188	−0.57	<0.001	<0.001	Yes
F1-Macro	Decision Tree	+0.0346	+0.70	<0.001	<0.001	Yes
MCC	Decision Tree	+0.0693	+0.70	<0.001	<0.001	Yes
ROC-AUC	SGD	−0.0167	−1.03	<0.001	<0.001	Yes
ROC-AUC	SVC	−0.0184	−0.96	<0.001	<0.001	Yes
ROC-AUC	Decision Tree	+0.0752	+1.94	<0.001	<0.001	Yes

Note: $Δ_{mean}$ represents the difference in the mean metric value (model minus OCO-PSO). All significant results pass the Holm correction procedure.

Figure 2. Bootstrap distributions of metrics ($n = 1000$) for the diabetes dataset.

Dataset 2: Imbalanced Data

As presented in Table 10, the tree-based models (decision tree and random forest) are substantially better than OCO-PSO on the threshold-based metrics (precision, F1-macro, and MCC; $|d| \approx 1.36$–$1.38$). This demonstrates the benefits of ensembles under class imbalance. OCO-PSO nevertheless attains a higher ROC-AUC than the decision tree ($Δ = +0.0549$; $d = +1.58$), which indicates better probability calibration. This conclusion is also supported by the bootstrap distributions (Figure 3).

Table 10. Significant differences in performance metrics for various models relative to the baseline OCO-PSO (dataset 2: imbalanced data).

Metric	Model	$Δ_{mean}$	Cohen’s d	p-Value	$p_{Holm}$	Sig.
Precision	Decision Tree	−0.0194	−1.38	<0.001	<0.001	Yes
Precision	Random Forest	−0.0194	−1.38	<0.001	<0.001	Yes
F1-Macro	Decision Tree	−0.0608	−1.36	<0.001	<0.001	Yes
F1-Macro	Random Forest	−0.0608	−1.36	<0.001	<0.001	Yes
MCC	Decision Tree	−0.1204	−1.37	<0.001	<0.001	Yes
MCC	Random Forest	−0.1204	−1.37	<0.001	<0.001	Yes
ROC-AUC	Decision Tree	+0.0549	+1.58	<0.001	<0.001	Yes

Note: $Δ_{mean}$ represents the difference in the mean metric value (model minus OCO-PSO). All significant results pass the Holm correction procedure.

Figure 3. Bootstrap distributions of metrics ($n = 1000$) for the imbalanced data.

Dataset 3: Ionosphere

Table 11 shows that OCO-PSO achieves exceptional performance, significantly outperforming the gradient-based and decision tree models with very large effect sizes ($|d| = 2.59$–$3.19$). The gains compared to SGD are among the largest in the entire study (e.g., MCC: $Δ = -0.3113$; $d = -3.17$). The bootstrap distributions (Figure 4) confirm that OCO-PSO offers both high accuracy and minimal variance, demonstrating excellent generalization in high-dimensional nonlinear settings.

Figure 4. Bootstrap distributions of metrics ($n = 1000$) for the ionosphere dataset.

Table 11. Significant differences in performance metrics for various models relative to the baseline OCO-PSO (dataset 3: ionosphere).

Metric	Model	$Δ_{mean}$	Cohen’s d	p-Value	$p_{Holm}$	Sig.
Precision	SGD	−0.1424	−3.14	<0.001	<0.001	Yes
Precision	Decision Tree	−0.0858	−2.63	<0.001	<0.001	Yes
F1-Macro	SGD	−0.1576	−3.19	<0.001	<0.001	Yes
F1-Macro	Decision Tree	−0.1004	−2.59	<0.001	<0.001	Yes
MCC	SGD	−0.3113	−3.17	<0.001	<0.001	Yes
MCC	Decision Tree	−0.1910	−2.65	<0.001	<0.001	Yes
ROC-AUC	SGD	−0.0648	−1.26	<0.001	<0.001	Yes
ROC-AUC	Decision Tree	−0.1038	−2.17	<0.001	<0.001	Yes
ROC-AUC	SVC	+0.0178	+0.95	<0.001	<0.001	Yes

Note: $Δ_{mean}$ represents the difference in the mean metric value (model minus OCO-PSO). All significant results pass the Holm correction procedure.

Dataset 4: Sonar

As shown in Table 12, OCO-PSO significantly outperforms SGD and decision trees on all metrics ($|d|$ up to $2.59$). It also surpasses random forest in terms of area under the ROC curve ($Δ = +0.0758$; $d = +1.57$). The bootstrap plots (Figure 5) reveal that OCO-PSO combines strong predictive power with high stability, comparable to ensemble methods, while being theoretically simpler.

Figure 5. Bootstrap distributions of metrics ($n = 1000$) for the sonar dataset.

Table 12. Significant differences in performance metrics for various models relative to the baseline OCO-PSO (dataset 4: sonar).

Metric	Model	$Δ_{mean}$	Cohen’s d	p-Value	$p_{Holm}$	Sig.
Precision	SGD	−0.0953	−1.44	<0.001	<0.001	Yes
Precision	Decision Tree	−0.1671	−1.86	<0.001	<0.001	Yes
F1-Macro	SGD	−0.1058	−1.57	<0.001	<0.001	Yes
F1-Macro	Decision Tree	−0.1740	−1.91	<0.001	<0.001	Yes
MCC	SGD	−0.1759	−1.39	<0.001	<0.001	Yes
MCC	Decision Tree	−0.3365	−1.86	<0.001	<0.001	Yes
ROC-AUC	SGD	−0.1474	−2.59	<0.001	<0.001	Yes
ROC-AUC	Decision Tree	−0.0856	−1.19	<0.001	<0.001	Yes
ROC-AUC	SVC	+0.0444	+1.25	<0.001	<0.001	Yes
ROC-AUC	Random Forest	+0.0758	+1.57	<0.001	<0.001	Yes

Note: $Δ_{mean}$ represents the difference in the mean metric value (model minus OCO-PSO). All significant results pass the Holm correction procedure.

Dataset 5: Agricultural Yield Prediction

Table 13 highlights OCO-PSO as the most effective method on this dataset. In particular, the improvement in the area under the ROC curve (ROC-AUC) over the decision tree ($Δ = +0.1421$; $d = +4.65$) is noteworthy, representing the largest effect size observed in the entire study. Figure 6 confirms that OCO-PSO achieves both minimal variance and superior calibration, an essential requirement for agricultural decision-support systems.

Table 13. Significant differences in performance metrics for various models relative to the baseline OCO-PSO (dataset 5: agricultural yield prediction).

Metric	Model	$Δ_{mean}$	Cohen’s d	p-Value	$p_{Holm}$	Sig.
Precision	SGD	−0.0087	−0.61	<0.001	<0.001	Yes
Precision	Decision Tree	−0.0835	−2.57	<0.001	<0.001	Yes
Precision	Random Forest	−0.0343	−1.20	<0.001	<0.001	Yes
F1-Macro	SGD	−0.0088	−0.62	<0.001	<0.001	Yes
F1-Macro	Decision Tree	−0.0840	−2.58	<0.001	<0.001	Yes
F1-Macro	Random Forest	−0.0364	−1.22	<0.001	<0.001	Yes
MCC	SGD	−0.0169	−0.60	<0.001	<0.001	Yes
MCC	Decision Tree	−0.1675	−2.57	<0.001	<0.001	Yes
MCC	Random Forest	−0.0654	−1.28	<0.001	<0.001	Yes
ROC-AUC	SGD	+0.0137	+1.51	<0.001	<0.001	Yes
ROC-AUC	Decision Tree	−0.1421	−4.65	<0.001	<0.001	Yes
ROC-AUC	Random Forest	−0.0110	−0.72	<0.001	<0.001	Yes

Figure 6. Bootstrap distributions of metrics ($n = 1000$) for the agricultural yield dataset.

6. Discussion of Results

Across five datasets spanning medical, agricultural, signal processing, and imbalanced problems, and combining real and synthetic data, OCO-PSO achieved overall superior performance in discrimination, fairness, parsimony, and constraint stability. Its performance is validated by rigorous statistical tests based on 1000 bootstrap replications, paired Student’s t-tests, Wilcoxon signed-rank tests, and a Holm–Bonferroni correction, which supports the statistical and practical relevance of the observed improvements.

6.1. Discriminatory Performance and Fairness

OCO-PSO strikes a favorable balance between overall accuracy and class-level fairness (F1-Macro and MCC) across diverse benchmark datasets. On Imbalance_data (extreme class imbalance, 9:1), OCO-PSO achieves a high MCC ($0.885$) and minority recall ($0.90$), though tree-based ensemble methods demonstrate substantially superior overall performance ($|d| \approx 1.36$–$1.38$), confirming their advantage under extreme class imbalance. OCO-PSO has better ROC-AUC calibration than individual decision trees ($Δ = +0.0549$; $d = +1.58$).

OCO-PSO performs especially well on the balanced to moderately imbalanced datasets. On agricultural yield (1.12:1), it attains the best accuracy (89.17%) and the highest MCC ($0.786$), with an exceptional ROC-AUC improvement over the decision tree ($Δ = +0.1421$, $d = +4.65$). On ionosphere (1.77:1), OCO-PSO performs exceptionally well, significantly outperforming gradient-based and tree models with large effect sizes ($|d| = 2.59$–$3.19$). The bootstrap distributions (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6) demonstrate lower variance and superior calibration. Across the benchmark suite, OCO-PSO is optimal or near-optimal in around 60% of comparisons, with all statistically significant differences remaining robust after Holm–Bonferroni correction.

6.2. Parsimony and Interpretability of the Model

The OCO-PSO algorithm builds sparser models than the Bayesian-optimized SVC, reducing the number of support vectors by 7% to 71% depending on the dataset (39% on average). Fewer support vectors make the decision function easier for domain specialists to inspect and analyze.

6.3. Respect for Constraints and the Stability of the Optimization

Constraint violations remain relatively small (≲ $10^{-2}$) on four of the five datasets, with particularly low violations (< $10^{-3}$) on simpler problems. The higher violation on sonar (∼0.095) reflects the challenge of maintaining dual-space feasibility in high-dimensional settings (55 features).
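For reference, feasibility of a dual SVM solution means $\sum_i α_i y_i = 0$ and $0 \le α_i \le C$. The sketch below measures how far a candidate vector is from satisfying both conditions. How the reported violation values were actually computed is not specified in this section, so the function `dual_violations` and the example vectors are hypothetical.

```python
def dual_violations(alpha, y, C):
    """Return (equality violation, worst box violation) for an SVM dual point.

    Equality constraint: sum_i alpha_i * y_i = 0.
    Box constraint:      0 <= alpha_i <= C for every i.
    """
    equality = abs(sum(a * t for a, t in zip(alpha, y)))
    box = max(max(0.0, -a, a - C) for a in alpha)
    return equality, box


# Hypothetical multipliers and labels, for illustration only.
y = [1, -1, 1, -1]
feasible = [0.5, 1.5, 1.0, 0.0]
infeasible = [0.5, 2.5, 1.0, 0.0]

print(dual_violations(feasible, y, C=2.0))
print(dual_violations(infeasible, y, C=2.0))
```

A paired-coordinate update that moves two multipliers by amounts that cancel in the weighted sum leaves the equality term unchanged, so only the box term needs clipping after such a step.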

6.4. Cost Calculation and Practical Considerations

Training takes approximately 2.6 to 24 min depending on the dataset (160 s to 1443 s in Table 14), a consequence of the PSO-based hyperparameter search. In return, the model exhibits strong generalization stability, calibrated error distributions, and compactness. Inference time is low (0 to 6.4 ms), which makes the model suitable for on-the-fly prediction.

6.5. Limitations and Comparative Context

The OCO-PSO algorithm does not consistently offer better performance in terms of raw accuracy. For example, on diabetes, SGD has better accuracy (74.14% vs. 73.28%), but OCO-PSO exhibits a higher MCC (0.454 vs. 0.404), indicating better calibration at the class level. This distinction underscores the importance of fairness-based metrics when the operational context involves asymmetric error costs.
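The way accuracy and MCC can rank two classifiers differently is easiest to see on a small confusion-matrix example. The counts below are hypothetical (100 samples with a 9:1 class ratio) and are not taken from the experiments.

```python
import math


def accuracy_and_mcc(tp, fp, fn, tn):
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    total = tp + fp + fn + tn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Classifier A: almost always predicts the majority class.
acc_a, mcc_a = accuracy_and_mcc(tp=1, fp=0, fn=9, tn=90)
# Classifier B: recovers most of the minority class at the cost of false alarms.
acc_b, mcc_b = accuracy_and_mcc(tp=8, fp=8, fn=2, tn=82)

print(f"A: accuracy={acc_a:.2f}, MCC={mcc_a:.3f}")
print(f"B: accuracy={acc_b:.2f}, MCC={mcc_b:.3f}")
```

Classifier A has the slightly higher accuracy, yet classifier B has the clearly higher MCC because it detects most minority-class samples.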

In summary, the combined analysis, using bootstrapping, paired significance tests, effect size estimation, and correction for multiple testing, indicates that OCO-PSO offers clear gains in fairness, parsimony, calibration, and constraint satisfaction. It is well suited to applications that demand interpretability alongside reproducibility, such as medical diagnostics, precision agriculture, and safety-critical systems, where fair decision-making is crucial.

7. Conclusions

We proposed a hybrid supervised learning method, OCO-PSO, which combines Open Competency Optimization with particle swarm dynamics to solve the dual formulation of the SVM. We conducted experiments on five benchmark datasets covering medical diagnosis (diabetes), agricultural yield prediction (crop yield), signal processing (sonar and ionosphere), and extremely imbalanced classification (Imbalance_data).

This approach exhibits benefits in terms of model sparsity, class-level calibration, and constraint satisfaction. It uses between 7% and 71% fewer support vectors than the standard SVC, which enhances interpretability. On three of the five datasets (ionosphere, sonar, agricultural yield), it reaches or surpasses the best F1-macro and MCC performances (Table 14), which confirms dual-space optimization convergence with constraint violations of less than 1%. Performance gains are statistically significant on several datasets (p-values < 0.01) and remain stable after robust multiple-testing correction.

Bootstrap analysis shows that OCO-PSO is characterized by low variance and that its ROC-AUC calibration remains stable, in contrast to simpler gradient-based techniques. While ensemble methods (random forest) demonstrate advantages in the extremely imbalanced scenario (9:1 ratio), OCO-PSO performs significantly better on balanced to moderately imbalanced problems and preserves the interpretability of the kernel. The higher training cost (2.6 min to 24 min, depending on the dataset) is warranted when improved decision quality, interpretability, reproducibility, and fairness are paramount; these factors are indispensable in regulated applications such as medical diagnostics, precision agriculture, or safety-critical systems.

7.1. Future Directions

A natural extension is to reformulate OCO-PSO within a multiobjective optimization (MOO) framework. The current approach addresses margin maximization, constraint satisfaction, parsimony, and robustness through a single unified fitness function. An MOO formulation would explicitly expose these trade-offs, producing a Pareto front of jointly optimal classifiers. This would allow practitioners to select solutions adapted to specific operational constraints (for example, a maximum number of support vectors for embedded deployment or a minimum MCC for imbalanced data). Additional research directions include (1) adaptive constraint management for improved stability in high-dimensional spaces (addressing the high KKT violation observed on sonar), (2) warm-start strategies leveraging standard SVM solutions to reduce training time, and (3) theoretical analysis of convergence guarantees under OCO’s constraint-preserving updates.
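To make the notion of a Pareto front concrete, the sketch below filters a set of candidate classifiers down to the non-dominated ones. It is purely illustrative of the proposed future direction: the three objectives (number of support vectors, 1 − MCC, constraint violation, all minimized) and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (support vectors, 1 - MCC, constraint violation).
candidates = [
    (40, 0.12, 0.003),
    (25, 0.15, 0.002),
    (60, 0.10, 0.010),
    (45, 0.13, 0.004),  # worse than the first candidate on every objective
    (25, 0.20, 0.002),  # same size and violation as the second, lower MCC
]

front = pareto_front(candidates)
for n_sv, one_minus_mcc, violation in front:
    print(n_sv, one_minus_mcc, violation)
```

A practitioner with a cap on the number of support vectors would then pick the front member with the best MCC among those under the cap.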

In future work, we aim to extend the proposed OCO-PSO framework to multiclass Support Vector Machines (SVMs). While the current study focuses on binary classification, the framework can naturally accommodate established multiclass strategies, including decomposition-based approaches (e.g., One vs. One, One vs. All, and DAGSVM) and native multiclass formulations (e.g., Crammer–Singer).

Integrating these strategies will broaden the applicability of OCO-PSO to a wider range of prediction tasks, enhancing its scalability and predictive performance for complex multiclass problems in domains such as agricultural yield forecasting.

7.2. Considerations for Crop Yield Prediction

In practical agricultural applications, the choice of multiclass strategy depends on the number of defined yield categories and the underlying data distribution. One vs. One (OvO) generally offers strong performance when classes are relatively balanced, whereas One vs. All (OvA) is more effective in scenarios involving a large number of classes or severe class imbalance, conditions that frequently arise in agricultural yield prediction. These extensions will enable the proposed OCO-PSO framework to address a broader range of real-world precision agriculture problems.
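The cost side of this choice can be illustrated by enumerating the binary subproblems each decomposition creates: $k(k-1)/2$ for One vs. One and $k$ for One vs. All. The sketch assumes that each subproblem would correspond to one binary SVM training run; the yield categories are hypothetical.

```python
from itertools import combinations


def ovo_tasks(classes):
    """One vs. One: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def ova_tasks(classes):
    """One vs. All: one binary problem per class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


# Hypothetical yield categories, for illustration only.
classes = ["low", "medium", "high", "very_high"]

print(len(ovo_tasks(classes)), "OvO problems")
print(len(ova_tasks(classes)), "OvA problems")
```

One vs. One trains more classifiers, but each sees only the samples of two classes, whereas every One vs. All problem uses the full training set and is inherently imbalanced.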

In summary, OCO-PSO represents a principled alternative for scenarios where reliability, interpretability, and strict constraint compliance outweigh training speed, an increasingly relevant trade-off in responsible AI deployment.

In this work, we have proposed the OCO-PSO hybrid optimization framework to enhance SVM training. Compared to standard SVM solvers such as SMO and SGD, and to classifiers such as decision trees and random forests, OCO-PSO offers several advantages: it efficiently explores the solution space, avoids local minima, and strictly enforces the dual constraints on the Lagrange multipliers. These features result in improved predictive performance, sparser models, and better interpretability. The main limitation is the higher computational cost during training due to constrained global optimization, although prediction remains highly efficient. By explicitly addressing these strengths and trade-offs, our approach provides a robust alternative for classification tasks, even with small- to medium-sized datasets. Furthermore, the proposed OCO-PSO–SVM framework demonstrates clear advantages over conventional methods that do not employ PSO, achieving more accurate and interpretable models by combining global search capability with adaptive learning. These findings highlight the potential of hybrid metaheuristic optimization for improving the performance of SVMs across diverse classification problems.

Author Contributions

Conceptualization, K.J. and K.N.; methodology, K.J. and K.N.; software, K.N.; validation, K.J., K.N., and S.R.; formal analysis, K.N.; investigation, K.N.; resources, K.J., K.N., and S.R.; writing—original draft preparation, K.J. and K.N.; writing—review and editing, K.J. and K.N.; supervision, K.J.; project administration, K.J.; funding acquisition, K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this study are available at https://www.kaggle.com/search?q=crop-+yield-prediction-dataset (accessed on 1 January 2026) and https://archive.ics.uci.edu/ (accessed on 1 January 2020).

Conflicts of Interest

The authors K. Jebari, K. Nejjar, and S. Rekiek declare no conflicts of interest.


Figure 1. Overall workflow of the proposed OCO-PSO algorithm. The framework integrates PSO-based global exploration, paired-coordinate local exploitation for exact constraint handling, and periodic OCO-based diversification to preserve swarm diversity.

Table 1. Dataset characteristics summary.

Dataset	Domain	Samples × Feat.	Class Distribution (neg:pos)	Type
Diabetes	Medical	768 × 8	500:268 (65.0%:35.0%)	Real
Agricultural Yield	Agriculture	599 × 6	317:282 (52.9%:47.1%)	Synthetic
Ionosphere	Radar/Remote Sensing	351 × 34	225:126 (64.1%:35.9%)	Real
Sonar	Signal Processing	208 × 60	89:77 (53.6%:46.4%)	Real
Imbalance_data	Synthetic Data	500 × 2	450:50 (90.0%:10.0%)	Synthetic

Table 3. Summary of PSO algorithm parameters.

Category	Parameter	Description/Value
Swarm Settings	$n_{particles}$, `max_iter`	Optimized via Bayesian Search with $C$ and $γ$
Swarm Settings	Kernel	RBF (Radial Basis Function)
Inertia Dynamics	$w_{start}$	0.95 (Maximum initial exploration)
Inertia Dynamics	$w_{end}$	0.3 (Precise final exploitation)
Acceleration Factors	Cognitive ($c_{1}$)	2.0 (Attraction to local best $pbest$)
Acceleration Factors	Social ($c_{2}$)	2.0 (Attraction to global best $gbest$)
Movement Constraints	Max velocity ($v_{max}$)	$0.25 \times C$ (Kinetic dynamics regulation)
Local Optimization	`max_pso_local_iter`	25 (Pairwise update iterations in local loop)
Local Optimization	`tol_kkt`	0.001 (KKT conditions tolerance)
Convergence Criteria	$ϵ_{fitness}$	$10^{-6}$ (Objective function precision threshold)
Convergence Criteria	`Convergence_iter`	15 (Stagnation limit before termination)

Table 4. Summary of OCO parameters, activated every 30 iterations.

Strategy (Function)	Hyperparameter	Symbol/Value	Role and Impact
S1: Self-learning (`apply_self_learning`)	Mutation probability	$[0.3, 0.9]$	Rank-based adjustment (higher for low-performing particles).
	Activation interval	$[0.01 C, 0.1 C]$	Intensity of pairwise dimension reactivation.
	Transfer rate	20% ($Δα$)	Proportion of energy shifted between pairs.
	Cooldown period	20 iterations	Suspension of dimensions failing to improve.
	Velocity noise	$σ = 0.01$	Gaussian noise to maintain kinetic dynamics.
S2: Peer Learning (`peer_crossover`)	Crossover probability	$P_{peer} = 0.7$	Frequency of information exchange between neighbors.
	Mixing factor	$w \in [0.3, 0.7]$	Balance of interpolation between two particles.
	Source selection	50% chance	Probability for each pair to inherit from a neighbor.
S3: Leadership Interaction (`leadership_crossover`)	Direction probability	$P_{leader} = 0.8$	Frequency of attraction toward the global best.
	Distance threshold	$τ_{D} = 0.1$	Toggle between global attraction and local refinement.
	Mixing (Distant)	$w \in [0.6, 0.9]$	Acceleration toward the leader if the particle is far.
	Mixing (Close)	$w \in [0.2, 0.4]$	Fine exploitation if the particle is nearby.
	Pair ratio	90%	Proportion of dimensions oriented toward the leader.
Global Control	OCO Frequency	30 iterations	Time interval between two learning phases.
	Soft Elitism	$w \in [0.6, 0.9]$	Mixing intensity used to rescue the worst particle.
	Convergence threshold	$ϵ = 10^{- 6}$	Minimum fitness precision to validate progress.
	Stagnation stop	15 iterations	Cycle limit without improvement before termination.

Table 8. Performance comparison between PSO and OCO-PSO across different datasets.

Dataset	Algorithm	Accuracy	MCC
Agricultural Yield	PSO	0.83	0.67
	OCO-PSO	0.89	0.87
Ionosphere	PSO	0.91	0.82
	OCO-PSO	0.95	0.91
Sonar	PSO	0.76	0.52
	OCO-PSO	0.85	0.71
Imbalance_data	PSO	0.74	0.69
	OCO-PSO	0.98	0.88

Table 14. Extended results of OCO-PSO and baseline models across all datasets.

Metric/Model	Imbalance_Data	Sonar	Agric_Yield_Data	Diabetes	Ionosphere
OCO-PSO
Accuracy	0.980	0.857	0.892	0.733	0.957
F1-Macro	0.939	0.857	0.890	0.720	0.954
Precision	0.989	0.859	0.899	0.717	0.950
Recall	0.900	0.859	0.887	0.737	0.958
MCC	0.885	0.718	0.786	0.454	0.908
Train (s)	169	160	932	1443	225
Pred (s)	0.0022	0.0000	0.0000	0.0060	0.0013
Viol.	0.00275	0.09461	0.01043	0.00025	0.0064
C	3.00	10.00	8.00	6.00	2.00
$γ$	0.570	0.028	0.0021	0.0033	0.062
SVC
Accuracy	0.930	0.833	0.875	0.707	0.943
F1-Macro	0.850	0.831	0.874	0.658	0.938
Precision	0.794	0.841	0.785	0.672	0.938
Recall	0.961	0.830	0.872	0.652	0.938
MCC	0.737	0.671	0.750	0.323	0.876
Train (s)	6.07	37.13	36.64	9.09	22.29
Pred (s)	0.000	0.006	0.000	0.009	0.000
Viol.	–	–	–	–	–
C	5.22	10.00	3.33	1.26	10.00
$γ$	0.080	0.027	0.050	0.027	0.012
SGDClassifier
Accuracy	0.900	0.738	0.875	0.741	0.886
F1-Macro	0.804	0.725	0.874	0.698	0.876
Precision	0.750	0.768	0.876	0.715	0.876
Recall	0.944	0.730	0.873	0.690	0.876
MCC	0.667	0.496	0.749	0.404	0.751
Train (s)	0.265	58.51	48.23	2.01	35.30
Pred (s)	0.000	0.003	0.000	0.000	0.000
Viol.	–	–	–	–	–
C	–	–	–	–	0.00
$γ$	–	–	–	–	0.00
Decision Tree
Accuracy	0.960	0.690	0.800	0.681	0.886
F1-Macro	0.889	0.686	0.799	0.635	0.870
Precision	0.889	0.693	0.799	0.642	0.891
Recall	0.889	0.686	0.799	0.632	0.858
MCC	0.778	0.379	0.598	0.274	0.748
Train (s)	0.447	39.44	48.85	2.53	35.64
Pred (s)	0.010	0.004	0.000	0.009	0.000
Viol.	–	–	–	–	–
C	–	–	–	–	0.00
$γ$	–	–	–	–	0.00
Random Forest
Accuracy	0.960	0.857	0.850	0.707	0.957
F1-Macro	0.889	0.852	0.847	0.652	0.954
Precision	0.889	0.893	0.858	0.672	0.950
Recall	0.889	0.850	0.845	0.646	0.958
MCC	0.778	0.742	0.702	0.317	0.908
Train (s)	2.135	60.78	53.64	6.01	40.80
Pred (s)	0.016	0.020	0.000	0.024	0.016
Viol.	–	–	–	–	–
C	–	–	–	–	0.00
$γ$	–	–	–	–	0.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
