Article

Introducing a New Genetic Operator Based on Differential Evolution for the Effective Training of Neural Networks

by Ioannis G. Tsoulos 1,*, Vasileios Charilogis 1 and Dimitrios Tsalikakis 2
1 Department of Informatics and Telecommunications, University of Ioannina, 45110 Ioannina, Greece
2 Department of Engineering Informatics and Telecommunications, University of Western Macedonia, 50100 Kozani, Greece
* Author to whom correspondence should be addressed.
Computers 2025, 14(4), 125; https://doi.org/10.3390/computers14040125
Submission received: 7 March 2025 / Revised: 19 March 2025 / Accepted: 21 March 2025 / Published: 28 March 2025
(This article belongs to the Special Issue Emerging Trends in Machine Learning and Artificial Intelligence)

Abstract

Artificial neural networks are widely established models used to solve a variety of real-world problems in the fields of physics, chemistry, etc. These machine learning models contain a series of parameters that must be appropriately tuned by various optimization techniques in order to effectively address the problems that they face. Genetic algorithms have been used in many cases in the recent literature to train artificial neural networks, and various modifications have been made to enhance this procedure. In this article, the incorporation of a novel genetic operator into genetic algorithms is proposed to effectively train artificial neural networks. The new operator is based on the differential evolution technique, and it is periodically applied to randomly selected chromosomes from the genetic population. Furthermore, to determine a promising range of values for the parameters of the artificial neural network, an additional genetic algorithm is executed before the execution of the basic algorithm. The modified genetic algorithm is used to train neural networks on classification and regression datasets, and the results are reported and compared with those of other methods used to train neural networks.

1. Introduction

Artificial neural networks are machine learning models that have been widely used in recent decades to solve a number of problems [1,2]; they are parametric models, commonly denoted as N(x, w). The vector x stands for the input pattern, and the vector w represents the associated set of parameters, which should be calculated by an optimization method. The calculation is performed by minimizing the so-called training error, expressed as
E(N(x, w)) = \sum_{i=1}^{M} \left( N(x_i, w) - y_i \right)^2
The pairs (x_i, y_i), i = 1, …, M form the training set of the problem, where y_i represents the expected output for each pattern x_i.
Artificial neural networks have been applied to a wide variety of problems in various fields, such as physics [3], astronomy [4], chemistry [5], economics [6], and medicine [7,8]. Equation (1) has been minimized by various methods in the relevant literature. Among them, one can find the backpropagation method [9,10], the RPROP method [11,12,13], quasi-Newton methods [14,15], Simulated Annealing [16], Particle Swarm Optimization (PSO) [17,18], genetic algorithms [19,20], differential evolution [21], Ant Colony Optimization [22], the Gray Wolf Optimizer [23], whale optimization [24], etc. Moreover, Zhang et al. proposed a hybrid algorithm that combines PSO and the backpropagation algorithm for neural network training [25]. Also, many researchers have recently proposed methods that take advantage of parallel processing units in order to speed up the training process [26,27]. Furthermore, Kang et al. recently proposed a hybrid method that combines the lattice Boltzmann method [28] with various machine learning methods, including neural networks, which exhibit good approximation abilities [29]. In addition, a series of papers have recently been published that tackle the initialization procedure for the parameters of neural networks. These methods include decision trees [30], the incorporation of Cauchy's inequality [31], discriminant learning [32], the usage of polynomial bases [33], and the usage of intervals [34]. A systematic review of initialization methods can be found in the work of Narkhede et al. [35].
Additionally, determining the optimal architecture of an artificial neural network can effectively contribute to its training: on the one hand, it will reduce the required training time, and, on the other hand, it will eliminate the problem of overfitting. To tackle this problem, a number of researchers have proposed many methods, such as genetic algorithms [36,37], the application of the PSO method [38], and the application of reinforcement learning [39]. Also, Tsoulos et al. proposed the use of the grammatical evolution technique [40] to construct artificial neural networks [41].
This paper proposes a two-stage technique for the efficient training of artificial neural networks. In the first stage, a genetic algorithm is used to efficiently identify a range of values within which the parameters of the artificial neural network should be optimized. In the second stage, a genetic algorithm is used to optimize these parameters, and it uses a new operator to enhance the results. This new operator is based on the differential evolution technique [42], and it is applied periodically to randomly selected chromosomes of the genetic population. The first stage of the technique is necessary to ensure that the parameters of the artificial neural network are trained within a range of values, which will prevent their overfitting as much as possible. In the second stage, the differential evolution method is selected as the base of the new operator. This method is an evolutionary technique widely used in a series of practical problems, such as community detection [43], structure prediction [44], motor fault diagnosis [45], and clustering techniques [46]. Furthermore, this method is chosen as the basis for the new genetic operator due to the small number of required parameters that must be specified by the user.
The remainder of this article is organized as follows: In Section 2, the proposed method is discussed in detail. In Section 3, the datasets used, as well as the experiments conducted, are discussed. Finally, in Section 4, the conclusions are presented.

2. Method Description

The two phases of the proposed method are analyzed in detail in this section. During the first phase, a genetic algorithm is utilized to detect a promising interval of values for the parameters of a neural network. In the second phase, a genetic algorithm that incorporates the suggested operator is applied to minimize the training error of the neural network, with the parameters initialized inside the interval identified during the first phase.

2.1. The First Phase of the Proposed Method

In the first phase of the proposed technique, a genetic algorithm is used to identify a range of values for the parameters of the artificial neural network. Genetic algorithms are evolutionary methods in which a population of randomly created candidate solutions, called chromosomes, is evolved iteratively through a series of steps that mimic natural processes such as selection, crossover, and mutation. Genetic algorithms have been used successfully in a series of real-world problems, such as the placement of wind turbines [47], water distribution [48], economics [49], and neural network training [50]. The neural networks adopted in this manuscript have the following form, as proposed in [41]:
N(x, w) = \sum_{i=1}^{H} w_{(d+2)i-(d+1)} \, \sigma\left( \sum_{j=1}^{d} x_j w_{(d+2)i-(d+1)+j} + w_{(d+2)i} \right)
where the value H denotes the total number of processing units in the network and the value d defines the number of inputs of the pattern x. Hence, the total number of parameters of the network is n = (d + 2)H. The function σ(x) is the sigmoid function, defined as follows:
\sigma(x) = \frac{1}{1 + \exp(-x)}
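For concreteness, Equations (1)–(3) can be sketched in Python (an illustrative sketch, not the paper's ANSI C++ implementation; each unit owns a block of (d + 2) consecutive parameters laid out as [outer weight, d inner weights, bias], using 0-based indices):

```python
import math

def sigmoid(x):
    # Equation (3)
    return 1.0 / (1.0 + math.exp(-x))

def eval_network(x, w, H):
    """Equation (2): H processing units; unit i uses the parameter block
    [outer weight, d inner weights, bias] starting at offset (d + 2) * i."""
    d = len(x)
    out = 0.0
    for i in range(H):
        base = (d + 2) * i
        act = w[base + d + 1] + sum(x[j] * w[base + 1 + j] for j in range(d))
        out += w[base] * sigmoid(act)
    return out

def train_error(train_set, w, H):
    # Equation (1): sum of squared residuals over the training pairs (x, y)
    return sum((eval_network(x, w, H) - y) ** 2 for x, y in train_set)
```

With one unit and one input, w = [2, 0, 0] gives N(x, w) = 2·σ(0) = 1 for every input pattern, which makes the parameter layout easy to check by hand.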
The steps of the algorithm of the first phase are as follows:
  • Initialization step.
    (a) Set the number of chromosomes N_c and the maximum number of allowed generations N_g.
    (b) Set the selection rate p_s and the mutation rate p_m.
    (c) Set the margin factor a, where a ≥ 1.
    (d) Set k = 0 as the generation counter.
    (e) Initialize the chromosomes g_i, i = 1, …, N_c randomly. Each chromosome is a vector of parameters for the artificial neural network.
  • Fitness calculation step.
    (a) For i = 1, …, N_c do
      i. Create the neural network N_i(x, g_i) for the chromosome g_i.
      ii. Calculate the associated fitness value f_i as
        f_i = \sum_{j=1}^{M} \left( N_i(x_j, g_i) - y_j \right)^2
      for the pairs (x_j, y_j), j = 1, …, M of the training set.
    (b) End For
  • Genetic operations step.
    (a) Transfer the best (1 - p_s) × N_c chromosomes of the current generation to the next one. The remaining chromosomes will be replaced by chromosomes produced through crossover and mutation.
    (b) Perform the crossover procedure. During this procedure, for each pair of constructed chromosomes z̃, w̃, two chromosomes are selected from the current population using tournament selection. The production of the new chromosomes is performed using the process suggested by Kaelo et al. [51].
    (c) Perform the mutation procedure. During this procedure, for each element of each chromosome, a random number r ∈ [0, 1] is drawn, and the corresponding element is altered randomly when r ≤ p_m.
  • Termination check step.
    (a) Set k = k + 1.
    (b) If k ≤ N_g, then go to the fitness calculation step.
  • Margin creation step.
    (a) Obtain the best chromosome g*, i.e., the one with the lowest fitness value.
    (b) Create the vectors L* and R* as follows:
      L_i^* = -a \left| g_i^* \right|,\quad R_i^* = a \left| g_i^* \right|,\quad i = 1, …, n
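The margin creation step and the subsequent initialization of the second-phase population can be sketched in Python as follows (an illustrative sketch; the symmetric bounds around the best chromosome follow the reconstruction of the step above):

```python
import random

def make_margins(g_best, a):
    # Margin creation step: bound each parameter of the best chromosome
    # symmetrically around zero, scaled by the margin factor a >= 1.
    L = [-a * abs(g) for g in g_best]
    R = [a * abs(g) for g in g_best]
    return L, R

def init_population(L, R, Nc, rng=random):
    # Phase-2 initialization: draw each chromosome uniformly inside [L, R].
    return [[rng.uniform(lo, hi) for lo, hi in zip(L, R)]
            for _ in range(Nc)]
```

Because a ≥ 1, every interval [L_i, R_i] contains the corresponding parameter of the best phase-1 chromosome, so the second phase starts its search in a region already known to be promising.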

2.2. The Second Phase of the Proposed Method

During the second phase, a second genetic algorithm is used to minimize the training error of the neural network. The parameters of the neural network are initialized inside the vectors L* and R* produced in the previous phase of the algorithm. Also, a novel stochastic genetic operator, based on the Differential Evolution approach, is applied periodically to the genetic population. This new stochastic operator is used to improve the performance of randomly selected chromosomes and to speed up the convergence of the overall genetic algorithm toward the global minimum. The main steps of the algorithm executed in the second phase are as follows:
  • Initialization step.
    (a) Set the number of chromosomes N_c and the maximum number of allowed generations N_g.
    (b) Set the selection rate p_s ≤ 1 and the mutation rate p_m ≤ 1.
    (c) Set the crossover probability CR used in the new genetic operator.
    (d) Set the differential weight F that will be used in the novel genetic operator.
    (e) Set N_i as the number of generations between applications of the new operator.
    (f) Set N_l as the number of chromosomes that will participate in the new operator.
    (g) Initialize the chromosomes g_i, i = 1, …, N_c inside the vectors L* and R* of the previous phase.
    (h) Set k = 0 as the generation counter.
  • Fitness calculation step.
    (a) For i = 1, …, N_c do
      i. Produce the corresponding neural network N_i(x, g_i) for the chromosome g_i.
      ii. Calculate the fitness value f_i as
        f_i = \sum_{j=1}^{M} \left( N_i(x_j, g_i) - y_j \right)^2
    (b) End For
  • Application of genetic operators.
    (a) Copy the best (1 - p_s) × N_c chromosomes, i.e., those with the lowest fitness values, to the next generation. The remaining chromosomes will be replaced by chromosomes produced through crossover and mutation.
    (b) Apply the same crossover procedure as in the algorithm of the first phase.
    (c) Apply the same mutation procedure as in the genetic algorithm of the first phase.
  • Application of the novel genetic operator.
    (a) If k mod N_i = 0 then
      i. Create the set C = {z_1, z_2, …, z_{N_l}} of N_l randomly selected chromosomes.
      ii. For i = 1, …, N_l, apply the deOperator of Algorithm 1 to every chromosome z_i ∈ C.
    (b) End if
  • Termination check step.
    (a) Set k = k + 1.
    (b) If k ≤ N_g, go to the fitness calculation step.
  • Testing step.
    (a) Obtain the best chromosome g* from the genetic population.
    (b) Create the corresponding neural network N*(x, g*).
    (c) Apply this neural network to the test set of the objective problem and report the error.
Algorithm 1 The Proposed Genetic Operator
Function deOperator(g, F, CR)
  • Select three distinct chromosomes a, b, c from the current population using tournament selection.
  • Set R ∈ [1, n], a randomly selected integer.
  • Set t = g as the trial chromosome.
  • For i = 1, …, n do
    (a) Select a random number r ∈ [0, 1].
    (b) If i = R or r ≤ CR, then t_i = a_i + F × (b_i - c_i).
  • End For
  • Set t_f = \sum_{j=1}^{M} \left( N(x_j, t) - y_j \right)^2
  • If t_f ≤ f_g, then g = t.
  • Return g.
End Function
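A minimal Python sketch of this operator follows (for brevity, plain random sampling stands in for the tournament selection named above, and `fitness` is any callable returning the training error of a chromosome):

```python
import random

def de_operator(g, F, CR, population, fitness, rng=random):
    # Pick three distinct donor chromosomes a, b, c from the population.
    a, b, c = rng.sample(population, 3)
    n = len(g)
    R = rng.randrange(n)          # index that is always crossed over
    t = list(g)                   # trial chromosome, initialized to g
    for i in range(n):
        if i == R or rng.random() <= CR:
            t[i] = a[i] + F * (b[i] - c[i])
    # Greedy acceptance: keep the trial only if it does not worsen the fitness.
    return t if fitness(t) <= fitness(g) else g
```

Note that the operator can never degrade a chromosome: the trial vector replaces g only when its fitness is at least as good, which is why applying it periodically can only accelerate, not harm, the search.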
The steps of all phases are also graphically illustrated in Figure 1.

3. Experiments

To demonstrate the dynamics and reliability of the proposed methodology, a series of experiments were carried out on known datasets from the relevant literature. These datasets were obtained from the following databases:

3.1. Experimental Datasets

The following series of classification datasets were used in the conducted experiments:
  • The Alcohol dataset, which is related to experiments on alcohol consumption [54].
  • The Appendicitis dataset, which is a medical dataset [55].
  • The Australian dataset, which is used in bank transactions [56].
  • The Balance dataset, which contains measurements from various psychological experiments [57].
  • The Circular dataset, which was created artificially.
  • The Cleveland dataset, which is a medical dataset [58,59].
  • The Dermatology dataset, which is a medical dataset regarding dermatology problems [60].
  • The Ecoli dataset, which is used in protein problems [61].
  • The Fert dataset, related to the detection of relations between sperm concentration and demographic data.
  • The Haberman dataset, which is related to the detection of breast cancer.
  • The Hayes-Roth dataset [62].
  • The Heart dataset, which is related to some heart diseases [63].
  • The HouseVotes dataset, related to data from congressional voting in the USA [64].
  • The Ionosphere dataset, which contains measurements from the ionosphere [65,66].
  • The Liverdisorder dataset, which is a medical dataset [67,68].
  • The Lymography dataset [69].
  • The Mammographic dataset, which is a medical dataset [70].
  • The Parkinsons dataset, which was used in the detection of Parkinson’s disease [71,72].
  • The Pima dataset, a medical dataset related to the detection of diabetes [73].
  • The Popfailures dataset, related to climate model simulations [74].
  • The Regions2 dataset, related to some liver diseases [75].
  • The Saheart dataset, related to some heart diseases [76].
  • The Segment dataset, related to image processing [77].
  • The Sonar dataset, used to discriminate sonar signals [78].
  • The Spiral dataset, which was created artificially.
  • The StatHeart dataset, a medical dataset regarding heart diseases.
  • The Student dataset, which is related to experiments conducted in schools [79].
  • The WDBC dataset, which is related to the detection of cancer [80].
  • The Wine dataset, used to assess the quality of wines [81,82].
  • The EEG dataset, which contains various EEG measurements [83,84]. From this dataset, the following cases were utilized: Z_F_S, ZO_NF_S and ZONF_S.
  • The ZOO dataset, which is used for animal classification [85].
Also, the following regression datasets were incorporated in the conducted experiments:
  • The Abalone dataset, which was used to predict the age of abalones [86].
  • The Airfoil dataset, derived from NASA [87].
  • The Baseball dataset, used to predict the salary of baseball players.
  • The BK dataset, related to basketball games [88].
  • The BL dataset, related to some electricity experiments.
  • The Concrete dataset, which is related to civil engineering [89].
  • The Dee dataset, which is related to the price of electricity.
  • The Housing dataset, related to the price of houses [90].
  • The Friedman dataset, used in various benchmarks [91].
  • The FY dataset, related to fruit flies.
  • The HO dataset, obtained from the STATLIB repository.
  • The Laser dataset, related to laser experiments.
  • The LW dataset, related to the prediction of the weight of babies.
  • The MB dataset, which was obtained from Smoothing Methods in Statistics.
  • The Mortgage dataset, which is an economic dataset.
  • The Plastic dataset, related to the pressure in plastics.
  • The PY dataset [92].
  • The PL dataset, obtained from the STATLIB repository.
  • The Quake dataset, used to detect the strength of earthquakes.
  • The SN dataset, which is related to trellising and pruning.
  • The Stock dataset, used to estimate the price of stocks.
  • The Treasury dataset, which is an economic dataset.
  • The VE dataset, obtained from the STATLIB repository.

3.2. Experimental Results

The code used in the conducted experiments was written in ANSI C++, and all runs were performed 30 times using a different seed for the random number generator each time. The validation of the experiments was performed using the well-known method of 10-fold cross-validation. All the experiments were conducted on a Linux machine with 128 GB of RAM. In the case of classification datasets, the average classification error is reported in the experimental tables. This error is calculated through the following equation:
E_C(N(w, x)) = 100 \times \frac{\sum_{i=1}^{K} \left[ \mathrm{class}(N(w, x_i)) \neq y_i \right]}{K}
where the set T = { (x_i, y_i), i = 1, …, K } denotes the test set of the objective problem. In regression datasets, the average regression error, as calculated on the test set, is reported; it is defined as:
E_R(N(w, x)) = \frac{\sum_{i=1}^{K} \left( N(w, x_i) - y_i \right)^2}{K}
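Both test-set errors follow directly from their definitions; a small Python sketch (here `predict` is a hypothetical callable returning the predicted class, and `model` one returning the real-valued output):

```python
def classification_error(predict, test_set):
    # Percentage of misclassified test patterns.
    wrong = sum(1 for x, y in test_set if predict(x) != y)
    return 100.0 * wrong / len(test_set)

def regression_error(model, test_set):
    # Mean squared error over the test set.
    return sum((model(x) - y) ** 2 for x, y in test_set) / len(test_set)
```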
The values for the parameters of the proposed method are shown in Table 1. In the experimental tables, the following notation is used:
  • The column DATASET represents the objective problem.
  • The column ADAM denotes the incorporation of the ADAM optimizer [93] to train a neural network with H = 10 processing nodes.
  • The column BFGS represents the application of the BFGS optimizer [94] to train a neural network with H = 10 processing nodes.
  • The column GENETIC denotes the usage of a genetic algorithm with the same set of parameters as shown in Table 1 to train an artificial neural network with H = 10 processing nodes.
  • The column NEAT is used for the application of the NEAT method (NeuroEvolution of Augmenting Topologies) [95].
  • The row AVERAGE is used for the average classification or regression error for all datasets.
Also, for the comparisons between methods, the Wilcoxon signed-rank test was used, which is a non-parametric test for paired data. Each method (“PROPOSED”, “BFGS”, etc.) was evaluated on the same datasets (dependent measurements), which is why a test for paired observations was selected. The Mann–Whitney U test was not applied, as it is intended for independent groups, and the t-test was not used because it requires normally distributed data, a condition that is not met here.
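The paired nature of the comparison is what the signed-rank statistic exploits; a minimal pure-Python sketch of the W statistic is shown below (ties in the absolute differences are not rank-averaged here, and the p-value lookup is omitted; in practice a statistics library would handle both):

```python
def wilcoxon_W(a, b):
    # Paired differences; zero differences are discarded, as usual.
    d = [x - y for x, y in zip(a, b) if x != y]
    # Rank the absolute differences (rank 1 = smallest magnitude).
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    w_plus = sum(rank + 1 for rank, i in enumerate(order) if d[i] > 0)
    w_minus = sum(rank + 1 for rank, i in enumerate(order) if d[i] < 0)
    # The test statistic is the smaller of the two signed rank sums.
    return min(w_plus, w_minus)
```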
The experimental results for the classification datasets are shown in Table 2 and the experimental results for the regression datasets are shown in Table 3.
Also, in Figure 2 and Figure 3, the statistical comparison for the experimental results is outlined graphically.
In the scientific analysis of the experimental results, the proposed model (PROPOSED) demonstrates statistically significant superiority over the other methods (ADAM, BFGS, GENETIC, NEAT, RBF) in both the classification and regression datasets. For classification, the PROPOSED model achieves a mean error rate of 20.02%, compared to 25.48–33.01% for the other methods, with extremely low p-values (e.g., p = 2.3 × 10^{-10} against BFGS). In regression, the model’s mean regression error (4.64) is significantly lower than that of the comparative methods (12.8–25.12), with strong statistical significance (p < 10^{-5}). However, in certain datasets (e.g., CLEVELAND, FRIEDMAN), the model’s performance decreases, indicating dependence on data characteristics.
In the analysis of the experiments (Figure 2) for classification datasets, the proposed model (PROPOSED) demonstrates statistically significant superiority over all comparative methods (BFGS, ADAM, NEAT, GENETIC), with extremely low p-values (p = 2.3 × 10^{-10} to p = 4.4 × 10^{-6}). This indicates strong differences at a confidence level >99.9%, particularly against BFGS and ADAM (p < 10^{-9}). For regression datasets (Figure 3), PROPOSED maintains significant superiority over all methods, though with slightly higher p-values (p = 10^{-5} to p = 2.4 × 10^{-7}). The largest difference is observed against NEAT (p = 2.4 × 10^{-7}), while the smallest is against GENETIC (p = 8.6 × 10^{-5}).
Moreover, Figure 4 graphically outlines a comparison of execution times between the simple genetic algorithm and the proposed method for various values of the critical parameter N_I. The dataset used in this experiment is the WINE classification dataset. As expected, increasing the value of this parameter leads to a reduction in the required execution time for the proposed method, since the new genetic operator is applied more and more sparsely to the population chromosomes.

3.3. Experiments with the Differential Weight

An additional experiment was performed to demonstrate the reliability of the proposed methodology. In this experiment, three different techniques for calculating the differential weight F were used in the proposed operator. The range of this parameter was defined as [0, 2] in the work of Storn and Price [96]. These techniques are the following:
  • FIXED, which is the default technique in the proposed method. In this technique, the value F = 0.8 is used for the differential weight.
  • ADAPTIVE, where the adaptive calculation of the parameter F is used as proposed in [97].
  • RANDOM, where the stochastic calculation of parameter F as proposed in [98] is used.
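The three variants can be summarized in Python as follows (the FIXED value is the one stated above; the exact ADAPTIVE and RANDOM rules of [97,98] are not reproduced here, so the two formulas below are illustrative stand-ins kept within the [0, 2] range of Storn and Price):

```python
import random

def fixed_F():
    # Default technique of the proposed method.
    return 0.8

def random_F(rng=random):
    # Illustrative stand-in: draw F uniformly inside the [0, 2] range.
    return rng.uniform(0.0, 2.0)

def adaptive_F(k, N_g, F_max=2.0, F_min=0.1):
    # Illustrative stand-in: shrink F linearly as the generations advance,
    # favoring exploration early and exploitation late.
    return F_max - (F_max - F_min) * k / N_g
```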
In Table 4, the results from the application of the proposed method using the previously mentioned techniques for the differential weight are depicted for the classification datasets. Similarly, the same method is applied to the regression datasets and the results are presented in Table 5.
Furthermore, the statistical comparison of these experimental tables is outlined in Figure 5 and Figure 6, respectively.
When comparing differential weight calculation methods (FIXED, ADAPTIVE, RANDOM), the ADAPTIVE method shows slightly better average performance (19.91% in classification, 4.59 in regression) compared to FIXED (20.02%, 4.64) and RANDOM (19.97%, 4.59). However, the differences are minimal and often statistically insignificant (e.g., p = 0.83 for FIXED vs. ADAPTIVE in classification). In some datasets, such as Lymography and HouseVotes, the RANDOM method outperforms others, highlighting the need to tailor the method to the specific problem. In regression, the only significant difference occurs between FIXED and ADAPTIVE ( p = 0.041 ).
For this experiment, as shown in Figure 5, the differences between FIXED and ADAPTIVE (p = 0.83) and between ADAPTIVE and RANDOM (p = 0.94) are not statistically significant, suggesting equivalent performance. However, the comparison of FIXED vs. RANDOM (p = 2.3 × 10^{-10}) reveals a highly significant difference, likely due to the stochastic nature of RANDOM. For regression datasets (Figure 6), the only significant difference occurs between FIXED and ADAPTIVE (p = 0.041), while the remaining comparisons (FIXED vs. RANDOM: p = 0.75, ADAPTIVE vs. RANDOM: p = 0.31) lack statistical significance.

3.4. Experiments with the Margin Factor a

An additional experiment was executed to outline the performance and the stability of the proposed method. In this experiment, the margin factor, denoted as a in the proposed method, was varied from a = 1.0 (the default value) to a = 4.0. The experimental results for the classification datasets are shown in Table 6, and the results for the regression datasets are shown in Table 7.
Also, the statistical comparisons for these experiments are shown graphically in Figure 7 and Figure 8, respectively.
Table 6 presents the percentage error values corresponding to various classification datasets for different values of the parameter a, which determines the optimal value boundaries in the proposed machine learning model. The last row of the table includes the average error for each value of the parameter a. The data analysis reveals that an increase in the parameter a generally leads to a rise in the average error rate. Specifically, the average increases from 20.02% for a = 1.0 to 21.38% for a = 3.5, while for a = 4.0, it slightly decreases to 21.00%. This indicates that, despite the general upward trend, there are cases where further increasing the parameter can reduce the error. A detailed examination of the data by dataset shows significant variations among different values of the parameter a. For example, in the Circular dataset, the error gradually increases from 3.69% for a = 1.0 to 4.70% for a = 4.0. Similarly, in the Wine dataset, the error value increases from 8.55% for a = 1.0 to 11.02% for a = 4.0, indicating a clear negative impact of increasing the parameter. However, in other datasets, the effect of the parameter is neither as linear nor as pronounced. For instance, in the Hayes-Roth dataset, the error decreases from 39.69% for a = 2.0 to 35.49% for a = 2.5, before increasing again. Similarly, in the Lymography dataset, the error significantly decreases for a = 4.0 (26.69%) compared to the other parameter values. These data demonstrate that the relationship between the parameter a and the error is not linear across all datasets and depends on the specific characteristics of each dataset. Overall, the statistical analysis indicates that while increasing the parameter a is often associated with higher errors, there are exceptions that may be linked to specific properties of the data or the model.
Table 7 presents the absolute error values for various regression datasets in relation to different values of the parameter a, which determines the optimal value boundaries in the proposed machine learning model. The last row lists the average errors for each value of the parameter a. The table analysis shows that an increase in the parameter a is accompanied by a general increase in the average error. Specifically, the average error increases from 4.64 for a = 1.0 to 5.64 for a = 4.0, suggesting that higher parameter values may lead to worse model performance. Examining the individual datasets reveals variability in the effect of the parameter a. For example, in the BASEBALL dataset, the error continuously increases from 67.45 for a = 1.0 to 86.54 for a = 4.0, indicating a clear negative effect of increasing the parameter. Conversely, in the STOCK dataset, an increase in the parameter is accompanied by a decrease in error, from 3.47 for a = 1.0 to 3.35 for a = 4.0. In other datasets, the effect is less straightforward. For example, in the MORTGAGE dataset, the error increases from 0.31 for a = 1.0 to 0.68 for a = 3.5 but slightly decreases to 0.62 for a = 4.0. Overall, the statistical analysis suggests that the effect of the parameter a depends on the characteristics of each dataset and that, although increasing the parameter is often associated with higher errors, there are cases where performance remains stable or improves slightly.
In Figure 7, the results pertain to classification datasets and various models. From the comparisons between different values of the parameter a, it is observed that the comparison between a = 1.0 and a = 1.5 does not exhibit a statistically significant difference ( p = 0.38 ). In contrast, the comparison between a = 1.5 and a = 2.0 shows a significant difference ( p = 0.018 ), as does the comparison between a = 2.0 and a = 2.5 ( p = 0.045 ). The remaining comparisons, specifically between a = 2.5 and a = 3.0 ( p = 0.25 ), a = 3.0 and a = 3.5 ( p = 0.27 ), and a = 3.5 and a = 4.0 ( p = 0.39 ), do not demonstrate statistically significant differences.
In Figure 8, the data refer to regression datasets and comparisons for various models. The results indicate that the comparison between a = 1.0 and a = 1.5 is not statistically significant (p = 0.55), whereas the difference between a = 1.5 and a = 2.0 is statistically significant (p = 0.014). The comparison between a = 2.0 and a = 2.5 does not show statistical significance (p = 0.17), nor do the comparisons between a = 2.5 and a = 3.0 (p = 0.2), a = 3.0 and a = 3.5 (p = 0.16), and a = 3.5 and a = 4.0 (p = 0.31). Overall, the p-values highlight significant differences in specific cases, while in others, the differences remain statistically insignificant.

4. Conclusions

The study confirms the superiority of the proposed model compared to existing methods in both classification and regression tasks. This superiority is demonstrated by a statistically significant reduction in error, although the model’s performance heavily depends on data characteristics such as complexity, stratification, or the presence of noise. This highlights the importance of adapting the model to the specificities of each problem, as overly generalized approaches may lead to suboptimal results. Additionally, the choice of methods for calculating critical parameters, such as the differential weight, appears to have a relatively limited impact on overall performance. Differences between approaches (e.g., FIXED, ADAPTIVE, RANDOM) are minimal and often statistically insignificant, suggesting that optimization efforts could focus on other factors. Tuning the parameter a, however, emerges as a critical factor. Although the experiments show that increasing this parameter is generally associated with higher errors, there are datasets where larger values improve accuracy, and high values may also introduce instability in applications where data sensitivity is high (e.g., financial forecasting). This dual behavior underscores the need to balance flexibility and robustness during the tuning process. Despite its analytical approach, the study does not sufficiently explore the potential real-world applications of the proposed model beyond benchmark datasets. Addressing this gap, future research could expand the methodology to practical domains such as biomedical data analysis, social networks, or dynamic systems, including real-time monitoring and financial forecasting. Investigating the model’s adaptability and scalability in these fields could enhance its utility in real-world scenarios.
For future research, it would be beneficial to develop dynamic algorithms that automatically adjust the parameter a based on data dynamics. For example, machine learning mechanisms that analyze data variability or heterogeneity in real-time could optimize the value of a without human intervention, ensuring both stability and high performance. Additionally, the development of hybrid methods that combine the adaptability of ADAPTIVE approaches with the simplicity of FIXED approaches could result in more resilient solutions capable of addressing a broader range of problems. It is also important to investigate how specific data characteristics—such as noisy measurements, class imbalances, or lack of labeling—affect the model’s effectiveness. Such an analysis would support the development of adaptive strategies for data preprocessing or augmentation. Furthermore, exploring the generalizability of results to more complex environments, such as time-series data or dynamic systems, remains an open research avenue. Finally, integrating explainable artificial intelligence (XAI) techniques, such as feature contribution analysis or visualization tools, could enhance the model’s transparency. This would not only facilitate the interpretation of results by experts but also aid in identifying optimal parameter-tuning practices, making the model a more predictable and reliable tool for real-world applications.

Author Contributions

V.C. and I.G.T. conducted the experiments on the various datasets and carried out the comparative evaluation. D.T. and V.C. performed the statistical analysis and prepared the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been financed by the European Union: Next Generation EU through the Program Greece 2.0 National Recovery and Resilience Plan, under the call RESEARCH–CREATE–INNOVATE, project name “iCREW: Intelligent small craft simulator for advanced crew training using Virtual Reality techniques” (project code: TAEDK-06195).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The flowchart of the proposed method.
Figure 2. Statistical comparison for the obtained experimental results using a variety of machine learning methods in the classification datasets.
Figure 3. Statistical comparison of the experimental results using a series of machine learning methods on the regression datasets.
Figure 4. A comparison of execution times for the WINE dataset between the simple genetic algorithm and the proposed method for various values of the parameter N_i.
Figure 5. Statistical comparison of the experimental results on the classification datasets, using the proposed method and a series of differential weight techniques.
Figure 6. Statistical comparison of the experiments on the regression datasets using the proposed method and a series of differential weight calculation techniques.
Figure 7. Statistical comparison of the experimental results on the classification datasets using the proposed method and different values for the margin factor a.
Figure 8. Statistical comparison of the experiments on the regression datasets using the proposed method and different values for the margin factor a.
Table 1. The values of the experimental parameters.
Parameter | Meaning | Value
N_g | Number of generations allowed | 200
N_c | Number of chromosomes | 500
N_i | Number of generations before the application of the operator | 20
N_l | Number of chromosomes where the operator will be applied | 20
F | Differential weight | 0.8
CR | Crossover probability | 0.9
p_s | Selection rate | 0.1
p_m | Mutation rate | 0.05
H | Number of processing nodes | 10
a | Margin factor | 1.0
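The experimental settings of Table 1 can be captured in a small configuration sketch. The mapping below and the scheduling helper (apply the DE-based operator every N_i generations) are illustrative assumptions consistent with the parameter meanings in the table, not code from the paper.

```python
# Experimental settings from Table 1, expressed as a config mapping.
GA_PARAMS = {
    "N_g": 200,   # number of generations allowed
    "N_c": 500,   # number of chromosomes
    "N_i": 20,    # generations between applications of the DE operator
    "N_l": 20,    # chromosomes the operator is applied to
    "F": 0.8,     # differential weight
    "CR": 0.9,    # crossover probability
    "p_s": 0.1,   # selection rate
    "p_m": 0.05,  # mutation rate
    "H": 10,      # processing (hidden) nodes
    "a": 1.0,     # margin factor
}

def should_apply_de_operator(generation, params=GA_PARAMS):
    """Periodic schedule implied by Table 1: apply the operator
    every N_i generations (not at generation 0)."""
    return generation > 0 and generation % params["N_i"] == 0
```

With these defaults, the operator fires at generations 20, 40, 60, and so on, each time on N_l = 20 randomly selected chromosomes.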
Table 2. Experimental results for the classification datasets mentioned here using a series of machine learning methods. The numbers in the cells represent the average classification error as reported in the corresponding test set.
Dataset | ADAM | BFGS | NEAT | GENETIC | PROPOSED
Alcohol | 57.78% | 41.50% | 66.80% | 39.57% | 24.79%
Appendicitis | 16.50% | 18.00% | 17.20% | 18.10% | 15.97%
Australian | 35.65% | 38.13% | 31.98% | 32.21% | 31.76%
Balance | 7.87% | 8.64% | 23.14% | 8.97% | 8.39%
Circular | 19.95% | 6.08% | 35.18% | 5.99% | 3.69%
Cleveland | 67.55% | 77.55% | 53.44% | 51.60% | 48.10%
Dermatology | 26.14% | 52.92% | 32.43% | 30.58% | 7.74%
Ecoli | 64.43% | 69.52% | 43.44% | 54.67% | 47.62%
Fert | 23.98% | 23.20% | 15.37% | 28.50% | 22.00%
Haberman | 29.00% | 29.34% | 24.04% | 28.66% | 25.99%
Hayes Roth | 59.70% | 37.33% | 50.15% | 56.18% | 37.00%
Heart | 38.53% | 39.44% | 39.27% | 28.34% | 24.79%
HouseVotes | 7.48% | 7.13% | 10.89% | 6.62% | 5.22%
Ionosphere | 16.64% | 15.29% | 19.67% | 15.14% | 9.56%
Liverdisorder | 41.53% | 42.59% | 30.67% | 31.11% | 31.08%
Lymography | 29.26% | 35.43% | 33.70% | 23.26% | 28.60%
Mammographic | 46.25% | 17.24% | 22.85% | 19.88% | 16.98%
Parkinsons | 24.06% | 27.58% | 18.56% | 18.05% | 18.02%
Pima | 34.85% | 35.59% | 34.51% | 32.19% | 30.44%
Popfailures | 5.18% | 5.24% | 7.05% | 5.94% | 4.29%
Regions2 | 29.85% | 36.28% | 33.23% | 29.39% | 26.43%
Saheart | 34.04% | 37.48% | 34.51% | 34.86% | 32.60%
Segment | 49.75% | 68.97% | 66.72% | 57.72% | 30.00%
Sonar | 30.33% | 25.85% | 34.10% | 22.40% | 18.78%
Spiral | 47.67% | 47.99% | 48.66% | 48.66% | 44.20%
Statheart | 44.04% | 39.65% | 44.36% | 27.25% | 22.72%
Student | 5.13% | 7.14% | 10.20% | 5.61% | 4.16%
Wdbc | 35.35% | 29.91% | 12.88% | 8.56% | 7.73%
Wine | 29.40% | 59.71% | 25.43% | 19.20% | 8.55%
Z_F_S | 47.81% | 39.37% | 38.41% | 10.73% | 6.46%
ZO_NF_S | 47.43% | 43.04% | 43.75% | 21.54% | 6.01%
ZONF_S | 11.99% | 15.62% | 5.44% | 2.60% | 1.79%
ZOO | 14.13% | 10.70% | 20.27% | 16.67% | 9.07%
AVERAGE | 32.70% | 33.01% | 31.16% | 25.48% | 20.02%
Table 3. Experimental results for the regression datasets using a series of machine learning methods. The numbers in the cells stand for the average regression error as calculated in the corresponding test set.
Dataset | ADAM | BFGS | GENETIC | NEAT | PROPOSED
Abalone | 4.30 | 5.69 | 7.17 | 9.88 | 4.33
Airfoil | 0.005 | 0.003 | 0.003 | 0.067 | 0.003
Baseball | 77.90 | 119.63 | 103.60 | 100.39 | 67.45
BK | 0.03 | 0.28 | 0.027 | 0.15 | 0.02
BL | 0.28 | 2.55 | 5.74 | 0.05 | 0.002
Concrete | 0.078 | 0.066 | 0.0099 | 0.081 | 0.003
Dee | 0.63 | 2.36 | 1.013 | 1.512 | 0.20
Housing | 80.20 | 97.38 | 43.26 | 56.49 | 26.62
Friedman | 22.90 | 1.263 | 1.249 | 19.35 | 1.33
FY | 0.038 | 0.22 | 0.65 | 0.08 | 0.039
HO | 0.035 | 0.62 | 2.78 | 0.169 | 0.014
Laser | 0.03 | 0.015 | 0.59 | 0.084 | 0.0027
LW | 0.028 | 2.98 | 1.90 | 0.17 | 0.016
MB | 0.06 | 0.129 | 3.39 | 0.061 | 0.048
Mortgage | 9.24 | 8.23 | 2.41 | 14.11 | 0.31
Plastic | 11.71 | 20.32 | 2.79 | 20.77 | 2.20
PL | 0.117 | 0.29 | 0.28 | 0.098 | 0.023
PY | 0.09 | 0.578 | 105.41 | 0.075 | 0.016
Quake | 0.06 | 0.42 | 0.040 | 0.298 | 0.043
SN | 0.206 | 0.40 | 2.95 | 0.174 | 0.024
Stock | 180.89 | 302.43 | 3.88 | 215.82 | 3.47
Treasury | 11.16 | 9.91 | 2.929 | 15.52 | 0.44
VE | 0.359 | 1.92 | 2.43 | 0.045 | 0.023
AVERAGE | 17.41 | 25.12 | 12.80 | 19.80 | 4.64
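Using the AVERAGE rows of Tables 2 and 3, the relative error reduction of the proposed method over the plain genetic algorithm can be computed directly; only the helper name below is an illustrative choice.

```python
def relative_improvement(baseline, proposed):
    """Percentage reduction in average error relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline

# Average errors reported in Tables 2 and 3 (GENETIC vs. PROPOSED).
classification_gain = relative_improvement(25.48, 20.02)  # ~21.4% lower error
regression_gain = relative_improvement(12.80, 4.64)       # ~63.8% lower error
```

The reduction is markedly larger on the regression benchmarks than on the classification ones, consistent with the averages reported in the two tables.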
Table 4. Experiments for classification datasets using a series of differential weight mechanisms.
Dataset | FIXED | ADAPTIVE | RANDOM
Alcohol | 24.79% | 25.65% | 23.02%
Appendicitis | 15.97% | 15.83% | 15.80%
Australian | 31.76% | 31.61% | 31.60%
Balance | 8.39% | 8.45% | 8.65%
Circular | 3.69% | 3.67% | 3.78%
Cleveland | 48.10% | 48.39% | 48.35%
Dermatology | 7.74% | 7.27% | 7.37%
Ecoli | 47.62% | 48.34% | 47.96%
Fert | 22.00% | 22.17% | 22.10%
Haberman | 25.99% | 26.20% | 26.12%
Hayes Roth | 37.00% | 38.38% | 37.65%
Heart | 24.79% | 24.15% | 25.51%
HouseVotes | 5.22% | 5.21% | 4.78%
Ionosphere | 9.56% | 9.28% | 9.30%
Liverdisorder | 31.08% | 31.46% | 31.11%
Lymography | 28.60% | 28.95% | 27.88%
Mammographic | 16.98% | 17.02% | 17.18%
Parkinsons | 18.02% | 17.86% | 17.90%
Pima | 30.44% | 31.12% | 30.48%
Popfailures | 4.29% | 4.25% | 4.28%
Regions2 | 26.43% | 25.94% | 26.35%
Saheart | 32.60% | 33.11% | 32.92%
Segment | 30.00% | 28.83% | 30.85%
Sonar | 18.78% | 18.08% | 18.70%
Spiral | 44.20% | 44.21% | 44.12%
Statheart | 22.72% | 22.72% | 23.41%
Student | 4.16% | 3.95% | 4.35%
Wdbc | 7.73% | 7.48% | 7.40%
Wine | 8.55% | 6.29% | 6.74%
Z_F_S | 6.46% | 6.92% | 6.65%
ZO_NF_S | 6.01% | 6.10% | 5.89%
ZONF_S | 1.79% | 1.71% | 1.76%
ZOO | 9.07% | 6.57% | 8.90%
AVERAGE | 20.02% | 19.91% | 19.97%
Table 5. Experiments on regression datasets using a variety of weight mechanism methods.
Dataset | FIXED | ADAPTIVE | RANDOM
ABALONE | 4.33 | 4.24 | 4.34
AIRFOIL | 0.003 | 0.003 | 0.003
BASEBALL | 67.45 | 67.23 | 66.76
BK | 0.02 | 0.02 | 0.02
BL | 0.002 | 0.002 | 0.002
CONCRETE | 0.003 | 0.003 | 0.003
DEE | 0.20 | 0.20 | 0.20
HOUSING | 26.62 | 26.07 | 26.11
FRIEDMAN | 1.33 | 1.21 | 1.34
FY | 0.039 | 0.039 | 0.039
HO | 0.014 | 0.014 | 0.014
LASER | 0.0027 | 0.0027 | 0.0028
LW | 0.016 | 0.011 | 0.011
MB | 0.048 | 0.048 | 0.048
MORTGAGE | 0.31 | 0.35 | 0.34
PLASTIC | 2.20 | 2.13 | 2.20
PL | 0.023 | 0.022 | 0.022
PY | 0.016 | 0.017 | 0.017
QUAKE | 0.043 | 0.082 | 0.04
SN | 0.024 | 0.024 | 0.024
STOCK | 3.47 | 3.33 | 3.45
TREASURY | 0.44 | 0.42 | 0.45
VE | 0.023 | 0.023 | 0.023
AVERAGE | 4.64 | 4.59 | 4.59
Table 6. Experiments for classification datasets using a series of values for the margin factor a.
Dataset | a = 1.0 | a = 1.5 | a = 2.0 | a = 2.5 | a = 3.0 | a = 3.5 | a = 4.0
Alcohol | 24.79% | 22.88% | 25.55% | 30.21% | 30.77% | 32.05% | 28.42%
Appendicitis | 15.97% | 16.33% | 18.07% | 18.70% | 17.30% | 18.60% | 19.51%
Australian | 31.76% | 32.79% | 33.23% | 32.65% | 32.87% | 32.76% | 33.68%
Balance | 8.39% | 8.50% | 8.39% | 8.84% | 8.85% | 8.87% | 8.78%
Circular | 3.69% | 4.19% | 4.22% | 4.30% | 4.33% | 4.46% | 4.70%
Cleveland | 48.10% | 47.17% | 46.68% | 47.58% | 50.84% | 50.62% | 47.74%
Dermatology | 7.74% | 7.47% | 7.60% | 7.57% | 7.71% | 8.86% | 7.83%
Ecoli | 47.62% | 48.69% | 48.29% | 49.50% | 51.39% | 51.99% | 49.39%
Fert | 22.00% | 23.47% | 23.50% | 25.03% | 24.80% | 24.17% | 24.27%
Haberman | 25.99% | 26.26% | 26.74% | 27.23% | 27.49% | 27.67% | 27.14%
Hayes Roth | 37.00% | 39.15% | 39.69% | 35.49% | 38.10% | 39.15% | 40.05%
Heart | 24.79% | 24.64% | 24.85% | 25.50% | 26.85% | 26.46% | 24.89%
HouseVotes | 5.22% | 4.91% | 5.05% | 4.90% | 4.57% | 4.71% | 5.02%
Ionosphere | 9.56% | 9.65% | 9.92% | 9.72% | 9.75% | 9.57% | 10.23%
Liverdisorder | 31.08% | 31.94% | 31.42% | 30.89% | 31.54% | 30.92% | 32.13%
Lymography | 28.60% | 28.24% | 28.79% | 30.21% | 29.64% | 30.71% | 26.69%
Mammographic | 16.98% | 16.34% | 16.35% | 15.86% | 15.88% | 16.35% | 16.69%
Parkinsons | 18.02% | 18.88% | 18.56% | 17.98% | 17.74% | 17.95% | 19.40%
Pima | 30.44% | 30.89% | 31.11% | 32.81% | 32.88% | 32.83% | 31.03%
Popfailures | 4.29% | 5.07% | 5.53% | 5.51% | 5.54% | 5.97% | 5.97%
Regions2 | 26.43% | 26.53% | 26.30% | 26.52% | 26.28% | 26.26% | 25.35%
Saheart | 32.60% | 31.43% | 32.98% | 33.52% | 33.49% | 33.46% | 32.93%
Segment | 30.00% | 27.98% | 30.86% | 31.39% | 33.76% | 34.51% | 35.41%
Sonar | 18.78% | 21.08% | 22.23% | 21.47% | 21.75% | 21.57% | 21.68%
Spiral | 44.20% | 44.65% | 44.52% | 43.81% | 43.58% | 43.39% | 44.13%
Statheart | 22.72% | 23.40% | 23.85% | 24.22% | 25.56% | 26.10% | 24.22%
Student | 4.16% | 4.75% | 5.24% | 5.38% | 5.43% | 5.55% | 5.37%
Wdbc | 7.73% | 6.95% | 6.64% | 7.14% | 7.29% | 7.17% | 7.31%
Wine | 8.55% | 6.59% | 9.35% | 9.47% | 8.12% | 9.65% | 11.02%
Z_F_S | 6.46% | 6.66% | 6.90% | 6.92% | 6.53% | 6.87% | 6.61%
ZO_NF_S | 6.01% | 6.03% | 6.08% | 6.95% | 7.17% | 7.28% | 6.25%
ZONF_S | 1.79% | 1.75% | 1.80% | 2.15% | 2.14% | 2.13% | 1.79%
ZOO | 9.07% | 8.63% | 7.00% | 7.87% | 6.77% | 6.97% | 7.53%
AVERAGE | 20.02% | 20.12% | 20.52% | 20.83% | 21.11% | 21.38% | 21.00%
Table 7. Experiments for regression datasets using a variety of values for the margin factor a.
Dataset | a = 1.0 | a = 1.5 | a = 2.0 | a = 2.5 | a = 3.0 | a = 3.5 | a = 4.0
ABALONE | 4.33 | 4.39 | 4.46 | 4.64 | 4.84 | 4.93 | 5.06
AIRFOIL | 0.003 | 0.003 | 0.002 | 0.002 | 0.003 | 0.003 | 0.002
BASEBALL | 67.45 | 74.66 | 79.78 | 79.39 | 81.55 | 84.65 | 86.54
BK | 0.02 | 0.019 | 0.019 | 0.018 | 0.017 | 0.018 | 0.02
BL | 0.002 | 0.001 | 0.001 | 0.0007 | 0.0009 | 0.0008 | 0.003
CONCRETE | 0.003 | 0.003 | 0.003 | 0.004 | 0.003 | 0.004 | 0.003
DEE | 0.20 | 0.20 | 0.20 | 0.21 | 0.20 | 0.21 | 0.20
HOUSING | 26.62 | 26.24 | 28.13 | 27.91 | 27.36 | 29.53 | 29.25
FRIEDMAN | 1.33 | 1.19 | 1.20 | 1.21 | 1.22 | 1.21 | 1.20
FY | 0.039 | 0.04 | 0.042 | 0.041 | 0.042 | 0.042 | 0.045
HO | 0.014 | 0.014 | 0.013 | 0.013 | 0.013 | 0.014 | 0.014
LASER | 0.0027 | 0.0025 | 0.0025 | 0.0028 | 0.0026 | 0.0025 | 0.0024
LW | 0.016 | 0.011 | 0.011 | 0.013 | 0.013 | 0.014 | 0.013
MB | 0.048 | 0.049 | 0.051 | 0.051 | 0.052 | 0.054 | 0.09
MORTGAGE | 0.31 | 0.21 | 0.37 | 0.48 | 0.63 | 0.68 | 0.62
PLASTIC | 2.20 | 2.05 | 2.18 | 2.23 | 2.25 | 2.23 | 2.21
PL | 0.023 | 0.023 | 0.021 | 0.022 | 0.022 | 0.022 | 0.022
PY | 0.016 | 0.022 | 0.023 | 0.026 | 0.028 | 0.027 | 0.028
QUAKE | 0.043 | 0.038 | 0.04 | 0.039 | 0.037 | 0.037 | 0.039
SN | 0.024 | 0.024 | 0.025 | 0.024 | 0.026 | 0.024 | 0.026
STOCK | 3.47 | 3.59 | 3.68 | 3.70 | 3.37 | 3.24 | 3.35
TREASURY | 0.44 | 0.42 | 0.40 | 0.65 | 0.82 | 0.95 | 1.01
VE | 0.023 | 0.024 | 0.024 | 0.025 | 0.026 | 0.027 | 0.029
AVERAGE | 4.64 | 4.92 | 5.25 | 5.25 | 5.33 | 5.56 | 5.64
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
