Article

Utilizing a Bounding Procedure Based on Simulated Annealing to Effectively Locate the Bounds for the Parameters of Radial Basis Function Networks

by Ioannis G. Tsoulos 1,*, Vasileios Charilogis 1 and Dimitrios Tsalikakis 2
1 Department of Informatics and Telecommunications, University of Ioannina, 45110 Ioannina, Greece
2 Department of Engineering Informatics and Telecommunications, University of Western Macedonia, 50100 Kozani, Greece
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(4), 234; https://doi.org/10.3390/a18040234
Submission received: 26 March 2025 / Revised: 17 April 2025 / Accepted: 17 April 2025 / Published: 18 April 2025
(This article belongs to the Special Issue Recent Advances in Algorithms for Swarm Systems)

Abstract

Radial basis function (RBF) networks are an established parametric machine learning tool that has been extensively used in data classification and data fitting problems. These models have been applied in various scientific areas, such as physics, chemistry, and medicine, with excellent results. Their parameters are usually adjusted with a two-step technique which is in most cases extremely effective. However, this technique does not explore the value space of the network parameters effectively and often leads to numerical stability problems. In this paper, the use of a bounding technique is recommended that explores the value space of the parameters of these networks using intervals generated by a procedure based on the Simulated Annealing method. After a promising range of values for the network parameters has been found, a genetic algorithm is applied within this range to adjust the parameters more effectively. The new method was applied to a wide range of classification and regression datasets from the relevant literature, and the results are reported in the current manuscript.

1. Introduction

A wide range of practical problems can be considered as classification or regression problems. Such problems occur in areas such as physics [1,2], astronomy [3,4], chemistry [5,6], medicine [7,8], economics [9,10], etc. A machine learning tool commonly used to tackle such problems is the radial basis function (RBF) network, expressed as the following function:
$$R\left(\vec{x}\right) = \sum_{i=1}^{k} w_i \phi\left( \left\| \vec{x} - \vec{c}_i \right\| \right)$$
In this equation, the following definitions are used for the symbols:
  • The vector $\vec{x}$ represents the input pattern with $d$ features.
  • The parameter $k$ stands for the number of weights of the network. These weights are represented by the values $w_i,\ i = 1, \dots, k$.
  • The vectors $\vec{c}_i,\ i = 1, \dots, k$ represent the so-called centers of the network.
  • The function $\phi(x)$ is commonly a Gaussian function with the following definition:
    $$\phi(x) = \exp\left( -\frac{\left\| x - c \right\|^2}{\sigma^2} \right)$$
A typical plot of the Gaussian function for $c = 0,\ \sigma = 1$ is depicted in Figure 1. From this graph it is deduced that the Gaussian RBF decreases as the distance from $c$ increases. The parameters $c$ and $\sigma$ can be estimated using the K-means algorithm, introduced by MacQueen [11]. The training error of an RBF network is defined as
$$E\left(R\left(\vec{x}\right)\right) = \sum_{i=1}^{M} \left( R\left(\vec{x}_i\right) - y_i \right)^2$$
where $\left(\vec{x}_i, y_i\right),\ i = 1, \dots, M$ denotes the training set of the objective problem and the values $y_i$ are the expected outputs for every pattern $\vec{x}_i$.
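Since the experiments in Section 3 were implemented in C++, a minimal C++ sketch of the two formulas above is given here for illustration; it is not the authors' GlobalOptimus code, and the container layout is an assumption made for this sketch.

```cpp
#include <cmath>
#include <vector>

// Gaussian unit: phi(||x - c||) = exp(-||x - c||^2 / sigma^2)
double gaussian(const std::vector<double>& x,
                const std::vector<double>& c, double sigma) {
    double dist2 = 0.0;
    for (size_t j = 0; j < x.size(); ++j)
        dist2 += (x[j] - c[j]) * (x[j] - c[j]);
    return std::exp(-dist2 / (sigma * sigma));
}

// Network output: R(x) = sum_{i=1}^{k} w_i * phi(||x - c_i||)
double rbfOutput(const std::vector<double>& x,
                 const std::vector<std::vector<double>>& centers,
                 const std::vector<double>& sigma,
                 const std::vector<double>& w) {
    double sum = 0.0;
    for (size_t i = 0; i < w.size(); ++i)
        sum += w[i] * gaussian(x, centers[i], sigma[i]);
    return sum;
}

// Training error: E = sum_{i=1}^{M} (R(x_i) - y_i)^2
double trainError(const std::vector<std::vector<double>>& X,
                  const std::vector<double>& y,
                  const std::vector<std::vector<double>>& centers,
                  const std::vector<double>& sigma,
                  const std::vector<double>& w) {
    double e = 0.0;
    for (size_t i = 0; i < X.size(); ++i) {
        double d = rbfOutput(X[i], centers, sigma, w) - y[i];
        e += d * d;
    }
    return e;
}
```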
RBF networks have been incorporated in various cases, such as face recognition [12], solutions of differential equations [13,14], stock prediction [15], robotics [16,17], network security [18,19], etc.
Due to the widespread use of these networks, several papers have been presented in recent years that study their basic characteristics. For example, Benoudjit et al. [20] presented a discussion on kernel widths in RBF networks. Similarly, Oyang et al. [21] presented a novel method for the estimation of the kernel density. Moreover, Ros et al. [22] proposed an automatic method for the initialization of the parameters of RBF networks. Furthermore, a variety of pruning techniques have also been proposed [23,24], used to efficiently reduce the number of processing units of RBF networks in order to avoid the overfitting problem. Also, for the effective training of RBF networks a variety of methods have been proposed in the recent literature, such as the incorporation of genetic algorithms [25,26], the usage of the Particle Swarm Optimization (PSO) method [27,28], the usage of methods based on Differential Evolution [29], etc. Furthermore, given the extremely wide spread of parallel programming techniques in recent decades, publications have also appeared that exploit such techniques for the efficient and fast training of the above networks [30,31].
In most cases, RBF networks are trained using a two-stage technique. In the first stage, the centers and variances are estimated using the K-means technique. In the second stage, a linear system is solved to find the weights of the Gaussian units. Although this procedure is extremely fast, it cannot effectively explore the value space of the network parameters, and in many cases numerical stability problems occur when solving the linear system. This paper proposes a three-stage method for the efficient training of RBF networks. In the first stage, an initial estimate of the centers and variances of the RBF network is made using the K-means technique. Subsequently, an interval generation method based on Simulated Annealing [32] is used to efficiently identify the optimal interval of network parameter values. These intervals are generated around the initial estimate of the network parameters obtained in the first phase of the method. In the last phase of the proposed technique, a genetic algorithm is used to train the parameters of the machine learning model within the optimal value interval resulting from the second stage of the method.
The rest of this article is divided as follows: In Section 2 the proposed method is presented in detail; in Section 3 the used datasets as well as the conducted experiments are presented; and finally, in Section 4 some conclusions are discussed as well as some guidelines for improvements of the current work.

2. Method Description

In this section, the three phases of the proposed technique are presented in detail with the help of the corresponding algorithms.

2.1. The First Phase of the Proposed Method

A typical diagram of an RBF network is depicted in Figure 2. Every RBF network with $k$ weights has the following sets of parameters:
  • A series of vectors $\vec{c}_i,\ i = 1, \dots, k$ that represent the centers of the network. Each center has $d$ values.
  • Each Gaussian unit has an additional parameter $\sigma_i$ that corresponds to the variance of this unit.
  • The set of output weights $\vec{w}$ with $k$ values.
Provided that the input vector $\vec{x}$ has $d$ values, the total number of parameters of an RBF network is calculated as
$$n = (d + 2) \times k$$
For the determination of initial values for the centers and variances, the K-means algorithm is utilized here and it is presented in Algorithm 1.
Algorithm 1 Description of the K-means algorithm.
  • Initialization
    (a) The input of the algorithm is the set of points $\vec{x}_i,\ i = 1, \dots, M$ belonging to the train set of the objective problem.
    (b) Define as $k$ the number of centers.
    (c) Set $S_j = \{\},\ j = 1, \dots, k$.
  • Repeat
    (a) For each point $\vec{x}_i,\ i = 1, \dots, M$ do
      i. Set $j^{*} = \mathop{\arg\min}_{m=1}^{k} D\left( \vec{x}_i, \vec{c}_m \right)$. The index $j^{*}$ denotes the nearest center for point $\vec{x}_i$. The function $D(x, y)$ represents the Euclidean distance.
      ii. Set $S_{j^{*}} = S_{j^{*}} \cup \left\{ \vec{x}_i \right\}$.
    (b) End For
    (c) For every center $\vec{c}_j,\ j = 1, \dots, k$ do
      i. Set $M_j$ as the number of points in $S_j$.
      ii. Update the center $\vec{c}_j$ with the following equation:
         $$\vec{c}_j = \frac{1}{M_j} \sum_{\vec{x}_i \in S_j} \vec{x}_i$$
    (d) End For
  • The algorithm terminates when the centers $\vec{c}_j$ no longer change.
  • The outputs of the algorithm are the centers $\vec{c}_i,\ i = 1, \dots, k$ and the corresponding variances $\sigma_i,\ i = 1, \dots, k$.
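The following is a compact C++ sketch of Algorithm 1, under two assumptions that the listing above does not fix: the centers are seeded externally (e.g., with $k$ randomly chosen training points), and each variance $\sigma_j$ is taken as the root mean squared distance of the cluster members to their center, which is one common convention.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Squared Euclidean distance between two points.
static double dist2(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

// K-means as in Algorithm 1; centers must be seeded with k starting points.
void kmeans(const std::vector<std::vector<double>>& X,
            std::vector<std::vector<double>>& centers,
            std::vector<double>& sigma) {
    const size_t k = centers.size(), d = X[0].size();
    std::vector<int> assign(X.size(), -1);
    bool changed = true;
    while (changed) {                              // repeat until clusters stabilize
        changed = false;
        for (size_t i = 0; i < X.size(); ++i) {    // assignment step: nearest center
            int best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (size_t m = 0; m < k; ++m) {
                double dm = dist2(X[i], centers[m]);
                if (dm < bestD) { bestD = dm; best = (int)m; }
            }
            if (assign[i] != best) { assign[i] = best; changed = true; }
        }
        for (size_t j = 0; j < k; ++j) {           // update step: cluster means
            std::vector<double> mean(d, 0.0);
            int Mj = 0;
            for (size_t i = 0; i < X.size(); ++i)
                if (assign[i] == (int)j) {
                    ++Mj;
                    for (size_t t = 0; t < d; ++t) mean[t] += X[i][t];
                }
            if (Mj > 0)
                for (size_t t = 0; t < d; ++t) centers[j][t] = mean[t] / Mj;
        }
    }
    // Variances: RMS distance of each cluster's members to its center
    // (an assumed convention; the listing does not specify this detail).
    sigma.assign(k, 0.0);
    std::vector<int> counts(k, 0);
    for (size_t i = 0; i < X.size(); ++i) {
        sigma[assign[i]] += dist2(X[i], centers[assign[i]]);
        ++counts[assign[i]];
    }
    for (size_t j = 0; j < k; ++j)
        sigma[j] = counts[j] ? std::sqrt(sigma[j] / counts[j]) : 1.0;
}
```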
When this process is finished, the vector z with the calculated parameters is formed using the function calculateZ(), described in Algorithm 2.
Algorithm 2 The algorithm used for the calculation of the initial set of parameters for the RBF network.
Function calculateZ($\vec{c}, \vec{\sigma}, w_0$)
Inputs: the initial calculated values for the centers $\vec{c}$, the initial values for the variances $\vec{\sigma}$, and a positive double precision value $w_0$.
  • Set $l = 1$
  • For $i = 1, \dots, k$ do
    (a) For $j = 1, \dots, d$ do
      i. Set $z_l = c_{i,j}$
      ii. Set $l = l + 1$
    (b) End For
  • End For
  • For $i = 1, \dots, k$ do
    (a) Set $z_l = \sigma_i$
    (b) Set $l = l + 1$
  • End For
  • For $i = 1, \dots, k$ do
    (a) Set $z_l = w_0$
    (b) Set $l = l + 1$
  • End For
  • Return the vector $\vec{z}$.
End function
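Algorithm 2 translates almost line by line into C++; the sketch below packs the parameters into the flat layout of Figure 3 and is provided only as an illustration.

```cpp
#include <vector>

// Pack centers, variances, and an initial constant weight w0 into one
// flat parameter vector z of length n = (d + 2) * k (Figure 3 layout).
std::vector<double> calculateZ(const std::vector<std::vector<double>>& c, // k centers, d values each
                               const std::vector<double>& sigma,          // k variances
                               double w0)                                 // positive initial weight
{
    const size_t k = c.size(), d = c[0].size();
    std::vector<double> z;
    z.reserve((d + 2) * k);
    for (size_t i = 0; i < k; ++i)      // centers first
        for (size_t j = 0; j < d; ++j)
            z.push_back(c[i][j]);
    for (size_t i = 0; i < k; ++i)      // then the variances
        z.push_back(sigma[i]);
    for (size_t i = 0; i < k; ++i)      // then the k output weights, all set to w0
        z.push_back(w0);
    return z;
}
```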
The layout of the vector z is outlined in Figure 3. In this layout, the initial values of the centers are entered at the beginning of the vector, then the initial values for the variances are entered, and finally the initial values for the network weights.

2.2. The Second Phase of the Proposed Method

In the second phase of the algorithm, the value interval for the network parameters is constructed using a technique based on Simulated Annealing. Simulated Annealing is an optimization procedure used in a variety of cases, such as resource allocation [33], portfolio problems [34], energy problems [35], biology [36], etc. The algorithm starts from a large value of a factor called temperature, which is gradually decreased. For large values of temperature, the algorithm performs a wide exploration of the search space, and at low values it focuses around some minimum of the objective function. The Simulated Annealing variant used here minimizes interval functions denoted as
$$f(x) = \left[ f_{\min}(x),\ f_{\max}(x) \right]$$
Also, in order to compare two intervals $a = \left[ a_1, a_2 \right]$ and $b = \left[ b_1, b_2 \right]$, the operator $D(a, b)$ is incorporated with the following definition:
$$D(a, b) = \begin{cases} \mathrm{TRUE}, & a_1 < b_1 \ \mathrm{OR} \ \left( a_1 = b_1 \ \mathrm{AND} \ a_2 < b_2 \right) \\ \mathrm{FALSE}, & \text{otherwise} \end{cases}$$
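In code, the interval values and the operator $D(a, b)$ amount to a small struct and a lexicographic comparison, as in the following C++ sketch (reused by the Simulated Annealing sketch given after Algorithm 3):

```cpp
// An interval value [f_min, f_max] and the comparison operator D(a, b).
struct Interval {
    double lo, hi;   // lower and upper ends of the interval
};

// D(a, b) is TRUE when a is "smaller": a.lo < b.lo, or equal lower ends
// with a.hi < b.hi; FALSE otherwise.
bool D(const Interval& a, const Interval& b) {
    return (a.lo < b.lo) || (a.lo == b.lo && a.hi < b.hi);
}
```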
The method used below assumes that there are value intervals for the network parameters. The main steps of the method used in the second phase are shown as a procedure in Algorithm 3. The calculation of the functional values for such value intervals is performed using Algorithm 4.
The algorithm starts from an initial value interval, which is based on the result of the first phase of the overall process. At each iteration it generates random value intervals that are close to the current value interval. If a value interval with a smaller functional value than the current one is presented, it is accepted; otherwise it will be accepted with a probability that is high for large temperature values and decreases significantly as the temperature value drops. This means that the algorithm makes a wide exploration of the search space for high temperature values and centers around some minimum as the temperature value decreases.
Algorithm 3 The main steps of the proposed Simulated Annealing variant.
Function simanMethod($\vec{z}, F, T_0, r_T, N_{\mathrm{eps}}, a$)
Inputs: the vector $\vec{z}$ with the initial set of parameters, the limit factor $F > 1$, the initial temperature $T_0$, the temperature reduction factor $r_T$ with $0 < r_T < 1$, the number of samples $N_{\mathrm{eps}}$ taken at every iteration, the perturbation factor $a > 0$, the number of samples $N_s$ used by the fitness() function, and the termination temperature $\epsilon > 0$.
  1. Construct the bound vectors $\vec{L}^{*}, \vec{R}^{*}$ using the vector $\vec{z}$ of the first phase as follows:
     $$L_i^{*} = -F \times \left| z_i \right|, \quad R_i^{*} = F \times \left| z_i \right|, \quad i = 1, \dots, n$$
     The parameter $F$ is a positive number greater than 1 and is used as a limiting factor for the network parameters.
  2. Set $k = 0$.
  3. Set $x_b = \left[ \vec{L}^{*}, \vec{R}^{*} \right]$, $y_b = \mathrm{fitness}\left( \vec{L}^{*}, \vec{R}^{*}, N_s \right)$. This step utilizes the function fitness() outlined in Algorithm 4.
  4. Set $x_c = x_b$, $y_c = y_b$.
  5. For $i = 1, \dots, N_{\mathrm{eps}}$ do
     (a) Produce a random sample $x_t = \left[ \vec{L}_t, \vec{R}_t \right]$ with the procedure of Algorithm 5, using as inputs the interval $x_c = \left[ \vec{L}_c, \vec{R}_c \right]$ and the double precision number $a$.
     (b) Set $y_t = \mathrm{fitness}\left( \vec{L}_t, \vec{R}_t, N_s \right)$, which calls the function provided in Algorithm 4.
     (c) If $D\left( y_t, y_c \right) = \mathrm{TRUE}$ then set $x_c = x_t$, $y_c = y_t$.
     (d) Else set $x_c = x_t$, $y_c = y_t$ with probability $\min\left( 1, \exp\left( -\frac{f_t - f_c}{T_k} \right) \right)$.
     (e) End if
     (f) If $D\left( y_c, y_b \right) = \mathrm{TRUE}$ then set $x_b = x_c$, $y_b = y_c$.
  6. End For
  7. Set $T_{k+1} = T_k \cdot r_T$.
  8. Set $k = k + 1$.
  9. If $T_k \le \epsilon$ then stop and return $x_b = \left[ \vec{L}_b, \vec{R}_b \right]$ as the best located interval of bounds.
  10. Go to step 5.
End function
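A C++ sketch of this driver is given below, reusing the Interval type and the operator D() from the earlier sketch (repeated here so the fragment is self-contained). The symmetric bounds $-F|z_i|$ and $F|z_i|$, the use of the interval lower ends inside the Metropolis test, and the explicit eps and Ns arguments are assumptions of this sketch; it should not be read as the exact GlobalOptimus implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Interval { double lo, hi; };                 // [f_min, f_max]
struct Bounds   { std::vector<double> L, R; };      // [L_i, R_i] per parameter

bool D(const Interval& a, const Interval& b) {      // lexicographic interval order
    return (a.lo < b.lo) || (a.lo == b.lo && a.hi < b.hi);
}

Interval fitness(const Bounds& b, int Ns);          // Algorithm 4 (sketched later)
Bounds   sample (const Bounds& b, double a);        // Algorithm 5 (sketched later)

Bounds simanMethod(const std::vector<double>& z, double F, double T0,
                   double rT, int Neps, double a, int Ns, double eps) {
    Bounds best;                                    // step 1: symmetric initial bounds
    best.L.resize(z.size()); best.R.resize(z.size());
    for (size_t i = 0; i < z.size(); ++i) {
        best.L[i] = -F * std::fabs(z[i]);
        best.R[i] =  F * std::fabs(z[i]);
    }
    Interval yb = fitness(best, Ns);
    Bounds cur = best; Interval yc = yb;
    for (double T = T0; T > eps; T *= rT) {         // cooling schedule
        for (int i = 0; i < Neps; ++i) {
            Bounds trial = sample(cur, a);          // random interval near the current one
            Interval yt = fitness(trial, Ns);
            if (D(yt, yc)) { cur = trial; yc = yt; }        // better: always accept
            else {                                          // worse: Metropolis acceptance
                double p = std::min(1.0, std::exp(-(yt.lo - yc.lo) / T));
                if ((double)std::rand() / RAND_MAX < p) { cur = trial; yc = yt; }
            }
            if (D(yc, yb)) { best = cur; yb = yc; } // track the best interval so far
        }
    }
    return best;                                    // best located interval of bounds
}
```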
Algorithm 4 The algorithm used to calculate the function value for intervals of parameters.
Function fitness($\vec{L}, \vec{R}, N_s$)
  • Create the set $T = \left\{ r_1, r_2, \dots, r_{N_s} \right\}$ with $N_s$ random samples in $\left[ \vec{L}, \vec{R} \right]$.
  • Set $E_{\min} = \infty$, $E_{\max} = -\infty$.
  • For $i = 1, \dots, N_s$ do
    (a) Create the RBF network $R_i(x)$ using as parameter set the corresponding sample $r_i \in T$.
    (b) Calculate the training error $f_{R_i} = \sum_{j=1}^{M} \left( R_i\left( \vec{x}_j \right) - y_j \right)^2$ for the training set of the objective problem.
    (c) If $f_{R_i} \le E_{\min}$ then $E_{\min} = f_{R_i}$.
    (d) If $f_{R_i} \ge E_{\max}$ then $E_{\max} = f_{R_i}$.
  • End For
  • Return the interval $\left[ E_{\min}, E_{\max} \right]$.
End function
Algorithm 5 The sampling function used in the Simulated Annealing algorithm.
Function sample($\vec{L}, \vec{R}, a$)
  • For $i = 1, \dots, n$ do
    (a) Set $m = L_i + \frac{R_i - L_i}{2}$.
    (b) Set $L_i^{x} = L_i + a r_1 \left( m - L_i \right)$, where $r_1 \in [0, 1]$ is a random value.
    (c) Set $R_i^{x} = R_i - a r_2 \left( R_i - m \right)$, where $r_2 \in [0, 1]$ is a random value.
  • End For
  • Return $x = \left[ \vec{L}^{x}, \vec{R}^{x} \right]$ as the located interval.
End function
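Algorithms 4 and 5 can be sketched in C++ as follows. Here trainErrorFromParams() is a hypothetical hook, assumed to unpack a parameter vector with the layout of Figure 3, build the corresponding RBF network, and return its training error; it is not part of the paper.

```cpp
#include <limits>
#include <random>
#include <vector>

struct Interval { double lo, hi; };
struct Bounds   { std::vector<double> L, R; };

// Hypothetical hook: build the RBF network from a flat parameter vector
// (Figure 3 layout) and return its training error on the train set.
double trainErrorFromParams(const std::vector<double>& params);

static std::mt19937 rng(42);

// Algorithm 4: the fitness of a bound interval is the interval of training
// errors achieved by Ns networks sampled uniformly inside [L, R].
Interval fitness(const Bounds& b, int Ns) {
    Interval e{ std::numeric_limits<double>::max(),
               -std::numeric_limits<double>::max() };
    std::vector<double> r(b.L.size());
    for (int s = 0; s < Ns; ++s) {
        for (size_t i = 0; i < r.size(); ++i)       // uniform sample in [L_i, R_i]
            r[i] = std::uniform_real_distribution<double>(b.L[i], b.R[i])(rng);
        double f = trainErrorFromParams(r);
        if (f < e.lo) e.lo = f;
        if (f > e.hi) e.hi = f;
    }
    return e;
}

// Algorithm 5: shrink each side of the interval toward its midpoint by a
// random fraction controlled by the perturbation factor a.
Bounds sample(const Bounds& b, double a) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    Bounds out = b;
    for (size_t i = 0; i < b.L.size(); ++i) {
        double m = b.L[i] + (b.R[i] - b.L[i]) / 2.0;  // midpoint
        out.L[i] = b.L[i] + a * u(rng) * (m - b.L[i]); // move lower bound up
        out.R[i] = b.R[i] - a * u(rng) * (b.R[i] - m); // move upper bound down
    }
    return out;
}
```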

2.3. The Final Phase of the Proposed Method

In the last phase of the proposed procedure, a genetic algorithm is executed to train the RBF network. The training is performed within the optimal value interval identified in the previous phase of the proposed procedure. The main steps of the proposed genetic algorithm are shown in Algorithm 6.
Algorithm 6 The final phase of the proposed method.
Procedure finalPhase($N_c, N_g, p_s, p_m, \vec{L}_b, \vec{R}_b$)
Inputs: the number of chromosomes $N_c$, the number of generations $N_g$, the selection rate $p_s$, the mutation rate $p_m$, and the bounds $\vec{L}_b, \vec{R}_b$ of the previous phase.
  1. Initialize the chromosomes $g_i,\ i = 1, \dots, N_c$ as random vectors inside the bounds $\left[ \vec{L}_b, \vec{R}_b \right]$ located in the second phase of the suggested algorithm.
  2. Set $k = 0$.
  3. For $i = 1, \dots, N_c$ do
     (a) Create the RBF network $R_i(x)$ using as parameters the chromosome $g_i$.
     (b) Calculate the corresponding fitness value $f_i = \sum_{j=1}^{M} \left( R_i\left( \vec{x}_j \right) - y_j \right)^2$.
  4. End For
  5. Selection procedure. The $p_s \times N_c$ best chromosomes, i.e., those with the lowest fitness values, are copied intact to the next generation. The remaining chromosomes are replaced by new ones produced through the crossover and mutation procedures.
  6. Crossover procedure. During the application of crossover, pairs of chromosomes are chosen from the current population with tournament selection. For every pair $(z, w)$ of parents, two new chromosomes $\tilde{z}$ and $\tilde{w}$ are produced using the following equations:
     $$\tilde{z}_i = a_i z_i + \left( 1 - a_i \right) w_i, \quad \tilde{w}_i = a_i w_i + \left( 1 - a_i \right) z_i$$
     where $i = 1, \dots, n$ and the $a_i$ are randomly selected values with $a_i \in [-0.5, 1.5]$ [37].
  7. Mutation procedure. For every element of each chromosome a random number $r \in [0, 1]$ is produced. The element is altered randomly when $r \le p_m$.
  8. Set $k = k + 1$.
  9. If $k < N_g$ then go to step 3.
  10. Obtain the best chromosome $g_b$ of the genetic population, i.e., the one with the lowest fitness value.
  11. Create the corresponding RBF network $R_b(x)$.
  12. Apply $R_b(x)$ to the corresponding test set, measure the associated test error, and terminate.
End procedure
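The two genetic operators can be sketched in C++ as follows. Redrawing a mutated gene uniformly inside $\left[ \vec{L}_b, \vec{R}_b \right]$ is one common choice assumed here, since the listing above does not specify how an element is "altered randomly".

```cpp
#include <random>
#include <vector>

static std::mt19937 gen(123);

// Arithmetic crossover of step 6: child_i = a_i * p1_i + (1 - a_i) * p2_i,
// with the blend factor a_i drawn from [-0.5, 1.5] following [37].
void crossover(const std::vector<double>& z, const std::vector<double>& w,
               std::vector<double>& zt, std::vector<double>& wt) {
    std::uniform_real_distribution<double> ua(-0.5, 1.5);
    zt.resize(z.size()); wt.resize(z.size());
    for (size_t i = 0; i < z.size(); ++i) {
        double ai = ua(gen);
        zt[i] = ai * z[i] + (1.0 - ai) * w[i];
        wt[i] = ai * w[i] + (1.0 - ai) * z[i];
    }
}

// Mutation of step 7: with probability p_m, each gene is redrawn uniformly
// inside its bounds [L_i, R_i] (an assumed convention for "altered randomly").
void mutate(std::vector<double>& g, const std::vector<double>& L,
            const std::vector<double>& R, double pm) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (size_t i = 0; i < g.size(); ++i)
        if (u(gen) <= pm) {
            std::uniform_real_distribution<double> ug(L[i], R[i]);
            g[i] = ug(gen);
        }
}
```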

3. Experiments

A series of classification and regression datasets from various websites were incorporated to test the proposed method and to measure its reliability. The datasets used in the conducted experiments were downloaded from the following online databases:
  • The UCI machine learning repository [38].
  • The KEEL repository [39].

3.1. Experimental Datasets

In the conducted experiments, the following classification datasets were used:
  • The Alcohol dataset, used in various experiments regarding alcohol consumption [40].
  • The Appendicitis dataset [41].
  • The Australian dataset, used in various bank transactions [42].
  • The Balance dataset, used in a series of psychological experiments [43].
  • The Circular dataset, which is an artificial dataset.
  • The Cleveland dataset, which is a medical dataset [44,45].
  • The Dermatology dataset, obtained from various dermatology problems [46].
  • The Hayes-Roth dataset [47].
  • The Heart dataset, which is a medical dataset regarding heart diseases [48].
  • The HeartAttack dataset, used in the prediction of heart attacks.
  • The HouseVotes dataset, that contains data from congressional voting in the USA [49].
  • The Ionosphere dataset, obtained from various experiments in the ionosphere [50,51].
  • The Liverdisorder dataset, which is a medical dataset [52,53].
  • The Lymography dataset [54].
  • The Mammographic dataset, which is a medical dataset [55].
  • The Parkinsons dataset, used for the detection of Parkinson’s disease [56,57].
  • The Phoneme dataset, obtained from various sound experiments.
  • The Pima dataset, a medical dataset used in diabetes studies [58].
  • The Popfailures dataset, related to measurements from climate model simulations [59].
  • The Regions2 dataset, which is a medical dataset [60].
  • The Saheart dataset, which is a medical dataset about heart disease [61].
  • The Segment dataset, related to issues regarding image processing [62].
  • The Sonar dataset, related to sonar signals [63].
  • The Spiral dataset, which is an artificial dataset.
  • The StatHeart dataset, a medical dataset related to heart disease.
  • The Student dataset, derived from various experiments in schools [64].
  • The Transfusion dataset [65].
  • The WDBC dataset, which is a medical dataset about the detection of cancer [66].
  • The Wine dataset, related to the detection of the quality of wines [67,68].
  • The EEG dataset, which is obtained from various EEG measurements [69,70]. The cases Z_F_S, Z_O_N_F_S, ZO_NF_S, and ZONF_S were used from this dataset.
  • The ZOO dataset, used for animal classification [71].
Moreover, the following regression datasets were obtained for the conducted experiments:
  • The Abalone dataset, that contains data related to the age of abalones [72].
  • The Airfoil dataset, obtained from NASA [73].
  • The Auto dataset, used to estimate the fuel consumption in cars.
  • The Baseball dataset, related to the estimation of the salary of baseball players.
  • The BK dataset, used for the prediction of points in basketball games [74].
  • The BL dataset, used in some electricity experiments.
  • The Concrete dataset, derived from civil engineering [75].
  • The Dee dataset, used for the estimation of the price of electricity.
  • The Housing dataset, used to estimate the price of houses [76].
  • The Friedman database [77].
  • The FA dataset, which is related to fat measurements.
  • The FY dataset, that contains data regarding fruit flies.
  • The HO dataset, that was derived from the STATLIB repository.
  • The Laser dataset, used in various laser experiments.
  • The MB dataset, derived from smoothing methods in statistics.
  • The Mortgage dataset, that contains economic measurements.
  • The NT dataset [78].
  • The Plastic dataset, that contains measurements from experiments conducted related to pressure in plastics.
  • The PY dataset [79].
  • The PL dataset, downloaded from the STATLIB repository.
  • The Quake dataset, related to the estimation of the strength of earthquakes.
  • The SN dataset, that contains measurements about trellising and pruning.
  • The Stock dataset, which is an economic dataset about the prediction of the price of stocks.
  • The Treasury dataset, which contains economic measurements.

3.2. Experimental Results

The code used in the experiments was implemented in the C++ programming language and a machine equipped with 128 GB RAM running Debian Linux was utilized in the conducted experiments. The code was written with the assistance of the freely available GlobalOptimus optimization environment, that can be downloaded from https://github.com/itsoulos/GlobalOptimus (accessed on 12 April 2025). Each experiment was executed 30 times and the average classification error was measured for the classification datasets and the average regression error was measured for the regression datasets. The classification error was computed using the following equation:
$$E_C\left( R(x) \right) = 100 \times \frac{\sum_{i=1}^{K} \left( \mathrm{class}\left( R\left( \vec{x}_i \right) \right) \neq y_i \right)}{K}$$
where the set $T = \left\{ \left( \vec{x}_i, y_i \right),\ i = 1, \dots, K \right\}$ stands for the test set of the current problem and $R(x)$ is the RBF model. The regression error was calculated through the following equation:
$$E_R\left( R(x) \right) = \frac{\sum_{i=1}^{K} \left( R\left( \vec{x}_i \right) - y_i \right)^2}{K}$$
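A short C++ sketch of the two error measures is given below; the classOf() rounding rule is an assumption used for illustration, since the mapping of the real network output to a class label is not specified here.

```cpp
#include <vector>

// Assumed mapping of the real network output to a class label
// (rounding to the nearest integer-coded class).
int classOf(double out) { return (int)(out + 0.5); }

// Classification error: percentage of misclassified test patterns.
double classificationError(const std::vector<double>& outputs,
                           const std::vector<int>& labels) {
    int wrong = 0;
    for (size_t i = 0; i < outputs.size(); ++i)
        if (classOf(outputs[i]) != labels[i]) ++wrong;
    return 100.0 * wrong / outputs.size();
}

// Regression error: mean squared error over the test set.
double regressionError(const std::vector<double>& outputs,
                       const std::vector<double>& targets) {
    double e = 0.0;
    for (size_t i = 0; i < outputs.size(); ++i) {
        double d = outputs[i] - targets[i];
        e += d * d;
    }
    return e / outputs.size();
}
```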
The values for the parameters of the proposed method are listed in Table 1. The values of the experimental parameters were selected to provide a compromise between the speed and the reliability of the proposed methodology. In the following tables, which describe the experimental results, the following notation is used:
  • The column Dataset represents the name of the objective problem.
  • The column BFGS denotes the application of the BFGS optimization method [80] in the training of a neural network [81,82] with 10 processing nodes.
  • The column ADAM stands for the incorporation of the ADAM optimizer [83] to train an artificial neural network with 10 processing nodes.
  • The column NEAT represents the usage of the NEAT method (NeuroEvolution of Augmenting Topologies) [84].
  • The column RBF-KMEANS stands for the usage of the original two-phase method to train an RBF network with 10 processing nodes.
  • The column GENRBF represents the incorporation of the method proposed in [85] to train an RBF network with 10 processing nodes.
  • The column Proposed denotes the usage of the proposed method to train an RBF network with 10 processing nodes.
  • The row Average represents the average classification or regression error.
  • The row W-average denotes the average classification error for all datasets and for each method. In this average, each individual classification error was multiplied by the number of patterns for the corresponding dataset.
The results from the application of the previously mentioned machine learning methods to the classification datasets are depicted in Table 2, and for the regression datasets the results are presented in Table 3.
Table 2 presents the error rates of the various machine learning models (BFGS, ADAM, NEAT, RBF-KMEANS, GENRBF, proposed) on the different classification datasets. Each row corresponds to a dataset, while each column represents the error rate of a specific model. These values indicate the percentage of incorrect predictions, with lower values reflecting better performance. The last row of the table includes the average error rates for each model. Statistical analysis of the data reveals significant insights. The proposed model exhibits the lowest average error rate (18.67%) compared to the other models, establishing it as the optimal choice based on the table. Conversely, the other models demonstrate higher average error rates, with GENRBF showing the highest average error (34.64%). Additionally, significant variations in error rates across datasets are observed. For instance, on the “Circular” and “ZONF_S” datasets, the proposed model outperforms the others, with very low error rates (4.19% and 1.79%, respectively). Conversely, on datasets like “Cleveland”, the NEAT model achieves a lower error rate (50.82%) than the proposed model (53.44%). Notably, on certain datasets the performance of the proposed model is significantly inferior to that of other models. For example, on the “Alcohol” and “Z_F_S” datasets, the proposed model exhibits much higher error rates than the other models. This indicates that while the proposed model generally has the lowest average error rate, its performance may not be consistent across all datasets. In conclusion, the proposed model emerges as the best general choice for minimizing error rates, though its evaluation depends on the characteristics of each dataset. The performance differences among the models highlight the need for careful model selection depending on the application.
Also, the average execution time for each machine learning technique that was applied to the classification datasets is depicted in Figure 4.
As expected, the proposed technique requires significantly more execution time than all the other techniques in the comparison, since it consists of the serial execution of global optimization techniques. Moreover, the GENRBF method also required a significant amount of time with respect to the other, simpler methods. The additional time required by the proposed technique can of course be significantly reduced by the use of parallel processing techniques in its various stages, such as, for example, parallel Simulated Annealing techniques [86]. Moreover, Figure 5 depicts a comparison of the error rates across the models for all classification datasets involved in the conducted experiments.
Table 3 displays the absolute error values resulting from the application of the various machine learning models (BFGS, ADAM, NEAT, RBF-KMEANS, GENRBF, proposed) to the regression datasets. Each row corresponds to a dataset, while each column shows the error of a specific model. The last row records the average error for each model. Lower error values indicate better model performance. The analysis shows that the proposed model has the lowest average error (5.48), making it the most efficient choice among the available models. The second-best model is RBF-KMEANS, with an average error of 9.19, while other models, such as BFGS (26.43) and ADAM (19.62), exhibit significantly higher error values. The performance of the proposed model is particularly impressive on datasets such as BL, where its error is nearly negligible (0.0002), and Mortgage, where it has a very low error (0.14) compared to the other models. On datasets like Stock and Plastic, where errors are high across all models, the proposed model still outperforms the other models, with error values of 1.53 and 2.29, respectively. However, the margin over the other methods varies between datasets. For example, on the Laser dataset, the ADAM model has an error of 0.03, an order of magnitude higher than the proposed model’s 0.003, while on the HO dataset, the proposed model performs better (0.01), but the RBF-KMEANS model is comparably close (0.03). In summary, the proposed model achieves the lowest average error and the most consistent performance across most datasets, making it an ideal choice for regression problems. Nonetheless, certain models, such as RBF-KMEANS, may demonstrate competitive performance in specific cases, suggesting that model selection depends on the unique characteristics of each dataset.
An analysis of significance levels for the classification datasets, as illustrated in Figure 6, reveals that the proposed model statistically significantly outperforms all other models in every comparison pair. Specifically, the p-values indicate strong statistical differences: proposed vs. BFGS ($p = 10^{-8}$), proposed vs. ADAM ($p = 6.2 \times 10^{-8}$), proposed vs. NEAT ($p = 2.9 \times 10^{-9}$), proposed vs. RBF-KMEANS ($p = 1.2 \times 10^{-6}$), and proposed vs. GENRBF ($p = 1.2 \times 10^{-10}$). These values suggest that the proposed model is significantly better than the others with high reliability.
In Figure 7, which concerns the regression datasets, a similar pattern is observed, though the p-values are generally higher compared to the classification datasets. The proposed model demonstrates statistically significant superiority over the other models in all comparison pairs: proposed vs. BFGS ($p = 0.00011$), proposed vs. ADAM ($p = 0.015$), proposed vs. NEAT ($p = 0.00016$), proposed vs. RBF-KMEANS ($p = 0.0016$), and proposed vs. GENRBF ($p = 0.00049$). Although the significance is not as strong as in the classification datasets, the proposed model’s superiority remains clear.

3.3. Experiments on the Perturbation Factor a

In order to determine the stability of the proposed technique, another experiment was performed in which the perturbation factor a, presented in the second stage of the proposed technique, took a series of different values. Table 4 presents the error rates of the proposed machine learning model for three different values of the perturbation factor a (0.001, 0.005, 0.01) across various classification datasets. Each row represents a dataset, and the values indicate the model’s error rate for each value of a. The last row includes the average error rate for each value of a. Analysis of the data shows that the smallest value of a (0.001) achieves the lowest average error rate (18.67%), while the largest value (0.01) results in the highest average (19.06%). This suggests that the model generally performs better with smaller values of a, although the difference in averages is minimal. At the dataset level, there are cases where the model’s performance is significantly affected by changes in the parameter. For instance, on the “Lymography” dataset, increasing a from 0.001 to 0.01 leads to a significant increase in the error rate, from 20.64% to 30.33%. A similar trend is observed on the “ZOO” dataset, where the error rate rises from 4.50% to 6.87% for a = 0.005 , but decreases again to 4.60% for a = 0.01 . On the other hand, on datasets like “ZO_NF_S”, the error remains unchanged at 3.63%, regardless of changes in a. Datasets such as “Z_F_S” and “ZONF_S” exhibit nonlinear behavior. On “Z_F_S”, the error rate significantly decreases from 3.16% to 2.79% as a increases from 0.001 to 0.01, while on “ZONF_S”, a similar decrease is observed from 1.79% to 1.74%. In conclusion, the analysis indicates that the perturbation factor a has a notable impact on the performance of the proposed model. Smaller values of a are generally associated with better performance; however, the optimal value may depend on the characteristics of each dataset. Instances where error rates increase or decrease nonlinearly with changes in “a” suggest the need for further investigation into the tuning of “a” for specific applications.
Table 5 displays the absolute error values of the proposed machine learning model across various regression datasets for three different values of the perturbation factor a (0.001, 0.005, 0.01). Data analysis reveals that the parameter a = 0.005 yields the lowest average error (4.92), while the values a = 0.001 and a = 0.01 result in slightly higher averages (5.48 and 5.13, respectively). This difference indicates that 0.005 is generally the most suitable value for the model, ensuring better performance in most cases. At the dataset level, the impact of a varies. Some datasets, such as “Airfoil”, “Concrete”, “Dee”, “HO”, “Laser”, “NT”, “PL”, “Plastic”, and “Quake”, show no change in error with variations in a as the error values remain constant. In contrast, other datasets exhibit significant variations. For example, on the “Baseball” dataset, the error decreases from 86.19 for a = 0.001 to 77.46 for a = 0.005 then increases again to 81.97 for a = 0.01 . Similarly, on the “MB” dataset, the error drastically decreases from 5.49 for a = 0.001 to 0.56 for a = 0.005 and further to 0.48 for a = 0.01 . On datasets like “FA” and “FY”, the error increases as a changes from 0.001 to 0.005, then decreases again for a = 0.01 . On the “Treasury” dataset, the error shows a slight decline as a increases. In conclusion, the parameter a has a significant impact on the model’s performance on certain datasets, while on others, its effect is negligible. The lowest average error observed for a = 0.005 suggests that this value is generally optimal for the model, though further tuning may be required for specific datasets. Cases with high variability in errors highlight the need for deeper analysis and optimization of the a parameter based on the characteristics of each dataset.
Figure 8 compares different values of the parameter a for the classification datasets. The p-values for the comparisons a = 0.001 vs. a = 0.005 ( p = 0.36 ), a = 0.001 vs. a = 0.01 ( p = 0.071 ), and a = 0.005 vs. a = 0.01 ( p = 0.17 ) indicate that the differences between the parameter values are not statistically significant. This suggests that varying the parameter a within this range does not substantially affect the model’s performance on these datasets.
Figure 9 presents corresponding comparisons for the regression datasets, where a similar result is observed. The p-values for the comparisons a = 0.001 vs. a = 0.005 ( p = 0.75 ), a = 0.001 vs. a = 0.01 ( p = 0.41 ), and a = 0.005 vs. a = 0.01 ( p = 0.94 ) indicate the absence of statistically significant differences. This shows that the choice of parameter a does not significantly influence the model’s performance on the regression datasets.

3.4. Experiments on the Parameter F

Another experiment was conducted using the initialization factor F. Table 6 presents the percentage error rates of the proposed machine learning model across various classification datasets for four different values of the parameter F (1.5, 3.0, 5.0, 10.0). Analyzing the data reveals that the parameter F influences the model’s performance, but this effect varies by dataset. The lowest average error rate is observed for F = 5.0 (18.58%), indicating that this value is generally optimal. For the other values, slightly higher average error rates are noted: 18.88% for F = 3.0 ; 18.67% for F = 10.0 ; and the highest rate, 20.32%, for F = 1.5 . Examining individual datasets, it is evident that for many of them, increasing F improves performance, as reflected in reduced error rates. Examples include the “Ionosphere”, “Wine”, and “ZONF_S” datasets, where error rates decrease as F increases. On “Ionosphere”, the error rate drops from 12.92% for F = 1.5 to 7.39% for F = 10.0 . On “Wine”, the error rate decreases from 10.90% for F = 1.5 to 7.71% for F = 10.0 . Similarly, on “ZONF_S”, the error rate steadily decreases from 2.59% for F = 1.5 to 1.79% for F = 10.0 . However, there are cases where increasing F does not lead to improvement or results in higher error rates. For example, on the “Segment” dataset, the error rate rises from 35.81% for F = 1.5 to 40.83% for F = 10.0 . On the “Spiral” dataset, the error rate consistently increases from 13.28% for F = 1.5 to 22.52% for F = 10.0 . A similar trend is observed on the “Z_O_N_F_S” dataset, where the error rate rises from 46.00% for F = 1.5 to 46.77% for F = 10.0 . Overall, the parameter F significantly affects the model’s performance, and the optimal value appears to be F = 5.0 , as evidenced by the lowest average error rate. However, the exact impact depends on the characteristics of each dataset, emphasizing the need to fine-tune the parameter value for specific datasets to achieve optimal performance.
Table 7 provides the absolute error values of the proposed machine learning model across various regression datasets for four different values of the parameter F (1.5, 3.0, 5.0, 10.0). The data analysis shows that the parameter F affects the model’s performance differently depending on the dataset. The average errors indicate that F = 5.0 yields the lowest overall error (5.22), followed by F = 3.0 , with an average of 5.25. Higher averages are observed for F = 1.5 (5.52) and F = 10.0 (5.48), suggesting that deviating from F = 5.0 tends to increase error in some cases. Examining the datasets, it is evident that in several cases, increasing F improves performance, reducing error rates. For example, on the “Abalone” dataset, the error decreases from 6.39 for F = 1.5 to 5.10 for F = 10.0 . Similarly, on the “Friedman” dataset, the error significantly decreases from 6.59 for F = 1.5 to 1.45 for F = 10.0 . On the “Laser” dataset, the error decreases progressively from 0.022 for F = 1.5 to 0.003 for F = 10.0 . Conversely, there are datasets where the effect of F is nonlinear or increases the error rate. For instance, on the “Housing” dataset, the error rises from 16.75 for F = 1.5 to 18.70 for F = 10.0 . On the “MB” dataset, there is a sharp increase in error from 0.116 for F = 1.5 to 5.49 for F = 10.0 , indicating that F significantly impacts model performance for this dataset. In summary, the parameter F has varying effects on the model’s performance across different datasets. While the average indicates that F = 5.0 is the optimal choice, precise optimization of the parameter should be dataset-specific. Additionally, extreme parameter values may lead to significant performance degradation in certain datasets, as seen in examples like “MB” and “Housing”.
In Figure 10, which compares different values of the parameter F for the classification datasets, several statistically significant differences are observed. The p-values for the comparisons F = 1.5 vs. F = 3.0 ( p = 0.00019 ), F = 1.5 vs. F = 5.0 ( p = 0.00012 ), and F = 1.5 vs. F = 10.0 ( p = 0.00069 ) indicate a strong difference in the model’s performance. In contrast, the values for the comparisons F = 3.0 vs. F = 5.0 ( p = 0.027 ), F = 3.0 vs. F = 10.0 ( p = 0.062 ), and F = 5.0 vs. F = 10.0 ( p = 0.23 ) show that the differences between larger values of the parameter F are less significant.
Figure 11 examines comparisons of the parameter F for the regression datasets and shows no statistically significant differences. The p-values for the comparisons F = 1.5 vs. F = 3.0 ( p = 0.18 ), F = 1.5 vs. F = 5.0 ( p = 0.15 ), F = 1.5 vs. F = 10.0 ( p = 0.21 ), F = 3.0 vs. F = 5.0 ( p = 0.7 ), F = 3.0 vs. F = 10.0 ( p = 0.54 ), and F = 5.0 vs. F = 10.0 ( p = 0.89 ) indicate that variations in the value of the parameter F do not significantly affect the model’s performance on the regression datasets. This may suggest greater stability of the model to changes in this parameter compared to the classification datasets.

3.5. Experiments on Number of Generations

In order to evaluate the convergence of the genetic algorithm, an additional experiment was conducted where the number of generations was altered from 25 to 200. The experimental results using the proposed method for the classification datasets are depicted in Table 8 and for the regression datasets in Table 9.
In Table 8, the experimental results indicate a general trend of decreasing error rates as the number of generations $N_g$ increases from 25 to 200. This reduction suggests that increasing $N_g$ improves the performance of the genetic algorithm, as it allows the model to better approximate the optimal solution. For most datasets, the lowest error rate is observed when $N_g = 200$. For instance, on the “Alcohol” dataset, the error rate decreased from 38.19% to 31.28%, while a significant reduction was also observed on the “Sonar” dataset, from 21.77% to 18.25%. Similar reductions were noted on several other datasets, such as “Spiral” (31.82% to 22.52%) and “ZO_NF_S” (4.19% to 3.63%). However, there are datasets like “Mammographic” and “Lymography” where the changes are minor and not consistently positive, indicating that increasing the number of generations does not always have a dramatic impact on performance. The overall average error rate across all datasets steadily decreases, from 19.99% when $N_g = 25$ to 18.67% when $N_g = 200$. This confirms the general trend of the genetic algorithm converging towards improved solutions with more generations. The statistical analysis of the results demonstrates that increasing the number of generations is effective in the majority of cases, enhancing the accuracy of the proposed method on classification datasets. Nonetheless, the performance improvement appears to also depend on the specific characteristics of each dataset as well as the inherent properties of the method.
Figure 12 presents the significance levels $p$ for the experiments conducted on the classification datasets with the various numbers of generations. The results indicate that the difference between $N_g = 25$ and $N_g = 50$ is not statistically significant, as $p$ is 0.12. In contrast, the transition from $N_g = 50$ to $N_g = 100$ is marginally significant, with $p = 0.048$, suggesting that the increase in generations begins to influence model performance. Finally, the difference between $N_g = 100$ and $N_g = 200$ shows high statistical significance, with $p = 0.00083$, confirming that the increase in the number of generations significantly contributes to performance improvement.
In Table 9, the experimental findings suggest that the convergence behavior of the genetic algorithm on regression datasets shows varied patterns as the number of generations $N_g$ increases from 25 to 200. Specifically, a reduction in error values is observed on several datasets, while on others, errors either increase or remain stable. For instance, on the “Abalone” dataset, the error decreases steadily from 6.08 at $N_g = 25$ to 5.10 at $N_g = 200$, indicating improved performance. On the “Friedman” dataset, there is a significant reduction from 3.22 to 1.45, demonstrating a clear trend of convergence toward optimal solutions. Similar reductions are observed on “Stock” (2.29 to 1.53) and “Treasury” (0.82 to 0.51). However, there are datasets such as “Auto”, where the reduction is negligible, and the error slightly increases from 9.60 at $N_g = 100$ to 9.68 at $N_g = 200$. Furthermore, on the “Housing” dataset, a gradual increase in error is observed, from 17.72 to 18.70, suggesting that the increased number of generations did not improve performance. On the “MB” dataset, the error rises significantly, from 0.15 at $N_g = 25$ to 5.49 at $N_g = 200$, indicating potential instability of the method on this specific dataset. The average across all datasets shows a nonlinear pattern, with values fluctuating. The average error decreases from 5.54 at $N_g = 25$ to 5.13 at $N_g = 100$, but then increases to 5.48 at $N_g = 200$. This indicates that increasing the number of generations does not always lead to consistent improvement in performance across regression datasets. In conclusion, the statistical analysis reveals that increasing the number of generations $N_g$ can improve the performance of the genetic algorithm on certain datasets; however, its impact is not always consistent. The outcome depends on the specific characteristics of each dataset as well as the inherent properties of the method.
Figure 13 pertains to the regression datasets and shows that the performance differences between consecutive $N_g$ values are not statistically significant. Specifically, for $N_g = 25$ versus $N_g = 50$, $p$ is 0.46; for $N_g = 50$ versus $N_g = 100$, $p$ is 0.27; and for $N_g = 100$ versus $N_g = 200$, $p$ is 0.9. These results suggest that increasing the number of generations on the regression datasets does not lead to significant performance changes, indicating that the characteristics of these datasets may limit the effectiveness of the approach.

4. Conclusions

The article focuses on optimizing the parameter tuning process in radial basis function networks through a multidimensional and innovative approach that combines techniques such as Simulated Annealing and genetic algorithms. The proposed method surpasses traditional two-stage approaches, where parameters are typically determined using fixed processes such as K-means clustering followed by the solution of a linear system for the output weights. Instead, the article introduces a three-phase process. In the first phase, initial parameter estimation is performed using K-means, ensuring a stable starting point. In the second phase, the application of Simulated Annealing provides an advanced mechanism for exploring the parameter space, avoiding local minima and examining a broader range of potential values. Finally, the third phase integrates genetic algorithms, enabling the optimization of parameters based on the model’s actual performance. This process ensures a comprehensive and adaptive approach, reducing the likelihood of overfitting and numerical instability. The novelty of the method lies not only in the three-phase process but also in the model’s ability to adapt to datasets with diverse characteristics, such as those involving complex nonlinear relationships or multidimensional dependencies. The experimental results demonstrate the method’s superiority compared to traditional techniques, such as BFGS, ADAM, NEAT, and RBF-KMEANS, achieving improvements in terms of average error rates across both classification and regression datasets.
The article’s conclusions clearly highlight the superiority of the proposed method. In classification datasets, the method achieves lower average error rates, particularly when the parameters a and F are optimally configured. For instance, in classification datasets, the parameter F significantly influences performance, with F = 5.0 proving to be the most effective value in many cases. On datasets such as “Ionosphere” and “Wine”, dramatic reductions in error rates are observed as the value of F increases, emphasizing the importance of selecting this parameter correctly. Similarly, on the regression datasets, the proposed model demonstrates exceptional performance, with notable examples including datasets like “Abalone” and “Friedman”, where the error decreases significantly compared to traditional techniques. The “MB” dataset is particularly interesting, as the use of Simulated Annealing contributes to a substantial performance improvement, avoiding errors commonly encountered in traditional approaches. Despite the generally positive results, there are cases where the method’s performance is suboptimal, such as on the “Segment” dataset, indicating that the method is not universally generalizable without adjustment to the characteristics of each dataset. The need for further research on the parameters a and F is evident, as these parameters critically impact performance across numerous datasets.
For the future, the article proposes numerous directions for further exploration and development. Initially, applying the method to more diverse data categories, such as time series, image data, or even genetic data, could broaden its application scope. Additionally, the dynamic adjustment of the parameters a and F during training, using reinforcement learning techniques, could further enhance performance by eliminating the need for manual tuning. Furthermore, integrating the method into deep learning systems, such as convolutional or recurrent neural networks, could lead to hybrid approaches that combine the flexibility of RBFs with the computational power of deep neural networks. Moreover, investigating the robustness of the method in environments with dynamic or imbalanced data could provide additional insights into its generalizability. Finally, analyzing the method’s performance on big data and integrating it with technologies like distributed processing or cloud computing could open new avenues, enabling its scalability to larger-scale problems. The methodology proposed in the article not only sets new standards for the performance of RBFs but also paves the way for further innovation in the field of machine learning.

Author Contributions

V.C. and I.G.T. conducted the experiments employing several datasets and provided the comparative experiments. D.T. and V.C. performed the statistical analysis and prepared the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been financed by the European Union: NextGenerationEU, through the Program Greece 2.0 National Recovery and Resilience Plan, under the call RESEARCH–CREATE–INNOVATE, project name “iCREW: Intelligent small craft simulator for advanced crew training using Virtual Reality techniques” (project code: TAEDK-06195).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mjahed, M. The use of clustering techniques for the classification of high energy physics data. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2006, 559, 199–202. [Google Scholar] [CrossRef]
  2. Andrews, M.; Paulini, M.; Gleyzer, S.; Poczos, B. End-to-End Event Classification of High-Energy Physics Data. J. Phys. Conf. Ser. 2018, 1085, 042022. [Google Scholar] [CrossRef]
  3. Viquar, M.; Basak, S.; Dasgupta, A.; Agrawal, S.; Saha, S. Machine learning in astronomy: A case study in quasar-star classification. In Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2018; Springer: Singapore, 2019; Volume 3, pp. 827–836. [Google Scholar]
  4. Luo, S.; Leung, A.P.; Hui, C.Y.; Li, K.L. An investigation on the factors affecting machine learning classifications in gamma-ray astronomy. Mon. Not. R. Astron. Soc. 2020, 492, 5377–5390. [Google Scholar] [CrossRef]
  5. He, P.; Xu, C.J.; Liang, Y.Z.; Fang, K.T. Improving the classification accuracy in chemistry via boosting technique. Chemom. Intell. Lab. Syst. 2004, 70, 39–46. [Google Scholar] [CrossRef]
  6. Aguiar, J.A.; Gong, M.L.; Tasdizen, T. Crystallographic prediction from diffraction and chemistry data for higher throughput classification using machine learning. Comput. Mater. Sci. 2020, 173, 109409. [Google Scholar] [CrossRef]
  7. Yadav, S.S.; Jadhav, S.M. Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data 2019, 6, 113. [Google Scholar] [CrossRef]
  8. Qing, L.; Linhong, W.; Xuehai, D. A Novel Neural Network-Based Method for Medical Text Classification. Future Internet 2019, 11, 255. [Google Scholar] [CrossRef]
  9. Kaastra, I.; Boyd, M. Designing a neural network for forecasting financial and economic time series. Neurocomputing 1996, 10, 215–236. [Google Scholar] [CrossRef]
  10. Hafezi, R.; Shahrabi, J.; Hadavandi, E. A bat-neural network multi-agent system (BNNMAS) for stock price prediction: Case study of DAX stock price. Appl. Soft Comput. 2015, 29, 196–210. [Google Scholar] [CrossRef]
  11. MacQueen, J. Some methods for classification and analysis of multivariate observations. Berkeley Symp. Math. Stat. Prob. 1967, 1, 281–297. [Google Scholar]
  12. Er, M.J.; Wu, S.; Lu, J.; Toh, H.L. Face recognition with radial basis function (RBF) neural networks. IEEE Trans. Neural Netw. 2002, 13, 697–710. [Google Scholar] [PubMed]
  13. Mai-Duy, N.; Tran-Cong, T. Numerical solution of differential equations using multiquadric radial basis function networks. Neural Netw. 2001, 14, 185–199. [Google Scholar] [CrossRef] [PubMed]
  14. Mai-Duy, N. Solving high order ordinary differential equations with radial basis function networks. Int. J. Numer. Meth. Eng. 2005, 62, 824–852. [Google Scholar] [CrossRef]
  15. Shen, W.; Guo, X.; Wu, C.; Wu, D. Forecasting stock indices using radial basis function neural networks optimized by artificial fish swarm algorithm. Knowl.-Based Syst. 2011, 24, 378–385. [Google Scholar] [CrossRef]
  16. Lian, R.-J. Adaptive Self-Organizing Fuzzy Sliding-Mode Radial Basis-Function Neural-Network Controller for Robotic Systems. IEEE Trans. Ind. Electron. 2014, 61, 1493–1503. [Google Scholar] [CrossRef]
  17. Vijay, M.; Jena, D. Backstepping terminal sliding mode control of robot manipulator using radial basis functional neural networks. Comput. Electr. Eng. 2018, 67, 690–707. [Google Scholar] [CrossRef]
  18. Ravale, U.; Marathe, N.; Padiya, P. Feature Selection Based Hybrid Anomaly Intrusion Detection System Using K Means and RBF Kernel Function. Procedia Comput. Sci. 2015, 45, 428–435. [Google Scholar] [CrossRef]
  19. Lopez-Martin, M.; Sanchez-Esguevillas, A.; Arribas, J.I.; Carro, B. Network Intrusion Detection Based on Extended RBF Neural Network With Offline Reinforcement Learning. IEEE Access 2021, 9, 153153–153170. [Google Scholar] [CrossRef]
  20. Benoudjit, N.; Verleysen, M. On the Kernel Widths in Radial-Basis Function Networks. Neural Process. Lett. 2003, 18, 139–154. [Google Scholar] [CrossRef]
  21. Oyang, Y.J.; Hwang, S.C.; Ou, Y.Y.; Chen, C.Y.; Chen, Z.W. Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE Trans. Neural Netw. 2005, 16, 225–236. [Google Scholar] [CrossRef]
  22. Ros, F.; Pintore, M.; Deman, A.; Chrétien, J.R. Automatical initialization of RBF neural networks. Chemom. Intell. Lab. Syst. 2007, 87, 26–32. [Google Scholar] [CrossRef]
  23. Ricci, E.; Perfetti, R. Improved pruning strategy for radial basis function networks with dynamic decay adjustment. Neurocomputing 2006, 69, 1728–1732. [Google Scholar] [CrossRef]
  24. Huang, G.-B.; Saratchandran, P.; Sundararajan, N. A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans. Neural Netw. 2005, 16, 57–67. [Google Scholar] [CrossRef]
  25. Harpham, C.; Dawson, C.W.; Brown, M.R. A review of genetic algorithms applied to training radial basis function networks. Neural Comput. Appl. 2004, 13, 193–201. [Google Scholar] [CrossRef]
  26. Sarimveis, H.; Alexandridis, A.; Mazarakis, S.; Bafas, G. A new algorithm for developing dynamic radial basis function neural network models based on genetic algorithms. Comput. Chem. Eng. 2004, 28, 209–217. [Google Scholar] [CrossRef]
  27. Rani, R.H.J.; Victoire, T.A.A. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer. PLoS ONE 2018, 13, e0196871. [Google Scholar] [CrossRef]
  28. Zhang, W.; Wei, D. Prediction for network traffic of radial basis function neural network model based on improved particle swarm optimization algorithm. Neural Comput. Appl. 2018, 29, 1143–1152. [Google Scholar] [CrossRef]
  29. Qasem, S.N.; Shamsuddin, S.M.; Zain, A.M. Multi-objective hybrid evolutionary algorithms for radial basis function neural network design. Knowl.-Based Syst. 2012, 27, 475–497. [Google Scholar] [CrossRef]
  30. Yokota, R.; Barba, L.A.; Knepley, M.G. PetRBF—A parallel O(N) algorithm for radial basis function interpolation with Gaussians. Comput. Methods Appl. Mech. Eng. 2010, 199, 1793–1804. [Google Scholar] [CrossRef]
  31. Lu, C.; Ma, N.; Wang, Z. Fault detection for hydraulic pump based on chaotic parallel RBF network. EURASIP J. Adv. Signal Process. 2011, 2011, 49. [Google Scholar] [CrossRef]
  32. Ingber, L. Very fast simulated re-annealing. Math. Comput. Model. 1989, 12, 967–973. [Google Scholar] [CrossRef]
  33. Aerts, J.C.; Heuvelink, G.B. Using simulated annealing for resource allocation. Int. J. Geogr. Inf. Sci. 2002, 16, 571–587. [Google Scholar] [CrossRef]
  34. Ganesh, K.; Punniyamoorthy, M. Optimization of continuous-time production planning using hybrid genetic algorithms-simulated annealing. Int. J. Adv. Manuf. Technol. 2005, 26, 148–154. [Google Scholar] [CrossRef]
  35. El-Naggar, K.M.; AlRashidi, M.R.; AlHajri, M.F.; Al-Othman, A.K. Simulated annealing algorithm for photovoltaic parameters identification. Sol. Energy 2012, 86, 266–274. [Google Scholar] [CrossRef]
  36. Dupanloup, I.; Schneider, S.; Excoffier, L. A simulated annealing approach to define the genetic structure of populations. Mol. Ecol. 2002, 11, 2571–2581. [Google Scholar] [CrossRef]
  37. Kaelo, P.; Ali, M.M. Integrated crossover rules in real coded genetic algorithms. Eur. J. Oper. Res. 2007, 176, 60–76. [Google Scholar] [CrossRef]
  38. Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. 2023. Available online: https://archive.ics.uci.edu (accessed on 20 September 2023).
  39. Alcalá-Fdez, J.; Fernandez, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J. Mult.-Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  40. Tzimourta, K.D.; Tsoulos, I.; Bilero, I.T.; Tzallas, A.T.; Tsipouras, M.G.; Giannakeas, N. Direct Assessment of Alcohol Consumption in Mental State Using Brain Computer Interfaces and Grammatical Evolution. Inventions 2018, 3, 51. [Google Scholar] [CrossRef]
  41. Weiss, S.M.; Kulikowski, C.A. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1991. [Google Scholar]
  42. Quinlan, J.R. Simplifying Decision Trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234. [Google Scholar] [CrossRef]
  43. Shultz, T.; Mareschal, D.; Schmidt, W. Modeling Cognitive Development on Balance Scale Phenomena. Mach. Learn. 1994, 16, 59–88. [Google Scholar] [CrossRef]
  44. Zhou, Z.H.; Jiang, Y. NeC4.5: Neural ensemble based C4.5. IEEE Trans. Knowl. Data Eng. 2004, 16, 770–773. [Google Scholar] [CrossRef]
  45. Setiono, R.; Leow, W.K. FERNN: An Algorithm for Fast Extraction of Rules from Neural Networks. Appl. Intell. 2000, 12, 15–25. [Google Scholar] [CrossRef]
46. Demiroz, G.; Guvenir, H.A.; Ilter, N. Learning Differential Diagnosis of Erythemato-Squamous Diseases using Voting Feature Intervals. Artif. Intell. Med. 1998, 13, 147–165. [Google Scholar]
47. Hayes-Roth, B.; Hayes-Roth, F. Concept learning and the recognition and classification of exemplars. J. Verbal Learn. Verbal Behav. 1977, 16, 321–338. [Google Scholar] [CrossRef]
  48. Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Appl. Intell. 1997, 7, 39–55. [Google Scholar] [CrossRef]
  49. French, R.M.; Chater, N. Using noise to compute error surfaces in connectionist networks: A novel means of reducing catastrophic forgetting. Neural Comput. 2002, 14, 1755–1769. [Google Scholar] [CrossRef]
  50. Dy, J.G.; Brodley, C.E. Feature Selection for Unsupervised Learning. J. Mach. Learn. Res. 2004, 5, 845–889. [Google Scholar]
  51. Perantonis, S.J.; Virvilis, V. Input Feature Extraction for Multilayered Perceptrons Using Supervised Principal Component Analysis. Neural Process. Lett. 1999, 10, 243–252. [Google Scholar] [CrossRef]
  52. Garcke, J.; Griebel, M. Classification with sparse grids using simplicial basis functions. Intell. Data Anal. 2002, 6, 483–502. [Google Scholar] [CrossRef]
53. McDermott, J.; Forsyth, R.S. Diagnosing a disorder in a classification benchmark. Pattern Recognit. Lett. 2016, 73, 41–43. [Google Scholar] [CrossRef]
54. Cestnik, B.; Kononenko, I.; Bratko, I. Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In Progress in Machine Learning; Bratko, I., Lavrac, N., Eds.; Sigma Press: Wilmslow, UK, 1987; pp. 31–45. [Google Scholar]
  55. Elter, M.; Schulz-Wendtland, R.; Wittenberg, T. The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 2007, 34, 4164–4172. [Google Scholar] [CrossRef] [PubMed]
56. Little, M.A.; McSharry, P.E.; Roberts, S.J.; Costello, D.A.E.; Moroz, I.M. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection. BioMed Eng. Online 2007, 6, 23. [Google Scholar] [CrossRef] [PubMed]
  57. Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022. [Google Scholar] [CrossRef]
58. Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care, IEEE Computer Society Press, Minneapolis, MN, USA, 8–10 June 1988; pp. 261–265. [Google Scholar]
  59. Lucas, D.D.; Klein, R.; Tannahill, J.; Ivanova, D.; Brandon, S.; Domyancic, D.; Zhang, Y. Failure analysis of parameter-induced simulation crashes in climate models. Geosci. Model Dev. 2013, 6, 1157–1171. [Google Scholar] [CrossRef]
  60. Giannakeas, N.; Tsipouras, M.G.; Tzallas, A.T.; Kyriakidi, K.; Tsianou, Z.E.; Manousou, P.; Hall, A.; Karvounis, E.C.; Tsianos, V.; Tsianos, E. A clustering based method for collagen proportional area extraction in liver biopsy images. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Milan, Italy, 25–29 August 2015; Art. No. 7319047. pp. 3097–3100. [Google Scholar]
61. Hastie, T.; Tibshirani, R. Non-parametric logistic and proportional odds regression. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1987, 36, 260–276. [Google Scholar] [CrossRef]
  62. Dash, M.; Liu, H.; Scheuermann, P.; Tan, K.L. Fast hierarchical clustering and its validation. Data Knowl. Eng. 2003, 44, 109–138. [Google Scholar] [CrossRef]
  63. Gorman, R.P.; Sejnowski, T.J. Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets. Neural Netw. 1988, 1, 75–89. [Google Scholar] [CrossRef]
  64. Cortez, P.; Silva, A.M.G. Using data mining to predict secondary school student performance. In Proceedings of the 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008), EUROSIS-ETI, Porto, Portugal, 9–11 April 2008; pp. 5–12. [Google Scholar]
  65. Yeh, I.-C.; Yang, K.-J.; Ting, T.-M. Knowledge discovery on RFM model using Bernoulli sequence. Expert Syst. Appl. 2009, 36, 5866–5871. [Google Scholar] [CrossRef]
  66. Wolberg, W.H.; Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 1990, 87, 9193–9196. [Google Scholar] [CrossRef]
  67. Raymer, M.; Doom, T.E.; Kuhn, L.A.; Punch, W.F. Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2003, 33, 802–813. [Google Scholar] [CrossRef]
  68. Zhong, P.; Fukushima, M. Regularized nonsmooth Newton method for multi-class support vector machines. Optim. Methods Softw. 2007, 22, 225–236. [Google Scholar] [CrossRef]
  69. Andrzejak, R.G.; Lehnertz, K.; Mormann, F.; Rieke, C.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907. [Google Scholar] [CrossRef]
  70. Tzallas, A.T.; Tsipouras, M.G.; Fotiadis, D.I. Automatic Seizure Detection Based on Time-Frequency Analysis and Artificial Neural Networks. Comput. Intell. Neurosci. 2007, 2007, 80510. [Google Scholar] [CrossRef]
  71. Koivisto, M.; Sood, K. Exact Bayesian Structure Discovery in Bayesian Networks. J. Mach. Learn. Res. 2004, 5, 549–573. [Google Scholar]
72. Nash, W.J.; Sellers, T.L.; Talbot, S.R.; Cawthorn, A.J.; Ford, W.B. The Population Biology of Abalone (Haliotis Species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait; Sea Fisheries Division, Technical Report No. 48; Department of Primary Industry and Fisheries, Tasmania: Hobart, Australia, 1994; ISSN 1034-3288. [Google Scholar]
73. Brooks, T.F.; Pope, D.S.; Marcolini, M.A. Airfoil Self-Noise and Prediction. Technical Report, NASA RP-1218. July 1989. Available online: https://ntrs.nasa.gov/citations/19890016302 (accessed on 5 March 2025). [Google Scholar]
74. Simonoff, J.S. Smoothing Methods in Statistics; Springer: Berlin/Heidelberg, Germany, 1996. [Google Scholar]
  75. Yeh, I.C. Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 1998, 28, 1797–1808. [Google Scholar] [CrossRef]
76. Harrison, D.; Rubinfeld, D.L. Hedonic prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102. [Google Scholar] [CrossRef]
77. Friedman, J. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–141. [Google Scholar]
78. Mackowiak, P.A.; Wasserman, S.S.; Levine, M.M. A critical appraisal of 98.6 °F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. J. Am. Med. Assoc. 1992, 268, 1578–1580. [Google Scholar] [CrossRef]
79. King, R.D.; Muggleton, S.; Lewis, R.; Sternberg, M.J.E. Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proc. Natl. Acad. Sci. USA 1992, 89, 11322–11326. [Google Scholar] [CrossRef]
  80. Powell, M.J.D. A Tolerant Algorithm for Linearly Constrained Optimization Calculations. Math. Program. 1989, 45, 547–566. [Google Scholar] [CrossRef]
  81. Bishop, C. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
82. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
  83. Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
  84. Stanley, K.O.; Miikkulainen, R. Evolving Neural Networks through Augmenting Topologies. Evol. Comput. 2002, 10, 99–127. [Google Scholar] [CrossRef] [PubMed]
  85. Ding, S.; Xu, L.; Su, C.; Jin, F. An optimizing method of RBF neural network based on genetic algorithm. Neural Comput. Appl. 2012, 21, 333–336. [Google Scholar] [CrossRef]
86. Bevilacqua, A. A methodological approach to parallel simulated annealing on an SMP system. J. Parallel Distrib. Comput. 2002, 62, 1548–1570. [Google Scholar] [CrossRef]
Figure 1. A typical plot of the Gaussian function.
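For readers who wish to reproduce the curve of Figure 1, a minimal sketch follows. It evaluates the Gaussian ϕ(x) = exp(−(x − c)²/σ²) for c = 0 and σ = 1; the plotting range and resolution are arbitrary choices, not values taken from the manuscript.

```python
import numpy as np
import matplotlib.pyplot as plt

# Gaussian RBF phi(x) = exp(-(x - c)^2 / sigma^2) with c = 0, sigma = 1,
# matching the setting of Figure 1. Range and resolution are arbitrary.
c, sigma = 0.0, 1.0
x = np.linspace(-4.0, 4.0, 400)
phi = np.exp(-((x - c) ** 2) / sigma ** 2)

plt.plot(x, phi)
plt.xlabel("x")
plt.ylabel("phi(x)")
plt.title("Gaussian RBF (c = 0, sigma = 1)")
plt.show()
```

The curve peaks at x = c and decays with the distance from the center.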
Figure 2. A typical diagram of an RBF network.
Figure 3. The scheme of the particles in the proposed method.
Figure 4. Average execution time for the machine learning methods that were applied to the classification datasets.
Figure 5. Comparison of error rates across the machine learning models utilized.
Figure 6. Statistical comparison of the experimental results for the classification datasets.
Figure 7. Statistical comparison of the experimental results for the regression datasets.
Figure 8. Statistical comparison of the results obtained by applying the current method to the classification datasets with different values of the perturbation factor a.
Figure 9. Statistical comparison of the experimental results obtained by applying the proposed method to the regression datasets with different values of the perturbation factor a.
Figure 10. Statistical comparison of the experimental results obtained by applying the proposed method with different values of parameter F to the classification datasets.
Figure 11. Statistical comparison of the results obtained by applying the proposed method to the regression datasets with different values of parameter F.
Figure 12. Statistical comparison of the experimental results obtained by applying the proposed method with different values for the number of generations N_g to the classification datasets.
Figure 13. Statistical comparison of the experimental results obtained by applying the proposed method with different values for the number of generations N_g to the regression datasets. The symbol ns denotes p > 0.05 (not significant) and * denotes p < 0.05 (significant).
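Figures 6–13 present pairwise statistical comparisons with the usual significance notation (ns for p > 0.05, * for p < 0.05). This part of the manuscript does not name the test, so the choice of the Wilcoxon signed-rank test in the sketch below is an assumption; the data are simply the first five rows of the Proposed and RBF-KMEANS columns of Table 2.

```python
from scipy.stats import wilcoxon

# Test errors (%) for the first five classification datasets of Table 2.
proposed = [31.28, 15.27, 21.00, 12.95, 50.82]    # Proposed method
rbf_kmeans = [49.38, 12.23, 34.89, 33.42, 67.10]  # RBF-KMEANS

# Paired two-sided test across datasets (assumed choice of test).
stat, p_value = wilcoxon(proposed, rbf_kmeans)
print(f"W = {stat}, p = {p_value:.4f}")
print("* (p < 0.05)" if p_value < 0.05 else "ns (p > 0.05)")
```

With only five pairs the test has little power; the comparisons in the figures presumably use all 34 classification or 24 regression datasets.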
Table 1. The values for the experimental parameters.

Parameter | Meaning | Value
w_0 | Initial values for the weights | 100.0
F | Scale factor used in the initialization | 10.0
T_0 | Initial temperature | 10^6
ϵ | Small value used in comparisons | 10^-6
a | Perturbation factor | 0.001
N_s | Number of samples used in the fitness calculation | 100
N_eps | Number of samples taken in Simulated Annealing | 100
N_c | Number of chromosomes | 500
N_g | Maximum number of allowed generations | 200
p_s | Selection rate | 0.1
p_m | Mutation rate | 0.05
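As a purely illustrative sketch of how the settings of Table 1 might be collected in code, the configuration object below mirrors the table; the class and field names are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical container mirroring Table 1; names are illustrative only.
@dataclass
class ExperimentConfig:
    w0: float = 100.0    # initial values for the weights
    F: float = 10.0      # scale factor used in the initialization
    T0: float = 1.0e6    # initial Simulated Annealing temperature
    eps: float = 1.0e-6  # small value used in comparisons
    a: float = 0.001     # perturbation factor
    Ns: int = 100        # samples used in the fitness calculation
    Neps: int = 100      # samples taken in Simulated Annealing
    Nc: int = 500        # number of chromosomes
    Ng: int = 200        # maximum number of allowed generations
    ps: float = 0.1      # selection rate
    pm: float = 0.05     # mutation rate

print(ExperimentConfig())
```

Tables 4–9 below vary a, F, and N_g around these defaults, so the sensitivity of the method to each setting can be read off directly.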
Table 2. Experimental results for the classification datasets using the series of machine learning methods adopted here. The numbers in cells represent average classification error as measured on the corresponding test set.

Dataset | BFGS | ADAM | NEAT | RBF-KMEANS | GENRBF | Proposed
Alcohol | 41.50% | 57.78% | 66.80% | 49.38% | 52.45% | 31.28%
Appendicitis | 18.00% | 16.50% | 17.20% | 12.23% | 16.83% | 15.27%
Australian | 38.13% | 35.65% | 31.98% | 34.89% | 41.79% | 21.00%
Balance | 8.64% | 7.87% | 23.14% | 33.42% | 38.02% | 12.95%
Cleveland | 77.55% | 67.55% | 53.44% | 67.10% | 67.47% | 50.82%
Circular | 6.08% | 19.95% | 35.18% | 5.98% | 21.43% | 4.19%
Dermatology | 52.92% | 26.14% | 32.43% | 62.34% | 61.46% | 36.13%
Hayes-Roth | 37.33% | 59.70% | 50.15% | 64.36% | 63.46% | 33.54%
Heart | 39.44% | 38.53% | 39.27% | 31.20% | 28.44% | 15.33%
HeartAttack | 46.67% | 45.55% | 32.34% | 29.00% | 40.48% | 18.52%
HouseVotes | 7.13% | 7.48% | 10.89% | 6.13% | 11.99% | 3.74%
Ionosphere | 15.29% | 16.64% | 19.67% | 16.22% | 19.83% | 7.39%
Liverdisorder | 42.59% | 41.53% | 30.67% | 30.84% | 36.97% | 27.92%
Lymography | 35.43% | 29.26% | 33.70% | 25.50% | 29.33% | 20.64%
Mammographic | 17.24% | 46.25% | 22.85% | 21.38% | 30.41% | 17.21%
Parkinsons | 27.58% | 24.06% | 18.56% | 17.41% | 33.81% | 15.35%
Phoneme | 15.58% | 29.43% | 22.34% | 23.32% | 26.29% | 16.62%
Pima | 35.59% | 34.85% | 34.51% | 25.78% | 27.83% | 23.59%
Popfailures | 5.24% | 5.18% | 7.05% | 7.04% | 7.08% | 4.80%
Regions2 | 36.28% | 29.85% | 33.23% | 38.29% | 39.98% | 25.54%
Saheart | 37.48% | 34.04% | 34.51% | 32.19% | 33.90% | 29.64%
Segment | 68.97% | 49.75% | 66.72% | 59.68% | 54.25% | 40.83%
Sonar | 25.85% | 30.33% | 34.10% | 27.90% | 37.13% | 18.25%
Spiral | 47.99% | 48.90% | 50.22% | 44.87% | 50.02% | 22.52%
StatHeart | 39.65% | 44.04% | 44.36% | 31.36% | 42.94% | 19.52%
Student | 7.14% | 5.13% | 10.20% | 5.49% | 33.26% | 5.11%
Transfusion | 25.84% | 25.68% | 24.87% | 26.41% | 25.67% | 24.59%
Wdbc | 29.91% | 35.35% | 12.88% | 7.27% | 8.82% | 5.00%
Wine | 59.71% | 29.40% | 25.43% | 31.41% | 31.47% | 7.71%
Z_F_S | 39.37% | 47.81% | 38.41% | 13.16% | 23.37% | 3.16%
Z_O_N_F_S | 65.67% | 78.79% | 77.08% | 48.70% | 68.40% | 46.77%
ZO_NF_S | 43.04% | 47.43% | 43.75% | 9.02% | 22.18% | 3.63%
ZONF_S | 15.62% | 11.99% | 5.44% | 4.03% | 17.41% | 1.79%
ZOO | 10.70% | 14.13% | 20.27% | 21.93% | 33.50% | 4.50%
Average | 32.98% | 33.60% | 32.46% | 28.39% | 34.64% | 18.67%
W-average | 32.01% | 34.97% | 34.32% | 30.13% | 34.73% | 20.21%
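The Average row of Table 2 is the unweighted mean of each column. The W-average row is a weighted mean; the weighting scheme is not stated in this part of the manuscript, so the sketch below assumes weighting by dataset size, with hypothetical sizes, purely for illustration.

```python
# Unweighted and size-weighted mean error; weighting by dataset size is
# an assumption, and the example sizes below are hypothetical.
def summarize(errors, sizes):
    avg = sum(errors) / len(errors)
    w_avg = sum(e * n for e, n in zip(errors, sizes)) / sum(sizes)
    return avg, w_avg

errors = [21.00, 12.95, 50.82]  # test errors (%) for three datasets
sizes = [690, 625, 297]         # hypothetical numbers of patterns

avg, w_avg = summarize(errors, sizes)
print(f"Average = {avg:.2f}%, W-average = {w_avg:.2f}%")
```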
Table 3. Experimental results for regression datasets. Numbers in cells represent average regression error as calculated on the corresponding test set.

Dataset | BFGS | ADAM | NEAT | RBF-KMEANS | GENRBF | Proposed
Abalone | 5.69 | 4.30 | 9.88 | 7.37 | 9.98 | 5.10
Airfoil | 0.003 | 0.005 | 0.067 | 0.27 | 0.121 | 0.004
Auto | 60.97 | 70.84 | 56.06 | 17.87 | 16.78 | 9.68
Baseball | 119.63 | 77.90 | 100.39 | 93.02 | 98.91 | 86.19
BK | 0.28 | 0.03 | 0.15 | 0.02 | 0.023 | 0.153
BL | 2.55 | 0.28 | 0.05 | 0.013 | 0.005 | 0.0002
Concrete | 0.066 | 0.078 | 0.081 | 0.011 | 0.015 | 0.006
Dee | 2.36 | 0.630 | 1.512 | 0.17 | 0.25 | 0.16
Housing | 97.38 | 80.20 | 56.49 | 57.68 | 95.69 | 18.70
Friedman | 1.26 | 22.90 | 19.35 | 7.23 | 16.24 | 1.45
FA | 0.426 | 0.11 | 0.19 | 0.015 | 0.15 | 0.019
FY | 0.22 | 0.038 | 0.08 | 0.041 | 0.041 | 0.077
HO | 0.62 | 0.035 | 0.169 | 0.03 | 0.076 | 0.01
Laser | 0.015 | 0.03 | 0.084 | 0.03 | 0.075 | 0.003
MB | 0.129 | 0.06 | 0.061 | 2.16 | 0.41 | 5.49
Mortgage | 8.23 | 9.24 | 14.11 | 1.45 | 1.92 | 0.14
NT | 0.129 | 0.12 | 0.33 | 8.14 | 0.02 | 0.007
PL | 0.29 | 0.117 | 0.098 | 2.12 | 0.155 | 0.023
Plastic | 20.32 | 11.71 | 20.77 | 8.62 | 25.91 | 2.29
PY | 0.578 | 0.09 | 0.075 | 0.012 | 0.029 | 0.019
Quake | 0.42 | 0.06 | 0.298 | 0.07 | 0.79 | 0.036
SN | 0.40 | 0.026 | 0.174 | 0.027 | 0.027 | 0.024
Stock | 302.43 | 180.89 | 12.23 | 12.23 | 25.18 | 1.53
Treasury | 9.91 | 11.16 | 15.52 | 2.02 | 1.89 | 0.51
Average | 26.43 | 19.62 | 12.84 | 9.19 | 12.28 | 5.48
Table 4. Experimental results for the classification datasets using a series of values for perturbation factor a.

Dataset | a = 0.001 | a = 0.005 | a = 0.01
Alcohol | 31.28% | 30.88% | 31.77%
Appendicitis | 15.27% | 15.77% | 15.57%
Australian | 21.00% | 21.03% | 20.77%
Balance | 12.95% | 13.13% | 13.29%
Cleveland | 50.82% | 50.64% | 51.11%
Circular | 4.19% | 4.01% | 4.08%
Dermatology | 36.13% | 36.81% | 36.67%
Hayes-Roth | 33.54% | 33.44% | 34.10%
Heart | 15.33% | 15.15% | 15.22%
HeartAttack | 18.52% | 18.80% | 18.88%
HouseVotes | 3.74% | 3.60% | 4.12%
Ionosphere | 7.39% | 7.38% | 7.29%
Liverdisorder | 27.92% | 28.27% | 28.49%
Lymography | 20.64% | 20.57% | 30.33%
Mammographic | 17.21% | 17.15% | 17.14%
Parkinsons | 15.35% | 14.35% | 15.23%
Phoneme | 16.62% | 16.74% | 16.00%
Pima | 23.59% | 24.09% | 23.99%
Popfailures | 4.80% | 4.82% | 4.86%
Regions2 | 25.54% | 25.62% | 25.75%
Saheart | 29.64% | 29.93% | 29.22%
Segment | 40.83% | 41.41% | 41.96%
Sonar | 18.25% | 17.80% | 17.83%
Spiral | 22.52% | 22.00% | 22.27%
StatHeart | 19.52% | 19.35% | 19.58%
Student | 5.11% | 5.03% | 5.25%
Transfusion | 24.59% | 24.70% | 24.64%
Wdbc | 5.00% | 5.05% | 5.07%
Wine | 7.71% | 7.88% | 7.90%
Z_F_S | 3.16% | 3.66% | 2.79%
Z_O_N_F_S | 46.77% | 46.83% | 47.06%
ZO_NF_S | 3.63% | 3.63% | 3.63%
ZONF_S | 1.79% | 1.81% | 1.74%
ZOO | 4.50% | 6.87% | 4.60%
Average | 18.67% | 18.77% | 19.06%
Table 5. Experimental results for regression datasets using different values for parameter a.

Dataset | a = 0.001 | a = 0.005 | a = 0.01
Abalone | 5.10 | 5.12 | 5.10
Airfoil | 0.004 | 0.004 | 0.004
Auto | 9.68 | 9.80 | 9.89
Baseball | 86.19 | 77.46 | 81.97
BK | 0.153 | 0.043 | 0.11
BL | 0.0002 | 0.0002 | 0.0003
Concrete | 0.006 | 0.006 | 0.006
Dee | 0.16 | 0.16 | 0.16
Housing | 18.70 | 18.73 | 19.20
Friedman | 1.45 | 1.44 | 1.45
FA | 0.019 | 0.09 | 0.07
FY | 0.077 | 0.12 | 0.076
HO | 0.01 | 0.01 | 0.01
Laser | 0.003 | 0.003 | 0.003
MB | 5.49 | 0.56 | 0.48
Mortgage | 0.14 | 0.12 | 0.13
NT | 0.007 | 0.007 | 0.007
PL | 0.023 | 0.023 | 0.023
Plastic | 2.29 | 2.29 | 2.29
PY | 0.019 | 0.019 | 0.017
Quake | 0.036 | 0.036 | 0.036
SN | 0.024 | 0.025 | 0.024
Stock | 1.53 | 1.52 | 1.52
Treasury | 0.51 | 0.54 | 0.47
Average | 5.48 | 4.92 | 5.13
Table 6. Experimental results for the classification datasets using a series of values for parameter F.

Dataset | F = 1.5 | F = 3.0 | F = 5.0 | F = 10.0
Alcohol | 25.66% | 29.16% | 26.14% | 31.28%
Appendicitis | 16.30% | 14.57% | 15.50% | 15.27%
Australian | 23.53% | 22.27% | 20.81% | 21.00%
Balance | 15.12% | 13.32% | 12.68% | 12.95%
Cleveland | 51.81% | 51.41% | 50.70% | 50.82%
Circular | 4.75% | 4.15% | 4.52% | 4.19%
Dermatology | 36.69% | 36.48% | 36.39% | 36.13%
Hayes-Roth | 46.18% | 35.54% | 34.18% | 33.54%
Heart | 16.68% | 15.93% | 15.68% | 15.33%
HeartAttack | 27.39% | 20.38% | 19.03% | 18.52%
HouseVotes | 3.80% | 3.35% | 3.85% | 3.74%
Ionosphere | 12.92% | 8.27% | 7.41% | 7.39%
Liverdisorder | 30.48% | 29.27% | 28.48% | 27.92%
Lymography | 29.89% | 22.41% | 21.93% | 20.64%
Mammographic | 18.00% | 17.17% | 16.96% | 17.21%
Parkinsons | 18.25% | 17.18% | 15.90% | 15.35%
Phoneme | 17.27% | 15.88% | 15.90% | 16.62%
Pima | 24.54% | 24.17% | 24.05% | 23.59%
Popfailures | 7.07% | 5.35% | 5.01% | 4.80%
Regions2 | 26.07% | 26.02% | 25.78% | 25.54%
Saheart | 29.75% | 28.91% | 29.42% | 29.64%
Segment | 35.81% | 36.84% | 38.93% | 40.83%
Sonar | 24.68% | 19.25% | 16.98% | 18.25%
Spiral | 13.28% | 15.25% | 17.88% | 22.52%
StatHeart | 19.98% | 19.58% | 19.63% | 19.52%
Student | 6.14% | 6.30% | 5.92% | 5.11%
Transfusion | 25.45% | 25.23% | 25.19% | 24.59%
Wdbc | 4.94% | 4.92% | 4.90% | 5.00%
Wine | 10.90% | 9.37% | 8.51% | 7.71%
Z_F_S | 4.13% | 3.73% | 3.67% | 3.16%
Z_O_N_F_S | 46.00% | 45.61% | 46.57% | 46.77%
ZO_NF_S | 3.67% | 4.19% | 3.16% | 3.63%
ZONF_S | 2.59% | 2.37% | 2.06% | 1.79%
ZOO | 11.17% | 8.10% | 8.00% | 4.50%
Average | 20.32% | 18.88% | 18.58% | 18.67%
Table 7. Experimental results for regression datasets using different values for parameter F.

Dataset | F = 1.5 | F = 3.0 | F = 5.0 | F = 10.0
Abalone | 6.39 | 5.79 | 5.57 | 5.10
Airfoil | 0.004 | 0.004 | 0.004 | 0.004
Auto | 9.83 | 9.69 | 9.67 | 9.68
Baseball | 84.27 | 83.01 | 84.57 | 86.19
BK | 0.275 | 0.048 | 0.071 | 0.153
BL | 0.41 | 0.0005 | 0.0003 | 0.0002
Concrete | 0.006 | 0.006 | 0.006 | 0.006
Dee | 0.16 | 0.16 | 0.16 | 0.16
Housing | 16.75 | 17.82 | 18.07 | 18.70
Friedman | 6.59 | 3.85 | 1.67 | 1.45
FA | 0.054 | 0.03 | 0.053 | 0.019
FY | 0.216 | 0.246 | 0.332 | 0.077
HO | 0.01 | 0.01 | 0.01 | 0.01
Laser | 0.022 | 0.011 | 0.005 | 0.003
MB | 0.116 | 0.135 | 0.307 | 5.49
Mortgage | 0.56 | 0.59 | 0.36 | 0.14
NT | 0.007 | 0.007 | 0.007 | 0.007
PL | 0.024 | 0.023 | 0.023 | 0.023
Plastic | 2.37 | 2.33 | 2.31 | 2.29
PY | 2.33 | 0.049 | 0.022 | 0.019
Quake | 0.036 | 0.036 | 0.036 | 0.036
SN | 0.028 | 0.04 | 0.025 | 0.024
Stock | 1.45 | 1.51 | 1.49 | 1.53
Treasury | 0.50 | 0.59 | 0.43 | 0.51
Average | 5.52 | 5.25 | 5.22 | 5.48
Table 8. Experimental results using the proposed method and a series of numbers of generations for the classification datasets.

Dataset | N_g = 25 | N_g = 50 | N_g = 100 | N_g = 200
Alcohol | 38.19% | 34.98% | 32.45% | 31.28%
Appendicitis | 15.53% | 15.40% | 16.00% | 15.27%
Australian | 23.92% | 23.77% | 22.56% | 21.00%
Balance | 19.08% | 16.71% | 13.57% | 12.95%
Cleveland | 51.08% | 50.85% | 51.26% | 50.82%
Circular | 4.76% | 4.94% | 5.13% | 4.19%
Dermatology | 36.12% | 36.17% | 35.81% | 36.13%
Hayes-Roth | 37.85% | 35.56% | 33.56% | 33.54%
Heart | 15.58% | 15.62% | 15.52% | 15.33%
HeartAttack | 20.28% | 19.14% | 18.89% | 18.52%
HouseVotes | 3.46% | 3.81% | 3.78% | 3.74%
Ionosphere | 9.16% | 8.23% | 7.95% | 7.39%
Liverdisorder | 27.77% | 28.23% | 28.26% | 27.92%
Lymography | 19.62% | 20.26% | 20.62% | 20.64%
Mammographic | 16.99% | 16.90% | 16.92% | 17.21%
Parkinsons | 14.58% | 14.51% | 13.76% | 15.35%
Phoneme | 17.29% | 17.45% | 17.21% | 16.62%
Pima | 23.83% | 24.09% | 23.87% | 23.59%
Popfailures | 4.93% | 4.61% | 4.70% | 4.80%
Regions2 | 25.65% | 25.74% | 25.60% | 25.54%
Saheart | 28.50% | 29.23% | 29.07% | 29.64%
Segment | 41.81% | 42.77% | 43.65% | 40.83%
Sonar | 21.77% | 20.45% | 19.60% | 18.25%
Spiral | 31.82% | 28.09% | 26.07% | 22.52%
StatHeart | 20.05% | 20.09% | 19.93% | 19.52%
Student | 4.43% | 4.62% | 5.21% | 5.11%
Transfusion | 24.82% | 24.72% | 24.39% | 24.59%
Wdbc | 4.97% | 5.10% | 5.25% | 5.00%
Wine | 7.80% | 7.37% | 7.69% | 7.71%
Z_F_S | 3.52% | 3.55% | 3.59% | 3.16%
Z_O_N_F_S | 49.56% | 48.04% | 47.77% | 46.77%
ZO_NF_S | 4.19% | 3.97% | 3.77% | 3.63%
ZONF_S | 2.22% | 1.92% | 2.00% | 1.79%
ZOO | 8.57% | 8.40% | 7.60% | 4.50%
Average | 19.99% | 19.57% | 19.21% | 18.67%
Table 9. Experimental results using the proposed method and a series of numbers of generations for the regression datasets.

Dataset | N_g = 25 | N_g = 50 | N_g = 100 | N_g = 200
Abalone | 6.08 | 5.74 | 5.65 | 5.10
Airfoil | 0.004 | 0.004 | 0.004 | 0.004
Auto | 10.37 | 9.71 | 9.60 | 9.68
Baseball | 89.23 | 90.91 | 81.42 | 86.19
BK | 0.028 | 0.032 | 0.10 | 0.153
BL | 0.001 | 0.0003 | 0.0006 | 0.0002
Concrete | 0.007 | 0.007 | 0.006 | 0.006
Dee | 0.16 | 0.16 | 0.17 | 0.16
Housing | 17.72 | 18.00 | 18.23 | 18.70
Friedman | 3.22 | 2.41 | 1.81 | 1.45
FA | 0.025 | 0.051 | 0.05 | 0.019
FY | 0.078 | 0.29 | 0.17 | 0.077
HO | 0.009 | 0.009 | 0.009 | 0.01
Laser | 0.004 | 0.003 | 0.003 | 0.003
MB | 0.15 | 0.37 | 0.27 | 5.49
Mortgage | 0.29 | 0.23 | 0.48 | 0.14
NT | 0.006 | 0.006 | 0.007 | 0.007
PL | 0.02 | 0.018 | 0.016 | 0.023
Plastic | 2.33 | 2.32 | 2.22 | 2.29
PY | 0.02 | 0.018 | 0.016 | 0.019
Quake | 0.036 | 0.036 | 0.036 | 0.036
SN | 0.026 | 0.025 | 0.025 | 0.024
Stock | 2.29 | 1.96 | 1.98 | 1.53
Treasury | 0.82 | 0.75 | 0.75 | 0.51
Average | 5.54 | 5.55 | 5.13 | 5.48