Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures

Kovács, László

doi:10.3390/math13091471

Open AccessFeature PaperArticle

Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures

by

László Kovács

Institute of Informatics, University of Miskolc, H-3515 Miskolc, Hungary

Mathematics 2025, 13(9), 1471; https://doi.org/10.3390/math13091471

Submission received: 13 March 2025 / Revised: 23 April 2025 / Accepted: 25 April 2025 / Published: 30 April 2025

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

Download

Browse Figures

Versions Notes

Abstract

The radial basis function architecture and the multilayer perceptron architecture are very different approaches to neural networks in theory and practice. Considering their classification efficiency, both have different strengths; thus, the integration of these tools is an interesting but understudied problem domain. This paper presents a novel initialization method based on a distance-weighted homogeneity measure to construct a radial basis function network with fast convergence. The proposed radial basis function network is utilized in the development of an integrated RBF-MLP architecture. The proposed neural network model was tested in various classification tasks and the test results show superiority of the proposed architecture. The RBF-MLP model achieved nearly 40 percent better accuracy in the tests than the baseline MLP or RBF neural network architectures.

Keywords:

redial basis function; neural networks; parameter initialization; density-based entropy

MSC:

68T05

1. Introduction

The radial bases function (RBF) network is an important network architecture, introduced in 1988 [1]. In RBF neuron forward processing [2], the positions of the objects are based on the distances from the given centroids. The neurons correspond to spherical areas in the object space, and the associated activation value depends on the distance to the centroid of the radial neurons.

According to the analysis performed in [3,4,5], the main benefits of the RBF network can be summarized as follows:

relatively simple architecture,
low parameter complexity,
fast learning process,
strong tolerance to input noise,
good generalization ability.

The conclusion of the comparison of the RBF network and traditional MLP networks ([4]) shows that:

RBF networks are especially recommended for surface with regular peaks and valleys;
For classification problems, traditional neural networks can usually provide better classification results;
RBF networks perform more robustly and tolerantly than traditional neural networks when dealing with noisy input data.

In traditional RBF networks, the activation at node i is calculated with a Gaussian function:

a_{i} (x) = e^{- α_{i} | | x - c_{i} | |},

where

c_{i}

denotes the centroid of the neuron. The parameter

α_{i}

corresponds to an inverse radius value that determines the slope of the activation curve.

The RBF network has a three-layered architecture, including:

an input layer, which represents the feature vector of the input items;
a middle layer of radial neurons;
an output layer, where each neuron corresponds to an output category.

The layers in the network are fully connected. The baseline forward process is performed in two phases. First, the feature vector of the objects is transformed into a vector of activation values of the radial neurons, then this vector is sent to a standard perceptron layer. The output signal in the output layer is computed based on a linear weighted sum of the outputs of the radial neurons. An architecture comparison of the RBF network and the baseline MLP network is presented in Figure 1.

Due to the benefits of RBF networks, we can find many publications in the literature on the practical application of RBF networks. In [5], the network is applied for dynamic system design, where the authors presented novel learning methods using error correction with the second-order gradient approach. The robust decision surfaces and the ability to estimate the distance a test case is from the original training data were the main reasons to apply RBF for fault diagnosis (see [6]). In [7], a rain prediction framework was developed via integration of evolutionary optimization methods with RBF backpropagation methods. Another special application field is presented in [8], where a novel RBF-based neural network method was applied to solve elliptic boundary value problems. There are many other application areas in the literature, such as time series prediction [9], earth science [10], human–computer interfaces [11], or engineering [12].

In the literature, the efficient construction of the RBF neural network architecture is an actively investigated research domain. Considering the key research projects on RBF, we can identify the following key research directions:

form of the kernel function,
initial positions of the centroids,
learning algorithm.

One of the recent key results [13] shows that large RBF networks with suitable optimizers, regularization techniques, can achieve predictive performance comparable to gradient-boosted decision trees.

Although in current research activities on neural network technology, the MLP and convolutional neural models are the dominating methods, the number of research works on RBF network models has still been increasing in recent years [14]. This fact shows that, thanks to the recognized benefits, the RBF model plays a stable and important role in the family of neural network models. The current RBF applications are based on the overriding principles of k-means variant clustering and two-phase parameter optimization. Another interesting fact, that there is a large separation between the RBF and other traditional neural network models, is that the RBF is considered as a model with a fixed, relatively rigid architecture. It is widely assumed that the RBF has a weaker classification accuracy but better stability.

Considering the initial positions of the centroids, there are the following main initialization approaches [15] in the literature:

uniform or random distribution,
unsupervised clustering methods,
supervised clustering (usually the decision tree method),
evolutionary optimization algorithms (here, the GA is the dominating approach).

Regarding the radius parameter of the radial neurons, the usual solution is to use a single

α

value for all neurons. In the more sophisticated variants, the following solutions are used:

separate $α$ values for the different neurons,
using hyper-basis functions.

Hyper-basis functions are defined with the following formula [16]:

a_{i} (x) = e^{- α_{i} \cdot {(x - c_{i})}^{T} \cdot R_{i} \cdot (x - c_{i})},

where

R_{i}

is a positive definite matrix that determines the contour of the related border hyper-ellipsoid.

Considering the initialization of the

α

value, the most general solution is to set the value to the mean of distances to the nearest centroid [15].

Regarding the weight parameters of the links that connect the radial and output layers, the dominant solution is the application of the standard backpropagation training algorithm [17]. A more special algorithm was presented in [15], where the calculation of the weights between the middle and output layers was performed with a least mean square error procedure using Bayesian decision optimization, which assumes that the output of the middle layer forms a normal distribution. Another interesting approach is presented in [3], and a new kernel function is proposed, which is a composite of a set of sigmoidal functions that aims at approximating a given function with nearly constant values.

The primary motivation of our research was to analyze the integration of RBF and MLP neural network models, with the aim of developing an efficient hybrid model that takes advantage of the strengths of both approaches. The main contributions of this work are:

The introduction of a new measure for the evaluation of centroid positions. The measure value is based on the homogeneity and density of the neighborhood region.
The development of a novel initialization method for RBF networks.
The development of a novel network architecture combining RBF and MLP modules.
The implementation and presentation of test cases that show the superiority of the proposed architecture.

2. Rbf Centroid Initialization Based on Distance-Weighted Homogeneity

2.1. RBF Centroid Initialization Methods

The simplest approach is the random initialization approach, usually using a uniform distribution. The main drawback of this approach is that the real category distribution in the object space is not uniform. According to the test results in [18], random initialization of the centers could lead to misclassification and biased approximation, providing weaker precision than the other approaches.

In the case of unsupervised clustering, the k-means method is the dominant variant [19]. Although random positions are generated in the first phase, in later steps, the positions are improved by the optimization algorithm to minimize the intra-cluster distances:

D = \sum_{i}^{N} {(d_{E} (x_{i}, c_{x}))}^{2},

where

x_{i}

is an object vector in the feature space and

c_{x}

is the centroid to which

x_{i}

belongs. The symbol

d_{E}

denotes the Euclidean distance. In the baseline versions, the k-means clustering focuses on finding dense areas in the feature space where the category assignments of the objects have no role in the positioning of the centroids. In this sense, clustering is an unsupervised optimization approach. In the literature, we can later find approaches in which the clustering is performed in separate steps for each category [15]. This means that centroids represent homogeneous clusters in the object space. Another extension is to apply some other heuristics in the clustering process, such as the immunity-based approach (immunological center selection, [18]). This method uses an affinity measure that is inversely proportional to the distance and generates centers with high affinity values using center cloning and pruning operations.

The execution complexity of the k-means clustering method can be approximated as follows:

O (N \cdot K \cdot D \cdot I),

where N is the number of objects; D is the dimensionality of the object space; K is the number of clusters; and I is the iteration count [20]. The value of I is hard to predict, as it is significantly dependent on the distribution of the data and the initial position of the centroids. One of the key novelties of the k-means++ method is that it applies an optimized centroid initialization method. Although k-means clustering is a widely accepted method, there are some shortcomings that can degrade its performance. One issue is that it is suitable only for clusters with homogeneous density values; its assumption of spherical and equally sized clusters results in insensitivity to the category distribution of the objects.

In the case of the k-means method, the objects are assigned to the nearest centroid, independently of the distances to the other centroids. If the difference between the shortest distance and the second-shortest distance is small, this crisp decision will eliminate significant information. In the proposed fuzzy version of the k-means, the c-means method, the nodes belong to more different clusters at the same time. As the management of these relationships requires more calculations, the complexity of c-means is higher than the complexity of the k-means method:

O (N \cdot K^{2} \cdot D \cdot I) .

In general, the c-means method has similar problems as the k-means and it is also sensitive to local optimums.

Tree structures can also be used for clustering. The R-tree is one of the most widely used methods for this task. It constructs nested rectangle areas as clusters. The main cost factor in building the R-tree [21] is the splitting of a cluster into two sub-clusters. The cost complexity can be approximated as follows:

O (N \cdot l o g (K) \cdot D) .

In some versions of tree structures, category homogeneity is also involved in the node split operation. Tree construction is a relatively fast method, but the generated partitioning is of lower quality.

In the family of decision trees, the most popular choice is the application of the C4.5 method [22]. In the tree, every leaf node is associated with a rectangle area whose center is selected as the centroid. In the case of optimal decision trees, the set of objects assigned to the nodes has optimum homogeneity. The homogeneity is usually measured with some variants of entropy on the category labels. Thus, the decision tree approach strives to position the centroids in homogeneous clusters.

In some versions of the tree structures, the category homogeneity is also involved in the node split operation. Tree construction is a relatively fast method, but the generated partitioning is of lower quality.

In [23], the optimal positions of the centroids are calculated with a clustering algorithm, APC-III. This method is a variant of quality threshold clustering, as it assigns the items to the nearest cluster if the distance is below a threshold; otherwise, the item will be considered as center of a new cluster. In order to avoid heterogeneous clusters, the clustering is performed separately for each category.

As shown in the analyses performed [24], supervised clustering provides many advantages over standard unsupervised methods. One simple approach to consider the category is to extend the object feature vectors with a new dimension describing the category label [25]. One disadvantage of this method is the difficulty in normalizing the values of two vector sections. Another method is the application of output context clustering [26]. The algorithm first defines the output contexts and then clusters the inputs with their respective output contexts. This method guarantees that all elements of a given cluster belong to the same category.

This method was extended by [24], where a module was introduced to determine the optimal number of clusters for a regression problem. The goal of the extension is to separate similar input objects belonging to the same category from the rest of the dataset. The method first generates disjoint intervals of the value set. The objects are then assigned to the corresponding value interval. The sets obtained are homogeneous in terms of output value (category). For each category object set, fuzzy c-means clustering is invoked to find the optimal initial positions of the related centroids. As objects of the same category are usually distributed in many local clusters, the main task of the proposed module is to find an appropriate set of sub-clusters for every category. The optimization module applies a novel separability factor to find the appropriate subcluster number.

Regarding evolutionary methods, we can see a wide variety of heuristic methods, such as the genetic algorithm [27] or swarm optimization methods [28]. Some of the main drawbacks of this approach are the following:

high execution cost,
low stability.

The analyses performed found in the literature show that the appropriate initial location of the centroids is a key factor in the efficiency of the redial basis neurons. A important step in the development of an efficient RBF unit is to find an optimal position in the object space.

The main problem of all traditional initialization methods is that they are insensitive to the category distribution. Considering, for example, a uniform or a baseline k-means initialization, the methods do not take into account the category labels. If the centroid is located in an inhomogeneous zone, the generated activation signal has a low category separation power, and it cannot significantly improve the classification accuracy of the neural network. On the other hand, if the spherical area of the redial function neuron includes a homogeneous zone, the unit has high activation signal output only for the related single category; thus, this unit has a high separation power.

As existing methods lack the ability to locate the optimum position for a wide range of problem domains in an efficient way, the present work focuses on the presentation of a novel approach for centroid initialization to find homogeneous dense zones in the object space.

2.2. Initialization of the Centroids with the DH Measure

In our investigation, we focus only on the classification problem, where objects are assigned to discrete categories. First, we introduce a homogeneity measure to evaluate the different positions in the feature space. The proposed d-entropy (or density-based entropy) is used to measure the homogeneity related to a given position in the feature space T.

D_entropy. Having a set of labeled objects

T {(x_{i}, c_{i})}

, where

x_{i} \in R^{D}

is the position (feature vector) of the object and

c_{i} \in C

is a category label, the d_entropy at position

x \in R^{D}

is calculated as follows:

d_e n t r o p y (x) = - \sum_{i \in 1 . . L} w_{i}^{x} \cdot l o g (w_{i}^{x})

where i denotes a category index and

L = | C |

. The value of

w_{i}^{x}

is given as follows:

w_{i}^{x} = \frac{f_{i}^{x}}{\sum_{j} f_{j}^{x}}

f_{i}^{x} = \sum_{u \in T_{i}} e x p (- γ \cdot d (x, u))

T_{c} \subseteq T

denotes the objects belonging to category c, and

d (x, y)

is the selected distance (usually Euclidean distance) function in the feature space.

For the demonstration of the density entropy, let us take a one-dimensional object space with the following data distribution:

category A: [0.1, 0.15, 0.18, 0.20, 0.35, 0.43]
category B: [0.3, 0.4, 0.5, 0.6, 0.65, 0.8]

The proposed measure D_entropy(x) shows the homogeneity of the category in the neighborhood of x. The shape of the calculated homogeneity measure for various gamma values is illustrated in the following figures (Figure 2 and Figure 3). In the figures, the two bottom lines of the circles show the positions of the data items. As we can see, the homogeneity value is significantly dependent on the gamma factor. If gamma is near zero, the neighborhood is very large, and each position has the same homogeneity value. The higher the gamma value, the smaller the neighborhood area. The proposed RBF initialization method will focus on locating homogeneous areas, where the (low D_entropy) measure is minimal, to position the RBF centers. The main idea behind this step is that the accuracy of the RPB prediction is better in homogeneous areas than in inhomogeneous areas.

The introduced D_entropy measure has a low value if the neighborhood of the argument position is homogeneous regarding the category labels. The base features of the D_entropy measure can be summarized on the basis of the following properties.

Proposition 1.

If the object set is homogeneous, only one category is present, and the value of D_entropy is equal to zero for every position

x \in R^{D}

.

This property is based on the fact that

L = 1

and

w_{1}^{x}

= 1.0.

Proposition 2.

For every object space and position:

\sum_{i} w_{i}^{x} = 1

This statement follows directly from the definition.

Proposition 3.

In a finite bounded object space, if

γ \to 0

then the D_entropy approaches the Shannon entropy of the object set.

In this case, we obtain

e x p (- γ \cdot d (x, u)) \to 1

and thus

f_{i}^{x} \to | T_{i} |

and

w_{i}^{x} \to p_{i}

Using the limit value of

w_{i}^{x}

, we obtain the Shannon entropy value.

Proposition 4.

If for every

u \in T

object in the object set

d (x, u) = d_{0}

then the D_entropy(x) value is equal to the Shannon entropy of the object set.

Similarly to the previous proposition, we obtain

f_{i}^{x} \to β | T_{i} |

and

w_{i}^{x} \to p_{i}

Thus, we obtain the Shannon entropy as the result.

As we can see, the proposed measure has some formal similarity with the Shannon entropy concept, as

both have similar formulas,
the distribution of $w_{i}^{x}$ is similar to the probability distributions,
both can be used to find homogeneous sets.

The factor

γ

determines the radius of the sphere of the effective zones. If

γ

is too small, there is a large effective zone and all positions in the space tend to be of similar importance. The other extreme is when

γ

is too large; in this case, the effective zone is restricted to a single point.

The D_entropy measure can be used to show the homogeneity level of the different locations in the object space, but this measure alone is not powerful enough to reach the main goal, namely to locate dense and homogeneous areas. Thus, in the next step, we add a density component to the measure to improve efficiency.

DH-measure: For a set of labeled objects

T {(x_{i}, c_{i})}

, where

x_{i} \in R^{D}

, the DH measure at position

x \in R^{D}

, (

μ_{D H} (x)

) is defined as

μ_{D H} = \frac{1 + d_e n t r o p y {(x)}^{β}}{D (x)}

where

D (x) = \sum_{j} f_{j}^{x}, f_{i}^{x} = \sum_{u \in T_{i}} e x p (- γ \cdot d (x, u)) .

and

β

is a weighting factor.

The expression

D (x)

can be considered as a measure of density homogeneity in the feature space. In the optimal position, the value

μ_{D H}

is at minimum, i.e., the density measure is high and the D_entropy part is low.

Example 1.

In the next figures (Figure 4 and Figure 5), a simple two-dimensional feature space is tested to illustrate the shape of the D_entropy measure for different γ factors. The objects in the dataset are organized into blobs. There are only two categories; the related objects are denoted by + or o markers. The figures show the related heat maps for different γ values. We can observe that the larger γ values represent steeper slopes for the D_entropy function. In the figures, the light colors show high density values, while dark colors denote small density areas.

Example 2.

In the next examples (Figure 6 and Figure 7), the heat map of the D_entropy measure is illustrated for the same object space. We can also observe here the effect of the γ value on the shape of the heat map.

Example 3.

In the third example (Figure 8 and Figure 9), the heat map of the DH measure is illustrated for the same object space. In the figures, the white circle shows the position of the optimum value.

The measures introduced will be used to find the optimal initial centroid positions of the neurons of the radial basis function. As full scanning of the whole object space is usually impractical, we present an approximation method in the next section.

2.3. Initialization of the Radial Function Neurons

In order to avoid the full scan of the entire object space, the proposed method applies a widely used simplification approach, namely, the search is not performed on the whole universe

R^{D}

, only

{x \in T}

, the set of object positions tested. We assume that the number of centroids required is fixed and is denoted by K. If we were to take the first K objects having the best DH-measure values, we would usually obtain positions near each other, which would degrade the efficiency of the RBF layer. In order to avoid this risk, an additional parameter

ϵ

is introduced which corresponds to a threshold value: the distance between two centroids must be greater than

ϵ

. The selection of the centroid objects is carried out using the following algorithm (Figure 10).

Calculate the DH measurements for all elements of the set of objects and introduce O as the set of candidate objects. Initially, O is equal to the set of objects in T.
Select the object with the minimal DH-measure value as the first centroid.

$s_{1} = {argmin}_{x \in O} {μ_{D H} (x)}$

$S = {s_{1}}$
If the number of centroids in S is equal to K or o is empty, then the algorithm terminates.
Remove the $ϵ$ -neighborhood of the centroid recently selected from O ( $s_{i}$ ).

$O = O ∖ {x \in O | d (x, s_{i}) < ϵ}$
Select the object with the minimal DH-measure value as the next centroid.

$s_{i} = {argmin}_{x \in O} {μ_{D H} (x)}$

$S = S \cup {s_{i}}$
Go to Step 3

Based on the literature, the standard step after initialization of the centroid positions is the application of the iterative (usually back-propagation) algorithm to optimize the weight values between the RBF layer and the output layer. As the process starts with a random weight distribution in the perceptron layer, this process takes more time to find the appropriate weight setting.

2.4. Weight Initialization of the Perceptron Layer

In our opinion, the process of weight learning can be sped up with a method based on the following considerations. If we find the optimal centroid position, then its neighborhood will contain homogeneous objects that belong dominantly to a given category c. Thus, for every centroid, we can determine a dominating category, or more precisely, we can calculate a weight value for all categories based on the values

w_{i}^{x}

used in the calculation of the D_entropy measure. In the proposed method, we use these values

w_{i}^{x}

as initial weight values for the edges between the RBF and the output layer. With this kind of initial setting, the RBF neuron will send a strong signal to the category neurons that dominate its neighborhood. Thus, in other words, if the test position x is in the neighborhood of the centroid s and the relative frequency values of the categories

(c_{1}, \dots, c_{m})

are equal to

(w_{1}, \dots, w_{2})

, then the importance of the centroid output in the different output neurons is proportional to

(w_{1}, \dots, w_{2})

; thus, the dominant categories in the RBF centroid will obtain the highest input signal.

The initialization of the edge weights is performed in the following steps.

Insert the centroid positions of the RBF layer into $S C$
Calculate the $w_{i}$ values for each centroid $s \in S C$ :

$w_{i}^{s} = \frac{f_{i}^{s}}{\sum_{j} f_{j}^{s}}$

$f_{i}^{s} = \sum_{u \in T_{i}} e x p (- γ \cdot d (s, u))$
Set the edge weight values equal to the $w_{i}^{s}$ values.

In the proposed novel two-phase initialization model, the first phase is to find the optimal centroid positions using the HD-measure as the objective function. In the second phase, the weights of the outgoing edges are set equal to the relative frequency vector values calculated in the first phase optimization process. The main promise of this method is that it can ensure better initial accuracy of RBF network classification.

Proposition 5.

The probability of correct prediction is higher for the proposed bounded two-phase initialization method than in the case of random weight initialization.

To show the correctness of this proposition, let us take an RBF layer with centroids

Z = (z_{1}, z_{2}, \dots, z_{K})

. The input target object is denoted by x, and we assume that x belongs to the neighborhood of

z_{1} \in Z

. The set of categories is given by

C = (c_{1}, \dots, c_{L})

. We assume that the probability order at z is

P (c_{1}) > P (c_{2}) > \dots P (c_{L})

. The activation signal of neuron

z_{i}

is denoted by

s_{i}

,

S = < s_{1}, s_{2}, \dots, s_{K} >

. The related relative frequencies at centroid

z_{i}

are

w_{c_{1}}^{i}, w_{c_{2}}^{i}, \dots, w_{c_{L}}^{i}

. The weights for the outgoing edges are denoted by

v_{c_{1}}^{i}, v_{c_{2}}^{i}, \dots, v_{c_{L}}^{i}

. If the

v_{j}^{i}

values are generated uniformly randomly, the

V^{1} < v_{1}^{1}, \dots, v_{1}^{K} >

,

V^{2} < v_{2}^{1}, \dots, v_{2}^{K} >

,…,

V^{L} < v_{L}^{1}, \dots, v_{L}^{K} >

vectors are independent and belong to the same value domain, having the same value distribution.

The output of the category j in the output layer is calculated with:

o_{j} = \sum_{i} s_{i} \cdot v_{j}^{i} = S \cdot V^{j} .

This means that the expected value of the output values

o_{j}

for the different categories has the same value. This means that each category has the same chance to be the winner at the output layer. In this case, each category has the same

1 / L

probability of being the prediction value.

In the case of the DH-measure approach, the weights are equal to the density factors:

V^{i} = W^{i} = < w_{i}^{1}, \dots, w_{i}^{K} > .

Taking into account the output values, we can use the following formula:

o_{j} = s_{1} * w_{j}^{1} + \sum_{i = 2}^{K} s_{i} \cdot w_{j}^{i} = s_{1} \cdot w_{j}^{1} + S^{'} \cdot V^{' j}

As the

V^{' j}

vectors are independent and correspond to the same value distribution, the average value for

S^{'} \cdot V^{' j}

is the same for each category. On the other hand, as

s_{1} > s_{2} > \dots > s_{K}

and

w_{1}^{1} > w_{2}^{1} > \dots > w_{K}^{1}

we have

s_{1} \cdot w_{1}^{1} > s_{1} \cdot w_{2}^{1} > \dots > s_{1} \cdot w_{L}^{1}

This means that for an average case, the prediction for x is the category

c_{1}

. As this category has the highest chance in the region of

z_{1}

, the expected accuracy value is higher than for the random case, where each category has the same probability of being selected as the winner.

2.5. Parameter Optimization

The efficiency of the position initialization process and the classification prediction significantly depends on the two key parameters of the proposed algorithm, namely the gamma factor (

γ

) and the cluster count (K). In order to provide an optimal parameter value selection, we performed a combined hill climbing optimization process. In the first step, we determine the reasonable value range using the following heuristic considerations:

If gamma is too small, near 0, all positions have very similar fitness values.
If gamma is too large, the neighborhood is restricted to only a few other points.
If K is too small, we obtain a weak approximation.
if K is too large, the execution cost and overfitting increases.

According to our experience, the optimal parameter values depend on the data distribution; thus, we involved different data distributions in the optimization tests. For the gamma parameter, we selected two starting points in the optimization process:

g_{1} = 1, g_{2} = 100

. In both cases, we obtained nearly the same results with

g_{o p t} = 10

. We can remark that this gamma value also provided the fastest convergence in the training process of the constructed neural network. Regarding the K parameter, the minimal value was 10, and the largest value was 100. Here, as expected, the

K = 100

setting provided the best convergence in the training phase, but considering the overall cost values, we found that K = 50 was the optimal cluster count for the investigated datasets.

2.6. Test Results

The main goal of the next experiments was to compare the classification accuracy of the proposed DH-based RBF models with the standard RBF initialization variants. In the tests, the following RBD variants were involved:

RBF-KM: baseline RBF structure with k-means initialization,
RBF-DT: RBF structure with decision tree initialization,
RBF-FCM: RBF structure with fuzzy c-means initialization,
RBF-DH: the proposed RBF model.

Regarding the test datasets, the following benchmark databases were involved in the experimental tests:

Iris dataset (M:4, C:3,N: 150 source: https://www.kaggle.com/datasets/arshid/iris-flower-dataset, accessed on 12 October 2024);
Mushroom dataset (M:22, C:7, N: 8124, source: https://archive.ics.uci.edu/ml/datasets/mushroom, accessed on 12 October 2024);
Abalone dataset (M:8, C:28, N:4177, source: https://archive.ics.uci.edu/ml/datasets/Abalone, accessed on 14 October 2024);
Blobs dataset (synthetic dataset).

In the list, the symbol M denotes the number of attributes, C denotes the number of categories, and N shows the number of records in the dataset. In the case of the Uniform dataset, a uniform random distribution in a hypercube was used to generate the objects. The objects in the Blobs dataset are distributed in clusters using the make_blobs routine in the sklearn Python package.

The test framework was implemented in the Python infrastructure. The main goal of the performed tests was to compare the accuracy level of the existing and novel approaches on some widely used benchmark datasets.

The test results are summarized in Figure 11, Figure 12, Figure 13 and Figure 14. As we can see from the result data, the proposed initialization method provides a fast learning rate even after the first few epochs. For datasets with a clear cluster structure, the proposed method provides faster and more efficient learning. The benefits of this approach are very significant, especially for well-clustered object distributions such as the case for the Blob dataset.

3. Integration of RBF and MLP Architectures

MLP neural network architectures are two alternatives that have different benefits and drawbacks. The RBF architecture is strong, with fast convergence, prediction stability, and easy explainability. On the other hand, MLP is better in universal global optimization, managing more complex problems with higher accuracy. Based on these considerations, we performed an investigation of the integration of these two types of layers into an integrated neural network architecture.

For the analysis, we introduced and tested the following novel architectures.

3.1. RBF-MLP Version 1

In this architecture, the input of the MLP hidden layer includes both the standard input layer and the output of the embedded RBF network section (Figure 15). The signals of the RBF output and of the input layer are first concatenated into a larger signal vector. The MLP hidden layer processes this merged signal and forwards the generated activation values to the output layer with the sigmoid activation function. The main motivation of this approach is that the RBF outputs can be used in deeper layers to improve the approximation accuracy. One key problem in the application of the RBF output vector as input for additional layers is that the RBF network significantly shrinks the volume of the object space. For example, the vector

(0, 0, \dots, 0)

is not part of the output space, as it is impossible to be at 0 distance to many different positions at the same time.

3.2. RBF-MLP Version 2

In the second tested version (Figure 16), the RBF and MLP modules run in parallel. Both will generate its prediction, and these outputs are aggregated in the last output layer. In the investigated version, we applied the max aggregator to calculate the output prediction values. A key problem of this approach is that the two modules use different methods to calculate the output; thus, the same output value may represent different importance in the two components.

3.3. RBF-MLP Version 3

In the third version investigated (Figure 17), a weighting parameter is involved in order to scale the two output vectors. A custom layer is created, which multiplies the incoming values with a weight scalar value. In this approach, the applied weight factor is equal to the current validation accuracy of the two modules (RBF and MLP modules). This means that training of RBF-MLP Version 3 is performed in the following steps.

Initialization of the RBF and MLP components;
Repetition of a fixed length training epochs. In each repetition phase:
- train both RBF and MLP modules,
- evaluate both on the same validation dataset,
- set the multiplication factor to the measured accuracy of the corresponding neural network module,
- train the whole network.

The main motivation of this architecture is to ensure that the module with better results in the previous training phase will obtain higher priority in the voting system.

4. Test Experiments on the Integrated Architecture

The main goal of the experimental tests is to compare the classification accuracy of the proposed integrated models with the standard variants, RBF and MLP. In this section, we use the following abbreviations to denote the different network architectures.

Existing architectures:

MLP: baseline MLP structure with a single hidden layer,
RBF-KM: baseline RBF structure with k-means initialization.

Investigated novel architectures:

RBF-DH: the proposed RPF model;
RBF-MLP-V1: integration of proposed RBF with MLP version 1;
RBF-MLP-V2: integration of proposed RBF with MLP version 2;
RBF-MLP-V3: integration of proposed RBF with MLP version 3;
RBF-MLP-V4: integration of proposed RBF with MLP version 4.

The RBF-MLP-V4 version contains two separate networks, an RBF and a MLP module. For a given object x, the engine calls the RBF neural network if the object x is near a centroid (

θ

—distance threshold); otherwise, the MLP unit is used for prediction. The measured values are presented in Figure 18, Figure 19, Figure 20, Figure 21, Figure 22, Figure 23, Figure 24, Figure 25 and Figure 26. The accuracy values presented in the results are the calculated average values based on 20 measurements. Our focus is on investigating the effect of the proposed initialization method, especially at the beginning of the training process. Therefore, in our experiments, the test accuracy was measured as a function of the preceding epoch steps. Thus, we selected five epoch lengths (2, 6, 20, 60, 200), performed training with these epoch settings, and called a test evaluation after the training phases. For the tests, we used 20% of the available datasets, and 80% was used for training and validation.

In Figure 18, the dataset is the synthetic Blob object set in 10-dimensional space. The test was executed with

γ = 10

. Each dataset is represented with one figure related to the common epoch range. The title of the figures shows the dataset name, the number of dimensions, and the

γ

values. These figures relate only to a part of the performed tests; the total number of perfomed tests was 122. In these tests, we analyzed, in addition to the epoch dependency, among others, the

γ

-dependency and the K-dependency. The full list of the test results can be found in URL_meresek.

In order to summarize the dataset-level results, a relative aggregated accuracy value is introduced. This value is calculated in the following way:

For each test that relates to a given parameter setting and covers all architecture models, relative accuracy values are calculated. This value is equal to the ratio between the absolute accuracy and the absolute accuracy of the MLP method.
Sum the relative accuracy values over all tests.

The resulting relative accuracy values are presented in Table 1 and Table 2. The first table shows the aggregation over all tests, while the second table relates only to the tests with small epoch numbers (2, 6, 20). In the result tables, the MLP method is used as baseline for the relative accuracy values. As the presented tables illustrate, the RBF-MLP-V3 architecture dominates in both test settings. The relative accuracy improvement is nearly 40% in all test settings.

The results of the tests can be summarized as follows.

The proposed initialization method is superior to the standard RBF neural network initialization methods (random, k-means, c-means, decision trees, etc.), especially at low epoch numbers.
The RBF neural network using the proposed initialization heuristic on the DH-measure method coupled with edge weight adjustment provided one of the best accuracy values.
The integration of the RBF module with the MLP modules provided the best accuracy result. The main motivation for this architecture is that RBF can provide fast localization of dense homogeneous clusters, and the learning process on RBF and MLP can further improve the accuracy level of the integrated system.
The selection of the right value for the $γ$ parameter is a crucial step in the initialization of the network. It can be seen that the appropriate value $γ$ also depends on the size of the actual object space. In our tests, where the positions of the objects were normalized into a unit cube, the best results were related to the setting $γ = 10$ (see Figure 27).
Regarding the k-dependency, the performed tests show (see Figure 28) that the number of RBF neurons significantly influences the accuracy value. In the Figure on the Maternal dataset, the legend Y_maternal_n is for the proposed RBF network with n RBF neurons. In the tested case, the increase in size to $K = 100$ did not significantly improve the results compared to a lower size of $K = 50$ . The $Y_K M$ curve denotes the test of the RBF with a baseline architecture of k-means.

5. Conclusions

Although RBF neural networks provide good performance in terms of their fast learning process and noise tolerance, they are not widely used to solve larger problem domains. MLP networks are effective at solving complex problems but typically require longer training times. The main goal of this work was to investigate the integration of these two models to combine the benefits of these two approaches. The main contributions of this paper are as follows.

Introduction of a novel RBF initialization method to increase the learning rate in the first few epochs. The proposed method uses a novel distance-weighted homogeneity measure as an objective function for centroid positioning.
Application of a novel two-phase initialization approach in which the category density in the region of the actual centroid is converted into a vector of weight values of the outgoing edges.
Integration of the RBF and MLP layers within a unified network model, where the role of the RBF layer is to speed up convergence during the training process.

The results of the tests performed show the clear benefits of the proposed architecture as follows:

The RBF neural networks with the proposed initialization method are superior to the other standard methods from the viewpoint of fast training of the neural network.
Fast training can mean less energy consumption, which is a key factor in green computing.
The integration of the RBF and MLP architectures can provide an optimal architecture that covers a wider spectrum of application domains.
The RBF module can improve the explainability of the container neural network.

The proposed model is mainly beneficial if the dataset contains dense and homogeneous local clusters. The engine detects these clusters and positions the centroids to these clusters.

One of the open problems we did not touch on in the investigations is the selection of the optimal parameters for K and

γ

. Both influence the overall accuracy of the host system, but the exact effect remains uncovered. This aspect requires further investigation.

Funding

This research received no external funding.

Data Availability Statement

Three open benchmark datasets were used in the tests: - Iris dataset (source: https://www.kaggle.com/datasets/arshid/iris-flower-dataset accessed on 12 March 2025); - Mushroom dataset (source: https://archive.ics.uci.edu/ml/datasets/mushroom accessed on 12 March 2025); - Abalone dataset (source: https://archive.ics.uci.edu/ml/datasets/Abalone accessed on 12 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Broomhead, D.S.; Lowe, D. Multivariable functional interpolation and adaptive networks. Complex Syst. 1988, 2, 321–355. [Google Scholar]
Moody, J.; Darken, C.J. Fast learning in networks of locally-tuned processing units. Neural Comput. 1989, 1, 281–294. [Google Scholar] [CrossRef]
Lee, C.-C.; Chung, P.-C.; Tsai, J.-R.; Chang, C.-I. Robust radial basis function neural networks. Cybernetics 1999, 29, 674–685. [Google Scholar]
Xie, T.; Yu, H.; Wilamowski, B. Comparison between traditional neural networks and radial basis function networks. In Proceedings of the 2011 IEEE International Symposium on Industrial Electronics, Gdansk, Poland, 27–30 June 2011; pp. 1194–1199. [Google Scholar]
Yu, H.; Xie, T.; Paszczynski, S.; Wilamowski, B.M. Advantages of radial basis function networks for dynamic system design. IEEE Trans. Ind. Electron. 2011, 58, 5438–5450. [Google Scholar] [CrossRef]
Leonard, J.A.; Kramer, M.A. Radial basis function networks for classifying process faults. IEEE Control Syst. Mag. 1991, 11, 31.38. [Google Scholar]
Wu, J.; Long, J.; Liu, M. Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm. Neurocomputing 2015, 148, 136–142. [Google Scholar] [CrossRef]
Liu, C.Y.; Ku, C.Y. A novel ANN-based radial basis function collocation method for solving elliptic boundary value problems. Mathematics 2023, 11, 3935. [Google Scholar] [CrossRef]
Chng, E.S.; Chen, S.; Mulgrew, B. Gradient radial basis function networks for nonlinear and nonstationary time series prediction. IEEE Trans. Neural Netw. 1996, 7, 190–194. [Google Scholar] [CrossRef]
Fidencio, P.H.; de Andrade, J.C. Determination of organic matter in soils using radial basis function networks and near infrared spectroscopy. Anal. Chim. Acta 2002, 453, 125–134. [Google Scholar] [CrossRef]
Oglesby, J.; Mason, J.S. Radial basis function networks for speaker recognition. In Proceedings of the ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, Toronto, ON, Canada, 14–17 April 1991; pp. 393–396. [Google Scholar]
Papadimitrakis, M.; Alexandridis, A. Active vehicle suspension control using road preview model predictive control and radial basis function networks. Appl. Soft Comput. 2022, 120, 108646. [Google Scholar] [CrossRef]
Määttä, J.; Bazaliy, V.; Kimari, J.; Djurabekova, F.; Nordlund, K.; Roos, T. Gradient-based training and pruning of radial basis function networks with an application in materials physics. Neural Netw. 2021, 133, 123–131. [Google Scholar] [CrossRef]
Ku, C.-Y.; Liu, C.-Y.; Chiu, Y.-J.; Chen, W.-D. Deep Neural Networks with Spacetime RBF for Solving Forward and Inverse Problems in the Diffusion Process. Mathematics 2024, 12, 1407. [Google Scholar] [CrossRef]
Schwenker, F.; Kestler, H.A.; Palm, G. Three learning phases for radial-basis-function networks. Neural Netw. 2001, 14, 439–458. [Google Scholar] [CrossRef]
Girosi, F.; Jones, M.; Poggio, T. Regularization theory and neural networks architectures. Neural Comput. 1995, 7, 219–269. [Google Scholar] [CrossRef]
Musavi, M.T.; Ahmed, W.; Chan, K.H.; Faris, K.B.; Hummels, D.M. On the training of radial basis function classifiers. Neural Netw. 1992, 5, 595–603. [Google Scholar] [CrossRef]
De Castro, L.N.; Von Zuben, F.J. Automatic determination of radial basis functions: An immunity-based approach. Int. J. Neural Syst. 2001, 11, 523–536. [Google Scholar] [CrossRef]
Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C Appl. Stat. 1979, 28, 100–108. [Google Scholar] [CrossRef]
Thakur, P.S.; Verma, R.K.; Tiwari, R. Analysis of time complexity of K-means and fuzzy C-means clustering algorithm. Eng. Math. Lett. 2024, 2024, 4. [Google Scholar]
Arge, L.; Berg, M.D.; Haverkort, H.; Yi, K. The priority R-tree: A practically efficient and worst-case optimal R-tree. ACM Trans. Algorithms 2008, 4, 1–30. [Google Scholar] [CrossRef]
Chauhan, H.; Chauhan, A. Implementation of decision tree algorithm c4. 5. Int. J. Sci. Res. Publ. 2013, 3, 1–3. [Google Scholar]
Hwang, Y.S.; Bang, S.Y. An efficient method to construct a radial basis function neural network classifier. Neural Netw. 1997, 10, 1495–1503. [Google Scholar] [CrossRef] [PubMed]
Wang, D.; Zeng, X.-J.; Keane, J.A. A clustering algorithm for radial basis function neural network initialization. Neurocomputing 2012, 77, 144–155. [Google Scholar] [CrossRef]
Kim, E.; Park, M.; Ji, S.; Park, M. A new approach to fuzzy modelling. IEEE Trans. Fuzzy Syst. 1997, 5, 328–337. [Google Scholar]
Pedrycz, W. Conditional fuzzy clustering in the design of radial basis function neural network. Neural Netw. 2002, 9, 601–612. [Google Scholar] [CrossRef] [PubMed]
Harpham, C.; Dawson, C.W.; Martin, R.B. A review of genetic algorithms applied to training radial basis function networks. Neural Comput. Appl. 2004, 13, 193–201. [Google Scholar] [CrossRef]
Feng, H.M. Self-generation RBFNs using evolutional PSO learning. Neurocomputing 2006, 70, 241–251. [Google Scholar] [CrossRef]

Figure 1. Architectures of the RBF and MLP neural networks.

Figure 2. D_entropy map for

γ = 1

.

Figure 2. D_entropy map for

γ = 1

.

Figure 3. D_entropy map for

γ = 20

.

Figure 3. D_entropy map for

γ = 20

.

Figure 4. Density map for

γ = 1

.

Figure 4. Density map for

γ = 1

.

Figure 5. Density map for

γ = 10

.

Figure 5. Density map for

γ = 10

.

Figure 6. D_entropy map for

γ = 0.1

.

Figure 6. D_entropy map for

γ = 0.1

.

Figure 7. D_entropy map for

γ = 10

.

Figure 7. D_entropy map for

γ = 10

.

Figure 8. DH-measure map for

γ = 0.1, β = 2

.

Figure 8. DH-measure map for

γ = 0.1, β = 2

.

Figure 9. DH-measure map for

γ = 3, β = 2

.

Figure 9. DH-measure map for

γ = 3, β = 2

.

Figure 10. Algorithm of initialization of centroid positions.

Figure 11. Accuracy curve, Blob, D = 10,

γ

= 100.

Figure 11. Accuracy curve, Blob, D = 10,

γ

= 100.

Figure 12. Accuracy curve, Iris, D = 4,

γ

= 10.

Figure 12. Accuracy curve, Iris, D = 4,

γ

= 10.

Figure 13. Accuracy curve, Mushroom, D = 22,

γ

= 10.

Figure 13. Accuracy curve, Mushroom, D = 22,

γ

= 10.

Figure 14. Accuracy curve, Abalone, D = 8,

γ

= 10.

Figure 14. Accuracy curve, Abalone, D = 8,

γ

= 10.

Figure 15. Architecture of the RBF-MLP V1 neural network.

Figure 16. Architecture of the RBF-MLP V2 neural network.

Figure 17. Architecture of the RBF-MLP V3 neural network.

Figure 18. Accuracy curve, Blob, D = 10,

γ

= 100.

Figure 18. Accuracy curve, Blob, D = 10,

γ

= 100.

Figure 19. Accuracy curve, Iris, D = 4,

γ

= 10.

Figure 19. Accuracy curve, Iris, D = 4,

γ

= 10.

Figure 20. Accuracy curve, Uniform, D = 3,

γ

= 10.

Figure 20. Accuracy curve, Uniform, D = 3,

γ

= 10.

Figure 21. Accuracy curve, Mushroom, D = 22,

γ

= 10.

Figure 21. Accuracy curve, Mushroom, D = 22,

γ

= 10.

Figure 22. Accuracy curve, Maternal, D = 6,

γ

= 10.

Figure 22. Accuracy curve, Maternal, D = 6,

γ

= 10.

Figure 23. Accuracy curve, Abalone, D = 8,

γ

= 10.

Figure 23. Accuracy curve, Abalone, D = 8,

γ

= 10.

Figure 24. Accuracy curve, Adult, D = 14,

γ

= 10.

Figure 24. Accuracy curve, Adult, D = 14,

γ

= 10.

Figure 25. Accuracy curve, Adult_risk, D = 26,

γ

= 10.

Figure 25. Accuracy curve, Adult_risk, D = 26,

γ

= 10.

Figure 26. Accuracy curve, Diabetes, D = 8,

γ

= 10.

Figure 26. Accuracy curve, Diabetes, D = 8,

γ

= 10.

Figure 27. Accuracy curve for different

γ

values using RBF-DH network on iris and blob datasets.

Figure 27. Accuracy curve for different

γ

values using RBF-DH network on iris and blob datasets.

Figure 28. Accuracy curve for different K values on maternal dataset.

Table 1. Relative and absolute aggregated accuracy of the investigated methods.

Method	Relative Accuracy	Absolute Accuracy
MLP	100%	60.3%
RBF-KM	94%	57%
RBF-DT	97%	58.5%
RBF-FCM	95%	57.3%
RBF-DH	125%	75.4%
RBF-MLP-V1	111%	67%
RBF-MLP-V2	133%	80.2%
RBF-MLP-V3	139%	84%
RBF-MLP-V4	122%	73.7%

Table 2. Relative and absolute aggregated accuracy of the investigated methods for the 2, 6, 20 epoch lengths.

Method	Relative Accuracy	Absolute Accuracy
MLP	100%	53.6%
RBF-KM	95%	50.3%
RBF-DT	98%	52.5%
RBF-FCM	94%	50.32%
RBF-DH	139%	70.7%
RBF-MLP-V1	114%	61%
RBF-MLP-V2	135%	72.3%
RBF-MLP-V3	140%	75%
RBF-MLP-V4	135%	72.3%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kovács, L. Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures. Mathematics 2025, 13, 1471. https://doi.org/10.3390/math13091471

AMA Style

Kovács L. Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures. Mathematics. 2025; 13(9):1471. https://doi.org/10.3390/math13091471

Chicago/Turabian Style

Kovács, László. 2025. "Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures" Mathematics 13, no. 9: 1471. https://doi.org/10.3390/math13091471

APA Style

Kovács, L. (2025). Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures. Mathematics, 13(9), 1471. https://doi.org/10.3390/math13091471

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures

Abstract

1. Introduction

2. Rbf Centroid Initialization Based on Distance-Weighted Homogeneity

2.1. RBF Centroid Initialization Methods

2.2. Initialization of the Centroids with the DH Measure

2.3. Initialization of the Radial Function Neurons

2.4. Weight Initialization of the Perceptron Layer

2.5. Parameter Optimization

2.6. Test Results

3. Integration of RBF and MLP Architectures

3.1. RBF-MLP Version 1

3.2. RBF-MLP Version 2

3.3. RBF-MLP Version 3

4. Test Experiments on the Integrated Architecture

5. Conclusions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI