A Hyperparameter Self-Evolving SHADE-Based Dendritic Neuron Model for Classification

Abstract: In recent years, artificial neural networks (ANNs), which are based on the foundational model established by McCulloch and Pitts in 1943, have been at the forefront of computational research. Despite their prominence, ANNs face a number of challenges, including hyperparameter tuning and the need for vast datasets. Because many strategies have predominantly focused on enhancing the depth and intricacy of these networks, the processing capabilities of individual neurons are occasionally overlooked. Consequently, a model emphasizing a biologically accurate dendritic neuron model (DNM) that mirrors the spatio-temporal features of real neurons was introduced. However, while the DNM shows outstanding performance in classification tasks, it struggles with complexities in parameter adjustment. In this study, we introduce the hyperparameters of the DNM into an evolutionary algorithm, thereby transforming the setting of the DNM's hyperparameters from manual adjustment to adaptive adjustment as the algorithm iterates. The newly proposed framework represents a neuron that evolves alongside the iterations, thus simplifying the parameter-tuning process. Comparative evaluation on benchmark classification datasets from the UCI Machine Learning Repository indicates that our minor enhancements lead to significant improvements in the performance of the DNM, surpassing other leading-edge algorithms in terms of both accuracy and efficiency. In addition, we analyzed the iterative process using complex networks, and the results indicate that the information interaction during the iteration and evolution of the DNM follows a power-law distribution. This finding may provide insights for the study of neuron model training.


Introduction
In 1943, McCulloch and Pitts introduced a mathematical representation of neural activity, laying the foundation for ANNs [1]. Although ANNs have gained prominence, they come with challenges, such as hyperparameter tuning complexities, the necessity for abundant labeled training datasets, excessive model refinement, and inherent opaqueness stemming from their black-box nature [2,3]. Interestingly, these approaches often overlook the essence of deep neural networks: the efficient data-processing capability of individual neurons. Instead, they frequently boost model performance through statistical theories or intricate learning strategies. Some researchers have also chosen to reconceptualize neuron models rather than merely deepening neural networks [4]. Furthermore, the McCulloch-Pitts neuron model, which represents the connection strength between two neurons using only a weight, has faced criticism for its simplification [5].
A real biological neuron possesses intricate spatial and temporal features. Drawing inspiration from the neuron's ability to process temporal information, a unique and biologically accurate dendritic neuron model (DNM) was proposed. This model stands out due to its distinct architecture and excitation functions, incorporating sigmoid functions for synaptic interactions and a multiplication operation to emulate dendritic interactions [6].
As a new model, there is ample scope for refining and improving the DNM. For addressing classification problems through combinatorial optimization with algorithmically trained artificial neurons, it is evident that utilizing more advanced algorithms in conjunction with sophisticated neuron models can substantially enhance performance. Recently, an upgraded iteration of the DNM has emerged as a promising choice for training neurons, demonstrating superior outcomes in classification tasks. Notably, among these advancements, the refined DNM-R model has achieved the highest classification accuracy when paired with the same algorithm [7]. Nonetheless, this optimization approach comes with a notable drawback: the DNM employed as the training model encompasses numerous parameters requiring adjustment. Typically, this involves a minimum of three parameters, namely k, q, and M, which correspond to the amplification factor, the discrimination factor, and the number of dendritic branches, respectively. In prior investigations, k and q were often assigned 5 candidate values each for tuning experiments, while M sometimes entailed as many as 20 candidate values. From the perspective of orthogonal experiments, this implies that the parametric experiments alone would require a minimum of 500 repetitions. This imposes a substantial burden, both in research endeavors and in practical applications. Therefore, it is imperative to introduce a methodology capable of adaptively optimizing the model parameters.
Given that the training issue associated with the neuron model may constitute an NP-hard problem [8], evolutionary algorithms (EAs) emerge as a potent solution. The genesis of EAs can be traced back to the genetic algorithm (GA) [9], which subsequently spurred the creation of various algorithms, including differential evolution (DE) [10] and success-history-based parameter adaptation for differential evolution (SHADE) [11].
A key enhancement of DE over GA is its mutation strategy, which leverages differences among individuals instead of mere random variations. In turn, SHADE refines the DE approach by differentially linking offspring to the optimal parent individuals. Moreover, parameters from top-performing individuals are preserved across iterations to guide the learning of subsequent generations. The efficacy of DE-derived algorithms has been well documented [12], with their enhanced versions frequently securing leading spots in the IEEE CEC contests [13].
Various adaptation strategies can be employed in optimization processes, including random variations in crossover and variance rates using probability distributions such as the normal and Cauchy distributions, a common practice in many differential evolution algorithms [14]. Additionally, some research endeavors have explored the utilization of fitness-distance balance strategies to adaptively fine-tune algorithmic parameters [15,16]. These investigations have, to varying degrees, shown that adaptive strategies not only obviate the need for meticulous parameter tuning but also contribute to enhanced algorithmic performance. The implementation of suitable adaptive strategies is particularly pertinent in the context of optimizing real-world problems, such as classification tasks. It is important to recognize that the training process of algorithms on artificial neurons essentially involves iterating their weights, constituting an adaptive process in itself. Consequently, the algorithm itself can be construed as an efficient adaptive instrument endowed with nonlinear properties, setting it apart from conventional mathematical techniques. Advanced algorithms are inherently designed to tackle intricate black-box problems, which typically necessitate robust exploitation and exploration capabilities. These attributes render them well suited for the adaptive adjustment of hyperparameters. In essence, it is advisable to treat the hyperparameters within artificial neurons as variables subject to iteration within the algorithmic framework, leveraging the algorithm's evolutionary prowess to fine-tune these hyperparameters.
In this study, we have integrated the key hyperparameters of the DNM with SHADE. We have implemented an adaptive hyperparameter-adjustment approach, leveraging the inherent evolutionary capabilities of the algorithm. The resultant novel optimization framework is denoted hyperparameter-tuning success-history-based parameter adaptation for differential evolution (HSHADE), with a type of neuron that can self-evolve as the algorithm iterates. We conducted a comprehensive evaluation of HSHADE using a benchmark comprising 10 real-world problems commonly employed for assessing algorithmic performance in classification tasks. Comparative analyses were performed against the original algorithm, well-established algorithms with a track record of effectiveness, and contemporary state-of-the-art algorithms on the same problem set. The findings conclusively demonstrate that HSHADE exhibits a notable advantage in classification accuracy and significantly streamlines the parameter-tuning process.
The main contributions of this study are summarized below: (1) HSHADE successfully iterates the fixed parameters of the DNM adaptively using evolutionary algorithms, thus reducing the tuning workload. (2) HSHADE achieves the same or better accuracy than state-of-the-art algorithms on the same set of classification problems. (3) HSHADE maintains the fast problem-solving characteristics of a single neuron while also achieving very high accuracy. (4) The power-law distribution of information interaction networks observed during the iterative process of HSHADE provides new insights for future neural model training.
The remainder of the paper is structured as follows: the DNMs, EAs, and self-evolving DNM are formulated in Section 2. The experimental results are analyzed in Section 3. Section 4 presents the discussion and conclusions.

Materials and Methods
This section covers the use of DNMs for classification as well as the different types of EAs that can be used to optimize DNMs. We first introduce the structure of DNMs and their learning process, then describe the EA steps, including SHADE, which is used in this paper in conjunction with DNMs, and finally present the concept of self-evolving DNMs.

Dendritic Neuron Model
Figure 1 depicts the entire integrated framework of the dendritic neuron model, which closely mirrors the characteristics of biological neurons. A dendritic neuron can be divided into four different layers, with signals entering at the synaptic layer, passing to the dendrite layer, then conducting to the membrane layer, and finally reaching the soma layer for output. Meanwhile, Figure 2 depicts the DNM learning process, which includes five major steps: training algorithms, morphological conversion based on four connection states, dendritic pruning and its output, along with synaptic pruning and its output. Pruning can remove unnecessary dendrites and synapses, and it can also infer the number of dendrites, the connection positions between presynaptic axon terminals and dendrites, and how they are connected.

Synaptic Layer
Signals from axon terminals to dendrites are processed by the synaptic layer, and receptors on the postsynaptic cell take up the particular ions received; the potentials of these ions vary depending on the state of the synaptic connection, which can be inhibitory or excitatory. Here, {X_1, ..., X_i, ..., X_I} (i = 1, 2, ..., I) is the mathematical representation of the I external inputs from the presynaptic axon terminals, and the function of the synaptic layer can be denoted by the following sigmoid function [17]:

Q_ij = 1 / (1 + e^(-k(θ_ij X_i - p_ij)))    (1)

In the above equation, there are two synaptic parameters: the synaptic weight θ_ij, which represents the state of synaptic connectivity, either excitatory (θ_ij > 0) or inhibitory (θ_ij < 0), and the threshold parameter p_ij. Together, θ_ij and p_ij control the dendritic and axonal morphology of the DNM, and both can be trained. In addition, k represents the distance parameter. The quantity Q_ij represents the response of the jth (j = 1, 2, ..., J) postsynaptic dendrite to the ith presynaptic axon terminal.
As shown in Figure 2, by optimizing θ_ij and p_ij, the learning algorithm can imitate the synaptic plasticity mechanism. Furthermore, as demonstrated in Figure 3, learned synaptic plasticity can be realized by splitting synaptic connections into four different states; these four connection states reflect the neuron's morphology biologically by locally identifying the position of each dendrite and synapse during the morphology shift [18]. The states are as follows: (1) Constant-1 connection: the potential of the postsynaptic cell remains close to 1 despite the input varying between 0 and 1, when p_ij < 0 < θ_ij or p_ij < θ_ij < 0. (2) Constant-0 connection: the potential remains close to 0 despite the input varying between 0 and 1, when θ_ij < 0 < p_ij or 0 < θ_ij < p_ij. (3) Excitatory connection: the potential is always proportional to the input signal, as long as 0 < p_ij < θ_ij. (4) Inhibitory connection: the potential is always inversely proportional to the input signal, as long as θ_ij < p_ij < 0.
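To make the four states concrete, the following Python sketch classifies a trained synapse by its (θ_ij, p_ij) pair. The function name and the handling of boundary cases are our own illustrative choices, not part of the original model:

```python
def connection_state(theta, p):
    """Classify a trained DNM synapse into one of the four connection
    states from its weight theta and threshold p (illustrative sketch)."""
    if 0 < p < theta:
        return "excitatory"        # output proportional to the input
    if theta < p < 0:
        return "inhibitory"        # output inversely proportional
    if p < 0 < theta or p < theta < 0:
        return "constant-1"        # output stays near 1 for any input
    if theta < 0 < p or 0 < theta < p:
        return "constant-0"        # output stays near 0 for any input
    return "boundary"              # degenerate cases, e.g. theta == p
```

Because constant-1 and constant-0 synapses ignore their input, pruning can safely remove or simplify them, which is how the DNM infers its learned morphology.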

Dendritic Layer
The dendritic layer employs multiplication, which is considered the simplest nonlinear operation found in neurons [19] and is performed by each dendrite. In addition, the multiplication operator is equivalent to a logical AND operation when the dendritic layer receives constant-1 or constant-0 connections. As all signals transmitted by the synaptic layer are received by the dendrites in the dendritic layer, the function of the jth dendrite Y_j is as follows:

Y_j = ∏_{i=1}^{I} Q_ij    (2)

Membrane Layer
Then, the membrane layer captures all signals from the dendritic layer. Simultaneously, an accumulation operator similar to a logical OR operation is applied at the membrane layer, and the summed signal V is sent to the soma to stimulate the neuron. The equation is as follows:

V = ∑_{j=1}^{J} Y_j    (3)

Soma Layer
Finally, the soma body applies a sigmoid function to the membrane output V, with the goal of determining whether or not the neuron fires based on the overall model output. Here, q is the firing parameter, which ranges between 0 and 1, indicating the soma body's threshold. The formula is as follows:

O = 1 / (1 + e^(-k(V - q)))    (4)
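Putting the four layers together, a single DNM forward pass can be sketched in a few lines of NumPy. The array shapes and function name are our assumptions; k is reused in the soma sigmoid, following the text's note that k appears in both sigmoid functions:

```python
import numpy as np

def dnm_forward(x, theta, p, k, q):
    """One forward pass through the four DNM layers (illustrative sketch).
    x     : (I,)   input sample
    theta : (J, I) synaptic weights
    p     : (J, I) synaptic thresholds
    k, q  : scalar hyperparameters (distance and firing parameters)
    """
    # Synaptic layer: one sigmoid per synapse
    Q = 1.0 / (1.0 + np.exp(-k * (theta * x - p)))
    # Dendritic layer: multiplication along each branch
    Y = np.prod(Q, axis=1)
    # Membrane layer: summation over all branches
    V = np.sum(Y)
    # Soma layer: sigmoid firing decision against threshold q
    return 1.0 / (1.0 + np.exp(-k * (V - q)))
```

Note how the per-branch product makes any synapse in a near-zero state silence its whole branch, which is what gives the dendritic layer its AND-like behavior.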

Evolutionary Algorithms
Evolutionary algorithms (EAs) usually include population initialization, search operation, evaluation, and selection operation.In this section, these steps are described in detail, using SHADE as an example.
The assignment of initial parameters and the process of generating initial individuals in the solution space constitute the initialization operation. X_i represents the initialized individuals, i = 1, 2, ..., G, where G is the population size of SHADE. SHADE is an improved DE algorithm because it updates the values of F and C_R by following the Cauchy and normal distributions, respectively, while keeping track of the successful parameter history:

F_i = C(M_F,r, 0.1),  C_Ri = N(M_C,r, 0.1)    (5)

where C and N denote the Cauchy and normal distributions, and M_F,r and M_C,r are the memory archives of F and C_R. These adaptive parameters balance the exploitation and exploration capabilities of DE, allowing it to achieve excellent performance on standard test sets. The memory archives are updated in Equation (6):

M_F,r = (1 - c) · M_F,r + c · mean_L(S_F),  M_C,r = (1 - c) · M_C,r + c · mean_A(S_C)    (6)

where c is a random number in [0.05, 0.2], S_F and S_C record the successful values of F and C_R in the current generation, and mean_A is the weighted arithmetic mean. The Lehmer mean mean_L is computed as follows:

mean_L(S_F) = (∑_{F∈S_F} F²) / (∑_{F∈S_F} F)    (7)
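A minimal sketch of this parameter adaptation follows; the helper names, the inverse-CDF Cauchy sampling, and the unweighted Lehmer mean are our simplifications of the scheme described above:

```python
import numpy as np

def sample_parameters(M_F, M_CR, rng):
    """Draw F (Cauchy) and CR (normal) for one individual from a randomly
    chosen memory slot r, as in the success-history scheme (sketch)."""
    r = rng.integers(len(M_F))
    # Cauchy(M_F[r], 0.1) via inverse CDF; resample non-positive draws
    F = M_F[r] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    while F <= 0.0:
        F = M_F[r] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    F = min(F, 1.0)
    # Normal(M_CR[r], 0.1), truncated to [0, 1]
    CR = float(np.clip(rng.normal(M_CR[r], 0.1), 0.0, 1.0))
    return F, CR

def lehmer_mean(S_F):
    """Lehmer mean of successful F values, used to update the F memory."""
    S_F = np.asarray(S_F, dtype=float)
    return np.sum(S_F ** 2) / np.sum(S_F)
```

The Lehmer mean is deliberately biased toward larger F values, which counteracts the tendency of successful mutation factors to shrink over the run.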

Search Operation
The following equations show the search operation of SHADE, where t denotes the number of iterations, and r1 and r2 are two indices chosen at random from {1, 2, ..., G} that are distinct from each other and from i. X^t_α is a top-ranking member of the parent population, X^t_r1 is a random individual in the parent population, and Y^t_r2 is an individual chosen at random from the union of the present and older populations. F_i and C_Ri are the adaptation parameters:

U^t_i = X^t_i + F_i · (X^t_α - X^t_i) + F_i · (X^t_r1 - Y^t_r2)

V^t_i,j = U^t_i,j, if rand_j ≤ C_Ri or j = j_rand; otherwise X^t_i,j

where V^t_i is a temporary offspring individual whose retention in the new population requires selection through an evaluation operation, and X^t_i is the parent individual. j_rand is a random integer, serving as a safeguard in instances where all random values exceed C_Ri. Its primary function is to guarantee at least a single information exchange, thus optimizing the utilization of computational resources and preventing wastage.
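The search operation can be sketched as current-to-pbest/1 mutation followed by binomial crossover. The code below is illustrative only: the names are ours, and the distinctness checks on the random indices are simplified for brevity:

```python
import numpy as np

def shade_offspring(X, i, pbest, archive_union, F, CR, rng):
    """Generate one trial vector via current-to-pbest/1 mutation and
    binomial crossover (sketch; distinctness checks simplified).
    X             : (G, D) parent population
    pbest         : (D,)   a randomly chosen top-ranked parent
    archive_union : (G2, D) current population plus external archive
    """
    G, D = X.shape
    r1 = rng.integers(G)
    while r1 == i:                     # r1 must differ from i
        r1 = rng.integers(G)
    r2 = rng.integers(len(archive_union))
    # Mutation: differences pull X_i toward a good parent
    U = X[i] + F * (pbest - X[i]) + F * (X[r1] - archive_union[r2])
    # Binomial crossover with a guaranteed exchange at j_rand
    mask = rng.random(D) < CR
    mask[rng.integers(D)] = True       # j_rand safeguard
    return np.where(mask, U, X[i])
```

Even with CR near zero, the j_rand safeguard guarantees the trial vector inherits at least one mutated component, so no fitness evaluation is wasted on an exact copy of the parent.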

Evaluation Operation
The next step is the evaluation operation, in which the fitness values of all V^t_i are computed, and the individuals that outperform X^t_i are retained in the next generation by comparing f(V^t_i) with f(X^t_i) using a one-to-one greedy selection strategy. The evaluation function f(·) evaluates each individual's fitness in the population. The selection operation is formulated as follows:

X^{t+1}_i = V^t_i, if f(V^t_i) ≤ f(X^t_i); otherwise X^t_i

where X^{t+1}_i is the individual passed to the offspring generation. After the algorithm completes the selection operation, it proceeds to the next iterative search operation.
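A sketch of the one-to-one greedy selection for a minimization objective (function and variable names are ours):

```python
import numpy as np

def greedy_select(X, V, f):
    """One-to-one greedy selection: keep offspring V_i only if it is at
    least as fit as its parent X_i (minimization; illustrative sketch).
    X, V : (G, D) parent and trial populations
    f    : fitness function mapping a (D,) vector to a float
    """
    fX = np.array([f(x) for x in X])
    fV = np.array([f(v) for v in V])
    keep = fV <= fX                       # offspring wins ties
    return np.where(keep[:, None], V, X)  # row-wise choice
```

Because each trial vector competes only against its own parent, the population never loses its best-so-far solution, which keeps convergence monotone.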

Self-Evolving DNM
Unlike the weight parameters, the hyperparameters within the DNM framework are practically similar to the concept of a learning rate. Typically, these hyperparameters remain constant throughout the training process, with their values determined through repeated experimentation. It is worth noting that these hyperparameters often possess specific tuning ranges, and values falling outside these predefined ranges can significantly diminish the neuron's problem-solving capacity. Hence, it is entirely feasible to treat the hyperparameters as variables and incorporate them as additional dimensions within the iterative process of the algorithm. In general, the value of q in the DNM lies in the range [0, 1], and k, which is used twice in the two sigmoid functions, lies in the range [0, 20]. Given that SHADE employs upper and lower bounds of [−1, 1] for training the DNM on classification problems, the newly introduced variables are also normalized; this normalization is implemented to enhance their alignment with the algorithm. The value of k is scaled up by a factor of ten after each iteration to satisfy its value domain. Therefore, the ith individual of HSHADE is {X_i,1, ..., X_i,dim, q_i, k_i}, where dim is the number of weight dimensions. Before solving the problem using the DNM, its parameters and hyperparameters are extracted from the individual according to Equation (10).

Altering the number of branches M would result in a shift in the dimensionality of the algorithmic solution process, and employing variable-dimensional matrix operations typically necessitates specialized encoding techniques. Currently, extant variable-dimension algorithms exhibit subpar performance in the context of optimizing continuous problems. Consequently, there is no effective solution yet for the adaptive enhancement of M. The complete execution flow of HSHADE is demonstrated in Algorithm 1; its DNM-evaluation and parameter-update steps are excerpted below:

5: According to Equation (10), extract the parameters θ_i,j, p_i,j, q_i, k_i from X_i.
6: Substitute the parameters into the DNM.
7: Evaluate the classification performance f(X_i) of the ith DNM.
...
14: According to Equation (10), extract the parameters θ_i,j, p_i,j, q_i, k_i from V_i.
15: Substitute the parameters into the DNM.
16: Evaluate the classification performance f(V_i) of the ith DNM.
...
21: Obtain the new M_F,r using Equations (6) and (7).
22: Obtain the new M_C,r using Equation (6).
23: Obtain the new F_i and C_Ri using Equation (5).
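The extraction of DNM parameters from an HSHADE individual (the role played by Equation (10)) can be sketched as follows. The layout and the exact rescaling of q and k from [−1, 1] to their value domains are our assumptions, chosen to match the stated ranges [0, 1] and [0, 20]:

```python
import numpy as np

def decode_individual(x, I, M):
    """Split one HSHADE individual into DNM weights and hyperparameters
    (illustrative sketch of the Equation (10) extraction).
    x : (2*I*M + 2,) search vector with components in [-1, 1]
    I : number of inputs; M : number of dendritic branches
    """
    D = 2 * I * M
    theta = x[:I * M].reshape(M, I)   # synaptic weights
    p = x[I * M:D].reshape(M, I)      # synaptic thresholds
    q = (x[D] + 1.0) / 2.0            # firing parameter, mapped to [0, 1]
    k = 10.0 * (x[D + 1] + 1.0)       # distance parameter, mapped to [0, 20]
    return theta, p, q, k
```

Under this encoding, q and k evolve alongside the weights in every mutation and crossover step, which is exactly what makes the neuron "self-evolving".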

Results
In the experiments, the number of iterations is 30,000. All experiments were performed using MATLAB on a PC with a 3.00 GHz Intel(R) Core(TM) i7-9700 CPU and 36 GB of RAM. All experiments were run independently 30 times. The associated code will be made publicly available as detailed in the Data Availability Statement. Additionally, a Python-based implementation will be developed in subsequent work.

Dataset Description and Parameter Settings
To evaluate the efficacy of the introduced HSHADE in training the DNM, we utilized ten commonly employed classification datasets from the UCI Machine Learning Repository. These datasets encompass Tic-tac-toe, Heart, Australia, Congress, Vote, Spect, German, Breast, Ionosphere, and KrVsKpEW. Table 1 provides an overview of each dataset, detailing the number of attributes (inclusive of the class label), the total number of samples, and the DNM learning space dimension for each respective problem. It is important to emphasize that, in the table, k and q are hyperparameters that need to be adjusted when training the DNM using traditional algorithms. However, when training the DNM with HSHADE, only the hyperparameter M requires adjustment. In this study, M is an integer ranging from 1 to 20. In addition, obtaining the optimal hyperparameter combinations of the DNM in Table 1 for the various datasets requires extensive parameter discussion [5], which significantly consumes valuable computational resources.

Evaluation Criteria
The following assessment tools were used to gauge how well the algorithms performed: (1) Symbols "+", "=", and "−": the notation "+" is used when HSHADE outperforms another EA, "=" denotes comparable performance, and "−" indicates that HSHADE is inferior. The evaluation is grounded in the outcomes of the Wilcoxon rank-sum test.
(2) Wilcoxon rank-sum test (significance threshold p < 0.05): This non-parametric test assesses the null hypothesis that two sample sets are derived from an identical population.
After the 30 independent optimization runs, this test was applied to discern differences between HSHADE's results and those of the other meta-heuristics. The distinctions are symbolized by W/T/L. Here, W represents the count of problems where HSHADE notably surpasses the other method, T denotes those where HSHADE's performance aligns with the other method, and L marks problems where HSHADE lags behind. (3) Box-and-whisker plots: the top line illustrates the peak value, whereas the line at the base signifies the least value. The box's upper and lower boundaries correspond to the third and first quartiles, respectively. The central red line marks the median, with the red "+" symbol highlighting outliers. A wider gap between maximal and minimal values suggests more pronounced algorithmic variability.
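The W/T/L tally described above can be sketched with SciPy's rank-sum test. The function names are ours, and breaking significant differences by mean accuracy is our simplifying assumption:

```python
import numpy as np
from scipy.stats import ranksums

def wtl(results_a, results_b, alpha=0.05):
    """Tally W/T/L for algorithm A versus B over several problems using
    the Wilcoxon rank-sum test at significance alpha (sketch).
    results_a, results_b : per-problem arrays of per-run accuracies
    """
    w = t = l = 0
    for a, b in zip(results_a, results_b):
        _, pval = ranksums(a, b)
        if pval >= alpha:
            t += 1                      # no significant difference: tie
        elif np.mean(a) > np.mean(b):
            w += 1                      # A significantly more accurate
        else:
            l += 1                      # B significantly more accurate
    return w, t, l
```

The rank-sum test is non-parametric, so the tally does not assume the 30 accuracy samples per problem are normally distributed.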

Experimental Setup
Numerous optimization algorithms have been introduced for DNM training. In this study, we evaluate the performance of the proposed HSHADE against its contemporaries. These include IFDE [5], the winning algorithm in DNM training; L-SHADE [20], the top-performing algorithm in IEEE CEC2014; CJADE [21], a cutting-edge DE variant; SCJADE [22], an enhanced version of CJADE; BBO [23], known for its robust performance with the DNM; SASS [24], a newly introduced meta-heuristic; and BP [25]. Each algorithm was executed independently 30 times on an identical computer setup, adhering to the parameter specifications suggested in the respective papers. It is noteworthy that each dataset was split with 70% used for training and the remaining 30% designated for testing.

Performance Comparison of EA-Trained DNMs
In this part, we compare the performance of DNMs trained by EAs on the classification problems; Table 2 shows the results. The initial comparison involves BP, given that backpropagation coupled with gradient descent is the most frequently employed algorithm for neural network training. BP demonstrates a pronounced advantage in SpectEW's classification task, potentially attributable to its capacity for rapid convergence towards local optima. However, within a limited number of solution evaluations, meta-heuristic algorithms strive for stronger global-optimum-solving capabilities, leading to HSHADE significantly outperforming BP across multiple problems. HSHADE exhibits substantial superiority over both conventional algorithms and their enhanced counterparts. CJADE and SCJADE have previously been acknowledged as effective DNM training algorithms, boasting higher accuracy than most MHAs on a majority of problems; HSHADE possesses a substantial and decisive advantage over them. In a similar vein, HSHADE demonstrates enhanced performance when pitted against state-of-the-art algorithms such as L-SHADE and SASS. In past studies, SHADE's performance lagged significantly behind these two algorithms, underscoring the effectiveness of our improvements that enable HSHADE's impressive performance. Ultimately, HSHADE even surpasses the newest and most efficient IFDE, signifying that it not only conserves substantial computational resources in parameter tuning but also secures the top position in accuracy among contemporary training algorithms.
Table 3 shows the performance of DNMs trained by HSHADE on the classification problems. Results marked in bold represent the outcomes with the highest average accuracy on the given problem. It is evident that the variation in the number of branches has a noticeable impact on the results; therefore, conducting experiments to explore the effects of different branch numbers remains essential. The values of M employed by HSHADE represent the optimal outcomes achieved across these twenty experiments.
Table 4 shows the ablation experiment of HSHADE, where SHADE-R denotes that the original SHADE was used to train DNM-R, and SHADE-A denotes that HSHADE was used to train the original DNM. SHADE represents the result of training the original DNM using the original SHADE. Figures 4 and 5 show the box and convergence graphs of the ablation experiment. First and foremost, the convergence graph makes it evident that HSHADE significantly outperforms the other three approaches in convergence capability. It excels at discovering globally optimal solutions, consistently approaching closer results when the other methods begin to stagnate. However, it is important to note that in classification problems the issue of overfitting arises; thus, a better fit during training does not necessarily translate to higher accuracy during testing. To account for this, we generated box plots of all the test results. For instance, in the Tic-tac-toe problem, HSHADE not only exhibits the highest median accuracy but also achieves the highest single-run accuracy. On the other hand, in the Vote problem, while HSHADE's median accuracy falls slightly below that of the adaptation-only variant (SHADE-A), its optimal solution demonstrates significantly higher accuracy than the other methods. This underscores the practical value of HSHADE, as it consistently possesses the potential to achieve the best solution. The difference between SHADE's results in training the DNM and DNM-R is not particularly pronounced, and its level of accuracy varies across different problems. However, SHADE-A, which relies solely on hyperparameter adaptation, exhibits a significant drop in performance on certain problems; notably, it produces entirely incorrect training results for the Ionosphere problem. This issue arises from the introduction of two new dimensions, which significantly expands the algorithm's search space and adversely affects its original convergence capability. Therefore, hyperparameter adaptation should be used in conjunction with DNM-R. DNM-R, with its signal amplification capability, outperforms the DNM in problem sensitivity, thereby raising the algorithm's upper bound of convergence ability. On the other hand, the adaptive tuning of hyperparameters in HSHADE enhances the model's feature resolution compared with setting parameters to a single decimal place. In summary, the experimental data strongly support the effectiveness and indispensability of HSHADE's improvements, and HSHADE consistently demonstrates excellent performance.
However, varying the number of branches M changes the dimensionality of individuals, rendering standard matrix operations unfeasible. This, in essence, implies that while the HSHADE framework has made significant strides in auto-tuning certain parameters, automation of the M parameter remains elusive.
As we gaze into the future of this research, the overarching goal will pivot towards devising methodologies that can seamlessly and effectively auto-tune the M parameter. As previously highlighted, variations in M result in non-uniform dimensions among individuals within the population. In this context, the problem being optimized is termed a metameric variable-length problem [28]. Contemporary mainstream algorithms rely on matrix operations to generate new individuals, making them incompatible with the metameric variable-length problem setting. Essentially, this study serves as an intermediary step. Our primary objective was to incorporate all parameters (including M) into the individuals of the evolutionary algorithm, facilitating the self-evolution of these parameters; however, only the self-evolution of k and q was successfully achieved. Tackling this challenge would not only enhance the efficacy of the HSHADE framework but also further solidify its position as a game-changer in the domain of artificial neuron modeling and optimization.
In terms of validity threats, the sole variable examined in this study is the integration of hyperparameters into the algorithm's iterative process, with all other experimental conditions held constant. Consequently, there are no discernible threats to internal validity at present. Nonetheless, given that our selection was confined to datasets from the UCI Machine Learning Repository, future studies should broaden the spectrum of datasets used to address potential threats to external validity.

Figure 1. The structure of the dendritic neuron model.

Figure 2. The learning process of a DNM.

Figure 3. Connection cases of the synaptic layer.

Table 2. Performance of DNMs trained by EAs on the classification problem.

Table 3. Performance of DNMs trained by HSHADE on the classification problem.