A Neuron Model with Dendrite Morphology for Classification

Recent neurological studies have shown the importance of dendrites in neural computation. In this paper, a neuron model with dendrite morphology, called the logic dendritic neuron model (LDNM), is proposed for classification. This model consists of four layers: a synaptic layer, a dendritic layer, a membrane layer, and a soma body. After training, the LDNM is simplified by dedicated pruning mechanisms and is further transformed into a logic circuit classifier. Moreover, to address the challenge of high dimensionality, feature selection is employed as a dimension reduction method before training the LDNM. In addition, a heuristic optimization algorithm is employed as the learning method to speed up convergence. Finally, the performance of the proposed model is assessed on five high-dimensional benchmark classification problems. In comparison with six classical classifiers, the LDNM achieves the best classification performance on two of the five problems. The experimental results demonstrate the effectiveness of the proposed model, which offers a new perspective for solving classification problems.


Introduction
Biological neural networks are complex and are composed of a large number of interconnected neurons. A neuron is an electrically excitable cell that consists of three typical parts: a soma (cell body), dendrites, and an axon. Dendrites are filamentous and branch multiple times, constituting the dendritic tree of a neuron. An axon is a slender projection of a neuron. In general, a neuron receives signals via the synapses located on its dendritic tree and sends out processed signals down its axon. Inspired by the biological neuron, McCulloch and Pitts first mathematically proposed an artificial neuron model in 1943 [1]. This model worked as a linear threshold gate by comparing a predefined threshold with the sum of inputs multiplied by a set of weights. Later, Rosenblatt built on the artificial neuron model and developed the first perceptron [2]. However, these models are considered simplistic and lack flexible computational features. In particular, the nonlinear mechanisms of dendrites were not involved in these models [3].
Koch, Poggio, and Torre are the pioneers who investigated the nonlinear mechanisms of dendrites. They proposed a dendritic neuron model called the δ cell in [4,5]. This model was based on the nonlinear interactions between excitation and inhibition on a dendritic branch. Further studies [6][7][8] also made researchers aware of the importance of dendrites in neural computation. Subsequently, numerous neuron models based on dendritic computation were proposed. For example, Rhodes et al. proposed a model with apical dendrites to reproduce different neuronal firing patterns [9]. Poirazi et al. modeled a pyramidal neuron as a two-layer neural network with dendritic subunits.
In this paper, a neuron model with dendrite morphology, called the logic dendritic neuron model (LDNM), is proposed for classification. The LDNM has structural plasticity, and a trained LDNM can be simplified by means of synaptic pruning and dendritic pruning. It is worth emphasizing that a simplified LDNM can be further transformed into a logic circuit classifier (LCC), which consists only of digital components: comparators, NOT, AND, and OR gates. Compared to most conventional classification methods [44], the proposed LDNM has two novel features. First, for a specific classification problem, the architecture of the trained LDNM can give us some insights into how the classification results are concluded. Second, the trained LDNM can be transformed into an LCC. The classification speed of the LCC is greatly improved when it is implemented in hardware because it consists only of digital components. On the other hand, since the size of the LDNM is large when it is applied to high-dimensional classification problems, a heuristic optimization algorithm instead of a BP algorithm is employed to train the LDNM. In addition, feature selection is employed as the dimension reduction method to address the high-dimensional challenge. Finally, five high-dimensional benchmark classification problems are used to evaluate the performance of the proposed model. The experimental results evidence the high performance of the LDNM as a very competitive classifier.
The remainder of this paper is organized as follows. The characteristics of the proposed LDNM are described in Section 2. Section 3 presents the two training methods. The experimental studies are provided in Section 4. Finally, Section 5 presents the conclusions of this paper.

Proposed Model

Logic Dendritic Neuron Model
Previous neurological studies strengthened the opinion that the computation performed by dendrites is an indispensable part of neural computation [7,8]. Specifically, the mechanism of dendritic computation was investigated by Koch et al. in their pioneering works [5]. They suggested that the interactions between excitation and inhibition in synapses are nonlinear: an excitation is vetoed by a shunting inhibition that lies on the pathway between the excitation and the soma. In other words, logic operations exist on the branch. Furthermore, the interaction between two synaptic inputs is considered a logic AND operation, and the branching points execute a logic OR operation.
Inspired by the abovementioned dendritic mechanisms, we propose an LDNM with dendrite morphology in this study. The logic architecture of this model is displayed in Figure 1. Four layers compose the LDNM: a synaptic layer, a dendritic layer, a membrane layer, and a soma body. In detail, the synaptic layer receives and processes the input signals that originate in the axons of other neurons. Then, the AND operation is performed on each branch in the dendritic layer. Next, the branching points execute the logic OR operation in the membrane layer. Finally, the soma body processes the output of the membrane layer and sends the output signal to its axon. The proposed LDNM can be described mathematically as follows.

Synaptic Layer
The synaptic layer represents the connection where nerve signals are transmitted from a presynaptic neuron to a postsynaptic neuron. A sigmoid function is used to model the synaptic connection in a dendritic branch. The synaptic layer connecting the ith (i = 1, 2, ..., n) synaptic input to the jth (j = 1, 2, ..., m) dendritic branch is expressed as follows:

Y_ij = 1 / (1 + e^(−k (w_ij x_i − q_ij))) (1)

where x_i is the ith synaptic input and ranges in [0, 1], k is a positive constant and is set to 5, and w_ij and q_ij are synaptic parameters that need to be adjusted.

Dendritic Layer
Previous works [45] have shown the existence of multiplicative operations in neurons to process neural information. This idea is also adopted in the proposed model. The dendritic layer performs multiplication to imitate the interaction among synaptic signals on a dendritic branch. This operation is equal to the logic AND operation when the synaptic inputs are binary. The output of the jth dendritic branch Z_j is calculated as follows:

Z_j = Π_{i=1}^{n} Y_ij (2)

Membrane Layer
The membrane layer performs a sublinear summation of the results collected from all dendritic branches. This operation approximates a logic OR operation in the binary case. The output of the membrane layer V is defined as follows:

V = Σ_{j=1}^{m} Z_j (3)

Soma Body
Finally, the output signal of the membrane layer is sent to the soma body. The soma body generates a value of 1 when the input exceeds a specific threshold, and 0 otherwise. A sigmoid function is employed in this layer as follows:

O = 1 / (1 + e^(−k (V − γ))) (4)

where k is a positive constant and is set to 5, the threshold γ is set to 0.5 for the classification problems in this study, V is the output of the membrane layer, and O is the output of the soma body.
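Taken together, the four layers form a simple feed-forward computation. The sketch below implements that pass in NumPy, assuming the sigmoid/product/sum/sigmoid formulation described above; the function name `ldnm_forward` and the random initialization are illustrative, not from the source.

```python
import numpy as np

K = 5.0      # steepness constant k
GAMMA = 0.5  # soma threshold

def ldnm_forward(x, w, q):
    """x: (n,) inputs in [0, 1]; w, q: (n, m) synaptic parameters."""
    # Synaptic layer: sigmoid of w_ij * x_i - q_ij for each synapse
    y = 1.0 / (1.0 + np.exp(-K * (w * x[:, None] - q)))   # (n, m)
    # Dendritic layer: product over the synapses on each branch (soft AND)
    z = y.prod(axis=0)                                    # (m,)
    # Membrane layer: sum over branches (soft OR)
    v = z.sum()
    # Soma body: thresholded sigmoid output
    return 1.0 / (1.0 + np.exp(-K * (v - GAMMA)))

rng = np.random.default_rng(0)
n, m = 4, 8
w = rng.uniform(-1.5, 1.5, (n, m))  # initialization range from the paper
q = rng.uniform(-1.5, 1.5, (n, m))
out = ldnm_forward(rng.random(n), w, q)
assert 0.0 < out < 1.0
```

The product in the dendritic layer reduces to a logic AND and the membrane sum to a logic OR when the synaptic outputs saturate at 0 or 1.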

Structure Pruning
Structure pruning is an interesting mechanism of many neural networks [46]. A trained LDNM can also be simplified by dedicated pruning mechanisms. Investigating the synaptic connection described in Equation (1), the states of the synaptic connection can be approximately divided into four types, as shown in Figure 2: constant 1 connection, constant 0 connection, direct connection, and inverse connection. The connection state of a synapse is uniquely determined by its synaptic parameters w_ij and q_ij. Specifically, the threshold θ_ij of a synapse is defined as follows:

θ_ij = q_ij / w_ij (5)

In fact, θ_ij describes the position of the center of the sigmoid function (Equation (1)) on the X-axis, as shown in Figure 2. The four connection states correspond to six cases, described as follows.
Case 1. q < 0 < w, e.g., w = 1, q = −0.5; thus, θ = −0.5, as shown in Figure 2a. This case is a constant 1 connection because the output is always 1, regardless of x_i ranging in [0, 1].
Case 2. 0 < q < w, e.g., w = 1, q = 0.5; thus, θ = 0.5, as shown in Figure 2b. This case is a direct connection because a high-potential input leads to a high output, and vice versa.
Case 3. 0 < w < q, e.g., w = 1, q = 1.5; thus, θ = 1.5, as shown in Figure 2c. This case is a constant 0 connection because the output is always 0, regardless of x_i ranging in [0, 1].
Case 4. w < 0 < q, e.g., w = −1, q = 0.5; thus, θ = −0.5, as shown in Figure 2d. This case corresponds to a constant 0 connection.
Case 5. w < q < 0, e.g., w = −1, q = −0.5; thus, θ = 0.5, as shown in Figure 2e. This case is an inverse connection because a high-potential input leads to a low output, and vice versa.
Case 6. q < w < 0, e.g., w = −1, q = −1.5; thus, θ = 1.5, as shown in Figure 2f. This case corresponds to a constant 1 connection.
The values of w and q are all initialized in the range [−1.5, 1.5]. Consequently, a synapse is randomly connected to a dendritic branch.
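The six cases above reduce to a simple decision on the signs and ordering of w and q. A small helper (the name `connection_state` and the state labels are illustrative; it assumes w ≠ 0) can label a trained synapse accordingly:

```python
def connection_state(w, q):
    """Label a synapse with one of the four connection states,
    following the six cases above via theta = q / w."""
    if w > 0:
        if q < 0:
            return "constant-1"   # Case 1: theta < 0
        if q < w:
            return "direct"       # Case 2: 0 < theta < 1
        return "constant-0"       # Case 3: theta > 1
    if q > 0:
        return "constant-0"       # Case 4
    if q > w:
        return "inverse"          # Case 5: 0 < theta < 1
    return "constant-1"           # Case 6

# The worked examples from the six cases
assert connection_state(1.0, -0.5) == "constant-1"
assert connection_state(1.0, 0.5) == "direct"
assert connection_state(1.0, 1.5) == "constant-0"
assert connection_state(-1.0, 0.5) == "constant-0"
assert connection_state(-1.0, -0.5) == "inverse"
assert connection_state(-1.0, -1.5) == "constant-1"
```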
After training, each synaptic connection settles into one of the four connection states, as shown in Figure 3. Furthermore, owing to the AND logic operation in Equation (2), it is easy to conclude that the constant 1 and constant 0 synaptic connections play particular roles in the calculation of the output of a branch. More specifically, a constant 1 connection has no influence on the output of a branch. In contrast, the output of a branch is always 0 when a constant 0 synapse is connected to it. Thus, two pruning operations on a trained LDNM can be proposed. Figure 4a shows synaptic pruning, where a constant 1 synaptic connection is screened out. Figure 4b shows dendritic pruning, where a dendritic branch connected to a constant 0 synapse is screened out.
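The two pruning rules can be sketched as follows; the branch representation (a list of `(feature_index, state)` pairs) and the function name `prune` are illustrative, not from the source:

```python
def prune(branches):
    """Apply synaptic and dendritic pruning to a trained LDNM structure.
    branches: list of branches, each a list of (feature_index, state) pairs."""
    pruned = []
    for branch in branches:
        # Dendritic pruning: a constant-0 synapse forces the branch output to 0
        if any(state == "constant-0" for _, state in branch):
            continue
        # Synaptic pruning: constant-1 synapses never affect the AND output
        kept = [(i, s) for i, s in branch if s != "constant-1"]
        if kept:
            pruned.append(kept)
    return pruned

branches = [[(0, "direct"), (1, "constant-1")],   # synapse 1 is pruned
            [(2, "constant-0"), (3, "direct")],   # whole branch is pruned
            [(4, "constant-1")]]                  # nothing informative remains
assert prune(branches) == [[(0, "direct")]]
```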

Transforming the LDNM into a Logical Circuit
Similar to multilayer perceptrons (MLPs) [28], the proposed LDNM can also be applied to classification problems [33]. After supervised learning on a dataset, a unique LDNM with adjusted synaptic parameters w_ij and q_ij is generated. Then, the pruning operations are executed, and a more concise LDNM for the specific classification problem is obtained.
It is worth emphasizing that a determined LDNM can be further transformed into a logic circuit. Figure 5 shows an example of a determined LDNM and its equivalent logic circuit. A synaptic connection is equivalent to a comparator, with or without a NOT gate. The dendritic layer is equivalent to an AND gate, the membrane layer to an OR gate, and the soma body can be considered a wire. The hallmark of the LDNM as a classifier is that an equivalent logic circuit can be generated from it. Classical classifiers, e.g., k-nearest neighbors (KNN), support vector machine (SVM), and naive Bayes (NB), are based on mathematical analysis and a large amount of floating-point computation. Compared with these classifiers, the LCC is very fast because it performs no floating-point computation and contains only logic operations. In fact, with the trend of big data [40], the speed of a classifier is becoming more important in practical applications.
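As a sketch, an LCC produced this way can be evaluated with nothing but comparisons and Boolean operations. The circuit representation below (a list of `(feature_index, threshold, inverted)` comparator specs per branch) is an assumption for illustration, not the authors' data structure:

```python
def lcc_predict(x, circuit):
    """Evaluate a logic circuit classifier: comparators (with an optional
    NOT gate), an AND per branch, and an OR across branches."""
    def branch_out(branch):
        # inverted=True models a comparator followed by a NOT gate
        return all((x[i] > t) != inv for i, t, inv in branch)
    return int(any(branch_out(b) for b in circuit))

# One branch: x[0] above 0.5 AND x[1] NOT above 0.3
circuit = [[(0, 0.5, False), (1, 0.3, True)]]
assert lcc_predict([0.6, 0.2], circuit) == 1
assert lcc_predict([0.6, 0.4], circuit) == 0
```

In hardware, each comparator spec becomes a fixed-threshold comparator, so no floating-point unit is needed at classification time.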

Learning Methods
Two learning methods are employed and compared in this study. The first one is the BP algorithm. The other method is a heuristic optimization algorithm.

Backpropagation Algorithm
Since the proposed LDNM is a feed-forward model, the error BP learning rule can be adopted to train the LDNM. Specifically, the error between the ideal output T_p and the actual output O_p of a neuron on the pth training sample is defined as follows:

E_p = (1/2)(T_p − O_p)^2

Then, to minimize E_p, the synaptic parameters w_ij and q_ij are corrected in the negative gradient direction as follows:

Δw_ij = −η Σ_{p=1}^{P} ∂E_p/∂w_ij
Δq_ij = −η Σ_{p=1}^{P} ∂E_p/∂q_ij

where η is the learning rate, and P is the number of training samples. The synaptic parameters w_ij and q_ij are updated at the next iteration t + 1 as follows:

w_ij(t + 1) = w_ij(t) + Δw_ij
q_ij(t + 1) = q_ij(t) + Δq_ij

In particular, the partial differentials of E_p with respect to the synaptic parameters w_ij and q_ij can be calculated by the chain rule as follows:

∂E_p/∂w_ij = (∂E_p/∂O) (∂O/∂V) (∂V/∂Z_j) (∂Z_j/∂Y_ij) (∂Y_ij/∂w_ij) (11)
∂E_p/∂q_ij = (∂E_p/∂O) (∂O/∂V) (∂V/∂Z_j) (∂Z_j/∂Y_ij) (∂Y_ij/∂q_ij) (12)

The components in Equations (11) and (12) are expressed as follows:

∂E_p/∂O = O_p − T_p
∂O/∂V = k O (1 − O)
∂V/∂Z_j = 1
∂Z_j/∂Y_ij = Π_{l≠i} Y_lj
∂Y_ij/∂w_ij = k x_i Y_ij (1 − Y_ij)
∂Y_ij/∂q_ij = −k Y_ij (1 − Y_ij)
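The update can be sketched numerically as follows. The gradients are the chain-rule expressions for the sigmoid, product, and sum layers as reconstructed above, so this is an illustration under those assumptions rather than the authors' exact implementation; `bp_step` and the constants are illustrative names.

```python
import numpy as np

K, GAMMA, ETA = 5.0, 0.5, 0.1  # steepness, soma threshold, learning rate

def bp_step(x, t, w, q):
    """One gradient-descent update on a single sample (updates w, q in place)."""
    # Forward pass (same layers as in the model description)
    y = 1.0 / (1.0 + np.exp(-K * (w * x[:, None] - q)))   # synapses, (n, m)
    z = y.prod(axis=0)                                    # branches, (m,)
    v = z.sum()                                           # membrane
    o = 1.0 / (1.0 + np.exp(-K * (v - GAMMA)))            # soma
    err = 0.5 * (t - o) ** 2
    # Backward pass: chain rule through soma, membrane, branch, synapse
    d_v = (o - t) * K * o * (1.0 - o)                     # dE/dV
    dz_dy = z[None, :] / np.clip(y, 1e-12, None)          # product over l != i
    dy_dw = K * x[:, None] * y * (1.0 - y)
    dy_dq = -K * y * (1.0 - y)
    w -= ETA * d_v * dz_dy * dy_dw
    q -= ETA * d_v * dz_dy * dy_dq
    return err

rng = np.random.default_rng(1)
w = rng.uniform(-1.5, 1.5, (3, 6))
q = rng.uniform(-1.5, 1.5, (3, 6))
x = rng.random(3)
errs = [bp_step(x, 1.0, w, q) for _ in range(100)]
assert errs[-1] <= errs[0]  # the error shrinks on this sample
```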

Competitive Swarm Optimizer
BP has been proven to be a very effective method to train ANNs [20]. However, its dependence on gradient information makes BP sensitive to the initial conditions [23]. In addition, slow convergence and the tendency to become trapped in local minima are the main disadvantages of BP methods [24]. In contrast, the use of heuristic optimization methods to solve real-world problems [47,48], including training ANNs [28,34], has aroused the interest of researchers in recent years. The heuristic optimization algorithm called the competitive swarm optimizer (CSO) is therefore also employed to train the LDNM in this study.
The CSO is an evolutionary method for large-scale optimization that was proposed by Cheng and Jin [49]. This algorithm is based on a pairwise competition mechanism and has proven effective for solving practical problems [50,51]. CSO is similar to PSO and works as follows. Each particle in CSO is characterized by two vectors: the position vector x and the velocity vector v. In contrast to PSO [52], a particle in CSO learns from its competitors rather than from its personal best or the global best. In iteration t, the swarm in CSO is randomly divided into two groups, and pairwise competitions are executed immediately. Then, the winner particles are passed directly to the next generation. Meanwhile, the loser particles learn from their corresponding winners and update their velocities and positions as follows:

v_loser(t + 1) = R_1 v_loser(t) + R_2 (x_winner(t) − x_loser(t)) + φ R_3 (x̄(t) − x_loser(t))
x_loser(t + 1) = x_loser(t) + v_loser(t + 1)

where x_loser and v_loser are the position and velocity of the loser, respectively, and x_winner is the position of the winner particle. R_1, R_2, and R_3 are three random vectors within [0, 1]. φ is the parameter that controls the influence of the mean position x̄ of the current swarm. Finally, the main loop of the CSO terminates when the stopping criterion is met.
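One generation of the pairwise-competition update can be sketched as follows; the function name `cso_step`, the swarm size, and the value of φ are illustrative choices, and the sphere function is used only as a smoke test:

```python
import numpy as np

def cso_step(X, V, fitness, phi=0.1, rng=None):
    """One CSO generation. X, V: (swarm, dim) positions and velocities."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(X))
    mean_x = X.mean(axis=0)  # mean position of the current swarm
    for a, b in zip(idx[0::2], idx[1::2]):  # random pairwise competitions
        winner, loser = (a, b) if fitness(X[a]) < fitness(X[b]) else (b, a)
        r1, r2, r3 = rng.random((3, X.shape[1]))
        # The loser learns from its winner and from the swarm mean;
        # the winner passes to the next generation unchanged.
        V[loser] = r1 * V[loser] + r2 * (X[winner] - X[loser]) \
                   + phi * r3 * (mean_x - X[loser])
        X[loser] = X[loser] + V[loser]
    return X, V

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, (40, 10))
V = np.zeros_like(X)
sphere = lambda p: float((p ** 2).sum())
before = np.mean([sphere(p) for p in X])
for _ in range(50):
    X, V = cso_step(X, V, sphere, rng=rng)
after = np.mean([sphere(p) for p in X])
assert after < before  # the swarm moves toward the minimum
```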
To employ the CSO as the learning method, all the synaptic parameters (w_ij and q_ij) of an untrained LDNM are encoded as a vector for optimization. The position vector x of each particle in CSO represents a candidate solution and is encoded as follows:

x = (w_{1,1}, w_{1,2}, ..., w_{n,m}, q_{1,1}, q_{1,2}, ..., q_{n,m}) (21)

where the definitions of n and m are the same as in Equation (23). The fitness function of a particle in CSO, which evaluates an LDNM, is defined as the mean square error (MSE):

MSE = (1/P) Σ_{p=1}^{P} (T_p − O_p)^2 (22)

where the definitions of O_p, T_p, p, and P are the same as above. During the process of training an LDNM, the CSO iteratively changes all of the w_ij and q_ij to minimize the value of the MSE.
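The encoding and the MSE fitness can be sketched as follows; the helper names `encode`, `decode`, and `mse_fitness` are hypothetical:

```python
import numpy as np

def encode(w, q):
    """Flatten (w, q) into a single CSO position vector, as in Equation (21)."""
    return np.concatenate([w.ravel(), q.ravel()])

def decode(vec, n, m):
    """Recover the (n, m) parameter matrices from a position vector."""
    return vec[:n * m].reshape(n, m), vec[n * m:].reshape(n, m)

def mse_fitness(vec, X, T, n, m, forward):
    """Equation (22): MSE of the decoded LDNM over the training set.
    forward is the LDNM forward pass, e.g., ldnm_forward from above."""
    w, q = decode(vec, n, m)
    preds = np.array([forward(x, w, q) for x in X])
    return float(np.mean((T - preds) ** 2))

# Round-trip check of the encoding
w = np.arange(6.0).reshape(2, 3)
q = -w
w2, q2 = decode(encode(w, q), 2, 3)
assert np.array_equal(w, w2) and np.array_equal(q, q2)
```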

Feature Selection before Learning
Our previous works have shown the high performance of dendritic neuron models for classification problems [33,37,39]. However, it is not easy to apply a dendritic neuron model to high-dimensional classification problems because of the large number of synaptic parameters. The number of synaptic parameters (w_ij and q_ij) in the proposed LDNM is calculated as follows:

N = 2 × n × m (23)

where n and m are the number of synaptic inputs and dendritic branches, respectively. Usually, m is set to 2n. As a result, the complexity of an LDNM increases with the square of the number of synaptic inputs. Thus, dimension reduction is a natural step when applying an LDNM to a high-dimensional classification problem. In general, dimension reduction methods can be roughly divided into two categories: feature extraction and feature selection [53]. The former transforms the original data from the high-dimensional space into a low-dimensional space, yielding derived features. Two typical feature extraction methods are principal component analysis (PCA) and linear discriminant analysis (LDA). In contrast, feature selection tries to select an optimal subset of the original features. Feature selection rather than feature extraction is employed to perform dimension reduction in this study. Compared with feature extraction methods, feature selection methods are simpler and require less computational power. The subset of relevant features selected by a feature selection method can improve the performance of a classifier. In addition, once determined, a feature selection model requires no extra calculation. Thus, it works well with the LDNM in classification tasks.
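Under the convention m = 2n, the parameter count of Equation (23) grows quadratically with the number of inputs, as this small helper (an illustrative name) shows:

```python
# Parameter count from Equation (23): each of the n x m synapses carries
# two parameters (w_ij and q_ij). With m = 2n, the total is
# 2 * n * (2n) = 4 * n^2, i.e., quadratic in the input dimension.
def num_params(n, m=None):
    m = 2 * n if m is None else m
    return 2 * n * m

assert num_params(16) == 1024   # 16 selected features -> about 1000 parameters
assert num_params(10) == 400
```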
More features increase the difficulty of training an LDNM. However, more features also mean more information. Therefore, determining the number of features to keep after dimension reduction is a trade-off. On the other hand, structure pruning also plays the role of feature selection for a trained LDNM, in that irrelevant features are removed by synaptic pruning and dendritic pruning in the final structure, as shown in Figure 4. This means that some irrelevant features may safely remain after feature selection, since they can still be removed during training. In other words, as many features as possible should be retained before training an LDNM because more information is provided.

Experimental Studies
This section presents the experiments conducted to evaluate the performance of the proposed LDNM. All algorithms in this study are implemented in C and Python. The experiments are executed on a 64-bit Linux system with a Core i5 CPU and 8 GB of RAM.

Experimental Setup for Determining the Suitable Learning Method
We use two high-dimensional benchmark classification datasets to verify the characteristics of the proposed model. The two datasets are the Wisconsin diagnostic breast cancer (WDBC) dataset [54] and the Ionosphere dataset [55], both of which can be accessed via the UCI machine learning repository. WDBC describes breast cancer diagnoses according to features computed from digitized images of fine needle aspirates of breast masses. The Ionosphere dataset describes collected radar data used to determine whether signals pass through the ionosphere. Both are binary classification problems, and their details are summarized in Table 1. Each dataset is randomly separated into two subsets: the training subset is used to train the LDNM, and the testing subset is used to validate the model. The proportion of each subset is set to 50%. In addition, all features (attributes) of each dataset are normalized to the range [0, 1] to fit the input of the proposed LDNM, and the hyperparameters k and γ of the LDNM are set to their typical values (5 and 0.5, respectively).
Mutual information (MI) [56] is a metric that measures the dependence between two variables, with a higher MI corresponding to a higher dependency. MI is sensitive to linear dependency as well as other kinds of dependency between variables. A simple univariate feature selection method based on MI is employed to select the best feature subset before training. The employed feature selection method is considered a filter [53] and works as follows. First, the MI between each feature and the class label is computed. Then, the features with the highest MI values are selected. Specifically, 16 features are selected for each dataset in this study, which limits the number of synaptic parameters to approximately 1000. In fact, other effective feature selection methods [53,57] could also be employed in this step.

The Results Obtained by Four Training Methods
To investigate the influence of feature selection in training LDNMs, four training methods are employed to train LDNMs for the two classification problems. These methods are feature selection plus CSO (FS+CSO), CSO, feature selection plus BP (FS+BP), and BP. Note that only the training data are used to determine the feature subset that will be further used to reduce the dimension of training data and testing data. In addition, to ensure a fair comparison between CSO and BP, the maximum number of evaluations of CSO is set to 20,000, and the maximum number of iterations of BP is set to 10,000. In fact, BP takes more memory and training time than CSO in this setup.
The experiments of each training method are performed 30 times on two benchmark problems independently. The statistical results of MSE, training classification accuracy, and testing classification accuracy are provided in Table 2. In addition, classification accuracy has the following definition:

Classification accuracy = (Number of correct predictions / Total number of predictions) × 100% (24)

It is obvious that employing feature selection benefits the subsequent training process because the performance of the training methods with feature selection is ultimately improved. Specifically, the testing classification accuracies of the four methods are exhibited in Figure 6. This figure also shows the positive influence of feature selection on the subsequent training process, as the testing accuracy is clearly improved. Although feature selection introduces an extra computing cost, fewer features are selected, so fewer synaptic parameters of the LDNM are produced. Thus, fewer computing resources are ultimately needed in the subsequent training process. Therefore, not only the classification accuracy but also the training speed is improved by feature selection [53].

Comparing the performance of CSO with BP in training LDNMs, it is clear that CSO outperforms BP because the MSE and testing classification accuracy obtained by CSO are better than those obtained by BP. This result indicates that CSO is more powerful than BP in training an LDNM. Moreover, BP consumes more computational resources in training LDNMs.
To further investigate the differences in the four training methods in training LDNMs, the average (30 times) convergence curves of these methods for two benchmark problems are plotted in Figure 7. These curves reflect the efficiency and stability of a training method. It is clear that feature selection speeds up the convergence because those methods that are coupled with feature selection converge faster than the ones without feature selection. Moreover, BP is easily trapped in a local minimum [23,25] and is even invalid when the size of an LDNM is large. These findings indicate that BP is not an effective learning method for training LDNMs for high-dimensional classification problems.

Investigating Structure Pruning and Logic Circuit Transformation
As mentioned above, the structure of a trained LDNM can be pruned by synaptic pruning and dendritic pruning. Then, an LCC is obtained, which is transformed from the pruned LDNM. The performance of the pruned LDNM and the LCC is also verified in this section. The FS+CSO combination is employed as the training method and is performed 30 times. The classification accuracy of the pruned LDNMs and the LCCs of the trained LDNMs is verified on the same test data. Table 3 shows the classification results. It is clear that the performance of the pruned LDNMs and the LCCs has no significant degradation compared with their corresponding LDNMs. This finding indicates the feasibility of this transformation.
Two pruned LDNMs and their corresponding LCCs for the two benchmark problems are shown in Figure 8. In contrast to general ANNs, which lack interpretability, the pruned LDNM and its corresponding LCC provide us with some insights into how the classification results are concluded. For example, there are two dendritic branches in the final LDNM for the WDBC problem, as shown in Figure 8a. This means that two patterns of the input data determine the classification. In the first pattern, when the 8th, 21st, and 28th inputs are all larger than their corresponding thresholds, the LDNM outputs 1; otherwise, it outputs 0. The second pattern has a similar explanation. These patterns benefit our understanding of the intrinsic characteristics of these problems.

Comparison with Classical Classifiers
The above experiments have shown that feature selection plus CSO is the most effective learning method for training the LDNM because it has the fastest convergence speed and the highest classification accuracy. In this subsection, we compare the LDNM with other classical classifiers to further verify its performance. These classifiers are KNN, SVM, decision tree, random forest, MLP, and NB. The parameters of the seven classifiers are set as shown in Table 4. In addition to the abovementioned two classification problems, three more benchmark classification datasets are employed in this experiment; their details are presented in Table 5.
For each classifier, we conduct 10-fold cross-validation (CV) [58] three times to evaluate its effectiveness. Figure 9 is a box-and-whisker diagram showing the classification accuracy of these classifiers on the five benchmark datasets. From Figure 9, we can see that no single classifier always outperforms the others on all of the classification datasets. In addition, LDNM has higher medians than most of the remaining classifiers on the datasets ForestTypes, Ionosphere, RAFPD, and SPECTF, which indicates that LDNM has a stronger classification ability in comparison with the others. Furthermore, the interquartile range (IQR) in Figure 9, which is the distance between the first and third quartiles, reflects the stability of classification performance: a shorter IQR means a more stable performance. From Figure 9, we can see that LDNM achieves acceptable solutions with short IQRs, which indicates that the stability of LDNM is competitive in comparison with the other classifiers.

The classification accuracy of these classifiers on the five benchmark datasets is summarized in Table 6. From Table 6, we can see that LDNM, KNN, random forest, and NB achieve the best classification performance on two, one, one, and one classification problems, respectively. This comparison suggests that the proposed LDNM obtains better or very competitive results in terms of test classification accuracy.

To further determine the significant differences among the classification accuracies of these classifiers, we conduct the Friedman test on the results in Table 6. Table 7 presents the statistical results, including the ranking, the z value, the unadjusted p value, and the p_bonf value. From Table 7, we can see that LDNM obtains the smallest ranking value of 3.26, which means that LDNM achieves the best performance among these classifiers in terms of test classification accuracy.
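The evaluation protocol (three independent runs of 10-fold CV) can be sketched with scikit-learn for one of the baseline classifiers; KNN and the bundled WDBC data are shown only as an example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # WDBC as a stand-in dataset
scores = []
for seed in range(3):  # three independent 10-fold CV runs
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores += list(cross_val_score(KNeighborsClassifier(), X, y, cv=cv))
# 3 runs x 10 folds = 30 accuracy values per classifier, as in the boxplots
assert len(scores) == 30 and all(0.0 <= s <= 1.0 for s in scores)
```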
In addition, we investigate the significance of the differences among these classifiers. To avoid the Type I error [59], we use a post hoc test, i.e., the Bonferroni-Dunn test, to adjust the p values. From Table 7, we can see that the adjusted p values, i.e., the p_bonf values, of KNN, MLP, and random forest are larger than the significance level (α = 0.05). This indicates that LDNM is not significantly better than, but obtains competitive results in comparison with, these three classifiers. In addition, the p_bonf values of NB, SVM, and decision tree are smaller than 0.05, which indicates that LDNM is significantly superior to these three classifiers on the five classification problems. Therefore, we can conclude that the proposed LDNM is a very competitive classifier in comparison with these state-of-the-art classifiers.

Figure 9. Box and whisker diagram of test classification accuracy for five benchmark problems.

Conclusions
Recent research has strongly suggested that dendrites play an important role in neural computation. In this paper, a novel neuron model with dendrite morphology, called the logic dendritic neuron model, was proposed for solving classification problems. This model consists of four layers: a synaptic layer, a dendritic layer, a membrane layer, and a soma body. To apply this model to high-dimensional classification problems, we employed the feature selection method to reduce the dimensionality of the classification problems, although the reduced dimensionality is still comparatively high for the proposed LDNM. In addition, we attempted to use a heuristic optimization algorithm called CSO to train the proposed LDNM. This method was verified to be more suitable than BP in training the LDNM with numerous synaptic parameters. Finally, we compared LDNM with the other six classical classifiers on five classification problems to verify its performance. The comparison result indicated that the proposed LDNM can obtain better or very competitive results with these classifiers in terms of test classification accuracy.
It is worth pointing out that the proposed LDNM has two unique characteristics. First, a trained LDNM can be simplified by synaptic pruning and dendritic pruning. Second, the simplified LDNM can be transformed into a logic circuit classifier for a specific classification problem. The speed achieved by the logic circuit classifier can be expected to be high when this model is implemented in hardware. In addition, the trained LDNM provides us with some insights into the specific problem. However, it should be noted that the feature selection method chosen in this study was very simple. Therefore, more effective feature selection methods are required when solving more complicated problems.
In future studies, we will apply the proposed LDNM to more classification tasks to verify its effectiveness. Improving the architecture of the dendritic neuron model and its learning methods also deserves continued effort. In addition, extending the proposed LDNM to deal with multiclass classification problems is worth investigating. Moreover, implementing the proposed LDNM as a module unit in hardware is a future research topic.