Approaching Optimal Nonlinear Dimensionality Reduction by a Spiking Neural Network

Abstract: This work presents a spiking neural network as a means of efficiently solving the nonlinear reduction of the dimensionality of data. The underlying neural model, which can be integrated as neuromorphic hardware, is suitable for intelligent processing in edge computing within Internet of Things systems. To achieve a meaningful performance with a low-complexity one-layer spiking neural network, the training phase uses the metaheuristic Artificial Bee Colony algorithm with an objective function taken from the principles of machine learning, namely, that of the modified Stochastic Neighbor Embedding algorithm. To demonstrate this, complex benchmark data were used and the results were compared with those generated by a reference network with continuous-sigmoid neurons. The goal of this work is to demonstrate via numerical experiments another method for training spiking neural networks, in which the optimizer comes from metaheuristics. Therefore, the key issue is defining the objective function, which must optimally relate the information at both sides of the spiking neural network. Machine learning techniques have advanced in defining efficient loss functions that can become suitable objective function candidates in the metaheuristic training phase. The practicality of these ideas is shown in this article. We use MSE values and co-ranking matrices for evaluating the relative quality of the results.


Introduction
Present and future information systems demand that intelligent processes be reviewed for their efficiency. A sample of this dynamic issue is drawn from [1], where three "influence domain" categories, i.e., information processing, information transmission, and learning strategy, lead the analysis of existing deep learning architectures. On the other hand, in the context of neuromorphic systems, computational spiking hardware is recognized as both energy-efficient and reliable, and the spike-timing-dependent plasticity rule (STDP rule) is mostly used as the learning rule, i.e., for setting the weighting values of the synaptic connections [2]. In these architectures, layers of spiking neurons are intended to resemble the temporal encoding of information as it occurs in biological neural systems. The STDP rule represents the central mechanism for modifying the conductance value in biological synapses. Multilayer, recurrent, and hybrid spiking architectures, along with their training algorithms, are reviewed in [3], which shows that the classic backpropagation method can be adapted for spiking models but finds that it is not very efficient for dealing with spatiotemporal information flows. The same work concludes that proposing new learning methods is still an open issue, with improved performance as the goal. Reports on neuromorphic hardware implementations include similar arguments [4]. Taking the learning rule of spiking neural systems as a pure optimization problem, mature metaheuristic algorithms might contribute to establishing efficient solutions for it. Based on this idea , king neurons to compute efficiently the

Main Topics
Current IoT technology can gain both intelligence through edge computing and energy efficiency through neuromorphic hardware. This work brings the two together by introducing a metaheuristic process that optimizes a spiking neural network, which makes one type of such hardware possible. The following seven subsections deal with the topics that indirectly argue in favor of this concept.

Dimensionality Reduction
One of the topics in machine learning research is reducing the dimensionality of complex data, i.e., discovering the classes and the relevant information that represent them. In the linear sense of reducing the dimensionality, the classic Principal Component Analysis (PCA) method computes the eigenvectors and eigenvalues of the covariance matrix of the data set, which denote the new or principal directions on which the classes might be interpreted [13]. The neural architecture for approaching PCA is based on the linear autoencoder [14], whose diagram is depicted in Figure 1. Its training phase can be regarded in two parts: (1) there are two neuron layers, namely the bottleneck layer and the output layer, whose encoding and decoding weights reach optimal values in the MSE sense according to an auto-association task; then, (2) the output layer can be discarded, leaving the bottleneck layer and its encoding weights already able to approach the PCA of new data.
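The linear case described above can be illustrated with a minimal sketch. It assumes NumPy and synthetic data; the function name `pca_project` and the test data are illustrative choices, not part of the original work.

```python
import numpy as np

def pca_project(data, n_components):
    """Project the rows of `data` onto the leading eigenvectors of its covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)           # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                 # principal directions
    return centered @ components, eigvals[order]

# Synthetic data (assumption): 200 samples, 5 features with decreasing spread.
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
projected, variances = pca_project(samples, 3)
```

The variance of each projected column equals the corresponding eigenvalue, which is the sense in which the eigenvectors define the principal directions.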

[Figure 1: diagram of the linear autoencoder, with labels "Input vector" and "Output vector".]

This training process can be extended into deeper layers, but considering nonlinear neurons and appropriate loss functions to continue discovering efficiently other relevant features [15]. Based on this background and with the aim of efficiently reducing the dimensionality of input data, we propose using only the Bottleneck layer with nonlinear spiking neurons along with their encoding weights. This minimal spiking network architecture would be trained with the ABC algorithm, becoming the subject of this work and leading us to propose a neuromorphic hardware realization. Likewise, we observe that this effort can benefit intelligent IoT systems.

Internet of Things
The Internet of Things was originally defined as a subnetwork that allows instruments, measuring and monitoring devices, and mobile communications systems to collect data derived from human activities in support of daily living, in areas such as health care, the environment, smart cities, commerce, and industry [16]. Technically, IoT systems were conceived to provide an initial platform from which the information could be sent to specific centers for analysis, evaluation, and diagnosis; then, after a decision-making process, a signal was sent elsewhere through the internet for control, security, or other actions.

Edge Computing
At present, Internet of Things systems are becoming the first link on the path between the source and the destination of internet connections, and this is where the edge computing scheme is introduced. This brings computing intelligence and security capabilities into the data generation domain. Hence, IoT systems can benefit from deep learning methods, which perform efficient information reduction tasks with knowledge discovery [17,18]; this has the impact of optimizing the costs and safety of wide-area networks, clouds, and data centers. Likewise, the trend of edge computing with intelligence includes creating frameworks and design standards to define smart IoT architectures. This type of framework is found in [19].

Pre-Training Concept
The primary nonlinear autoencoder is able to reduce the dimensionality of complex data using its Bottleneck layer. The nonlinear autoencoder model was originally taken as a basis to generate early deep autoencoder architectures, which were optimized with a cross-entropy error [15]. At present, a larger number of deep architectures are available [20], which efficiently discover subclasses in complex data. However, the strategy behind their success frequently comes from pre-training [21], which leads to finding essential features in images, video, audio, and speech data in tasks for identification, classification, and reduction of memory. This concept is included in this paper for completeness.

Edge Computing Advantages by Using Neuromorphic Systems
The design of IoT devices and systems includes relevant advantages when the edge computing paradigm is delivered by Neuromorphic Systems (NS). We count the following.

1. Energy efficiency: NS are more suitable than general-purpose computing hardware. The expected difference is several orders of magnitude.
2. Low latency: NS excel at processing continuous streams of data, reducing the delay to accomplish intelligent tasks.
3. Adaptive processing: NS can adapt to changes in context.
4. Rapid learning or adaptation: NS show capabilities beyond most standard artificial intelligence systems.

Extending this context, the spiking neural network presented in this work gives a basis for conceiving a neuromorphic processor prototype, whose advantages are listed below.

• The Artificial Bee Colony (ABC) algorithm is a gradient-free optimizer that conveniently replaces the STDP rule and backpropagation-based algorithms.
• The loss function in the t-distributed Stochastic Neighbor Embedding (t-SNE) method can act as the objective function in the ABC algorithm.
• The training phase is guided by the objective function of the ABC algorithm.
• The trained spiking neural network achieves efficiently a nonlinear dimensionality reduction, which is suitable for either task: reducing memory size or classifying complex data.

Metaheuristic Optimization
Optimal solutions to many complex engineering problems can be obtained with metaheuristic methods, which represent a practical alternative to gradient-based numerical procedures. Notably, the ABC algorithm [22] belongs to this optimization category. Its pseudocode is presented in Algorithm 1. In this pseudocode, the instructions named Employed Bees' labor, Onlooker Bees' labor, and Scout Bees' labor follow Algorithms 2-4, respectively, described in [23].

An employed bee randomly selects one food source x_k from the current population and chooses a random dimension index j. The new food source is obtained by:

$$v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj}) \tag{1}$$

where i is the solution currently being exploited, k is a randomly chosen neighbor solution, and φ_ij is drawn from the uniform distribution on [−1, 1]. An employed bee unloads the nectar and then gives information to onlookers about the quality and the location of her source. High-quality solutions have a high chance of being selected, but low-quality solutions can also be selected by the onlookers [23]. The probability of each solution (p_i) is calculated proportionally to its fitness value:

$$p_i = \frac{fit_i}{\sum_{n} fit_n} \tag{2}$$

where fit_i is the fitness of solution i and the sum runs over all food sources.

Algorithm 2 Employed Bees' Labor Pseudocode (Modified from [15]).
for each food source x_i do
    new solution x ← produced by Equation (1)
    if x is better than x_i then
        x_i ← x
        exploit(x_i) ← 0
    else
        exploit(x_i) ← exploit(x_i) + 1
    end if
end for

Algorithm 3 Onlooker Bees' Labor Pseudocode.
for each food source x_i do
    p_i ← assign probability by Equation (2)
end for
i ← 0
t ← 0
while t < number of onlooker bees do
    r ← rand(0, 1)
    if r < p(i) then
        t ← t + 1
        x ← a new solution produced by Equation (1)
        apply the same greedy selection as in Algorithm 2
    end if
    i ← (i + 1) mod (number of food sources)
end while

For a specific food source or solution X, if employed bees and onlooker bees cannot find any new food source or new solution in its neighborhood to replace it, the search may be trapped in a local minimum. In this case, the Scout Bees' Labor must be applied.

Algorithm 4 Scout Bees' Labor Pseudocode.
si ← index of the food source with the largest exploit value
if exploit(si) > limit then
    x_si ← random solution by Equation (1)
    exploit(si) ← 0
end if
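The three phases above can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `abc_minimize`, its default parameters, the fitness mapping `1 / (1 + cost)` (the usual choice in the ABC literature), the clipping to bounds, and the use of roulette-wheel sampling for the onlooker phase are all choices made here. The scout replaces an exhausted source with a uniformly random point inside the bounds.

```python
import numpy as np

def abc_minimize(objective, lower, upper, n_sources=20, limit=30, n_cycles=200, seed=0):
    """Minimal Artificial Bee Colony sketch for box-constrained minimization."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    sources = lower + rng.random((n_sources, dim)) * (upper - lower)
    costs = np.array([objective(x) for x in sources])
    exploit = np.zeros(n_sources, dtype=int)       # failed-improvement counters
    best_idx = int(np.argmin(costs))
    best_x, best_cost = sources[best_idx].copy(), float(costs[best_idx])

    def try_neighbor(i):
        # Equation (1): perturb one random dimension j relative to a random neighbor k.
        k = rng.choice([n for n in range(n_sources) if n != i])
        j = rng.integers(dim)
        candidate = sources[i].copy()
        phi = rng.uniform(-1.0, 1.0)
        candidate[j] = np.clip(candidate[j] + phi * (candidate[j] - sources[k, j]),
                               lower[j], upper[j])
        cost = objective(candidate)
        if cost < costs[i]:                        # greedy selection
            sources[i], costs[i], exploit[i] = candidate, cost, 0
        else:
            exploit[i] += 1

    for _ in range(n_cycles):
        for i in range(n_sources):                 # employed bees
            try_neighbor(i)
        fitness = np.where(costs >= 0, 1.0 / (1.0 + costs), 1.0 + np.abs(costs))
        probs = fitness / fitness.sum()            # Equation (2)
        for i in rng.choice(n_sources, size=n_sources, p=probs):   # onlooker bees
            try_neighbor(i)
        best_idx = int(np.argmin(costs))           # memorize the best source so far
        if costs[best_idx] < best_cost:
            best_x, best_cost = sources[best_idx].copy(), float(costs[best_idx])
        worn = int(np.argmax(exploit))             # scout bee
        if exploit[worn] > limit:
            sources[worn] = lower + rng.random(dim) * (upper - lower)
            costs[worn] = objective(sources[worn])
            exploit[worn] = 0
    return best_x, best_cost

# Usage on a toy problem (assumption): the 3-D sphere function.
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_cost = abc_minimize(sphere, [-5.0, -5.0, -5.0], [5.0, 5.0, 5.0])
```

Because only objective values are compared, any loss function can be plugged in as `objective` without requiring gradients.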

The t-SNE Machine Learning Method
The t-distributed Stochastic Neighbor Embedding (t-SNE) method [24] was proposed as an alternative method for visualizing complex information in a reduced space or dimension, optimizing the quality of the clustering process to discover classes from data. It succeeds by avoiding the "crowding problem", both by introducing a symmetrized cost function and by using a Student's t-distribution instead of a Gaussian one in the low-dimensional space. Although these changes favored the gradient-based optimization of the original method, the resulting cost function is also suitable as the objective function for our experiments with the ABC algorithm. The equation below defines the objective function in this work. Its value is equal to the cross-entropy up to an additive constant.

$$C = KL(P\,\|\,Q) = \sum_{i}\sum_{j} p_{ij}\,\log\frac{p_{ij}}{q_{ij}} \tag{3}$$

In this equation, KL(P||Q) is the Kullback-Leibler divergence between the joint probability distribution P in the high-dimensional space and the joint probability distribution Q in the low-dimensional space. The quantities p_ij and q_ij are the pairwise similarities in the high- and low-dimensional spaces, respectively. They are given by Equations (4) and (5).
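For reference, the pairwise similarities in the standard t-SNE formulation are written out here. This is a sketch that assumes the work follows the standard definitions; the exact form and numbering of Equations (4) and (5) in the original may differ. Here x_i are the high-dimensional points, y_i their low-dimensional images, n the number of points, and σ_i a per-point Gaussian bandwidth.

```latex
% High-dimensional similarities: symmetrized Gaussian conditionals
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional similarities: Student's t-distribution with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

Both P and Q sum to one over all pairs, which is what allows the Kullback-Leibler divergence to be used as a cost.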

Topics Related to Preparing Experiments
The next four subsections include succinct but relevant information for realizing the experiments, which generate the supporting results of the work.

Neuron Model Used in This Work
The spiking neuron model created by Izhikevich [25] has an advantage over other existing models due to its low-complexity numerical formulation. In addition, it reproduces with high accuracy the dynamic behavior of biological neurons for a wide set of brain cortical tissues, and it is widely accepted for designing computational neuromorphic hardware. We have used its dynamic differential equations, given below, to produce another nonlinear spiking behavior, i.e., a sigmoidal response.
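For orientation, the commonly published form of the Izhikevich model that uses the variables and parameters listed in this section is written out here. It is assumed, not confirmed by the surviving text, that this is the form used in the work; the spike cutoff symbol v_peak is an assumption introduced for the reset rule.

```latex
C \frac{dv}{dt} = k\,(v - v_r)(v - v_t) - u + I
\qquad
\frac{du}{dt} = a\,\bigl[\,b\,(v - v_r) - u\,\bigr]

% After-spike reset
\text{if } v \geq v_{peak}: \quad v \leftarrow c, \quad u \leftarrow u + d
```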

Variables and Parameters    Name
v                           membrane potential
u                           membrane recovery
v_r                         resting membrane potential
v_t                         cutoff or threshold potential
C                           membrane capacitance
I                           injected current
a, b, c, d, k               dynamics type parameters

Table 2 lists the values of the parameters a, b, c, and d of some of the main Izhikevich neural model configurations, which are presented in a complementary publication by Izhikevich [26]. In particular, we find new and optimal values of the parameters a, b, c, d, and k that give a sigmoidal (or nonlinear) response in the firing rate; they are obtained in a supervised manner by minimizing an MSE quantity. Table 3 presents these values, which were found with the ABC algorithm. The values of other variables are also included.

Table 3. Values of a, b, c, d and k for sigmoidal response.

Name    Value

Figure 2 shows the sigmoidal response of the spiking neuron as a nonlinear-type configuration, obtained with the parameters of Table 3. The minimum and maximum Firing Rate (FR) values of the sigmoidal spiking neuron are 4 and 109 spikes/s, respectively, and 4 and 104 spikes/s for the reference continuous-sigmoid neuron. The MSE between these two curves is 70 × 10^−4. To calculate the MSE, the two curves were first normalized by dividing each by its maximum firing rate.
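How a firing rate can be measured from the neuron model is sketched below with a forward-Euler simulation. The parameter values are the regular-spiking cortical neuron values published by Izhikevich, used here only as stand-ins; they are not the sigmoidal-response values of Table 3. The function name `firing_rate`, the time step, and the spike cutoff `v_peak` are assumptions.

```python
import numpy as np

def firing_rate(current, duration_ms=1000.0, dt=0.1,
                C=100.0, v_r=-60.0, v_t=-40.0, k=0.7,
                a=0.03, b=-2.0, c=-50.0, d=100.0, v_peak=35.0):
    """Simulate one Izhikevich neuron under constant current (pA); return spikes/s."""
    v, u = v_r, 0.0                    # start at rest
    spikes = 0
    for _ in range(int(duration_ms / dt)):
        dv = (k * (v - v_r) * (v - v_t) - u + current) / C
        du = a * (b * (v - v_r) - u)
        v += dt * dv
        u += dt * du
        if v >= v_peak:                # spike: reset v and increment recovery u
            v = c
            u += d
            spikes += 1
    return spikes * 1000.0 / duration_ms

# Firing rate versus injected current for a few illustrative current values.
rates = [firing_rate(i) for i in (0.0, 70.0, 200.0)]
```

Sweeping the current and plotting the resulting rates gives the kind of response curve that the parameter search in this section shapes into a sigmoid.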

Spiking Neural Network Architecture Used in This Work
The architecture of the spiking neural network (SNN) for the experiments in this work is shown in Figure 3. There are 3 sigmoidal spiking neurons, i.e., SSN_1, SSN_2 and SSN_3, which generate spikes at firing rates FR_1, FR_2 and FR_3 (in spikes/s), respectively. These neurons receive the electrical currents I_1, I_2 and I_3, which are produced from the set of transduced voltages {e_i,j} through the set of conductances {s_j,k}.
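The data path of this architecture can be sketched as a matrix product followed by a rate nonlinearity. This is an illustration under assumptions: each current is taken as the conductance-weighted sum of the input voltages, and the spiking neuron is replaced by a logistic surrogate scaled to the reported 4 to 109 spikes/s range. The names `snn_forward`, `midpoint`, and `slope` are hypothetical.

```python
import numpy as np

FR_MIN, FR_MAX = 4.0, 109.0   # firing-rate range reported for the sigmoidal spiking neuron

def snn_forward(voltages, conductances, midpoint=0.0, slope=1.0):
    """Map input voltages (samples x features) to three firing rates per sample."""
    currents = voltages @ conductances             # I_k = sum_j e_j * s_jk
    rates = FR_MIN + (FR_MAX - FR_MIN) / (1.0 + np.exp(-slope * (currents - midpoint)))
    return currents, rates

# Synthetic example (assumption): 10 samples with 16 attributes, 3 output neurons.
rng = np.random.default_rng(1)
voltages = rng.uniform(0.0, 1.0, size=(10, 16))
conductances = rng.normal(scale=0.1, size=(16, 3))
currents, rates = snn_forward(voltages, conductances)
```

In this picture the conductance matrix holds the only trainable quantities, one column per spiking neuron.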

Databases
In this work, three databases have been considered. The first one refers to the writing of numbers [27]. This was created considering the positions of the pixels on a Tablet with 500 × 500 pixels, and the pressure values exerted by 44 writers when writing the numbers from 0 to 9. We have taken 1000 samples from this database and each sample is made up of 16 attributes.
The second database refers to a set of images of flowers, fruits and faces [28]. There are 100 images for each type. They are gray-scale images with a dimension of 100 rows by 100 columns. 67 characteristics have been extracted from each image, which corresponds to the coefficients of the Local Binary Pattern (LBP) image processing technique, with a processing block size of 64 × 64.
The third database is a fused bi-temporal optical-radar data set for cropland classification [29]. There are 98 radar features and 76 optical features, that is, each sample is made up of 174 features. Seven crop type classes exist for this data set, as follows: 1-Corn; 2-Peas; 3-Canola; 4-Soybeans; 5-Oats; 6-Wheat; and 7-Broad-leaf. Table 4 summarizes the parameters of the three databases.

Training Phase Strategy
The SNN architecture depicted in Figure 3 has been arranged to perform the encoding process of an autoencoder, and from it 3 dimensions are synthesized. This represents a reduction in the dimensionality of the databases, analogous to extracting the first three principal components from each database. To explain the training phase for the dimensionality reduction of the databases, a flow diagram is presented in Figure 4. The process starts by extracting from the input data a probability distribution function (PDF) named P, which serves as the reference in the search for the parameters corresponding to the synaptic conductances of the three spiking neurons with sigmoidal response. The SNN receives the data as a voltage vector and responds with an output matrix corresponding to the firing rate of each neuron.
The output matrix of the SNN is considered a candidate dimensionality reduction of the data, and from it a probability distribution function of the Student's t type, named Q, is obtained. The probability functions P and Q are compared by calculating the Kullback-Leibler (K-L) divergence as their measure of similarity.
The K-L divergence is used as the objective function to be minimized by the ABC metaheuristic algorithm, which is executed to determine the optimal values of the synaptic conductances. This process is applied repeatedly until the objective reaches a certain value, which serves as the optimality criterion.
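The objective evaluated at each step of this training loop can be sketched as follows. This is a simplified illustration, not the authors' code: P is built with a single fixed Gaussian bandwidth `sigma` rather than the per-point, perplexity-based bandwidths of standard t-SNE, and a random linear map stands in for the network output. All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(points):
    """Squared Euclidean distances between all rows of `points`."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, 0.0)
    return np.maximum(d2, 0.0)                     # guard against round-off negatives

def joint_p(data, sigma=1.0):
    """Symmetrized Gaussian similarities P of the high-dimensional data."""
    logits = -pairwise_sq_dists(data) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # exclude self-similarity
    cond = np.exp(logits - logits.max(axis=1, keepdims=True))
    cond /= cond.sum(axis=1, keepdims=True)        # conditional p_{j|i}
    return (cond + cond.T) / (2.0 * data.shape[0])

def joint_q(embedding):
    """Student's t similarities Q of the low-dimensional output."""
    inv = 1.0 / (1.0 + pairwise_sq_dists(embedding))
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_objective(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(P||Q), the quantity to be minimized."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Synthetic stand-ins (assumption): 30 samples, 16 attributes, 3 output dimensions.
rng = np.random.default_rng(2)
data = rng.normal(size=(30, 16))
embedding = data @ rng.normal(size=(16, 3))        # placeholder for the SNN firing rates
p = joint_p(data, sigma=3.0)
cost = kl_objective(p, joint_q(embedding))
```

In a training loop, P would be computed once from the input data, while Q and the cost would be recomputed for every candidate set of conductances proposed by the optimizer.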

Experimental Results
This section presents the results of the dimensionality reduction task by the proposed spiking neural network. They come from three experiments, whose databases are named Handwriting Numbers, Images, and Croplands. A discussion subsection is also included.

Relative Quality of Efficiency by MSE Evaluations
In principle, one spiking neural network can generate M output clusters from a set of input training vectors with M classes. In addition, each cluster will be visualized in a 3D space, whose 3 axes will be named Dimension 1, Dimension 2, and Dimension 3. The quality of this spiking neural network in performing the dimensionality reduction task over one class of M is evaluated with the MSE operator according to Equation (7).
The parameters in Equation (7) are explained in Table 5, where there can be two landmark experiments, i.e., by (1) a reference neural network and by (2) a random numbers generator with uniform distribution. Case (1) refers to a classic neural network with continuous-sigmoid neurons, replacing the spiking neurons. In case (2), the distances come from a random function contained in a standard numerical platform.  by the spiking network, in the same class.
From the above, we perform two MSE evaluations to assess the quality of the spiking neural network. MSE-1 is expected to be lower than MSE-2, which would show that the spiking neural network correlates with the reference neural network. The adverse case, namely MSE-1 equal to MSE-2, would show a lack of correlation.

Quality Measurement by the Co-Ranking Matrix
Given the dimensionality reduction, the quality of the new data mapping with respect to the data expressed in high dimensions remains to be evaluated. For this objective measurement, we use the co-ranking matrix, introduced in [30] and defined as

$$Q_{kl} = \left|\{(i, j) \mid \rho_{ij} = k \ \text{and} \ r_{ij} = l\}\right| \tag{8}$$

where ρ_ij represents the rank of x_i with respect to x_j in the high-dimensional space of the original database. For its part, r_ij is the rank of y_i with respect to y_j in the proposed low-dimensional space, and |·| denotes the number of elements in the set. The ranks are calculated using Equation (9), where δ_ij represents the distance from x_i to x_j and d_ij the distance from y_i to y_j:

$$\rho_{ij} = \left|\{k \mid \delta_{ik} < \delta_{ij} \ \text{or} \ (\delta_{ik} = \delta_{ij} \ \text{and} \ k < j)\}\right|, \qquad r_{ij} = \left|\{k \mid d_{ik} < d_{ij} \ \text{or} \ (d_{ik} = d_{ij} \ \text{and} \ k < j)\}\right| \tag{9}$$

When the values in the co-ranking matrix lie on the main diagonal, the mapping is interpreted as perfect. However, this condition is almost impossible to fulfill for real data, since the measurements themselves suffer from slight variations, causing alterations interpreted as intrusions or extrusions. An intrusion occurs when a point j has a lower rank with respect to point i in the low-dimensional representation than in the high-dimensional data. Conversely, an extrusion occurs when a point j has a higher rank with respect to point i in the low-dimensional representation than in the high-dimensional data. The intrusion points are located below the main diagonal; the extrusion points above.
An improvement in the calculation of the co-ranking matrix is presented in [31]. The quality of the low-dimensional representation is measured as a function of the number of points that remain within a K-neighborhood during the projection process. To carry out this measurement, we use Equation (10), where N represents the total number of points and K denotes the size of the neighborhood:

$$Q_{NX}(K) = \frac{1}{KN} \sum_{k=1}^{K} \sum_{l=1}^{K} Q_{kl} \tag{10}$$

Q_NX(K) is sensitive to a small number of samples, where the mapping error might become large. In addition, over a particular value of K, the error saturates to a low value. The relation Q_NX(K) versus the number of samples is a curve that follows a diagonal, where it represents the ideal Q_NX(K). The expected quality comes from the numerical Q_NX(K) graph, where it saturates. The saturation point is identified where the ranking matrix starts to follow the main diagonal. We will evaluate the dimensionality reduction task visually, comparing the saturations in both the numerical and the experimental curves.
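The co-ranking matrix and the quality curve Q_NX(K) can be computed with a short NumPy sketch. It assumes Euclidean distances and breaks ties by index, and it follows the common convention in which entry [i, j] of the rank matrix is the rank of point j among the neighbors of point i. The function names and the synthetic data are illustrative.

```python
import numpy as np

def rank_matrix(points):
    """ranks[i, j] = rank of point j among the neighbors of point i (1 = nearest)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, -1.0)                   # each point sorts first in its own row
    order = np.argsort(dist, axis=1, kind="stable")   # stable sort: ties broken by index
    ranks = np.empty_like(order)
    rows = np.arange(points.shape[0])[:, None]
    ranks[rows, order] = np.arange(points.shape[0])[None, :]
    return ranks                                   # the diagonal holds rank 0

def coranking_matrix(high, low):
    """Count pairs (i, j) whose rank is k in `high` and l in `low`."""
    n = high.shape[0]
    rho, r = rank_matrix(high), rank_matrix(low)
    off_diag = ~np.eye(n, dtype=bool)
    q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(q, (rho[off_diag] - 1, r[off_diag] - 1), 1)
    return q

def q_nx(q, k):
    """Fraction of K-neighborhood membership preserved by the mapping."""
    n = q.shape[0] + 1
    return q[:k, :k].sum() / (k * n)

# Synthetic check (assumption): compare a perfect mapping with a random projection.
rng = np.random.default_rng(3)
high = rng.normal(size=(50, 16))
low = high @ rng.normal(size=(16, 3))
q_same = coranking_matrix(high, high)
q_proj = coranking_matrix(high, low)
quality = q_nx(q_proj, 5)
```

A mapping that preserves every rank puts all counts on the main diagonal, so Q_NX(K) equals 1 for every K; intrusions and extrusions move counts off the diagonal and lower the value for small K.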

Handwriting Numbers Experiment
To determine the performance of the spiking neural network in reducing dimensionality, its result has been compared with the response obtained with a reference neural network built with continuous-sigmoid neurons. The three databases have been evaluated with both architectures and the results are presented below. Figure 5 shows the dimensionality reduction obtained with the two architectures for the handwriting numbers database. Figure 5a shows the distribution of the 10 classes produced by the SNN, and Figure 5b shows the distribution of the classes produced by the reference neural network. Figure 5c shows the distribution of the class corresponding to digit zero, with the SNN distribution in blue and the reference continuous-sigmoid neural network distribution in black. The process is repeated in a similar way for the rest of the classes, which are graphed in Figure 5d-l.
The response of each neuron, for both the spiking and the continuous neurons, is considered as a dimension in the graphs of Figure 5. The values of the centers of the classes for each dimension are presented in Figure 6. These graphs show a high similarity in the response of both networks.
As stated in Section 4.1, to evaluate the quality of the dimensionality reduction task in this experiment, i.e., the Handwriting Numbers Experiment, we calculate the quantities MSE-1 and MSE-2. They are presented in Figure 7.
As mentioned in Section 4.2, to evaluate the quality of the data representation in low dimensions, the calculation of the co-ranking matrix has been obtained. Figure 8 contains the graphs of the co-ranking matrix, which corresponds to the dimension reduction of the database of handwritten numbers, for three techniques, numerical t-SNE, Spiking Neural Network, and Reference Neural Network.

Images Experiment
In this experiment, the database contains 3 classes of gray-scale images, i.e., Flowers, Fruits, and Human Faces. Following the same procedure as in Section 4.3, the dimensionality reduction results obtained by both the SNN and the reference neural network are shown in Figure 9a. As stated in Section 4.1, to evaluate the quality of the dimensionality reduction task in this experiment, i.e., the Images Experiment, we calculate the quantities MSE-1 and MSE-2. They are presented in Figure 11. To evaluate the quality of the representation of the data in low dimensions, the co-ranking matrix has been calculated. Figure 12 contains the graphs of the co-ranking matrix for the database comprised of images of Flowers, Fruits, and Faces. Three techniques have been evaluated: numerical t-SNE, Spiking Neural Network, and Reference Neural Network.

Croplands Experiment
In this experiment, the database contains seven classes of complex data related to croplands, i.e., corn, peas, canola, soybeans, oats, wheat, and broad-leaf. Following the same procedure as in Section 4.3, the dimensionality reduction results obtained by both the SNN and the reference neural network are shown in Figure 13a. The values of the centers of the classes for each dimension are presented in Figure 14. We observe that both networks generate similar distributions. As stated in Section 4.1, to evaluate the quality of the dimensionality reduction task in this experiment, i.e., the Croplands Experiment, we calculate the quantities MSE-1 and MSE-2. They are presented in Figure 15. To evaluate the quality of the representation of the data in low dimensions, the co-ranking matrix has been calculated. Figure 16 contains the graphs of the co-ranking matrix for the database that refers to croplands. Three techniques have been evaluated: numerical t-SNE, Spiking Neural Network, and Reference Neural Network.

Discussion
In the machine learning community, a fair evaluation of any dimensionality reduction method would require knowing the reconstruction error that uses some available inverse mapping process. In our case, this is not possible for the moment. Alternatively, we consider three standard techniques for the evaluation, using a visual data method and two quality functions, MSE and co-ranking matrices. The four points below argue these evaluations on the results:

1. The visual data method can be applied in Figures 5, 9, and 13. The deduction drawn is that they are all coherent.
2. The numerical evaluation of the centers of the classes, i.e., Dimension 1, Dimension 2, and Dimension 3 in Figures 6, 10, and 14, by both the reference and spiking networks follows the same trend pattern. This shows that the spiking networks reproduce a performance near to that of the reference network. In addition, the third trace due to the numeric t-SNE was presented as the theoretical case, whose saturation is the same as that of the spiking and reference networks. The evaluation at this point demonstrates that the quality of the spiking networks is satisfactory.
The above 4 points show that the spiking neural networks in this article have a suitable performance in the nonlinear dimensionality reduction task.
The works [6-12] cited in the Introduction section about training spiking systems with metaheuristics focus on either improving backpropagation-type algorithms or demonstrating the performance of evolutionary algorithms; they are not concerned with using an objective function from artificial intelligence theory other than the fundamental MSE estimator. For completeness, we mention alternative and recent strategies to create spiking systems, which include transforming continuous deep neural systems into spiking systems [32], using particular training formulations that compare favorably with the SpikeProp algorithm but are associated with a particular loss function [33], retaking the biological STDP rule for convolutional deep networks [34], and tuning convolutional-type networks with a biologically plausible algorithm [35], to name a few.
An extension of this experimental work would include more databases in number and attributes to categorize the efficiency of the intelligent capability of our spiking networks.

Conclusions
The spiking neural network paradigm is becoming suitable for implementing intelligent tasks in hardware. Therefore, these systems would have the advantages derived from neuromorphic schemes. However, the training method for efficiently establishing the weighting values of the synaptic connections between neurons is still an open issue. In this work, we have presented meaningful results showing that the ABC algorithm is capable of solving this complex task. The efficiency of this metaheuristic algorithm is due to its capability to leave local minima, which leads to finding optimal solutions.
The problem addressed in this paper is the nonlinear reduction of the dimensionality of complex databases, which was realized with the proposed spiking neural network using available or generated attributes of the data. The nonlinear transformation from high to low dimensions was implemented with sigmoidal spiking neurons, which were also created with the ABC algorithm. The loss function defined in the t-SNE method served as the objective function in the ABC algorithm, keeping its original properties on the user data during the training phase of the spiking neural network.
The intelligent process performed with the trained spiking neural network in this work is still partial. It needs a further spiking softmax network for evaluating numerically the actual classes.

Data Availability Statement:
The data and codes presented in this study are available by contacting Álvaro Anzueto-Ríos.

Conflicts of Interest:
The authors declare no conflict of interest concerning the supporting ideas and results reported in this paper.