Energy-Efficient Gabor Kernels in Neural Networks with Genetic Algorithm Training Method

Deep-learning convolutional neural networks (CNNs) have proven to be successful in various cognitive applications with a multilayer structure. The high computational energy and time requirements hinder the practical application of CNNs; hence, the realization of a highly energy-efficient and fast-learning neural network has aroused interest. In this work, we address the computing-resource-saving problem by developing a deep model, termed the Gabor convolutional neural network (Gabor CNN), which incorporates highly expression-efficient Gabor kernels into CNNs. In order to effectively imitate the structural characteristics of traditional weight kernels, we improve upon the traditional Gabor filters so that they have stronger frequency and orientation representations. In addition, we propose a procedure to train Gabor CNNs, termed the fast training method (FTM). In FTM, we design a new training method based on the multipopulation genetic algorithm (MPGA) and an evaluation structure to optimize the improved Gabor kernels, but train the rest of the Gabor CNN parameters with back-propagation. The training of improved Gabor kernels with MPGA is much more energy-efficient because it needs fewer samples and iterations. Simple tasks, like character recognition on the Mixed National Institute of Standards and Technology database (MNIST), traffic sign recognition on the German Traffic Sign Recognition Benchmark (GTSRB), and face detection on the Olivetti Research Laboratory database (ORL), are implemented using the LeNet architecture. The experimental results of the Gabor CNN and MPGA training method show a 17–19% reduction in computational energy and time and an 18–21% reduction in storage requirements with an accuracy decrease of less than 1%. We eliminated a significant fraction of the computation-hungry components in the training process by incorporating highly expression-efficient Gabor kernels into CNNs.


Introduction
Deep learning [1,2] has been used in a variety of detection [3][4][5], classification [6], and inference tasks [7,8]. Convolutional deep features extracted from multiple layers, also known as "hypercolumn" features [9], are the foundation of deep learning. However, the huge amounts of computational energy and time required for regular trainable weight kernel learning hinder their extensive practical application. The large-scale structure and training complexity of convolutional neural networks (CNNs) make them one of the most computationally intensive workloads across all modern computing platforms [10], so the implementation of energy-efficient kernels in neural networks is of interest.
A variety of hardware and software techniques have been proposed to achieve energy and time efficiency [11][12][13][14]. One aspect involves reducing the testing complexity of the networks. Another focuses on reducing the training complexity of a CNN [15,16]. The latter is an important challenge for convolutional networks, as high computational energy and time are needed. Especially in online training applications, in which the training time is included in the system action, reducing the training complexity reduces the overall real-time response. At the same time, the dependence on the platform can be reduced by the implementation of an energy-efficient network, which is important for some simple platforms. CNN-based feature extraction is a purely data-driven technique that can learn robust representations from data, but usually at the cost of high computational time and energy requirements [17]. Trainable random kernels in CNNs are adjusted to appropriate values step by step through repeated iteration over the samples in order to express deep features.
Because sufficient training data and iterations are required, the training of regular trainable weight kernels consumes considerable amounts of energy and time. Anisotropic filtering techniques have been widely used to extract robust image representations [18,19]. The optimization of anisotropic filters is much simpler. Anisotropic filters determined from a small portion of the samples can often effectively express the common features of all samples. Hence, the combination of CNNs with anisotropic filters is a valid way to reduce the computational energy and time consumption of networks. Among them, Gabor filters have attracted attention due to their ability to provide discriminative and informative features [20]. Compared with other filtering approaches, Gabor filters are advantageous in spatial information extraction, including edges and textures [21]. Through the deep convolutional neural network visualization toolbox "yosinski/deep-visualization-toolbox" [22], we can obtain the convolutional kernels for each level by visualizing a pretrained CNN model, as shown in Figure 1. The visualization of CNN kernels indicates that they are often redundant, and most of the convolutional kernels are similar to some structural Gabor filters. This similarity and the inherent error resiliency of the networks were the basis of incorporating Gabor kernels into CNNs.
Electronics 2019, 8 FOR PEER REVIEW 2 of 18
Based on the inherent error resiliency of the networks and the similarity between convolutional kernels and Gabor filters, we introduced Gabor kernels into CNNs and propose Gabor CNNs to reduce the computational energy and time required by networks while maintaining a competitive output accuracy. In order to effectively imitate the structural characteristics of traditional weight kernels, we improved the two-dimensional Gabor filters by introducing parameters k₁, k₂, and k₃ to adjust the oriented complex sinusoidal grating part. The improved Gabor filters have stronger frequency and orientation representations. We trained standard CNNs with a few samples and epochs as preliminary CNNs (or evaluation structures) to introduce and evaluate improved Gabor kernels in the first convolutional layer. To optimize the Gabor kernels in the first convolutional layer of the preliminary CNNs, we designed a new multipopulation genetic algorithm (MPGA) [23,24] training method. In each MPGA iteration, we optimized the Gabor kernels of the network by minimizing the global sample error, based on a small portion of the samples and the structure of the neural network. Because the optimization of Gabor kernels is much simpler than a purely data-driven method, the required computing resources were reduced. Simultaneously, we created a procedure to train Gabor CNNs, termed the fast training method (FTM). In the FTM, we designed the Gabor convolutional layer of the network using MPGA based on a small portion of the samples, but trained the remaining network structures using back-propagation [1] based on all samples. The FTM reasonably allocates the energy consumption of each layer of the network. Given the structure of Gabor CNNs and the MPGA training method, we eliminated a significant fraction of the computation-heavy components in the training process, thereby producing a considerable reduction in the computational energy and time required for training. The experimental results show that our proposed methodology is energy-efficient and reduces storage requirements and training time, with minimal degradation of the classification accuracy.

Gabor Filters
After long-term evolution in nature, the biological vision system has become one of the best information processing systems, with the most complete mechanism. Riaz et al. used two-dimensional (2D) Gabor filters as a simple cell receptive field function to simulate its characteristics and responses [25,26]. A circular 2D Gabor filter is a combination of a 2D Gaussian function and an oriented complex sinusoidal grating. It is widely used to extract spatial local spectral features, which are important for multiple pattern recognition. Many previous works have attempted to extract important spatial information, including edges and textures, with the advantage of Gabor filters in sparse representation. Gabor filters have been successfully applied to face recognition [27,28], fingerprint identification [29][30][31], and phase extraction [32] using Gabor atoms in sparse expression. A 2D Gabor filter is expressed as:

$$G_{\sigma,\theta,\lambda,\gamma,\psi}(x', y') = g_{\sigma,\gamma}(x', y') \exp\left(i\left(\frac{2\pi x'}{\lambda} + \psi\right)\right) \tag{1}$$

where $i = \sqrt{-1}$ and $g_{\sigma,\gamma}(x', y')$ is a Gaussian envelope defined as:

$$g_{\sigma,\gamma}(x', y') = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \tag{2}$$

where (x, y) represents the coordinates of a pixel, x′ = x cos θ + y sin θ, y′ = −x sin θ + y cos θ, σ denotes the standard deviation of the Gaussian envelope, λ denotes the wavelength of the span-limited sinusoidal grating, θ denotes the orientation in the interval 0°–180°, γ represents the spatial aspect ratio, and ψ represents the phase shift. A 2D Gabor filter $G_{\sigma,\theta,\lambda,\gamma,\psi}(x', y')$ can be decomposed into a real part $R_{\sigma,\theta,\lambda,\gamma,\psi}(x', y')$ and an imaginary part $I_{\sigma,\theta,\lambda,\gamma,\psi}(x', y')$, as shown in the following equations:

$$R_{\sigma,\theta,\lambda,\gamma,\psi}(x', y') = g_{\sigma,\gamma}(x', y') \cos\left(\frac{2\pi x'}{\lambda} + \psi\right) \tag{3}$$

$$I_{\sigma,\theta,\lambda,\gamma,\psi}(x', y') = g_{\sigma,\gamma}(x', y') \sin\left(\frac{2\pi x'}{\lambda} + \psi\right) \tag{4}$$

The standard deviation σ of the Gaussian envelope controls the receptive field of the Gabor filter. λ and θ control the wavelength and orientation of the Gabor filter, respectively. The phase shift ψ controls the distance between the center of the sinusoidal grating and the receptive field. Some traditional Gabor filters with different parameters are shown in Figure 2.
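As an illustration of the definitions above, the following minimal NumPy sketch samples a traditional 2D Gabor filter on a square grid. It assumes the common unnormalized form (a Gaussian envelope multiplied by a complex sinusoid along the rotated x′ axis); the function name `gabor_kernel` and the parameter values in the usage lines are hypothetical and are not taken from the paper.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Sample a complex 2D Gabor filter on a size x size grid (size should be odd).

    Assumed form: exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) * exp(i (2 pi x' / lam + psi)).
    theta and psi are given in radians.
    """
    half = size // 2
    # Pixel coordinates centered on the middle of the kernel.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * (2.0 * np.pi * x_r / lam + psi))
    return envelope * carrier

# Usage: a 5 x 5 kernel oriented at 30 degrees (illustrative values only).
kernel = gabor_kernel(size=5, sigma=2.0, theta=np.pi / 6, lam=4.0)
real_part, imag_part = kernel.real, kernel.imag  # cosine and sine components
```

The real and imaginary parts of the returned array correspond to the cosine and sine components of the filter, respectively.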

Convolutional Neural Network: Basics
The basic operation of CNNs consists of two stages: training and testing [33]. The testing process is essentially forward propagation [34,35] and is used to test random data inputs; it is much simpler than training in terms of computational energy and time consumption. In the training process, a large number of samples are iterated cyclically through the CNN, and the randomly initialized parameters are adjusted through gradient computation and weight updates, both of which require considerable computation and time [16]. In this paper, we propose a method to achieve energy efficiency in training by removing a significant portion of the energy-hungry gradient computation and weight update operations with MPGA optimization.
CNNs consist of convolutional [36,37], pooling [36,37], and fully connected layers [38]. The nonlinear activation function [39,40] is applied at the end of the convolutional and fully connected layers. The convolutional layers are used to extract the deep features of the images [1,37]. The process is shown in Equation (5):

$$X_j^l = f\left(\sum_{i \in M_j} X_i^{l-1} \otimes \mathrm{Kernel}_{ij}^l + B^l\right) \tag{5}$$

In Equation (5), $X_i^{l-1}$ is the input, $X_j^l$ is the jth output feature map of the lth convolutional layer, $f(\cdot)$ indicates the nonlinear activation function, $\mathrm{Kernel}_{ij}^l$ indicates the convolutional kernel of the lth convolutional layer that connects the ith input feature map to the jth output feature map, and $B^l$ represents the learnable bias added after the convolution operation and before the activation function. $\otimes$ indicates the convolution operation. $M_j$ represents the set of input feature maps used for the jth output map of the lth convolutional layer.
[Figure 3]
The main energy-hungry steps of CNN training (back-propagation) are the gradient computation and the weight updates of the convolutional and fully connected layers. In Sarwar et al. [16], the authors proposed an energy model for quantifying the energy consumption of the network during training. In a conventional CNN denoted by [784 6c 2s 12c 2s 10o] (784 input neurons; 6 and 12 feature maps (6c and 12c) in the first and second convolutional layers, respectively, each followed by a pooling layer (2s) of stride 2; and, finally, a fully connected output layer of 10 neurons, 10o), the second convolutional layer uses 27% of the overall energy consumption during training, whereas the first convolutional layer consumes 20%. In order to reduce computation and ensure network accuracy, we replaced trainable random kernels with Gabor filters. Because Gabor kernels do not require gradient computation and weight updates, the computation and time consumption of the Gabor convolutional layers are reduced.
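The forward pass of one convolutional layer described above (sum the convolutions of the input feature maps with their kernels, add a bias, apply the activation) can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the array layout, the use of one bias value per output map, "valid" borders, and `tanh` as the activation are all assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution; the kernel is flipped, as in the mathematical definition."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

def conv_layer_forward(inputs, kernels, bias, activation=np.tanh):
    """Forward pass of one convolutional layer.

    inputs:  (n_in, H, W) input feature maps
    kernels: (n_in, n_out, kh, kw), kernels[i, j] links input map i to output map j
    bias:    (n_out,) one bias per output map (an assumption of this sketch)
    """
    n_in, n_out = kernels.shape[:2]
    outputs = []
    for j in range(n_out):
        # Sum the contributions of all input maps for output map j.
        total = sum(conv2d_valid(inputs[i], kernels[i, j]) for i in range(n_in))
        outputs.append(activation(total + bias[j]))
    return np.stack(outputs)

# Usage: one 28 x 28 input map and six 5 x 5 kernels give six 24 x 24 maps.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 28, 28))
k = rng.standard_normal((1, 6, 5, 5))
feature_maps = conv_layer_forward(x, k, np.zeros(6))
```

The explicit loops keep the correspondence with the layer definition visible; a practical implementation would use a vectorized or library convolution instead.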

Combination of Gabor Filters and CNNs
Studies have reported work undertaken to combine Gabor filters and CNNs. Such work can be divided into two main categories. In the first category, Gabor filters are used as a preprocessing step for neural network training to increase the accuracy of the networks, exploiting the similarity of Gabor filters to the simple human visual system [41,42] in texture feature expression and description [43]. In this type of method, the optimization of Gabor filters usually adopts an empirical formula, which is not universal. In addition, the preprocessing step increases the computational energy and time consumption. In the other category, Gabor filters are introduced into CNNs as Gabor kernels (or a Gabor convolutional layer) to eliminate the preprocessing step. In [15,44], the authors attempted to remove the preprocessing overhead by introducing Gabor filters into the first convolutional layer of a CNN. In Mahmoud et al. [44], Gabor filters were used to replace the random filter kernels in the first convolutional layer. The training was then limited to the remaining layers of the CNN. In Chang et al. [15], the Gabor kernels in the first layer were fine-tuned with training. In other words, the authors used Gabor filters as a good starting point for training the classifiers, which helps with convergence. In Sarwar et al. [16], Gabor filters were introduced in two convolutional layers. The authors discussed a scheme in which Gabor filters are used to replace trainable random kernels in CNNs in order to decrease the computational energy and time consumption.
In the above studies, Gabor kernels were successfully introduced into the training process of CNNs, with less computational energy and time consumption. However, in these methods, Gabor filters were used to replace convolutional kernels in pretrained CNNs, or to only selectively replace trainable kernels in CNNs, without training the Gabor kernels in the convolutional layers of the networks. Gabor kernels selected by an empirical formula typically target specific kinds of problems and may not match the network. In this work, we improved upon the traditional Gabor filters and designed a new MPGA training method to optimize them. Through the MPGA training method, our Gabor kernels become trainable in Gabor CNNs and more universal.

Overview of Our Method
To address the computing-resource-saving problem, we develop the Gabor convolutional neural network (Gabor CNN) and propose a fast training method (FTM) to train it. The structure of the Gabor CNN and the update of the convolutional layer and the weight matrix in the fully connected layer are shown in Figure 5. The first convolutional layer of the Gabor CNN consists of fixed Gabor kernels rather than a trainable random weight matrix. In the FTM, the traditional Gabor filters are improved to imitate the structural characteristics of traditional weight kernels and introduced into the first convolutional layer. Then, we design a new training method based on the multipopulation genetic algorithm (MPGA) to optimize the improved Gabor kernels instead of using back-propagation. The training of Gabor kernels with MPGA is much more energy-efficient because fewer samples and iterations are needed. Finally, the rest of the Gabor CNN parameters are trained with back-propagation and all samples. The FTM for Gabor CNNs is shown in Figure 8. By replacing back-propagation with the energy-efficient MPGA in the convolutional layer, we could eliminate a significant fraction of the computation-hungry components in the training process.

Improved Gabor Kernels in the First Convolutional Layer
For a given network, the size and number of trainable convolutional kernels are fixed. In order to maintain the accuracy and simultaneously reduce the computational energy and time consumption in the process of network training, we used Gabor kernels whose size and number were the same as those of the regular trainable kernels, and whose orientations were equally spaced, to replace the kernels of the first layer of the CNN. In other words, the first layer of a Gabor CNN consists of Gabor kernels. With the introduction of Gabor filters with high-efficiency feature expression as convolutional kernels into CNNs, the feature extraction of the image can be expressed as:

The similarity between Gabor filters and convolutional kernels in the network and the inherent error resiliency of the networks are the basis of incorporating Gabor kernels into the proposed network. To enrich the Gabor transformation, we introduced parameters k₁, k₂, and k₃ to adjust the oriented complex sinusoidal grating part and defined the improved Gabor filter as:

For example, for k₁ = 1, k₂ = 0, and k₃ = 1, the Gabor filter has the conventional shape with an oriented grating; for k₁ = 2, k₂ = 2, and k₃ = 1/2, the Gabor filter is circular. Figure 4 shows some of the improved 2D Gabor filters.
In this paper, the Gabor CNN has two convolutional layers, each of which is followed by a subsampling layer. It has a fully connected layer that produces the final classification result. The first convolutional layer has k Gabor kernels, and the second convolutional layer extracts 2k features for each input with 2k random kernels. Taking the Mixed National Institute of Standards and Technology database (MNIST) [36] as an example, we used six Gabor kernels (with θ = 0°, 30°, 60°, 90°, 120°, and 150°) to form the first layer; hence, the second convolutional layer included 2k² = 72 random kernels. The 12 feature maps from the second layer were used as feature vector inputs to the fully connected layer, which produced the final classification result. As a rule of thumb, we set the ratio of oriented, circular, and complicated Gabor kernels to 10:3:1.
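The layer sizes in the MNIST example follow directly from k. The short sketch below reproduces that arithmetic; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def first_layer_orientations(k):
    """Return k orientations (in degrees) equally spaced over [0, 180)."""
    return np.arange(k) * 180.0 / k

def second_layer_kernel_count(k):
    """k input maps times 2k output maps gives 2 * k^2 kernels."""
    return 2 * k * k

k = 6  # number of Gabor kernels in the first convolutional layer (MNIST example)
print(first_layer_orientations(k).tolist())  # [0.0, 30.0, 60.0, 90.0, 120.0, 150.0]
print(second_layer_kernel_count(k), 2 * k)   # 72 12
```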
To produce the predicted output, samples undergo forward propagation through a Gabor CNN in the same way as through a conventional CNN. However, the gradient computation and weight update are performed only at the fully connected and second convolutional layers. The Gabor kernels in the first convolutional layer are optimized by MPGA with minimal energy-consuming steps. Hence, in our proposed method, we achieve energy efficiency by eliminating a large portion of the gradient computation and weight update operations in the Gabor convolutional layer. In Figure 5, a schematic diagram shows the update of the convolutional layer and the weight matrix in the fully connected layer of a Gabor CNN. Considering k = 6, there are six kernels in the first convolutional layer. Hence, the number of second convolutional layer kernels is 72, as mentioned above. In the Gabor CNN, the second convolutional layer and the fully connected layer consist of trainable random kernels, but the first convolutional layer consists of optimized Gabor kernels. We trained standard CNNs with a few samples and epochs as preliminary CNNs (or evaluation structures) to introduce and evaluate improved Gabor kernels in the first convolutional layer. In order to select the most fitting Gabor kernels, training based on MPGA and the preliminary CNNs is used to optimize the Gabor filters' parameters. Then, Gabor filters are established with an optimal set of parameters, defined as Gabor kernels, and introduced into the first convolutional layer. To train the other parameters, only the gradient matrix of the fully connected and second convolutional layers is calculated by back-propagation with the sample error. After that, the weight update is performed with the gradient matrix and a fixed learning rate, the same as in a conventional CNN. During continuous training, the Gabor CNNs meet the accuracy requirements and achieve energy efficiency. The comparison between MPGA optimization and back-propagation in the first convolutional layer is described in the next section.
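The selective update described above can be illustrated with a small sketch in which the first convolutional layer is excluded from the gradient step. The layer names (`conv1`, `conv2`, `fc`), the dictionary layout, and the function name `sgd_step` are hypothetical; the sketch only shows that no gradient is needed or applied for the Gabor layer, while the other layers receive a plain fixed-learning-rate update.

```python
import numpy as np

def sgd_step(params, grads, lr, frozen=("conv1",)):
    """Apply one fixed-learning-rate update to every layer that is not frozen.

    Frozen layers (here the Gabor layer, hypothetically named "conv1") are
    returned unchanged, so no gradient has to be computed for them.
    """
    return {
        name: (w if name in frozen else w - lr * grads[name])
        for name, w in params.items()
    }

# Usage with toy values: only conv2 and fc have gradients.
params = {"conv1": np.ones(2), "conv2": np.ones(2), "fc": np.ones(2)}
grads = {"conv2": np.full(2, 0.5), "fc": np.full(2, 1.0)}
updated = sgd_step(params, grads, lr=0.1)
```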

MPGA Optimization for Gabor Convolutional Kernels
The CNN classification accuracy is based on the efficient expression of input features, and improper substitution of Gabor kernels results in irreparable accuracy degradation. Therefore, the optimization of Gabor kernels is the key to Gabor CNN accuracy. Unlike in other methods, the optimization of Gabor kernels in a convolutional layer should be suited to the network structure. In this paper, the number and size of the Gabor kernels are determined by the given CNN; in terms of orientation, they are equally spaced. The standard deviation of the Gaussian envelope σ and the frequency of the span-limited sinusoidal grating μ cover the whole solution space. In order to select suitable Gabor kernels and produce a fast-learning first convolutional layer, we propose MPGA optimization for the standard deviation of the Gaussian envelope and the frequency of the span-limited sinusoidal grating.
A simple multipopulation genetic algorithm is an iterative procedure that maintains a constant-sized population (P) of candidate solutions consisting of individuals. During each generation, three genetic operators, called reproduction, crossover, and mutation, are performed to generate new populations. The best individual, which represents the optimal solution in each generation, is saved for the next generation. A cost function is used to evaluate the fitness values of the individuals in each generation. An appropriate cost function is the key to MPGA optimization. The direction of error descent in MPGA optimization for the Gabor convolutional layer must be the same as the direction of the error gradient descent in the training of CNNs. In this paper, we use the global error as the cost function of MPGA optimization. The global error refers to the 2-norm of the difference between the values predicted by the CNN for all samples and the standard (label) values, which is an important index that reflects the accuracy. In other words, a decrease in the global error directly reflects an increase in network accuracy. The global error can be expressed as:
where ŷ_k is the value predicted by the Gabor CNN for the kth sample, y_k is its label value, and n represents the number of samples. Using the above description, MPGA optimization for Gabor convolutional kernels using the global error as the cost function can be expressed as shown in Figure 6.
The MPGA optimization for Gabor convolutional kernels proceeds as follows:
(1) An initial population P with a constant size 2k is randomly generated, where k is the number of Gabor convolutional kernels in the first layer. The genes of the individuals in the population represent the standard deviation of the Gaussian envelope σ and the frequency of the span-limited sinusoidal grating µ of the Gabor kernels.
(2) The fitness of each initial individual, which corresponds to a set of Gabor kernels, is calculated.
(3) The next generation, including the best individual from the previous generation, is created through reproduction, crossover, and mutation.
(4) Each individual in the new generation is evaluated, and the best Gabor kernels, corresponding to one individual, are saved.
(5) If the search goal is achieved or the allowed number of generations is reached, the best individual and its Gabor kernels are returned as the solution; otherwise, the procedure returns to step (3).
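The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the toy quadratic cost stands in for the global error of the evaluation CNN, and binary tournament selection, arithmetic crossover, uniform mutation, and ring migration between populations are common choices made for the example, not operators specified in the paper. The names `mpga_minimize`, `toy_cost`, and `target` are hypothetical.

```python
import numpy as np

def mpga_minimize(cost, bounds, num_pops=3, pop_size=12, generations=10,
                  crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize cost(genes) with a small multipopulation genetic algorithm."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    dim = low.size
    # Step 1: random initial populations inside the search bounds.
    pops = [rng.uniform(low, high, size=(pop_size, dim)) for _ in range(num_pops)]
    best_genes, best_cost = None, np.inf
    for _ in range(generations):
        elites = []
        for p, pop in enumerate(pops):
            # Steps 2 and 4: evaluate every individual and remember the best.
            costs = np.array([cost(ind) for ind in pop])
            top = int(np.argmin(costs))
            elite = pop[top].copy()
            elites.append(elite)
            if costs[top] < best_cost:
                best_genes, best_cost = elite.copy(), float(costs[top])
            # Step 3a: reproduction by binary tournament selection.
            a = rng.integers(pop_size, size=pop_size)
            b = rng.integers(pop_size, size=pop_size)
            parents = pop[np.where(costs[a] < costs[b], a, b)]
            # Step 3b: arithmetic crossover with the neighbouring parent.
            partners = np.roll(parents, 1, axis=0)
            mix = rng.uniform(size=(pop_size, 1))
            cross = rng.uniform(size=(pop_size, 1)) < crossover_rate
            children = np.where(cross, mix * parents + (1.0 - mix) * partners, parents)
            # Step 3c: mutation redraws a gene uniformly within its bounds.
            mutate = rng.uniform(size=children.shape) < mutation_rate
            children = np.where(mutate, rng.uniform(low, high, size=children.shape),
                                children)
            children[0] = elite  # the best individual survives unchanged
            pops[p] = children
        # Migration: each population receives the elite of its neighbour.
        for p in range(num_pops):
            pops[p][-1] = elites[(p - 1) % num_pops]
    # Step 5: return the best individual found within the allowed generations.
    return best_genes, best_cost

# Toy stand-in for the global error: distance to a hypothetical ideal (sigma, mu).
target = np.array([2.0, 0.25])

def toy_cost(genes):
    return float(np.sum((genes - target) ** 2))

bounds = [(0.5, 4.0), (0.05, 0.5)]
genes, err = mpga_minimize(toy_cost, bounds)
print(genes, err)
```

In the setting of this paper, `cost` would instead run the sampled images through the evaluation structure with the candidate Gabor kernels in the first layer and return the global error.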
Because it needs only a few samples and iterations, training the improved Gabor kernels with MPGA is much more energy-efficient than back-propagation. Some of the conventionally trained kernels and optimized Gabor kernels of the first convolutional layer are shown in Figure 7.

Fast Training Method for Gabor Convolutional Neural Networks
The trainable random kernels in CNNs are adjusted to appropriate values by gradient computation and weight update, a procedure that does not suit Gabor kernels. Unlike other methods (such as the empirical formula), the optimization of Gabor kernels in the convolutional layer should suit the network structure. In this paper, we propose an FTM for Gabor CNNs. The Gabor convolutional layer is trained quickly with MPGA and the evaluation structure, and then the rest of the Gabor CNN parameters are trained with back-propagation. The FTM for Gabor CNNs is shown in Figure 8.

The basic operation of FTM consists of two stages: (1) optimization of the Gabor convolutional kernels with a few samples and the evaluation structure; and (2) training of the other Gabor CNN parameters with all samples.
Gabor kernels can effectively express common features (such as edges and texture) with simple optimization. Hence, we propose MPGA optimization with fewer iterations for the Gabor convolutional layer instead of training with a fixed learning rate. The Gabor features of samples are similar within a class but differ from those of other classes, which has been shown in many classification problems [32,44]. On this basis, MPGA optimization of the Gabor kernels can be completed with a few representative samples. Any degradation of the classification accuracy can be compensated for by the inherent error resiliency of the network and the efficient expressive ability of Gabor kernels. In the first stage, a preliminary CNN is established with traditional randomly weighted kernels as the evaluation structure. Then, Gabor kernels constructed from a random initial population are incorporated into the preliminary CNN to replace the random kernels. The MPGA training method and a few samples from each class are used to optimize the Gabor convolutional layer and the output of the first layer of the optimized Gabor CNN. Because this stage uses fewer samples and iterations, it eliminates a large portion of the gradient computation and weight update operations, which is the source of the energy savings. In the second stage, the other Gabor CNN parameters are trained with back-propagation to improve the network accuracy. The experimental results show that the inherent error resiliency of the network and the ability of Gabor kernels to express features efficiently can effectively minimize the loss of accuracy.
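The first stage relies on drawing a few samples from each class. A minimal sketch of such a draw is given below, assuming that the sampling rate is applied per class and that at least one sample per class is kept; both are assumptions, since the text states only that a few samples from each class are used. The function name `sample_per_class` and the synthetic labels are hypothetical.

```python
import numpy as np

def sample_per_class(labels, rate, seed=0):
    """Return sorted indices of a random subset holding about `rate` of each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Keep at least one sample so that every class is represented.
        count = max(1, int(round(rate * idx.size)))
        picked.append(rng.choice(idx, size=count, replace=False))
    return np.sort(np.concatenate(picked))

# Synthetic example: 10 classes with 600 samples each, sampled at a 1% rate.
labels = np.repeat(np.arange(10), 600)
subset = sample_per_class(labels, rate=0.01)
print(len(subset))
```

Only this subset would be passed through the evaluation structure during MPGA optimization; the second stage then trains on all samples.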

Implementation and Experiment
In this section, we present the implementation details of the Gabor CNN and the MPGA training method. We used modified versions of open-source MATLAB (MathWorks, Natick, MA, USA) codes [45,46] to implement multilayer CNNs for our experiments. We incorporated Gabor kernels into CNNs with the MPGA optimization and evaluation structure described above to realize Gabor CNNs. First, we used an example to analyze the energy efficiency and performance of Gabor CNNs on MNIST. Then, we compared the accuracy, training time, and storage requirements of the two structures on the datasets listed in Table 1. Finally, we discuss the effects of the number of iterations and the sampling rate in MPGA optimization on these three indicators.
The architectures of the two structures and the parameters of MPGA optimization are listed in Table 2.

Energy Efficiency and Performance
We conducted an experiment in which we trained a CNN ([784 6c 2s 12c 2s 10o]) and a Gabor CNN with the same structure on the MNIST dataset. Each structure was trained for 200 epochs. Figure 9 shows the overall classification accuracy and the mean square error obtained from each structure. In Figure 9a, the red curve corresponds to the samples' mean square error of the CNN and the green curve to that of the Gabor CNN. K1 (1.27) and K2 (0.26) are the initial values of the samples' mean square error of each structure. The difference between K1 and K2 indicates that optimized Gabor kernels can drastically reduce the samples' mean square error and that the Gabor CNN has a preliminary identification ability. In Figure 9b, the red and green curves correspond to the overall classification accuracy of the CNN and the Gabor CNN, respectively. The degradation in classification accuracy is less than 1%, and the two curves show similar trends, which suggests that the CNN and the Gabor CNN perform similarly while the Gabor CNN is more computationally efficient. As expected, incorporating Gabor kernels into CNNs causes minimal degradation in network performance, and MPGA optimization is a process that increases the overall classification accuracy.
Fewer training data and fewer iterations were required for the optimization of Gabor kernels than for the CNN training process. Hence, the energy consumption during the optimization of Gabor kernels is far less than that of the gradient computation and weight update of the first random convolutional layer with back-propagation. Convolution is the main operator in both MPGA optimization and back-propagation. Table 3 lists, for comparison, the number of convolutions (Conv) performed in one iteration of the forward process (FP) and back process (BP) of the first layer, both for the optimization of Gabor kernels and for the back-propagation method (in the standard CNN). The optimization of Gabor kernels includes two parts: establishing a preliminary CNN with randomly weighted kernels, and running MPGA on the first convolutional layer with its kernels replaced by Gabor kernels. Correspondingly, the number of Conv performed in the optimization of Gabor kernels includes the Conv of both MPGA and the preliminary CNN.
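The bracket notation [784 6c 2s 12c 2s 10o] can be unpacked as a 28 × 28 input (784 pixels), a convolutional layer with 6 feature maps, 2 × 2 subsampling, a convolutional layer with 12 feature maps, 2 × 2 subsampling, and 10 outputs. The short sketch below traces the feature-map sizes under the assumption of 5 × 5 kernels and valid convolution; the kernel size is not stated in this section and is chosen here only for illustration, and `lenet_shapes` is a hypothetical helper.

```python
def lenet_shapes(input_side=28, kernel=5, maps=(6, 12), pool=2):
    """Trace the feature-map side length through alternating conv and pooling layers."""
    side = input_side
    shapes = []
    for num_maps in maps:
        side = side - kernel + 1      # valid convolution shrinks each side by kernel - 1
        shapes.append(("conv", num_maps, side))
        side //= pool                 # non-overlapping subsampling
        shapes.append(("pool", num_maps, side))
    flat = maps[-1] * side * side     # inputs seen by the output layer
    return shapes, flat

shapes, flat = lenet_shapes()
for kind, num_maps, side in shapes:
    print(f"{kind}: {num_maps} maps of {side} x {side}")
print("flattened features:", flat)
```

Under these assumptions the first convolutional layer, the only one replaced by Gabor kernels, produces 6 maps of 24 × 24 and operates on the full-resolution input.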
The computational energy and time required for the optimization of Gabor kernels cover both the genetic algorithm iterations and the training of the preliminary CNN. Table 3 shows that MPGA performs more convolutions than back-propagation in one iteration of the forward process (FP) and back process (BP). However, MPGA optimization needs fewer iterations than the back-propagation method. The calculation shows that the computational energy required for the optimization of Gabor kernels is 3–7% of that of back-propagation in the conventional CNN. The pie chart in Figure 10 shows how the computation is distributed across the different segments of the Gabor and conventional CNNs. The error and loss function consume a small fraction (~1%) of the energy in both structures. The energy consumption of the second convolutional layer and of forward propagation is the same in the two structures during training. However, the first convolutional layer accounts for only about 1% of the energy consumption in the Gabor CNN, far less than in the conventional CNN. Of the 20% of total energy that the conventional CNN spends on the first convolutional layer, 19 percentage points can be saved by the optimized Gabor kernels in Gabor CNNs, because Gabor kernels do not require gradient computation and weight update. In MPGA optimization, the sampling rate is 1%, the population number is 50, and the number of genetic iterations is 10.

Accuracy Comparison
We trained the two structures on the datasets listed in Table 2 for 200 epochs to obtain the accuracy, training time, and storage requirements. The number of epochs was chosen to ensure that all training runs converged and reached saturation. For each dataset, the Gabor CNN and the conventional CNN had the same structure, and the latter was used as the baseline. The accuracies of the networks are listed in Table 4. On MNIST, the accuracy decrease of the Gabor CNN is minimal. This result can be attributed to the fact that MNIST is a grayscale image dataset in which edges and textures are salient features for classification, so the advantages of Gabor kernels in spatial information extraction are fully exploited. On GTSRB, the accuracy decrease of the Gabor CNN is larger but tolerable. The larger loss may have occurred because edges and textures are not the only salient features in traffic sign recognition; color is prominent in some situations. In face recognition, the accuracy of the Gabor CNN is better than the baseline. A possible reason is that ORL has few samples, a problem that MPGA optimization of Gabor kernels can overcome.

Training Time Comparison
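The bar chart in Figure 11 shows the normalized training time after 200 epochs of the two structures on each dataset. The training time of the conventional CNN on the three datasets is about 1.570 × 10^4 s, 1.026 × 10^4 s, and 104.7 s, respectively. Correspondingly, the training time of our Gabor CNN is about 1.272 × 10^4 s, 8.411 × 10^3 s, and 98.4 s. Since FTM involves both the optimization of Gabor convolutional kernels and the training of the other parameters, the Gabor CNN training time includes both parts. Less training data and fewer iterations are required for the optimization of Gabor kernels than for the training process of the CNN. Hence, considerably less training time is required for the optimization of Gabor kernels than for the gradient computation and weight update of the first random convolutional layer with back-propagation. We observed a 17–19% reduction in training time on MNIST and GTSRB. However, the reduction was not obvious on ORL, because ORL has fewer samples and the sampling rate of MPGA optimization must be increased to ensure that the Gabor convolutional kernels are optimized. In conclusion, we achieved a significant reduction in training time when sufficient samples were available.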
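The relative reductions implied by the reported training times can be checked with a few lines of arithmetic. The sketch below assumes that the three times are listed in the order MNIST, GTSRB, ORL, which matches the order in which the datasets are introduced but is not stated explicitly alongside the numbers.

```python
# Reported training times in seconds (dataset order is an assumption).
conventional = {"MNIST": 1.570e4, "GTSRB": 1.026e4, "ORL": 104.7}
gabor = {"MNIST": 1.272e4, "GTSRB": 8.411e3, "ORL": 98.4}

# Relative reduction in percent: (baseline - proposed) / baseline.
reduction = {
    name: 100.0 * (conventional[name] - gabor[name]) / conventional[name]
    for name in conventional
}

for name, pct in reduction.items():
    print(f"{name}: {pct:.1f}% shorter training time")
# MNIST: 19.0% shorter training time
# GTSRB: 18.0% shorter training time
# ORL: 6.0% shorter training time
```

The two larger datasets come out at about 18–19%, while the much smaller reduction on ORL illustrates why the savings depend on having sufficient samples.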

Storage Requirement Comparison
Figure 12 shows the storage requirement reduction obtained with the proposed scheme for the different applications. We achieved a 6–21% reduction in storage requirements for memory read/write operations across the benchmarks. In forward propagation, each kernel requires one read operation, and in back-propagation, each kernel requires one write operation. We observed an 18–21% reduction in storage requirements on MNIST and GTSRB. The reduction was not obvious on ORL, because ORL has fewer samples and MPGA optimization saves little computation. The proposed Gabor CNNs and FTM significantly reduced storage requirements when sufficient samples were available.

Effects of Iterations and Sampling Rate
The Gabor convolutional layer was trained with MPGA and the evaluation structure. The number of iterations and the sampling rate are key parameters of MPGA optimization. Too few iterations or too low a sampling rate could cause an improper substitution of Gabor kernels, resulting in irreparable accuracy degradation. Conversely, superfluous iterations or an excessive sampling rate diminish the reduction in training time and storage requirements. The line chart in Figure 13a shows the training time reduction for different numbers of iterations and sampling rates on MNIST. As the number of iterations and the sampling rate increased, the training time reduction decreased correspondingly. In Figure 13b, the accuracy was maximized when the number of iterations was about 20 and the sampling rate was about 0.1. The results show a significant negative linear correlation between the training time reduction and the sampling rate. The accuracy reached its highest value when the sampling rate was over 1%. To obtain the greatest training time reduction, we set the sampling rate to 1% and the number of iterations to 20 on MNIST.

Conclusions
The high computational energy and time required hinder the practical application of CNNs. Owing to the advantages of Gabor filters in extracting spatial information, including edges and textures, combining CNNs with Gabor kernels efficiently reduces the training time and energy consumed. We improved the traditional Gabor filters by strengthening their frequency and orientation representations. We then introduced Gabor kernels into CNNs, termed the resulting model the Gabor convolutional neural network (Gabor CNN), and designed a new training method based on the multipopulation genetic algorithm (MPGA) to optimize the improved Gabor kernels. We also proposed a procedure to train Gabor CNNs, termed FTM. We eliminated a significant fraction of the energy-consuming components of back-propagation in the training process, thereby considerably reducing the energy and time consumption. In FTM, the Gabor convolutional layer was trained quickly with MPGA and an evaluation structure, and then the remaining Gabor CNN parameters were trained with back-propagation. Experiments across various benchmark applications showed that Gabor CNNs and the MPGA training method reduced computational energy and time by 17–19% and storage requirements by 18–21% with an accuracy decrease of less than 1% when samples were sufficient. However, the reductions in computational time and storage requirements are not obvious when sufficient samples are unavailable. Introducing Gabor filters into deeper layers is also difficult, because the deeper convolutional layers are complex and the similarity between pretrained deep convolutional kernels and Gabor filters is poor. The accuracy of the network is difficult to guarantee when all convolutional layers are replaced. Given the structure of Gabor CNNs and FTM, employing Gabor kernels is also beneficial for larger and more complex CNNs. In the future, we will introduce Gabor kernels into more complicated CNNs and applications.

Figure 1. Convolutional kernels of each level, obtained by visualizing a pretrained convolutional neural network (CNN) model.

Figure 3. A standard architecture of a deep-learning convolutional neural network (CNN).

Figure 5. Update of the convolutional layer and the weight matrix in the fully connected layer in a Gabor CNN.

Electronics 2019, 8
Figure 9. (a) Samples' mean square error and (b) overall classification accuracy obtained from each structure.


Figure 11. The normalized training time after 200 epochs of the two structures in each dataset.

Figure 12. The normalized storage requirement reduction of the two structures in each dataset.

Figure 13. (a) Training time reduction and (b) accuracy with different numbers of iterations and different sampling rates in MNIST.

Table 1. Benchmarks used in experiments.

Table 2. Architectures of the two structures and parameters of MPGA optimization.

Table 3. Convolutions used in MPGA optimization and back-propagation in the first layer.
The preliminary CNN (or evaluation structure) and the conventional CNN are denoted by [784 6c 2s 12c 2s 10o]. In MPGA optimization, the sampling rate is 1%, the population number is 50, and the number of genetic iterations is 10.
