Temperature Compensation Method Based on an Improved Firefly Algorithm Optimized Backpropagation Neural Network for Micromachined Silicon Resonant Accelerometers

The output of the micromachined silicon resonant accelerometer (MSRA) is prone to drift in a temperature-changing environment. Therefore, it is crucial to adopt an appropriate suppression method for temperature error to improve the performance of the accelerometer. In this study, an improved firefly algorithm-backpropagation (IFA-BP) neural network is proposed in order to realize temperature compensation. IFA can improve a BP neural network’s convergence accuracy and robustness in the training process by optimizing the initial weights and thresholds of the BP neural network. Additionally, zero-bias experiments at room temperature and full-temperature experiments were conducted on the MSRA, and the reproducible experimental data were used to train and evaluate the temperature compensation model. Compared with the firefly algorithm-backpropagation (FA-BP) neural network, it was proven that the IFA-BP neural network model has a better temperature compensation performance. The experimental results of the zero-bias experiment at room temperature indicated that the stability of the zero-bias was improved by more than an order of magnitude after compensation by the IFA-BP neural network temperature compensation model. The results of the full-temperature experiment indicated that in the temperature range of −40 °C~60 °C, the variation of the scale factor at full temperature improved by more than 70 times, and the variation of the bias at full temperature improved by around three orders of magnitude.


Introduction
Micromachined silicon resonant accelerometers have the advantages of small size, low power consumption, mass production, and quasi-digitalization [1,2]. They have been widely used in aerospace and Earth exploration fields [3,4]. Due to the influence of the materials and fabrication, the output of the MSRA is prone to drift in a temperature-changing environment. The influence of temperature on the accelerometer is mainly reflected in the following aspects: firstly, the Young's modulus of silicon changes with temperature [5]; secondly, the mismatched thermal expansion coefficients of silicon and the base material create thermal stress in the resonator; thirdly, the fabrication and packaging processes lead to residual thermal stress [6][7][8]. Temperature drift is one of the key factors limiting further improvements in the accuracy of MSRAs, and it needs to be suppressed [9]. At present, the common temperature drift suppression methods mainly include temperature control systems, structural optimization, and temperature compensation models. A temperature control system can keep an accelerometer operating at a constant temperature, which effectively avoids the influence of temperature change. To address the temperature drift problem, an IFA-BP neural network compensation method is proposed in this study. It optimizes the initial weights and thresholds of the BP neural network by taking advantage of the optimization ability of the IFA and then applies the obtained neural network model to compensate the accelerometer and improve its temperature performance. Finally, a temperature experimental system was established to verify the effect of the IFA-BP neural network compensation model. The results showed that the stability of the zero-bias improved by more than 10 times after compensation in the zero-bias experiment at room temperature. The full-temperature experiment indicated that in the temperature range of −40 °C~60 °C, the variation of the scale factor at full temperature improved by more than 70 times, and the variation of the bias at full temperature improved by around 1000 times after compensation.


BP Neural Network
The BP neural network is a multilayer feedforward neural network based on error backpropagation, and its learning process includes forward signal propagation and error backpropagation. The BP neural network is composed of an input layer, a hidden layer, and an output layer. When the number of hidden layer neurons is appropriately determined, a three-layer neural network with a single hidden layer can approximate an arbitrary nonlinear function [29]. For problems with low complexity, a neural network with one hidden layer is sufficient, and an excessive number of hidden layers may lead to difficulties in convergence. The structure of the three-layer BP neural network is shown in Figure 1. The BP neural network does not need the relational expression between the input and output; it only needs to be trained with a large amount of data to obtain a high-precision model. Therefore, it has the advantages of strong self-adaptation and strong learning ability, and it has been widely used in many fields.
To conduct the training of the BP neural network, the topology of the neural network should first be established, which involves determining the number of hidden layers and the number of neurons in each layer. The number of input neurons and output neurons is determined by the application. In this study, the input data are the frequency difference and the temperature, so the number of input neurons is m = 2. The output data are the corrected acceleration values, so the number of output neurons is n = 1.
The number of hidden neurons l can be determined by the following empirical equation [30]:

l = √(m + n) + c

where c is a constant between 0 and 10.
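As a quick check, the rule above can be evaluated for this study's topology. The value c = 8 is an assumption chosen here because it reproduces the 10-neuron hidden layer used later in the experiments; the paper does not state which c was selected.

```python
import math

def hidden_neuron_count(m: int, n: int, c: int) -> int:
    """Empirical sizing rule l = sqrt(m + n) + c, with the square root
    rounded to the nearest integer and c a constant between 0 and 10."""
    return round(math.sqrt(m + n)) + c

# For this study's topology (m = 2 inputs, n = 1 output), c = 8 gives 10 hidden neurons.
print(hidden_neuron_count(2, 1, 8))
```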
After establishing the topology of the neural network, the weights and thresholds of the neural network need to be initialized. By feeding the signal X = (x_1, x_2, · · · , x_m) into the neural network through the input layer, the output is obtained [31]:

H_j = σ(Σ_{k=1}^{m} w_kj · x_k + b_j),  j = 1, 2, · · · , l

O_i = φ(Σ_{j=1}^{l} w_ji · H_j + b_i),  i = 1, 2, · · · , n

where σ is the activation function of the hidden layer, φ is the activation function of the output layer, w_kj is the weight from the k-th input neuron to the j-th neuron of the hidden layer, w_ji is the weight from the j-th hidden neuron to the i-th neuron of the output layer, b_j is the threshold of the j-th neuron of the hidden layer, and b_i is the threshold of the i-th neuron of the output layer. The output of the neural network is compared with the expected output to obtain the network error function:
E = (1/2) Σ_{k=1}^{n} (y_k − O_k)²

where y_k is the expected output and O_k is the network output. The weights and thresholds of the neural network are corrected backward by finding the partial derivatives of the error function and applying gradient descent until the error or the number of iterations reaches the set value. However, BP neural networks have inherent disadvantages, such as low convergence accuracy and low robustness. Some researchers have proposed using modern optimization algorithms to improve the performance of BP neural networks. The firefly algorithm is one such algorithm; it simulates the luminous characteristics and attraction behavior of fireflies and has the advantages of a simple structure, few adjustment parameters, and excellent search capability.
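The forward pass and error function described above can be sketched as follows. The tanh hidden activation, linear output activation, and random initial parameters are assumptions for illustration; these initial weights and thresholds are exactly the quantities the IFA later optimizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Topology from the text: m = 2 inputs, l = 10 hidden neurons, n = 1 output.
m, l, n = 2, 10, 1

# Hypothetical initial weights and thresholds (the values the IFA will optimize).
W1, b1 = rng.normal(size=(l, m)), rng.normal(size=l)   # input -> hidden
W2, b2 = rng.normal(size=(n, l)), rng.normal(size=n)   # hidden -> output

sigma = np.tanh            # hidden-layer activation (assumed)
phi = lambda z: z          # linear output activation (assumed)

def forward(x):
    """Forward signal propagation through the three-layer network."""
    h = sigma(W1 @ x + b1)          # H_j = sigma(sum_k w_kj x_k + b_j)
    return phi(W2 @ h + b2)         # O_i = phi(sum_j w_ji H_j + b_i)

def error(x, y):
    """Squared-error function E = 1/2 * sum_k (y_k - O_k)^2."""
    o = forward(x)
    return 0.5 * np.sum((y - o) ** 2)

x = np.array([0.3, -1.2])   # e.g. normalized [frequency difference, temperature]
print(error(x, np.array([0.5])))
```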

Standard Firefly Algorithm
The firefly algorithm was first proposed by Xin-She Yang in 2008 [32] and is a heuristic algorithm derived from the behavior of fireflies in nature. The basic principle of FA is that each firefly can emit light, and the intensity of its brightness is related to its position. Fireflies with high brightness will attract fireflies with less brightness, and the greater the intensity of brightness, the greater the attraction. By updating the position of the fireflies, we gradually find the position with the highest intensity of brightness. In FA, the intensity of brightness is the value of the objective function, and the position is the feasible solution to the problem to be solved.
The method randomly initializes n fireflies in the D-dimensional space, with each firefly positioned at X = (x_1, x_2, · · · , x_D). The attractiveness of a firefly is [33]:

β(r_ij) = β_0 · exp(−γ · r_ij²)

where β_0 is the initial attractiveness, γ is the light absorption coefficient, and r_ij is the Euclidean distance between firefly X_i and firefly X_j. Each firefly moves toward all fireflies whose brightness is greater than its own, and its position is updated by the equation:

X_i(n + 1) = X_i(n) + β(r_ij) · (X_j(n) − X_i(n)) + α_0 · (ξ − 1/2)

where n is the number of iterations, α_0 is the step size factor, and ξ is a random number subject to a uniform distribution on [0,1].
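The attractiveness and position update rules above can be sketched as follows; the parameter values match those used in the simulations later in the paper (β_0 = 1, γ = 1, α_0 = 0.25).

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, gamma, alpha0 = 1.0, 1.0, 0.25  # values used in the later simulations

def attractiveness(xi, xj):
    """beta(r_ij) = beta0 * exp(-gamma * r_ij^2), r_ij the Euclidean distance."""
    r = np.linalg.norm(xi - xj)
    return beta0 * np.exp(-gamma * r ** 2)

def move(xi, xj):
    """Move firefly i toward a brighter firefly j, plus the random step
    alpha0 * (xi_rand - 1/2) with xi_rand ~ U[0, 1] per dimension."""
    beta = attractiveness(xi, xj)
    return xi + beta * (xj - xi) + alpha0 * (rng.uniform(size=xi.shape) - 0.5)
```

At zero distance the attractiveness equals β_0; it decays smoothly with the squared distance, so distant bright fireflies exert only a weak pull.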

Improved Firefly Algorithm
The FA has been used in many fields since it was proposed, but it has disadvantages; for example, it easily falls into bad local minima and has possible oscillations in the later iterations. To address these problems, an improved firefly algorithm (IFA) was proposed to further improve the optimal finding ability and stability of the FA.

Improvement of the step size strategy
In the standard FA, the step size is constant during the iterations. If the chosen step size is too large, the algorithm can quickly move toward the optimum at the beginning of the iterations, giving it a strong global search capability; however, at the end of the iterations, the optimum may be skipped or the iteration may oscillate due to the large step size, which greatly reduces the accuracy of the algorithm. If the chosen step size is too small, the algorithm can approach the local optimum more accurately in the later iterations; however, in the early iterations, convergence will be slow and the global search capability reduced. To balance the global and local search abilities, an adaptive step size update formula was designed using a nonlinear function. The step size is calculated as follows:

α(n) = α_0 · exp(−k · n / maxgen)

where k is the adjustment factor and maxgen is the maximum number of iterations. According to the formula, the value of the step size decreases nonlinearly as the number of iterations increases. At the beginning of the iterations, a larger step size makes the algorithm's global search capability stronger and improves the iteration efficiency, whereas at the end of the iterations, a smaller step size enhances the local search capability of the algorithm and improves the optimization accuracy.
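The adaptive schedule can be sketched as below. The exponential-decay form and the value k = 5 are assumptions, since only the qualitative behavior (nonlinear decrease from a large early step to a small late step) is specified in the text.

```python
import math

def step_size(t: int, alpha0: float = 0.25, k: float = 5.0, maxgen: int = 50) -> float:
    """Nonlinear adaptive step size: large early in the search (global
    exploration), small late (local refinement). Exponential-decay form
    assumed; k is the adjustment factor, maxgen the iteration limit."""
    return alpha0 * math.exp(-k * t / maxgen)
```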

Improvement of the best firefly
According to the FA, each firefly is attracted to the firefly with the greatest brightness, so the position of the best firefly greatly affects the algorithm's search process. For instance, if the best firefly is near a bad local minimum, the algorithm may converge to it. To update the position of the best firefly, the Metropolis criterion from the simulated annealing algorithm [34], combined with mutations, is introduced. The Metropolis criterion can be expressed as follows: when the system is subjected to a perturbation that generates a new value X′ with a new objective function value C′, the system calculates the acceptance probability P_i according to the Metropolis criterion to determine whether to accept the new value. The calculation formula is as follows:

P_i = 1, if C′ < C
P_i = exp((C − C′)/T), if C′ ≥ C

where C is the original objective function value and T is the temperature of the simulated annealing algorithm, which is tied to the number of iterations. If the objective function value of the new value is better, the new value is accepted with a probability of 1; otherwise, if the objective function value of the new value is worse than the original value, the new value is accepted with a probability of exp((C − C′)/T). A worse value can thus be accepted with the probability calculated by the Metropolis criterion, which enhances the stability of the algorithm and provides an opportunity to jump out of bad local minima. The temperature gradually decreases with the number of iterations, so the probability of accepting a worse value decreases in the later iterations. In order to apply the Metropolis criterion to the firefly algorithm, the temperature schedule is modified to decrease with the iteration count:

T(n) = ε / n

where ε is the correction factor. Meanwhile, in order to perturb the best firefly, combining mutations and balancing the convergence speed and accuracy, the following variational perturbation formula is used to perturb the firefly's position:
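The acceptance rule can be sketched as follows for a minimization problem: an improvement is always kept, and a worse candidate is kept with the Boltzmann probability, which shrinks as the temperature T falls.

```python
import math
import random

def accept(C_old: float, C_new: float, T: float) -> bool:
    """Metropolis criterion (minimization): always accept an improvement;
    accept a worse value with probability exp((C_old - C_new) / T)."""
    if C_new < C_old:
        return True
    return random.random() < math.exp((C_old - C_new) / T)
```

At high T almost any candidate passes, which lets the best firefly escape a bad local minimum; as T decreases with the iterations, the search settles into pure improvement.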
Δx_d = s · (ξ − 0.5) · exp(−n / maxgen),  d = 1, 2, · · · , D

where ξ is a random number in the range [0,1] and s is the width of the definition domain.
The variational perturbation also decreases nonlinearly with an increase in the number of iterations, avoiding or attenuating the oscillations that may result from a fixed perturbation. Thus, the position of the best firefly subject to the perturbation is updated as:

X′_best = X_best + ΔX

where ΔX = (Δx_1, Δx_2, · · · , Δx_D) is the perturbation vector.

Improvement of the firefly position update strategy
In the FA, the fireflies move toward all fireflies with greater brightness, which inevitably leads to rapid convergence of the firefly population. However, if the firefly population gathers at a bad position prematurely, the search ability of the algorithm decreases rapidly, making it difficult to jump out of bad local minima. The position update strategy is therefore improved by randomly selecting only several individuals from the firefly population, so that each firefly moves only toward the selected fireflies with greater brightness. In addition, a firefly may leave the search space after updating its position; the positions of fireflies that are out of the search range are corrected according to the following equation:

x = x_min, if x < x_min
x = x_max, if x > x_max

where x_min and x_max are the boundary values of the search space.
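The two improvements above can be sketched as follows: boundary clamping, and random selection of at most k brighter fireflies as movement targets. The subset size k and the sampling method are illustrative assumptions; the paper only states that several brighter individuals are chosen at random.

```python
import random
import numpy as np

def clamp(x, x_min, x_max):
    """Pull fireflies that left the search space back to the boundary."""
    return np.clip(x, x_min, x_max)

def pick_brighter_subset(brightness, i, k):
    """Randomly choose up to k fireflies brighter than firefly i
    (brightness to be maximized); firefly i moves only toward these,
    slowing premature convergence of the whole population."""
    brighter = [j for j in range(len(brightness)) if brightness[j] > brightness[i]]
    return random.sample(brighter, min(k, len(brighter)))
```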

Simulation Analysis of Optimization Algorithm Based on Test Functions
Two test functions were selected to evaluate the performance of the proposed IFA and the standard FA, and images of the two test functions are shown in Figure 2. The population size of the two firefly algorithms is 10, the number of dimensions is 2, the number of iterations is 50, the initial step size α_0 is 0.25, and the initial attractiveness β_0 and the light absorption coefficient γ are both set to 1.
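The two benchmarks can be defined as follows. The Rastrigin function is standard; the exact Schaffer variant used in the paper is an assumption, and the form below is chosen because its global minimum is −1 at the origin with many surrounding local minima, matching the evolution curves described next.

```python
import numpy as np

def schaffer(x, y):
    """Schaffer-type function: global minimum -1 at (0, 0), ringed by
    many nearby local minima (exact variant assumed)."""
    r2 = x ** 2 + y ** 2
    return (np.sin(np.sqrt(r2)) ** 2 - 0.5) / (1 + 0.001 * r2) ** 2 - 0.5

def rastrigin(x):
    """Rastrigin function: global minimum 0 at the origin, with local
    minima spread over the whole definition domain."""
    x = np.asarray(x)
    return float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10))
```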

The Schaffer function has many local minima distributed near the global minimum, which can be used to evaluate the global optimization searching ability of the algorithms. The evolution curves of the two optimization algorithms for the Schaffer function are shown in Figure 3. Figure 3a shows the evolution curves of the FA: although the best fitness of the algorithm converges to the global minimum of −1, the average fitness falls to a local minimum around −0.45, indicating that the FA is less stable. Figure 3b shows that the IFA makes both the best fitness and the average fitness converge to the global minimum at −1, which indicates that the IFA has great stability and global search capability.
The Rastrigin function has many local minima distributed throughout the definition domain, making it easy for an optimization algorithm to fall into a local minimum. Figure 4 shows the evolution curves of the two optimization algorithms for the Rastrigin function. It can be seen that the FA falls into a local minimum and converges around 2. The IFA also falls into the local minimum of 1.3 at the eighth iteration, but because the improved algorithm accepts a worse value with a certain probability, it jumps out of this local minimum at the 12th iteration and finally converges to the global minimum of 0. These simulation results indicate that the IFA has a stronger ability to jump out of local minima than the FA.


IFA-BP Neural Network Model
The selection of the initial weights and thresholds has a large impact on the training, and bad initial values may make the training too slow or even cause it to fail. The proposed IFA, with its better global search ability and its ability to jump out of local minima, was used to optimize the initial weights and thresholds of the BP neural network. The flow chart of the algorithm for optimizing the BP neural network by the IFA is shown in Figure 5, and its main steps are as follows:
(1) Initialize the BP neural network topology and all parameters of the IFA and generate the initial population of fireflies.
(2) Calculate the brightness of the fireflies, perturb the best fireflies, and calculate the acceptance probability according to the Metropolis criterion; other fireflies randomly select the moving target and all fireflies update their positions according to the position update rule.
(3) Update the step size and determine whether the termination iteration condition is satisfied. If so, save the best firefly and jump to Step (4); otherwise, return to Step (2).
(4) Use the best firefly as the initial weights and thresholds of the BP neural network.
(5) Substitute the accelerometer output dataset into the model and calculate the error function.
(6) Determine whether the termination iteration condition has been satisfied. If it has been satisfied, save the weights and thresholds and quit training; if not, update the weights and thresholds of the neural network using the gradient descent method and return to Step (5).
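Steps (1)-(4) can be outlined as below. This is a deliberately simplified sketch: the fitness function, the search bounds [−1, 1], and the reduction of the IFA to its core attract-and-move loop are assumptions; the full method also includes the Metropolis perturbation of the best firefly, the adaptive step size, and the random target selection described earlier.

```python
import numpy as np

def train_ifa_bp_init(fitness, dim, pop=10, maxgen=50, rng=None):
    """Search the space of flattened weight/threshold vectors with a
    simplified firefly loop; the best vector found seeds BP training.
    `fitness(w)` is assumed to return the network error for parameters
    w (lower is better)."""
    rng = rng or np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(pop, dim))      # (1) initial firefly population
    for _ in range(maxgen):                          # (2)-(3) iterate until the limit
        f = np.array([fitness(x) for x in X])
        best = X[np.argmin(f)].copy()
        for i in range(pop):
            if f[i] > f.min():                       # move non-best fireflies toward the best
                beta = np.exp(-np.sum((X[i] - best) ** 2))
                X[i] += beta * (best - X[i]) + 0.25 * (rng.uniform(size=dim) - 0.5)
        X = np.clip(X, -1.0, 1.0)                    # boundary correction
    f = np.array([fitness(x) for x in X])
    return X[np.argmin(f)]                           # (4) initial weights/thresholds for BP
```

Steps (5)-(6) then proceed as ordinary gradient-descent training of the BP network starting from the returned vector.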


Experiments
Determination of the parameters of the neural network model requires numerous accelerometer output data and temperature data for training. Therefore, it was necessary to build a temperature experimental system to conduct a series of temperature experiments on the accelerometer. The temperature experimental system mainly consisted of a high-precision turntable, a temperature chamber, and an MSRA prototype. The experimental system is shown in Figure 6.
The MSRA prototype was used to conduct a zero-bias experiment at room temperature and a full-temperature experiment, during which the output of the accelerometer and the experimental temperature were collected. The zero-bias experiment was conducted at room temperature with a sampling rate of one sample per second for at least an hour and a half with an input acceleration of zero gravity (0 g). The operating temperature of the MSRA in the full-temperature experiment was between −40 °C and 60 °C, and the test nodes were established at intervals of 20 °C. After the operating temperature reached the expected value, the temperature was maintained for one hour. At each temperature node, data were collected for four acceleration states (+0 g, +1 g, −0 g, and −1 g), each for at least 30 s with a data collection interval of one second.
The variations of the scale factor and bias at full temperature were used as the evaluation indices in the full-temperature experiment. The variation of the scale factor at full temperature was the standard deviation of the scale factor at the different temperature points divided by the mean value. The variation of the bias at full temperature was the standard deviation of the bias at the different temperature points, where the bias was calculated as:

B = (U_+0g + U_+1g + U_−0g + U_−1g) / 4

where U_+0g, U_+1g, U_−0g, and U_−1g are the outputs of the accelerometer at +0 g, +1 g, −0 g, and −1 g, respectively.

Results and Discussion
By repeating the zero-bias experiment at room temperature and the full-temperature experiment, six datasets for the zero-bias experiment at room temperature and five datasets for the full-temperature experiment were obtained. Four datasets of the zero-bias experiment at room temperature and three datasets of the full-temperature experiment were randomly selected as training datasets, and the remaining datasets were used for evaluation.
The FA-BP and IFA-BP neural network temperature compensation models used in the experiments have an input layer, a hidden layer, and an output layer. The number of neurons in the input layer is 2, the number of neurons in the hidden layer is 10, and the number of neurons in the output layer is 1. We trained the FA-BP model and IFA-BP model on the datasets 30 times, and the FA-BP model and IFA-BP model with the smallest RMSE (root mean square error) were selected for comparison. To evaluate the performance of the compensation model on different test datasets, the two test datasets of the zero-bias experiment at room temperature were compensated by the same model, and the two test datasets of the full-temperature experiment were also compensated by the same model.
The zero-bias stability before and after compensation is shown in Table 1. According to the table, the zero-bias stability of the accelerometer has been significantly improved by the neural network, and the same compensation model is effective for different test datasets, indicating the applicability of the temperature compensation model based on a neural network. Moreover, after compensation by the IFA-BP model, the zero-bias stability after 30 min of startup, the zero-bias stability after 20 min of startup, and the zero-start zero-bias stability are all better than those of the FA-BP model. A comparison of the MSRA output before and after compensation by the IFA-BP model is shown in Figure 7. For graphing convenience, the mean value was subtracted from the measured data.
The variation of the scale factor and bias at full temperature for the two test datasets before and after compensation are shown in Table 2. The comparison results indicate that the IFA-BP model improves the accelerometer's full-temperature performance more effectively than the FA-BP model. The variation of the scale factor at full temperature after compensation by the IFA-BP model improved by more than 70 times, and the variation of the bias at full temperature improved by around three orders of magnitude. Figure 8 shows a comparison of the frequency output of the MSRA at six temperature points and four states before and after compensation by the IFA-BP model. Figures 9 and 10 show the curves of the accelerometer's output with temperature before and after IFA-BP neural network model compensation in the four acceleration states. In these figures, it can be seen that the accelerometer's frequency output before compensation is affected by the temperature and exhibits frequency drift. After compensation by the IFA-BP model, the temperature performance of the accelerometer is greatly improved.

Conclusions
In order to improve the temperature performance of MSRAs, an accelerometer temperature compensation method based on an improved firefly algorithm optimized BP neural network was proposed in this study. The IFA was used to optimize the initial values of the BP neural network to improve the convergence accuracy and robustness of the neural network's training. Zero-bias experiments at room temperature and full-temperature experiments were conducted on the MSRA, and temperature compensation models based on the FA-BP and IFA-BP neural networks were established. A comparison of the accelerometer's output before and after compensation shows that the proposed IFA-BP neural network temperature compensation model is effective. The zero-bias stability of the accelerometer in the zero-bias experiment at room temperature improved by more than 10 times. The variation of the scale factor at full temperature improved by more than 70 times, and the variation of the bias at full temperature improved by around 1000 times. The results indicate that the temperature compensation method based on the IFA-BP neural network is suitable for MSRAs in both zero-bias experiments at room temperature and full-temperature experiments.