Online Nonlinear Error Compensation Circuit Based on Neural Networks

Abstract: Nonlinear errors in sensor output signals are common in the field of inertial measurement and can be compensated with statistical models or machine learning models. Machine learning solutions with large computational complexity generally run offline or on additional hardware platforms, which makes it difficult to meet the high integration requirements of microelectromechanical system (MEMS) inertial sensors. This paper explores the feasibility of an online compensation scheme based on neural networks. In the designed solution, a simplified small-scale network is used for modeling, and the peak-to-peak value and standard deviation of the error after compensation are reduced to 17.00% and 16.95% of their uncompensated values, respectively. Additionally, a compensation circuit is designed based on the simplified modeling scheme. The results show that the circuit compensation effect is consistent with the results of the algorithm experiment. Under SMIC 180 nm complementary metal-oxide semiconductor (CMOS) technology, the circuit has a maximum operating frequency of 96 MHz and an area of 0.19 mm². When the sampling signal frequency is 800 kHz, the power consumption is only 1.12 mW. The circuit can serve as a component of a measurement and control system on chip (SoC) and meets real-time application scenarios with low power consumption requirements.


Introduction
Microelectromechanical system (MEMS) inertial sensors, such as gyroscopes, accelerometers, and angular position sensors (APS), are manufactured by the MEMS process. MEMS inertial sensors have the characteristics of small size, low cost, and low power consumption, and are widely used in the fields of aerospace, intelligent robots, vehicles, mobile equipment, etc. [1,2]. Compared with mechanical gyros or accelerometers, fiber-optic gyros, laser gyros, APS under the printed circuit board (PCB) process, etc., the accuracy of MEMS inertial sensors needs to be further improved to expand their application fields [3]. Accuracy can be improved through the structural design, the MEMS process, the signal processing circuit, the error compensation scheme, and the measurement and control algorithm. In terms of the error compensation scheme, the stability of the output signal amplitude is an important indicator of the accuracy of inertial devices [4,5] and is affected by different kinds of error components.
Relevant works on MEMS inertial sensor measurement errors mainly focus on the deterministic error and the nondeterministic error [6]. The deterministic error includes direct-current (DC) bias, misalignment, scale factor error, etc., and can be compensated by calibration [7]. The nondeterministic error, generated by mechanical noise, electrical noise, etc., includes complex nonlinear error components. This type of error directly affects the stability of the output signals and is difficult to process directly through device calibration [8]. Therefore, modeling and compensation schemes for the nonlinear error are the focus of this paper. The actual sensor output can be modeled as

Y_measure = Y_true + E_sensor + E_environmental + E_nonlinear, (1)

where Y_measure represents the measurement output of the sensor, Y_true represents the true output of the sensor, E_sensor represents the deterministic error of the sensor, E_environmental represents the error caused by the environment, and E_nonlinear represents the nonlinear error.
The nonlinear error in the composition of the measurement error is discussed. In related research studies [9–26], the nonlinear error is modeled as a time series model, in which the error at the current moment is related to the observation values at previous moments. The error compensation scheme discussed in this paper is also based on the analysis of the time series model. Figure 2 shows the schematic of the error compensation scheme. N continuous data (X(N + k − 1), . . . , X(k + 1), X(k)), k = 1, 2, 3 . . ., are selected and preprocessed through the sliding window and sent to the compensation model, in which X(N + k − 1) is the sampled value at the current moment and the other data are the sampled values of previous moments. The compensation scheme uses the sampled data of the previous N − 1 moments to compensate for the data of the current moment.
The compensation model outputs the compensated value, and the output compensated signal of the current time is equal to the difference between X(N + k − 1) and the compensation value.

Problem Description and Introduction of Error Compensation Scheme
The output model of the MEMS inertial sensors is shown in Figure 1. After detecting the input signal, the sensor outputs the measurement signal according to the observation model. The output results are affected by deterministic factors and nonlinear errors. The deterministic factors include the sensor deterministic error (bias, scale factor error, etc.) and the environmental error. The actual output model is given in Equation (1).
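As an illustration of the output model in Equation (1), the following NumPy sketch composes a synthetic measurement from the four components; all magnitudes here are invented placeholders for illustration, not measured values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

y_true = np.zeros(n)                  # zero-rate input, as in the later experiments
e_sensor = 0.02 * np.ones(n)          # deterministic bias (removable by calibration)
e_env = 0.005 * np.sin(np.arange(n) / 50.0)  # slow environment-driven drift
e_nonlinear = np.zeros(n)             # time-series-style error: depends on past values
for k in range(1, n):
    e_nonlinear[k] = 0.9 * e_nonlinear[k - 1] + 0.01 * rng.standard_normal()

y_measure = y_true + e_sensor + e_env + e_nonlinear  # Equation (1)
```

The deterministic parts can be removed by calibration, while the autoregressive term is what the compensation model must learn.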
The signal preprocessing entails using moving average filtering [24] to reduce the impact of white noise on the robustness of the model fitting. The filtering process is as follows:

X̄(k + M) = (X(k + 1) + X(k + 2) + · · · + X(k + M)) / M, (k = 1, 2, . . .), (2)

where k represents one sample time, M represents the length of the filtered sequence, and X̄(k + M) represents the filtered data. The sampled data are filtered and sent to the compensation model to output the compensated value. The compensation model used in the scheme is described in Figure 3. The compensated value is output through the MLP model, a simple neural network that includes an input layer, a hidden layer, and an output layer. The sampled continuous data are sent into the input layer and processed through the hidden layer with the neuron model shown in Figure 3b, and then the compensation value is output.
The output of a neuron can be expressed as f (W·X + b), in which X is the input vector, W and b are the weight and output bias to be trained and obtained, and f is the activation function.
The key part of a neural network for nonlinear error fitting is the activation function, and the sigmoid activation function is commonly used in related works [20–24]:

f(x) = 1 / (1 + e^(−x)), (3)

where x is the input value and f(x) is the activated output value. The exponential calculation in the above function is complicated, so an activation function named the leaky rectified linear unit (leaky ReLU) [28] is used in the scheme to replace it:

f(x) = x, x ≥ 0;  f(x) = x/α, x < 0, (4)

where α is a constant in the interval (1, +∞). This function is simple to calculate and easy to implement at the circuit level.
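A minimal NumPy sketch of the compensation path described above (moving average prefilter, single-hidden-layer MLP with leaky ReLU, α = 5). The weights here are random placeholders standing in for trained parameters, and the input signal is synthetic.

```python
import numpy as np

M, N, H, ALPHA = 6, 28, 8, 5.0  # filter length, input size, hidden size, leaky-ReLU constant

def moving_average(x, m=M):
    """Equation (2): mean of the last m samples at each step."""
    kernel = np.ones(m) / m
    return np.convolve(x, kernel, mode="valid")

def leaky_relu(x, alpha=ALPHA):
    """Equation (4): identity for x >= 0, x / alpha otherwise."""
    return np.where(x >= 0, x, x / alpha)

def mlp_compensation(window, W1, b1, W2, b2):
    """Forward pass of Figure 3: (1, N) input, (N, H) hidden weights, scalar output."""
    hidden = leaky_relu(window @ W1 + b1)
    return float(hidden @ W2 + b2)

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((N, H)) * 0.1, np.zeros(H)  # placeholder trained weights
W2, b2 = rng.standard_normal(H) * 0.1, 0.0

x = rng.standard_normal(200)        # raw sampled signal (placeholder data)
xf = moving_average(x)              # filtered sequence
window = xf[-N:]                    # last N filtered samples (the sliding window)
comp = mlp_compensation(window, W1, b1, W2, b2)
y_out = window[-1] - comp           # compensated output of the current sample
```

With α = 5, negative inputs are scaled by 1/5, which at the circuit level replaces the sigmoid's exponential with a constant multiplication.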

Implementation Details and Experimental Results

This section introduces the experimental results based on the above algorithm. The influence of the network parameter scale and temperature on the results is also analyzed. As shown in Figure 4, the experimental device includes the inertial sensor ADIS16475 [29] and the data acquisition circuit [30]. The output data of the angular velocity under zero input in the X-axis direction of the sensor were collected and analyzed. Since the nonlinear error of the sensor is related to the temperature [31], output data at temperatures of −40.6 °C, −19.6 °C, 0.6 °C, 22.6 °C, 42.6 °C, 62.6 °C, and 82.6 °C were collected in the experiment. A total of 152,840 sets of data were analyzed, and the performance of the compensation scheme was evaluated at different temperatures. The collected data are shown in Figure 5, which shows that temperature has an effect on the zero-bias output. The proposed solution works at a fixed temperature, while the error compensation effects at different temperatures are also evaluated to show the effectiveness of the solution at each temperature and its potential for error compensation over the full temperature range. The following analysis was carried out at a temperature of 22.6 °C, and the results were compared with related works. The data sets were divided into training sets and test sets in the proportion of 6:4.
The first 60% of the collected data were used for training, and the remaining 40% of the data were used for scheme evaluation. Before the data were further processed, the Kolmogorov-Smirnov (KS) test [32] was used to verify whether the training data and the test data set conform to the same distribution. After calculation, the p-value of the training data and the test data is 0.70, which is greater than 0.05, and the null hypothesis is not rejected. The results of the KS test show that it is reasonable to use neural network fitting models to suppress errors in short-term sampling data.
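The KS check described above can be reproduced with `scipy.stats.ks_2samp`; for completeness, a dependency-free sketch of the two-sample statistic is shown below, using the standard asymptotic Kolmogorov series with the small-sample correction from Numerical Recipes. This is an illustrative reimplementation, not code from the paper.

```python
import math

def ks_2samp(a, b):
    """Two-sample Kolmogorov-Smirnov statistic and asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Maximum distance between the two empirical CDFs, evaluated at sample points.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    t = (en + 0.12 + 0.11 / en) * d
    if t < 0.05:
        return d, 1.0  # series is unreliable for tiny statistics; samples indistinguishable
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * t) ** 2) for k in range(1, 101))
    return d, max(0.0, min(1.0, p))
```

A p-value above 0.05, as reported for the 6:4 split, means the null hypothesis of a common distribution is not rejected.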
In the experiment, the window size of the moving average filter is a constant. Referring to Equation (2), M is equal to 6. The MLP model was trained using the Adam optimizer [32]. The value of α in Equation (4) is 5, and the other parameters that need to be set are the numbers of neurons in the input layer and the hidden layer, defined as N and H, respectively. When N = 28 and H = 8, the signal output bias before and after compensation is shown in Table 1. From the experimental results, the peak-to-peak value and the standard deviation of the output bias are reduced to 17.00% and 16.95% of their uncompensated values, respectively, which reduces the fluctuation range of the output bias and improves the signal stability. To examine the influence of the network parameter scale on the results, the parameter settings of the model were varied, and the error peak-to-peak value was used as an indicator of the fitting effect. The compensated peak-to-peak value at N = 28 and H = 8 is taken as the comparison benchmark, and the ratio by which the value under each parameter setting exceeds the benchmark is displayed in the bubble chart shown in Figure 6.
The yellow bubble in the figure represents the benchmark, red bubbles indicate that the error suppression effect is better than the benchmark, and white bubbles indicate that the error suppression effect is worse than the benchmark. In Figure 6, the three numbers next to each bubble indicate the values of N and H and the percentage growth of the peak-to-peak value after error suppression compared to the benchmark. Figure 6 shows that, compared with the selected benchmark, enlarging the parameter scale yields only a limited further reduction of the error peak-to-peak value. In the test process, the model parameters converged faster with larger values of N and H, while this phenomenon disappeared when the values of N and H exceeded 128. For subsequent analysis and circuit design, N = 28 and H = 8 from the benchmark are set as the network parameters.
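The two evaluation metrics used above, the peak-to-peak value and the standard deviation, with the compensated value expressed as a percentage of the uncompensated one, can be computed as follows. The signals here are synthetic placeholders, with the compensated sequence scaled to mimic the reported ratio.

```python
import numpy as np

def peak_to_peak(x):
    """Range of the signal: max minus min."""
    return float(np.max(x) - np.min(x))

def percent_of_uncompensated(before, after):
    """Compensated metric as a percentage of the uncompensated one."""
    return 100.0 * after / before

rng = np.random.default_rng(2)
raw = rng.standard_normal(10_000)   # placeholder uncompensated bias sequence
compensated = 0.17 * raw            # stand-in for a compensated output at 17%

ptp_ratio = percent_of_uncompensated(peak_to_peak(raw), peak_to_peak(compensated))
std_ratio = percent_of_uncompensated(float(np.std(raw)), float(np.std(compensated)))
# Both ratios are 17.0 by construction of the placeholder data.
```

In the paper's experiment these ratios come out at 17.00% (peak-to-peak) and 16.95% (standard deviation) for N = 28, H = 8.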
In addition, the reduction ratios of the error standard deviation, compared with other machine-learning-based or deep-learning-based error suppression schemes, are summarized in Table 2. The first column describes the error suppression schemes, the second column lists their error suppression effects, and the third column shows the performance relative to the scheme adopted in this article: a percentage less than 100% indicates that the result in the related work is better than that of this article, and a percentage greater than 100% indicates a worse result. All the related works in Table 2 are implemented offline. For real-time online signal processing, the proposed scheme is compared with small-scale networks, and an MLP network with the computationally simple leaky ReLU activation function is sufficient to achieve the corresponding error suppression effect. From Table 2, small-scale recurrent neural networks [23,24] obtain an error suppression effect similar to that of the MLP network used in this article, while large-scale networks [22,26] obtain a better effect than the proposed method. This comparison shows that the compensation effect of the adopted scheme at the algorithm level is effective and reasonable.
Regarding the influence of temperature on the results, model fitting was performed on the data at different temperatures under the same network scale, and the error suppression effect was evaluated and summarized in Table 3. From the experimental results, the error suppression effect is similar at different temperatures. The application scenario discussed in this paper is error suppression at a fixed temperature, and the error compensation model must be refitted for each temperature. By taking the temperature as an additional input variable [31], the adopted method has the potential for error compensation over the full temperature range. The experimental results in this section show that the adopted scheme has a simple calculation paradigm and that the compensation effect achieved with a small-scale network is comparable to or better than that of related works. The influence of different network scale parameters on the results is also analyzed, and the network parameters N = 28 and H = 8 are determined. The analysis of the compensation results at different temperatures shows that the scheme has the potential to suppress nonlinear errors over the full temperature range.

Circuit-Level Realization and Analysis of the Error Compensation Scheme
The computational complexity of the neural-network-based error compensation scheme is large, which is a challenge for real-time online applications. Compared with related works [22–26], the network model and activation function used in this paper are simpler and less complex. When the numbers of neurons in the input layer and the hidden layer are N and H, the shape of the input vector is (1, N), the shape of the hidden layer parameter matrix is (N, H), and the shape of the hidden layer feature matrix is (1, H). In total, at least 2 × (N² × H + H) multiplication and addition operations and H activation operations are required, which means at least 2 × N² × H + H + H machine cycles per output sample when the algorithm runs on a CPU. In the proposed design, N = 28 and H = 8; a low-power processor with a master clock frequency of 100 MHz running this workload could process a sampling signal of at most 7.96 kHz, which does not meet the tens to hundreds of kHz sampling rates required for inertial measurement.
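The cycle estimate above reduces to a few lines of arithmetic; the 100 MHz low-power processor is the assumption stated in the text.

```python
# Software cost per output sample, following the cycle count quoted in the text:
# 2 * N^2 * H + H + H machine cycles on a sequential CPU.
N, H = 28, 8
CPU_HZ = 100e6                               # assumed 100 MHz low-power processor

cycles_per_sample = 2 * N**2 * H + H + H     # 12560 cycles
max_sampling_hz = CPU_HZ / cycles_per_sample # ~7962 Hz, i.e. about 7.96 kHz

print(cycles_per_sample, round(max_sampling_hz))
```

Since inertial measurement requires tens to hundreds of kHz, this roughly 7.96 kHz ceiling motivates the dedicated circuit.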
Even for a simple network model adopted in this design, the above analysis illustrates the necessity of circuit-level implementation and acceleration of the algorithm. Circuit design and analysis based on the above scheme are presented in the following section.

Details of the Implemented Circuits
The collected signal from the sensor consists of 16-bit signed integers, which are expanded to 32 bits to reduce the effect of rounding errors in fixed-point multiplication. The sampled data are preprocessed by the moving average filtering circuit, whose implementation is shown in Figure 7. The length of the filtered sequence is M, and 2M registers are contained in the circuit, caching the sampled data and the intermediate value of the summation calculation, respectively. After new data are sampled, each cached datum is shifted to the next address in turn, the oldest datum is discarded, and the newly sampled datum is stored in the first position of the register array. The result of the summation calculation is updated at each sampling time.
The average operation after summation is a division by a constant, which is equivalent to multiplication by a constant. An M-stage pipeline is adopted in the circuit: the filtered signal is output after a delay of M clock cycles, and a valid signal is output to indicate that the filtered signal is available. The filtered data accompanied by a valid signal are updated within one clock cycle and further processed by the MLP processing circuit.
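The register-array update described above (cache the new sample, drop the oldest from the running sum) can be sketched in software as follows; the class and its names are illustrative, not from the paper.

```python
from collections import deque

M = 6  # filter length, matching the register array depth in Figure 7

class MovingAverageFilter:
    """Streaming mean of the last M samples via an incremental running sum,
    mirroring the register-array update of the filtering circuit."""

    def __init__(self, m=M):
        self.m = m
        self.buf = deque()   # cached samples (the 'register array')
        self.total = 0       # cached intermediate value of the summation

    def push(self, x):
        self.buf.appendleft(x)         # new sample enters the first position
        self.total += x
        if len(self.buf) > self.m:
            self.total -= self.buf.pop()   # oldest sample no longer stored
        valid = len(self.buf) == self.m    # output valid once M samples cached
        return valid, self.total / self.m
```

Each update costs one addition and one subtraction regardless of M, which is what makes the circuit's per-sample summation update cheap.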
The data path and the control path of the MLP processing circuit are shown in Figures 8 and 9, respectively. A finite state machine processes five states (S0, S1, S2, S3, and S4) in the calculation process. The cached data from the moving average filtering circuit are read in the S0 state; the matrix multiplication and the summation of the output bias of the hidden layer are performed in the S1 and S2 states, respectively; the output activation of the hidden layer is performed in the S3 state; and the calculation of the compensation data of the output layer is performed in the S4 state. The multiplier array is multiplexed between the hidden layer calculation and the output calculation: its input operands are selected by a multiplexer whose selection signal is determined by the current state of the finite state machine. The data of the feature matrix are updated through a multiplexer; the data derive from the hidden layer matrix multiplication, bias addition, and activation function, and the selection signal is likewise determined by the current state.
N × H multiplication operations, H addition operations, and H activation operations are required in the hidden layer, and the output layer requires H multiplication operations and 1 addition operation. The most time-consuming operation is multiplication; with multiple multipliers processing in parallel, the circuit processing time can be effectively reduced at the cost of additional hardware resources. In the circuit, H multipliers process the calculations of the H hidden layer neurons at the same time. The matrix multiplication of the hidden layer is completed in N clock cycles, and the operation of the output layer is completed in one clock cycle.
A start signal enables the MLP processing circuit. A total of N + 4 clock cycles is needed to complete one compensation output, and an active-high valid signal is output when the compensated data are valid. The brief data path of the whole compensation circuit is shown in Figure 10. The continuous sampling signal passes through the moving average filter circuit and the MLP compensation circuit to obtain the compensation value, and the difference between the current sampling value and the compensation value is taken as the compensated output data. The compensation circuit can select and output the uncompensated signal, the filtered signal, or the compensated signal through a multiplexer.
The frequencies of the sampling clock and the master clock of the circuit are denoted by f_s and f, respectively. In order to streamline the compensation processing of the sampled data, the moving average circuit and the MLP processing circuit need to complete data processing and data output within each sampling period. According to this requirement, the required clock frequencies of the moving average filtering circuit and the MLP processing circuit are f_s and f_s × (N + 4), respectively. The following relationship between the sampling clock and the master clock is established:

f = f_s × (N + 4) (5)

Figure 10. The brief data path of the compensation circuit.
The clock configuration relationship in Equation (5) enables the moving average filtering and MLP compensation to be completed once in each sampling clock cycle. The moving average filtering circuit has an output delay of M/f_s, and the MLP circuit starts working when N filtered data are cached, resulting in an output delay of N/f_s. The output delay of the entire compensation circuit is therefore (M + N)/f_s. In the experiment, M = 6, N = 28, and H = 8; the number of multipliers consumed is 8, f = 32 f_s, and the output delay of the circuit is 34/f_s.
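The clock and latency bookkeeping above amounts to two lines of arithmetic, sketched here with the experiment's configuration (the function name is illustrative):

```python
def clock_plan(f_s, M, N):
    """Master-clock frequency from f = f_s * (N + 4) and the total
    output delay (M + N) / f_s of the compensation circuit."""
    f_master = f_s * (N + 4)   # the MLP needs N + 4 cycles per sample
    delay = (M + N) / f_s      # filter delay plus MLP caching delay
    return f_master, delay

# Experiment configuration: M = 6, N = 28, sampling clock 800 kHz.
f, delay = clock_plan(800e3, 6, 28)  # f = 32 * f_s = 25.6 MHz, delay = 34 / f_s
```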

Analysis of the Implemented Results
According to the design scheme of the compensation circuit, the Verilog hardware description language was used to realize the circuit, and the circuit-level error compensation experiment was carried out in VCS, a digital circuit analysis platform from Synopsys. The output data after compensation are 32-bit integers, multiplied by 2³¹ − 1 to obtain the real results, and were analyzed in MATLAB. The signal waveforms before and after compensation are shown in Figure 11, and the signal information is shown in Table 4. Comparing the results of Tables 1 and 4 indicates that the circuit-level compensation has no loss of precision.

The designed circuit is implemented and analyzed under the SMIC 180 nm complementary metal-oxide semiconductor (CMOS) process. Synthesis in Design Compiler and static timing analysis in PrimeTime were performed at three process corners: slow (SS), typical (TT), and fast (FF). The maximum operating frequency of the system master clock is 96 MHz, corresponding to a sampling frequency of 3.2 MHz, which meets the requirements of inertial measurement.
Under the process corner TT (1.8 V supply voltage, 25 °C operating temperature), when the frequency of the master clock is 96 MHz and the sampling frequency is 3.2 MHz, the dynamic power consumption of the circuit obtained by Power Compiler is 4.22 mW. The dynamic power consumption of the circuit is positively related to the clock frequency. When the sampling frequency is 800 kHz [2], the master clock frequency is 25.6 MHz and the dynamic power consumption of the circuit is 1.12 mW, which meets the needs of low-power applications. In addition, the total cell area obtained from the area report is 0.19 mm², in which the multipliers consume a large share of the combinational area, which is a direction for resource optimization. The above analysis of power consumption and chip area is based on the results of the digital front-end implementation. A more complete and accurate report needs to be obtained in conjunction with the SoC and digital back-end implementation, and it is not discussed here.
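As a sanity check on the reported numbers, dynamic CMOS power scales roughly linearly with clock frequency (P ≈ α·C·V²·f), so the 800 kHz figure follows from the 96 MHz measurement (a back-of-the-envelope estimate, not a substitute for the Power Compiler report):

```python
# Linear frequency scaling of dynamic power: P2 = P1 * (f2 / f1).
p_96mhz = 4.22e-3                         # reported dynamic power at f = 96 MHz, in W
p_25_6mhz = p_96mhz * (25.6 / 96.0)       # master clock for an 800 kHz sampling clock
# ~1.13 mW, consistent with the reported 1.12 mW at a 25.6 MHz master clock
```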

Discussions and Conclusions
For the nonlinear error of the inertial sensors, a small-scale MLP network and the leaky ReLU activation function with simple calculations were used for error suppression, in which the error peak-to-peak value and standard deviation are reduced to 17.00% and 16.95%, respectively. The experimental results confirm the effectiveness of the adopted scheme under short-term, constant-temperature working conditions. For the real-time, online, and low-power requirements of edge devices, a digital processing circuit based on the developed method was designed, and the error suppression effect of the compensation circuit is consistent with the results of the algorithm verification experiments. In the circuit performance evaluation, the maximum supported signal sampling frequency is 3.2 MHz, and the area of the circuit under the SMIC 180 nm process is 0.19 mm². When the frequency of the master clock is 25.6 MHz, the total power consumption is only 1.12 mW, which meets the demand of low-power application scenarios. Circuit-level design and experiments confirm the feasibility of the on-chip solution for real-time and online applications.
In the experiment, the compensation effects of networks with the same parameter scale at different temperatures were analyzed. Although the compensation schemes still achieve similar error suppression, the trained parameters take different values, so the assumed working condition should be a constant temperature. Temperature compensation of the nonlinear error should be studied in future research. In addition, the output signals of inertial sensors are affected by bias drift after working for a long time, and the statistical characteristics and distribution functions of the error may also change, which challenges the robustness of machine-learning-based error compensation solutions. In terms of circuit design, the multiplied-frequency processing clock of the MLP compensation circuit limits the increase in the maximum sampling frequency. In addition to providing more on-chip computing resources, optimization of the circuit data path and control path is also worth exploring.
In the collected experimental data, the working temperature has a major influence on the zero offsets. For future research, temperature information should be considered for compensation of nonlinear errors, and the coprocessor circuit of the compensation scheme will be integrated with the SoC to realize sensor drive, demodulation, and compensation of detected signals on a single processing chip.