Investigation on C and ESR Estimation of DC-Link Capacitor in Maglev Choppers Using Artificial Neural Network

Abstract: The reliability of capacitors is one of the most important issues in power electronics. The health status of capacitors can be evaluated by comparing the estimated C/ESR values with their original values. In this paper, a two-input artificial neural network (ANN) is proposed for C and ESR estimation of DC-link capacitors in Maglev choppers. Combined with the existing voltage and current sensors that are used for protection and control, it provides a promising solution for health condition monitoring of the Maglev chopper in the Maglev train. Compared with prior-art research, the actual capacitor degradation process, in which both C and ESR degrade, is considered in training. Moreover, the ANN's advantage in fitting nonlinear and complex relationships is explored by building an aggressive mapping from the voltage ripple at 5 kHz to C/ESR at 120 Hz, which cannot be described or analyzed by linear circuit equations. Thus, cross validation must be implemented to avoid an occasional poor fitting and to ensure the stability of the ANN. Experimental results show that the proposed ANN outperforms support vector regression in both ESR and C estimation. While C estimation suffered from over-fitting and instability, ESR estimation by the ANN is accurate and stable, with a low average prediction error within 1%, showing great potential for condition monitoring of DC-link capacitors in Maglev choppers.


Introduction
Capacitors (a single capacitor or several connected in series or in parallel) are widely used in the dc-link, filtering, and snubber circuits of power converters. Although capacitor technology has improved greatly, capacitors are still reported to be among the components with the highest failure rates in the field operation of power converters [1,2]. When a capacitor reaches the end of its life, the power converter malfunctions, which normally leads to a systematic inspection and analysis of the converter. Once the degraded capacitor is identified, it can be replaced. Even when a capacitor bank contains only a single degraded capacitor, replacing the entire bank is recommended to ensure the overall converter performance and reliability [3,4]. This post-failure maintenance approach is straightforward, but it is costly and unsafe because the replacement is normally carried out only after the converter has performed abnormally or failed. Reliability requirements are more stringent in aerospace, the energy industries, and emerging transportation systems such as electric vehicles, high-speed trains, and Maglev trains. Therefore, reliable evaluation or monitoring of capacitor health status, which supports reliable field operation and preventive maintenance, is highly important [5].
In Figure 1, a simplified equivalent model of capacitors is shown along with its corresponding frequency characteristics. The capacitor impedance Z is divided into three regions dominated by C, ESR, and the equivalent series inductance (ESL), respectively. ESL has a strong influence on the switching state of power converters [6]. Most state-of-the-art capacitor monitoring methods are based on two typical indicators, namely C and ESR. The health status of capacitors can be evaluated by comparing the estimated or measured capacitance/ESR values with the original values. The end-of-life criteria vary greatly with the specific capacitor type. For example, the widely accepted end-of-life criterion for the aluminum electrolytic capacitor (AEC) is a 20% reduction of the original C or a doubling of the original ESR [7]. However, there are practical difficulties in evaluating the health status of a capacitor by this widely accepted criterion. Firstly, the capacitance and ESR are highly temperature- and frequency-dependent [8][9][10], which might lead to inappropriate maintenance decisions. Secondly, the estimation approach must remain valid under actual capacitor aging, measurement noise, environmental disturbances, load variations, and so on.
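The three impedance regions and the AEC end-of-life criterion quoted above can be illustrated with a minimal sketch. It assumes the simplified model is a series connection of C, ESR, and ESL; the component values and the function names `impedance_magnitude` and `reached_end_of_life` are hypothetical and chosen only for illustration.

```python
import math


def impedance_magnitude(freq_hz, c_farad, esr_ohm, esl_henry):
    """Magnitude of a series C-ESR-ESL impedance at one frequency."""
    omega = 2.0 * math.pi * freq_hz
    # Net reactance: inductive part minus capacitive part.
    reactance = omega * esl_henry - 1.0 / (omega * c_farad)
    return math.hypot(esr_ohm, reactance)


def reached_end_of_life(c_now, esr_now, c_initial, esr_initial):
    """AEC criterion from the text: C reduced by 20% or ESR doubled."""
    return c_now <= 0.8 * c_initial or esr_now >= 2.0 * esr_initial


# Hypothetical values (not taken from the paper).
C0, ESR0, ESL0 = 1000e-6, 0.05, 20e-9

# Low, medium, and high frequency: C-, ESR-, and ESL-dominated regions.
for freq in (120.0, 5e3, 1e6):
    z = impedance_magnitude(freq, C0, ESR0, ESL0)
    print(f"{freq:>9.0f} Hz  |Z| = {z:.4f} ohm")

# A capacitor whose C is still acceptable but whose ESR has more than doubled.
print(reached_end_of_life(0.85 * C0, 2.2 * ESR0, C0, ESR0))
```

At the series resonance the reactive terms cancel and the magnitude reduces to the ESR, which is why ESR dominates the middle region of the curve.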
In the past decade, many efforts have been made by academia and industry to develop C and ESR estimation methods. According to the methodology and implementation [3], they can be classified into three main types: the capacitor ripple current sensor based method, the circuit model based method, and the data-based advanced algorithm method. For the first category, the methods in [11][12][13] adopt a classic current sensor (e.g., resistors or Hall effect sensors) to calculate the ESR, while the methods in [14,15] are based on the PCB Rogowski coil sensor. These methods can be implemented online. The methods in [12,13] estimate the ESR at the switching frequency, while the methods in [11,14,15] estimate it over a high frequency range. However, the accuracy of these methods in ESR estimation is comparatively low. An alternative approach is to estimate C and ESR offline, which can provide results over a full frequency range. This can be achieved by externally injecting a current or voltage signal at a certain frequency into the capacitor [16]. Different algorithms [17][18][19] are used to extract the relationship between the input current and the output voltage of an experimental circuit composed of a signal generator, a power amplifier, and an oscilloscope. These methods require additional hardware or software, which increases the cost and complexity. In addition, the approaches in [17][18][19] require the capacitor to be removed from the converter. Therefore, using the sensors that already exist for the control or protection of the converter is highly preferred. The voltage ripple or current information of a capacitor can be obtained indirectly from the circuit structure and operation, which is the idea of the second category. Several representative methods are reported in [10][20][21][22][23]. Most of those approaches can achieve online monitoring of capacitors without external signal injection. However, they are heavily dependent on the converter structure, which limits their applications. A few methods [24][25][26][27][28][29] have recently been proposed that inject a signal for C or ESR estimation in various applications; these require special operation of the converter to generate the signal during normal operation. Recently, efforts have continued on the third category [29][30][31][32][33], where the power converter is treated as a black box and only its terminal information is used for capacitance estimation.
The motivation of this paper is to lift the application limits of the prior traditional methods and to further reduce the need for extra sensors and hardware circuits. The estimation is based on intelligent algorithms, such as support vector regression (SVR) and the artificial neural network (ANN). With fewer hidden layers, the ANN is less sensitive to noise; it is regarded as a strong non-deep-learning model and is often chosen as the baseline for deep learning models [34,35]. The pioneering research in [31][32][33] uses the 300 Hz voltage ripple to produce a good mapping for estimating C at 300 Hz, but the influence of ESR during actual capacitor aging is ignored. In other words, the actual capacitor degradation process, in which both C and ESR degrade, is not considered in training. Moreover, only a single best fitting of the ANN is given, which might be an occasional good result in a challenging fitting task.
Among the aforementioned approaches, estimation of C or ESR is preferred at room temperature, where healthy and aged capacitors differ strongly, which avoids the influence of temperature [9,10]. For most of the given approaches, C is preferably estimated at low frequency (30 Hz in [26,29], 120 Hz in [25], 100 Hz in [22]), as shown in Figure 2b, while ESR is estimated in the high frequency range above 1 kHz (mostly at the switching frequency [10,21,23,27]). However, according to [8,9], the difference between the ESR of a degraded capacitor and that of a healthy one is much larger at 120 Hz than above 1 kHz, which has been further confirmed in [24,28]. Therefore, using the ESR at low frequency as the life indicator is also appealing, with the additional advantage that manufacturers provide important reference data at low frequency.
In this paper, using the reliability-critical Maglev chopper as the background, a two-input ANN is explored for the dc-link capacitor C/ESR estimation. The proposed method only uses inputs sensed by the existing voltage/current sensors installed in the Maglev chopper. The impact of load variation on the ANN is investigated together with the actual capacitor aging. In experiments, the proposed ANN aims at building an aggressive mapping between capacitance/ESR at 120 Hz and the inputs (voltage ripple at 5 kHz and average levitation current), the strong nonlinearity and complexity of which cannot be described or suggested by linear circuit equations. Therefore, cross validation must be applied to avoid over-fitting or under-fitting in the learning process and to ensure the stability of the ANN.

Maglev Chopper
Electromagnetic Suspension (EMS) Maglev trains are very successful in commercial applications [36,37]. There are two types of EMS Maglev trains. One adopts a short primary linear induction motor (SLIM) installed on two sides of the rail, as shown in Figure 2, and the other uses a long primary linear synchronous motor (LSM) [36]. The former type targets a top speed of 100 km/h, while the latter can operate at up to 200 km/h. Both types use the same levitation system, which serves the role that tires play in an automobile [31]. The right half of Figure 2a shows the cross-sectional schematic of the EMS-SLIM Maglev system. From Figure 2a, it can be seen that the primary side of the SLIM is installed on the bogie and the secondary side (reaction plate) is installed on top of the F-track. The actual side view of the bogie is shown in the upper left corner of Figure 2a. Each carriage of the Maglev train contains five bogies, and each bogie consists of two suspension solenoid modules, located on either side of the F-track. As shown in Figure 2a, a suspension control box controls the two electromagnets in each module simultaneously. The mechanical coupling between the paired electromagnet modules is decoupled by the bogie. As a result, the control of one solenoid module is independent of the other modules, which makes the suspension control much easier. The air-gap and acceleration sensors, together with an additional current sensor, feed the air gap, acceleration, and levitation current information back to the control processor, where the reference air gap is set, as shown on the right of Figure 2a. The suspension control processor then adjusts the duty cycle of the Maglev chopper to produce the desired levitation current. This levitation current is injected into the coils of the electromagnets so that a suspension force is generated for stable levitation of the Maglev train. The main circuit of one Maglev chopper is shown in Figure 2b. The chopper consists of two IGBT modules. In each IGBT module, there is only one active switch (VQ1 or VQ2). The dc-link capacitor, an AEC, is mainly used to stabilize the dc voltage.
A typical mission profile of the Maglev chopper, based on the first domestic commercial Maglev line in China, is shown in Figure 3. A single one-way trip lasts about 20 min. Initially, the train is levitated, so the levitation current rapidly increases to its rated value. While the train is moving, the current ripples result from dynamic suspension control. At around 600 s, the train is still levitated but stops at a station. During the stop, the levitation current is almost constant, as seen in Figure 3. Finally, at around 1100 s, the train lands at the terminal station and the current drops to zero. The train normally waits around 10 min for the instruction to move again and return to its starting station. During the waiting period, the levitation current stays at zero. Thus, the interval in which the train is levitated without moving, where the levitation current is stable, provides a convenient slot for C/ESR estimation.
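One way to locate such a slot in a recorded current profile is to search for a window in which the current is nonzero and nearly constant. The sketch below is only an illustration under stated assumptions: the function name `find_stable_window`, the thresholds, and the synthetic profile are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def find_stable_window(current, window, min_current, max_std):
    """Return (start, end) of the first window whose mean current is at
    least min_current and whose standard deviation is at most max_std.
    Returns None when no such window exists."""
    for start in range(len(current) - window + 1):
        segment = current[start:start + window]
        if mean(segment) >= min_current and pstdev(segment) <= max_std:
            return start, start + window
    return None


# Synthetic profile: landed (zero current), moving (rippling current),
# stopped at a station (constant current), landed again.
profile = (
    [0.0] * 5
    + [30 + 3 * (-1) ** k for k in range(20)]
    + [30.0] * 10
    + [0.0] * 10
)

slot = find_stable_window(profile, window=8, min_current=10.0, max_std=0.5)
print(slot)
```

The `min_current` threshold excludes the waiting period at the terminal, where the current is stable but zero and therefore carries no ripple information.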

ANN for C and ESR Estimation
ANN is capable of both adequately approximating any complex nonlinear relationship and adapting to unknown systems. Therefore, an intelligent solution for capacitor condition monitoring can be achieved by finding an accurate mapping between the input data available from the converter and the targeted output (i.e., C or ESR). After the completion of training, such a mapping becomes a monitor. In this role, it continuously receives the actual input data of the Maglev chopper and estimates C or ESR.
Temperature stress during the service life of electrolytic capacitors can cause the electrolyte to evaporate, leading to deterioration [6]. In [38], electrolyte evaporation is closely examined, and it was found that the reduction in C and the increase in ESR proceed at different rates. A typical illustration of capacitor degradation is shown in Figure 4; it reflects the C and ESR variations of most AECs. The ESR is reported to increase much earlier and faster in the aging process.

In [31][32][33], different ANNs were first proposed and discussed for C estimation. However, ESR estimation is not discussed, and the effect of ESR on C estimation is ignored, as is the actual capacitor aging process in which both C and ESR vary. In particular, the ESR in dc-dc converters contributes a large proportion of the output AC voltage ripple [21]. The proposed ANN must reflect the actual degradation process of a capacitor in order to determine the capacitor health status comprehensively. In this way, the ANN can be more robust and produce reliable estimates.
A two-input ANN is used here, as shown in Figure 5. The target is C or ESR estimation. The inputs of the ANN are the average levitation current and the voltage ripple on the dc-link capacitor. Within the Maglev chopper, a voltage sensor already monitors the dc-link voltage for protection purposes, and a current sensor monitors the levitation current for suspension control. No extra sensors are required for the proposed ANN.
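As an illustration of how the two inputs could be derived from sampled sensor signals, the sketch below computes the average levitation current and the amplitude of the 5 kHz component of the dc-link voltage with a single-bin discrete Fourier sum. This is an assumption about the preprocessing, not the procedure of the paper: the ripple could equally be defined as a peak-to-peak value, and the names `ripple_amplitude` and `ann_inputs`, the sampling rate, and the signal levels are hypothetical.

```python
import math


def ripple_amplitude(samples, sample_rate_hz, ripple_freq_hz):
    """Amplitude of the component at ripple_freq_hz (single-bin DFT).
    Accurate when the record spans a whole number of ripple periods."""
    n = len(samples)
    dc = sum(samples) / n
    re = im = 0.0
    for k, value in enumerate(samples):
        angle = 2.0 * math.pi * ripple_freq_hz * k / sample_rate_hz
        re += (value - dc) * math.cos(angle)
        im -= (value - dc) * math.sin(angle)
    return 2.0 * math.hypot(re, im) / n


def ann_inputs(voltage, current, sample_rate_hz, ripple_freq_hz=5000.0):
    """Return (voltage ripple amplitude, average levitation current)."""
    ripple = ripple_amplitude(voltage, sample_rate_hz, ripple_freq_hz)
    return ripple, sum(current) / len(current)


# Synthetic record: 10 ripple periods sampled at 100 kHz (hypothetical).
FS = 100_000.0
voltage = [330.0 + 0.8 * math.sin(2.0 * math.pi * 5000.0 * k / FS)
           for k in range(200)]
current = [30.0] * 200

ripple, avg_current = ann_inputs(voltage, current, FS)
print(ripple, avg_current)
```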

ANN Structure
The neural network model used in this paper is a simple ANN whose structure consists of three layers, i.e., an input layer, a hidden layer, and an output layer, as shown in Figure 5. The input layer has no computational function; its main role is to store the data input to the ANN. The hidden layer has a computational function and performs the forward propagation of the data from the input layer. The hidden layer is connected to the input layer and the output layer through connections with different weights, thus establishing a nonlinear relationship between the input layer and the output layer. The main function of the output layer is to output the target parameters and to calculate the loss function for error back propagation, which updates the weights. The output of the ANN is the normalized C or ESR. Usually, a single hidden layer with multiple neurons is able to fit any nonlinear function [39]. The reasons for not using multiple hidden layers (also known as deep learning) in this paper are the following [39][40][41].
1. Multiple hidden layers mean multiple layers of neurons and therefore more neurons in the network structure. More neurons make the network training slow, and it is difficult to remove residual noise during training.
2. For some training cases, curve fitting becomes very specialized. This reduces the ability of the neural network to estimate from new inputs as opposed to the training inputs.
3. Multiple hidden layers increase the risk of producing local optima. Eventually, a locally optimal neural network is trained, which produces inaccurate outputs when making predictions.
Therefore, a single hidden layer is used in this paper.
Next, the mathematical definitions of the ANN are given. As shown in Figure 5, the key element of any ANN is the neuron. The inputs can be represented as a vector $I_1 = (i_1, i_2)$, where $i_1$ and $i_2$ are the values of the first and second dimensions of the input, respectively. A weight is associated with each connected pair of neurons. Hence, the weights connected to the first neuron (i.e., neuron 1) in the hidden layer can be represented as a weight vector $W_1 = (w_{11}, w_{21})$, where $w_{11}$ and $w_{21}$ are the weights associated with the connections between the inputs and neuron 1. The first mapping process from the inputs to the first neuron in the hidden layer can be denoted as follows:
$$y_1 = I_1 \cdot W_1 = i_1 w_{11} + i_2 w_{21} \qquad (1)$$
where $y_1$ denotes the inner product of the inputs and the weights. A neuron contains a threshold value that regulates its action potential. To mimic this, the first mapping process is followed by an activation function, the sigmoid nonlinear function $\sigma(\cdot)$ (which can be chosen in various ways as detailed below), which has the following form:
$$\sigma(y_1) = \frac{1}{1 + e^{-y_1}} \qquad (2)$$
The mappings from the inputs to the other neurons in the hidden layer follow Equations (1) and (2) in the same way.
Similarly, the mapping result $O$ from the hidden layer to the output layer, Equation (3), is computed from the hidden-layer outputs and the weights that connect them to the output neuron. During the training process, in order to obtain the optimal weight parameters and thus minimize the difference between the predicted output of the neural network and the actual result, it is necessary to set a loss function and specify an optimization algorithm. In this task, the loss function is the mean square error $L_{mse}$ between the estimated values $O$ and the ground-truth labels $Y$:
$$L_{mse} = \frac{1}{n} \sum_{k=1}^{n} (O_k - Y_k)^2 \qquad (4)$$
where $n$ is the number of training samples and $k$ indexes the samples. As shown in Figure 5, the training process is a typical supervised training with the output C/ESR known. The process mainly includes an adaptive optimization algorithm, which is used for error minimization. The purpose of training is to find the parameters that best fit the input to the output, so that a corresponding weight function composed of hidden neurons can be found. The optimization algorithm used in this training is Bayesian regularization [31]. It is described as a very effective global optimization algorithm: it needs fewer iterations (and is therefore more time efficient), and its granularity can be very small compared with the grid search algorithm. In addition, the training algorithm used here prevents overfitting by stopping the iterations before the model fully converges on the training dataset.
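The forward mapping and the loss described above can be sketched in a few lines. The sketch assumes a logistic sigmoid in the hidden layer, a linear output neuron, and no bias terms; none of these details are confirmed by the text, and the names `forward`, `mse`, and the output weights are hypothetical. It shows only the forward pass and the loss, not the Bayesian regularization training.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, assumed here as the hidden-layer activation."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, output_weights):
    """Single-hidden-layer forward pass.

    inputs: (i1, i2)
    hidden_weights: one (w1j, w2j) pair per hidden neuron j
    output_weights: one weight per hidden neuron (assumed linear output)
    """
    hidden = [
        sigmoid(sum(i * w for i, w in zip(inputs, weights_j)))
        for weights_j in hidden_weights
    ]
    return sum(h * v for h, v in zip(hidden, output_weights))


def mse(predictions, labels):
    """Mean square error between estimated values and labels."""
    return sum((o - y) ** 2 for o, y in zip(predictions, labels)) / len(labels)


# Two hidden neurons with arbitrary illustrative weights.
hidden_w = [(0.4, -0.2), (0.1, 0.3)]
output_w = [0.5, 0.5]
samples = [(0.2, 0.7), (0.9, 0.1)]
labels = [0.45, 0.55]

estimates = [forward(s, hidden_w, output_w) for s in samples]
print(estimates, mse(estimates, labels))
```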

Size of the ANN Hidden Layer
Neurons in the hidden layer of a neural network convert the input layer data into data that can be used in the output layer. An insufficient number of neurons causes underfitting, resulting in a network that is incapable of expressing the task. An excessive number of neurons means that the ANN requires more data for training, which greatly increases the computational workload and severely slows down the training. Increasing the number of neurons to very large values may also negatively affect the generalization properties of the ANN [40][41][42]. These effects limit the applicability of ANNs in online settings. Essentially, there is a trade-off between the number of neurons in the hidden layer and the computational cost and generalization ability of the neural network. Several techniques exist in the literature [42] to determine the optimal number of neurons in the hidden layer. In [43], a formula is suggested for the sufficient number of hidden neurons m (N_C for C estimation and N_ESR for ESR estimation) in terms of y and n, the number of neurons in the output layer and the number of samples in the training set, respectively. However, when such a formula is inapplicable, trial and error is the most basic method used in existing studies, and it usually yields a result close to the optimal number of neurons. In fact, in most applications, the user keeps changing the number of hidden neurons during training until a network with a satisfactory number of neurons is obtained.

Size of the Training Data
All artificial neural networks must go through a training phase before their performance can be evaluated in the testing phase. After determining the optimal number of neurons in the network, it is necessary to select the number of training samples so that the ANN exhibits acceptable generalization capability in the testing phase. In [34], Vapnik and Chervonenkis defined a parameter called the Vapnik-Chervonenkis dimension (VCdim) as a metric of the generalization ability of an ANN. If the number of training samples exceeds the VCdim, the error in the testing phase can be kept within a limited range. Regardless of the number of layers of the ANN and the type of activation function it uses, for an ANN with N_W weights and N_N nodes, a bound on the VCdim can be shown [34] in terms of N_W, N_N, and n_i, where n_i is the dimension of the input data. A useful rule of thumb is that the VCdim value should be around one tenth of the number of training samples.

Simulation Results
A typical simulation model of a standard Maglev chopper was developed in Matlab. The circuit simulation parameters are presented in Table 1. The activation function for the hidden layer is chosen to be 'tansig', while the output activation is chosen to be 'purelin' in Matlab. The training algorithm is 'trainbr'. The maximum number of training steps is set to 100,000, the minimum training error is set to 1.0 × 10⁻⁸, and the learning rate is 0.01. Since condition monitoring is essentially a regression and estimation problem, curve fitting on known data is used in this paper to establish the relationship between inputs and outputs. Three metrics, R², PE, and MAPE, are used to evaluate the regression performance, where y_m and ȳ are the actual value and the average of C or ESR, and ŷ_m is the estimated value. The coefficient of determination R² is a value between 0 and 1; the closer it is to 1, the stronger the correlation between inputs and outputs. Percentage error (PE) and mean absolute percentage error (MAPE) are used to assess the percentage error of the estimated result.
In [31][32][33], the effect of ESR on the estimation of C using the ANN approach is ignored. In the actual aging of a dc-link capacitor, as shown in Figure 4, C decreases while ESR increases, although the relative change in ESR is much larger. C_o is still 1800 µF and ESR_o is 31.5 mΩ. The variation range for C is 80% to 100% of C_o, while that for ESR is 100% to 200% of ESR_o. The tuning step for C is 2% of C_o, while that for ESR is 10% of ESR_o. The normal duty cycle is 0.6, which corresponds to a normal current of 8 A. Considering the possible variance of the levitation current when the train stands still but is levitated, the duty cycle is chosen to be 60%, 61%, and 62% for training. These load perturbations are included to increase the robustness of the ANN and reinforce its immunity to noise.
Overall, 363 samples are generated by the Matlab simulation (11 values of C × 11 values of ESR × 3 duty cycles). Considering the two-input ANN, the desired neuron number for this ANN is about 3 to 18, which provides a useful reference for choosing the number of neurons. This is expected to be a more challenging fitting task than the previous research in [31][32][33].
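The sample count follows directly from the sweep ranges stated above. The sketch below enumerates the simulated operating points in Python; the variable names and the dictionary layout are illustrative assumptions, and the circuit simulation that turns each point into a ripple-voltage/current pair is not reproduced here.

```python
from itertools import product

C_O = 1800e-6    # original capacitance C_o in farads (from the text)
ESR_O = 31.5e-3  # original ESR_o in ohms (from the text)

# C is swept from 80% to 100% of C_o in 2% steps (11 levels).
c_levels = [pct / 100 for pct in range(80, 101, 2)]
# ESR is swept from 100% to 200% of ESR_o in 10% steps (11 levels).
esr_levels = [pct / 100 for pct in range(100, 201, 10)]
# Three duty cycles act as load perturbations.
duty_cycles = [0.60, 0.61, 0.62]

# Every combination of C, ESR and duty cycle is one simulated sample.
operating_points = [
    {"C": c * C_O, "ESR": r * ESR_O, "duty": d}
    for c, r, d in product(c_levels, esr_levels, duty_cycles)
]

print(len(operating_points))  # 363
```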
To find the best fitting, the 363 data sets are randomly divided into 10 subsections: 8 subsections are used for training, 1 for testing, and 1 for validation. After this single data division is done, the training can start. When the number of neurons N_ESR is increased from 1 to 10, the overall R for ESR estimation reaches a remarkably high 0.99999 for a single fitting, as shown in Figure 6a. Similarly, when the number of neurons N_C is increased from 1 to 10, the overall R for C estimation reaches 0.99972, as shown in Figure 7a.
However, the mapping obtained from a single data division cannot prove that it reveals the real correlation between the inputs and C. The mapping might only reflect a locally optimal relationship, which indicates no more than an occasional good estimation. Therefore, cross validation must be applied. It is achieved by randomly dividing the 363 data sets ten times, giving ten different data combinations for training, validation, and test. Correspondingly, 10 different trainings are performed.
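The repeated random division described above can be sketched as follows. This is a minimal illustration, assuming an 8:1:1 split by sample count and a fixed seed for reproducibility; the function name `random_splits` is hypothetical, and the paper does not state which Matlab data-division routine was actually used.

```python
import random

def random_splits(n_samples, n_repeats=10, seed=0):
    """Return n_repeats random (train, val, test) index splits, roughly 8:1:1."""
    rng = random.Random(seed)
    n_val = n_samples // 10
    n_test = n_samples // 10
    n_train = n_samples - n_val - n_test
    splits = []
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random division for every repeat
        train = order[:n_train]
        val = order[n_train:n_train + n_val]
        test = order[n_train + n_val:]
        splits.append((train, val, test))
    return splits

splits = random_splits(363)
for train, val, test in splits:
    print(len(train), len(val), len(test))
```

One network would then be trained per split, and the spread of the errors across the ten trainings indicates how stable the fitted mapping is.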
Under cross validation, the proposed ANN also shows stable performance in both C and ESR estimation, although the ESR estimation is more stable than the C estimation. The MAPE for ESR estimation, as shown in Figure 6b, is only 0.01%. Across the different fittings, the PEs under different data samples are distributed in a very small range of ±0.15%. The result of C estimation is shown in Figure 7b. Its MAPE is 0.16%, and the PEs under different estimations are distributed in a larger range of ±0.6%. Further increasing N_C does not help to reduce either the MAPE or the errors under different estimations in the 10 random fittings.
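The three metrics reported above can be computed as in the sketch below. It uses the standard textbook definitions of R², PE, and MAPE, which are assumed here to match the paper's definitions; the function names and the example values are illustrative only and are not measured data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def percentage_errors(y_true, y_pred):
    """Signed percentage error of each estimate relative to the actual value."""
    return [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    errors = percentage_errors(y_true, y_pred)
    return sum(abs(e) for e in errors) / len(errors)

# Illustrative ESR values in milliohms (made up for this example).
esr_true = [31.5, 40.0, 50.0]
esr_pred = [31.5, 40.4, 49.5]
print(r_squared(esr_true, esr_pred), mape(esr_true, esr_pred))
```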

Get Training Data
From the simulation, it can be concluded that accurate ESR/C estimation can be achieved with the proposed ANN, although the performance differs between the two. In order to validate the previous analysis, a test rig was built. The experimental platform is a scaled Maglev train system that was previously used to test the levitation controller performance and tune the control parameters for the engineering Maglev train. The bogie is shown in Figure 8a, where the suspension electromagnets are visible under the bogie. The C/ESR Variation Board in Figure 8a is shown in detail in Figure 8b. The C and ESR adjustable PCB board is designed to minimize the stray inductance. In the ESR adjustment area, different resistance values are connected in series with the bus capacitor through the terminal to simulate the ESR increase caused by capacitor aging. In the C adjustment area, capacitance is connected in parallel with the bus capacitor at the terminal to simulate the capacitance change caused by capacitor aging.

The voltage sensor is an LEM CV3-500 and the current sensor is an LA 55-P. An NI 6008 data sampling device with a sampling frequency of 10 kHz is used to transfer the measured current data to the PC. In this paper, the ripple of the capacitor voltage needs to be extracted as an input parameter. Since the switching frequency of the controller is 5 kHz, an NI 6210 data acquisition card is selected to acquire the DC-link capacitor voltage so that the complete ripple waveform is captured. Its sampling frequency is 250 kHz, 50 times the switching frequency, which meets the experimental requirements. The LabVIEW software on the PC is used to show the exact voltage and current waveforms. The FPGA board is used to adjust the duty cycle. A high-performance DC voltage source is used to avoid extra harmonics in the test, as the DC power supply for the Maglev chopper in the Maglev train is stable. An LCR meter (HIOKI IM 3536) is used to measure the equivalent ESR and C at the DC link.
The other experimental parameters are the same as those used in the simulation. The ANN training settings are the same as those in Section 4.
In this experiment, the peak-to-peak value of the ripple voltage needs to be extracted. The required peak-to-peak value is shown in Figure 9; it is taken as the average over the data within one second. The same method is used to process the collected current data.
Figure 9. The peak-to-peak value of the required ripple voltage.
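One plausible reading of this processing step is sketched below: the one-second voltage record is cut into switching periods (50 samples each at 250 kHz sampling and 5 kHz switching), the peak-to-peak value is taken per period, and the results are averaged. The per-period windowing, the function name `mean_peak_to_peak`, and the synthetic test waveform (including its 330 V DC level) are assumptions for illustration and are not taken from the paper.

```python
import math

FS = 250_000                      # sampling frequency in Hz (from the text)
F_SW = 5_000                      # switching frequency in Hz (from the text)
SAMPLES_PER_PERIOD = FS // F_SW   # 50 samples per switching period

def mean_peak_to_peak(samples, samples_per_period=SAMPLES_PER_PERIOD):
    """Average the per-period peak-to-peak value over all complete periods."""
    n_periods = len(samples) // samples_per_period
    if n_periods == 0:
        raise ValueError("record is shorter than one switching period")
    total = 0.0
    for k in range(n_periods):
        window = samples[k * samples_per_period:(k + 1) * samples_per_period]
        total += max(window) - min(window)
    return total / n_periods

# Synthetic one-second record: arbitrary DC level plus a 5 kHz ripple
# of 1 V peak-to-peak (values chosen only to exercise the function).
record = [330.0 + 0.5 * math.sin(2 * math.pi * F_SW * n / FS) for n in range(FS)]
print(mean_peak_to_peak(record))
```

Averaging over many periods in this way reduces the influence of noise on any single ripple measurement.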
In this experiment, the electromagnet current has three values, the capacitance changes over six states, and the ESR changes over five states, so a total of 90 groups of training data are obtained. From these data, the amplitude of the 5 kHz component in the DC-link voltage versus the actual ESR/C is obtained, as shown in Figure 10. As can be seen, with the average levitation current as an input, the estimation of C becomes very difficult due to the strong nonlinearity, whereas the ESR can be easily estimated from the voltage ripple and current. This is also why capacitance estimation with the ANN is less precise and requires a more complex ANN.

Training and Result Analysis
There is, however, an important difference between simulation and experiments. In simulation, the actual C and ESR data used for training have no frequency characteristics. In experiments, C and ESR must be measured by the LCR meter at a selected frequency, as their values vary with frequency. Here, the LCR measurement of ESR and C is done at 120 Hz. ESR is also measured at 5 kHz for comparison. The samples of the two inputs are averaged to reduce the effects of environmental noise.
For ESR estimation at 120 Hz, the overall R for the single best fitting is 0.99952 when N_ESR is increased to 10, as shown in Figure 11a. In comparison, R for C estimation at 120 Hz is only 0.8443 when N_C = 10, as shown in Figure 12a. This is a typical case of under-fitting. When N_C is increased to 100, R for the best single fitting can also reach 0.99001. However, the stability of the proposed ANN for C estimation is much worse than that for ESR estimation.

The ESR estimation from 10 random trainings has a much lower PE of approximately ±2%, while that for C estimation at N_C = 10 is as high as ±25%. Even when N_C is increased to 100, the MAPE is still 6.91%, as shown in Table 2. The MAPE for ESR is 0.96%, while that for C is 6.18% at 10 neurons. As shown in Figure 12c, a typical over-fitting is found in the C estimation when N_C = 100: the training data are well fitted, but the test and validation data are poorly fitted. As a result, the MAPE is very high, and the PE can exceed 40%, as shown in Figure 12d.

Comparative Analysis
The well-known SVR approach [29] is also applied for comparison. The results are shown in Figures 13 and 14. Clearly, the fitting by SVR is much worse than that by the proposed ANN. As shown in Table 2, the ESR estimation by SVR has an acceptable R of 0.97661 but gives an unstable MAPE of about 7.34%. For C estimation, the fitting is unsuccessful.

Finally, the ESR estimation at 5 kHz is also performed with the ANN and SVR approaches. The results are close to those at 120 Hz. The fitting by the proposed ANN is still far more stable and accurate than that by the SVR, as shown in Table 2.

Discussion
Both C and ESR estimation for DC-link capacitors in Maglev choppers using the two-input ANN are discussed in this paper. Different from the previous data-driven methods [30][31][32][33], the effect of ESR degradation on the C estimation is considered. Therefore, the actual capacitor aging progress is integrated into training. Moreover, an aggressive fitting is tested with the proposed ANN. In simulations, the fittings for ESR and C estimation are almost equally good, although the stability of the C estimation is slightly worse. Experimental results, however, show that ESR estimation using the proposed two-input ANN outperforms C estimation. The fitting for ESR estimation is very successful at both 120 Hz and 5 kHz. For the single best fitting, both C and ESR estimation can achieve excellent results, i.e., a very high R close to 1. C estimation at 120 Hz needs many more neurons in the hidden layer to achieve a satisfactory single fitting, so the estimation of C takes more time and effort for training. In terms of stability, however, C estimation shows some problems. The mapping found by the ANN might be a locally optimal solution, and there is a high possibility that a good fitting is only occasional. When the training and testing data sets are changed, the estimation errors differ greatly. This indicates that in practical applications, the proposed ANN might occasionally provide inaccurate results for C.
In comparison with prior-art data-driven approaches, ESR estimation by the proposed two-input ANN is much more stable. After ten random selections of training and test data, all trained mappings can accurately estimate the ESR, with MAPEs of around 1%. Moreover, the neuron number is only 10 for ESR estimation, which indicates a quick training process. One reason for the large difference in C estimation between simulation and experiment is that the capacitance in simulation does not change with frequency as it does in practice. Therefore, in the experiment, the nonlinearity of the C estimation is strengthened. Another reason is that, in the high-frequency dc-dc converter, the voltage ripple caused by ESR variation is much larger than that caused by C variation, so the fitting for ESR estimation is comparatively easier. The influence of temperature on the ESR or C estimation is not discussed here due to the page limit and deserves future research.
Therefore, future online capacitor monitoring can use the two-input ANN for ESR estimation at 120 Hz or 5 kHz, depending on the availability of the original values. The ten-neuron ANN generates a very simple target function composed of several polynomials. The mapping obtained via ANN training can be implemented easily on a typical DSP control board. In other words, the function can be integrated into the Maglev suspension control. The online capacitor health condition monitoring can be carried out each time the Maglev train is levitated stably, before it is put into motion.
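To illustrate how small the deployed computation is, the sketch below evaluates a two-input, ten-neuron network with a tanh hidden layer (Matlab's 'tansig' is mathematically equivalent to tanh) and a linear output ('purelin'). The weights are random placeholders rather than trained values, the bias terms and function name are assumptions, and any input or output normalization applied by the training toolbox is omitted.

```python
import math
import random

def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-input ANN: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(w[0] * x[0] + w[1] * x[1] + b)
        for w, b in zip(w_hidden, b_hidden)
    ]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out

# Placeholder parameters for a ten-neuron hidden layer (NOT trained values).
rng = random.Random(0)
w_hidden = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
b_hidden = [rng.gauss(0, 1) for _ in range(10)]
w_out = [rng.gauss(0, 1) for _ in range(10)]
b_out = rng.gauss(0, 1)

# Inputs: (average levitation current, ripple voltage), assumed pre-scaled.
estimate = ann_forward([0.5, -0.2], w_hidden, b_hidden, w_out, b_out)
print(estimate)
```

Each estimate costs ten tanh evaluations and about thirty multiply-accumulate operations, which is consistent with the claim that the mapping fits on a typical DSP control board.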

Conclusions
This paper investigates C/ESR estimation in Maglev choppers using the ANN algorithm. Distinct from prior-art research, the actual operation-caused capacitor degradation is considered by changing both C and ESR during the training process. The two-input (average levitation current and dc-link ripple voltage) ANN is proposed based on our current Maglev chopper design and requires no extra sensors. The ANN's advantage in fitting a complex and nonlinear correlation is further explored in this ANN structure: the voltage ripple at the switching frequency is used to map C/ESR at 120 Hz. In the experiments, for the single best fitting, both ESR and C estimation by the ANN achieved a high R of around 0.99. Cross validation was implemented to check the stability of the proposed ANN and to rule out an occasional best fitting. Results show that the stability of the ANN for ESR estimation at both 120 Hz and 5 kHz is far better than that for C estimation. C estimation at low frequency by the proposed ANN suffered from over-fitting, even when the neuron number is increased to 100. With a neuron number of only 10, the MAPE for ESR estimation is 0.96%, compared with 6.18% for C estimation (and 6.91% when N_C = 100). Moreover, the proposed ANN outperforms SVR in both ESR and C estimation. Such stable ESR estimation has great potential for the condition monitoring of DC-link capacitors in Maglev choppers.