Health Assessment of Cooling Fan Bearings Using Wavelet-Based Filtering

As commonly used forced convection air cooling devices in electronics, cooling fans are crucial for guaranteeing the reliability of electronic systems. In a cooling fan assembly, fan bearing failure is a major failure mode that causes excessive vibration, noise, reduction in rotation speed, locked rotor, failure to start, and other problems; therefore, it is necessary to conduct research on the health assessment of cooling fan bearings. This paper presents a vibration-based fan bearing health evaluation method using comblet filtering and exponentially weighted moving average. A new health condition indicator (HCI) for fan bearing degradation assessment is proposed. In order to collect the vibration data for validation of the proposed method, a cooling fan accelerated life test was conducted to simulate the lubricant starvation of fan bearings. A comparison between the proposed method and methods in previous studies (i.e., root mean square, kurtosis, and fault growth parameter) was carried out to assess the performance of the HCI. The analysis results suggest that the HCI can identify incipient fan bearing failures and describe the bearing degradation process. Overall, the work presented in this paper provides a promising method for fan bearing health evaluation and prognosis.


Introduction
The integration level and energy consumption of electronic circuits are increasing, resulting in increased heat flux densities and temperatures in electronic devices. In addition, the recent trend toward super-light and super-thin electronics has imposed further challenges on system thermal design. According to [1], temperature has a great impact on electronic component reliability, and the failure rate of a component increases exponentially as the temperature increases. Therefore, it is necessary to utilize thermal design techniques to reduce the internal temperature of electronic devices. Thermal design aims to accomplish the following: (1) lessen heat dissipation by utilizing low energy consumption techniques and reducing the number of heat-generating components; and (2) move heat out through conduction, convection, or radiation. Cooling fans, as active heat transfer devices, have been used in many electronic systems to lower the system temperature and improve reliability.
As a commonly used thermal solution for most electronic devices, cooling fans have simple structures and low cost. According to [2], cooling fan failure is a major problem for many electronic devices. It causes system instability, malfunctioning, and damage to electronic components through overheating, and can finally lead to system failure [3]. This may result in severe economic, or even catastrophic, losses in certain applications, such as large-scale data servers in financial divisions, communication networks, avionics, and medical devices. Therefore, it is necessary to conduct research on cooling fan condition monitoring and health assessment to guarantee the normal operation of a fan.
A cooling fan is composed of both electronic and mechanical parts. The mechanical parts include the bearings, shaft, fan blades, and fan housing; out of these, bearing failure is the top contributor to fan failure. The types of bearings used in cooling fans can be categorized as sleeve bearings, ball bearings, fluid bearings, and magnetic bearings. The selection of bearings should consider parameters such as performance, durability, cost, size, weight, and noise. Ball bearings have the advantage of a good balance between these factors, and so they are widely used in cooling fans. Specifically, ball bearings have a longer lifespan at higher temperatures (63,000 hours at 50 °C) than sleeve bearings (40,000 hours at 50 °C) [3].
As a typical rolling-element bearing, a ball bearing is the fundamental rotating part in a mechanical system, and numerous studies have been conducted on bearing fault diagnosis [4][5][6][7][8][9][10][11][12]. Regarding the current progress on machinery health assessment, Miao et al. developed gear health assessment methods using empirical mode decomposition [13] and wavelet decomposition [14]. Wang et al. [15] presented gearbox fault diagnosis and prognosis by the fusion of multiple health indicators through support vector data description. Yang and Makis [16] used an ARX model to evaluate gearbox health conditions under variable load conditions. Lin et al. [17] proposed an approach for gearbox condition-based maintenance, and the fault growth parameter was defined using the residual error signal. Qiu et al. [18] proposed a self-organizing-map-based performance degradation method for assessing bearing health condition. Ocak et al. [19] developed a new scheme based on wavelet packet decomposition and the hidden Markov model for bearing prognostics. Pan et al. [20,21] used wavelet packet node energies as bearing fault features. Then, fuzzy c-means [20] and support vector data description [21] were respectively employed to evaluate how far the current bearing health condition was from the normal bearing health condition. In a follow-up study, Pan et al. [22] proposed a hybrid model for bearing performance degradation utilizing support vector data description and fuzzy c-means. Jiang et al. [23] proposed a new approach combining the autoregressive model and fuzzy cluster analysis for bearing diagnosis and degradation assessment. Shen et al. [24] considered the cumulative characteristics of bearing performance deterioration and proposed a monotonic health index for evaluating bearing health condition. Lei et al. [25] proposed health indicators for monitoring the health condition of planetary gearboxes.
The purpose of this research is to investigate cooling fan bearing health assessment methods and develop a prognostics and health management (PHM) solution for fan degradation assessment. However, the literature on fan bearing health assessment is limited. The Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland has conducted research on fan bearings, including fan bearing fault identification [3], a physics-of-failure approach for fan PHM [26], a precursor monitoring approach for cooling fans [27], and fan bearing degradation using acoustic emission [28]. This paper proposes a new health indicator for fan bearing degradation assessment. The comblet, which was initially proposed by Miller [29] for gearbox vibration analysis, is utilized for the extraction of fault-sensitive information from the frequency domain of the bearing vibration signal. The health indicator is defined by data taken from the bearing vibration spectrum, incorporating the idea of exponentially weighted moving average (EWMA). The proposed EWMA-based health indicator can utilize historical information (current and previous data) about the test sample, and it does not require model training, as opposed to other related studies [18][19][20][21][22][24]. To validate the proposed method, a test rig for a cooling fan accelerated life test was established, and a set of fan bearing vibration data collected from the test rig was used.
The rest of this paper is organized as follows: Section 2 introduces the fundamentals of the comblet filter. In Section 3, a new health indicator for fan bearing degradation assessment is proposed. In Section 4, the fan bearing accelerated life test rig is introduced, and then vibration data collected from this test rig are used for validation of the proposed method. Conclusions are presented in Section 5.

Time-Domain Synchronous Averaging
Time-domain synchronous averaging (TSA) is a useful technique for rotating machinery fault diagnosis. It can extract fault-related periodic information from complicated signals and eliminate extraneous periodic components and noise. In TSA, the measured signal is synchronously averaged over the rotational period of the target of interest. The nonsynchronous vibration from other sources and noise are averaged out by applying this procedure. After a large amount of averaging in the time-domain, the averaged signal gradually approximates the expected periodic signal, and the signal to noise ratio is improved.
Given a discrete signal $x(n)$, the corresponding time-domain averaging can be defined as:
$$y(n) = \frac{1}{N}\sum_{r=0}^{N-1} x(n - rM) \qquad (1)$$
where $M$ is the period of the target component, and $N$ is the number of averages. Taking the Z-transform on Equation (1), the following can be obtained:
$$Y(z) = \frac{1}{N}\sum_{r=0}^{N-1} z^{-rM}\,X(z) = \frac{1}{N}\,\frac{1 - z^{-MN}}{1 - z^{-M}}\,X(z) \qquad (2)$$
where $X(z)$ is the Z-transform of $x(n)$.
Thus, the transfer function of the time-domain averaging is obtained as:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{1}{N}\,\frac{1 - z^{-MN}}{1 - z^{-M}} \qquad (3)$$
The frequency response of Equation (3) can be calculated as:
$$H(e^{j\omega}) = \frac{1}{N}\,\frac{1 - e^{-j\omega MN}}{1 - e^{-j\omega M}} \qquad (4)$$
The corresponding amplitude response and phase responses are described as:
$$\left|H(e^{j\omega})\right| = \frac{1}{N}\left|\frac{\sin(MN\omega/2)}{\sin(M\omega/2)}\right| \qquad (5)$$
$$\varphi(\omega) = -\frac{(N-1)M\omega}{2} \qquad (6)$$
Therefore, TSA can be seen as a comb impulse train filter which implements signal filtering in the frequency domain. It keeps the information around the main frequency and its harmonics and suppresses other unrelated frequency components. However, good performance of the TSA technique requires many averages and long signal lengths, and these may not be available due to cost and other technical restrictions in data collection. Furthermore, the underlying assumption of the constant rotation speed is not always met in rotating systems because the rotation speed fluctuates according to working conditions, such as load and electrical supply. If the rotation speed is fluctuating, synchronous sampling is necessary, which collects vibration at a rate related directly to the rotation speed of the target. Another solution is to record the vibration signal at an arbitrary sampling rate and do resampling through interpolation. However, the implementation of these techniques is difficult due to the higher cost of hardware and increased computational burden.
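The comb-filter behaviour of TSA can be illustrated with a minimal sketch. It assumes a signal sampled so that the target period is an integer number of samples; the names `synchronous_average`, `comb_gain`, `M`, and `N` are illustrative and not taken from the original study. The averaging is indexed forward from the start of the record, which gives the same amplitude response as the definition above.

```python
import math

def synchronous_average(x, period, n_avg):
    """Average n_avg consecutive segments, each `period` samples long."""
    if len(x) < period * n_avg:
        raise ValueError("signal too short for the requested number of averages")
    return [
        sum(x[k + r * period] for r in range(n_avg)) / n_avg
        for k in range(period)
    ]

def comb_gain(omega, period, n_avg):
    """Amplitude response |sin(N*M*w/2) / (N*sin(M*w/2))|, w in rad/sample."""
    den = math.sin(period * omega / 2.0)
    if abs(den) < 1e-12:
        # At multiples of 2*pi/M the limit of the ratio is exactly 1.
        return 1.0
    return abs(math.sin(n_avg * period * omega / 2.0) / (n_avg * den))

# Synthetic example: a synchronous component (period 8 samples) plus a
# non-synchronous component (period 5 samples), averaged over 10 periods.
M, N = 8, 10
x = [math.sin(2 * math.pi * n / M) + math.sin(2 * math.pi * n / 5)
     for n in range(M * N)]
y = synchronous_average(x, M, N)

gain_sync = comb_gain(2 * math.pi / M, M, N)   # on a comb tooth
gain_other = comb_gain(2 * math.pi / 5, M, N)  # between comb teeth
```

In this example the period-5 component falls on a zero of the comb response, so the averaged signal reduces to one period of the synchronous component.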

Wavelet Filtering
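The wavelet transform provides a time-frequency representation of a signal through a set of wavelet basis functions [30]. It has been widely used in machinery fault diagnosis. Given a mother wavelet function, $\psi(t)$, a series of wavelet functions can be defined as:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \quad a > 0,\ b \in \mathbb{R} \qquad (7)$$
where $a$ is the scale parameter, $b$ is the translation parameter, and $\mathbb{R}$ represents the set of real numbers. The wavelet function should satisfy the following admissibility criterion:
$$C_{\psi} = \int_{-\infty}^{+\infty} \frac{|\Psi(f)|^{2}}{|f|}\,df < \infty \qquad (8)$$
where $\Psi(f)$ denotes the Fourier transform of $\psi(t)$. The continuous wavelet transform of a signal $x(t)$ can be described as:
$$W(a,b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \qquad (9)$$
where $\psi^{*}(t)$ represents the complex conjugation of $\psi(t)$.
The equivalent frequency-domain representation can be expressed as:
$$W(a,b) = \sqrt{a}\,\mathcal{F}^{-1}\!\left\{X(f)\,\Psi^{*}(af)\right\} \qquad (10)$$
where $X(f)$ and $\Psi(f)$ are the Fourier transforms of $x(t)$ and $\psi(t)$, respectively, and $\mathcal{F}^{-1}$ represents the inverse Fourier transform. Accordingly, the continuous wavelet transform can be treated as a band-pass filter. The bandwidth and central frequency of the filter are determined by the scale parameter of the wavelet function.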
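The band-pass interpretation can be made concrete with a small sketch. It assumes a wavelet whose spectrum is a Gaussian of unit peak centred on a frequency `f0` with shape factor `sigma`; this particular form, its normalisation, and the function names are assumptions for illustration, not the definition used in the paper. Under this assumption, increasing the scale moves the passband to `f0 / a` and narrows it by the same factor.

```python
import math

def morlet_spectrum(f, f0=15.0, sigma=5.0):
    """Assumed Gaussian magnitude spectrum, peak value 1 at f = f0."""
    return math.exp(-(math.pi ** 2) * (f - f0) ** 2 / sigma ** 2)

def scaled_response(f, a, f0=15.0, sigma=5.0):
    """Response of the wavelet at scale a, i.e. Psi(a*f), without the sqrt(a) gain."""
    return morlet_spectrum(a * f, f0, sigma)

def central_frequency(a, f0=15.0):
    """Frequency at which the scaled response peaks."""
    return f0 / a

def half_power_bandwidth(a, f0=15.0, sigma=5.0):
    """Width between the two frequencies where the magnitude drops to 1/sqrt(2)."""
    # Solve exp(-pi^2 * d^2 / sigma^2) = 2**-0.5 for d = a*f - f0, then map to f.
    half_width = sigma * math.sqrt(math.log(2.0) / 2.0) / (math.pi * a)
    return 2.0 * half_width

# Doubling the scale halves both the central frequency and the bandwidth.
fc_1, bw_1 = central_frequency(1.0), half_power_bandwidth(1.0)
fc_2, bw_2 = central_frequency(2.0), half_power_bandwidth(2.0)
```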

Comblet Filter Design
The wavelet coefficient measures the correlation between the wavelet function and the signal of interest at different frequencies determined by the scaling parameter and at different time locations determined by the translation parameter . A coefficient with a large value means that the correlation between the wavelet function and the signal is high; conversely, a small value indicates a low correlation. Thus, the wavelet function can be designed according to the characteristics of the signal. Due to its property of time-frequency localization, the Morlet wavelet has been widely used in signal processing. It is defined as: (11) where is the shape factor, is the wavelet central frequency, and is a positive parameter.
The Fourier transform of the Morlet wavelet is: (12) Figure 1 shows the time-domain and frequency-domain plots of a complex Morlet wavelet, given that =15 Hz, =5, and =1. As seen in Section 2.1, TSA can be treated as a comb filter that extracts fault-related information from a complicated vibration signal. To overcome the aforementioned limitations of TSA, a new wavelet function is designed which possesses the properties of both the exponential decay of the Morlet wavelet and the flat passband of the harmonic wavelet. The new wavelet is called a comblet, and the mathematic definition of this wavelet is given by: (13) where is the comblet central frequency and is the half central bandwidth. Typically, is chosen as , and Equation (13) can be re-written as: The half central bandwidth is defined as: Here, is the rotation variation parameter, which describes the percentage of rotation fluctuation.
In bearing fault diagnosis, the comblet central frequency is usually selected as the fault-related characteristic frequency, $f_{BCF}$. Figure 2 gives an example of this comblet function in the frequency domain with one comb tooth, where the central band has a magnitude of 1, the rotation variation parameter is 2%, and the sideband follows the Morlet wavelet function. From Equation (14), the comblet filter can be designed by constructing a comb filter where the comb teeth correspond to the fault-related characteristic frequency and its harmonics. According to the Nyquist-Shannon sampling theorem, the bandwidth limit of a signal is determined by the sampling frequency $f_s$. That is:
$$f_{\max} \le \frac{f_s}{2} \qquad (16)$$
Thus, the maximum number of comb teeth in a comblet filter is obtained by:
$$K_{\max} = \left\lfloor \frac{f_s}{2 f_{BCF}} \right\rfloor \qquad (17)$$
where $f_{BCF}$ is the fault-related characteristic frequency and $\lfloor \cdot \rfloor$ denotes the round-down operation.
Therefore, a comblet filter with comb teeth is written as: For example, a comblet filter with =6 comb teeth can be constructed based on the single tooth comblet shown in Figure 2. The new comblet filter is shown in Figure 3, and Figure 4 is the time-domain plot of this comblet filter.
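A comb-shaped magnitude response of this kind can be sketched as follows. The sketch is built only from the qualitative description in the text (a flat central band of magnitude 1 around each harmonic, Gaussian sidebands, and a tooth count limited by the Nyquist frequency). The specific choices below are assumptions: the half central bandwidth is taken as `c * f_char`, every tooth uses the same width, overlapping teeth are combined with `max`, and the Gaussian skirt uses a shape factor `sigma`. None of these should be read as the paper's exact comblet definition.

```python
import math

def max_comb_teeth(fs, f_char):
    """Largest number of harmonics of f_char at or below the Nyquist frequency."""
    return math.floor(fs / (2.0 * f_char))

def comblet_response(f, f_char, n_teeth, c=0.02, sigma=5.0):
    """Assumed comblet magnitude at frequency f (Hz): flat teeth, Gaussian skirts."""
    half_bw = c * f_char  # assumed half central bandwidth
    best = 0.0
    for k in range(1, n_teeth + 1):
        # Distance of f outside the flat band of the k-th tooth (<= 0 means inside).
        d = abs(f - k * f_char) - half_bw
        gain = 1.0 if d <= 0.0 else math.exp(-(math.pi ** 2) * d ** 2 / sigma ** 2)
        best = max(best, gain)
    return best

# Illustrative values: 100 Hz characteristic frequency, six teeth, 2% variation.
on_tooth = comblet_response(100.0, 100.0, 6)
inside_band = comblet_response(101.9, 100.0, 6)
between_teeth = comblet_response(150.0, 100.0, 6)

# Tooth limit for the sampling rate and rotation frequency reported in the test.
teeth_at_80_hz = max_comb_teeth(25600.0, 80.0)
```

Taking the maximum over teeth keeps the passband magnitude at 1; summing the teeth would be an equally plausible construction when the skirts do not overlap.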

Fan Bearing Vibration Characteristics
A vibration signal collected from a cooling fan combines contributions from several components, including the shaft, bearing, motor, blade, and noise. If there is a local defect on a certain part of a bearing, an impulse is generated when a mating element encounters the local defect. Since the local defect repeatedly contacts other parts of the bearing, it generates low-frequency vibration components. The frequency of the vibration signal is related to the rotation speed and the geometry of the bearing; it is called the bearing characteristic frequency (BCF). Typical failures of ball bearings include local defects on the rolling element, inner race, outer race, and cage. The corresponding bearing characteristic frequencies are defined as follows [3]:
Ball spin frequency (BSF):
$$f_{BSF} = \frac{D_p}{2 D_b}\, f_r \left[1 - \left(\frac{D_b}{D_p}\cos\alpha\right)^{2}\right] \qquad (19)$$
Ball pass frequency, inner race (BPFI):
$$f_{BPFI} = \frac{N_b}{2}\, f_r \left(1 + \frac{D_b}{D_p}\cos\alpha\right) \qquad (20)$$
Ball pass frequency, outer race (BPFO):
$$f_{BPFO} = \frac{N_b}{2}\, f_r \left(1 - \frac{D_b}{D_p}\cos\alpha\right) \qquad (21)$$
Fundamental train frequency (FTF):
$$f_{FTF} = \frac{1}{2}\, f_r \left(1 - \frac{D_b}{D_p}\cos\alpha\right) \qquad (22)$$
where $f_r$ is the bearing rotation speed (Hz), $N_b$ is the number of rolling elements, $D_b$ is the mean diameter of the rolling elements (mm), $D_p$ is the pitch diameter of the bearing (mm), and $\alpha$ is the contact angle (°).
It should be noted that the bearing characteristic frequencies are non-integer multiples of the rotation speed. In practice, since the rolling motions are accompanied by a degree of sliding which occurs in the contact areas [7], the derived bearing characteristic frequencies (Equations (19)-(22)) are approximate. The resulting variation in bearing frequency is typically around 1-2% [7], which provides a criterion for the choice of rotation variation parameter c.
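The characteristic frequencies can be evaluated with a short function based on the standard kinematic formulas for rolling-element bearings (outer race stationary, no slip). The geometry used in the example is hypothetical and is not the geometry of the tested fan bearing; only the 80 Hz rotation frequency comes from the experiment described in this paper. The function and parameter names are illustrative.

```python
import math

def bearing_characteristic_frequencies(f_r, n_balls, d_ball, d_pitch, alpha_deg=0.0):
    """Return BSF, BPFI, BPFO, and FTF in Hz for rotation frequency f_r (Hz)."""
    ratio = (d_ball / d_pitch) * math.cos(math.radians(alpha_deg))
    return {
        "BSF": (d_pitch / (2.0 * d_ball)) * f_r * (1.0 - ratio ** 2),
        "BPFI": (n_balls / 2.0) * f_r * (1.0 + ratio),
        "BPFO": (n_balls / 2.0) * f_r * (1.0 - ratio),
        "FTF": (f_r / 2.0) * (1.0 - ratio),
    }

# Hypothetical geometry: 8 balls, 1.5 mm ball diameter, 6.0 mm pitch diameter,
# zero contact angle, shaft rotating at 80 Hz (4800 rpm).
bcf = bearing_characteristic_frequencies(
    f_r=80.0, n_balls=8, d_ball=1.5, d_pitch=6.0, alpha_deg=0.0
)
```

Two identities follow directly from the formulas and are useful as sanity checks: BPFI + BPFO equals the number of balls times the rotation frequency, and BPFO equals the number of balls times FTF.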

Implementation of Comblet Filtering
As previously mentioned, vibrations produced by a cooling fan can be complex and can result from many different sources including the shaft, bearing, motor, blade, and noise. These different vibration sources interact with each other, which makes it almost impossible to identify the frequency of interest from the vibration spectrum without any data manipulation methods. In particular, vibration generated by a local defect in a bearing is usually very weak at the initial failure stage, and fault-related information, such as BCFs, is masked by other vibrations. It is therefore advantageous to remove the irrelevant information before proceeding with bearing health assessment.
A wavelet transform can be seen as a kind of bandpass filtering operation on a signal. The central frequency and bandwidth of the wavelet are tuned by the scale parameter. In the design of the comblet filter, the central frequency and bandwidth are determined by the comblet central frequency and the half central bandwidth, and these parameters can be changed according to the practical scenario. Therefore, the comblet filtering technique provides a flexible solution for the extraction of bearing-fault-related information.
As stated in Section 2.3, the comblet is a new wavelet defined on the basis of some classical wavelet functions. Thus, the comblet transform can be described as the filtering operation on the signal under consideration with a comblet filter. Mathematically, it is defined as: where is the comblet coefficient in which is the translation parameter, represents the convolution operation, is the complex conjugation operation, is the constructed comblet function, and is the inverse Fourier transform of the comblet. After comblet filtering, a time-domain filtered signal is obtained, and further analysis can proceed.
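The filtering step can be sketched as a multiplication in the frequency domain followed by an inverse transform. The sketch below is a simplified stand-in rather than the paper's comblet transform: it uses a direct (slow) discrete Fourier transform from the standard library, an assumed tooth shape (flat band with Gaussian skirt), and a mask that is symmetric in frequency so that the output is real. A complex, one-sided comblet would instead yield complex coefficients. All function and parameter names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(n^2), adequate for short records)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def tooth_gain(f, f_char, n_teeth, half_bw, sigma):
    """Assumed comb gain: 1 within half_bw of each harmonic, Gaussian decay outside."""
    gains = []
    for k in range(1, n_teeth + 1):
        d = abs(f - k * f_char) - half_bw
        gains.append(1.0 if d <= 0.0 else math.exp(-(math.pi * d / sigma) ** 2))
    return max(gains)

def comblet_filter(x, fs, f_char, n_teeth, half_bw, sigma):
    """Filter x by weighting each DFT bin with the comb gain at its frequency."""
    n = len(x)
    spec = dft(x)
    filtered = []
    for k in range(n):
        f = k * fs / n
        f = min(f, fs - f)  # map negative-frequency bins onto |f|
        filtered.append(spec[k] * tooth_gain(f, f_char, n_teeth, half_bw, sigma))
    return [v.real for v in idft(filtered)]

# Synthetic check: keep a 20 Hz "fault" tone, reject an unrelated 33 Hz tone.
fs, n = 128, 128
x = [math.sin(2 * math.pi * 20 * t / fs) + math.sin(2 * math.pi * 33 * t / fs)
     for t in range(n)]
y = comblet_filter(x, fs, f_char=20.0, n_teeth=2, half_bw=0.4, sigma=1.0)
```

For records of realistic length, a fast Fourier transform would replace the direct transform used here.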

Fan Bearing Degradation Assessment
Before discussing methods for fan bearing degradation assessment, it is important to understand how the vibration signal changes as bearing failures develop. In general, bearing failure progresses through pre-failure, early failure, near failure, and near catastrophic failure stages [31]. In this process, the size of a local defect in a bearing becomes larger, and the impulses excited by the local defect are intensified. Therefore, the energy of the vibration signal tends to increase around the BCFs and their harmonics, and a health indicator can be constructed from this energy to describe the fan bearing health condition.
Given the filtered signal $s(t)$, the spectrum analysis is performed by:
$$S(f) = \left|\mathcal{F}\{s(t)\}\right| \qquad (24)$$
where $S(f)$ denotes the absolute value of the Fourier transform amplitude of the filtered signal $s(t)$.
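Assume the spectrum energy of the filtered signal is $E$. The definition of $E$ is given by:
$$E = \sum_{k=1}^{L} S(k)^{2} \qquad (25)$$
where $S(k)$ is the $k$-th sample of the amplitude spectrum of the filtered signal and $L$ is the length of $S$.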
In order to detect the occurrence of fan bearing incipient failures, an exponentially weighted moving average (EWMA) is utilized to define the health indicator. The exponentially weighted moving average is very effective in detecting small shifts in the process [32], and it is suitable for bearing incipient failure detection since a failure symptom is very weak initially.
We chose the spectrum energy $E_t$ as the observation at time $t$ in the process. The first $m$ observations are used to estimate the initial health condition indicator:
$$HCI_0 = \frac{1}{m}\sum_{t=1}^{m} E_t \qquad (26)$$
After that, the remaining observations are used to evaluate the fan bearing health condition. The length of the observation sequence for evaluation is $n$. The proposed health condition indicator (HCI) is calculated as:
$$HCI_t = \lambda E_{m+t} + (1-\lambda)\,HCI_{t-1}, \quad t = 1, 2, \ldots, n \qquad (27)$$
where $\lambda$ is the weighting factor that controls how much the historical data contribute. A large $\lambda$ gives more weight to recent data and less weight to older data. The value of $\lambda$ is usually set between 0.2 and 0.3 [32], and in this paper it is selected as $\lambda = 0.3$.
Figure 5 presents a flow chart of the cooling fan bearing health assessment method presented in this paper. Given the original signal collected from the cooling fan, pre-processing is conducted to normalize the data. The comblet filters are designed using the bearing characteristic frequencies, including BSF, BPFI, BPFO, and FTF. After performing comblet filtering, the frequency spectrum is obtained. The health indicator HCI can then be calculated to assess the fan bearing health condition.
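The indicator can be sketched as a standard EWMA recursion over a sequence of spectrum-energy observations. The sketch assumes the textbook EWMA form, in which the weight 0.3 multiplies the current observation, and a baseline equal to the mean of the first 10 observations, matching the settings reported in this paper. The function name, argument names, and the sample energies are illustrative.

```python
def health_condition_indicator(energies, n_baseline=10, lam=0.3):
    """Return (HCI_0, [HCI_1, ..., HCI_n]) for a list of spectrum energies."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    if len(energies) <= n_baseline:
        raise ValueError("need more observations than the baseline length")
    # Baseline: mean energy of the first n_baseline observations.
    hci0 = sum(energies[:n_baseline]) / n_baseline
    hci = []
    previous = hci0
    for e in energies[n_baseline:]:
        # EWMA update: blend the new observation with the previous indicator.
        previous = lam * e + (1.0 - lam) * previous
        hci.append(previous)
    return hci0, hci

# Synthetic energies: a flat baseline followed by a step increase.
energies = [1.0] * 10 + [1.0, 2.0, 2.0]
hci0, hci = health_condition_indicator(energies)
```

Because each value depends on the previous one, a sustained shift in energy moves the indicator gradually toward the new level, while an isolated outlier is damped.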

Description of Experimental Setup
Generally, the lifespan of a cooling fan can be several years, and it is uneconomical to conduct a life test under nominal working conditions. For a cooling fan working under its nominal load, lubrication degradation leads to wear in the bearing and shortens the lifespan of the cooling fan. Therefore, it is reasonable to choose the lubrication level to simulate lubrication degradation and accelerate the fan bearing life test. Fan bearing lubrication usually includes grease and oil from the manufacturing process. Assuming that the nominal amount of grease is at the 100% level, a certain lubrication level p% represents the percentage of grease being added to the bearing.
The cooling fan used in this research was an axial type brushless direct current (BLDC) fan with dimensions of 92 × 92 × 38 mm. The fan has two ball bearings to support the shaft, whose overall diameter is 8 mm. Figure 6 shows the cooling fan tested in this experiment. The geometric specifications of the fan bearing used in this experiment are given in Table 1.
In the experiment, a cooling fan with bearings containing only residual oil and no added grease (0% lubrication level) was used in the life test. To measure the vibration data, an in-situ monitoring system was established with a PCB 352C42 accelerometer attached to the fan housing near the bearing. Since a higher temperature reduces the film thickness between mating surfaces, thus accelerating localized deformation and friction on the bearing mating components, the fan was stressed in a chamber at a temperature of 70 °C. The rotation speed of the fan was 4800 rpm, corresponding to a rotation frequency, $f_r$, of 80 Hz. A condenser microphone was set up 50 cm away from the cooling fan to record the acoustic noise from the fan. Data collection was conducted using the National Instruments LabVIEW program.
The test was stopped when the acoustic noise increased 3 dB from the initial value, which is one of the fan failure criteria defined in the IPC-9591 standard [33]. Given the fan rotation speed, the bearing characteristic frequencies were calculated, as listed in Table 2. The cooling fan was in a good health condition before the experiment. The accelerated life test started on 09/09/2010 at 10:14 and ended on 09/22/2010 at 0:47, when the acoustic noise measured by the microphone had increased 3 dB from its initial value. In this period, the experiment paused occasionally. The vibration signal was collected in blocks of 10 seconds every 15 minutes, and the sampling frequency, $f_s$, was 25.6 kHz. The vibration signal was saved as a data file and numbered sequentially. There were a total of 388 data files. Since the first data file was a measurement during oven stabilization, it was excluded from the data analysis. Thus, the data set used in this paper included 387 data files.
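The acquisition settings above imply a few derived quantities that are useful when designing the filters and reading the spectra. The short calculation below only restates numbers given in the text; the variable names are illustrative.

```python
RPM = 4800            # fan rotation speed
FS_HZ = 25_600        # sampling frequency
BLOCK_S = 10          # length of each recorded block in seconds
N_FILES_TOTAL = 388   # recorded data files
N_BASELINE = 10       # files used for the initial indicator

rotation_hz = RPM / 60                           # shaft rotation frequency
samples_per_block = FS_HZ * BLOCK_S              # samples in one data file
freq_resolution_hz = FS_HZ / samples_per_block   # spacing of spectrum bins
nyquist_hz = FS_HZ / 2                           # highest resolvable frequency

n_files_used = N_FILES_TOTAL - 1                 # first file excluded
n_files_evaluated = n_files_used - N_BASELINE    # files left for assessment
```

A 0.1 Hz bin spacing is fine enough to separate characteristic frequencies that differ by a fraction of a hertz, and the 12.8 kHz Nyquist limit sets the upper bound on the comb harmonics.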

Evaluation of the Proposed Health Indicator
To validate the proposed fan bearing health assessment method, the comblet filters were designed first. The central frequency of each BCF filter is the corresponding bearing characteristic frequency.
The rotation variation parameter is selected as 2%. The half central bandwidth and the number of comb teeth in each comblet filter are calculated using Equations (15) to (17). The calculation results are listed in Table 3.
After the design of the comblet filters, the proposed health indicator is validated using the collected data set. The first 10 data files are used to calculate the initial health condition indicator, $HCI_0$. The remaining 377 data files are used for the fan bearing health assessment and are numbered as 1 to 377. Figure 7 shows the assessment results with the four comblet filters. From Figures 7(a-d), it can be observed that the incipient failure occurs around file numbers 98-99, which corresponds to the time at 09/10/2010 20:36. The dashed line in Figure 7 represents the start of the fan bearing incipient failure. In order to verify that the fan bearing incipient failure started at file numbers 98-99, Fourier spectrum analysis was conducted on data files 97, 98, and 99 after filtering. Figure 8 shows the spectral analysis results. Figure 8(a) is the spectrum of data file 97, and only the 4th and 8th order harmonics of the rotation frequency, $f_r$, can be identified. From Figures 8(b,c), the four BCFs (BSF, BPFI, BPFO, FTF) and their harmonics can be identified from the spectral analysis results. Based on the results in Figure 8, it can be concluded that the bearing incipient failures started at data files 98-99.

Comparison with Other Health Indicators
In this section, a comparison study is conducted between the proposed HCI and other methods. In vibration analysis, the root mean square (RMS) and kurtosis are two popular statistics of the time-domain signal for fault diagnosis, and they are given by:
$$RMS = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}} \qquad (28)$$
$$Kurtosis = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^{4}}{\sigma^{4}} \qquad (29)$$
where $x_i$ is the $i$-th sampling point of the signal, $N$ is the number of sampling points, $\mu$ is the mean of the signal, and $\sigma$ is the standard deviation of the signal.
Another health indicator is fault growth parameter 1 (FGP1) [14]. It is defined as the part (percentage of points) of the residual error signal that exceeds three standard deviations calculated from the baseline residual error signal. Its definition involves the current residual error signal points, the mean of the current residual signal, the "baseline" standard deviation, the indicator function, and the floor function.
Figure 9 shows the health assessment results of RMS, kurtosis, and FGP1 using fan bearing vibration data. The HCI (Figure 7) performs considerably better than these health indicators. For example, based on the results presented in Figure 9, incipient fan bearing failures appear to occur at file number 110. However, according to the spectrum analysis results presented in Figure 8, the time of incipient failure should be around file number 98. Furthermore, the trend in fan bearing degradation cannot be observed in Figure 9, so the health assessment methods using RMS, kurtosis, and FGP1 are not suitable for further research on fan bearing prognosis.
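The two time-domain statistics used in the comparison can be computed as in the following sketch. It assumes the population form of the standard deviation (division by the number of points) and the non-excess form of kurtosis, so a Gaussian signal gives a value near 3 and a pure sine gives 1.5. The function names and the synthetic signals are illustrative.

```python
import math

def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def kurtosis(x):
    """Fourth central moment divided by the squared variance (non-excess form)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0.0:
        raise ValueError("kurtosis is undefined for a constant signal")
    return sum((v - mu) ** 4 for v in x) / (n * var ** 2)

# A clean sine versus the same sine with one added impulse, mimicking the
# impact produced when a rolling element passes a local defect.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
impulsive = list(sine)
impulsive[250] += 10.0

rms_sine, kurt_sine = rms(sine), kurtosis(sine)
rms_imp, kurt_imp = rms(impulsive), kurtosis(impulsive)
```

Kurtosis reacts strongly to a single impulse while RMS changes only slightly, which is why the two statistics are often used together in vibration analysis.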

Conclusions
Cooling fans are commonly used in microelectronics. In order to ensure high reliability in air-cooled electronic systems, it is necessary to conduct research on the life expectancy and health assessment of cooling fans. Fan bearing failure is a major failure mode that causes excessive vibration, noise, reduction in rotation speed, locked rotor, and failure to start, among other problems, which may result in an electronic system's malfunction and lower the electronics reliability.
This paper presents a coherent solution for the health assessment of cooling fan bearings. The method utilizes the comblet concept. A health indicator was proposed based on the techniques of comblet filtering and exponentially weighted moving average. An accelerated life test was conducted on a cooling fan to simulate fan bearing degradation. The recorded vibration data were used to validate the proposed method. To demonstrate the performance of the proposed method, a comparative study was conducted between the proposed HCI and the commonly used methods of RMS, kurtosis, and FGP1. Based on the analysis results, the HCI can detect incipient fan bearing failures, and the bearing degradation process can be captured by the proposed method.
The work presented in this paper provides a promising method for cooling fan bearing health evaluation and prognosis. With this method, the critical failure of a cooling system can be avoided, and the reliability of electronic systems can be guaranteed. Furthermore, the proposed solution may also be used in generic bearing health evaluation and prognosis, which is currently the focus of prognostics and health management of mechanical systems.