2.1. Wavelet Transform
The fundamental method underlying the algorithm is the wavelet transform, which decomposes the input signal into detailing coefficients that can be used for subsequent failure analysis and localization.
The wavelet decomposition algorithm, often referred to as the Mallat algorithm [52] or fast wavelet transform, is an efficient method for decomposing a signal into multiple levels of detail. This algorithm is the basis for many signal processing applications, including noise reduction, data compression, and feature analysis. The pseudocode provided as ‘WaveletDecomposition’ describes the scheme of Mallat’s algorithm for one-dimensional signals (Algorithm 1).
Algorithm 1 WaveletDecomposition (Mallat’s Algorithm)
Require: signal — input one-dimensional signal
Require: wavelet — wavelet type (e.g., ‘db20’)
Require: level — decomposition depth
Ensure: coeffs — coefficient array
1: Initialize the list of coefficients coeffs as empty
2: current_signal ← signal
3: for k ← 1 to level do
4:   cA_k ← convolution of current_signal with the scaling filter (low-pass) from the selected wavelet
5:   cD_k ← convolution of current_signal with the high-pass filter from the selected wavelet
6:   Perform a downsampling operation (select every second sample) for cA_k and cD_k
7:   current_signal ← cA_k ▹ The signal at the next step is the approximating coefficients
8:   Save cD_k to the internal buffer
9: end for
10: Save current_signal (i.e., cA_level) to coeffs
11: Add all detailing coefficients cD_level, …, cD_1 to coeffs in order from last to first
12: return coeffs
2.1.1. Basic Principles and Steps of the Algorithm
The wavelet decomposition algorithm is based on the use of a set of wavelet filters, including scaling (low-pass) and detail (high-pass) filters, which are associated with the selected wavelet. The decomposition process is iterative and is applied a given number of times, determined by the decomposition level (‘level’).
Input data: The algorithm takes as input a one-dimensional signal (‘signal’), a wavelet type (‘wavelet’) that specifies the set of filters to be used, and a decomposition depth (‘level’) indicating the number of decomposition levels.
Initialization: At the beginning of the algorithm, an empty list ‘coeffs’ is initialized to store the coefficients of the wavelet decomposition. The current signal being processed, ‘current signal’, is set equal to the input signal.
Iterative decomposition process (FOR loop): The algorithm performs iterations, the number of which is determined by the level of decomposition (‘level’). At each iteration k (from 1 to ‘level’), the following steps are performed:
Scaling filter convolution: The current signal ‘current signal’ is subjected to a convolution operation with the scaling filter (low-pass filter) associated with the selected wavelet. The result of this operation is the approximating coefficients of the current level, denoted as cA_k. These coefficients represent the low-frequency component of the signal, reflecting the overall structure or trend of the signal at a given resolution level.
Detail filter convolution: Simultaneously, the same current signal ‘current signal’ is convolved with a detail filter (high-pass filter) also originating from the selected wavelet. The result is the detailing coefficients of the current level, denoted as cD_k. These coefficients represent the high-frequency component of the signal, containing details and abrupt changes such as noise and signal features.
Downsampling (decimation): Both the approximating coefficients cA_k and the detailing coefficients cD_k undergo a downsampling operation, which consists of selecting every second sample. This is performed to reduce the size of the data and move to the next, coarser resolution level in the subsequent decomposition step. Downsampling is a key part of the Mallat algorithm, ensuring its efficiency.
Update current signal: For the next iteration of the algorithm, the current signal ‘current signal’ is replaced by the approximating coefficients of the current level. This means that, at the next level, only the approximated, smoother version of the previous level signal is decomposed.
Saving detailing coefficients: The detailing coefficients obtained at the current level are temporarily stored in an internal buffer for later sequencing.
Saving approximating coefficients of the last level: At the end of the iteration cycle, when a given decomposition depth (‘level’) has been reached, the last obtained approximating coefficients cA_level (which are the ‘current signal’ values at the last iteration) are stored in the ‘coeffs’ list. These coefficients represent the coarsest approximation of the original signal.
Add detail coefficients to the output array: Then, all the detailing coefficients stored in the internal buffer during the iterations are added to the ‘coeffs’ list. It is important to note that they are added in order from the last level to the first, i.e., cD_level, cD_level−1, …, cD_1.
Output: The algorithm returns an array of coefficients ‘coeffs’, which is a list starting with the approximating coefficients of the last level cA_level, followed by the detailing coefficients of all levels, from cD_level to cD_1. Thus, the structure of the output coefficients is of the following form: [cA_level, cD_level, …, cD_1].
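To make the procedure concrete, here is a minimal Python sketch of the decomposition loop. It uses the two-tap Haar filter pair rather than ‘db20’ so the filter coefficients can be written inline; a practical implementation would look up the analysis filters of the requested wavelet in a library such as PyWavelets.

```python
import numpy as np

def wavelet_decompose(signal, level):
    """One-dimensional Mallat decomposition (sketch, Haar wavelet).

    Returns [cA_level, cD_level, ..., cD_1], matching the output
    layout described in the text.
    """
    # Haar analysis filters; a real implementation would obtain these
    # for the requested wavelet (e.g. 'db20') from a wavelet library.
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # scaling (low-pass)
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # detail (high-pass)

    current = np.asarray(signal, dtype=float)
    details = []
    for _ in range(level):
        cA = np.convolve(current, lo)[1::2]  # convolve + keep every 2nd sample
        cD = np.convolve(current, hi)[1::2]
        details.append(cD)
        current = cA                         # next level decomposes cA only
    return [current] + details[::-1]         # cD_level, ..., cD_1
```

For an even-length signal, the Haar pair is orthogonal, so the total energy of the output coefficients equals the energy of the input signal, which is a convenient sanity check.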
2.1.2. Meaning and Application of the Algorithm
The Mallat wavelet decomposition algorithm is an efficient and widely used method for analyzing signals in many fields. Decomposing a signal into approximating and detailing coefficients at different levels allows the signal to be analyzed at different frequency ranges and resolution levels. This is useful for identifying various signal characteristics, such as trends, details, noise, and features. The resulting coefficients can be used for a variety of tasks, including noise reduction (by thresholding the detailing coefficients), data compression (by discarding small coefficients), and feature extraction for classification and pattern recognition.
2.3. Algorithm
Now, we can consider the basic algorithms for signal fault localization and classification, which are based on the methods and algorithms described above.
Mathematically, we can describe the problem to be solved as follows:

$$(s^{*}, t^{*}, a^{*}) = \arg\min_{(s,\,t,\,a)\,\in\, S \times T \times A} \mathrm{MSE}\big(y,\ \hat{y}(s,t,a)\big),$$

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \big(y_{i} - \hat{y}_{i}\big)^{2},$$

where N is the length of the signal y.
Here, the following applies:
1. y: the original signal represented as a time series. This signal is the reference signal and is used to evaluate the accuracy of the approximation.
2. ŷ: the approximated signal obtained by applying an error function with certain parameters. The goal is to match this signal as closely as possible to the original signal y.
3. ErrorFunction: a function that takes stretch, shift, and amplitude parameters as input and returns an approximated signal. Formally, ŷ(s, t, a) = ErrorFunction(s, t, a) denotes the approximated signal obtained given the parameters s, t, and a.
4. S: the set of possible values of the signal stretching parameter along the horizontal (time) axis. Each element of this set represents a stretching factor that can be applied to the original signal.
5. T: the set of possible values of the signal shift parameter. Each element of this set represents a shift value that can be applied to the original signal.
6. A: a set of possible values of the signal amplitude parameter. Each element of this set represents an amplitude coefficient that can be applied to the original signal.
7. S × T × A: the Cartesian product of the sets S, T, and A, that is, the set of all possible combinations of stretch, shift, and amplitude parameters.
The purpose of the method is to find the combination of parameters (s*, t*, a*) ∈ S × T × A that minimizes the MSE between the original signal y and the approximated signal ŷ(s, t, a).
Thus, this formula defines a procedure for finding the optimal stretch, shift, and amplitude parameters that provide the best approximation of the original signal in the sense of minimizing the MSE error.
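When the parameter sets S, T, and A are small finite grids, this minimization can be sketched as an exhaustive search over the Cartesian product. Here `error_function` is any user-supplied callable standing in for the paper's ErrorFunction:

```python
import numpy as np
from itertools import product

def mse(y, y_hat):
    """Mean squared error between two equal-length signals."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def fit_error_model(y, error_function, S, T, A):
    """Exhaustive search over S x T x A for the (s, t, a) triple that
    minimizes MSE(y, error_function(s, t, a))."""
    best_params, best_mse = None, float("inf")
    for s, t, a in product(S, T, A):
        m = mse(y, error_function(s, t, a))
        if m < best_mse:
            best_params, best_mse = (s, t, a), m
    return best_params, best_mse
```

For continuous parameter ranges, the same objective would instead be handed to a numerical optimizer, as discussed below.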
The main steps of the algorithm are as follows (Figure 1):
1. Input signal: Obtaining a raw signal containing potential anomalies (failures).
2. Signal preprocessing: Normalization, the removal of unwanted noise components, and peak detection.
3. Wavelet analysis: Wavelet transform (decomposition of the signal into wavelet coefficients for analysis at different scales) and coefficient analysis (examination of wavelet coefficients to identify features associated with failures).
4. Error Classification: Using optimization methods (Nelder–Mead method, BFGS) to minimize the MSE between a failure in the signal interval and a possible failure from error sample libraries.
5. Results output: Error classes (defined failure types), error estimation (metrics output), temporal localization (determining when the failure occurs).
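Step 4 above can be sketched with SciPy's implementation of the Nelder–Mead method. The Gaussian bump used here is a hypothetical stand-in for one entry of an error-sample library, with width s, position t, and amplitude a as the free parameters; it is not the paper's actual failure model.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 200)

def failure_template(params):
    # Hypothetical library entry: a Gaussian bump with width s,
    # position t and amplitude a.
    s, t, a = params
    return a * np.exp(-(((x - t) / s) ** 2))

# Synthetic "failure" whose parameters the optimizer should recover.
y = failure_template([0.05, 0.4, 1.5])

def objective(params):
    # MSE between the observed failure and the candidate template.
    return float(np.mean((y - failure_template(params)) ** 2))

result = minimize(objective, x0=[0.1, 0.5, 1.0], method="Nelder-Mead")
```

The recovered `result.x[1]` gives the temporal localization of the failure; swapping `method="BFGS"` exercises the other optimizer mentioned above.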
2.3.1. WONC-FD mse_classification Algorithm
The mse_classification algorithm is designed to classify the type of “error” or “defect” in a one-dimensional signal y by comparing it to a set of predefined “error signals” Error_signal of various types listed in the array errors. The classification is based on minimizing the mean square error (MSE) between the input signal and the approximation obtained using different error types, sizes, and shifts.
The main steps of the algorithm are given in Algorithm 8.
Algorithm 8 WoncFD_mse_classification
1: Input: y — one-dimensional array of points,
2: errors — list of possible error types (e.g., [“haar”, “haar1”, …]),
3: leng — base length of the signal (e.g., 1000),
4: eps — a small threshold to check for proximity to 0,
5: n — fraction of zero (modulo) points for rejection.
6: Output: list [name of the best error type, MSE]
7: if y is too short relative to leng or the number of points with |y_i| < eps exceeds n · length(y) then
8:   return [“Bad signal”, “NaN”]
9: end if
10: Initialize empty array result
11: x ← an array of uniform values from 0 to 1 of size length(y)
12: ampl ← 1
13: size_values ← set of candidate sizes, from length(y) in increments of 10
14: shifts ← set of candidate shifts, in increments of 1
15: for each err of errors do
16:   mse_err ← a large number
17:   for each s of size_values do
18:     for each shift of shifts do
19:       Error ← vector of error values Error_signal(err, x, ampl, 1) from the dictionary, resized to length s
20:       if shift ≥ 0 then
21:         arr_error ← Error shifted forward
22:       else
23:         arr_error ← Error shifted backward
24:       end if
25:       features_matrix ← a matrix of size length(y) × 2, where the first column is 1 and the second column is arr_error
26:       Find the coefficients using the least squares method:
27:       coefficients ← arg min_c ‖features_matrix · c − y‖
28:       approximated_signal ← features_matrix · coefficients, mse ← MSE(y, approximated_signal)
29:       if mse < mse_err then
30:         mse_err ← mse
31:       end if
32:     end for
33:   end for
34:   Add mse_err to the array result
35: end for
36: i* ← arg min(result)
37: return [errors[i*], result[i*]]
Input data: The algorithm takes as input a one-dimensional array of points y, a list of error type names errors, a base signal length leng, a small threshold eps to check for proximity to zero, and a fraction of zero (modulo) points n to reject the signal.
Preliminary Signal Checking: The algorithm performs an initial check on the input signal y to weed out “bad” signals that may lead to incorrect results.
Check signal length: The signal is checked for whether it is too short. If the length of y is less than or equal to a given fraction of the base length leng, the signal is considered “bad”.
Check for near-zero points: The number of points in y whose absolute value is less than the specified small threshold eps is counted. If the number of such points exceeds a fraction n of the total signal length (i.e., the number of near-zero points > n · length(y)), the signal is also considered “bad”.
Return for “bad” signals: If a signal is considered “bad” by one of the above criteria, the algorithm does not perform further classification and returns the list [“Bad signal”, “NaN”] as the result, indicating that classification is impossible or unreliable.
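A minimal sketch of this pre-check follows. The minimum-length fraction `min_frac` and the default thresholds are illustrative assumptions, since the text leaves the exact values unspecified.

```python
import numpy as np

def is_bad_signal(y, leng=1000, eps=1e-6, n=0.5, min_frac=0.5):
    """Pre-check described above: reject signals that are too short
    relative to the base length, or that contain too many near-zero
    points. `min_frac` and the defaults are illustrative assumptions."""
    y = np.asarray(y, dtype=float)
    too_short = len(y) <= min_frac * leng
    too_many_zeros = int(np.sum(np.abs(y) < eps)) > n * len(y)
    return bool(too_short or too_many_zeros)
```

A caller would return [“Bad signal”, “NaN”] whenever this predicate holds and proceed to classification otherwise.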
Initialization: For good signals, the algorithm continues the classification process:
An empty result array is initialized to store MSE values for each error type.
An array x is created representing uniformly distributed values from 0 to 1, with size equal to the length of the signal y. This serves as a normalized “time” axis for error generation.
The initial amplitude ampl for generated errors is set to 1.
A set of candidate sizes size_values is defined; in the pseudocode, these are sizes starting from length(y) in increments of 10. These sizes will be used to resize the generated error signals.
A set of candidate shifts, in increments of 1, is defined. These shifts will be used to offset the generated error signals relative to the beginning of the signal y.
Cycle by error type (external FOR loop): The algorithm iterates over each error type err from the list errors. For each error type, the following process is performed:
Initialization of minimum MSE for error type: The initial value of the minimum MSE mse_err for the current error type is set to a large number. This value is updated whenever a smaller MSE is found.
Cycle by size (middle FOR loop): For each size s from the set size_values:
Shift cycle (inner FOR loop): For each shift shift from the set of shifts:
Generating and resizing the error signal: A “base” error signal is generated using the Error_signal(err, x, ampl, 1) function for the current error type err, “time axis” x, amplitude ampl, and scale 1. Then, using the function resize_vector, the size of the generated error signal is resized to length s.
Applying a shift to the error signal: Depending on the sign of the shift shift, the error signal Error is shifted either forward (if shift ≥ 0) or backward (if shift < 0) relative to the signal y. The shift is realized by adding zeros to the beginning or end of the error signal and truncating the result to the length length(x).
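The zero-padding shift scheme can be sketched as follows; `apply_shift` is a hypothetical helper name, not an identifier from the paper.

```python
import numpy as np

def apply_shift(error, shift, length):
    """Shift an error template by `shift` samples: pad with zeros at
    the front (forward shift) or drop leading samples (backward shift),
    then truncate or zero-pad the tail to exactly `length` points."""
    error = np.asarray(error, dtype=float)
    if shift >= 0:
        shifted = np.concatenate([np.zeros(shift), error])  # forward
    else:
        shifted = error[-shift:]                            # backward
    shifted = shifted[:length]                              # truncate
    pad = length - len(shifted)
    return np.concatenate([shifted, np.zeros(pad)]) if pad > 0 else shifted
```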
Building a feature matrix: A feature matrix features_matrix of size length(y) × 2 is created. The first column of the matrix is filled with ones and the second column is filled with the shifted error signal arr_error. The column of ones allows the bias (constant term) in the model to be accounted for.
Least squares method (LSM): A linear regression problem is solved using the least squares method to find the coefficients c = (c₀, c₁) minimizing the norm of the difference ‖features_matrix · c − y‖. This allows us to find the optimal linear combination (bias and scaling) of the error signal to approximate the signal y.
Calculating the approximated signal: The approximated signal approximated_signal is calculated as the product of the feature matrix features_matrix by the found coefficients coefficients.
Calculation of MSE: The mean square error (MSE) between the original signal y and the approximated signal approximated_signal is calculated.
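These few steps (feature matrix, least squares fit, approximated signal, MSE) can be sketched with NumPy on synthetic data; the affine relation between y and the template is constructed so that the recovered coefficients are known in advance.

```python
import numpy as np

# Synthetic example: y is an exact affine function of the error
# template, so least squares should recover bias 0.5 and scale 2.0.
rng = np.random.default_rng(0)
arr_error = rng.normal(size=100)
y = 0.5 + 2.0 * arr_error

# Column of ones (bias) next to the shifted error template.
features_matrix = np.column_stack([np.ones_like(arr_error), arr_error])
coefficients, *_ = np.linalg.lstsq(features_matrix, y, rcond=None)
approximated_signal = features_matrix @ coefficients
mse = float(np.mean((y - approximated_signal) ** 2))
```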
Update Minimum MSE: If the calculated MSE value is less than the current minimum MSE mse_err for this error type, then mse_err is updated with this new, smaller MSE value.
Saving the minimum MSE for the error type: After the size and shift cycles are completed, the minimum MSE value found mse_err for the current error type err is added to the array result.
Determination of the best error type: After looping through all error types, the algorithm finds the index of the minimum value in the result array. This index corresponds to the error type that provided the lowest MSE when approximating the signal y.
Output: The algorithm returns a list containing two elements: the name of the best error type (from the errors list) and the corresponding minimum MSE value. This indicates the error type that best “explains” the y signal structure in terms of MSE.
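Pulling the steps together, the following compact Python sketch reproduces the overall loop structure of mse_classification on a toy two-entry error library. The `resize_vector` implementation, the “step” and “spike” templates, and the size/shift grids are illustrative stand-ins for the paper's actual Error_signal dictionary and parameters.

```python
import numpy as np

def resize_vector(v, size):
    # Linear-interpolation resampling; a stand-in for the paper's
    # resize_vector routine.
    return np.interp(np.linspace(0.0, 1.0, size),
                     np.linspace(0.0, 1.0, len(v)), v)

# Toy error library (illustrative stand-in for Error_signal).
_x = np.linspace(0.0, 1.0, 64)
ERRORS = {
    "step":  np.where(_x < 0.5, 0.0, 1.0),
    "spike": np.exp(-(((_x - 0.5) / 0.05) ** 2)),
}

def mse_classification(y, sizes, shifts):
    """Return (best error type, minimum MSE) over sizes and shifts."""
    y = np.asarray(y, dtype=float)
    result = []
    for name, base in ERRORS.items():
        mse_err = np.inf
        for s in sizes:
            template = resize_vector(base, s)
            for shift in shifts:
                # Shift by zero-padding, then fix the length to len(y).
                err = (np.concatenate([np.zeros(shift), template])
                       if shift >= 0 else template[-shift:])
                err = err[:len(y)]
                err = np.concatenate([err, np.zeros(len(y) - len(err))])
                # Least squares fit of bias + scaled template.
                F = np.column_stack([np.ones(len(y)), err])
                c, *_ = np.linalg.lstsq(F, y, rcond=None)
                mse_err = min(mse_err, float(np.mean((y - F @ c) ** 2)))
        result.append((name, mse_err))
    return min(result, key=lambda r: r[1])
```

A signal assembled from a shifted, rescaled “step” template is then classified as “step” with near-zero MSE, mirroring the output format described above.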