Method of Constructing a Nonlinear Approximating Scheme of a Complex Signal: Application Pattern Recognition

: A method for identiﬁcation of structures of a complex signal and noise suppression based on nonlinear approximating schemes is proposed. When we do not know the probability distribution of a signal, the problem of identifying its structures can be solved by constructing adaptive approximating schemes in an orthonormal basis. The mapping is constructed by applying threshold functions, the parameters of which for noisy data are estimated to minimize the risk. In the absence of a priori information about the useful signal and the presence of a high noise level, the use of the optimal threshold is ineffective. The paper introduces an adaptive threshold, which is assessed on the basis of the posterior risk. Application of the method to natural data has conﬁrmed its effectiveness.


Introduction
Currently, scientists are actively conducting research related to the development of methods for modeling and analyzing complex nonstationary signals [1][2][3]. The need to create such methods arises when carrying out a number of fundamental and applied investigations in such areas as biomedicine, geophysics, ecology, seismology, etc. When there is no possibility to make direct measurements or observations of the characteristics of a research object, the task is to determine the cause from the consequences obtained during observations or experiments (a class of inverse problems). In this case, the determination of the model parameters is based on the observation results. For example, in cosmophysics, the problem arises when determining the state of the galactic cosmic ray flux based on the data of the world network of neutron monitors [4]. In addition, an example of such a problem is to determine the state of the Earth's magnetic field based on the measurements of ground magnetic stations [5].
The recorded natural data have a complex nonstationary structure, but the main problem of such investigations is the lack of a priori information on the useful signal and the presence of a high noise level [4]. The absence of an accurate mathematical apparatus for constructing the estimates of signals with such properties results in the application of heuristic approaches and methods [6], the spectrum of which is very wide at the present time. For example, there have been successful attempts to apply machine learning methods [7], allowing one to obtain approximations of acceptable accuracy even without complete a priori data. However, a significant disadvantage of such methods is the need for periodic correction of system parameters (for example, retraining of a neural network), due to their significant dependence on external conditions. This factor significantly affects the quality and efficiency of the results obtained.
Taking into account these disadvantages, an approach based on the construction of nonlinear adaptive approximating schemes on the basis of orthonormal functions is proposed. Since the signal distribution has a complex shape, linear approximation is not effective and nonlinear threshold estimates give better results [6]. In the article, wavelets are used as approximating functions. It is known that wavelet filtering makes it possible to effectively detect the structures of a complex signal and suppress noise [6,[8][9][10][11][12][13]. Different wavelet bases approximate the signal in different ways, therefore, the choice of the best basis, in the sense of identifying certain structures, provides an effective solution to the problem. However, we should take into account the fact that if a signal contains different types of structures localized at different times, then it is impossible to build a basis adapted to all structures [6]. In this case, it is necessary to use large basis dictionaries [6,13], for example, as was proposed in [14]. Thus one can extend the class of orthogonal functions by dictionaries of linearly independent functions. An effective result, in this case, is given by the pursuit approximation. For example, dictionaries of wavelet packets and local cosine trees allow for constructing the best approximations of signals of finite length by minimizing the concave cost function [6].
However, the great computational complexity of this method makes it ineffective. Consistent pursuit algorithms [15] using the "greedy" strategy make it possible to optimize the process of constructing a basis and to obtain sufficiently accurate approximations. Even when there is no knowledge about noise, a signal can be assessed by isolating coherent structures [6]. However, when the signal energy is small relative to the noise energy, such estimates give a very small threshold [6] and their application, as was shown by the estimates made in the work, does not allow for obtaining good results. To solve the problem, we propose a method based on a heuristic approach that, by minimizing the posterior risk, makes it possible to obtain the best estimate in the absence of a priori knowledge about a useful signal and the presence of a high noise level. Applying the example of a wavelet packet dictionary, the paper shows that larger threshold increases the risk, but allow one to obtain more accurate estimates. An increase in the detection efficiency of different types of structures is achieved by applying an adaptive threshold. The effectiveness of the proposed method is confirmed by the results of experiments with real data. In addition, the comparison with the machine learning method (using the autoencoder neural network) showed the effectiveness of the proposed approach.

Nonlinear Signal Approximation Based on Its Expansion in Basis
In the case of nonlinear approximation, the signal f H (H is the Hilbert space) is represented by M vectors adaptively selected from the orthonormal basis B = {g m } m N (N are natural numbers, including 0) of the space H [6]: where I M is a set of indices.
The approximation error is Obviously, minimization of the error [M] is achieved by choosing I M such that M vectors g m with indices from I M have the largest moduli of the scalar product | f , g m |, that is, correlate with the signal in the best way.
We arrange {| f , g m |} m N in descending order, denoting f B [k] = f , g m k the coefficient of rank k as: In this case, approximation (1) with the smallest error [M] is It can be obtained by applying the threshold function Then, (2) takes the form Approximation error (3) is Considering the base dictionary D, which is the union of orthonormal bases in the space of signals of finite length N: the cost of f approximation in the basis B λ = g λ m 1≤m≤N can be estimated by the concave Schur sum [16] and the basis B α , minimizing the error, is defined as In the case of wavelet packet bases [6,8], each node of the wavelet packet tree corresponds to the space W p j defining an orthonormal basis B p j = Ψ p j (2 j t − m) m∈N [6,8,17].
The space W p j is divided into orthogonal subspaces The basis that minimizes the error is the basis O p j of the space W p j [16]: Recursive calculation of the bases (6) when moving from the bottom up the tree allows one to determine the wavelet packet basis that minimizes the cost (4).

Nonlinear Approximation of a Noisy Signal
Let us have a discrete signal f [n], which is defined for 0 ≤ n < N (n N) and is polluted with noise: where X[n] are recorded discrete data, f [n] is the signal, V[n] is noise. Using mapping (3) gives the estimate The risk of estimating F is where E is the mathematical expectation. Following Wald's theory [18], signals f will be considered as elements of a special set θ, disregarding the probability distribution on this set. Since we do not know the probability distribution of signals, to minimize the risk r(D, f ), we can use the minimax approach [6]. The problem is to determine such an operator D that minimizes the maximum risk (minimax risk) [6]: O is a set of operators performing mapping (8).
It is clear that to minimize the risk r O (θ), the threshold T in (8) should be chosen so that there is a high probability that it is greater than the maximum level of noise coefficients (7)). It was proved in [19] that in the case of white noise with variance σ 2 , the risk close to r O (θ) gives the threshold T O = σ √ 2 ln N: The risk of estimate (9) is associated with the approximation error f in the basis B λ and can be estimated as [6] Therefore, the risk of the resulting estimate r( f ) depends on the basis. In addition, when the noise V is not white and σ 2 m = E |V B λ [m]| 2 depends on each vector g λ m of the basis, Donohoe and Johnston [19] showed that the threshold estimate with T m = σ m √ 2 ln N also gives a risk close to r( f ). Note that for wavelet packet bases (see (5)), the noise variance σ 2 can be estimated [19] as: In turn, it follows from Jaffard's theorem [20] (Jaffard's theorem is given below) and the equivalence of continuous and discrete wavelet expansions [8,17] that when choosing a threshold higher than the maximum amplitude of the noise coefficients, we, using (8), suppress noise and with a high probability keep the coefficients in the vicinity of the signal structural features. It is also clear that the threshold needs to be scaled for better approximation.
In this case, the wavelet Ψ must have n zero moments and n derivatives with fast decay. The evidence of Jaffard's theorem is given in [6].
We get the following algorithm for constructing an approximating scheme (ACAS): 1. We decompose the signal X into wavelet packets (see (5)): 2. Based on the estimation of normalized energies, we determine the tree branches corresponding to the structural components of the signal: the basis B where the set of indices I l , l = P, 2P, 2P + 1 is defined as follows: , L is the number of elements.
The nodes of the wavelet packet tree selected on the basis of (10) determine the components that have the greatest correlation with the basis (coherent structures).
The threshold T j i = K * σ l j i can be estimated by posterior risk [21]. The threshold splits the space of values of the analyzed function into two nonoverlapping areas Θ 0 and Θ 1 . When using a certain threshold for a given state h s , the loss average can be determined as where ∏ sz is the loss function, P{x ∈ Θ z /h s } is the conditional probability of falling within the area Θ z , if in reality there is a state h s , s = z, s, z are the state indices ("/" sign means the conditional probability). Averaging the conditional risk function over all states h s , we obtain the average risk: where p s is the prior probability of the state h s . When we do not know the probability distribution of the signal, the posterior probabilities P{h s /x}, s = 0, 1 are the most complete characteristics of the states h s with available a priori data. For a simple loss function ∏ sz = 1, s = z, 0, s = z, the posterior risk is

Detection of Geomagnetic Pulsations in Geomagnetic Data
Geomagnetic data are the Earth's magnetic field variations, which are obtained by magnetometer direct measurements at a magnetic station network [5]. Analysis of geomagnetic data is important in solving practical problems of space weather forecasting [5,22]. The negative impact of geomagnetic anomalies on technical objects determines the importance of developing methods for their detection. The main source of impact is geomagnetic pulsations (short-period variations of the geomagnetic field) and their detection is of key importance [23]. The structure of geomagnetic data is complex. They contain local features of different structure and duration. Therefore, detection of geomagnetic anomalies is a complex and urgent task. The results of applying the method to geomagnetic data are shown in Figures 1-5. Figure 1 shows the histograms of two nodes of the wavelet packet tree, the dashed line shows the estimated σ. The results support the need to adapt the threshold to the scale. The tree constructed on the basis of the ACAS up to the 6th level using Coiflet 3 is shown in Figure 2. Geomagnetic pulsations are determined by the detailing nodes of the tree, therefore the nodes (j i , 0) are not analyzed below. Table 1 and Figure 3 present the error estimation results, which show that Coiflet 5 provides the smallest losses for different periods of solar activity. The results of detection of geomagnetic pulsations using different wavelets are shown in Figure 4.
The results of risk estimates (errors of the 1st and 2nd kind) are presented in Table 2. They show that the threshold coefficient K = 2.5 gives the best results. Risks were estimated based on statistical modeling. As a comparison, Figure 5 shows the results of the application of the threshold T o = σ √ 2 ln N and the threshold T j i = K * σ l j i , with K = 2.5. It can be seen that threshold T o , which is close to optimal one, does not allow us to suppress the noise. The results confirm the effectiveness of the proposed method.
Mathematics 2021, 9, x FOR PEER REVIEW 6 of 15 Figure 1 shows the histograms of two nodes of the wavelet packet tree, the dashed line shows the estimated . The results support the need to adapt the threshold to the scale. The tree constructed on the basis of the ACAS up to the 6th level using Coiflet 3 is shown in Figure 2. Geomagnetic pulsations are determined by the detailing nodes of the tree, therefore the nodes ( , 0) are not analyzed below. Table 1 and Figure 3 present the error estimation results, which show that Coiflet 5 provides the smallest losses for different periods of solar activity. The results of detection of geomagnetic pulsations using different wavelets are shown in Figure 4.
The results of risk estimates (errors of the 1st and 2nd kind) are presented in Table 2. They show that the threshold coefficient = 2.5 gives the best results. Risks were estimated based on statistical modeling. As a comparison, Figure 5 shows the results of the application of the threshold о = √2 and the threshold = * , with = 2.5.
It can be seen that threshold о , which is close to optimal one, does not allow us to suppress the noise. The results confirm the effectiveness of the proposed method.   Mathematics 2021, 9, x FOR PEER REVIEW 6 of 15 Figure 1 shows the histograms of two nodes of the wavelet packet tree, the dashed line shows the estimated . The results support the need to adapt the threshold to the scale. The tree constructed on the basis of the ACAS up to the 6th level using Coiflet 3 is shown in Figure 2. Geomagnetic pulsations are determined by the detailing nodes of the tree, therefore the nodes ( , 0) are not analyzed below. Table 1 and Figure 3 present the error estimation results, which show that Coiflet 5 provides the smallest losses for different periods of solar activity. The results of detection of geomagnetic pulsations using different wavelets are shown in Figure 4.
The results of risk estimates (errors of the 1st and 2nd kind) are presented in Table 2. They show that the threshold coefficient = 2.5 gives the best results. Risks were estimated based on statistical modeling. As a comparison, Figure 5 shows the results of the application of the threshold о = √2 and the threshold = * , with = 2.5.
It can be seen that threshold о , which is close to optimal one, does not allow us to suppress the noise. The results confirm the effectiveness of the proposed method.

Detection of Sporadic Features in Neutron Monitor Data
Cosmic ray dynamics are studied using neutron monitor data. Neutron monitor data represent particle counts per unit time and reflect the secondary cosmic ray intensity. In addition to the useful information, the data contain a high level of noise, including natural and human-made interferences [4]. Periodic variations correspond to the regular course and anomalous (sporadic) features characterize Forbush effect occurrences and strong ground level enhancements (GLE events). Sporadic features have different shapes and durations, and their detection is a difficult problem [4]. Of particular interest is the problem of detecting low-amplitude sporadic features that serve as predictors of magnetic storms [4]. Figures 6-11 show the results of applying the method to the neutron monitor (NM) data of the Inuvik station (USA, [24]). Figure 6 shows the histograms of two nodes of the wavelet packet tree. The dashed line shows the estimated σ. The results are similar to geomagnetic data.
To assess the effectiveness of the proposed method, its results were compared with the results of the autoencoder method [25]. Autoencoder is a deep feed forward neural network using back propagation and unsupervised learning. The autoencoder network is described in detail in [25]. The work used undercomplete autoencoders. Minimizing the approximation error, the undercomplete autoencoders allow one to isolate dependencies

Detection of Sporadic Features in Neutron Monitor Data
Cosmic ray dynamics are studied using neutron monitor data. Neutron monitor data represent particle counts per unit time and reflect the secondary cosmic ray intensity. In addition to the useful information, the data contain a high level of noise, including natural and human-made interferences [4]. Periodic variations correspond to the regular course and anomalous (sporadic) features characterize Forbush effect occurrences and strong ground level enhancements (GLE events). Sporadic features have different shapes and durations, and their detection is a difficult problem [4]. Of particular interest is the problem of detecting low-amplitude sporadic features that serve as predictors of magnetic storms [4]. Figures 6-11 show the results of applying the method to the neutron monitor (NM) data of the Inuvik station (USA, [24]). Figure 6 shows the histograms of two nodes of the wavelet packet tree. The dashed line shows the estimated σ. The results are similar to geomagnetic data.
To assess the effectiveness of the proposed method, its results were compared with the results of the autoencoder method [25]. Autoencoder is a deep feed forward neural network using back propagation and unsupervised learning. The autoencoder network is described in detail in [25]. The work used undercomplete autoencoders. Minimizing the approximation error, the undercomplete autoencoders allow one to isolate dependencies in the data and to suppress noise. The logistic sigmoidal function was used as the encoder transfer function: f (z) = 1 1 + e −z As the transfer function of the decoder, a linear transfer function was used: To train the network, we used NM data for 2013-2015. Figure 7 shows the results of NM data processing by different methods. The threshold T o = σ √ 2 ln N, the threshold T j i = K * σ l j i and the autoencoder network were used. It is clear that the threshold T o = σ √ 2 ln N does not allow us to suppress all the noise. The error of network approximation increases sharply in the test data. That is associated with the effect of retraining and the need to adapt the network. The best results are obtained by the proposed method with the threshold T j i = 2.5 * σ l j i . Table 3 shows the errors of different methods. Analysis of Table 3 also confirms the effectiveness of the proposed method. Figures 8 and 9 show the results of detecting Forbush effects in NM data based on the proposed method and autoencoder. Data were processed sequentially. First, the data were processed based on the ACAS (algorithm for constructing an approximating scheme). Then, the anomaly detection algorithm described in Appendix A [26] was applied. Alternatively, the data were processed by the autoencoder network, then the anomaly detection algorithm was applied. Forbush effects were detected on 4, 6, 8, 10, 11, 12 September 2014 [27]. In Figure 9, the periods of Forbush effects are marked with red vertical lines. The results show that application of both the autoencoder and the proposed method makes it possible to detect anomalies. It can be seen that the Forbush effect on 22 December was successfully detected based on the proposed method (Figure 9a), and the Forbush effect on 31 December was detected by a neural network (Figure 9b). For detailed comparison of the results, Figures 10 and 11 show periods of Forbush effects of different structures. Figure 10 shows the Forbush effect of a multiscale structure, which was detected by the proposed method. As the result in Figure 11 shows, the Forbush effect of a narrower spectrum was detected by the autoencoder.
Thus, the joint application of these methods increases the efficiency of the problem solution. Estimation of the efficiency of the proposed method is illustrated in Table 4. It shows that its efficiency (over 86%) exceeds that of the autoencoder neural network.
Forbush effect of a multiscale structure, which was detected by the proposed method. As the result in Figure 11 shows, the Forbush effect of a narrower spectrum was detected by the autoencoder.
Thus, the joint application of these methods increases the efficiency of the problem solution. Estimation of the efficiency of the proposed method is illustrated in Table 4. It shows that its efficiency (over 86%) exceeds that of the autoencoder neural network.    Table 3. Estimation of the approximation error.