1. Introduction
In this paper, we propose the use of a Discrete Wavelet transform as the convolution operation in a convolutional neural network [1,2], revised in [3]. In recent years, convolutional neural networks (CNNs) have found multiple applications in different areas, mainly in image recognition and feature separation in an information cube, owing to their excellent classification capacity, the flexibility of their algorithm [2], and their ability to be adapted to different problems in various areas [4], such as feature analysis [5]. The design of different CNN architectures [6,7] has progressed both to improve classification certainty and to save computing resources. To enhance these architectures, different hybrid CNN algorithms have been proposed [8,9] that apply to various problems and present advantages over traditional CNNs [10] by being able to process information on multiple scales and across various measurements [4], as presented in [11]. Hybrid CNNs are a combinatorial strategy that uses other algorithms to improve several results, such as accuracy. In particular, the multi-resolution strategy allows one to separate the multifrequency response, a typical situation within seismic signatures such as those presented in [12,13]. In that respect, several procedures from geophysics studies may be followed. Nevertheless, this particular approach of highlighting several features independently of the frequency responses, as well as reclassifying this behavior over the rest of the geophysical characteristics, is considered quite valuable by these authors as well.
The information flow of a traditional CNN is redefined by incorporating the Discrete Wavelet transform [14,15] into the design of its architecture, providing an alternative to the convolution of a CNN; the advantages and disadvantages of this proposal are then analyzed. The contribution of the Wavelet transform to the use of a CNN is known from different works [6,16,17,18]. However, it is commonly used to process the data before they enter the CNN [17,19].
Unlike these approaches, a time-frequency decomposition method is proposed [9,20,21] that seeks to recover as much information as possible and then organize it into patterns that allow specific characteristics to be categorized with better sensitivity than a traditional CNN [22,23]. It should be noted that the approach followed in this paper incorporates the Pooling algorithm as a matrix-based information compaction strategy, achieving the representation of relevant information between neighboring vectors.
This method aims to extract features of specific interest in a data interval scaled in power, time, and frequency, containing multidimensional data, using a proximity map between them. The assertiveness rate defined by the user is of particular interest. A Discrete Wavelet transform explores a focused frequency response in terms of several characteristics that may be highlighted later.
In the context of this work, it is interesting to point out the number of variables (Table 1), both global and local, which play an essential role in the decision-making for the construction of the method, making this composition of local algorithms a global strategy that can be optimized under multiple metrics, such as processing speed or assertiveness in the extraction of features, as well as other figures of merit that are analyzed at the end of this work.
2. The Proposed Methodology
The structure of a CNN follows different steps throughout the data classification process [22,23,24], which can be alternated and applied several times to achieve a more complex structure according to the needs of the problem studied [3,16,25]. This section briefly explains the proposed method, which differs from a traditional CNN and includes the following steps (Figure 1).
A set of data (signals, images, information in various dimensions, etc.) must be considered homogeneously: vectors with the same length and representing the same phenomenon give an information cube. This is equivalent to saying that the vectors must belong to the same vector space. In this sense, a set of data must be built that, on the one hand, is sensitive to the changes observed in the phenomenon (diversity of information intrinsic to its entropy) and, on the other hand, allows the classification of stably diverse patterns formed in different situations [26,27]. The data are intended to allow for the determination of characteristics independently of the scale [10,28,29] (Figure 2).
For this purpose, a sliding window is constituted over the input data, with a shift that allows the extraction of local characteristics typical of nonpersistent behaviors in a focused way. It is interesting to note that comparing windows with standard information provides a better convolution strategy, which is primarily based on a discrete Mother Wavelet strategy such as the Morlet wavelet. Given the latter, it is of interest to generate such pre-processing.
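As a rough illustration of this windowing step, the following Python sketch extracts overlapping windows from a single trace; the window length, shift, and trace length are illustrative values only, not the parameters fixed in this work.

```python
# Minimal sketch of the sliding-window extraction described above.
# window_length and shift are illustrative parameters.
import numpy as np

def sliding_windows(signal, window_length, shift):
    """Return overlapping windows of a 1-D signal as rows of a matrix."""
    n_windows = (len(signal) - window_length) // shift + 1
    return np.array([signal[k * shift : k * shift + window_length]
                     for k in range(n_windows)])

trace = np.random.randn(600)              # one vector of the information cube
windows = sliding_windows(trace, 11, 1)   # 11-point window, unit shift
print(windows.shape)                      # (590, 11)
```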
- B. Convolution
The convolution can be interpreted as a "moving weighted average" where the "weight" is determined by a function g(x). The operation is commutative. Convolution is an operation that integrates several data points from an information source, such as a function f(x), against a given kernel g(x). Since the function f(x) may present significant variations, such as peaks or discontinuities, averaging around each point x tends to decrease these variations, lowering the peaks and smoothing the discontinuities. The use of the DWT instead of learned kernels from a CNN provides the precision to cover all possible ranges of scales, giving the possibility of highlighting features with loss of power. Under the hypothesis that the Discrete Wavelet transform is a convolution in the strict sense, we propose inserting it as the convolution of a convolutional neural network and modifying the characterization procedure by using the Self-Organizing Map (Appendix A).
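To illustrate the idea of replacing a learned kernel with a Wavelet filter, the sketch below applies one DWT level to a windowed vector using PyWavelets; the Daubechies-2 mother wavelet and the window length are assumptions for the example, since the actual mother wavelet is a selectable parameter (Table 1).

```python
# Hedged sketch: one DWT level used in place of a learned convolution kernel.
import numpy as np
import pywt

window = np.random.randn(11)        # one sliding-window vector (illustrative length)
cA, cD = pywt.dwt(window, 'db2')    # approximation and detail coefficients
print(cA.shape, cD.shape)           # both roughly half the input length
```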
- C. Normalization
In this normalization step, the dimension of the vectors does not change compared to the previous step. It is carried out only on the basis of the infinity norm, computed in a global manner and applied to the entire data cube.
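A minimal sketch of this global normalization, assuming the data cube is stored as a NumPy array and that the infinity norm is taken over the entire cube; the dimensions below are illustrative.

```python
# Global infinity-norm normalization of the whole data cube
# (assumed shape: planes x vectors x samples, illustrative values).
import numpy as np

cube = np.random.randn(4, 350, 600)
cube_normalized = cube / np.max(np.abs(cube))   # divide by the global infinity norm
```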
- D. Pooling
The next step is Pooling. In the case of the Wavelet, this can be Pooling or Unpooling, corresponding to data reductions or increases, respectively, in the vectors resulting from convolution [1,3,14]. In the case of Pooling, Max Pooling or Average Pooling can be used. Otherwise, data are added as either repeated values, averages, or zeros.
In the case of the present manuscript, a feature map is built to determine regions of similarity from a Self-Organizing Map.
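The following sketch illustrates Max and Average Pooling over a coefficient matrix; the 2x2 window and stride are illustrative choices, since both are selectable parameters of the method (Table 1).

```python
# Minimal sketch of Max/Average Pooling over a coefficient matrix.
import numpy as np

def pool2d(matrix, window=2, stride=2, mode="max"):
    """Two-dimensional pooling with a square window and fixed stride."""
    rows = (matrix.shape[0] - window) // stride + 1
    cols = (matrix.shape[1] - window) // stride + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = matrix[i * stride:i * stride + window,
                           j * stride:j * stride + window]
            out[i, j] = block.max() if mode == "max" else block.mean()
    return out

coeffs = np.random.randn(6, 8)               # illustrative coefficient matrix
print(pool2d(coeffs, mode="max").shape)      # (3, 4)
```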
- E. Classification
The traditional CNN algorithm establishes a conventional neural network such as the Multilayer Perceptron. The Multilayer Perceptron is a well-explored neural network known for its strategy in both training and classification, where the authors must carefully establish the parameters that determine the depth of the network. The input is the flattened vector. This is where neural network learning takes place, through a feedforward process applying backpropagation (the application of the chain rule) for the training of the network. This amounts to calculating the gradient of the loss function with respect to the weights, as used in gradient descent, which is a time-consuming strategy. In the case of this manuscript, the use of a Self-Organizing Map is established, since a classification is required such that the information contained in the information cube can be distributed within the map in a way that allows highlighting characteristics not defined by the various scales.
Once a feature approximation map has been built, this leads to a multicolor classification printed on the rest of the information cube, once learning has been ensured over the number of established epochs. This classification must be performed using some classification method, such as the Self-Organizing Map already mentioned, which will be discussed in detail in this manuscript. It is interesting to note that classification presented as a multicolor illumination has already been explored by Molino in [20].
It is essential to note that parameters are involved in each step and must be chosen carefully. In the rest of this work, we will provide context for the number of variables to be selected and the effect these variables have on the global result. Their adjustment will be a matter of exploration based on global optimization; it cannot be the result of an approximation without the capacity for revision, rectification, and repetition. It should be noted that this set of steps results from a thorough review of the literature from the different areas described in the Introduction, seeking to give certainty and optimization at multiple scales in an information cube that must be multipurpose.
The idea behind integrating DWT instead of a local group of convolution filters is to cover the whole spectrum of scales by using diverse approximation levels. The final objective is to highlight a specific feature through a convolution strategy; thus, ensuring feature selections through multiscale approximation is a suitable technique for this purpose. It is important to explore this strategy as an integrated aspect of a Self-Organizing Map and a sliding window, as presented in the following section, to highlight how this information is recovered.
3. Structure
Unlike the traditional information processing algorithm based on a data planning structure, an original time–frequency decomposition is proposed, which allows the preprocessing to retain more information regardless of the scales. This proposal contemplates several steps that need to be described in detail in terms of the scalar elements processed and the results of the neural network, which, for our purposes, will be a Self-Organizing Map. Initially, the information is presented by forming various packages at multiple scales from an information cube, as shown in Figure 2. In this sense, the management of the indices allows strict tracking of the geometry of the information, which leads us to provide an ordering of the indexing without losing the relevant characteristics in local terms and their collateral effects on geometrically close regions (Figure 3).
It is essential to establish that the scope of such a learning strategy by the Self-Organizing Map is a distribution of characteristics on a map of K neurons, distributed so as to allow the construction of a stable representation. This is invariant with respect to the type of information to be processed.
As a second stage, the characterization of the rest of the information is based on the trained neurons and the generated map, where it is possible to determine common characteristics in the rest of the maps, which, for this particular example, refer to the planes of the information cube. In Section 4, more details of the case studies are given.
Based on these two stages, a stable notation must be generated that allows us to distinguish the processing of the information. For the case of Figure 2, the depth will be defined as .
The generated notation is as follows, let
where:
a: represents the element of the vector processed by the Wavelet, which we will call approximations;
d: represents the element of the vector processed by the Wavelet, which we will call details;
: represents the plane based on the information cube to be processed, ;
j: initial vector number, ;
i: number of elements in the vector, ;
: number of applied Wavelet;
: level of applied Wavelet;
: original vector data bounded by the sliding window of with .
Input vectors correspond to the input plane with and scalar value .
There are n input vectors of length m. The idea of using Wavelet decomposition levels for feature extraction is that a characteristic present within the analyzed data is highlighted from the rest of the information for a given type of scale. This procedure is repeated for as many levels as are selected, extracting information as an accurate and reproducible mechanism.
This experimental process applies a Discrete Wavelet transform to the initial data, processed after separation through a sliding window. For this example, we will apply it so that, depending on the number of levels, we will have an information distribution that depends on the size of the initial information cube and of the initial sliding window; here, two levels are applied to the initial data, as shown below:
3.1. Convolution: Application of Wavelet
The Discrete Wavelet transform is commonly used to separate approximation and detail coefficients that regulate certain representations at several levels. For example, A represents the approximation vectors and D the detail vectors of the input vectors, following Equation (2), as presented in the following representation. Further decomposition is possible, e.g., at level 2, where the approximation and detail vectors are referred to Vector 1 (Appendix D).
This gives the result that each vector is "decomposed" into several vectors whose lengths decrease at each level; for two levels, this yields the level-1 and level-2 approximation and detail vectors (Figure 4).
3.2. Pooling
In this context, the variation of the characteristics by sliding windows at the beginning of the processing makes it possible to detect high-frequency events. However, this depends exclusively on the initial sampling capacity during data acquisition, in other words, on the information cube. This condition is crucial to define several parameters of this algorithm, listed in Table 1, such as the numerical definition of the sliding windows, the type of mother Wavelet to use, the type of Pooling to use, and the number of neurons, among others.
Concerning the procedure called Pooling, we have one option to handle the sampling, which is part of the conceptual definition of the information cube. In the remainder of the section, we will describe some differences that we consider of interest for the context of the article, concentrating on the processing used in the proposed methodology.
After forming the matrices, Pooling is applied to each matrix, for this case, , , . This manuscript employs Max Pooling for dimensionality reduction.
In this case, two-dimensional Pooling is conducted on the matrices , , and , so that the matrices have the same dimensions; that is, from m x to m x , Pooling is conducted on the larger matrices. Then, matrices of the same size will be obtained; however, different information will be taken for each expression.
Concerning these strategies, we have sought to extract common characteristics allowing determination of the sensitivity of the aggregation, depending on the direction of the search for characteristics. In this sense, the way the information cube is observed based on the indexes (m, n, ) and , plays a fundamental role.
Now, for the specific case of the strategy proposed in this manuscript, we define an approximation based on the Approximation and Detail matrices formed in the previous section (Appendix B).
Concerning the last matrix, we have the following representation that allows the formation of the indexes necessary for the Pooling operation.
For the matrix proposed to generate the Pooling (in our case, Max Pooling was used, this being a variable to be discussed in another research context) based on the Details, we will call it , and for the one related to the Approximations, we will call it . Therefore, the dimensions of both matrices are given by two fundamental parameters, known as , the output columns in the Pooling process, i.e., the size of the window concerning the local columns (it is also the length of the input vector, referring to n in Equation (1)), and , the output rows in the Pooling process corresponding to the local window in terms of the Pooling window (it also expresses the number of input vectors used by the algorithm, m in Equation (1)). Then, there are two matrices with the dimensions , which are based on the following equation:
where:
n: full input length with respect to Equation (1); for the case of m, it is defined by the pool determined by the data division of the predecessor process;
: Pooling window size in terms of columns; in this case, the window in terms of rows is equal;
: Pooling division, called the shift between Pooling windows; for the case of , the same value is proposed.
If we consider and as the limits to be defined, these are given by the number of input vectors in . The number of points is given by the data division coming from the Wavelets; therefore, for the first level it is , while for the second level it is .
Given the information obtained by the proposed window concerning , it is essential to define an interaction relating to the total data in and . We will define this variable as t, which is given by:
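A small bookkeeping sketch of these dimensions, assuming the usual pooled-length relation floor((length - window) / shift) + 1; the symbols for the window size and shift follow Table 1, and the concrete lengths below are illustrative, not those of the case study.

```python
def pooled_length(length, window, shift):
    """Number of Pooling outputs along one axis."""
    return (length - window) // shift + 1

# Illustrative coefficient-vector lengths for two Wavelet levels
n_level1, n_level2 = 301, 152
cols_level1 = pooled_length(n_level1, window=2, shift=2)
cols_level2 = pooled_length(n_level2, window=2, shift=2)
print(cols_level1, cols_level2)
```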
3.3. Classification
It is essential to start from the context of the processing necessary for classification and pattern recognition when working with the information cube and all the proposed decompositions.
Within Machine Learning, we find different types of learning [10]. The one that interests us, due to the nature of some problems that involve the selection of characteristics in a multiscale way, is Unsupervised Learning, where there are no defined labels for the output of the algorithm; instead, the very structure of the data allows learning and classification. This is due to the knowledge representation: new patterns are defined during the Wavelet decomposition and Pooling processing, and these characteristics constantly appear in the self-organizing representation within an unsupervised strategy.
The unsupervised algorithm to be explored is the SOM (Self-Organizing Map), which gives us a representation respecting the multidimensionality of each vector resulting from the Approximations and Details generated by the Wavelet, processed by local selection in the Pooling mechanism, and the decomposition from the sliding window proposed as preprocessing of the information cube. The challenge in the case study is to identify faults within large sections of the stacked information cube, as presented by [13,20].
A Self-Organizing Map (SOM) model has two layers of neurons: the input layer, with (*) neurons, one for each input datum, and the output layer, formed by neurons. The output layer stores the information and forms the feature map. The information is propagated from the initial (input) layer to the final (output) layer. Each neuron i of the initial layer is connected through the weights to each of the neurons * of the final (output) layer. This product will be developed in the remainder of this section.
In this type of neural network, we use competitive learning, which means that the layer of neurons modifies its weights so that these weights become increasingly similar to the input data. The competitive learning concept presents the adaptive weight strategy, allowing one to incorporate knowledge at a specific neuron and the related modification at the surrounding neurons. These weights are called the BMU (best matching unit), which, in our case, will be stored in the matrix . Each neuron in the output layer receives as input the data vector modified by the weight vector . Once this is completed, the procedure consists of comparing the weight vectors with the input data and verifying which is the closest among them using the algebraic expression:
We will concretely develop this equation concerning the work followed in previous sections. One of the fundamental characteristics in generating the map is the approximation of neighboring neurons to the winning neuron, on a two-dimensional basis, which, in our case, is represented by a Gaussian equation, where is the related variance.
Therefore, concerning the modification of weights, seen as a learning stage, a classical weight-update equation is proposed.
where:
is the learning rate or learning factor, which will be a value in the interval [0, 1];
is the weight matrix;
is the training matrix that is the product of the formation of the Approximations and the Details seen in the previous sections;
i, j correspond to the iterations concerning the indices of the initial formation of the information cube based on Equation (1).
Since the multidimensional integration of the information in this processing mechanism is on an information cube that expands into several cubes, depending on the number of levels generated by the wavelets and the decomposition from a local sliding window, it is of interest to track the index as a consequent plane given in , as well as the indices expressed in Equations (A6)–(A8) and (3) (Appendix B and Appendix C). In this way, each data point backed up in and has the following indexation:
where:
This results in the following equation, which is used in the operation of each scalar in
Now, from the operation performed in , the minimum value is sought, where x is the value in i and y is the value in j that corresponds to the minimum in with respect to . Concerning the proximity function for each neuron in , and according to the winning neuron given by the values x, y, it is proposed to use a multidimensional Gaussian, expressed in Equation (8), to determine the proximities according to the differences as follows:
Thus, the following matrix H is generated:
The feedback equation for the weight update is the following, based on the base equation expressed in Equation (10):
Understanding that:
Now, to implement the SOM algorithm, in this case, we build a tuple of values for the dimensions of the cube to be processed and incorporate the data into it for the training process.
where:
is the number of features per observation to be considered.
is the number of planes of interest for training.
is the number of vectors in the region of interest.
is the number of points in each vector in the region of interest.
For the construction of the SOM Analysis in terms of the characterization of features, the following tuple is proposed:
This implementation of the method as a whole is expressed in Appendix D of this manuscript. For this work and regarding the problem being addressed, the parameters taken into account and their variation intervals are:
Method:
The information cube is defined.
The sliding window is defined based on values and .
The weight of each node is initialized to a random value.
The input vector V is chosen.
The Euclidean distance between the input vector V and the weight matrix is found.
The node that produces the smallest distance is found.
For each node of the two-dimensional map, Steps 3 and 4 are repeated.
The best matching unit (BMU), or closest element, is calculated.
The topological neighborhood of the BMU and its radius are found in the map, as expressed in Equation (16).
The neurons are determined.
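The sketch below summarizes these steps as a minimal SOM training loop in Python: random weight initialization, BMU search by Euclidean distance, a Gaussian topological neighborhood, and the weight update with a learning rate. The grid size, learning rate, neighborhood width, and input dimensions are illustrative values, not the tuned parameters of this work.

```python
# Minimal SOM training sketch (Kohonen-style), following the steps above.
import numpy as np

def train_som(data, grid=(6, 6), epochs=50, alpha=0.5, sigma=1.5, seed=0):
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    weights = rng.random((grid[0], grid[1], n_features))      # random initialization
    yy, xx = np.mgrid[0:grid[0], 0:grid[1]]                   # neuron grid coordinates
    for _ in range(epochs):
        for v in data:
            dist = np.linalg.norm(weights - v, axis=2)        # Euclidean distance to every neuron
            bx, by = np.unravel_index(np.argmin(dist), grid)  # best matching unit (BMU)
            h = np.exp(-((yy - bx) ** 2 + (xx - by) ** 2)
                       / (2 * sigma ** 2))                    # Gaussian topological neighborhood
            weights += alpha * h[..., None] * (v - weights)   # weight update toward the input
    return weights

vectors = np.random.randn(200, 16)        # pooled feature vectors (illustrative)
W = train_som(vectors)
print(W.shape)                            # (6, 6, 16)
```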
4. Case Study
As mentioned above, the actual analysis will be performed on an available data set consisting of a group of mechanical signals that we will call the seismic cube (Figure 5). For data confidentiality reasons, this set of signals will be treated only as input data with the characteristics of signals in the frequency domain, omitting data such as location, extension, etc. The study was carried out in an area of Mexico that corresponds to an oil field. Through field studies, seismic data were obtained, processed, and converted to a readable format representing a set of vectors and ultrasound signals with 1200 entries. The data set comprises around 1500 planes, with 600 points per vector and 350 vectors per plane.
Data were collected in the field. They were originally arranged three-dimensionally in a cube (seismic cube), in SEG-Y format, one of the various standard formats for this type of geophysical data. The seismic cube is made up of each of the ultrasound signals that travel through the subsurface and are captured at the surface. This process is explained in detail in [20].
These signals are the input data in our model and are convolved using a Wavelet decomposition strategy. This wavelet and the number of signal decomposition levels will be selected as parameters.
The next parameters to be selected correspond to the Pooling, where we choose the size of the Pooling matrix, which helps us reduce the dimensionality of the data after convolution, as well as the step size of this matrix over the data path. This is followed by the use of the Euclidean norm for data normalization.
The point of interest here is how the method classifies this data set according to the architecture's adaptation of the Wavelet. An analysis of this same data set was previously conducted with a SOM without the Pooling and sliding-window processing stages; with this work, however, we seek to contribute to understanding the effects of coupling the Wavelet to a Self-Organizing Map through a sliding window.
In order to specify the experiments and compare them across several parameters, a reference experiment is proposed, as shown in Figure 6. This particular result is presented as Case 1 and is compared to several other experiments by the mean square error (MSE) between weight matrices (Equation (21)). MSE is a common comparison metric between vectors or matrices, since dimensionality is not lost and scaling is not a key factor in comparing values. MSE is used to compare the weight matrices since it is stable for defining the main differences between neurons.
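A minimal sketch of this comparison, assuming both weight matrices come from SOMs of the same grid size; the shapes and values below are illustrative.

```python
# MSE between two SOM weight matrices of the same shape (cf. Equation (21)).
import numpy as np

def mse(weights_base, weights_case):
    return np.mean((weights_base - weights_case) ** 2)

W_base = np.random.randn(6, 6, 16)    # base-case weight matrix (illustrative shape)
W_case = np.random.randn(6, 6, 16)    # another case to compare
print(mse(W_base, W_case))
```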
In this study, our objective is to obtain, through the proposed algorithm, a classification of the data that allows us to identify areas with possible geological and geophysical characteristics suitable for the storage of hydrocarbons, following [20]. These areas are identified through the algorithm and are visible in the graphics obtained after training the neural network and classifying the data.
The first step performed is the identification of the input vectors, , as described in Section 3. Each vector represents data or a signal. The second step is to run the algorithm, obtaining a particular weight matrix considered as a basis, where the rest of the cases are those to be compared and named .
To test this proposal, a case study published in [20] is used, based on a seismic analysis taking partial data called stacked traces, as observed in Figure 5. Therefore, significant effort is necessary to combine all the possible variables (Table 1) to build the best case study from the proposed methodology. Understanding the meaning of correctness is crucial to allow for a proper selection of the most appropriate parameters.
Table 2 shows some of the most sensitive variables to be selected. These are presented in terms of a suitable combination of current results. The size of the sliding window, called , is considered constant for this case study, with a nominal value of 11 points. The mother wavelets are modified according to the parameter in Table 3. In Table 3, the resulting value is the error, which is calculated following Equation (21).
where n is the number of weights from the estimated weight matrix for a particular case (Table 3).
Given the analysis provided by the method presented in Section 2 and Section 3, the following result is presented: the sector to be scrutinized is linked as characteristics in the rest of the observed planes following the information cube. This result is the set of various patterns and is visualized in Figure 6; as in the results presented in [20], an identification of some clusters of characteristics is achieved, shared in such a way that they give us satisfactory results in both the separation and classification of the characteristics. Approximate results have been found by comparison with [30]. In this case, a map of neurons is given, where 36 neurons are built and illuminated through the image according to Figure 6 and the related map in Figure 7. In this initial case, the number of selected neurons is reflected in the numbers shown within each of the related neurons on the given map. The selected neurons are represented in colors within Figure 6. The reader may identify several characteristics that are not highlighted in other cases, according to Table 2. These characteristics correspond to patterns identified within facies in the seismic data.
The following modifications (Table 2) are performed and shown in the next figures for Cases 3, 8, 10, and 11 and the related maps.
In this case, Figure 8 shows the resulting classification of the map based on the selection of a particular characteristic according to Figure 9. However, the reader may note that, although the number of neurons and the number of points are the same as in Case 1, the results are different, since several seismic characteristics are not highlighted.
Now, in Cases 8 and 10, there are similar responses in terms of the characteristics highlighted in Figure 10 and Figure 12, which tend toward a blurred representation in the same state. Figure 11 and Figure 13 show the SOM maps for Cases 8 and 10, respectively. In Case 11, shown in Figure 14 with its corresponding SOM map in Figure 15, although the Daubechies 3 wavelet shows the area of interest, the Daubechies 2 wavelet shows better results.
In this sense, the plane named inline plane 2370 is used to train the algorithm, as the first group of characteristics to be selected and then searched for in the rest of the planes. In this context, it has been possible to categorize stably and consistently from a predetermined frequency-based decomposition, following accurate pattern recognition for feature extraction based on the variable selection.
Alternatively, a comparison between a CNN and the proposal followed in this paper is made by processing the same selected plane as that used previously, as shown in Figure 16. The reader may notice that the image is quite blurred in comparison to Figure 6. The results of the processing of this information cube depict the number of selected neurons, as presented in Figure 17, where the information accumulates in 10 of the 36 generated neurons, allowing fewer features to be depicted.
5. Conclusions
This work presents an algorithm based on three essential pillars in the analysis and extraction of features: the decomposition of information, the expansion through the agreed compression of local information from the convolution, and the construction of maps of said features.
The functional contribution to knowledge is the ability to distinguish features with a high variation cost during a single segment of information calculation. In later applications, processing all the information to distinguish originally highlighted features is unnecessary. The procedure proposed in this paper presents various angles of local optimization from the perspective of global metrics, such as computation speed, which is not the focus of this paper.
This methodology uses maps to efficiently determine features with a particular type of scaling of specific interest in a map analysis. This particular interest is given by the rate of assertiveness designed by the user. In the case studies presented in this manuscript, the rate of reasonable coincidence is based on previous analysis. The Wavelet is used as the multidimensional convolution process, the Pooling process is used to determine the most significant effect between neighboring data, and the construction of a Self-Organizing Map is used to highlight differences through a multidimensional analysis.