Article

Texture and Materials Image Classification Based on Wavelet Pooling Layer in CNN

by Juan Manuel Fortuna-Cervantes 1, Marco Tulio Ramírez-Torres 2,*, Marcela Mejía-Carlos 1, José Salomé Murguía 3,4, José Martinez-Carranza 5, Carlos Soubervielle-Montalvo 6 and César Arturo Guerra-García 2

1 Instituto de Investigación en Comunicación Óptica, Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, San Luis Potosí 78000, Mexico
2 Coordinación Académica Región Altiplano Oeste, Universidad Autónoma de San Luis Potosí, Carretera Salinas-Santo Domingo 200, Salinas, San Luis Potosí 78600, Mexico
3 Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Chapultepec 1570, Priv. del Pedregal, San Luis Potosí 78295, Mexico
4 Laboratorio Nacional CI3M, Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Chapultepec 1570, Priv. del Pedregal, San Luis Potosí 78295, Mexico
5 Department of Computational Science, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla 72840, Mexico
6 Centro de Investigación y Estudios de Posgrado, Facultad de Ingeniería, UASLP, Av. Dr. Manuel Nava 8, Zona Universitaria, San Luis Potosí 78290, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(7), 3592; https://doi.org/10.3390/app12073592
Submission received: 8 March 2022 / Revised: 25 March 2022 / Accepted: 30 March 2022 / Published: 1 April 2022
(This article belongs to the Special Issue Recent Advances in Deep Learning for Image Analysis)

Abstract

Convolutional Neural Networks (CNNs) have recently been proposed as a solution for texture and material classification in computer vision. However, the internal pooling layers of CNNs often cause a loss of information that is detrimental to learning. Moreover, for images with repetitive and essential patterns, this loss of information affects the performance of subsequent stages, such as feature extraction and analysis. In this paper, to solve this problem, we propose a classification system with a new pooling method called Discrete Wavelet Transform Pooling (DWTP). The method is based on decomposing the image into sub-bands and taking the first-level sub-bands as its output. The objective is to obtain both approximation and detail information, which can then be concatenated in different combinations. In addition, wavelet pooling uses wavelets to reduce the size of the feature map. Combining these methods provides acceptable classification performance on three databases (CIFAR-10, DTD, and FMD). We argue that this helps eliminate overfitting, and the learning curves show that the models generalize across the datasets. Our results therefore indicate that our wavelet-based method is feasible for texture and material classification and, in some cases, outperforms traditional methods.

1. Introduction

In deep learning, texture and material analysis plays an essential role in object classification, detection, and segmentation tasks. This type of analysis has application areas such as computer-aided medical diagnosis, fruit recognition using artificial intelligence, and object detection in aerial navigation with drones, to mention a few.
Over the last decade, deep learning has positioned itself as a new solution in the areas of robotics, computer vision, and natural language processing [1,2,3]. In particular, Convolutional Neural Networks (CNNs) are a category of deep learning adapted to object analysis, learning and extracting complex features [4,5]. On the other hand, although a CNN is a universal feature extractor, in practice it is not clear whether CNNs can learn to perform spectral analysis, a methodology that can provide better classification performance for textures and materials [6]. In this sense, fusions of methods that combine spatial and spectral approaches in a single architecture have been used to address this problem [7,8]. Despite the results those authors obtained in texture classification, their architectures only merge the features that are lost with the spatial approach. In addition, regularization methods have focused only on the convolutional layers, while the operations of the pooling layers have been left without an update [9]. In this sense, we can integrate wavelet analysis inside deep learning before merging spatial and spectral approaches; that is, we allow it to become part of the learning process through the pooling method.
Motivated by the above reasons, in this study we propose a classification system with a new pooling method called Discrete Wavelet Transform Pooling (DWTP). The pooling approach is based on the decomposition of the image into sub-bands. The method is implemented in Python using the Keras API with TensorFlow as the backend, and it is validated on three datasets: CIFAR-10, the Describable Textures Dataset (DTD), and the Flickr Material Database (FMD). Our approach differs from traditional methods because it is not a subsampling methodology over neighboring regions; nevertheless, wavelet pooling maintains its function as a reduction layer. Wavelets allow localization in scale (i.e., frequency) and space; in other words, they can be used to analyze local and spatial transients in the data, such as edges or surfaces in an image [10]. Therefore, we can preserve the most relevant information of textures and materials, which is sometimes lost with traditional methods such as Max-Pooling (MaxP) and Ave-Pooling (AveP).
In a previous study supporting this work [11], we designed a CNN architecture for object detection with a repetitive-pattern approach in aerial navigation as a first attempt. We argued that features at different frequencies, low and high, also affect the performance of the CNN during training. That architecture is characterized by wavelet analysis, applying multiresolution analysis to the original image: a new dataset is obtained when the image is converted to the wavelet domain. This information improves learning performance, eliminates overfitting, and achieves higher efficiency in object detection. Based on the results of our previous work, we now present a wavelet pooling approach to improve the learning of the classification model, with the following contributions:
  • We present a CNN architecture with a combination of regularization methods (DropOut, Data Augmentation, and Batch Normalization) to evaluate the performance of each pooling method: MaxP, AveP, and wavelet pooling (DWTP, DWTaP, and DWTdP). The objective is to have a reference for the learning behavior.
  • We present a complete evaluation of the classification performance for textures and materials in images on state-of-the-art benchmark datasets. The idea is to evaluate the adaptability of deep learning with wavelet pooling. Furthermore, we argue that the methodology is well suited to this type of dataset, where repetitive patterns are common.
  • We show that the method eliminates the overfitting created by pooling methods while reducing features using a level-based decomposition approach, which is more compact than pooling over neighboring regions. Hence, we offer three configurations: DWTP (approximation and detail information), DWTaP (approximation information), and DWTdP (detail information). The goal is to preserve the most information for each texture and material.
  • We demonstrate that a correct inference of texture or material can be obtained if we determine the type of pooling to be used during learning. Having conducted several experiments, we can now choose the best pooling method depending on the dataset. Our experiments indicate that this is also useful for future object detection applications focusing on physical features such as texture.
This paper is organized as follows: Section 2 reviews related work; Section 3 introduces the materials and methods used to address the texture and material classification problem; Section 4 presents the proposed methodology; Section 5 shows the experimental results on the three datasets used to test our approach; Section 6 discusses the results; and Section 7 presents the conclusions and future work.

2. Related Work

This section highlights some applications that have used a wavelet pooling layer to improve performance. A new architecture called WaveCNN was proposed in [12]. The authors note that the pooling layer (Max-Pooling) in a conventional CNN does not consider the feature structure of the previous layer, so Max-Pooling may lose some features. Therefore, they replace the pooling layer with a wavelet decomposition and succeed in representing the features more adequately for MNIST handwritten digit classification. A novel method that combines classical CNN layers with squeeze-and-excitation modules and the Haar wavelet as a pooling layer was proposed in [13]. Its main objective is the real-time classification of vehicle types. The method improves the classifier's performance by highlighting essential feature maps and decreasing the network entropy. Moreover, the authors propose a cross-entropy cost function and the use of the DWT instead of Max-Pooling to improve the recognition rate; owing to this layer, their model is named Wavelet Deep Neural Network (WDNN). Another alternative is the application of a multilevel analysis [14]. This method merges multiple wavelet transforms, as they function similarly to filters within convolutional neural networks. The authors show that some neighborhood methods introduce edge halos, aliasing, and blurring effects in specific datasets, so choosing the correct pooling method is key to obtaining good results. Thus, they explore wavelet bases such as Haar, Coiflet, and Daubechies to perform pooling.
In semantic segmentation tasks, encoder-decoder networks have been used [15]. This type of CNN usually uses pooling to reduce computational costs, improve invariance to certain distortions, and expand the receptive field. However, pooling can result in information loss, which is disruptive to later operations such as feature extraction and analysis. Moreover, in semantic segmentation each image pixel is assigned a specific class by dividing the image into regions of interest; therefore, a pooling method based on wavelet operations has been proposed for this task. In [16], the authors presented an approach called 3D WaveUNet, based on wavelets and deep learning, for 3D neuron segmentation. The encoder-decoder network is integrated with a 3D wavelet to segment the nerve fibers in cubes; the wavelets help the deep network remove noise from the data and connect the broken fibers. At the end of the method, the segmented cubes are assembled to generate the entire neuron. In this case, the neuron segmentation method can completely extract the target neuron from noisy neuron images. A U-Net architecture based on wavelet transform pooling is proposed in [17]. This work aims to segment multiple sclerosis (MS) lesions in magnetic resonance (MR) images. One characteristic is that the first stage of the network uses the wavelet transform and the second stage uses its inverse; both stages highlight abrupt changes in the image and better describe the features. An advantage is its multiresolution analysis, whose use improves the detection of lesions of different sizes and the segmentation.
In the area of image restoration, a multilevel wavelet CNN (MWCNN) was proposed to balance the size of the receptive field and computational efficiency [18]. The main idea is to integrate the wavelet transform within the CNN architecture to reduce the feature maps. The MWCNN method is also based on a U-Net architecture and uses the inverse wavelet transform (IWT) for the high-resolution reconstruction stage. In [9], the authors proposed another alternative called wavelet pooling as a layer inside the CNN architecture. This method decomposes the features to a second level and discards the first-level sub-bands to reduce the size of the feature map. The approach allows a structured compression of the data, reducing the creation of edge artifacts and other defects in the image.
On the other hand, some works have employed deep learning and wavelet analysis together in image processing. For example, the method proposed in [19] converts images from the CIFAR-10 and KDEF databases to the wavelet domain, thus obtaining temporal and frequency features. The different representations created are fed to multiple CNN architectures. This combination of information in the wavelet domain achieves higher detection efficiency and faster execution than the equivalent procedure in the spatial domain.
In the automatic coding of an image, the design of the CNN architecture carries significant weight. In [20], the designed network is a Siamese convolutional neural network that receives fused information from infrared and visible images; the aim is to generate a weight map representing the saliency of each pixel. Fusion is performed by multiscale decomposition of the image using wavelet analysis, and the reconstruction is more perceptually faithful to the human visual system [20]. Following the same approach, the work in [21] presented two methods to highlight the edges of images for classification. The first method decomposes the images and subsequently reconstructs them in a limited manner; the second produces the enhanced images by introducing local maximum wavelet coefficients. Both methods are applied before the images enter the CNN architecture.

3. Materials and Methods

3.1. Wavelet Analysis

Wavelets represent functions as simpler, fixed building blocks at different scales and positions [19]. The one-dimensional wavelet transform can be easily extended to a two-dimensional wavelet transform (2DWT), which is widely applied to two-dimensional signals such as images [22,23]. It has greatly impacted image processing tasks such as edge detection, image recognition, and image compression [6].

3.1.1. 2D Discrete Wavelet Transform

Given an image $x$, we can use the 2D Discrete Wavelet Transform (2D-DWT) with four convolution filters, i.e., the low-pass filter $f_{LL}$ and the high-pass filters $f_{LH}$, $f_{HL}$, and $f_{HH}$, to decompose $x$ into four sub-images $x_{LL}$, $x_{LH}$, $x_{HL}$, and $x_{HH}$ [24]. Note that the four filters have fixed parameters and a convolutional stride of 2 during the transformation [13,18]. Taking the Haar wavelet as an example, these filters are defined in Equation (1):
$$f_{LL} = \begin{bmatrix} +1 & +1 \\ +1 & +1 \end{bmatrix}, \quad f_{LH} = \begin{bmatrix} -1 & -1 \\ +1 & +1 \end{bmatrix}, \quad f_{HL} = \begin{bmatrix} -1 & +1 \\ -1 & +1 \end{bmatrix}, \quad f_{HH} = \begin{bmatrix} +1 & -1 \\ -1 & +1 \end{bmatrix} \tag{1}$$
Moreover, the operation of DWT is defined in Equation (2):
$$x_{LL} = (f_{LL} \otimes x)\downarrow_2, \quad x_{LH} = (f_{LH} \otimes x)\downarrow_2, \quad x_{HL} = (f_{HL} \otimes x)\downarrow_2, \quad x_{HH} = (f_{HH} \otimes x)\downarrow_2 \tag{2}$$
where $\otimes$ denotes the convolution operator, and $\downarrow_2$ denotes the standard downsampling operator with factor 2. In other words, the DWT mathematically involves four fixed convolution filters with stride 2 to implement the downsampling operation. Moreover, according to the theory of the Haar transform [24], the $(i,j)$-th values of $x_{LL}$, $x_{LH}$, $x_{HL}$, and $x_{HH}$ can be written as in Equation (3):
$$\begin{aligned} x_{LL}(i,j) &= x(2i-1,2j-1) + x(2i-1,2j) + x(2i,2j-1) + x(2i,2j) \\ x_{LH}(i,j) &= -x(2i-1,2j-1) - x(2i-1,2j) + x(2i,2j-1) + x(2i,2j) \\ x_{HL}(i,j) &= -x(2i-1,2j-1) + x(2i-1,2j) - x(2i,2j-1) + x(2i,2j) \\ x_{HH}(i,j) &= x(2i-1,2j-1) - x(2i-1,2j) - x(2i,2j-1) + x(2i,2j) \end{aligned} \tag{3}$$
Given that the differentiability of the Haar transform is a good property for end-to-end backpropagation [25], the Haar wavelet decomposition is used as a pooling layer in the proposed structure.
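To make Equations (1)–(3) concrete, the following minimal sketch (ours, for illustration only, not the authors' released code) expresses the Haar 2D-DWT as four fixed convolution filters applied with stride 2 in TensorFlow:

```python
import numpy as np
import tensorflow as tf

# The four fixed Haar analysis filters of Equation (1), arranged as a
# (2, 2, 1, 4) kernel: one input channel, four output sub-bands.
haar = np.stack([
    [[+1, +1], [+1, +1]],   # f_LL
    [[-1, -1], [+1, +1]],   # f_LH
    [[-1, +1], [-1, +1]],   # f_HL
    [[+1, -1], [-1, +1]],   # f_HH
], axis=-1).astype("float32")[:, :, np.newaxis, :]

def haar_dwt(x):
    """2D-DWT of a single-channel batch x with shape (N, H, W, 1).

    The stride-2 convolution implements the downsampling operator of
    Equation (2); the output channels are x_LL, x_LH, x_HL, x_HH.
    """
    return tf.nn.conv2d(x, haar, strides=2, padding="VALID")

# Toy check: a 4x4 image decomposes into four 2x2 sub-bands.
x = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
print(haar_dwt(x).shape)  # (1, 2, 2, 4)
```

Because TensorFlow's conv2d computes a cross-correlation, each output channel reproduces the signed sums of Equation (3) directly.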

3.1.2. Pooling Method

The pooling method is described by some authors as a subsampling methodology [12,26,27,28]. It transforms the activation map into a new feature map, operating on small regions of size $P_q \times P_q$, usually after each convolutional layer. Pooling has two main purposes: the first is to reduce the number of parameters and, thus, the computational cost; the second is to control overfitting [9,17]. The expectation is that an ideal pooling method extracts only useful information and discards irrelevant details [13]. In general, the two most commonly used forms of pooling are Max-Pooling (MaxP) and Ave-Pooling (AveP) [9,28,29,30,31,32,33].
These forms of pooling are deterministic, efficient, and simple, but they have shortcomings that hinder the learning potential of CNNs. Depending on the data, Max-Pooling can erase details from an image [9]; this happens when important details have less intensity than insignificant ones. Moreover, it accumulates noise, and the lost information cannot be restored [16]. Max-Pooling is also sensitive to overfitting the training dataset, which hinders generalization [34]. Average pooling, depending on the data, can dilute the relevant details of an image: averaging with values far below those of the important details causes this effect [9]. Figure 1 illustrates these shortcomings with the example of a toy image.
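A toy numeric example, in the spirit of the toy image in Figure 1 (the values here are ours, chosen only for illustration), makes these failure modes concrete:

```python
import numpy as np

# One 2x2 pooling window mixing a faint but important texture value (3)
# with a bright, insignificant speckle (9).
window = np.array([[9., 3.],
                   [3., 3.]])

print(window.max())   # 9.0 -> Max-Pooling keeps only the speckle, erasing the texture
print(window.mean())  # 4.5 -> Ave-Pooling dilutes everything into a value representing neither
```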
If we consider the DWT filters as convolutional filters with predefined weights, then the DWT is a particular case of a fully convolutional network (FCN) without nonlinearity layers. The original image can be decomposed by the DWT and then reconstructed exactly by the inverse DWT without losing information [18]. On the other hand, wavelet theory opens the possibility of representing image details inside CNN learning, thanks to the frequency and location features generated by the wavelet transform (see Figure 1) [17].

3.2. Network Training and Parameter Setting

The algorithms are implemented and developed in Python using the Keras API with TensorFlow as the backend. Keras is one of the deep learning frameworks with tools for creating classification models; it is an open-source project, and its programming style is sequential, through blocks [35]. The hardware specifications of the training device are an Intel® Core™ i7 processor with an NVIDIA GeForce RTX™ 2080 graphics card, 12 GB of RAM, and the Ubuntu 18.04 64-bit operating system.
The base architecture is the VGG network, one of the first deep models with good results in a large-scale visual recognition challenge (ILSVRC-2014), with 92.7% top-5 accuracy [36]. The architecture is designed to facilitate the creation of a classification model: three convolutional blocks, each with its pooling layer, and one classification stage. The process uses the base VGG architecture, combined with the preprocessed CIFAR-10, DTD, and FMD datasets, through supervised learning. Before training the CNN, the loss function and the optimizer must be specified; these parameters determine how the network weights are updated during training. To compile the network with Keras, we use the compile() function. Training a CNN means finding the best set of weights to map the inputs (images) to the outputs (labels) in the training dataset and, at the same time, in the validation dataset. Training proceeds over epochs, where an epoch is one iteration through all samples of the training dataset. It is common for an epoch to be split into minibatches of one or more samples; after each batch, the network weights are updated. To train the network with Keras, we use the fit() function. The training parameters for the proposed models are listed in Table 1.
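For reference, a minimal Keras sketch of this compile()/fit() workflow follows; the toy architecture, random data, and hyperparameter values below are placeholders for illustration, and the actual values used in our experiments are those of Table 1:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-in for the three-block VGG-style network described above.
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# compile() fixes the loss function and the optimizer before training.
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# fit() iterates over the training set in minibatches, updating the
# weights after every batch and reporting validation metrics per epoch.
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 64), 10)
history = model.fit(x, y, validation_split=0.2, epochs=2, batch_size=16)
```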
We perform a complete analysis with each of the proposed pooling methods: MaxP, AveP, DWTP, DWTaP, and DWTdP. In addition, we combine them with the regularization methods DropOut [37,38], Data Augmentation [17,39], and Batch Normalization [40,41]. In this manner, a learning model is obtained that can predict the objects, textures, and materials in the test-set images with better accuracy.

3.3. Benchmark Dataset

In classification tasks, the model must be evaluated on a dataset. We have performed our experiments on three datasets: CIFAR-10 [42], the Describable Textures Dataset (DTD) [43], and the Flickr Material Database (FMD) [44]. CIFAR-10 consists of 60,000 images of 32 × 32 pixels covering ten different objects. DTD contains 47 classes of 120 images each, collected in the wild under uncontrolled conditions; initially, it includes 40 training, 40 validation, and 40 test images per class. Finally, FMD is built from common materials; it has ten classes of 100 images, each hand-picked from Flickr.com (under a Creative Commons license) to ensure a variety of lighting conditions, compositions, colors, textures, and material subtypes.
A good practice is to split the dataset using the hold-out cross-validation sampling technique [35], which tests the model's predictive performance on test or unseen data. The dataset is first separated into two sets, training and test; the training set is then split into two subsets, training and validation. The idea is that each set contains representative images of each class, which yields balanced and randomized sets. In the case of CIFAR-10, the test set initially holds approximately 16.66% of the data, and the training set is divided into two subsets with the same distribution of images: 80% training and 20% validation. For DTD and FMD, the distribution is different because the datasets are small: the test set contains 15% of the data, and the remainder is divided into a training subset with 82% and a validation subset with 18% of the data.
The images have dimensions of 224 × 224 pixels, except for the CIFAR-10 dataset, whose images are 32 × 32 pixels. Following convention, we normalize the pixel values to the range 0 to 1 so that the model converges quickly, since inputs with large integer values can slow down the learning process. The number of images per class is shown in Table 2. The last two datasets have few images, but one advantage is that the number of images per class is balanced.
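A minimal sketch of this hold-out split and normalization follows; the random arrays stand in for a loaded dataset, and the percentages are those described above for DTD and FMD:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 images of 10 balanced classes.
images = np.random.randint(0, 256, (1000, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, 1000)

# First carve out the test set, then split the remainder into training
# and validation; stratifying keeps the class balance in every set.
x_rest, x_test, y_rest, y_test = train_test_split(
    images, labels, test_size=0.15, stratify=labels, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.18, stratify=y_rest, random_state=42)

# Normalize pixel values to [0, 1] so the model converges quickly.
x_train = x_train.astype("float32") / 255.0
x_val = x_val.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```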

3.4. Evaluation Index

To quantitatively evaluate the classification model based on the combination of deep neural networks with pooling methods, this paper adopts Accuracy, Recall, Precision, F1, and the confusion matrix as evaluation indices [45]. Accuracy measures the percentage of cases that the model predicts correctly; it works well here because the classes are balanced. The indicators are calculated from Equations (4)–(7):
$$\mathrm{Acc} = \frac{TP}{\text{Total number of images}} \tag{4}$$
$$P = \frac{TP}{TP + FP} \tag{5}$$
$$R = \frac{TP}{TP + FN} \tag{6}$$
$$F1 = \frac{2 \cdot P \cdot R}{P + R} \tag{7}$$
where $TP$ is the number of positive samples correctly predicted, $FP$ is the number of negative samples predicted as positive, and $FN$ is the number of positive samples predicted as negative. The scikit-learn library provides a classification report (classification_report) to evaluate the quality of the predictions of a classification algorithm, showing the main classification metrics.
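For example, given the ground-truth and predicted labels of a test set (toy values below for illustration), the report and the confusion matrix are obtained as follows:

```python
from sklearn.metrics import classification_report, confusion_matrix

# y_true: test-set labels; y_pred: model predictions, e.g.,
# np.argmax(model.predict(x_test), axis=1) in Keras.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# Per-class Precision, Recall, and F1 (Equations (5)-(7)), plus the
# overall Accuracy, as reported in Tables A1-A5.
print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```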

4. Proposed Method

The design of an effective model for texture and material classification involves several issues: the CNN architecture, the dataset, regularization methods, model accuracy, and information pooling. The proposed wavelet pooling method focuses mainly on improving the model's classification performance. Moreover, it reduces the artifacts that result from a dimension reduction in feature maps, preserving significant features that traditional methods cannot retain. To evaluate our approach (DWTP) and observe the effect of each pooling method on each dataset, we outline the main steps below:
  • We decided to use digital images containing mainly textures and materials for the CNN training. Textures and materials are key features for evaluating a pooling method against the loss of information in repetitive patterns.
  • Each dataset being evaluated is divided into three parts: training, validation, and test, with a higher distribution percentage for the training set and similar percentages for the validation and test sets. This is good practice in state-of-the-art CNNs [35].
  • An approximate version of the VGG16 architecture, with only three convolutional blocks, is used in the CNN design. In addition, a classification block is proposed for our research case. The training hyperparameters are described in Table 1.
  • The pooling configuration inside each convolutional block of the CNN (Block.CX) permits a reduction in the feature map. This initial configuration depends on the selected pooling method: Ave- and Max-Pooling, the proposed DWTP method, or the complementary versions DWTaP and DWTdP.
  • The evaluation stage includes the analysis of the classifier with the accuracy metric, because it allows us to evaluate the performance of the model and its learning behavior.
  • Finally, we use regularization methods to improve the performance of the model.
The main contribution is to perform pooling (as a layer) inside the CNN using a level-based decomposition approach. Hence, the proposed approach (DWTP) concatenates the sub-images $x_{LL}$, $x_{LH}$, $x_{HL}$, and $x_{HH}$ given by Equation (3). From this approach, we obtain two further configurations: the first (DWTaP) uses only the first-level approximation sub-band $x_{LL}$, and the second (DWTdP) uses all the first-level detail sub-bands. The traditional methods (AveP and MaxP) are implemented with the standard Keras and TensorFlow layers. The diagram of the proposed methodology is presented in Figure 2.
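To make the three configurations concrete, the following is a minimal Keras sketch of such a wavelet pooling layer; the class name DWTPooling and the slicing-based implementation of Equation (3) are our assumptions for illustration, not the authors' released code:

```python
import tensorflow as tf
from tensorflow.keras import layers

class DWTPooling(layers.Layer):
    """Sketch of a Haar wavelet pooling layer (illustrative).

    mode='all'    -> DWTP : concatenate x_LL, x_LH, x_HL, x_HH
    mode='approx' -> DWTaP: keep only the approximation x_LL
    mode='detail' -> DWTdP: keep only the detail sub-bands
    """

    def __init__(self, mode="all", **kwargs):
        super().__init__(**kwargs)
        self.mode = mode

    def call(self, x):
        # Even/odd row-column slicing evaluates Equation (3)
        # channel-wise without an explicit convolution.
        a = x[:, 0::2, 0::2, :]  # x(2i-1, 2j-1)
        b = x[:, 0::2, 1::2, :]  # x(2i-1, 2j)
        c = x[:, 1::2, 0::2, :]  # x(2i,   2j-1)
        d = x[:, 1::2, 1::2, :]  # x(2i,   2j)
        ll = a + b + c + d
        lh = -a - b + c + d
        hl = -a + b - c + d
        hh = a - b - c + d
        if self.mode == "approx":
            return ll
        if self.mode == "detail":
            return tf.concat([lh, hl, hh], axis=-1)
        return tf.concat([ll, lh, hl, hh], axis=-1)

# Usage: the layer halves the spatial size, like MaxPooling2D(2).
x = tf.random.normal((1, 32, 32, 16))
print(DWTPooling("all")(x).shape)     # (1, 16, 16, 64)
print(DWTPooling("approx")(x).shape)  # (1, 16, 16, 16)
```

Note that, in this reading, DWTP and DWTdP multiply the channel count by the number of concatenated sub-bands, whereas DWTaP leaves it unchanged.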

5. Experimental Results

The different classification models created allow us to analyze the contribution of wavelet pooling; in this case, we can analyze images with objects, textures, and materials, and observe the learning curve of each proposed pooling method. Furthermore, we incorporate regularization methods for image classification to improve the model's learning capability. The results obtained using the three proposed regularization techniques, based on the VGG architecture and the pooling method, are shown in Figure A1 and Figure A2 of Appendix A. In this manner, a complete analysis of the classifier's performance is provided.

5.1. Model Training Results and Analysis

To test the efficiency of each pooling method on each dataset, we use an initial configuration in which each pooling layer inside the architecture uses only one pooling method at a time. All pooling methods use a 2 × 2 window for comparison with the proposed method.

5.1.1. Image Classification CIFAR-10

The first dataset we used is CIFAR-10, with 60,000 images. Table 3 shows that our proposed method outperforms all the others; in this case, the DWTaP configuration, which uses only the approximation information, performs best, and the number of parameters to be trained is retained. Figure 3 shows the learning curves of the pooling methods for CIFAR-10. MaxP and DWTaP resist overfitting but show a slower learning tendency in both sets. AveP maintains a consistent learning progression in both sets, but its accuracy does not improve after epoch 50. DWTP shows the smoothest decrease in the learning curve and achieves the best accuracy on the training set. DWTdP shows a rapid decrease during learning and does not resist overfitting after epoch 70.
The correlation of each class with its actual and predicted label for each model is shown in the multiple confusion matrices of Figure A3 in Appendix B. Moreover, the classification report with the evaluation metrics for CIFAR-10 is shown in Table A1 of Appendix C.

5.1.2. Image Classification with Textures DTD

The second dataset we use is DTD, with 47 classes of different textures. Note that it has only 120 images per category, which may cause overfitting in the model; thus, the proposed method is also a solution for small datasets. In this case, we performed two experiments, varying the training optimizer. First, we use SGD as the optimizer. Table 4 shows that our proposed DWTP method, in its DWTaP configuration, outperforms all the methods. In addition, performance remains similar across all three sets: training (37.40%), validation (31.17%), and test (34.16%). The DWTaP model obtained with this configuration is shown in Figure A4 of Appendix B, which shows the correlation of each class with its actual and predicted label. Based on this result, we decided to try a different optimizer to improve classification performance.
In the second experiment, we use Adam, an extension of stochastic gradient descent. Table 5 shows that our proposed DWTaP method and MaxP exhibit the best classification performance on all three sets; in this case, the change of optimizer proved to be an essential factor for MaxP learning. Figure 4 shows the learning curves of the pooling methods for DTD. MaxP shows a smooth learning decay and similar behavior between the two sets; it also resists overfitting and achieves good accuracy. AveP and DWTP maintain a consistent learning progression, and their validation sets progress at a similar rate, but they do not resist overfitting. DWTaP resists overfitting in both sets, achieving one of the best accuracy performances. DWTdP shows slow learning behavior, and its learning does not improve after epoch 28.
The DWTaP and MaxP learning models obtained with this configuration are shown in Figure A5 of Appendix B, which summarizes the level of success of the classification model predictions. Moreover, the classification reports obtained with both configurations (SGD and Adam) for DTD are shown in Table A2 and Table A3 of Appendix C.

5.1.3. Image Classification with Materials FMD

The third dataset we used is FMD, with ten classes of different materials. It is also a small dataset, with only 100 images per class. Likewise, we performed two experiments: the first with the SGD optimizer and the second with Adam. Table 6 shows that our proposed DWTP method, in its DWTdP configuration, outperforms all methods. In addition, performance remains similar across the three sets: training (16.87%), validation (18.67%), and test (14.00%). The DWTdP model obtained with this configuration is shown in Figure A6 of Appendix B, which shows the correlation of each class with its actual and predicted label.
The change of optimizer was beneficial in this case for AveP learning. Table 7 shows that our proposed DWTdP method and AveP exhibit the best classification performance on all three datasets. Figure 5 shows the learning curves of the pooling methods for FMD. MaxP shows a smooth learning descent and similar behavior between the two sets, but after epoch 22 it does not resist overfitting. AveP achieves its best performance at epoch 17, avoiding overfitting in the following epochs. DWTP and DWTaP maintain a consistent learning progression, and their validation sets progress at a similar rate, but they do not resist overfitting. DWTdP shows a slow learning trend in the early epochs, but after epoch 15 the learning improves and both sets evolve at a similar rate, achieving good accuracy.
The DWTdP and AveP learning models obtained with this configuration are shown in Figure A7 of Appendix B, which summarizes the level of success of the classification model predictions. Moreover, the classification reports obtained with both configurations (SGD and Adam) for FMD are shown in Table A4 and Table A5 of Appendix C.

6. Discussion

Even though CNNs have established their position in image analysis, and the elements considered to improve classification performance are well known in the literature, only a few experiments have been conducted that take the pooling layers into account. Figure 3, Figure 4, and Figure 5 illustrate the learning behavior of each model for each pooling method. From the figures, it is clear that the DWTP versions behave uniformly on the training sets of all three datasets; that is, the learning curve remains stable and shows similar generalization across them. When the optimizer change is introduced, the results remain very similar for the DWTP versions; moreover, classification performance increases both for the proposed versions and for the traditional methods.
Furthermore, Table 8 compares our proposals with the methods proposed by Fujieda et al. [7] and Andrearczyk et al. [8], evaluating the accuracy of models trained from scratch on the DTD dataset. The bold values in Table 8 indicate that our results are quite comparable with those of the other methods. The table also shows the number of synaptic weights to be trained; the results show that our proposals are computationally lightweight. More generally, the algorithm's efficiency on CIFAR-10 can be observed in Table 3; this dataset allows comparison with the literature because it is one of the most important in the deep learning area. As for FMD, there are algorithms with performance above that obtained here; however, they differ from our central concept of combining both approaches and considering the wavelet pooling method.
On the other hand, Table 4, Table 5, Table 6, and Table 7 show that the loss metric reaches higher values on the training sets than in Table 3; this learning behavior arises because the evaluated sets are different. CIFAR-10 has more than 1000 images per class, unlike the small datasets DTD and FMD. Therefore, the dataset size is one more parameter to consider in our research, where overfitting is prevented and the model's accuracy remains consistent across sets.
In this context, we analyzed, through the different experiments, the impact of considering DWTP and its different configurations inside CNN learning. Our main observations are as follows: (a) a DWTP configuration in the learning stage presents learning uniformity; (b) using a DWTP configuration to reduce the number of features is desirable for preserving relevant information; and (c) although some methods respond well to the optimizer change, the DWTP method also increases its classification performance. However, we note that this behavior depends on the type of dataset.

7. Conclusions

We have presented wavelet pooling (DWTP, DWTaP, and DWTdP), a pooling method capable of preserving useful information to improve the classification performance of textures and materials in images. Wavelet pooling is introduced as a layer inside the proposed VGG architecture. This layer performs the same function as the traditional methods; the difference is that, instead of using a subsampling technique over neighborhood regions, it is based on the multilevel decomposition of the input image using wavelet analysis. As a result, four new subsets of features contribute to model learning: approximation, vertical details, diagonal details, and horizontal details.
We demonstrated that the wavelet pooling method achieves acceptable classification performance; moreover, it matches and in some cases outperforms traditional methods used in CNN learning. Our proposed method outperforms all others on the CIFAR-10 dataset, with 89.70% on the test set. On the DTD dataset, it shares a similar performance when the optimizer is changed, with 43%. On the FMD set, the performance achieved was 22% for the detail version and 30% for the Ave method, with similar values across its three training sets. The integration of DropOut, Data Augmentation, and Batch Normalization also reacts positively to the proposed methods, improving classification performance.
The decomposition stage of the proposed methodology can yield a better reduction of image features. In addition, sub-bands at different levels could be considered during learning and could result in better accuracy. The results show that some methods perform better than others depending on the dataset, the hyperparameter configuration, and the design of the CNN architecture.
On the other hand, CNNs are characterized by randomness in the selection of the convolution-layer filters. Therefore, as further work, we can add stability through the filters selected inside the pooling layer.
This approach will allow us, in the future, to test other texture features and to change the wavelet basis to analyze which basis works best for pooling. Moreover, the proposed architecture and pooling method can be applied to pattern recognition, classification tasks, and object detection in aerial robotics. They are therefore well suited to designing an object classification system for aerial navigation, where the main feature is the analysis of repetitive patterns such as textures. Furthermore, we will investigate possible methods to improve the architecture in order to reduce computational costs while preserving classification performance.

Author Contributions

Conceptualization, M.T.R.-T.; data curation, J.M.F.-C.; formal analysis, J.M.F.-C.; investigation, J.M.F.-C. and M.T.R.-T.; methodology, J.M.F.-C. and M.T.R.-T.; project administration, M.M.-C.; software, J.M.F.-C.; supervision, M.T.R.-T., M.M.-C., J.S.M., J.M.-C. and C.S.-M.; validation, M.M.-C. and J.M.-C.; writing—original draft, J.M.F.-C., M.T.R.-T. and M.M.-C.; writing—review and editing, J.S.M., J.M.-C., C.S.-M. and C.A.G.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by CONACYT through grant “Convocatoria de Ciencia Básica y Ciencia de Frontera 2022”, project ID 320036, and project CB 2017–2018 A1-S-45697.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The first author thanks the Consejo Nacional de Ciencia y Tecnología (CONACYT) for scholarship No. 776118.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
DWTP: Discrete Wavelet Transform Pooling
2DWT: Two-Dimensional Wavelet Transform
DWTaP: Discrete Wavelet Transform Approximation Pooling
DWTdP: Discrete Wavelet Transform Details Pooling
DTD: Describable Textures Dataset
FMD: Flickr Material Database

Appendix A. Training Process Using Regularization Techniques and Pooling

Appendix A.1. DTD Dataset

Figure A1. Learning behavior for the baseline architecture + pooling, adding DropOut, Data Augmentation, and Batch Normalization (DTD dataset).

Appendix A.2. FMD Dataset

Figure A2. Learning behavior for the baseline architecture + pooling, adding DropOut, Data Augmentation, and Batch Normalization (FMD dataset).

Appendix B. Multiple Confusion Matrix

The multiple confusion matrix is an N × N table that summarizes the level of success in the predictions of a classification model: that is, the correlation between the label and the classification of the model.

Appendix B.1. CIFAR-10 Dataset

Figure A3. Each confusion matrix correlates with one of the five models obtained for the CIFAR-10 dataset.

Appendix B.2. DTD Dataset

Figure A4. Experiment 1 with SGD Optimizer: the confusion matrix correlates with the best model (DWTaP) obtained for the DTD dataset.
Figure A5. Experiment 2 with Adam Optimizer: the confusion matrix correlates with the two best models (MaxP and DWTaP) for the DTD dataset.

Appendix B.3. FMD Dataset

Figure A6. Experiment 1 with SGD Optimizer: the confusion matrix correlates with the best model (DWTdP) obtained for the FMD dataset.
Figure A7. Experiment 2 with Adam Optimizer: the confusion matrix correlates with the two best models (AveP and DWTdP) for the FMD dataset.

Appendix C. Classification Report with Evaluation Metrics

Appendix C.1. CIFAR-10 Dataset

Table A1. Classification report for the CIFAR-10 dataset. Each pooling method is evaluated considering DropOut, Data Augmentation, and Batch Normalization.
| Class | MaxP (P/R/F1) | AveP (P/R/F1) | DWTP (P/R/F1) | DWTaP (P/R/F1) | DWTdP (P/R/F1) | Test |
|---|---|---|---|---|---|---|
| airplane | 0.89/0.92/0.90 | 0.86/0.93/0.89 | 0.92/0.90/0.90 | 0.90/0.90/0.90 | 0.86/0.90/0.88 | 1000 |
| automobile | 0.92/0.97/0.95 | 0.93/0.96/0.94 | 0.94/0.97/0.95 | 0.93/0.97/0.95 | 0.92/0.96/0.94 | 1000 |
| bird | 0.87/0.84/0.86 | 0.87/0.83/0.85 | 0.85/0.85/0.85 | 0.88/0.83/0.86 | 0.83/0.80/0.81 | 1000 |
| cat | 0.81/0.75/0.78 | 0.78/0.77/0.77 | 0.81/0.77/0.78 | 0.80/0.79/0.79 | 0.75/0.72/0.73 | 1000 |
| deer | 0.88/0.90/0.89 | 0.89/0.86/0.87 | 0.89/0.88/0.89 | 0.86/0.92/0.89 | 0.84/0.89/0.86 | 1000 |
| dog | 0.85/0.82/0.83 | 0.88/0.78/0.83 | 0.84/0.84/0.84 | 0.86/0.83/0.84 | 0.82/0.75/0.78 | 1000 |
| frog | 0.87/0.95/0.91 | 0.88/0.78/0.83 | 0.89/0.95/0.92 | 0.92/0.94/0.93 | 0.88/0.91/0.89 | 1000 |
| horse | 0.92/0.92/0.92 | 0.88/0.94/0.91 | 0.94/0.93/0.93 | 0.93/0.94/0.93 | 0.90/0.91/0.90 | 1000 |
| ship | 0.96/0.91/0.94 | 0.95/0.92/0.93 | 0.94/0.94/0.94 | 0.95/0.93/0.94 | 0.94/0.92/0.93 | 1000 |
| truck | 0.94/0.93/0.94 | 0.91/0.93/0.92 | 0.94/0.94/0.94 | 0.93/0.94/0.93 | 0.93/0.92/0.93 | 1000 |
| Acc | 0.89 | 0.89 | 0.89 | 0.90 | 0.87 | 10,000 |

Appendix C.2. DTD Dataset

Table A2. Experiment 1 with SGD Optimizer: classification report for the DTD dataset.
| Class | MaxP (P/R/F1) | AveP (P/R/F1) | DWTP (P/R/F1) | DWTaP (P/R/F1) | DWTdP (P/R/F1) | Test |
|---|---|---|---|---|---|---|
| band | 0.67/0.44/0.53 | 0.75/0.50/0.60 | 0.71/0.56/0.63 | 0.83/0.56/0.67 | 0.12/0.06/0.08 | 18 |
| blot | 0.00/0.00/0.00 | 0.33/0.06/0.10 | 0.08/0.06/0.06 | 0.13/0.11/0.12 | 0.00/0.00/0.00 | 18 |
| brai | 0.17/0.06/0.08 | 0.13/0.11/0.12 | 0.29/0.11/0.16 | 0.10/0.06/0.07 | 0.11/0.06/0.07 | 18 |
| bubb | 0.17/0.06/0.08 | 0.64/0.39/0.48 | 0.30/0.33/0.32 | 0.20/0.17/0.18 | 0.10/0.11/0.11 | 18 |
| bump | 0.73/0.44/0.55 | 0.43/0.17/0.24 | 0.00/0.00/0.00 | 1.00/0.17/0.29 | 0.00/0.00/0.00 | 18 |
| cheq | 0.48/0.67/0.56 | 0.75/0.50/0.60 | 0.65/0.61/0.63 | 0.69/0.50/0.58 | 0.33/0.33/0.33 | 18 |
| cobw | 0.48/0.67/0.56 | 0.68/0.72/0.70 | 0.52/0.72/0.60 | 0.61/0.61/0.61 | 0.12/0.06/0.08 | 18 |
| crac | 0.45/0.28/0.34 | 0.33/0.44/0.38 | 0.33/0.44/0.38 | 0.29/0.33/0.31 | 0.11/0.11/0.11 | 18 |
| cros | 0.16/0.44/0.24 | 0.36/0.56/0.43 | 0.24/0.50/0.33 | 0.27/0.67/0.38 | 0.00/0.00/0.00 | 18 |
| crys | 0.35/0.33/0.34 | 0.60/0.50/0.55 | 0.53/0.44/0.48 | 0.29/0.33/0.31 | 0.11/0.11/0.11 | 18 |
| dott | 0.80/0.22/0.35 | 0.15/0.11/0.13 | 0.42/0.28/0.33 | 0.50/0.28/0.36 | 0.00/0.00/0.00 | 18 |
| fibr | 0.17/0.28/0.21 | 0.41/0.39/0.40 | 0.35/0.44/0.39 | 0.30/0.39/0.34 | 0.12/0.33/0.18 | 18 |
| flec | 0.15/0.44/0.22 | 0.13/0.39/0.19 | 0.18/0.17/0.17 | 0.17/0.56/0.26 | 0.10/0.17/0.13 | 18 |
| frec | 0.43/0.56/0.49 | 0.64/0.78/0.70 | 0.93/0.78/0.85 | 0.50/0.72/0.59 | 0.21/0.67/0.32 | 18 |
| fril | 0.21/0.22/0.22 | 0.53/0.44/0.48 | 0.60/0.50/0.55 | 0.36/0.28/0.31 | 0.00/0.00/0.00 | 18 |
| gauz | 0.38/0.17/0.23 | 0.32/0.33/0.32 | 0.39/0.39/0.39 | 0.45/0.28/0.34 | 0.10/0.22/0.14 | 18 |
| grid | 0.14/0.06/0.08 | 0.36/0.44/0.40 | 0.40/0.56/0.47 | 0.40/0.56/0.47 | 0.00/0.00/0.00 | 18 |
| groo | 0.18/0.33/0.23 | 0.17/0.39/0.24 | 0.32/0.61/0.42 | 0.22/0.44/0.29 | 0.00/0.00/0.00 | 18 |
| hone | 0.50/0.06/0.10 | 0.50/0.17/0.25 | 0.42/0.28/0.33 | 0.00/0.00/0.00 | 0.00/0.00/0.00 | 18 |
| inte | 0.36/0.28/0.31 | 0.40/0.44/0.42 | 0.38/0.44/0.41 | 0.83/0.28/0.42 | 0.17/0.06/0.08 | 18 |
| knit | 0.13/0.50/0.21 | 0.43/0.56/0.49 | 0.57/0.44/0.50 | 0.25/0.56/0.34 | 0.00/0.00/0.00 | 18 |
| lace | 0.14/0.28/0.18 | 0.29/0.33/0.31 | 0.22/0.44/0.30 | 0.24/0.72/0.36 | 0.07/0.17/0.10 | 18 |
| line | 0.41/0.72/0.52 | 0.52/0.67/0.59 | 0.53/0.56/0.54 | 0.71/0.56/0.63 | 0.11/0.06/0.07 | 18 |
| marb | 0.21/0.33/0.26 | 0.43/0.33/0.38 | 0.23/0.28/0.25 | 0.25/0.17/0.20 | 0.06/0.11/0.07 | 18 |
| matt | 0.29/0.44/0.35 | 0.35/0.39/0.37 | 0.40/0.33/0.36 | 0.36/0.28/0.31 | 0.07/0.22/0.11 | 18 |
| mesh | 0.50/0.06/0.10 | 0.50/0.17/0.25 | 0.50/0.44/0.47 | 0.20/0.06/0.09 | 0.17/0.06/0.08 | 18 |
| pais | 0.19/0.22/0.21 | 0.34/0.72/0.46 | 0.31/0.56/0.40 | 0.33/0.44/0.38 | 0.50/0.06/0.10 | 18 |
| perf | 0.40/0.22/0.29 | 0.46/0.33/0.39 | 0.35/0.44/0.39 | 0.46/0.33/0.39 | 0.50/0.06/0.10 | 18 |
| pitt | 0.24/0.33/0.28 | 0.00/0.00/0.00 | 0.25/0.22/0.24 | 0.24/0.22/0.23 | 0.00/0.00/0.00 | 18 |
| plea | 0.60/0.17/0.26 | 0.30/0.33/0.32 | 0.37/0.56/0.44 | 0.75/0.17/0.27 | 0.00/0.00/0.00 | 18 |
| polk | 0.50/0.17/0.25 | 0.42/0.56/0.48 | 0.62/0.56/0.59 | 0.56/0.28/0.37 | 0.00/0.00/0.00 | 18 |
| poro | 0.05/0.06/0.05 | 0.50/0.28/0.36 | 0.13/0.11/0.12 | 0.17/0.22/0.19 | 0.07/0.17/0.10 | 18 |
| poth | 0.26/0.44/0.33 | 0.38/0.67/0.48 | 0.45/0.72/0.55 | 0.31/0.89/0.46 | 0.07/0.39/0.12 | 18 |
| scal | 0.00/0.00/0.00 | 0.33/0.11/0.17 | 0.22/0.11/0.15 | 0.00/0.00/0.00 | 0.12/0.06/0.08 | 18 |
| smea | 0.00/0.00/0.00 | 0.22/0.11/0.15 | 0.14/0.11/0.12 | 0.20/0.06/0.09 | 0.00/0.00/0.00 | 18 |
| spir | 0.50/0.11/0.18 | 0.20/0.11/0.14 | 0.30/0.17/0.21 | 0.25/0.22/0.24 | 0.36/0.22/0.28 | 18 |
| spri | 0.75/0.17/0.27 | 0.50/0.33/0.40 | 0.50/0.33/0.40 | 0.33/0.11/0.17 | 1.00/0.06/0.11 | 18 |
| stai | 0.10/0.17/0.12 | 0.17/0.06/0.08 | 0.25/0.17/0.20 | 0.22/0.28/0.24 | 0.22/0.33/0.27 | 18 |
| stra | 0.31/0.50/0.38 | 0.38/0.67/0.48 | 0.40/0.56/0.47 | 0.24/0.28/0.26 | 0.00/0.00/0.00 | 18 |
| stri | 0.81/0.72/0.76 | 0.55/0.67/0.60 | 0.73/0.61/0.67 | 0.75/0.67/0.71 | 0.25/0.28/0.26 | 18 |
| stud | 0.64/0.50/0.56 | 0.60/0.67/0.63 | 0.55/0.61/0.58 | 0.64/0.39/0.48 | 0.20/0.50/0.28 | 18 |
| swir | 0.50/0.11/0.18 | 0.29/0.22/0.25 | 0.00/0.00/0.00 | 0.00/0.00/0.00 | 0.00/0.00/0.00 | 18 |
| vein | 0.25/0.28/0.26 | 0.43/0.33/0.38 | 0.46/0.33/0.39 | 0.75/0.17/0.27 | 0.14/0.28/0.19 | 18 |
| waff | 0.55/0.67/0.60 | 0.79/0.61/0.69 | 0.52/0.78/0.62 | 0.52/0.67/0.59 | 0.25/0.56/0.34 | 18 |
| wove | 0.25/0.22/0.24 | 0.40/0.56/0.47 | 0.45/0.50/0.47 | 0.39/0.50/0.44 | 0.14/0.22/0.17 | 18 |
| wrin | 0.00/0.00/0.00 | 0.30/0.17/0.21 | 0.20/0.06/0.09 | 0.29/0.11/0.16 | 0.00/0.00/0.00 | 18 |
| zigz | 0.18/0.11/0.14 | 0.30/0.39/0.34 | 0.27/0.22/0.24 | 0.42/0.44/0.43 | 0.00/0.00/0.00 | 18 |
| Acc | 0.27 | 0.39 | 0.39 | 0.34 | 0.13 | 846 |
Table A3. Experiment 2 with Adam Optimizer: classification report for the DTD dataset.
| Class | MaxP (P/R/F1) | AveP (P/R/F1) | DWTP (P/R/F1) | DWTaP (P/R/F1) | DWTdP (P/R/F1) | Test |
|---|---|---|---|---|---|---|
| band | 0.65/0.72/0.68 | 0.71/0.67/0.69 | 0.82/0.78/0.80 | 0.75/0.50/0.60 | 0.21/0.33/0.26 | 18 |
| blot | 0.50/0.06/0.10 | 0.10/0.06/0.07 | 0.22/0.11/0.15 | 0.33/0.22/0.27 | 0.00/0.00/0.00 | 18 |
| brai | 0.23/0.17/0.19 | 0.00/0.00/0.00 | 0.16/0.17/0.16 | 0.25/0.11/0.15 | 0.00/0.00/0.00 | 18 |
| bubb | 0.54/0.39/0.45 | 0.43/0.17/0.24 | 0.40/0.44/0.42 | 0.47/0.39/0.42 | 0.00/0.00/0.00 | 18 |
| bump | 0.18/0.17/0.17 | 0.18/0.11/0.14 | 0.25/0.06/0.09 | 0.38/0.17/0.23 | 0.00/0.00/0.00 | 18 |
| cheq | 0.65/0.72/0.68 | 0.60/0.67/0.63 | 0.93/0.72/0.81 | 0.76/0.72/0.74 | 0.42/0.28/0.33 | 18 |
| cobw | 0.65/0.83/0.73 | 0.67/0.89/0.76 | 0.61/0.94/0.74 | 0.52/0.78/0.62 | 0.29/0.11/0.16 | 18 |
| crac | 0.39/0.61/0.48 | 0.55/0.61/0.58 | 0.77/0.56/0.65 | 0.38/0.56/0.45 | 0.33/0.11/0.17 | 18 |
| cros | 0.39/0.61/0.48 | 0.23/0.50/0.31 | 0.40/0.44/0.42 | 0.34/0.61/0.44 | 0.00/0.00/0.00 | 18 |
| crys | 0.59/0.56/0.57 | 0.47/0.50/0.49 | 0.44/0.78/0.56 | 0.42/0.44/0.43 | 0.14/0.28/0.19 | 18 |
| dott | 0.62/0.28/0.38 | 0.45/0.28/0.34 | 0.36/0.22/0.28 | 0.50/0.28/0.36 | 0.00/0.00/0.00 | 18 |
| fibr | 1.00/0.33/0.50 | 0.53/0.50/0.51 | 0.65/0.61/0.63 | 0.60/0.50/0.55 | 0.09/0.17/0.12 | 18 |
| flec | 0.32/0.39/0.35 | 0.15/0.28/0.20 | 0.30/0.39/0.34 | 0.32/0.33/0.32 | 0.21/0.22/0.22 | 18 |
| frec | 0.88/0.83/0.86 | 0.74/0.78/0.76 | 0.88/0.83/0.86 | 0.68/0.83/0.75 | 0.17/0.44/0.25 | 18 |
| fril | 0.59/0.56/0.57 | 0.58/0.39/0.47 | 0.53/0.56/0.54 | 0.80/0.44/0.57 | 0.08/0.06/0.06 | 18 |
| gauz | 0.38/0.28/0.32 | 0.39/0.39/0.39 | 0.28/0.28/0.28 | 0.58/0.39/0.47 | 0.12/0.28/0.17 | 18 |
| grid | 0.56/0.56/0.56 | 0.56/0.56/0.56 | 0.41/0.67/0.51 | 0.33/0.44/0.38 | 0.00/0.00/0.00 | 18 |
| groo | 0.25/0.39/0.30 | 0.32/0.50/0.39 | 0.37/0.39/0.38 | 0.27/0.44/0.33 | 0.00/0.00/0.00 | 18 |
| hone | 0.29/0.39/0.33 | 0.47/0.44/0.46 | 0.24/0.28/0.26 | 0.33/0.33/0.33 | 0.00/0.00/0.00 | 18 |
| inte | 0.36/0.56/0.43 | 0.41/0.39/0.40 | 0.50/0.67/0.57 | 0.53/0.50/0.51 | 0.25/0.06/0.09 | 18 |
| knit | 0.45/0.56/0.50 | 0.36/0.67/0.47 | 0.50/0.50/0.50 | 0.50/0.44/0.47 | 0.17/0.28/0.21 | 18 |
| lace | 0.33/0.44/0.38 | 0.37/0.39/0.38 | 0.50/0.56/0.53 | 0.24/0.44/0.31 | 0.12/0.39/0.19 | 18 |
| line | 0.65/0.61/0.63 | 0.57/0.72/0.63 | 0.72/0.72/0.72 | 0.62/0.72/0.67 | 0.44/0.39/0.41 | 18 |
| marb | 0.43/0.17/0.24 | 0.26/0.28/0.27 | 0.43/0.33/0.38 | 0.36/0.22/0.28 | 0.10/0.17/0.12 | 18 |
| matt | 0.48/0.61/0.54 | 0.41/0.39/0.40 | 0.57/0.44/0.50 | 0.50/0.17/0.25 | 0.08/0.28/0.12 | 18 |
| mesh | 0.33/0.22/0.27 | 0.43/0.33/0.38 | 0.55/0.33/0.41 | 0.43/0.33/0.38 | 0.20/0.11/0.14 | 18 |
| pais | 0.45/0.50/0.47 | 0.41/0.72/0.52 | 0.48/0.67/0.56 | 0.43/0.50/0.46 | 0.29/0.22/0.25 | 18 |
| perf | 0.33/0.39/0.36 | 0.50/0.50/0.50 | 0.35/0.44/0.39 | 0.34/0.56/0.43 | 0.00/0.00/0.00 | 18 |
| pitt | 0.14/0.06/0.08 | 0.25/0.28/0.26 | 0.13/0.11/0.12 | 0.15/0.11/0.13 | 0.17/0.06/0.08 | 18 |
| plea | 0.53/0.50/0.51 | 0.44/0.39/0.41 | 0.32/0.44/0.37 | 0.37/0.39/0.38 | 0.25/0.11/0.15 | 18 |
| polk | 0.48/0.67/0.56 | 0.50/0.61/0.55 | 0.44/0.44/0.44 | 0.53/0.56/0.54 | 0.00/0.00/0.00 | 18 |
| poro | 0.36/0.28/0.31 | 0.17/0.06/0.08 | 0.29/0.28/0.29 | 0.17/0.28/0.21 | 0.06/0.11/0.08 | 18 |
| poth | 0.53/0.50/0.51 | 0.45/0.56/0.50 | 0.65/0.61/0.63 | 0.54/0.72/0.62 | 0.11/0.50/0.18 | 18 |
| scal | 0.29/0.33/0.31 | 0.33/0.22/0.27 | 0.46/0.67/0.55 | 0.35/0.33/0.34 | 0.00/0.00/0.00 | 18 |
| smea | 0.00/0.00/0.00 | 0.08/0.06/0.06 | 0.12/0.06/0.08 | 0.18/0.11/0.14 | 0.00/0.00/0.00 | 18 |
| spir | 0.19/0.17/0.18 | 0.23/0.17/0.19 | 0.50/0.28/0.36 | 0.28/0.28/0.28 | 0.33/0.11/0.17 | 18 |
| spri | 0.56/0.28/0.37 | 0.57/0.22/0.32 | 0.56/0.50/0.53 | 0.50/0.33/0.40 | 0.00/0.00/0.00 | 18 |
| stai | 0.56/0.28/0.37 | 0.32/0.33/0.32 | 0.57/0.44/0.50 | 0.33/0.33/0.33 | 0.17/0.22/0.19 | 18 |
| stra | 0.56/0.56/0.56 | 0.38/0.44/0.41 | 0.59/0.56/0.57 | 0.47/0.50/0.49 | 0.00/0.00/0.00 | 18 |
| stri | 0.44/0.61/0.51 | 0.60/0.67/0.63 | 1.00/0.67/0.80 | 0.60/0.83/0.70 | 0.26/0.44/0.33 | 18 |
| stud | 0.44/0.61/0.51 | 0.60/0.67/0.63 | 0.61/0.61/0.61 | 0.73/0.61/0.67 | 0.38/0.33/0.35 | 18 |
| swir | 0.33/0.44/0.38 | 0.43/0.33/0.38 | 0.41/0.39/0.40 | 0.35/0.39/0.37 | 0.00/0.00/0.00 | 18 |
| vein | 0.22/0.28/0.24 | 0.30/0.33/0.32 | 0.45/0.28/0.34 | 0.37/0.39/0.38 | 0.12/0.28/0.17 | 18 |
| waff | 0.67/0.67/0.67 | 0.71/0.56/0.63 | 0.60/0.67/0.63 | 0.63/0.67/0.65 | 0.19/0.56/0.29 | 18 |
| wove | 0.43/0.67/0.52 | 0.35/0.44/0.39 | 0.45/0.72/0.55 | 0.38/0.28/0.32 | 0.24/0.28/0.26 | 18 |
| wrin | 0.50/0.11/0.18 | 0.40/0.11/0.17 | 0.33/0.44/0.38 | 0.31/0.22/0.26 | 0.00/0.00/0.00 | 18 |
| zigz | 0.36/0.56/0.43 | 0.45/0.56/0.50 | 0.60/0.50/0.55 | 0.47/0.50/0.49 | 0.04/0.06/0.05 | 18 |
| Acc | 0.43 | 0.42 | 0.48 | 0.43 | 0.15 | 846 |

Appendix C.3. FMD Dataset

Table A4. Experiment 1 with SGD Optimizer: classification report for the FMD dataset.
| Class | MaxP (P/R/F1) | AveP (P/R/F1) | DWTP (P/R/F1) | DWTaP (P/R/F1) | DWTdP (P/R/F1) | Test |
|---|---|---|---|---|---|---|
| fabric | 0.00/0.00/0.00 | 0.07/0.07/0.07 | 0.00/0.00/0.00 | 0.29/0.13/0.18 | 0.14/0.07/0.09 | 15 |
| foliage | 0.00/0.00/0.00 | 1.00/0.33/0.50 | 1.00/0.73/0.85 | 0.73/0.73/0.73 | 0.31/0.53/0.39 | 15 |
| glass | 0.00/0.00/0.00 | 0.33/0.07/0.11 | 0.12/0.07/0.09 | 0.38/0.20/0.26 | 0.00/0.00/0.00 | 15 |
| leather | 0.12/0.80/0.21 | 0.17/0.33/0.23 | 0.17/0.20/0.18 | 0.17/0.47/0.25 | 0.10/0.20/0.13 | 15 |
| metal | 0.00/0.00/0.00 | 0.11/0.20/0.14 | 0.19/0.20/0.19 | 0.17/0.13/0.15 | 0.00/0.00/0.00 | 15 |
| paper | 0.43/0.40/0.41 | 0.15/0.13/0.14 | 0.30/0.20/0.24 | 0.00/0.00/0.00 | 0.13/0.47/0.21 | 15 |
| plastic | 0.00/0.00/0.00 | 1.00/0.13/0.17 | 0.43/0.20/0.27 | 0.33/0.13/0.19 | 0.18/0.13/0.15 | 15 |
| stone | 0.12/0.07/0.09 | 0.22/0.13/0.17 | 0.22/0.40/0.29 | 0.30/0.20/0.24 | 0.00/0.00/0.00 | 15 |
| water | 0.04/0.07/0.05 | 0.60/0.60/0.60 | 0.33/0.47/0.39 | 0.47/0.53/0.50 | 0.00/0.00/0.00 | 15 |
| wood | 0.00/0.00/0.00 | 0.21/0.47/0.29 | 0.25/0.40/0.31 | 0.25/0.53/0.34 | 0.00/0.00/0.00 | 15 |
| Acc | 0.13 | 0.25 | 0.29 | 0.31 | 0.14 | 150 |
Table A5. Experiment 2 with Adam Optimizer: classification report for the FMD dataset.
| Class | MaxP (P/R/F1) | AveP (P/R/F1) | DWTP (P/R/F1) | DWTaP (P/R/F1) | DWTdP (P/R/F1) | Test |
|---|---|---|---|---|---|---|
| fabric | 0.06/0.07/0.06 | 0.20/0.07/0.10 | 0.06/0.07/0.06 | 0.11/0.13/0.12 | 0.00/0.00/0.00 | 15 |
| foliage | 0.82/0.60/0.69 | 0.88/0.47/0.61 | 0.92/0.73/0.81 | 1.00/0.80/0.89 | 0.56/0.67/0.61 | 15 |
| glass | 0.50/0.07/0.12 | 0.32/0.53/0.40 | 0.29/0.13/0.18 | 0.20/0.20/0.20 | 0.00/0.00/0.00 | 15 |
| leather | 0.12/0.13/0.12 | 0.17/0.27/0.21 | 0.13/0.13/0.13 | 0.21/0.33/0.26 | 0.00/0.00/0.00 | 15 |
| metal | 0.14/0.13/0.14 | 0.00/0.00/0.00 | 0.00/0.00/0.00 | 0.25/0.07/0.11 | 0.00/0.00/0.00 | 15 |
| paper | 0.25/0.13/0.17 | 0.25/0.40/0.31 | 0.31/0.33/0.32 | 0.17/0.07/0.10 | 0.17/0.33/0.22 | 15 |
| plastic | 0.25/0.07/0.11 | 0.33/0.20/0.25 | 0.33/0.20/0.25 | 0.36/0.27/0.31 | 0.11/0.07/0.08 | 15 |
| stone | 0.33/0.60/0.43 | 0.36/0.33/0.34 | 0.30/0.60/0.40 | 0.17/0.27/0.21 | 0.22/0.53/0.31 | 15 |
| water | 0.35/0.80/0.49 | 0.42/0.33/0.37 | 0.47/0.53/0.50 | 0.50/0.40/0.44 | 0.27/0.47/0.34 | 15 |
| wood | 0.25/0.27/0.26 | 0.22/0.40/0.29 | 0.35/0.40/0.38 | 0.22/0.33/0.26 | 0.14/0.13/0.14 | 15 |
| Acc | 0.29 | 0.30 | 0.31 | 0.29 | 0.22 | 150 |

References

  1. Perez, L.O.R.; Carranza, J.M. Autonomous Drone Racing with an Opponent: A First Approach. Comput. Sist. 2020, 24, 1271–1279. [Google Scholar]
  2. Alcalá-Rmz, V.; Maeda-Gutiérrez, V.; Zanella-Calzada, L.A.; Valladares-Salgado, A.; Celaya-Padilla, J.M.; Galván-Tejada, C.E. Convolutional Neural Network for Classification of Diabetic Retinopathy Grade. In Proceedings of the Mexican International Conference on Artificial Intelligence, Mexico City, Mexico, 12–17 October 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 104–118. [Google Scholar]
  3. Tapia-Téllez, J.M.; Escalante, H.J. Data Augmentation with Transformers for Text Classification. In Proceedings of the Mexican International Conference on Artificial Intelligence, Mexico City, Mexico, 12–17 October 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 247–259. [Google Scholar]
  4. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
5. Bengio, Y.; Goodfellow, I.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2017; Volume 1.
6. Vassilieva, N.S. Content-based image retrieval methods. Program. Comput. Softw. 2009, 35, 158–180.
7. Fujieda, S.; Takayama, K.; Hachisuka, T. Wavelet convolutional neural networks. arXiv 2018, arXiv:1805.08620.
8. Andrearczyk, V.; Whelan, P.F. Using filter banks in convolutional neural networks for texture classification. Pattern Recognit. Lett. 2016, 84, 63–69.
9. Williams, T.; Li, R. Wavelet pooling for convolutional neural networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
10. Mallat, S.G. Multifrequency channel decompositions of images and wavelet models. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 2091–2110.
11. Fortuna-Cervantes, J.M.; Ramírez-Torres, M.T.; Martínez-Carranza, J.; Murguía-Ibarra, J.; Mejía-Carlos, M. Object Detection in Aerial Navigation using Wavelet Transform and Convolutional Neural Networks: A First Approach. Program. Comput. Softw. 2020, 46, 536–547.
12. Chaabane, C.B.; Mellouli, D.; Hamdani, T.M.; Alimi, A.M.; Abraham, A. Wavelet convolutional neural networks for handwritten digits recognition. In Proceedings of the International Conference on Hybrid Intelligent Systems, Delhi, India, 14–16 December 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 305–310.
13. Gholamalinejad, H.; Khosravi, H. Vehicle Classification using a Real-Time Convolutional Structure based on DWT pooling layer and SE blocks. Expert Syst. Appl. 2021, 183, 115420.
14. Ferrà, A.; Aguilar, E.; Radeva, P. Multiple Wavelet Pooling for CNNs. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
15. De Souza Brito, A.; Vieira, M.B.; de Andrade, M.L.S.C.; Feitosa, R.Q.; Giraldi, G.A. Combining max-pooling and wavelet pooling strategies for semantic image segmentation. Expert Syst. Appl. 2021, 183, 115403.
16. Li, Q.; Shen, L. 3D WaveUNet: 3D Wavelet Integrated Encoder-Decoder Network for Neuron Segmentation. arXiv 2021, arXiv:2106.00259.
17. Alijamaat, A.; NikravanShalmani, A.; Bayat, P. Multiple sclerosis lesion segmentation from brain MRI using U-Net based on wavelet pooling. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 1459–1467.
18. Liu, P.; Zhang, H.; Lian, W.; Zuo, W. Multi-level wavelet convolutional neural networks. IEEE Access 2019, 7, 74973–74985.
19. Williams, T.; Li, R. An ensemble of convolutional neural networks using wavelets for image classification. J. Softw. Eng. Appl. 2018, 11, 69.
20. Piao, J.; Chen, Y.; Shin, H. A new deep learning based multi-spectral image fusion method. Entropy 2019, 21, 570.
21. De Silva, D.; Fernando, S.; Piyatilake, I.T.S.; Karunarathne, A. Wavelet based edge feature enhancement for convolutional neural networks. In Proceedings of the Eleventh International Conference on Machine Vision (ICMV 2018), Munich, Germany, 1–3 November 2018; International Society for Optics and Photonics: Bellingham, WA, USA, 2019; Volume 11041, p. 110412R.
22. Burrus, C.S. Introduction to Wavelets and Wavelet Transforms: A Primer; Pearson: Upper Saddle River, NJ, USA, 1997.
23. Walker, J.S. A Primer on Wavelets and Their Scientific Applications, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2008.
24. Mallat, S. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
25. Haar, A. Zur Theorie der orthogonalen Funktionensysteme. Math. Ann. 1910, 69, 331–371.
26. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
27. Aggarwal, C.C. Neural Networks and Deep Learning; Springer Nature: Berlin, Germany, 2018.
28. Williams, T.; Li, R. Advanced image classification using wavelets and convolutional neural networks. In Proceedings of the 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 233–239.
29. Nielsen, M.A. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015; Volume 25.
30. Lee, C.Y.; Gallagher, P.W.; Tu, Z. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In Proceedings of the Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016; pp. 464–472.
31. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1989, 2, 396–404.
32. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
33. Ranzato, M.; Boureau, Y.L.; LeCun, Y. Sparse feature learning for deep belief networks. Adv. Neural Inf. Process. Syst. 2007, 20, 1185–1192.
34. Zeiler, M.D.; Ranzato, M.; Monga, R.; Mao, M.; Yang, K.; Le, Q.V.; Nguyen, P.; Senior, A.; Vanhoucke, V.; Dean, J.; et al. On rectified linear units for speech processing. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 3517–3521.
35. Chollet, F. Deep Learning with Python; Manning Publications Co.: Shelter Island, NY, USA, 2018.
36. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
37. Brigato, L.; Iocchi, L. A close look at deep learning with small data. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2490–2497.
38. Srivastava, N. Improving Neural Networks with Dropout. Ph.D. Dissertation, University of Toronto, Toronto, ON, Canada, 2013.
39. Mikołajczyk, A.; Grochowski, M. Data augmentation for improving deep learning in image classification problem. In Proceedings of the 2018 International Interdisciplinary PhD Workshop (IIPhDW), Swinoujscie, Poland, 9–12 May 2018; pp. 117–122.
40. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 448–456.
41. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
42. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf (accessed on 7 March 2022).
43. Cimpoi, M.; Maji, S.; Kokkinos, I.; Mohamed, S.; Vedaldi, A. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3606–3613.
44. Sharan, L.; Rosenholtz, R.; Adelson, E. Material perception: What can you see in a brief glance? J. Vis. 2009, 9, 784.
45. Lin, F.; Hou, T.; Jin, Q.; You, A. Improved YOLO Based Detection Algorithm for Floating Debris in Waterway. Entropy 2021, 23, 1111.
Figure 1. Example of the shortcoming of max and average pooling compared with wavelet pooling, which preserves the essential features.
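To make the comparison in Figure 1 concrete, the following is a minimal NumPy/PyWavelets sketch, not the implementation used in the paper, of the three operations on a single-channel feature map: a 2 × 2 max pool, a 2 × 2 average pool, and a one-level Haar DWT whose approximation (LL) sub-band plays the role of the pooled output.

```python
import numpy as np
import pywt

def dwt_pool(feature_map, wavelet="haar"):
    # One-level 2D DWT; keeping only the approximation (LL) sub-band
    # halves each spatial dimension like a stride-2 pooling window,
    # but acts as a low-pass filter rather than a hard selection.
    LL, (LH, HL, HH) = pywt.dwt2(feature_map, wavelet)
    return LL

# Toy 4x4 map with one bright outlier, in the spirit of Figure 1.
x = np.zeros((4, 4), dtype=np.float32)
x[1, 1] = 9.0

max_pool = x.reshape(2, 2, 2, 2).max(axis=(1, 3))   # propagates only the peak
avg_pool = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # dilutes the peak
dwt_pooled = dwt_pool(x)                            # low-pass summary of each block
```

On this toy input, max pooling keeps only the outlier, average pooling dilutes it uniformly, and the LL sub-band retains a smoothed summary of each neighborhood, which is the behavior Figure 1 illustrates.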
Figure 2. Block diagram of the proposed methodology.
Figure 3. Learning behavior on CIFAR-10 training and validation sets.
Figure 4. Learning behavior on DTD training and validation sets—Adam optimizer.
Figure 5. Learning behavior on FMD training and validation sets—Adam optimizer.
Table 1. Training parameters of the proposed model.

Hyperparameter    | Value
Learning rate     | 0.001
Minibatch         | 30 (64 for CIFAR-10)
Loss function     | 'categorical_crossentropy'
Metrics           | 'acc', 'loss'
Epochs            | 500
Callbacks API     | 4
ModelCheckpoint   | monitor = 'val_loss', save_best_only = True, mode = 'min'
EarlyStopping     | monitor = 'val_acc', patience = 15, mode = 'max'
CSVLogger         | 'model_history.csv', append = True
ReduceLROnPlateau | monitor = 'val_loss', factor = 0.2, patience = 10, min_lr = 0.001
Optimizer         | SGD / Adam
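Table 1 maps directly onto a Keras training configuration. The sketch below wires up the four listed callbacks with the stated settings; the stand-in model, dummy data, and file names are placeholders for illustration, not the architecture used in the paper. Note that 0.001 is also the Keras default learning rate for Adam.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.callbacks import (ModelCheckpoint, EarlyStopping,
                                        CSVLogger, ReduceLROnPlateau)

# Stand-in model; the paper's wavelet-pooling architectures are not reproduced here.
model = Sequential([
    Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",                 # the paper also reports SGD runs
              loss="categorical_crossentropy",
              metrics=["acc"])

callbacks = [
    ModelCheckpoint("best_model.h5", monitor="val_loss",
                    save_best_only=True, mode="min"),
    EarlyStopping(monitor="val_acc", patience=15, mode="max"),
    CSVLogger("model_history.csv", append=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.2,
                      patience=10, min_lr=0.001),
]

# Dummy data stands in for the datasets of Table 2.
x = np.random.rand(128, 32, 32, 3).astype("float32")
y = np.eye(10)[np.random.randint(0, 10, 128)]
model.fit(x[:100], y[:100], batch_size=30,      # 64 for CIFAR-10, per Table 1
          epochs=500, validation_data=(x[100:], y[100:]),
          callbacks=callbacks)
```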
Table 2. The number of images per class.

Dataset  | Classes | Images per Class | Training | Validation | Test
CIFAR-10 | 10      | 6,000            | 40,000   | 10,000     | 10,000
DTD      | 47      | 120              | 3931     | 863        | 846
FMD      | 10      | 100              | 700      | 150        | 150
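The CIFAR-10 row of Table 2 corresponds to holding out 10,000 of the 50,000 standard training images for validation. One plausible way to reproduce that split, assuming the Keras dataset loader (the paper does not spell out the exact partitioning), is:

```python
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# CIFAR-10 ships as 50,000 train / 10,000 test images; carving the last
# 10,000 training images off as validation yields the 40k/10k/10k split.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_val = x_train[:40000], x_train[40000:]
y_train, y_val = y_train[:40000], y_train[40000:]

# Scale pixels to [0, 1] and one-hot encode the 10 class labels.
x_train = x_train.astype("float32") / 255.0
x_val = x_val.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
y_train, y_val, y_test = (to_categorical(t, 10) for t in (y_train, y_val, y_test))
```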
Table 3. Performance of pooling methods on CIFAR-10.

Method | Trainable Params | Loss   | Acc    | Val_Loss | Val_Acc | Test_Loss | Test_Acc
MaxP   | 545,206          | 0.2741 | 0.9069 | 0.3058   | 0.8990  | 0.3365    | 0.8913
AveP   | 545,206          | 0.3220 | 0.8906 | 0.3296   | 0.8932  | 0.3493    | 0.8850
DWTP   | 1,558,966        | 0.1958 | 0.9330 | 0.3181   | 0.9020  | 0.3461    | 0.8946
DWTaP  | 545,206          | 0.2568 | 0.9126 | 0.2970   | 0.9067  | 0.3208    | 0.8970
DWTdP  | 1,221,046        | 0.3678 | 0.8735 | 0.4040   | 0.8701  | 0.4207    | 0.8672
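The parameter counts in Table 3 hint at how the variants are assembled: DWTaP keeps only the approximation sub-band, leaving the channel count unchanged (hence the same parameters as MaxP and AveP), while DWTP and DWTdP appear to concatenate four and three first-level sub-bands, respectively, widening every subsequent layer. The following toy sketch shows this reading using PyWavelets on a channels-last map; it is an interpretation for illustration, not the authors' code.

```python
import numpy as np
import pywt

def wavelet_pool_variants(x, wavelet="haar"):
    # x: (H, W, C) feature map; the DWT runs over the two spatial
    # axes, leaving the channel axis untouched.
    LL, (LH, HL, HH) = pywt.dwt2(x, wavelet, axes=(0, 1))
    dwtap = LL                                         # C channels (approximation only)
    dwtdp = np.concatenate([LH, HL, HH], axis=-1)      # 3C channels (details only)
    dwtp = np.concatenate([LL, LH, HL, HH], axis=-1)   # 4C channels (all sub-bands)
    return dwtap, dwtdp, dwtp

a, d, full = wavelet_pool_variants(np.random.rand(8, 8, 16))
print(a.shape, d.shape, full.shape)  # (4, 4, 16) (4, 4, 48) (4, 4, 64)
```

The wider DWTP and DWTdP outputs would explain why their trainable-parameter counts in Tables 3–7 exceed DWTaP's, even though the spatial downsampling is identical.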
Table 4. Performance of pooling methods on DTD—SGD optimizer.

Method | Trainable Params | Loss   | Acc    | Val_Loss | Val_Acc | Test_Loss | Test_Acc
MaxP   | 12,344,831       | 2.5715 | 0.3176 | 2.9217   | 0.2480  | 2.8203    | 0.2742
AveP   | 12,344,831       | 1.7849 | 0.5024 | 2.4606   | 0.3766  | 2.3937    | 0.3865
DWTP   | 48,748,031       | 1.8842 | 0.4817 | 2.4958   | 0.3685  | 2.4415    | 0.3924
DWTaP  | 12,344,831       | 2.3214 | 0.3740 | 2.7459   | 0.3117  | 2.6390    | 0.3416
DWTdP  | 36,613,631       | 3.5035 | 0.1069 | 3.4968   | 0.1136  | 3.4817    | 0.1288
Table 5. Performance of pooling methods on DTD—Adam optimizer.

Method | Trainable Params | Loss   | Acc    | Val_Loss | Val_Acc | Test_Loss | Test_Acc
MaxP   | 12,344,831       | 1.7423 | 0.5225 | 2.1863   | 0.4426  | 2.1376    | 0.4350
AveP   | 12,344,831       | 1.6607 | 0.5324 | 2.1816   | 0.4345  | 2.1934    | 0.4184
DWTP   | 48,748,031       | 1.4055 | 0.5922 | 2.0647   | 0.4855  | 2.0195    | 0.4799
DWTaP  | 12,344,831       | 1.6408 | 0.5329 | 2.2657   | 0.4484  | 2.2878    | 0.4302
DWTdP  | 36,613,631       | 3.3241 | 0.1356 | 3.3666   | 0.1425  | 3.3205    | 0.1536
Table 6. Performance of pooling methods on FMD—SGD optimizer.

Method | Trainable Params | Loss   | Acc    | Val_Loss | Val_Acc | Test_Loss | Test_Acc
MaxP   | 12,341,686       | 2.6234 | 0.2239 | 2.4013   | 0.1667  | 2.4426    | 0.1333
AveP   | 12,341,686       | 1.7555 | 0.4369 | 2.2773   | 0.2067  | 2.3420    | 0.2467
DWTP   | 48,744,886       | 1.5113 | 0.4896 | 2.0208   | 0.3533  | 2.1488    | 0.2867
DWTaP  | 12,341,686       | 1.3802 | 0.5101 | 2.1916   | 0.3267  | 2.3301    | 0.3066
DWTdP  | 36,610,486       | 3.0464 | 0.1687 | 2.3172   | 0.1867  | 2.4176    | 0.1400
Table 7. Performance of pooling methods on FMD—Adam optimizer.

Method | Trainable Params | Loss   | Acc    | Val_Loss | Val_Acc | Test_Loss | Test_Acc
MaxP   | 12,341,686       | 1.4713 | 0.4821 | 2.0068   | 0.2867  | 2.1222    | 0.2867
AveP   | 12,341,686       | 1.9043 | 0.3594 | 2.2249   | 0.3267  | 2.0981    | 0.3000
DWTP   | 48,744,886       | 1.3116 | 0.5493 | 2.0832   | 0.3200  | 2.0667    | 0.3133
DWTaP  | 12,341,686       | 1.4319 | 0.5108 | 2.0566   | 0.3667  | 2.0717    | 0.2866
DWTdP  | 36,610,486       | 2.1660 | 0.2239 | 2.1728   | 0.2600  | 2.2071    | 0.2199
Table 8. Performance evaluation and comparison with other methods, reported as accuracy (%), on the DTD dataset.

                            | Test 1                      | Test 2
Method                      | T-CNN | Wavelet CNN | DWTaP | MaxP  | DWTaP
Trainable params (millions) | 23.4  | 14.1        | 12.3  | 12.3  | 12.3
DTD accuracy (%)            | 27.80 | 35.60       | 34.16 | 43.50 | 43.02