Fuzzy Edge-Detection as a Preprocessing Layer in Deep Neural Networks for Guitar Classification

Deep neural networks have demonstrated the capability of solving classification problems using hierarchical models, and fuzzy image preprocessing has proven to be efficient in handling uncertainty found in images. This paper presents the combination of fuzzy image edge-detection and a convolutional neural network in a computer vision system that classifies guitar types according to their body model. The focus of this investigation is to compare the effects of performing image-preprocessing techniques on raw data (non-normalized images) with different fuzzy edge-detection methods, specifically fuzzy Sobel, fuzzy Prewitt, and fuzzy morphological gradient, before feeding the images into a convolutional neural network to perform a classification task. We propose and compare two convolutional neural network architectures to solve the task. Fuzzy edge-detection techniques are compared against their classical counterparts (Sobel, Prewitt, and morphological gradient edge-detection) and against grayscale and color images in the RGB color space. The fuzzy preprocessing methodologies highlight the most essential features of each image, achieving favorable results with both proposed models when compared to the classical preprocessing methodologies and to a pre-trained model, as well as a reduction in training times of more than 20% compared to RGB images.


Introduction
Artificial intelligence (AI) allows us to automate tasks in many areas, such as artificial vision, making it possible to perform jobs previously believed to be within the reach of humans alone. Such tasks include object identification and object localization in digital images; processes like these require prior knowledge of the objects being sought, as well as of the context in which they are found, requirements that are closely tied to human vision and cognitive abilities.
In recent years, deep neural networks (DNNs) have made huge advances in recognition tasks in multiple areas due to the feature-extraction layers embedded in their design, which make them very attractive in multiple disciplines. Zhang et al. [1] created a fruit classification system using a deep neural network to replace handcrafted features, beating state-of-the-art approaches. Horng et al. [2] used a DNN fed with aerial images to classify tree areas and help in the understanding of land use. Sebti et al. [3] provided a solution for the forecasting and diagnosis of diabetic retinopathy by training a convolutional neural network with retina images, achieving over 96% accuracy.
Previous authors have tackled the guitar/instrument classification problem; for example, Rabelo et al. [4] used a support vector machine to classify the sounds of guitars from two brands (a binary problem), obtaining 95% accuracy. However, this was done using the sound (music notes) generated by the instrument, rather than images, to perform the classification task. Banerjee et al. [5] used musical notes and support vector machines to classify string instruments (cello, double bass, guitar, and violin), obtaining up to 100% recognition with a random forest classifier. This paper proposes an artificial vision system that performs classification by body type for some of the most popular guitar styles found nowadays. The system operates by exploiting the capabilities of the feature-extraction layers of DNNs, feeding them pictures of the guitars to be classified by the user. Given the lack of previous attempts to do so, we created a dataset by scraping images from popular search engines to build a competent database containing sufficient images of some of the most popular guitar models. Image-based instrument-classification systems could be highly relevant in retail stores and warehouses of music-instrument franchises, where there is a constant flow of inventory, especially of used products; intelligent systems could be applied to inventory keeping or to the organization of unlabeled items. Another viable implementation for the music industry is a complementary system for off-site quoting services for instrument purchases or store-credit requests, making it more compelling for customers to sell used instruments to the store for resale.
The process of detecting edges is an essential part of pattern-recognition systems; it simplifies the analysis by reducing the image to its minimal expression, and by doing so, it reduces the amount of processing needed [6]. This can be a difficult task, especially when the images contain noise or irrelevant information. To address this challenge, several fuzzy edge-detectors have been proposed. In Tao et al. [7], Hu et al. [8], and Ontiveros-Robles et al. [9], the authors presented edge-detection methodologies founded on type-1 fuzzy systems (T1FS). In Mendoza et al. [10,11], the authors proposed fuzzy edge-detection based on the Sobel operator and interval type-2 fuzzy systems (IT2FS). In Biswas et al. [12] and Melin et al. [13], the Canny method and the morphological gradient approach, respectively, were improved using IT2FS. In pattern recognition, fuzzy edge-detection methods play an important role in improving the recognition rate compared with unprocessed images or with traditional filters. We can mention other research works where edge-detection methodologies based on T1FS, IT2FS, and general type-2 fuzzy sets (GT2FS) were used in the preprocessing pipeline for face recognition based on a monolithic neural network [14,15]. Mendoza et al. [16] applied edge-detectors based on T1FS and IT2FS to two face databases; the edges found were used as inputs for a face-recognition system, and the authors concluded that the recognition achieved by the system improved when fuzzy edge-detection methods were applied. Martinez et al. [17] presented a face-recognition method founded on modular neural networks, with favorable results when fuzzy Sobel edge-detection was performed.
The success of convolutional neural networks (CNNs) in classification is due to their ability to perform both feature-extraction and classification, and most models perform very well without preprocessing steps. However, sometimes the dataset images are unbalanced, have low resolution or poor quality, or acquired noise or uncertainty during the capture process. For these reasons, some approaches use additional preprocessing methods, including image resizing, data augmentation, cropping, conversion to grayscale to reduce processing time, and filters or equalization to improve image quality or resolution. In the literature, we can find works showing that the use of preprocessing methods before CNN-based models improves the results. Cordero-Martínez et al. [18,19] presented a comparison of four image-preprocessing methods to classify diabetic retinopathy using a CNN, and concluded that the preprocessing steps are important to increase the accuracy of the results. Kato et al. [20] proposed a preprocessing approach applied before a CNN-based model, and the results were 34.8% higher than the conventional CNN model. In Musallam et al. [21], three preprocessing steps were proposed to enhance the quality of MRI (magnetic resonance imaging) scans before they were introduced to a deep convolutional neural network (DCNN): removing confusing objects, using a non-local means algorithm (NLM), and applying histogram equalization. These were applied for the automatic detection of brain tumors in MRI images, and experimental results proved the robustness of the proposed architecture, which increased the detection accuracy for a variety of brain diseases in a short time compared to other well-known DCNN models such as VGG16 [22], VGG19 [22], and hybrid CNN-SVM [23]. Finally, Lăzărescu et al. [24] presented an algorithm for fingerprint classification using a combination of edge features and CNNs. In that work, several preprocessing steps were applied, including edge-enhancement operations, data resizing, and data augmentation, and the images were enhanced using the Prewitt and Laplacian of Gaussian operators. The proposed algorithm achieved very good performance compared to traditional hand-crafted features.
The motivation to implement fuzzy edge-detection as a preprocessing phase in this paper is that fuzzy logic is a good technique for modeling the uncertainty or noise encountered in images, which an appropriate filtering operator can suppress. The fuzzy methodologies presented in this paper (Prewitt, Sobel, and morphological gradient fuzzy edge-detectors) implement only T1 fuzzy sets and are referenced from previous state-of-the-art research [25,26]. The combination of fuzzy preprocessing techniques and convolutional neural networks produces powerful artificial intelligence tools for pattern recognition.

Convolutional Neural Network
The convolutional neural network, also known as CNN, is one of the most popular deep-learning techniques, in which a computer system imitates the visual cortex and brain to identify patterns in visual elements [27]. CNNs have been used in multiple fields, such as medicine [28][29][30][31], autonomous vehicles [32][33][34], and agriculture [35,36], to name a few. A CNN differs from a conventional neural network (NN) because it contains at least one convolutional layer. The basic architecture of a CNN consists of the following layers: input, convolutional, activation function, pooling, dense, and output.

1. Input layer: constitutes the entry point for the data fed to the network, with a fixed size;
2. Convolutional layer: applies a scalar product with a small matrix (kernel) to extract features or patterns from the input, generating a new matrix with the extracted contents;
3. Activation function: CNNs use activation functions to trigger the neurons when individual values reach a threshold; these functions normalize the input data and introduce non-linearity. One of the most used activations is the rectified linear unit (ReLU), whose output is the maximum between zero and its input;
4. Pooling layer: creates a smaller representation of the input data, thereby reducing the number of features it holds. Maximum pooling is one of the most used variants; it slides a filter of size n × n over the input, as the convolution layer does, and stores the maximum value inside the filter as the output, reducing the dimensions of the input proportionally to the dimensions of the filter and its stride;
5. Dense layer (fully connected): consists of weights and biases, like a traditional feed-forward neural network. The input of these layers is the output of a convolution or pooling layer;
6. Output layer: contains the neurons that provide the model's prediction for the input that was fed through it.
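As an illustration of items 3 and 4 above, the following minimal NumPy sketch (not taken from the paper) implements the ReLU activation and a max-pooling pass:

```python
import numpy as np

def relu(v):
    """Rectified linear unit: the maximum between zero and the input."""
    return np.maximum(0.0, v)

def max_pool(x, size=2, stride=2):
    """Slide a size x size window over x and keep the maximum of each window."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 8]])
print(max_pool(x))  # [[6. 4.] [7. 9.]] : a 4 x 4 input shrinks to 2 x 2
```

With a 2 × 2 filter and stride 2, each dimension of the input is halved, which is the proportional reduction described in item 4.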

Fuzzy Logic Systems
Fuzzy logic aims to represent knowledge in nonlinear ways, as human logic does, often using linguistic terms. It differs from conventional logic, which is usually binary, in that it allows gradual representations in a continuous space, thereby admitting levels of uncertainty. Zadeh introduced fuzzy sets in response to the limitations of classical (crisp) sets [37]. Fuzzy sets allow degrees of membership, meaning their elements can belong to one or more classes at the same time. The interpretation of these sets mirrors human reasoning: it depends on the concept and on the user applying it.
A T1FS A over a universe X is characterized by a continuous membership function µA : X → [0, 1]. The T1 membership function of A is denoted µA(x); this function is defined in Equation (1).
The membership functions (MFs) most used to represent a fuzzy set are the triangular MF, the trapezoidal MF, and the Gaussian MF; the latter is used in the fuzzy edge-detection approach presented in this paper. It consists of two parameters {c, σ} and is expressed in Equation (2), where c represents the center (mean) of the MF and σ its spread.
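As an illustrative sketch, the Gaussian MF can be computed as follows; Equation (2) is not reproduced in this excerpt, so the standard Gaussian form exp(−(x − c)²/(2σ²)) is assumed:

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian membership function with center c and spread sigma
    (assumed standard form of Equation (2))."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Membership is 1 at the center and decays with distance from it.
print(gaussmf(219.5, c=219.5, sigma=219.5))                    # 1.0 at the center
print(round(float(gaussmf(0.0, c=219.5, sigma=219.5)), 3))     # 0.607, one sigma away
```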
A fuzzy inference system (FIS) is based on if-then rules with fuzzy reasoning [38] applied to fuzzy sets. The basic structure of a FIS comprises a knowledge base and a reasoning mechanism, which infers a reasonable conclusion from the inputs and the knowledge provided by that base. Popular fuzzy inference systems are Mamdani [39], Tsukamoto, and Takagi-Sugeno-Kang [40].

Edge-Detection Methods
Image-processing techniques consist of the manipulation of digital images to hide or highlight details or specific patterns, improve illumination, or eliminate noise caused by external agents, such as artifacts produced by the camera sensor or by movement when taking the picture. This processing consists of applying an operation over a pixel window (kernel), which traverses the image, changing its content uniformly to create a new image [41]. Equation (3) describes the application of a kernel to an image, where k represents the kernel; r and c represent the row and column coordinates of a kernel value; and f represents the input image.
Operations such as edge-detection are performed to reduce the amount of information that an image contains. Edge-detection techniques identify discontinuities in the brightness levels of the image to identify borders.
Among the most popular edge-detection techniques are the Roberts, Sobel, and Prewitt operators [42][43][44], which calculate the gradient from the first derivative of an image by applying a convolution operation that approximates the gradient in the horizontal and vertical directions.

Prewitt and Sobel Edge-Detection Methods
The traditional Prewitt and Sobel operators work in almost the same way; both consist of a 3 × 3 neighborhood gradient operator, but they differ in the mask used in the convolution process. The masks of the Prewitt operator convolved over a grayscale image are defined by Equations (4) and (5), which correspond to Prewittx and Prewitty, respectively. In contrast, the masks used in the Sobel operator are expressed in Equation (6) (Sobelx) and Equation (7) (Sobely).
The filter applies the two kernels to an image to generate the gradients gx in Equation (8) for the horizontal orientation and gy in Equation (9) for the vertical orientation. The coordinates of the input f are shown in Figure 1, where x represents the horizontal axis, y the vertical axis, and f the image (input) source [26]. In Equations (8) and (9), kernelx represents the mask Prewittx or Sobelx, and kernely the mask Prewitty or Sobely, depending on the filter to be calculated.
The magnitude of the gradient is obtained with Equation (10), which takes into consideration the results of the calculations of gx and gy from the image f using Equations (8) and (9); these convolutions are applied using the kernels defined in Equations (4)-(7).
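The gradient computation and magnitude described above can be sketched as follows. Equations (4)-(7) are not reproduced in this excerpt, so the conventional 3 × 3 Prewitt and Sobel masks are assumed:

```python
import numpy as np

# Conventional 3 x 3 masks (assumed forms of Equations (4)-(7)).
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
PREWITT_Y = PREWITT_X.T
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def gradients(f, kernel_x, kernel_y):
    """Slide the two masks over f to obtain gx and gy (Equations (8)-(9))."""
    rows, cols = f.shape
    gx = np.zeros((rows - 2, cols - 2))
    gy = np.zeros_like(gx)
    for i in range(rows - 2):
        for j in range(cols - 2):
            window = f[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(kernel_x * window)
            gy[i, j] = np.sum(kernel_y * window)
    return gx, gy

def magnitude(gx, gy):
    """Gradient magnitude (Equation (10))."""
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical step edge: gx responds strongly while gy stays at zero.
f = np.array([[0, 0, 255, 255]] * 4, dtype=float)
gx, gy = gradients(f, SOBEL_X, SOBEL_Y)
print(magnitude(gx, gy))  # uniform response of 1020 along the step
```

Note that this sketch applies the masks as a cross-correlation, which is the form commonly used in practice for these symmetric-magnitude operators.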
Morphological Gradient (MG)
MG is an edge-detection technique that calculates the first derivative of an image in four orientations, i.e., vertical, horizontal, and the two diagonals (0°, 45°, 90°, and 135°). This is illustrated in Figure 2a, where the gradients are indicated by the variables G1, G2, G3, and G4. The calculation of the gradients is made in the following way.
Gi (where i = 1, ..., 4) represents the direction of the edge (gradient); it is calculated with Equation (11) using a 3 × 3 kernel. In Equation (11), zi represents the coefficient at each of the matrix positions shown in Figure 2b, where the coefficients are calculated with Equation (12); f is the representation of the image, with the x-axis for the columns and the y-axis for the rows. The edge value, denoted by the variable "Edges", is integrated using Equation (13) [13,25,45].
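The directional gradients can be sketched as follows. Equations (11)-(13) are not reproduced in this excerpt, so this is a hypothetical reconstruction: each Gi is taken as the absolute difference of the two neighbors lying opposite each other across the center along one orientation, and the edge value as their sum:

```python
import numpy as np

def mg_gradients(f, i, j):
    """Directional gradients at pixel (i, j) over a 3 x 3 neighborhood.
    Assumed reading of Equations (11)-(12): each G_k is the absolute
    difference of the two neighbors opposite each other along one
    orientation (0, 45, 90, 135 degrees)."""
    g1 = abs(float(f[i, j + 1]) - float(f[i, j - 1]))          # 0 degrees
    g2 = abs(float(f[i - 1, j + 1]) - float(f[i + 1, j - 1]))  # 45 degrees
    g3 = abs(float(f[i - 1, j]) - float(f[i + 1, j]))          # 90 degrees
    g4 = abs(float(f[i - 1, j - 1]) - float(f[i + 1, j + 1]))  # 135 degrees
    return g1, g2, g3, g4

def mg_edges(f, i, j):
    """Integrate the four gradients (one plausible form of Equation (13))."""
    return sum(mg_gradients(f, i, j))

# Vertical step edge: the horizontal and diagonal gradients respond.
f = np.array([[0, 0, 255], [0, 0, 255], [0, 0, 255]], dtype=float)
print(mg_gradients(f, 1, 1))  # (255.0, 255.0, 0.0, 255.0)
```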

Methodology
The purpose of this investigation was to compare the effect of performing diverse image-preprocessing techniques before feeding the images into a CNN to perform a classification task. Among the techniques performed were fuzzy edge-detection filters, which allow us to determine the membership of a pixel to a boundary or to a uniform region of the image while taking uncertainty into consideration. The general methodology of the proposal is illustrated in Figure 3 and explained as follows.


Input Database
In this study case, we used a self-made dataset containing a balanced distribution of images of six different guitar styles (acoustic guitar, double cut, Les Paul, Telecaster, Stratocaster, and ukulele). Each class had a high degree of similarity with another, as denoted by the following pairings: acoustic with ukulele, double cut with Les Paul, and Telecaster with Stratocaster. These similarities applied only to our object of interest, while the images themselves varied widely. Since the images were scraped from the web using popular search engines, the results included stock images for product advertisements from popular brands [46][47][48] and pictures posted in forums by owners of the instruments. The variations included differences in position, orientation, and rotation of the object, illumination, other objects present in the frame (hands, cases, stands, straps, or multiple guitars in the background), and resolution. The guitar database created consisted of 5400 images in color format (see Table 1), with balanced classes (900 images per class) and normalized dimensions of 150 × 150 pixels. The process of creating the dataset was as follows:
1. Create a list of keywords and use them to query the search engines (Google Images and Bing Images);
2. Manually prune the dataset to remove duplicates or outliers.

Fuzzy Preprocessing Phase
This section contains an explanation of the membership function used, as well as the fuzzy rules and the methodology for the implementation of the fuzzy edge-detection approaches.

Fuzzy Sobel and Fuzzy Prewitt Edge-Detection
The processes to generate the fuzzy Sobel and fuzzy Prewitt edge-detection methods are similar; only the mask changes depending on the operator to be applied, as previously explained in Section 2.1.3 and expressed in Equations (4)-(7). The inference system is a type-1 fuzzy Mamdani FIS with two inputs, one output, and three fuzzy rules. The general methodology is explained as follows, noting that the numerical results used as an example are calculated with fuzzy Sobel.

1. Read the input image. First, we read the input image; in this case, from the database defined in Section 2.2.1 and illustrated in Figure 4;
2. Obtain the inputs for the inference system. The type-1 inference system considers two inputs: the gradients calculated with Equations (8) and (9), labeled Dh and Dv, which represent the horizontal and vertical gradients, respectively. Both inputs use Gaussian membership functions, expressed in Equation (2). The Dh input is granulated into three MFs with the linguistic variables "LowDh", "MiddleDh", and "HighDh"; the Dv input likewise has three MFs, "LowDv", "MiddleDv", and "HighDv". The parameters are determined from the gradient values of each image; i.e., considering the input image in Figure 5, we obtain the low value with Equation (14), the high value with Equation (15), the middle value with Equation (16), and the σ values with Equation (17) for the gradients Dh and Dv.
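A sketch of this parameter estimation follows. Equations (14)-(17) are not reproduced in this excerpt; the formulas below are a hypothetical reconstruction that is consistent with the worked example reported later (lowDh = 0, middleDh = 219.50, highDh = 878, σDh = 219.50):

```python
import numpy as np

def mf_params(grad):
    """Estimate Gaussian MF parameters from a gradient image.
    Hypothetical reconstruction: low and high are taken as the minimum and
    maximum gradient values, and middle and sigma as a quarter of the range,
    which reproduces the paper's worked example but is not guaranteed to be
    the exact form of Equations (14)-(17)."""
    low = float(grad.min())        # assumed Equation (14)
    high = float(grad.max())       # assumed Equation (15)
    middle = (high - low) / 4.0    # assumed Equation (16)
    sigma = (high - low) / 4.0     # assumed Equation (17)
    return low, middle, high, sigma

dh = np.array([[0.0, 439.0], [878.0, 100.0]])  # toy gradient values
print(mf_params(dh))  # (0.0, 219.5, 878.0, 219.5)
```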
Considering the gradients in Figure 6, we obtain the values lowDh = 0, middleDh = 219.50, highDh = 878, and σDh = 219.50 for the input Dh. The parameters for the Gaussian MFs are expressed in Equations (18)-(20) and illustrated in Figure 6a.

3. Obtain the output. The T1FIS has one output, labeled Edges, which is divided into two linguistic values, "Background" and "Edge". In this study case, the output (Edges) is normalized in a range between −5 and 4.5. The center value for the Background MF is cBackground = −5, and for the Edge MF it is cEdge = 4.5. The σ value for both MFs is calculated with Equation (24), and the parameters for the output MF (Edges) are expressed in Equations (25) and (26);
4. Mamdani fuzzy rules. The knowledge base is represented with three fuzzy rules (see Table 2), obtained from expert knowledge. The fuzzy Sobel edge-detection output is illustrated in Figure 7a. Algorithm 1 summarizes the process for calculating Prewitt and Sobel fuzzy edge-detection.

Algorithm 1. Fuzzy edge-detection using Prewitt and Sobel operators.
1. Select the operator to calculate the gradients (Prewitt or Sobel)
2. Read the input image f
3. Obtain the image dimensions (rows and columns)
4. [Dx, Dy] = zeros(rows, columns) // zero matrices with the same dimensions as f to hold the gradients
5. Dx[i,j] = sum(kernelx * f[i:i+3, j:j+3]) // Equation (8)
6. Dy[i,j] = sum(kernely * f[i:i+3, j:j+3]) // Equation (9)
7. Fuzzify the two input gradients Dx and Dy using Gaussian MFs (Equations (18)-(23))
8. Infer the edges output with the fuzzy rules (Tables 2 and 3)
9. Defuzzify the output (Edges) of the controller

Fuzzy Morphological Gradient Edge-Detection
This edge-detector combines T1 fuzzy set theory and the morphological gradient technique (explained in Section 2.1.3). A fuzzy Mamdani system is implemented that consists of four inputs, one output, and three fuzzy rules; the process is calculated with the following steps.

1. Obtain the four image gradients. After reading the input image, Equations (11) and (12) are used to calculate the image gradients in all directions (G1, G2, G3, and G4). Each gradient represents an input to the fuzzy inference system;
2. Calculate the output. The T1FIS consists of one output (Edges), granulated with two linguistic values, "Background" and "Edge". As in the fuzzy Sobel and fuzzy Prewitt approaches, the output edges matrix is normalized in a range between −5 and 4.5; the center value for the Background MF is cBackground = −5, and for the Edge MF it is cEdge = 4.5. The σ value for both MFs is calculated with Equation (44), σoutput = abs(cBackground − cEdge)/2. The parameters for the output MF (Edges) are expressed in Equations (45) and (46) and illustrated in Figure 9, considering the values cBackground = −5, cEdge = 4.5, and σoutput = 4.75;
3. Mamdani fuzzy rules. The knowledge base is represented with three fuzzy rules (Table 2). The fuzzy MG edge-detection output is illustrated in Figure 7b.
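The fuzzify-infer-defuzzify cycle shared by the three fuzzy detectors can be sketched for a single pixel as follows. Since Table 2 is not reproduced in this excerpt, the rule base below is a plausible assumption rather than the paper's exact knowledge base; the Gaussian parameters are taken from the fuzzy Sobel worked example:

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Standard Gaussian membership function."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def fuzzy_edge(dh, dv, low, mid, high, sigma):
    """Mamdani T1 inference for one pixel: two inputs, one output, three rules.
    The rule base is an assumption (Table 2 is not shown in this excerpt)."""
    # Fuzzify the two gradient inputs into Low/Middle/High memberships.
    dh_low, dh_mid, dh_high = (gaussmf(dh, c, sigma) for c in (low, mid, high))
    dv_low, dv_mid, dv_high = (gaussmf(dv, c, sigma) for c in (low, mid, high))
    # Rule firing strengths (min = AND, max = OR).
    r1 = min(dh_low, dv_low)      # both gradients low    -> Background
    r2 = max(dh_high, dv_high)    # either gradient high  -> Edge
    r3 = max(dh_mid, dv_mid)      # either gradient middle-> Edge
    # Aggregate clipped output MFs and defuzzify by centroid over [-5, 4.5].
    y = np.linspace(-5.0, 4.5, 200)
    s_out = 4.75                  # sigma from Equation (24): half the output range
    agg = np.maximum(np.minimum(r1, gaussmf(y, -5.0, s_out)),
                     np.minimum(max(r2, r3), gaussmf(y, 4.5, s_out)))
    return float(np.sum(y * agg) / np.sum(agg))

# Strong gradients defuzzify closer to "Edge" than weak ones.
strong = fuzzy_edge(878.0, 878.0, low=0.0, mid=219.5, high=878.0, sigma=219.5)
weak = fuzzy_edge(0.0, 0.0, low=0.0, mid=219.5, high=878.0, sigma=219.5)
print(strong > weak)  # True
```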

CNN Training and Validation
For this research, we experimented with three architectures: two proposed convolutional neural network models and a pre-trained VGG16 model [49]. The proposed models were crafted with the goal of achieving high levels of accuracy without consuming large amounts of computing resources, unlike the VGG16 or Inception models [50], which are very large models trained for many categories. The CNN-I model (architecture overview in Table 4) consists of four convolution layers, each with rectified linear unit (ReLU) activation and followed by maximum pooling, a dropout of 45%, two dense layers, and the output layer. The CNN-II model (architecture overview in Table 5) is a variant of the first model in which the number of neurons in the convolutional layers is increased to extract as much information as possible from the filters; as in CNN-I, each convolution layer includes a ReLU activation and is followed by a maximum pooling layer with the same filter size. To compensate for the overfitting induced by the larger number of neurons, the dropout layer is increased to turn off 75% of the neurons before passing to a single, but significantly larger, fully connected layer and finally to the output.
The hyperparameters used to train the models were determined by experimentation. After many trials, the best results were obtained using the following parameters: • Batch size: 60.
In the case of the pre-trained model, we used a VGG16 from the TensorFlow library with weights from the ImageNet dataset [51]. The weights were updated through transfer learning: we removed the last (output) layer of the model, froze the training parameters of the remaining layers, added a new classification layer with six neurons (one for each class), and retrained the model with the following parameters:
To train these models, we performed K-fold cross-validation, in which the dataset is divided into multiple training and test sets, called folds, so that the model is evaluated on the entire dataset. Fuzzy preprocessing showed an overall performance improvement over the non-preprocessed images (grayscale and RGB).
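The K-fold procedure described above can be sketched in a few lines: the dataset indices are partitioned into K folds, and each fold takes one turn as the test set while the remaining folds form the training set. The dataset size and K value below are illustrative placeholders, not the paper's actual values.

```python
# Minimal K-fold cross-validation split: every sample appears in
# exactly one test fold, so the model is evaluated on the whole dataset.
def k_fold_indices(n_samples, k):
    # Interleaved partition of the indices into k disjoint folds.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits

splits = k_fold_indices(12, 3)
print([len(test) for _, test in splits])  # [4, 4, 4]
```

Libraries such as scikit-learn provide the same splitting logic out of the box; the point here is only that each of the K trained models sees a different held-out fold.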

Results
To compare the efficiency of the proposed models, we implemented them in Python, using the TensorFlow framework with the Keras API. Summaries of the results of the 30 experiments performed for each preprocessing approach with CNN-I and CNN-II are shown in Tables 6 and 7, respectively, and the results for the VGG16 model are displayed in Table 8. Fuzzy preprocessing showed an overall performance improvement over the grayscale images and the classic edge-detection methodologies. The experiments were performed on a laptop with the following specifications: an Intel Core i7-11800H CPU, 32 GB of DDR4 RAM at 3200 MHz, and an Nvidia RTX 3080 Laptop GPU with 16 GB of video memory. To compare the fuzzy preprocessing techniques against the color images, we calculated ROC curves to obtain a visual representation of the performance for each class. Curves were calculated for both proposed models using the best trained instance of each, as shown in Figure 10 for CNN-I and Figure 11 for CNN-II.
To expand on the ROC results, where the RGB images presented slightly lower accuracy than some of the fuzzy detectors, we calculated the average training time for each model. Figure 12 contains the average training time per model in seconds for CNN-I, and Figure 13 presents the averages for CNN-II.

Statistical Test between Classic and Fuzzy Preprocessing Techniques
To verify the existence of significant evidence of the performance gain obtained with the different fuzzy preprocessing techniques, a Z-test statistical analysis was applied to compare the fuzzy preprocessing approaches (fuzzy MG edge-detection, fuzzy Sobel edge-detection, and fuzzy Prewitt) against the raw images (only resized to fit the model). The tests for each preprocessing methodology were made independently for each model. The parameters used for the tests were the following:
• Right-tailed test;
• α = 0.05 (95% confidence level, rejection zone at zc = 1.96);
• n = 30;
• H0: the fuzzy preprocessing approach (µ1) offers less than or equal accuracy compared to the raw images (µ2). H0: µ1 ≤ µ2;
• Ha: the fuzzy preprocessing approach (µ1) offers more accuracy than the raw images (µ2). Ha: µ1 > µ2 (affirmation).
The results of the null hypothesis test for CNN-I are shown in Table 9. The x̄1 and σ1 variables represent the mean and standard deviation, respectively, for the fuzzy preprocessing approach; x̄2 and σ2 represent the mean and standard deviation, respectively, for the raw (RGB) and grayscale images. The results of the Z-test for the CNN-II model are then shown in Table 10.
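The two-sample Z statistic behind Tables 9 and 10 can be computed directly from the per-run accuracy means and standard deviations. The sketch below uses illustrative placeholder values, not the paper's measurements; only the formula and the 1.96 rejection threshold come from the test setup described above.

```python
import math

# Right-tailed two-sample Z-test: compare the mean accuracy of n = 30
# runs with fuzzy preprocessing (group 1) against n = 30 runs on raw
# images (group 2). Reject H0 (mu1 <= mu2) when z exceeds zc = 1.96.
def z_score(mean1, std1, mean2, std2, n1, n2):
    return (mean1 - mean2) / math.sqrt(std1**2 / n1 + std2**2 / n2)

# Illustrative accuracies: 70% +/- 2% vs. 66% +/- 3% over 30 runs each.
z = z_score(mean1=0.70, std1=0.02, mean2=0.66, std2=0.03, n1=30, n2=30)
print(z > 1.96)  # True: significant at the 95% confidence level
```

With these placeholder numbers the statistic lands well past the rejection zone; in the paper, only some of the fuzzy-versus-raw comparisons cross the 1.96 threshold.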

Discussion
As demonstrated in the literature review, and supported by the results obtained, implementing fuzzy image preprocessing before feeding the images into a convolutional neural network proved beneficial in most instances for improving model accuracy. As shown in Tables 6 and 7, the maximum accuracy for CNN-I was obtained with fuzzy Sobel and fuzzy Prewitt, with a value of 71.76% in both cases over the 30 experiments; for CNN-II, fuzzy Sobel achieved the best performance, with a maximum accuracy of 75.18%. These are competitive results against RGB images, which require no preprocessing but need more computational resources and time to train the model.
When reviewing the classical edge-detection methodologies for CNN-I, we note that Sobel and MG preprocessing tended to decrease the accuracy, with averages of 53.07% for MG and 64.72% for Sobel. With Prewitt, we obtained the best average for the model, 67.63%, better even than the RGB images at 66.87%. The fuzzy preprocessing averages offered a slight improvement of almost 1% over the RGB images in all instances. The results obtained with CNN-II followed a similar tendency, where classic Sobel and MG decreased the average by almost 15% for MG and 4% for Sobel. We also noted comparable averages between the fuzzy approaches and Prewitt preprocessing; all four options were comparable to RGB images within a delta of ±1%.
In the case of CNN-II, even though the model had an overall accuracy improvement compared to CNN-I, we note a similar pattern, where classical edge-detection methodologies suffered an accuracy loss compared to grayscale images, though applying fuzzy filters allowed us to surpass the grayscale approach. Here, the performance loss with MG edge-detection was more significant, with 14.34% lower accuracy, while Sobel edge-detection had a 2.41% accuracy loss on average. Fuzzy Prewitt reported a 1.08% accuracy improvement over grayscale preprocessing and a 0.18% accuracy improvement over the color images.
The results obtained with the pre-trained VGG16 model, reported in Table 8, were not as impressive as those of the specialized models, even without preprocessing, and especially when compared to CNN-II with fuzzy Sobel, whose best models gave over a 1.9% accuracy improvement. The pre-trained model could still represent a decent out-of-the-box implementation, without taking into consideration the training times needed, because it utilizes 224 × 224 px RGB images instead of the 150 × 150 px grayscale images used for fuzzy Sobel. The results can be validated with the ROC curves shown in Figures 10 and 11, which show the sensitivity and specificity of the models. Each class in every model performs similarly, with the models achieving a decent separation of the classes. The models present similar levels of area under the curve (AUC) due to the relatively small differences between them; we found no more than a 2% difference between preprocessing methodologies and no more than a 5% difference between the best CNN-I models and the best CNN-II models.
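For reference, a per-class ROC curve and its AUC, as plotted in Figures 10 and 11, can be computed from one-vs-rest scores by sweeping a threshold over the sorted predictions. The labels and scores below are illustrative placeholders, not outputs of the trained models.

```python
# Sketch: ROC points and trapezoidal AUC for one class (one-vs-rest).
def roc_points(labels, scores):
    # Sweep the decision threshold from high to low score.
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR) pairs
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal integration over the (FPR, TPR) curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]                 # 1 = target class
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # model confidence
print(round(auc(roc_points(labels, scores)), 3))  # 0.889
```

In practice, libraries such as scikit-learn provide this computation directly; the sketch only makes explicit what the per-class curves in the figures summarize.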
The ROC curves shown in Figures 10 and 11 demonstrate similar behavior in terms of accuracy between the fuzzy preprocessing and the color (RGB) images. To evaluate the effectiveness of the proposed methodologies in both architectures, we performed a time analysis by training 30 independent models to evaluate the average training time, and we calculated the average preprocessing time for the images (Figures 12 and 13). With CNN-I, we noted an average reduction of 23% in training time with fuzzy preprocessing. With the more complex CNN-II architecture, we had a time reduction of 18.5%. In both instances, we achieved similar accuracy rates with a significant reduction in training time. These reductions can be significant in similar applications in which the models need to be trained with more information.
The models trained with a preprocessing step showed a clear advantage in training time over the color images, due to the use of a single channel instead of the three channels of RGB images. The trade-off between the reduced training times and the additional preprocessing time is directly reflected in GPU usage, which normally represents the highest upfront cost, compared to CPU usage, in the development of these types of systems.
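A back-of-the-envelope calculation shows where the single-channel advantage comes from: the input tensor is three times smaller, and the first convolution layer has roughly three times fewer weights. The 3×3 kernel and 32-filter width below are illustrative assumptions; the paper's exact filter counts are in Tables 4 and 5.

```python
# Input values per 150x150 image: grayscale edge map vs. RGB.
gray_values = 150 * 150 * 1
rgb_values = 150 * 150 * 3
print(gray_values, rgb_values)  # 22500 67500

# First-convolution-layer parameters: kernel weights per input channel
# per filter, plus one bias per filter (3x3 kernel, 32 filters assumed).
def first_conv_params(channels, filters=32, kernel=3):
    return kernel * kernel * channels * filters + filters

print(first_conv_params(1), first_conv_params(3))  # 320 896
```

Only the first layer is affected by the channel count; the larger contributor to the observed training-time reduction is the smaller input tensor moved through the pipeline each batch.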
The results of the statistical analysis, shown in Tables 9 and 10, indicate that for CNN-I, the comparison between fuzzy preprocessing and RGB images revealed a significant difference for the fuzzy Prewitt application, with a Z-score of 2.3. In the other two instances, the fuzzy filters presented better accuracy, but the threshold was not reached. On the other hand, when comparing the fuzzy filters against the grayscale images, all the Z-scores surpassed the 1.96 threshold, providing enough evidence to accept our alternative hypothesis and making the fuzzy preprocessing step a compelling alternative for improving accuracy. The results obtained for CNN-II demonstrated sufficient evidence to validate the usage of only one fuzzy filter (Prewitt) compared to the grayscale model. The main reason was the narrow margin between the results: the improvements in accuracy were not as tangible as for CNN-I, which, having less capacity to extract features, benefitted from the fuzzy edge-detectors on multiple occasions.

Conclusions and Future Work
The experimentation and statistical analysis showed that implementing fuzzy edge-detection on images before feeding them into the convolutional layers of a DNN can, in some instances, significantly improve the accuracy of the trained models while reducing training times, especially in model CNN-I, where fewer filters are used for feature extraction. On the other hand, model CNN-II, with more filters per layer, demonstrated improved accuracy, but no significant evidence was found to validate the usage of preprocessing layers.
When investigating the pre-trained VGG16 model, we found a 2% reduction in accuracy compared to CNN-II with the RGB images, making it a poor alternative, especially considering the model's size and training times.
We believe that the main limiting factor affecting our results was the complexity of our training data, with vast differences between images within the same category of the dataset, as well as similarities between the classes. In the future, we would like to develop a more curated version of the dataset that avoids the large variation in the current version, and to apply the proposed preprocessing methodology and models to benchmark datasets, to compare their efficiency against state-of-the-art models.