Classification of the Microstructural Elements of the Vegetal Tissue of the Pumpkin (Cucurbita pepo L.) Using Convolutional Neural Networks

Although knowledge of the microstructure of food of vegetal origin helps us to understand the behavior of food materials, the variability in the microstructural elements complicates this analysis. In this regard, building learning models that represent the actual microstructures of the tissue is important to extract relevant information and to advance the understanding of such behavior. Consequently, the objective of this research is to compare two machine learning techniques, Convolutional Neural Networks (CNN) and Radial Basis Neural Networks (RBNN), when used for the microstructural analysis of vegetal tissue. Two main contributions can be highlighted from this research. First, a method is proposed to automatically analyze the microstructural elements of vegetal tissue; and second, a comparison was conducted to select a classifier to discriminate between tissue structures. For the comparison, a database of images of microstructural elements was obtained from pumpkin (Cucurbita pepo L.) micrographs. Two classifiers were implemented using CNN and RBNN, and statistical performance metrics were computed using a 5-fold cross-validation scheme. This process was repeated one hundred times with a random selection of images in each repetition. The comparison showed that the classifiers based on CNN produced a better fit, obtaining an average F1-score of 89.42%, compared with 83.83% for RBNN. In this study, the performance of the CNN-based classifiers was significantly higher than that of the RBNN-based classifiers in the discrimination of microstructural elements of vegetable foods.


Introduction
The transport phenomena, mechanical behavior, and sensory characteristics of food products depend on their structure and on how it is modified during production processes; these dependencies constitute the structure-property-process relationships [1,2]; see Figure 1. Examples of characteristics analyzed in food products during processing include the level of oil absorption during frying of tortilla chips [3] and the presence of pathogens in papaya fruits [4], among others. Hence, the relationships in Figure 1 are important research areas in food engineering, because they can be used to develop models that predict the properties of food products at different structural levels (i.e., molecular, microscopic, mesoscopic, and macroscopic scales) [5]. A main aim of research in this area is to construct classification models of food products and processes for the morphological analysis of microstructures [2]. The structure of foods of vegetable origin is determined by how cells, intercellular spaces, and interconnections are distributed throughout the food [5,6]. Although visual inspection of microstructures commonly provides rich information for understanding the behavior of food products, it requires time and effort that automated systems can reduce. Consequently, techniques and methodologies are needed to analyze the distribution of the different microstructural elements in food tissues.
The techniques commonly used for microstructural analysis are applied to micrographs obtained by optical, electron, confocal, or atomic force microscopy. However, this analysis is usually highly complex because plant tissues have interconnected structural elements; see Figure 2. Therefore, some authors have proposed carrying out this task semi-automatically, combining the computing capabilities of specialized software with trained operators [6][7][8][9]. Researchers such as Oblitas et al. [9] and Pieczywek and Zdunek [6] have tested the feasibility of machine learning techniques, namely neural networks and Bayesian networks, respectively, to discriminate microstructural elements in apple tissues. In both cases, the discrimination was performed semi-automatically using morphogeometric parameters, with an accuracy of around 90%.
More recently, deep learning techniques have shown good performance in the pattern recognition literature, using different types of neural networks such as Convolutional Neural Networks (CNNs) [10,11]. These networks have been applied to discriminate between normal and abnormal red blood cells [12] and to identify cells infected with malaria [13], among other applications. The architecture of a CNN can encode enough information to extract, directly from the samples, the features that represent the elements to be classified [10]. However, to date, CNNs have not been reported for the discrimination of microstructures in plant-based food.
In this paper, classification models based on radial basis neural networks and convolutional neural networks are compared for the discrimination of microstructures in vegetal foods. The paper is organized as follows: Section 2 describes the materials and experimental methodology used in the comparison. Section 3 presents the experimental results and discusses the relevant findings and their impact on practice. Finally, Section 4 draws conclusions and makes some recommendations for future work.

Obtaining Micrographs
The digitized micrographs used in this study were provided by L. Mayor and were used in previous works [7,14]. The procedure followed to capture them can be summarized in the following five steps:
• Pumpkin fruits (Cucurbita pepo L.) were collected and stored at 15-20 °C. Cylinders (25 mm long, 15 mm in diameter) were taken from the middle zone of the mesocarp, parallel to the fruit's major axis.
• A rectangular slab 0.5-1.0 mm thick was gently cut parallel to the cylinder's height at the section of maximum area. The slab was then divided into four symmetrical pieces, and each quarter was further divided into six parts.
• These parts were fixed in 2.5% glutaraldehyde in 1.25% PIPES buffer at pH 7.0-7.2 for 24 h at room temperature. They were then dehydrated in a water/ethanol series and embedded in LR White resin (London Resin Co., Basingstoke, UK). After embedding, semi-thin sections (0.6 µm) of the resin blocks were obtained with a microtome (model Reichert-Supernova, Leica, Wien, Austria).
• The sections were stained with an aqueous solution of 0.5% Azure II, 0.5% Methylene Blue, and 0.5% Borax for 30 s. They were then washed in distilled water and mounted on a glass slide.
• Micrographs of the stained samples were obtained under a stereomicroscope (Olympus SZ-11, Tokyo, Japan) attached to a digital color video camera (SONY SSC-DC50AP, Tokyo, Japan) and a computer.

Digital Treatment of the Micrographs
The micrographs were captured in RGB format with the camera attached to the Olympus SZ-11 stereomicroscope and were then converted to grayscale to facilitate processing. Next, image enhancement was applied to facilitate (1) edge extraction, (2) segmentation, and (3) classification, a standard procedure for extracting color features [15][16][17] or morphogeometric parameters [6,9,14]. The main steps are described below.

Improvement and Enhancement
The micrographs were initially converted from RGB (Red-Green-Blue) format to grayscale (intensity image) using Equation (1).
where I_gray is the grayscale image, and I_R, I_G, and I_B are the red, green, and blue channels of the image, respectively. Next, the grayscale images were smoothed with the Gaussian filter shown in Equation (2) to reduce visual artifacts [15,18], where g(x, y) is the value of the filter centered at position (x, y) of the image, and σ is the standard deviation of the Gaussian filter.
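Equations (1) and (2) are not reproduced in this text, so the following is only a minimal NumPy sketch of the two steps under stated assumptions: the grayscale conversion uses the ITU-R BT.601 luma weights (the ones MATLAB's rgb2gray applies), and the Gaussian filter is a sampled kernel g(x, y) proportional to exp(-(x² + y²)/(2σ²)), normalized to sum to 1. The function names, the 3σ kernel radius, edge replication at the borders, and σ = 1 are illustrative choices, not the authors' settings.

```python
from typing import Optional

import numpy as np


def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Weighted sum of the R, G and B channels.

    Assumption: BT.601 luma weights (as in MATLAB's rgb2gray); the paper's
    Equation (1) may use different weights.
    """
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb[..., :3].astype(float) @ weights


def gaussian_kernel(sigma: float, radius: Optional[int] = None) -> np.ndarray:
    """Sampled 2-D Gaussian g(x, y) ~ exp(-(x^2 + y^2) / (2 sigma^2)), summing to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # +/- 3 sigma holds almost all of the mass
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()


def gaussian_smooth(gray: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Filter a grayscale image with the Gaussian kernel, replicating edge pixels."""
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(gray, r, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    # Accumulate shifted copies weighted by the kernel; the kernel is symmetric,
    # so this correlation equals a convolution
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


# Usage on a random stand-in for an RGB micrograph
rgb = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
smoothed = gaussian_smooth(rgb_to_gray(rgb), sigma=1.0)
print(smoothed.shape)  # (64, 64)
```

Because the kernel weights sum to 1, a uniform region keeps its gray level after smoothing; only intensity fluctuations are attenuated.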
In Equations (3)-(5), I_BIN is the binarized image; T is the threshold value for binarization; (x, y) is the pixel position; S(A) is the skeletonization function; γ and ε are the morphological opening and erosion operators; S_ρ(A) is the set of centers of maximal balls of radius ρ included in A; and B_ρ (respectively, B̄_ρ) denotes the open (respectively, closed) ball of radius ρ. In addition, elements with an area smaller than 80 pixels or in contact with the image border were removed (see Equation (6)) [20].
where f and f_c are the original and cleared images, respectively, and (x, y) is the pixel position.
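The binarization and clearing steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that foreground pixels are those brighter than T, picks T with Otsu's method (the text only names a threshold T), and uses 4-connectivity to define elements; the 80-pixel limit and the removal of border-touching elements come from the text. Skeletonization (Equations (4) and (5)) is omitted for brevity.

```python
from collections import deque

import numpy as np


def otsu_threshold(gray: np.ndarray, bins: int = 256) -> float:
    """Choose T by maximizing the between-class variance (Otsu's method, assumed)."""
    hist, edges = np.histogram(gray, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)               # pixels at or below each candidate T
    w1 = w0[-1] - w0                   # pixels above it
    s0 = np.cumsum(hist * centers)     # intensity sum of the lower class
    mean0 = s0 / np.maximum(w0, 1)
    mean1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mean0 - mean1) ** 2
    return float(centers[np.argmax(between)])


def clear_image(binary: np.ndarray, min_area: int = 80) -> np.ndarray:
    """Drop 4-connected regions smaller than min_area pixels or touching the border."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    cleared = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill collects one connected region
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            pixels, touches_border = [], False
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area and not touches_border:
                ys, xs = zip(*pixels)
                cleared[list(ys), list(xs)] = True
    return cleared


# Synthetic example: three bright regions on a dark background
img = np.zeros((40, 40))
img[5:15, 5:15] = 1.0     # 100 px, interior: kept
img[20:25, 20:25] = 1.0   # 25 px: removed (smaller than 80 px)
img[30:40, 0:10] = 1.0    # touches the border: removed
binary = img > otsu_threshold(img)     # I_BIN(x, y) = 1 where intensity exceeds T
print(int(clear_image(binary).sum()))  # 100
```

Removing border-touching elements avoids measuring cells that are only partially visible, whose area and perimeter would otherwise be underestimated.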

Data Extraction
Each previously labeled element was manually classified as a cell or an intercellular space, creating two subsets (folders) of images. Next, morphological features were obtained according to Mayor et al. [14], and four of them were selected following the recommendations of Oblitas et al. [9], who determined the optimal parameters of an RBNN for the discrimination of microstructural elements in Cucurbita pepo L. tissue using an exhaustive search algorithm. Figure 3 illustrates the selected morphological features.
The resulting feature values were then used to design, implement, and analyze the RBNN-based models. For the CNN, each element in both subsets was resized to 227 × 227 pixels, the input size of AlexNet.
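As a hedged illustration of the kind of morphogeometric features involved, the sketch below computes, for one binary element, its area, perimeter, minor axis length, and roundness; the last three are features mentioned later in the results, and the area is needed for roundness. The formulas are assumptions and may differ from those of Mayor et al. [14]: the perimeter is counted as pixel edges shared with the background (which overestimates diagonal boundaries), the minor axis is that of the ellipse with the same second central moments (the convention MATLAB's regionprops uses), and roundness is taken as 4πA/P².

```python
import numpy as np


def region_features(mask: np.ndarray) -> dict:
    """Area, perimeter, minor axis length and roundness of one binary element."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # Perimeter: number of pixel edges between the element and the background
    padded = np.pad(mask, 1).astype(int)
    perimeter = int(np.abs(np.diff(padded, axis=0)).sum() + np.abs(np.diff(padded, axis=1)).sum())
    # Minor axis of the ellipse with the same second central moments;
    # the 1/12 term accounts for the extent of each unit pixel
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.vstack([xs, ys]), bias=True) + np.eye(2) / 12.0
    minor_axis = 4.0 * np.sqrt(np.linalg.eigvalsh(cov)[0])  # eigenvalues ascend
    # Roundness: 1 for an ideal disc, smaller for elongated or irregular shapes
    roundness = 4.0 * np.pi * area / perimeter**2
    return {"area": area, "perimeter": perimeter,
            "minor_axis": float(minor_axis), "roundness": float(roundness)}


# Usage on a 10 x 10 square element
square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True
feats = region_features(square)
print(feats["area"], feats["perimeter"])  # 100 40
```

One feature vector per element, built this way, would form the input rows of the RBNN-based models.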

Radial Basis Neural Network-RBNN
According to Oblitas et al. [9], Artificial Neural Networks (ANNs) were inspired by the human nervous system and combine the complexity of statistical techniques with self-learning, imitating the human cognitive process. A general scheme of an ANN is shown in Figure 4. ANNs contain a complex set of interdependencies and may incorporate some degree of nonlinearity, which allows them to address nonlinear problems.
A particular kind of ANN, the Radial Basis Neural Network (RBNN), is a feed-forward network specialized for classification. Its principal characteristics are that its design parameter is the spread of the radial basis transfer function and that little training is required (apart from optimizing the spread) [21,22].
The general RBNN structure used in this study is shown in Figure 5. The first layer computes the distances from the input vector to the training input vectors and produces a vector whose elements indicate how close the input is to each training input. The second layer sums these contributions for each class of inputs to produce a vector of probabilities as its net output. Finally, a competitive output layer picks the maximum of these probabilities and produces a 1 for that class and a 0 for the other classes.
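The three layers described above can be sketched in NumPy as a probabilistic (radial basis) classifier. This is a minimal sketch, not the authors' code: the Gaussian basis exp(-(b·d)²) with bias b = sqrt(ln 2)/spread, which gives an output of 0.5 when the distance d equals the spread, follows the convention of MATLAB's newpnn and is an assumption about the tooling; normalizing the class sums is added here only so that they read as probabilities. The class name and the toy feature vectors are hypothetical.

```python
import numpy as np


class ProbabilisticRBNN:
    """Radial basis layer, class-summation layer and competitive output layer."""

    def __init__(self, spread: float = 1.0):
        self.spread = spread  # the design parameter of the network

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ProbabilisticRBNN":
        # "Training" only stores the training vectors and their class labels
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        X = np.asarray(X, dtype=float)
        # Layer 1: distance from each input to every training vector, mapped to
        # a closeness in (0, 1]; output is 0.5 when the distance equals the spread
        d = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        b = np.sqrt(np.log(2.0)) / self.spread
        a = np.exp(-(b * d) ** 2)
        # Layer 2: sum the contributions of the training vectors of each class
        sums = np.stack([a[:, self.y_ == c].sum(axis=1) for c in self.classes_], axis=1)
        return sums / np.maximum(sums.sum(axis=1, keepdims=True), 1e-300)

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Layer 3: competitive output, the class with the largest value wins
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Toy 2-D feature vectors (hypothetical values, not data from the study)
X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]])
y = np.array(["cell", "cell", "space", "space"])
model = ProbabilisticRBNN(spread=1.0).fit(X, y)
queries = np.array([[0.2, 0.1], [2.8, 3.1]])
print(model.predict(queries))  # ['cell' 'space']
```

Since fitting only stores the training set, tuning the spread (for example, by cross-validation) is the main training cost, which matches the remark that little training is required apart from spread optimization.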

Convolutional Neural Network-CNN
The AlexNet CNN, whose general structure is shown in Figure 6, was used in this study. AlexNet has five convolutional layers and three fully connected layers. Each convolutional layer contains multiple 3D filters (kernels) connected to the output of the previous layer. The fully connected layers contain multiple neurons, with positive values, connected to the previous layer [23]. AlexNet was selected because it is one of the best-known and most widely used CNN architectures for image classification. In addition, MATLAB provides an AlexNet model that has been trained on more than one million images and can classify objects into 1000 categories [24].
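To make the layer structure concrete, the short calculation below traces the spatial size of a 227 × 227 × 3 input (the input size of the AlexNet model distributed with MATLAB) through the convolutional and pooling layers. The kernel, stride, and padding values are those of the original AlexNet design as commonly reproduced; the code tracks shapes only, not weights, and the helper names are illustrative.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or max-pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling)
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96), ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256), ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384), ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256), ("pool5", 3, 2, 0, None),
]
# The flattened pool5 output then feeds fc6 (4096) -> fc7 (4096) -> fc8 (1000 classes)


def trace_alexnet(input_size: int = 227):
    """Return (layer, height, width, channels) after each feature layer."""
    size, channels = input_size, 3
    shapes = []
    for name, kernel, stride, pad, out_channels in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, pad)
        if out_channels is not None:  # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace_alexnet()
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")
_, h, w, c = shapes[-1]
print("inputs to fc6:", h * w * c)  # inputs to fc6: 9216
```

The fixed 6 × 6 × 256 output of the last pooling layer is why every element image must be resized to the same input size before it enters the network.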

Transfer Learning
The AlexNet model available in MATLAB R2018a was modified to reuse its previously trained design, similarly to Zhou et al. [23] and Lu et al. [25]. Essentially, the parameters of the last three layers were modified so that the training of the remaining layers was transferred and the network was adapted to classify the previously established classes; see Figure 7.
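The idea of keeping the pretrained layers fixed and training only new final layers can be sketched without a deep learning framework. In this hypothetical NumPy example, a fixed random ReLU projection stands in for AlexNet's frozen pretrained layers, and only a new fully connected layer followed by softmax is fitted by gradient descent on cross-entropy. In MATLAB's documented transfer-learning workflow, the replaced layers are typically the final fully connected, softmax, and classification output layers; the sizes, learning rate, and toy data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen pretrained layers (AlexNet in the paper)
W_FROZEN = rng.normal(size=(16, 64)) / 4.0


def frozen_features(x: np.ndarray) -> np.ndarray:
    """Features from the frozen layers: a fixed ReLU projection, never updated."""
    return np.maximum(x @ W_FROZEN, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p: np.ndarray, y: np.ndarray) -> float:
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))


def train_new_head(x, y, n_classes=2, lr=0.01, epochs=300):
    """Train only the replacement fully connected layer (weights W, bias b)."""
    f = frozen_features(x)
    W = np.zeros((f.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        p = softmax(f @ W + b)
        losses.append(cross_entropy(p, y))
        grad = (p - onehot) / len(y)  # gradient of the mean loss w.r.t. the logits
        W -= lr * f.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses


# Toy two-class inputs standing in for cell / intercellular-space images
x = np.vstack([rng.normal(-1.0, 1.0, size=(50, 16)), rng.normal(1.0, 1.0, size=(50, 16))])
y = np.array([0] * 50 + [1] * 50)
frozen_before = W_FROZEN.copy()
W, b, losses = train_new_head(x, y)
pred = np.argmax(frozen_features(x) @ W + b, axis=1)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy {np.mean(pred == y):.2f}")
```

Only the small final layer is learned, which is why transfer learning can work with far fewer labeled images than training all layers from scratch.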

Statistical Comparison of Models
The cells and intercellular spaces were randomly divided into five groups. These groups were then used to train, test, and validate the CNN-based models with k-fold cross-validation (k = 5). For the RBNN-based models, the features and parameters of the microstructural elements were obtained as described in Section 2.2.2, and k-fold cross-validation was implemented as for the CNN models.
This process was repeated one hundred times to evaluate the robustness of the method, computing the confusion matrices for this purpose; Figure 8 shows the basic form of the confusion matrix for binary classification. The confusion matrix is one of the most commonly used tools in the machine learning community and contains information about the actual and predicted classes obtained by a classification system. It has two dimensions, real and predicted classes: each row represents the instances of a real class, whereas each column represents the instances of a predicted class. In binary classification, the cells contain TP (true positives, correctly identified), TN (true negatives, correctly rejected), FP (false positives, incorrectly identified), and FN (false negatives, incorrectly rejected).
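The evaluation protocol, 5-fold cross-validation repeated one hundred times with a fresh random split each time, can be sketched with the Python standard library. Assumptions not stated in the text: the folds are plain random splits (not stratified by class), the confusion matrix of each repetition is pooled over its five test folds, and the matrix is laid out as [[TP, FN], [FP, TN]] with the positive class first. The midpoint classifier is a hypothetical placeholder for the CNN or RBNN.

```python
import random
from typing import List


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Randomly split indices 0..n-1 into k folds of (nearly) equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_matrix(y_true, y_pred, positive=1) -> List[List[int]]:
    """Rows = actual class, columns = predicted class: [[TP, FN], [FP, TN]]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return [[tp, fn], [fp, tn]]


def repeated_cv(X, y, fit_predict, k=5, repeats=100, seed=0):
    """Return one confusion matrix per repetition, pooled over its k test folds."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        total = [[0, 0], [0, 0]]
        for fold in k_fold_indices(len(y), k, rng):
            test = set(fold)
            train = [i for i in range(len(y)) if i not in test]
            y_hat = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
            cm = confusion_matrix([y[i] for i in fold], y_hat)
            for r in range(2):
                for c in range(2):
                    total[r][c] += cm[r][c]
        results.append(total)
    return results


def midpoint_classifier(X_train, y_train, X_test):
    """Hypothetical 1-D classifier: threshold halfway between the class means."""
    m0 = sum(x for x, t in zip(X_train, y_train) if t == 0) / y_train.count(0)
    m1 = sum(x for x, t in zip(X_train, y_train) if t == 1) / y_train.count(1)
    thr = (m0 + m1) / 2
    return [1 if x > thr else 0 for x in X_test]


X = list(range(10)) + list(range(20, 30))
y = [0] * 10 + [1] * 10
cms = repeated_cv(X, y, midpoint_classifier, k=5, repeats=100)
print(len(cms), cms[0])  # 100 [[10, 0], [0, 10]]
```

Reshuffling before every repetition is what makes the spread of the one hundred resulting scores a measure of the models' stability.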
Several performance measures can be defined from the information contained in a confusion matrix, among them accuracy, recall, precision, and the F-measure (F1-score). These measures are determined by the numbers of hits and errors made by the classifier, as expressed by Equations (7)-(10).

• Accuracy: This measures how many observations, both positive and negative, were correctly classified: Accuracy = (TP + TN)/(TP + TN + FP + FN) (Equation (7)).
• Recall: This measures how many of all the positive observations were classified as positive: Recall = TP/(TP + FN) (Equation (8)).
• Precision: This measures how many of the observations predicted as positive are, in fact, positive: Precision = TP/(TP + FP) (Equation (9)).
• F1-score: This combines precision and recall into a single metric, their harmonic mean: F1 = 2 × Precision × Recall/(Precision + Recall) (Equation (10)).

The images obtained from the pre-processing step were good enough for visual identification of cellular structures. However, because capture conditions differed between micrographs, some parameters could be adapted (or optimized) to maximize the information extracted from each micrograph. The processing parameters to be tuned include the type of pre-processing filter, the size of the convolution mask, and the number of filtering repetitions, among others.

Microstructural Elements
Although the number of elements available for training and validating the neural network-based models is small compared with those used by Pieczywek and Zdunek [6] and Kraus et al. [26], it is sufficient to assess their viability for discriminating microstructures in vegetal tissue. The images in Figure 10 show that the manual classification is based on morphogeometric features; however, the irregular geometry of the intercellular spaces and the image quality make pre-processing (enhancement and improvement) difficult. Consequently, the elements in the image are not easy to recognize [27,28]. It is therefore understandable that the results differ from those of other methods of determining features, such as manual segmentation with software like ImageJ (National Institutes of Mental Health, Bethesda, Maryland, USA) or Adobe Photoshop (Adobe Systems Inc., San José, CA, USA). Figure 11 shows the values of the selected features for the elements manually classified as cells and intercellular spaces. The ranges covered by the selected morphogeometric features of both microstructural elements overlap. However, the medians and the first two quartiles differ in all cases, especially for the perimeter, the length of the minor axis, and the roundness. These differences allow the features to be used for classification with different techniques, as in the works of Pieczywek and Zdunek [6] using Bayesian networks and Oblitas et al. [9] using probabilistic neural networks.

CNN Implementation
Figure 12 shows the accuracy of the CNN training process for one of the one hundred iterations that were averaged to determine the system's accuracy. As can be seen, the validation accuracy stabilizes at around 87% from the third iteration. As noted by Baker et al. [29], this level is plausible because CNNs tend to be more error-prone when they rely exclusively on the silhouette of objects, without texture information.

Statistical Analysis
The statistical measures obtained for the CNN- and RBNN-based classification models are shown in Figure 13. As can be seen, except for recall, both the median and the range values were better for the CNN.
The median F1-scores obtained in our experiment were 89.42% and 85.43% for the CNN and RBNN models, respectively. However, these values are lower than those obtained by Pieczywek and Zdunek [6] (over 97%), which could be due to smaller differences between the microstructural elements of Cucurbita pepo L. than between those of Malus domestica Borkh., mainly in their angularity and roundness.
The classification capacity of both models (CNN and RBNN) was compared through the accuracy, recall, precision, and F1-score indicators. The results in Table 1 show that the CNN achieves higher performance on all four indicators. A t-test yielded a p-value below 0.05, indicating a statistically significant difference between the means of both models. The significant superiority of the CNN's performance likely reflects its ability to encode classification information in its internal layers, which the RBNN structure lacks. Regarding the standard deviation of the F1-score, both classifiers show similar stability, with little variation across the one hundred iterations performed with the k-fold cross-validation strategy. This stability suggests that both models have good generalization capacity.
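The four indicators and the mean comparison can be sketched with the Python standard library. The metric formulas are the standard definitions from binary confusion counts. The text does not say which t-test variant was used, so Welch's unequal-variance t statistic is an assumption here, and its two-sided p-value is approximated with the normal distribution, which is reasonable only for large samples such as one hundred repetitions per model. The counts and score lists in the usage lines are made-up illustrations, not results from the study.

```python
import math
import statistics


def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, recall, precision and F1-score from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


def welch_t(a, b):
    """Welch's t statistic and a two-sided p-value (normal approximation, assumed)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    t = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Hypothetical confusion counts for one repetition
m = metrics(tp=45, fn=5, fp=7, tn=43)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.88, 'recall': 0.9, 'precision': 0.865, 'f1': 0.882}

# Hypothetical per-repetition F1-scores of two models (too few runs for the
# normal approximation to be accurate; shown only to illustrate the call)
t, p = welch_t([0.90, 0.88, 0.91, 0.89], [0.84, 0.86, 0.83, 0.85])
print(f"t = {t:.2f}, p = {p:.1e}")
```

Note that F1 = 2TP/(2TP + FP + FN), so, unlike accuracy, it ignores true negatives and is sensitive to errors on the positive class.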

Although AlexNet classifies using its fully connected layers and the softmax function, some works have improved the fit by coupling an additional classifier to the CNN, such as those of Rohmatillah et al. [30] and Sadanandan et al. [31], while others have created specific CNN architectures. The latter, however, require much more extensive training databases, as in the works of Sharma et al. [32], Song et al. [33], and Akram et al. [34], among others. Further studies should therefore test these approaches for the discrimination of microstructural elements in food of vegetal origin.

Conclusions
This work proposed the use of machine learning techniques to discriminate microstructural elements in vegetal tissues; this is the first report of the use of CNNs to discriminate microstructural elements in food of vegetal origin. As a case study, the microstructures of Cucurbita pepo L. tissue were classified. The CNN and RBNN techniques were compared to evaluate differences in performance measures derived from the confusion matrix. The results show that both methods produce relatively good discrimination compared with other studies, with median F1-scores of 89.42% and 85.43% for the CNN and RBNN, respectively. However, the CNN performance was consistently and significantly higher. Likewise, in terms of stability, both models showed reduced variation, as evaluated by the F-test, which indicates that both have good generalization capacity. Consequently, the CNN technique shows potential for discriminating microstructural elements in tissue of vegetal origin, with advantages over the RBNN.
Future work will analyze the effect of different convolutional network architectures (number and size of layers, filters, and discrimination functions) on the discrimination of cellular structures and similar tasks. Finally, tissue structures from other vegetables may be analyzed to assess the usefulness of the proposed method in different applications.