Article

Induction of Convolutional Decision Trees for Semantic Segmentation of Color Images Using Differential Evolution and Time and Memory Reduction Techniques

by Adriana-Laura López-Lobato *, Héctor-Gabriel Acosta-Mesa and Efrén Mezura-Montes

Artificial Intelligence Research Institute, University of Veracruz, Campus Sur, Calle Paseo Lote II, Sección Segunda No. 112, Nuevo Xalapa 91097, Mexico

* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2025, 30(3), 53; https://doi.org/10.3390/mca30030053
Submission received: 15 April 2025 / Revised: 3 May 2025 / Accepted: 8 May 2025 / Published: 10 May 2025
(This article belongs to the Special Issue Feature Papers in Mathematical and Computational Applications 2025)

Abstract:
Convolutional Decision Trees (CDTs) are machine learning models used as interpretable methods for image segmentation. Their graphical structure enables a relatively simple interpretation of how the tree successively divides the image pixels into two classes, distinguishing between objects of interest and the image background. Several techniques have been proposed to induce CDTs. However, they have focused primarily on grayscale images due to the computational cost of the Differential Evolution (DE) algorithm employed in these techniques. This paper proposes a generalization of the induction process of a CDT with the DE algorithm to color images, implementing two techniques to reduce the computational time and memory required by the induction process: the median selection technique and a memory of previously evaluated solutions. The first technique selects a representative sample of pixels from an image for the model's training process, and the second reduces the number of evaluations of the fitness function in the DE process. The efficacy of these techniques was evaluated using the Weizmann Horse and DRIVE datasets, resulting in favorable outcomes in terms of the segmentation performance of the induced CDTs and the processing time and memory required for the induction process.

1. Introduction

Semantic segmentation is a process that involves the labeling of image pixels to distinguish between objects of interest and the background of the image; see Figure 1. This task is challenging since the analyzed images have variations that can add noise to the segmentation process.
Image segmentation has numerous applications in computer vision, including analyzing medical images for diagnostic purposes [1,2]. Many methodologies have been devised to address this issue [3]; however, contemporary research has demonstrated a marked predilection for deploying Convolutional Neural Networks (CNNs) [4]. This is primarily because CNNs have been shown to yield optimal performance across a diverse array of applications. Nevertheless, CNNs are often characterized as “black boxes” due to the complexity of their internal processes, making it challenging to comprehend their decision-making processes [5]. This limitation is particularly problematic in domains where explainability is paramount, such as medicine.
Convolutional Decision Trees (CDTs) [6] present a viable alternative to CNNs for image segmentation. This method involves the construction of a multivariate decision tree, where a convolution kernel defines each condition on the tree nodes. These kernels are learned in a supervised manner by solving an optimization problem that aims to maximize the classification accuracy of the pixels in the segmentation of grayscale images.
Several techniques have been proposed to induce CDTs, considering different approaches to maximize the accuracy in classifying the image pixels for the segmentation task. The original approach uses an analytical process to maximize the information gain function on each node of the tree [6]. Consequently, it is a local greedy search that partitions the data (pixels) on each node of the CDT, but it can result in overfitting during the learning process. To conduct a global search, in [7], the capabilities of the differential evolution (DE) algorithm are analyzed to identify competitive convolution kernels to induce CDTs that maximize the F1-score in the classification task. The DE algorithm is a metaheuristic search strategy that solves optimization problems by exploring the problem domain, with several parameters and stochastic elements that must be adapted to the specific problem [8]. In [7], each individual in the DE algorithm represents all the kernels of a CDT, so a set of CDTs is obtained, from which the one with the best F1-score is selected.
Both techniques consider CDTs with kernels of the same size in the tree’s nodes. In [9], a local search is performed using the DE algorithm to identify the optimal convolution kernel size and the convolution kernel that maximizes the F1-score on each node of the CDT. So, this technique obtains a unique CDT with convolutional kernels of different sizes.
Since the DE algorithm requires several parameters that must be adapted to the specific dataset (images in the training set) employed to induce a CDT, in [10], the SHADE algorithm is used instead of the traditional DE algorithm. SHADE saves the values of the successful parameters that guide the optimization process in a historical memory H, so the best values are considered for the DE process. This approach attains comparable or superior results without the user having to set two of the parameters (the crossover rate and the scale factor) of the evolutionary process.
It is important to note that all the aforementioned methods necessitate substantial computational time and memory during the induction process. This is due to the requirement to classify each image pixel in the training set to evaluate the fitness function. This classification process is essential regardless of the method employed, be it an analytical approach or a search with differential evolution. Additionally, the nature of the search, whether local or global, influences the evaluation process. For local searches, evaluations are conducted for each kernel node, while, for global searches, evaluations are performed on all the proposed CDTs.
For this reason, in [11], two techniques for selecting a representative sample of pixels from an image for the model’s training process are proposed: raw selection and median selection. The results in the cited work demonstrate that these techniques reduce the computational time required for CDT induction while maintaining or enhancing the precision of the resulting segmentation.
Despite the advancements mentioned above, previous studies have indicated that numerous segmentation databases comprise color images, necessitating their conversion to grayscale to execute CDT induction processes. This conversion procedure often results in the loss of critical information. Consequently, this work proposes a comprehensive enhancement to the induction process of a CDT, considering the use of color images within the learning process and applying the median selection technique to reduce the computational cost that this generalization may entail.
So, the primary objective of this project is to implement a color image segmentation algorithm based on CDTs using the SHADE algorithm, operating under the hypothesis that it is possible to characterize the kernels corresponding to the CDT with the SHADE process by considering chromosomes as vectors that represent the three channels of the convolution kernels of the CDT nodes. The novel aspect of this project lies in its potential to broaden the scope of knowledge about semantic segmentation and CDT-based learning, by leveraging color images and three-channel tree structures, in contrast to the prevailing practice of grayscale image utilization.
The performance of the proposed model was evaluated using the “Weizmann Horse dataset” and “Digital Retinal Images for Vessel Extraction” (DRIVE) dataset, resulting in favorable outcomes regarding the segmentation performance of the induced CDTs, and a significant reduction in the processing time and memory required for the induction process when compared with previous works [6,7,9,10,11]. The findings of this study underscore the significance of incorporating color characterization into the segmentation process for CDTs. Moreover, the proposal facilitates the comprehension of the methodology employed for image pixel classification given the intuitive nature of the graphical representation of decision trees compared to the approach used by CNNs.
The remainder of the paper is structured into four sections. Section 2 describes the DE algorithm and highlights the most relevant characteristics of Convolutional Decision Trees. This section also presents the details of the median selection technique employed to reduce computational time and memory, and the proposal for the induction of CDTs with color images. Section 3 is dedicated to the experiments and results obtained. Finally, Section 4 and Section 5 offer a detailed discussion and the conclusions, and consider future work.

2. Materials and Methods

In this section, the main subjects of the project are presented: the differential evolution algorithm and SHADE. Also, a description of the Convolutional Decision Trees (CDTs) and the methodology proposed for CDT induction using SHADE on color images is presented.

2.1. Differential Evolution Algorithm and SHADE

The Differential Evolution (DE) algorithm is a stochastic search strategy for optimization problems [12]. In the DE process, a competitive solution for a given fitness function is found by perturbing a population of feasible solutions through an iterative process composed of three operators: mutation, crossover, and selection. There are several DE strategies [8,13], but the DE/rand/1/bin strategy is described in detail next.
For a given fitness function f, a population $P = \{x_i\}_{i=1}^{NP}$ of $NP$ feasible solutions (individuals) is randomly generated. Then, a mutant vector $\nu_i$ is obtained for each target vector $x_i$ in the population with the mutation operator described in Equation (1), where $x_{r_0}$, $x_{r_1}$, and $x_{r_2}$ are individuals randomly selected from the population, and F is the scale factor defined by the user.

$$\nu_i = x_{r_0} + F(x_{r_1} - x_{r_2}). \quad (1)$$

The mutant vector $\nu_i$ and its corresponding target vector $x_i$ are used as parents to generate a trial vector $u_i$, called the offspring, by selecting between the parents' coordinates, with the process described in Equation (2). If a random number $rnd_j$ is less than or equal to a crossover rate $CR$, defined by the user, or the coordinate position j corresponds to one previously determined by chance ($j_{rnd}$), the component takes the value from $\nu_i$. Otherwise, it takes the value from $x_i$.

$$u_{ij} = \begin{cases} \nu_{ij}, & \text{if } (rnd_j \leq CR) \text{ or } (j = j_{rnd}); \\ x_{ij}, & \text{otherwise}; \end{cases} \quad j = 1, \ldots, |x_i|. \quad (2)$$

Once the trial vector $u_i$ is generated, a binary tournament between it and the corresponding target vector $x_i$ is performed to determine which vector survives to the next generation, selecting the one with the best fitness. If the trial vector has the best fitness, it replaces the corresponding target vector in the next generation. Otherwise, the target vector remains in the population, and the trial vector is discarded.

This process is performed $NG$ times (generations), so, in terms of fitness, the population improves or stays the same but never worsens; see Figure 2.
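As a concrete reference, the following minimal sketch implements the DE/rand/1/bin loop just described for a fitness function to be maximized (as with the F1-score later in the paper). The function and parameter names are illustrative and not part of the paper's implementation.

```python
import numpy as np

def de_rand_1_bin(fitness, dim, NP=50, NG=100, F=0.5, CR=0.9,
                  low=-255.0, high=255.0, seed=0):
    """Minimal DE/rand/1/bin sketch that maximizes `fitness`."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(NP, dim))   # initial random population
    fit = np.array([fitness(x) for x in pop])      # initial evaluations
    for _ in range(NG):
        for i in range(NP):
            # Mutation (Eq. 1): v = x_r0 + F * (x_r1 - x_r2), indices distinct from i.
            r0, r1, r2 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            v = pop[r0] + F * (pop[r1] - pop[r2])
            # Binomial crossover (Eq. 2): take v_j if rnd_j <= CR or j == j_rnd.
            j_rnd = rng.integers(dim)
            mask = rng.random(dim) <= CR
            mask[j_rnd] = True
            u = np.where(mask, v, pop[i])
            # Selection: binary tournament between trial u and target x_i.
            fu = fitness(u)
            if fu >= fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```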
In the DE algorithm, the parameters CR, F, NP, and NG are problem-dependent and significantly influence the performance of the process [14]. Consequently, it is imperative to calibrate these parameters when implementing the algorithm in real-world scenarios to achieve optimal outcomes. For these reasons, numerous self-adaptive mechanisms to adjust these parameters have been the focus of academic study, including JADE [15], SHADE [16], and L-SHADE [17].

The present study focuses on the Success-History-Based Adaptive Differential Evolution (SHADE) algorithm. SHADE regulates the DE algorithm parameters CR and F through an adaptive parameter control mechanism. This technique uses a historical memory of size H with successful parameters to modify and adapt the values during the search process; see Figure 2. Further details about SHADE can be found in [16].
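For orientation, the sketch below shows the core of SHADE's parameter control as described in [16]: each individual draws its CR from a normal distribution and its F from a Cauchy distribution centered on entries of the historical memories, and the memories are updated with weighted means of the parameters that produced improvements. This is an illustrative reconstruction, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameters(M_CR, M_F):
    """Draw (CR_i, F_i) for one individual from SHADE's historical memories.

    CR_i ~ N(M_CR[r], 0.1) clipped to [0, 1]; F_i ~ Cauchy(M_F[r], 0.1),
    resampled while non-positive and truncated to 1."""
    r = rng.integers(len(M_CR))
    CR = float(np.clip(rng.normal(M_CR[r], 0.1), 0.0, 1.0))
    F = 0.0
    while F <= 0.0:
        F = M_F[r] + 0.1 * rng.standard_cauchy()
    return CR, min(F, 1.0)

def update_memory(M_CR, M_F, k, S_CR, S_F, improvements):
    """Store weighted means of the successful parameters at memory slot k."""
    if S_CR:
        w = np.asarray(improvements, dtype=float)
        w /= w.sum()                                  # fitness-improvement weights
        M_CR[k] = float(np.sum(w * np.asarray(S_CR)))                 # weighted mean
        M_F[k] = float(np.sum(w * np.square(S_F)) /
                       np.sum(w * np.asarray(S_F)))                   # Lehmer mean
        k = (k + 1) % len(M_CR)
    return k
```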

2.2. Convolutional Decision Trees

The Convolutional Decision Tree (CDT) is a supervised machine learning model employed for image segmentation [6]. It has the graphical structure of an oblique (multivariate) decision tree, which is easy to interpret [9,10].
The input of a CDT is a digital image composed of pixels with specific positions and hues. Prior to this work, the images used to induce a CDT were restricted to grayscale images, which are two-dimensional arrays where each pixel has a single integer value between 0 and 255 (the range of values that an 8-bit number can represent); see Figure 3.
Convolution kernels are the primary components of CDTs. They are square arrays of discrete numbers, known as weights, that function as feature extractors through the convolution operation they perform. The convolution process entails sliding a kernel over an image, thereby integrating the information from both. This integration is achieved by applying a dot product between the kernel values and the corresponding values of the image region the kernel covers. The resultant pixel value is derived from this product in conjunction with the bias associated with the filter, thus generating an output feature map; see Figure 4. This process is repeated until the convolution kernel has passed over all pixels of the input image, at which point the full convolution is complete. The result is a filtered version of the image.
The CDT employs the values obtained in the feature map to classify the pixels in the image according to a predefined criterion, called the predicate $\phi$. It is a function with two output values, 0 or 1, depending on whether the value in the feature map is less than or equal to zero or greater than zero, respectively. This is represented in Equation (3), where $\beta$ denotes the convolution kernel and x is the information of a pixel and its neighboring pixels, which together form an array of the same size as the kernel.

$$\phi(x) = \begin{cases} 0, & \text{if } x^T \cdot \beta \leq 0, \\ 1, & \text{if } x^T \cdot \beta > 0. \end{cases} \quad (3)$$
In this manner, each node of the CDT is represented by a convolutional kernel that partitions the data into two subsets of pixels. The ϕ value of a pixel serves as a directive, guiding toward the subsequent branch of the tree. Pixels with ϕ values equal to zero go to the left branch, and pixels with ϕ values equal to 1 go to the right branch. Consequently, the kernels within a CDT execute successive classifications of each pixel until a leaf node is reached, as illustrated in Figure 5. The ϕ value obtained in a leaf node of a CDT is the label or class assigned to the pixel.
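A minimal sketch of this routing logic is given below, assuming the internal nodes are stored in a heap-like array (root at index 0, children of node i at 2i+1 and 2i+2) and that each encoded pixel already carries a trailing 1 so the kernel's bias enters the dot product; both conventions are illustrative assumptions.

```python
import numpy as np

def phi(x, beta):
    """Predicate of Equation (3): 0 if x^T . beta <= 0, else 1."""
    return int(x @ beta > 0)

def classify_pixel(x, kernels, depth):
    """Route an encoded pixel through a CDT of the given depth.

    kernels[i] is the flattened kernel (weights plus bias weight) of
    internal node i; a phi value of 0 sends the pixel to the left child
    and 1 to the right child, and the last phi value is the class label.
    """
    node, label = 0, 0
    for _ in range(depth):
        label = phi(x, kernels[node])
        node = 2 * node + 1 + label
    return label
```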
The values of the weights of these kernels are learned in a supervised manner. The initial method proposed for the CDT induction is a local greedy search [6]. To enhance performance, the methods proposed in [7,9] use the DE algorithm to induce a CDT. However, these methods have parameters that must be defined by the user and directly affect the model’s performance. So, the use of the SHADE algorithm is proposed in [10] to auto-calibrate the parameters in the DE process.
Another challenge that all these proposals have to deal with is the computational cost of performing the learning process in terms of time and memory, so, in [11], two techniques were proposed (raw and median selection). With these techniques, a representative subset of pixels in the images of the training set is selected, speeding up the learning process while maintaining and even improving the performance of the CDT.
In this study, an adaptation of the SHADE-CDT technique will be employed for the induction of a CDT that enables the segmentation of color images, using the median selection technique to reduce the computational cost of the process. The following sections will provide a concise description of these subjects.

2.2.1. SHADE-CDT Technique for CDT Induction

The SHADE-CDT technique performs a global search to induce a CDT [10]. It uses the variant of differential evolution called the SHADE algorithm. This variant tunes the values of the parameters of the crossover and mutation operators in the differential evolution process, described in Section 2.1, by adding a memory of successful parameters.

In the SHADE-CDT technique, each individual in the population corresponds to a CDT, so it is a vector of size $(s^2 + 1)(2^d - 1)$ with possible weights for the kernels in the internal nodes of the CDT, obtained randomly between −255 and 255. In Figure 6, an example of a node kernel encoding for a CDT of depth d = 3 with kernels of size s = 3 is shown. In this case, each kernel node has $s^2 + 1 = 10$ values, and there are $2^d - 1 = 7$ nodes, so each individual for the SHADE-CDT technique is a vector with $(s^2 + 1)(2^d - 1) = 70$ values randomly obtained between −255 and 255.
The DE algorithm employs the F1-score metric to evaluate the accuracy of a model’s label assignment. This is achieved by comparing the predicted labels with the actual labels of the instances. To calculate the F1-score value, it is necessary to obtain the labels assigned to each pixel on the images in the training dataset with each individual of the population (representing a CDT) and compare them with the actual labels of the pixels. To achieve this, the images in the training dataset are preprocessed to obtain pixel representations as coded vectors. The corresponding encoded vector for a pixel is formed by the values of the pixels in the neighborhood of size s × s surrounding it. The length of this vector depends on the kernel size s provided by the user, with a value of 1 added as a bias value; see Figure 7.
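The following short sketch illustrates this preprocessing step for a grayscale image; the border handling (padding so that every neighborhood fits) is an assumption, since the paper does not specify it.

```python
import numpy as np

def encode_pixel(gray, row, col, s):
    """Encode a pixel as its flattened s x s grayscale neighborhood,
    with a trailing 1 appended as the bias input (as in Figure 7).

    Assumes `gray` has been padded beforehand so the window always fits."""
    h = s // 2
    patch = gray[row - h:row + h + 1, col - h:col + h + 1]
    return np.append(patch.astype(float).ravel(), 1.0)
```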
After encoding the pixel information and the kernel structure as vectors of equivalent size, a dot product is performed between them. The resulting value is then processed by an activation function. For the particular case of the SHADE-CDT method, the sigmoid function, defined in Equation (4), is employed.
$$S(x) = \frac{1}{1 + e^{-x}}, \quad \text{with } x \in \mathbb{R}. \quad (4)$$
This function returns a value between 0 and 1. This value undergoes a classification step in which the label 1 is assigned to values greater than 0.5 and the label 0 to values less than or equal to 0.5 (since $S(x) > 0.5$ exactly when $x > 0$, this is equivalent to the predicate in Equation (3)). The tree node to which the instance must go is then decided based on this classification step, considering the label 0 for the branch on the left side of the node and 1 for the branch on the right side. This procedure is repeated until a leaf node is reached, at which point a label is assigned to the instance, as illustrated in Figure 8.
Subsequent to the acquisition of all the labels for all the pixels in the images in the training dataset, the F1-score metric, also known as the Dice coefficient, is employed to evaluate the fitness of the corresponding CDT. The F1-score is calculated with Equation (5), where $TP$ is the total of true positive cases, $FP$ is the total of false positives, and $FN$ is the total of false negatives.

$$F1\text{-}score = \frac{2 \times TP}{2 \times TP + FN + FP}. \quad (5)$$
An F1-score value approaching zero is indicative of poor segmentation, whereas a value approaching one is indicative of good segmentation.
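A direct sketch of Equation (5) for binary pixel labels follows; the guard for the degenerate case with no true positives is an illustrative choice.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Equation (5): F1 = 2*TP / (2*TP + FN + FP) for binary pixel labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)    # true positives
    fp = np.sum(y_pred & ~y_true)   # false positives
    fn = np.sum(~y_pred & y_true)   # false negatives
    return 2.0 * tp / (2.0 * tp + fn + fp) if tp else 0.0
```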
This process is performed on each individual in the population under consideration for the SHADE process to identify the individual (CDT) with the highest F1-score after multiple generations.
As is made evident by the process, it involves a substantial amount of computational time and memory. For this reason, two techniques to address this issue were proposed in [11]: the raw selection and the median selection techniques. Given the results obtained with these methods, the median selection technique is employed in this work, since considering color images directly increases the amount of information taken into account in the learning process.
Also, it is important to mention that, for this method, five parameters are user-defined:
  • For the SHADE-CDT algorithm: the population size (NP), the number of generations (NG), and the memory size (H).
  • For the CDT structure: the size (s) of the kernels and the depth (d) of the tree.
Further details on the SHADE-CDT method can be found in [10].

2.2.2. Median Selection Technique for CDT Induction

The median selection technique involves the selection of samples of representative pixels from each of the two classes to be segmented in the images of the training set [11]. With the median selection technique, a partition of the dataset is performed to obtain the median vector with the representative information of each subset in the partition.
In this technique, a proportion value P is needed. This value indicates the quantity or proportion of the total image information sampled for the induction process. So, if the user establishes P = 0.2, the process will find the representative information equivalent to only 20% of the total information. For example, in Figure 9, a grayscale image and its ground truth are presented. Remember that the ground truth contains the labels for each pixel of the original image, considering two classes: 0 (black) and 1 (white). This image has dimensions 15 × 26, yielding a total of 390 pixels, 82 of which are classified as class 1 (white) and 308 as class 0 (black). Assuming that P = 0.2 for the median technique, the image information will go through the corresponding process that results in the extraction of the representative information, as if the image were composed of at most 78 pixels, equivalent to 20% of the total pixels (390 × 0.2 = 78).
A distinctive feature of this process is that it preserves the proportion between the classes. In the example of Figure 9, approximately 21% of the pixels in the original image belong to class 1, while 79% belong to class 0. To maintain this ratio, the technique generates pixels equivalent to 20% of each class, i.e., 16 pixels for class 1 and 61 pixels for class 0 (82 × 0.2 = 16.4 and 308 × 0.2 = 61.6), consequently maintaining the 21%/79% proportion between classes and a total of 77 representative pixels for this image.

The generation of pixels is achieved through a partitioning technique, which divides each class into a predefined number of subsets of the same size when possible. The number of subsets is equivalent to the number of pixels desired for each class. So, in the example, class 1 has 82 pixels, but, with the technique at 20%, only 16 pixels of this class are required. The pixels of this class are therefore partitioned into 16 disjoint subsets: 15 subsets of size 5 and 1 subset of size 7 (15 × 5 + 7 = 82). The pixels in each subset are randomly selected from the pixels of class 1; however, since a partition is a division of a set into a family of subsets where each element of the original set is present in only one of the subsets and all the subsets together contain all the members of the original set, once a pixel is selected for a subset, it cannot be in any other subset. This process is repeated for the class 0 pixels, resulting in the acquisition of information for only 61 pixels out of the 308 pixels originally present. Consequently, the partition of this class is constituted by 61 disjoint subsets: 60 subsets of size 5 and 1 subset of size 8 (60 × 5 + 8 = 308). As in the other class, the pixels within each subset are randomly selected from the pixels of class 0.

Following the implementation of this partition, denoted by $\mathcal{P}$, a total of 77 subsets are obtained, 16 of class 1 and 61 of class 0. For each subset, denoted by $\rho$, the vectors associated with each pixel in the grayscale image are obtained, considering the kernel size s of the CDT that is sought, as illustrated in Figure 9. Subsequent to the acquisition of these vectors, the representative vector of the subset is formed by the median value calculated coordinate-wise, as depicted in the right image of Figure 9. These median vectors keep the class of the subset from which they were derived.
Subsequent to the completion of the process of obtaining representative vectors of each image in the training set, the new training set for the induction process of a CDT will consist of all these vectors. The pseudocode of the median selection technique is presented in Algorithm A1 of Appendix B.
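A compact sketch of this procedure is shown below (the paper's full pseudocode is Algorithm A1 in Appendix B); the near-equal partitioning via np.array_split is an illustrative choice that may split remainders slightly differently than the worked example above.

```python
import numpy as np

def median_selection(X, y, P, seed=0):
    """Sketch of the median selection technique.

    X: (n, m) encoded pixel vectors; y: (n,) binary labels; P: proportion.
    For each class, the pixels are randomly partitioned into about
    n_class * P disjoint subsets, and each subset is replaced by its
    coordinate-wise median vector, preserving the class ratio."""
    rng = np.random.default_rng(seed)
    reps, labels = [], []
    for c in (0, 1):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        k = max(1, int(len(idx) * P))            # representatives for class c
        for subset in np.array_split(idx, k):    # disjoint, near-equal subsets
            reps.append(np.median(X[subset], axis=0))
            labels.append(c)
    return np.vstack(reps), np.asarray(labels)
```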
As demonstrated in the experiments performed in [11], this technique reduces the computational time required to induce a CDT on grayscale images. This reduction is due to a decrease in the number of training instances. In addition, it has been shown to improve the quality of the segmentation results by minimizing the risk of overfitting in the learning process. For these reasons, the median selection technique is used in this work to induce CDTs using color images (with three color channels). This generalization to color images had previously not been considered because the process triples the amount of information in the training dataset. However, the median selection technique allows datasets with color images to undergo the CDT induction process without the loss of critical information that conversion to grayscale images entails.
The following sections describe images in color spaces and the proposed process to induce CDTs in three color channels.

2.3. Color Images and Color Spaces

For this work, the images used as input are color images. Usually, to represent color in digital images, the RGB color space is used. The RGB color space is an additive model, wherein colors are created through the combination of three primary colors (red (R), green (G), and blue (B)). So, the RGB model can be conceptualized as a cube whose coordinate axes correspond to R, G, and B; see Figure 10a. All RGB values are constrained to the interval [0, 255]. Consequently, each possible color corresponds to a unique point in the RGB cube with coordinates (Red, Green, Blue) in the space [0, 255] × [0, 255] × [0, 255]. For instance, the black and white colors are located at (0, 0, 0) and (255, 255, 255), respectively, in the RGB color space. This indicates that the black color represents the absence of the three primary colors, while the white color is the full combination of the three.
The RGB color space can be converted to other color spaces, such as L*a*b* and HSV, to obtain different color representations; see Figure 10. These alternative representations can help to extract color features and patterns [18]. For this reason, the following sections will briefly describe the CIE L*a*b* and HSV color spaces.

2.3.1. CIE L*a*b* Color Space

The CIE L*a*b* color space is a mathematical transformation of the colorimetric system in which the numerical differences between colors consistently correspond to the visual perception of the human eye [19,20]. The CIE L*a*b* space is three-dimensional. The vertical axis represents the lightness L*, with values ranging from 0 (black) to 100 (white). The primary color axes extend from red to green (a* axis) and from yellow to blue (b* axis), as illustrated in Figure 10b. In the a* axis, a positive number indicates a redder color, and a negative number indicates greener color. In the b* axis, a positive number indicates yellower color, and a negative number indicates bluer color.
Consequently, each possible color corresponds to a unique point $(L^*, a^*, b^*)$ in the space [0, 100] × [−128, 127] × [−128, 127]. For example, the colors black and white are located at the coordinates (0, 0, 0) and (100, 0, 0), respectively, and the primary colors red, green, and blue are located (approximately) at the coordinates (53.24, 80.09, 67.2), (87.73, −86.18, 83.18), and (32.3, 79.19, −107.86), respectively.

2.3.2. HSV Color Space

In the HSV color space, color information is represented through three components: hue (H), saturation (S), and value (V). This model is commonly depicted as an inverse pyramid, where the vertical axis represents the value (V), the horizontal distance, with the V axis used as a reference point, corresponds to the saturation (S), and the angle, with the V axis acting as the rotation point, defines the hue (H) [21]. Refer to Figure 10c for a visual representation. In this color space, the parameters on H are color indicators with angular dimensions. The black color in the HSV space is located at the apex of the pyramid, while white occupies the center of the base. Within the base, the three fundamental colors (red, green, and blue) and their respective combinations are distributed.
So, each possible color corresponds to a unique point in the HSV space with coordinates (Hue, Saturation, Value) in the space [0, 360] × [0, 100] × [0, 100]. For example, the colors white and black have the coordinate values (0, 0, 100) and (0, 0, 0), respectively. The fundamental colors are located at the base of the pyramid at positions (0, 100, 100) for red, (120, 100, 100) for green, and (240, 100, 100) for blue.

2.3.3. Representation of Color Images

Images in the previously described color spaces are structured in three dimensions since the color values of each pixel are described as a vector with three coordinates, as in Equation (6).
$$color = (x_{Channel_1}, y_{Channel_2}, z_{Channel_3}). \quad (6)$$

Thus, a composition by channels can be used to describe the structure of a color image C, considering a representation of three matrices of the same size as C, denoted by $C_1$, $C_2$, and $C_3$; see Equation (7).

$$C = \{C_1, C_2, C_3\}. \quad (7)$$

The element at position $(x, y)$ of each matrix is the value of the corresponding color channel of the pixel at the same position in the original image, as shown in Figure 11 and in Equation (8).

$$C(x, y) = (C_1(x, y), C_2(x, y), C_3(x, y)). \quad (8)$$
Thus, any color image can be represented by three two-dimensional matrices that display the values of the corresponding coordinate channels at the corresponding pixel locations.
The subsequent section presents a detailed description of the convolutional process on images with three channels. This provides a structured framework for the proposal outlined in this work.

2.4. Convolution Process on Images with Three Channels of Color

In the convolution process, it is essential that the dimensions of the convolutional kernel be matched to the image’s dimensions, ensuring proper coverage.
For images with three color channels, such as in RGB, CIE L*a*b*, and HSV, the convolutional kernels must be represented as arrays of three dimensions (height, width, and depth). In Figure 12, a 3 × 3 × 3 kernel performs the convolution process on an image with three color channels.
In this scenario, the convolution process is performed for each depth layer, where each dimension of the image representing a color channel is subjected to the convolution process using the corresponding kernel dimension. Subsequently, three convolution processes are performed, whose results, in conjunction with the value of the bias assigned to the kernel, are aggregated to obtain the value of the output pixel. Notably, although the convolution process is executed on a three-dimensional image, the result will have only one dimension. An illustration of this process can be found in Figure 13.
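The sketch below mirrors this description for a single output pixel: one dot product per channel, summed together with the kernel's bias into a single scalar. The array shapes and border handling are illustrative assumptions.

```python
import numpy as np

def conv_at_pixel(image, kernel, bias, row, col):
    """Three-channel convolution at one pixel: the dot product is taken
    per channel and the three results plus the bias are summed into a
    single output value.

    image: (H, W, 3) array; kernel: (s, s, 3) array; assumes the window fits."""
    s = kernel.shape[0]
    h = s // 2
    patch = image[row - h:row + h + 1, col - h:col + h + 1, :]
    return float(np.sum(patch * kernel) + bias)
```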

2.5. Proposed Model: CDT Induction Using SHADE with Color Images (SHADE-CDT_color)

In this work, the SHADE-CDT_color search strategy is proposed. This model employs the SHADE algorithm to find the kernels that serve as conditions in the nodes of a CDT, with the particularity that these kernels are directed to the convolution process in color spaces; that is, they have three channels.
In this model, the CDT structure is defined by two parameters: the size of the kernels, denoted by s, and the depth of the tree, denoted as d. The objective is to perform a global search to maximize the classification accuracy of the CDT, measured with the F1-score metric.
In the SHADE-CDT_color model, the pixels in the images and the kernels for the CDT are encoded separately for each channel. To encode a pixel (instance) of the image, the hues of the pixels in the s × s neighborhood surrounding it are used in each channel. Similarly, the weights of a node kernel of the CDT in each channel are encoded as a vector of size $s^2$. These coding processes are illustrated in Figure 14.

In the SHADE process, each individual in the population is represented by a vector of real values that correspond to the weights of the kernels of the CDT to be found; see Figure 15. The length of this vector depends on the kernel size s and the depth d of the CDT under consideration. For instance, when a kernel with three channels is considered at each of the tree nodes, $3s^2$ values are required, along with a single value for the bias of the convolution process (see Figure 13 for a description of the convolution process on a three-color-channel image). This sum of values must then be multiplied by the number of kernels in the CDT, which is calculated as $2^d - 1$. So, for the SHADE process, each individual in the population is a vector of real values of size $(3s^2 + 1)(2^d - 1)$. These values are randomly generated between −255 and 255.
A perceptron structure is employed to ascertain the appropriate branch of the tree to pursue during the classification of an instance. To accomplish this, the final pixel (instance) encoding is a vector with the encodings of the three channels concatenated, incorporating a value of 1 for the bias; see Figure 16.

In the perceptron structure, the dot product between the encoded instance (as in Figure 16) and the encoded kernel (as in Figure 15) is passed through the sigmoid activation function, defined by Equation (4). This function returns a value between 0 and 1, and, as in the SHADE-CDT method, if this value is smaller than or equal to 0.5, the label 0 is assigned; if the value is higher than 0.5, the label 1 is assigned. The tree node to which the instance should be directed is then determined based on this label, with label 0 assigned to the node on the left side and label 1 assigned to the node on the right side. This procedure is repeated until a leaf node is reached, at which point a label is assigned to the instance (pixel).
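Putting these pieces together, the sketch below routes one encoded color pixel through the CDT defined by a flat SHADE individual; the heap-like node ordering of the individual's slices is an assumption made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation of Equation (4)."""
    return 1.0 / (1.0 + np.exp(-x))

def classify_color_pixel(x, individual, s, d):
    """Route one encoded color pixel through the CDT encoded by a SHADE
    individual of length (3*s**2 + 1) * (2**d - 1).

    x: encoding as in Figure 16 -- the three channel neighborhoods
    concatenated plus a trailing 1 for the bias, so len(x) == 3*s**2 + 1.
    Node i occupies the i-th slice of the individual (heap ordering,
    children at 2i+1 and 2i+2)."""
    m = 3 * s * s + 1
    node, label = 0, 0
    for _ in range(d):
        beta = individual[node * m:(node + 1) * m]
        label = int(sigmoid(x @ beta) > 0.5)   # 0 -> left branch, 1 -> right
        node = 2 * node + 1 + label
    return label
```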
After the acquisition of all the labels for the pixels in the images of the training dataset, the F1-score metric is utilized to evaluate the fitness of the corresponding CDT.
The pseudocode describing the process for determining the fitness value of an individual using the SHADE-CDT_color method can be found in Algorithm A2 of Appendix B.

As previously indicated, the one-dimensional process (SHADE-CDT) necessitates considerable computational time and memory even when considering only grayscale images. When color images are incorporated, the learning process involves tripling the amount of information due to the convolution performed in each color channel. Consequently, the median selection technique is a powerful tool employed in the SHADE-CDT_color method proposed in this study. This technique reduces the information used during the learning process by selecting a sample of representative pixels, as Section 2.2.2 outlines, but, in the three-channel version, the associated vector for each pixel is obtained as depicted in Figure 16.
Also, to reduce the computational time required for the learning process of the SHADE-CDT_color method, another modification considered was the use of a memory of previously evaluated individuals in the SHADE algorithm. In this memory, the individuals of the initial population and their fitness values are stored and, in each subsequent generation, only the newly generated individuals resulting from the mutation, crossover, and selection operators are evaluated and substituted for the defeated individuals. This process is repeated for subsequent generations until the DE algorithm is complete, as illustrated in Figure 17. This modification reduces the number of individual evaluations and the time required to complete the induction process, even when considering the three color channels.
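A minimal sketch of such a memory is a cache keyed by the individual's values, so survivors carried over between generations are never re-evaluated; the dictionary-based keying is an illustrative choice, not the paper's implementation.

```python
import numpy as np

class FitnessMemory:
    """Cache of previously evaluated individuals (as in Figure 17).

    Only individuals not seen before are passed to the expensive
    F1-score evaluation; survivors from earlier generations reuse
    their stored fitness value."""
    def __init__(self, fitness_fn):
        self.fitness_fn = fitness_fn
        self.cache = {}

    def __call__(self, individual):
        key = np.asarray(individual).tobytes()   # hashable key for the vector
        if key not in self.cache:
            self.cache[key] = self.fitness_fn(individual)
        return self.cache[key]
```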
The pseudocode of the proposed model for CDT induction using SHADE with color images (the SHADE-CDT_color method) is presented in Algorithm A3 of Appendix B.
It is important to acknowledge the critical role of the median selection technique in reducing information during the learning process and the proposed memory to reduce individual evaluations. These approaches enable the execution of the experiments described in the following section implemented in the Matlab R2023b software on a computer with the same characteristics as the one used in [10,11]; see Table 1. This facilitates a comparative analysis of the outcomes, showing that the proposed methodology reduces the memory and time of the induction process of a CDT compared to previous works while improving the F1-scores in the segmentation task by incorporating color images instead of grayscale images.

3. Experiments and Results

This section describes the experiments performed to analyze the performance of the CDT induction process for the segmentation of color images proposed in this work. The images from the “Weizmann Horse dataset” [22] are used to compare the proposal's results considering the different color spaces described in Section 2.3. Once the best color space for induction is identified, the “Digital Retinal Images for Vessel Extraction” (DRIVE) dataset (https://drive.grand-challenge.org/DRIVE/ (accessed on 7 May 2025)) is employed to conduct a comparative analysis of the induction time and precision between the SHADE-CDT method, employing grayscale images, and the SHADE-CDT_color technique, using color images. The selection of these datasets allows the comparison of the results with those described in [6,10,11].

3.1. Comparison of Color Spaces with the Weizmann Horse Dataset

The Weizmann Horse dataset was used in experiments to compare the three color spaces under consideration (RGB, CIE L*a*b*, and HSV). This dataset comprises 327 manually segmented images of horses, which were resized to 40% of their original size for the experimental procedures. The images were divided into training and test sets, with 33 and 295 images, respectively. Subsequently, controlled experiments were conducted to compare the segmentations obtained in each color space. These experiments considered the following parameters: 100 generations, a population of 100 individuals, and a memory of size 100 for the SHADE algorithm; depths ranging from 1 to 5 and kernel sizes of 3, 5, 7, 9, and 11 were considered for the CDT structure.
The results obtained for the segmentation of the images in the test dataset employing the SHADE-CDT_color method with the median technique, with P = 0.3, are presented in Table 2. This P value was selected after performing controlled experiments considering proportions of 10%, 20%, 30%, 40%, and 50%, obtaining the best results with 30%, so this value was maintained for the subsequent experiments. The training and test datasets were the same in these experiments to ensure consistency in the comparison between the color spaces and the different configurations of tree depth and kernel size in the structure of the CDTs.
To perform more in-depth experiments, the CDT structure with the highest F1-score value by kernel size was used. For this reason, a graphical representation of the information in Table 2 is presented in Figure 18, where the subfigures are line plots in two dimensions considering the depth of the CDT and their corresponding F1-scores, fixing the kernel size in each subfigure.
As demonstrated in Figure 18, the comparison lines facilitate the identification of the depth value, d, and the color space corresponding to the higher F1-score for each kernel size s. In Figure 18a, the maximum F1-score attained by a CDT with a kernel size of 3 is achieved when the depth is 3 in the RGB color space. Figure 18b illustrates that a higher F1-score is obtained by a CDT with a kernel size of 5 when the depth is 4 in the CIE L*a*b* color space. Furthermore, Figure 18c–e demonstrate that the best F1-score outcomes achieved with the CDTs employing fixed kernel sizes 7, 9, and 11, correspondingly, are attained when the trees have a depth of 2 in the CIE L*a*b* color space.
Five experiments were conducted to induce a CDT with each of the structures previously enumerated and for each color space, so, in total, 75 experiments were performed (25 for each color space). For these experiments, the same training and test sets were used to ensure a consistent comparison with the median selection technique.
In Table 3, the minimum, maximum, and mean ± standard deviation of the image segmentation precision, measured by the F1-score, of the images in the test set are presented.
The Shapiro–Wilk statistical test was employed for each CDT configuration within each color space to verify the normal distribution of the F1-score results obtained from the five executions. The corresponding p-values are also presented in Table 3.
In Figure 19, the boxplots of each color space are presented for each analyzed configuration.
The following observations can be made with reference to Figure 19:
  • In Figure 19a–c, the values obtained with the CIE L*a*b* color space demonstrate less variability when compared to the other two color spaces. This is because the corresponding boxes are comparatively short. Conversely, the boxes corresponding to the RGB and HSV color spaces are comparatively tall.
  • As illustrated in Figure 19c,d, the values obtained with the CIE L*a*b* color space demonstrate less variability when compared to the other two color spaces; however, in these instances, the boxes corresponding to the RGB and CIE L*a*b* color spaces both exhibit little variability.
  • As illustrated in Figure 19a,b, two values are positioned on the upper whiskers of the boxplots, corresponding to the CIE L*a*b* and RGB color spaces, respectively. This observation suggests the presence of outliers in their corresponding values. However, it should be noted that these values are above the median F1-score.
  • Conversely, in Figure 19b,d,e, the outliers are values that fall below the median F1-scores since they appear in the lower whiskers of the RGB, RGB and CIE L*a*b*, and HSV color spaces, respectively. However, in these cases, the comparative superiority of the RGB and CIE L*a*b* color spaces over the HSV color space is evident.
  • The boxplots demonstrate that the CIE L*a*b* color space consistently yields the highest mean F1-score values and exhibits minimal variability. However, a one-way ANOVA test was conducted to ascertain the disparities among the means of the various color spaces.
Subsequently, the one-way ANOVA test was performed to ascertain whether the differences between the means of the color spaces are statistically significant for each CDT structure. The p-values for these tests are also presented in Table 3. As these p-values were lower than the pre-determined significance level of α = 0.05 , it can be concluded that the differences observed between the means of some of the methods are indeed statistically significant.
Tukey’s Honestly Significant Difference Procedure was employed as a post hoc multiple comparison test to ascertain which color spaces’ means are distinct on each CDT structure since the ANOVA test rejected the null hypothesis that all the method means are equal. The test results are illustrated in Figure 20. Additionally, a spider graph is provided in Figure 21 to facilitate a visual representation of the performance on each color space. This graph considers the three analyzed color spaces (RGB, CIE L*a*b*, and HSV) and the different CDT structures outlined in Table 3. The value of 1 indicates that the color space has obtained the best segmentation result in the corresponding CDT structure. Conversely, the value of 0 indicates that the mean F1-score obtained by the color space is not the best in the corresponding CDT structure.
The following observations can be made with reference to Figure 20 and Figure 21:
  • For the CDT structures with kernel sizes 3, 5, and 9 in Figure 20a, Figure 20b, and Figure 20d, respectively, the differences between the RGB and CIE L*a*b* color spaces are not significant. In Figure 21, the RGB and CIE L*a*b* color spaces have a value of 1 in the corresponding CDT structures, indicating that the best segmentation results are achieved in these color spaces.
  • For the CDT structures with kernel sizes of 7 and 11, significant differences between the three color spaces are presented in Figure 20c and Figure 20e, respectively. The spider graph in Figure 21 shows that those CDT structures’ best segmentation results are obtained in the CIE L*a*b* color space.
  • In Figure 20, the means for the RGB and CIE L*a*b* color spaces are always significantly different from those of the HSV color space. This assertion can be made since the lines corresponding to the HSV color space are positioned more to the left on the horizontal axis and do not overlap with the lines corresponding to the other color spaces.
  • These figures demonstrate the comparative superiority of the CIE L*a*b* color space over the other color spaces. As illustrated in Figure 20, in all the subfigures, the lines corresponding to the CIE L*a*b* color space are positioned more to the right on the horizontal axis. This finding indicates that optimal segmentation outcomes are attained within this color space. This is represented in Figure 21 since the CIE L*a*b* color space has a value of 1 in all the CDT structures.
These results demonstrate that a CDT induced with the information of the images in the CIE L*a*b* color space produces superior segmentation results in comparison to a CDT induced with information in other color spaces.

SHADE-CDT vs. SHADE-CDT_color

A comparison of the results obtained with the SHADE-CDT_color technique and those obtained with the SHADE-CDT method is conducted in this study.

The best results reported in [10] per kernel size for the SHADE-CDT technique are presented in Table 4. These results were documented following a single execution of the SHADE-CDT technique. Given the absence of computational reduction approaches for the induction process, the volume of information for the training process was substantial. Consequently, multiple experiments with different kernel sizes and depths were not feasible, even when grayscale images were considered.

In the same Table 4, the best results with the SHADE-CDT_color method are reported. These values were obtained from Table 3 considering the CIE L*a*b* color space with the corresponding CDT structure.
In both methods, the median selection technique was employed with a proportion of 0.3, and, for the SHADE algorithm, 100 individuals and a memory size of 100 were used. For the number of generations, 200 were considered for the SHADE-CDT approach, while 100 were considered for the SHADE-CDT_color method.
In Figure 22, a visual representation of the information in Table 4 is presented. The comparison between the F1-score and induction time results with each method (SHADE-CDT and SHADE-CDT_color) is presented as a line plot, indicating the kernel size considered at each point.

As indicated in Table 4 and Figure 22, the precision values obtained with the SHADE-CDT technique are lower than those obtained with the SHADE-CDT_color method. In Table 4, the F1-scores in column 3 are less than 0.53, while the scores in column 5 are greater than 0.56, with some exceeding 0.6. This assertion is corroborated by Figure 22, which illustrates that the points representing the SHADE-CDT method results are positioned more to the left along the horizontal axis, corresponding to the F1-score, in comparison to the points for the SHADE-CDT_color method.
Concerning the induction time, none of the experiments employing the SHADE-CDT_color technique exceeded 24 min in execution, whereas the SHADE-CDT method reported execution times over 48 min for identical structures. This observation is corroborated by Figure 22, since the points representing the SHADE-CDT method are located higher along the vertical axis, corresponding to the induction time, in comparison to the points for the SHADE-CDT_color method.
It is important to note that the median selection technique and the modification in the SHADE process for the SHADE-CDT_color technique have significantly reduced the learning process time and memory in experiments that utilize information from three color channels for induction. As demonstrated in Figure 22, the behavior of the induction time for the SHADE-CDT method appears to be exponential. In contrast, for the SHADE-CDT_color method, the induction time is not as variable and is entirely dependent on the number of new individuals generated in the SHADE algorithm.

It is also noteworthy that the SHADE algorithm process involved 200 generations with the SHADE-CDT technique and only 100 generations with the SHADE-CDT_color method. This discrepancy precludes a precise induction time comparison between the methods. In the following section, an accurate comparison of times is performed in the experiments conducted with the DRIVE dataset, considering only the CIE L*a*b* color space.
It should be noted that experiments with kernels of size 11 are not documented in [10]. This is likely because the computer specifications described in Table 1 were inadequate for executing the induction of such CDTs, or because of the experimental time required for these structures, even when grayscale images were considered. In contrast, the SHADE-CDT_color method, in conjunction with the median selection technique, has yielded favorable outcomes for CDTs with kernels of size 11 and color images, requiring a small amount of time for the induction process.

3.2. Comparison of Induction Time and Precision with the DRIVE Dataset Considering the CIE L*a*b* Color Space

In this section, the DRIVE dataset was employed to analyze the induction times and performance of several CDT structures with the SHADE-CDT_color technique. This dataset comprises 20 retinal images with their corresponding segmentation masks or ground truths. The selection of these images was guided by the findings reported in [11], which detailed the segmentation outcomes obtained by applying the SHADE-CDT method with the median selection technique to induce CDTs with this dataset.

Replicating the characteristics mentioned in [11] was considered in these experiments. Consequently, the resolution of the images, previously 565 × 584, was changed to 256 × 256. Furthermore, the same images utilized for the training and test sets in [11] were considered, with proportions of 70% and 30%, respectively. The parameter settings for the experiments are presented in Table 5. In this work, these values were applied to the SHADE-CDT_color technique, considering 10 independent executions, enabling a comparison of the computational time for the induction by employing the median selection technique.
In Table 6, the outcomes of the image segmentation process utilizing both methodologies are presented. The results with the full training data (SHADE-CDT_full) and with the SHADE-CDT method were obtained from [11].

As demonstrated in Table 6, the findings reveal that, for a given CDT configuration (kernel size and depth), the time required to induce a CDT using the median selection technique, with images in a single color channel (grayscale images for the SHADE-CDT technique) or three color channels (color images for the SHADE-CDT_color technique), is consistently less than that of the induction process that incorporates all the information of the grayscale images present in the training set (SHADE-CDT_full).

It is important to mention that, although the SHADE-CDT_color technique increases the length of the individuals and kernel weights by a factor of three for the SHADE algorithm, the induction time is not tripled compared to the SHADE-CDT method. This is attributable to the incorporation of a memory of previously evaluated individuals, which reduces the number of evaluations of the fitness function.
Furthermore, experimentation with the SHADE-CDT_color technique incorporating all the information of the color images present in the training set was not feasible with the computer available. Consequently, a direct comparison of the technique in three color channels with the full training data and the median selection technique, as in [11], was not possible.

As previously indicated, these experiments were conducted to analyze the computational time required for these processes, not the precision, given that the CDT induction using the SHADE-CDT_color technique did not correspond to the ideal parameter values. Initially, the structure employed in [11] was used to compare the times for both techniques. So, to evaluate the precision of the CDTs induced with the SHADE-CDT_color technique, several experiments were performed considering different parameters for the SHADE algorithm, the CDT structure, and the proportion P for the median selection technique. The best results were obtained with CDTs induced with the parameters presented in Table 7.
A total of ten independent experiments were conducted to induce CDTs using the SHADE-CDT_color technique, with the parameter values given in Table 7. The minimum, maximum, and mean ± standard deviation of the F1-scores obtained are presented in Table 8. The results obtained in [11] with the SHADE-CDT technique using the parameter values in Table 5 are also reported in the same table to facilitate the comparison. The p-values obtained when performing the Shapiro–Wilk statistical test are included in both cases.
The Shapiro–Wilk statistical test was performed to verify the normal distribution of the F1-score results obtained in the ten executions of each of the two methods (SHADE-CDT_color and SHADE-CDT). The p-values indicate that the distribution of the F1-scores obtained with the SHADE-CDT_color technique is not normal, so, in order to identify whether the samples of both methods exhibit significant differences, the non-parametric Mann–Whitney U test was conducted, obtaining a p-value of 0.0017. Since this p-value is lower than the significance level α = 0.05, it is concluded that the difference between the two methods is significant.
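For reference, the following sketch reproduces this testing procedure with SciPy; the sample arrays are illustrative placeholders for the ten F1-scores per method reported in Table 8.

```python
from scipy import stats

def compare_methods(f1_color, f1_gray, alpha=0.05):
    """Shapiro-Wilk normality check per sample, then Mann-Whitney U test,
    mirroring the analysis in Section 3.2."""
    for name, sample in (("SHADE-CDT_color", f1_color), ("SHADE-CDT", f1_gray)):
        _, p = stats.shapiro(sample)          # null hypothesis: sample is normal
        print(f"Shapiro-Wilk ({name}): p = {p:.4f}")
    _, p = stats.mannwhitneyu(f1_color, f1_gray, alternative="two-sided")
    print(f"Mann-Whitney U: p = {p:.4f} ->",
          "significant" if p < alpha else "not significant")
```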
So, it is accurate to assert that the SHADE-CDT_color method, employing the median selection technique, yields superior outcomes compared to the SHADE-CDT method utilizing the same technique when the induction parameters are calibrated. This assertion is substantiated by the median values of 0.6520 and 0.6392 for the SHADE-CDT_color and SHADE-CDT methods, respectively, and the means in Table 8.

Regarding temporal requirements, the SHADE-CDT_color method necessitated a greater investment of time than the SHADE-CDT approach. This is because the SHADE algorithm process incorporates a population of 200 individuals and 350 generations in the first method, as opposed to the 100 individuals and 200 generations employed in the second method. This suggests a compromise between the temporal demands of the processes and their performance. Nevertheless, the induction times reported in this work are shorter than those documented in [6] and [7]. In those papers, the process required at least 12 h of execution. This represents a significant and noteworthy enhancement regarding the CDT induction process.

4. Discussion

As reported in previous studies, the induction of CDTs for the segmentation task employs grayscale images and analytic or evolutionary processes to perform local or global searches, respectively, to obtain the kernels of the nodes of the CDTs [6,7,9,10,11]. The present study explores the integration of color images in the induction process of a CDT.
In this direction, three of the most relevant color spaces, RGB, CIE L*a*b*, and HSV, were compared in the initial stages of the investigation to establish the most effective channels for characterizing the colors of the images. The CIE L*a*b* color space provided the best three channels in all the CDT induction experiments considered (see Table 2 and Table 3). Consequently, this color space was used to compare the outcomes of the conventional induction of CDTs with grayscale images in a single channel against the proposed approach with color images and three-channel CDTs.
This study considered two types of analysis: one of the execution time and one of the precision of the segmentation task.
Regarding temporal considerations, the proposed methodology employs the median selection technique and a memory of previously evaluated individuals within the SHADE algorithm. This modification reduces the number of individual evaluations and the time required to complete the induction process (see Table 6). The induction times reported in this study (at most 131.54 min) are shorter than those previously reported in other studies, which involved 12 h of experimentation with lower F1-scores.
Regarding precision, the proposal improves the F1-scores obtained for the Weizmann Horse and DRIVE datasets when the CIE L*a*b* color space and appropriate parameter settings for the SHADE algorithm are used in the induction process (see Table 4 and Table 8). These enhancements represent a substantial advancement in CDT induction.
Regarding the explainability of the CDTs, the structures proposed in this work behave in the same way as the classical single-channel CDT structures. In the segmentation process, the information of each pixel of the image passes through the classification process determined by the convolution at each node of the CDT, and its final label is obtained when the pixel reaches a leaf node. Thus, the classification of a pixel with label 0 or 1 can be directly observed. It is also possible to characterize the pixels that reach the same leaf node of the CDT or to analyze why the F1-score values are high or low in different images. Appendix A shows some examples of analyses performed using the inherent explanatory power of the CDTs.

5. Conclusions and Future Work

This work proposes the implementation and analysis of a color image segmentation algorithm based on CDTs and differential evolution. In this proposal, the SHADE algorithm was utilized in the learning process, where multiple individuals (CDTs) were randomly initialized and evolved to determine the weights of the CDT kernels, which were structured with three channels. The final CDT is the individual with the best F1-score on the training dataset.
The proposal also includes an analysis of three color spaces to identify the most suitable one for CDT induction in three color channels, with the CIE L*a*b* color space producing the most favorable results.
The results demonstrated the significance of considering the three color channels in the induction of CDTs, resulting in enhanced segmentation outcomes. The implementation of the median selection technique, in conjunction with the memory of previously evaluated individuals, exhibited a significant contribution, as evidenced by a substantial reduction in the computational time required for the induction process.
This study demonstrated that color characterization is a critical factor in enhancing the efficacy of the segmentation task performed with the induced CDTs, since precision increased by at least 8% from the grayscale experiments to the color experiments.
Future work will examine other color spaces, including L*C*h*, which has shown promising results in prior studies of color. Moreover, restricting the analysis to two channels and two-dimensional structures could be a viable approach to reduce computational time, so a thorough investigation into the two most representative channels of each color space is warranted. Another approach that merits consideration is translating the color information to an optimal color space in which the pixels of the different classes (0 and 1) are separated by techniques such as Linear Discriminant Analysis. In addition, a comparison of the performance of CDTs against CNN-based segmentation techniques is proposed.

Author Contributions

Conceptualization, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Data curation, A.-L.L.-L.; Formal analysis, A.-L.L.-L.; Investigation, A.-L.L.-L.; Methodology, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Resources, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Software, A.-L.L.-L.; Supervision, H.-G.A.-M. and E.M.-M.; Validation, H.-G.A.-M. and E.M.-M.; Visualization, A.-L.L.-L.; Writing—original draft, A.-L.L.-L.; Writing—review and editing, A.-L.L.-L., H.-G.A.-M., and E.M.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data derived from public domain resources: the Weizmann Horse dataset from Kaggle at https://www.kaggle.com/datasets/ztaihong/weizmann-horse-database/data, and the Digital Retinal Images for Vessel Extraction (DRIVE) at https://drive.grand-challenge.org/DRIVE/ (accessed on 7 May 2025).

Acknowledgments

The first author acknowledges the Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI) of Mexico for the support provided through scholarship 712182, which was awarded for postdoctoral studies at the Artificial Intelligence Research Institute at the University of Veracruz.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
CDT: Convolutional Decision Tree
DE: Differential Evolution
SHADE: Success-History-Based Adaptive Differential Evolution

Appendix A. Explainability in CDTs of Three Channels

The segmentation process performed with a CDT can be analyzed by following the partitions performed at each CDT node. This approach facilitates the analysis of the corresponding subsets, thereby enabling the determination of the characteristics the CDT identifies on each branch. In this section, a detailed explainability analysis of the CDTs with the best F1-scores for the Weizmann Horse and DRIVE datasets is provided.

Appendix A.1. Weizmann Horse Dataset

The following description of results considers the CDT induced with the best F1-score in Section 3.1, with a kernel size of 11 and a depth of 2 (refer to Table 2).
Figure A1 displays the considered CDT, with red, green, and blue borderlines denoting the three channels in the structure and a small square indicating the bias value. The left-hand branch is applied to pixels labeled 0 in the root node, while the right-hand branch is applied to pixels labeled 1.
Figure A1. CDT induced to segment the images in the Weizmann Horse dataset.
In Figure A2, the best segmentation achieved on an image from the Weizmann Horse dataset is shown. It attains an F1-score of 0.9141. The partitions obtained on each node are displayed, with white representing the pixels labeled as the corresponding class. For instance, kernel 1 partitions the pixels into 0 and 1, where the background is labeled as 0 and depicted with the white pixels on the left branch, and the horse is labeled as class 1 and depicted with the white pixels on the right branch. The original image, its ground truth, and the segmentation obtained are shown on the right side.
As demonstrated in Figure A2, the first kernel performs a very good segmentation, as the horse shape is clearly distinguishable among the pixels labeled as class 1. However, inspection of the partitions of the following kernels shows that certain regions of the horse's head, chest, and tail were erroneously classified as background because their color is brighter than the typical brown hue of the horse. Kernel 2 detects these pixels and labels them as class 1. Conversely, kernel 3 does not appear to perform another partition, as no pixels are labeled as class 0.
Figure A2. Best segmentation performed on an image from the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
In this instance, the objective of the partitions was to detect the brown pixels in the original image.
Figure A3 shows the worst segmentation of an image of the Weizmann Horse dataset (F1-score = 0.1076). In this case, the performance of kernel 1 is substandard, as the pixels corresponding to the horse are labeled as class 0 along with the pixels corresponding to the ground, and they remain in that class after passing through kernel 2. Conversely, the pixels representing the sky are classified as class 1.
Figure A3. Worst segmentation performed on an image from the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Figure A4 provides deeper insight into the process performed by the CDT. In this representation, the white regions of the horse are designated as class 0, while the darkest regions are labeled as class 1. Consequently, the CDT segments images of black and brown horses effectively but performs poorly on images featuring light-colored horses. This hypothesis is supported by the training dataset, which contains very few light-colored horses.
Figure A4. Analysis of segmentation on an image of the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.

Appendix A.2. DRIVE Dataset

The following description of results considers the CDT induced with the best F1-score in Section 3.2, with a kernel size of 5 and a depth of 2 (refer to Table 8).
In Figure A5, the CDT under consideration is displayed, with red, green, and blue borderlines denoting the three channels in the structure and a small square indicating the bias value.
Figure A5. CDT induced to segment the images in the DRIVE dataset.
Figure A6 and Figure A7 present the best and worst segmentation results on the images of the DRIVE dataset, respectively, with F1-scores of 0.7339 and 0.6085.
In these cases, it is evident that kernel 1 performs a partition that discriminates the interior and exterior of the retina, with subsequent kernels conducting the detailed classification of the veins; in particular, kernel 3 is responsible for the most detailed classification step. A subsequent analysis of the results reveals that the CDT consistently achieves lower F1-scores because it labels more pixels as class 1 than the ground truth does, identifying all the veins in the retina rather than only those marked in the ground truth.
Figure A6. Best segmentation performed on an image of the DRIVE dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Figure A7. Worst segmentation performed on an image of the DRIVE dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.

Appendix A.3. Description and Future Analysis

In the context of segmenting color images with CDTs, the structures exhibited in Appendix A.1 and Appendix A.2 appear to execute, at the root node, a binary classification between the two primary colors in the image, while the deeper nodes characterize the subcolors present in the corresponding partitions. This analysis provides a foundation for understanding the factors that contribute to the suboptimal performance of CDTs on various images, as illustrated in Figure A3 and Figure A7.
In addition, the analysis of sub-structures can be considered in future studies, as illustrated by the partition observed in Figure A8, where a liver is segmented with a CDT and the pixels from different leaf nodes are identified with different colors. This analysis facilitates the observation and description of the various colorations present on the liver, with the intensity of color serving as a distinguishing feature, and could yield a model for identifying regions of interest with the potential to contribute to several fields of science.
Figure A8. Segmentation of a liver image [23] performed with a CDT and considering the partition performed by branch, not for binary class.

Appendix B. Pseudocodes

The pseudocodes that describe the SHADE-CDT_color method for CDT induction proposed in this paper are presented in this section to facilitate the reproducibility of the process.
Algorithm A1 delineates the median selection technique described in Section 2.2.2. Algorithm A2 details the process employed to classify the pixels of an image with a CDT encoded as a vector (individual) and to obtain the corresponding CDT's fitness; this process is employed within the SHADE algorithm. Finally, the pseudocode of the SHADE-CDT_color method is presented in Algorithm A3.
Algorithm A1: Median selection technique
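The pseudocode of Algorithm A1 is provided as an image in the published article. As a minimal sketch, assuming the technique groups same-class pixel instances and replaces each group with its element-wise median so that roughly a proportion P of the data remains, it could look as follows (the grouping scheme and all names are assumptions, not the authors' exact procedure):

```python
import numpy as np

def median_selection(instances: np.ndarray, labels: np.ndarray, proportion: float):
    """Reduce the training set to roughly `proportion` of its size by
    replacing groups of same-class instances with their element-wise median."""
    reduced_X, reduced_y = [], []
    for cls in np.unique(labels):
        block = instances[labels == cls]
        group_size = max(1, int(round(1.0 / proportion)))
        for start in range(0, len(block), group_size):
            group = block[start:start + group_size]
            reduced_X.append(np.median(group, axis=0))  # representative vector
            reduced_y.append(cls)
    return np.asarray(reduced_X), np.asarray(reduced_y)
```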
Algorithm A2: F1-score of an individual (fitness)
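Algorithm A2 is likewise provided as an image. The sketch below follows the routing described in Figure 8 (dot product plus bias, sigmoid, threshold at 0.5, descend left for 0 and right for 1) and assumes a breadth-first encoding of the (3s^2 + 1)-weight nodes; helper names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f1_fitness(individual, X, y, s, d):
    """F1-score of a CDT encoded as a flat vector.
    X: (n, 3*s*s) three-channel pixel neighborhoods; y: 0/1 labels."""
    node_len = 3 * s * s + 1                 # kernel weights + bias per node
    n_nodes = 2 ** d - 1                     # full binary tree, breadth-first
    nodes = individual.reshape(n_nodes, node_len)
    preds = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        k = 1                                # 1-indexed root node
        label = 0
        for _ in range(d):                   # one decision per tree level
            w, b = nodes[k - 1][:-1], nodes[k - 1][-1]
            label = int(sigmoid(x @ w + b) > 0.5)
            k = 2 * k + label                # left child = 2k, right = 2k + 1
        preds[i] = label                     # last decision is the final label
    tp = np.sum((preds == 1) & (y == 1))
    fp = np.sum((preds == 1) & (y == 0))
    fn = np.sum((preds == 0) & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```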
Algorithm A3: SHADE-CDT_color method (CDT induction with color images)
Input: Dataset of color images CI and their corresponding ground truth images GT. CDT structure parameters: kernel size s and depth d. SHADE parameters: population size NP, number of generations NG, memory size H, and fitness function f. Median selection technique parameter: proportion ρ.
Output: individual_best with the kernel values of the CDT with the best F1-score value fitness_best.
// Initialization
  • Generate population: NP vectors of size (3s^2 + 1)·(2^d − 1) with random values chosen from −255 to 255;
    // Process
  • Get [M, labels]: training dataset M and its corresponding labels vector, generated with the median selection technique applied to CI and GT, with parameters s, d, ρ;
  • Get [individual_best, fitness_best] by applying the SHADE algorithm to the population, with the parameters NG, H, and the F1-score as fitness function f, considering the memory of previously evaluated individuals described in Section 2.5 (see Figure 17);
  • Return: individual_best, fitness_best
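Under the same assumptions, Algorithm A3 reduces to a short driver. Here the pixel instances are presumed to be already assembled, and shade_optimize is a hypothetical stand-in for any SHADE implementation with history size H; the function illustrates the data flow of Algorithm A3 rather than the authors' exact code:

```python
import numpy as np
# median_selection, FitnessMemory, and f1_fitness are the sketches above.

def induce_cdt_color(X, y, s, d, NP, NG, H, rho, shade_optimize):
    """Driver mirroring Algorithm A3. X holds the (3*s*s)-dimensional pixel
    instances of Figure 16, y their 0/1 labels; shade_optimize is injected."""
    Xr, yr = median_selection(X, y, rho)          # reduce the training data
    memory = FitnessMemory(lambda g: f1_fitness(g, Xr, yr, s, d))
    genome_len = (3 * s * s + 1) * (2 ** d - 1)   # weights per CDT
    population = np.random.uniform(-255, 255, (NP, genome_len))
    # SHADE evolves the population for NG generations, maximizing the
    # memorized F1-score, and returns the best individual and its fitness.
    return shade_optimize(population, memory.evaluate, NG, H)
```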

References

  1. Patil, D.D.; Deore, S.G. Medical image segmentation: A review. Int. J. Comput. Sci. Mob. Comput. 2013, 2, 22–27. [Google Scholar]
  2. Zhu, C.; Ni, J.; Li, Y.; Gu, G. General tendencies in segmentation of medical ultrasound images. In Proceedings of the 2009 Fourth International Conference on Internet Computing for Science and Engineering, Harbin, China, 21–22 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 113–117. [Google Scholar]
  3. Lateef, F.; Ruichek, Y. Survey on semantic segmentation using deep learning techniques. Neurocomputing 2019, 338, 321–348. [Google Scholar] [CrossRef]
  4. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
  5. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd ed. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 15 April 2025).
  6. Laptev, D.; Buhmann, J.M. Convolutional decision trees for feature learning and segmentation. In Proceedings of the German Conference on Pattern Recognition, Munster, Germany, 2–5 September 2014; Springer: Cham, Switzerland, 2014; pp. 95–106. [Google Scholar]
  7. Barradas Palmeros, J.A.; Mezura Montes, E.; Acosta Mesa, H.G.; Márquez Grajales, A.; Rivera López, R. Induction of Convolutional Decision Trees with Differential Evolution for Image Segmentation. In Proceedings of the Congreso Mexicano de Inteligencia Artificial, Guadalajara, Mexico, 30 May–3 June 2023; Volume 8. [Google Scholar]
  8. Eiben, A.E.; Smith, J.E. Introduction to Evolutionary Computing; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  9. López-Lobato, A.L.; Acosta-Mesa, H.G.; Mezura-Montes, E. Blood Cell Image Segmentation Using Convolutional Decision Trees and Differential Evolution. In Advances in Computational Intelligence. MICAI 2023 International Workshops, Proceedings of the WILE 2023, HIS 2023, and CIAPP 2023, Yucatan, Mexico, 13–18 November 2023; Springer Nature: Cham, Switzerland, 2024; pp. 315–325. [Google Scholar]
  10. López-Lobato, A.L.; Acosta-Mesa, H.G.; Mezura-Montes, E. Induction of Convolutional Decision Trees with Success-History-Based Adaptive Differential Evolution for Semantic Segmentation. Math. Comput. Appl. 2024, 29, 48. [Google Scholar] [CrossRef]
  11. López-Lobato, A.L.; Acosta-Mesa, H.G.; Mezura-Montes, E. Computational Time Reduction in the Induction of Convolutional Decision Trees. In Advances in Computational Intelligence. MICAI 2024 International Workshops, Proceedings of the HIS 2024, WILE 2024, and CIAPP 2024, Tonantzintla, Mexico, 21–25 October 2024; Martínez-Villaseñor, L., Ochoa-Ruiz, G., Montes Rivera, M., Barrón-Estrada, M.L., Acosta-Mesa, H.G., Eds.; Springer: Cham, Switzerland, 2025; pp. 99–111. [Google Scholar]
  12. Storn, R.; Price, K. Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  13. Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  14. Ahmad, M.F.; Isa, N.A.M.; Lim, W.H.; Ang, K.M. Differential evolution: A recent review based on state-of-the-art works. Alex. Eng. J. 2022, 61, 3831–3872. [Google Scholar] [CrossRef]
  15. Zhang, J.; Sanderson, A.C. JADE: Adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 2009, 13, 945–958. [Google Scholar] [CrossRef]
  16. Tanabe, R.; Fukunaga, A. Success-history based parameter adaptation for differential evolution. In Proceedings of the 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 20–23 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 71–78. [Google Scholar]
  17. Tanabe, R.; Fukunaga, A.S. Improving the search performance of SHADE using linear population size reduction. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1658–1665. [Google Scholar]
  18. Phuangsaijai, N.; Jakmunee, J.; Kittiwachana, S. Investigation into the predictive performance of colorimetric sensor strips using RGB, CMYK, HSV, and CIELAB coupled with various data preprocessing methods: A case study on an analysis of water quality parameters. J. Anal. Sci. Technol. 2021, 12, 1–16. [Google Scholar] [CrossRef]
  19. Korifi, R.; Le Dréau, Y.; Antinelli, J.F.; Valls, R.; Dupuy, N. CIEL*a*b* color space predictive models for colorimetry devices—Analysis of perfume quality. Talanta 2013, 104, 58–66. [Google Scholar] [CrossRef] [PubMed]
  20. Vodyanitskii, Y.N.; Kirillova, N. Application of the CIE-L*a*b* system to characterize soil color. Eurasian Soil Sci. 2016, 49, 1259–1268. [Google Scholar] [CrossRef]
  21. Cuevas, E.; Zaldívar, D.; Pérez-Cisneros, M. Procesamiento Digital de Imágenes Usando Matlab & Simulink; Alfaomega: Ciudad de Mexico, Mexico, 2010; Volume 479. [Google Scholar]
  22. Borenstein, E.; Sharon, E.; Ullman, S. Combining top-down and bottom-up segmentation. In Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA, 27 June–2 July 2004; IEEE: Piscataway, NJ, USA, 2004; p. 46. [Google Scholar]
  23. Herrera-Sánchez, D.; Acosta-Mesa, H.G.; Mezura-Montes, E.; Herrera-Meza, S.; Rivadeneyra-Domínguez, E.; Zamora-Bello, I.; Almanza-Domínguez, M.F. Imaging Estimation for Liver Damage Using Automated Approach Based on Genetic Programming. Math. Comput. Appl. 2025, 30, 25. [Google Scholar] [CrossRef]
Figure 1. Semantic segmentation technique applied to retinal vessel detection *. * https://drive.grand-challenge.org/DRIVE/ (accessed on 7 May 2025).
Figure 2. DE algorithm with SHADE.
Figure 3. Grayscale image.
Figure 4. Convolution process on a 6 × 6 image with a 3 × 3 kernel.
Figure 5. Convolutional Decision Tree with kernels of size 3 and depth 2.
Figure 6. Encodings of a kernel and an individual for the SHADE-CDT technique to induce a CDT of depth 3 with kernels of size 3.
Figure 7. Coding of a pixel-associated instance.
Figure 8. SHADE-CDT process employing the codification of the pixel-associated instance (x) and the codification of the convolutional kernels (w) in the nodes of a CDT. The dot product between x and w is evaluated in the sigmoid function to obtain a value between 0 and 1. In this example, this evaluation is greater than 0.5, so the label assigned for the root node, denoted as k1, is 1, highlighted in the CDT, directing the pixel information to the node k3. The same process is repeated with the codification of node k3 to obtain the label 0, directing the pixel information to the leaf node k6. With its vectorial codification, the process obtains the final label 1 for the pixel under consideration.
Figure 9. Grayscale image and its ground truth on the left, and a representation of the extraction of a representative vector for the analyzed subset ρ with the median selection technique.
Figure 10. Color spaces analyzed in this work: (a) RGB. (b) CIE L*a*b*. (c) HSV.
Figure 11. Composition of a color image by channels.
Figure 12. 3 × 3 × 3 convolutional kernel process on an image with three channels of color.
Figure 13. Convolution process on an image with three color channels with a kernel of size 3 × 3 × 3.
Figure 14. Encodings of a kernel and an instance by color channel.
Figure 15. Individual considered in the SHADE process for the induction of a CDT with kernel size s = 3 and depth d = 2 (vector of length (3s^2 + 1)·(2^d − 1) = 84).
Figure 16. Information of the pixel x_{i,j} structured for the SHADE process considering a kernel of size s = 3.
Figure 17. The memory implemented for the SHADE-CDT_color method calculates the fitness for all individuals in the initial population; for subsequent populations, only the fitness of the new individuals in the immediate next generation is calculated, reducing the computational time of the induction process.
Figure 18. Line plots in two dimensions considering the depth d of the CDTs and their corresponding F1-scores, fixing the kernel size s. (a) CDTs with s = 3. (b) CDTs with s = 5. (c) CDTs with s = 7. (d) CDTs with s = 9. (e) CDTs with s = 11.
Figure 19. Boxplots of the F1-scores obtained in five experiments in each color space and in each of the following CDT structures: (a) CDTs with s = 3 and d = 3. (b) CDTs with s = 5 and d = 4. (c) CDTs with s = 7 and d = 2. (d) CDTs with s = 9 and d = 2. (e) CDTs with s = 11 and d = 2.
Figure 20. Multiple comparison test between the color spaces analyzed in each of the following CDT structures: (a) CDTs with s = 3 and d = 3. (b) CDTs with s = 5 and d = 4. (c) CDTs with s = 7 and d = 2. (d) CDTs with s = 9 and d = 2. (e) CDTs with s = 11 and d = 2.
Figure 21. Spider graph of the results obtained with the three analyzed color spaces (RGB, CIE L*a*b*, and HSV) and the different CDT structures (kernel size s and depth d). A value of 1 indicates that the color space obtained the best segmentation result in the corresponding CDT structure; a value of 0 indicates the contrary case.
Figure 22. Line plot comparing the F1-scores and the induction time results of each method (SHADE-CDT and SHADE-CDT_color), indicating the kernel size s considered at each point.
Table 1. Computer specifications.

| Operating System | Windows 11 Pro 23H2 |
| RAM | 64 GB |
| Processor | AMD Ryzen 5 5600G |
| Processor speed | 3.90 GHz |
Table 2. Results for the Weizmann Horse dataset segmentation with the SHADE-CDT_color method employing the median technique with P = 0.3, and, for the SHADE algorithm, 100 individuals, 100 generations, and a memory size of 100. The best F1-score by kernel size, considering the color space, is shown in bold numbers, and the best result by kernel size is highlighted.

| s | d | RGB F1-Score | RGB Accuracy | RGB Time (min) | L*a*b* F1-Score | L*a*b* Accuracy | L*a*b* Time (min) | HSV F1-Score | HSV Accuracy | HSV Time (min) |
| 3 | 1 | 0.5415 | 0.7632 | 6.39 | 0.5466 | 0.7653 | 6.18 | 0.3996 | 0.7383 | 6.64 |
| 3 | 2 | 0.5346 | 0.7509 | 10.6 | 0.548 | 0.7645 | 10.44 | 0.5239 | 0.7675 | 10.82 |
| 3 | 3 | 0.5718 | 0.773 | 14.23 | 0.5654 | 0.7664 | 14.27 | 0.5143 | 0.7692 | 14.36 |
| 3 | 4 | 0.5534 | 0.7492 | 18.08 | 0.5689 | 0.7692 | 18.25 | 0.5353 | 0.7623 | 18.7 |
| 3 | 5 | 0.5664 | 0.7669 | 22.39 | 0.5664 | 0.7648 | 24.62 | 0.5455 | 0.7689 | 22.16 |
| 5 | 1 | 0.5562 | 0.7702 | 7.26 | 0.5567 | 0.7647 | 7.16 | 0.4087 | 0.7331 | 6.65 |
| 5 | 2 | 0.5544 | 0.7694 | 11.22 | 0.581 | 0.76 | 11.37 | 0.4966 | 0.7457 | 11.72 |
| 5 | 3 | 0.5769 | 0.7484 | 15.41 | 0.5686 | 0.7528 | 15.68 | 0.5152 | 0.7484 | 15.81 |
| 5 | 4 | 0.5793 | 0.7502 | 19.58 | 0.593 | 0.7778 | 20.13 | 0.5359 | 0.7486 | 19.33 |
| 5 | 5 | 0.5666 | 0.7557 | 23.6 | 0.592 | 0.7703 | 24.91 | 0.512 | 0.7528 | 23.04 |
| 7 | 1 | 0.563 | 0.7689 | 8.51 | 0.5641 | 0.7635 | 8.83 | 0.4127 | 0.7235 | 8.61 |
| 7 | 2 | 0.5669 | 0.7694 | 13.25 | 0.5942 | 0.7667 | 13.22 | 0.5115 | 0.7425 | 13.82 |
| 7 | 3 | 0.5825 | 0.7532 | 18.16 | 0.5923 | 0.7672 | 17.30 | 0.5226 | 0.7513 | 18.46 |
| 7 | 4 | 0.5918 | 0.7567 | 22.76 | 0.5865 | 0.756 | 23.15 | 0.5312 | 0.7363 | 21.79 |
| 7 | 5 | 0.5671 | 0.74 | 28.09 | 0.5642 | 0.7385 | 26.13 | 0.5275 | 0.7337 | 26.44 |
| 9 | 1 | 0.5548 | 0.7586 | 11.8 | 0.5707 | 0.7631 | 11.62 | 0.4387 | 0.7183 | 11.23 |
| 9 | 2 | 0.5946 | 0.7502 | 18.38 | 0.6042 | 0.769 | 17.8 | 0.5063 | 0.7258 | 16.75 |
| 9 | 3 | 0.5936 | 0.7545 | 24.48 | 0.5897 | 0.7567 | 21.91 | 0.5214 | 0.7202 | 22.22 |
| 9 | 4 | 0.5898 | 0.7503 | 31.26 | 0.5699 | 0.7441 | 26.69 | 0.53 | 0.7376 | 29.41 |
| 9 | 5 | 0.5848 | 0.758 | 32.14 | 0.5915 | 0.7648 | 31.98 | 0.5234 | 0.7012 | 32.31 |
| 11 | 1 | 0.5559 | 0.7519 | 12.62 | 0.5806 | 0.763 | 12.43 | 0.4407 | 0.7134 | 15.09 |
| 11 | 2 | 0.5503 | 0.7458 | 19.69 | 0.6052 | 0.7585 | 18.26 | 0.5088 | 0.7129 | 20.65 |
| 11 | 3 | 0.5891 | 0.7393 | 23.71 | 0.5903 | 0.7591 | 23.68 | 0.5225 | 0.7166 | 25.76 |
| 11 | 4 | 0.5623 | 0.7411 | 29.76 | 0.5954 | 0.7507 | 28.39 | 0.5255 | 0.7179 | 30.07 |
| 11 | 5 | 0.5749 | 0.7437 | 35.69 | 0.5624 | 0.7354 | 35.64 | 0.5315 | 0.7118 | 35.49 |
Table 3. Results of 5 experiments with the Weizmann Horse dataset in the three color spaces employing the median technique with P = 0.3, and, for the SHADE algorithm, 100 individuals, 100 generations, and a memory size of 100. The table presents the minimum, maximum, mean ± standard deviation, and median of the F1-scores obtained in each color space with the specified CDT structure. Additionally, the mean time and the p-values of two statistical tests, the Shapiro–Wilk test and the one-way ANOVA test, are included, with a significance level of α = 0.05 applied to all tests. The values in bold represent the results with the best segmentation, with two highlighted values when the differences between the corresponding color spaces are not statistically significant.

| CDT Structure | Metric | RGB | CIE L*a*b* | HSV |
| s = 3, d = 3 | Min/Max | 0.5445/0.5777 | 0.5654/0.5843 | 0.5143/0.5416 |
|  | Mean ± St.D. | 0.5612 ± 0.0143 | 0.5714 ± 0.0075 | 0.5276 ± 0.0103 |
|  | Median | 0.5627 | 0.5697 | 0.5288 |
|  | Time (min) | 15.81 | 15.4 | 14.48 |
|  | Shapiro–Wilk p-value | 0.6189 | 0.0660 | 0.9871 |
|  | One-way ANOVA p-value | 1.08 × 10⁻⁴ * |  |  |
| s = 5, d = 4 | Min/Max | 0.5462/0.5793 | 0.5723/0.5930 | 0.5028/0.5386 |
|  | Mean ± St.D. | 0.5678 ± 0.0133 | 0.5823 ± 0.0075 | 0.5246 ± 0.0150 |
|  | Median | 0.5706 | 0.5817 | 0.5297 |
|  | Time (min) | 21.31 | 21.15 | 19.54 |
|  | Shapiro–Wilk p-value | 0.3067 | 0.9548 | 0.4637 |
|  | One-way ANOVA p-value | 2.33 × 10⁻⁵ * |  |  |
| s = 7, d = 2 | Min/Max | 0.5562/0.5845 | 0.591/0.6057 | 0.5023/0.5119 |
|  | Mean ± St.D. | 0.5660 ± 0.011 | 0.598 ± 0.006 | 0.5077 ± 0.0041 |
|  | Median | 0.5615 | 0.5966 | 0.5078 |
|  | Time (min) | 14.5 | 14.33 | 13.7 |
|  | Shapiro–Wilk p-value | 0.1777 | 0.7977 | 0.5394 |
|  | One-way ANOVA p-value | 1.12 × 10⁻⁹ * |  |  |
| s = 9, d = 2 | Min/Max | 0.5559/0.5946 | 0.5699/0.6064 | 0.5063/0.5149 |
|  | Mean ± St.D. | 0.5819 ± 0.0152 | 0.5951 ± 0.0147 | 0.5110 ± 0.0038 |
|  | Median | 0.5880 | 0.5984 | 0.5101 |
|  | Time (min) | 17.68 | 17.21 | 17.42 |
|  | Shapiro–Wilk p-value | 0.0952 | 0.0687 | 0.3659 |
|  | One-way ANOVA p-value | 3.18 × 10⁻⁷ * |  |  |
| s = 11, d = 2 | Min/Max | 0.5503/0.5884 | 0.5978/0.6122 | 0.4906/0.5192 |
|  | Mean ± St.D. | 0.5696 ± 0.0174 | 0.6041 ± 0.0054 | 0.5096 ± 0.0115 |
|  | Median | 0.5719 | 0.6041 | 0.5116 |
|  | Time (min) | 19.94 | 19.6 | 20.34 |
|  | Shapiro–Wilk p-value | 0.3384 | 0.857 | 0.2049 |
|  | One-way ANOVA p-value | 1.81 × 10⁻⁷ * |  |  |

* The differences between some of the mean F1-scores of color spaces are significant at the 95% confidence level.
Table 4. The best results per kernel size using the SHADE-CDT approach in [10] for image segmentation on the Weizmann Horse dataset, and the corresponding CDT structure results employing the SHADE-CDT_color technique in the CIE L*a*b* color space. In both methods, the median selection technique was employed with a proportion of 0.3, and, for the SHADE algorithm, 100 individuals and a memory size of 100 were used. For the number of generations, 200 were considered for the SHADE-CDT approach, while 100 were considered for the SHADE-CDT_color method.

| s | d | SHADE-CDT F1-Score | SHADE-CDT Time (min) | SHADE-CDT_color F1-Score | SHADE-CDT_color Time (min) |
| 3 | 4 | 0.4789 | 48.87 | 0.5689 | 18.25 |
| 5 | 4 | 0.4981 | 56.95 | 0.5930 | 20.13 |
| 7 | 4 | 0.5151 | 66.6 | 0.5865 | 23.15 |
| 9 | 2 | 0.5220 | 86.4 | 0.6042 | 17.8 |
| 11 | 2 | Not available | Not available | 0.6052 | 18.26 |
Table 5. Parameter settings for the SHADE-CDT and SHADE-CDT_color methods with the median selection technique.

| Population Size (NP) | Number of Generations (NG) | Memory Size (H) | Kernel Size (k) | Tree Depth (d) | Proportion (P) |
| 100 | 200 | 100 | 5 | 2 | 0.3 (30%) |
Table 6. F1-score results and mean time obtained using the SHADE-CDT technique with the full training set (SHADE-CDT_full), and with the SHADE-CDT and SHADE-CDT_color methods employing the median selection technique. The parameter values specified in Table 5 were employed for the three methods.

| Method | Mean F1-Score ± St.D. | Mean Time |
| SHADE-CDT_full | 0.6283 ± 0.0089 | 85.8 min |
| SHADE-CDT | 0.6415 ± 0.0069 | 27.21 min |
| SHADE-CDT_color | 0.5511 ± 0.0260 | 38.67 min |
Table 7. Best parameter settings found for the SHADE-CDT_color method using the median selection technique to induce a CDT with the DRIVE dataset.

| Population Size (NP) | Number of Generations (NG) | Memory Size (H) | Kernel Size (k) | Tree Depth (d) | Proportion (P) |
| 200 | 350 | 100 | 5 | 2 | 0.3 (30%) |
Table 8. Best results obtained with the SHADE-CDT and SHADE-CDT_color methods for the segmentation of the images in the DRIVE dataset, with the parameter values presented in Table 5 and Table 7, respectively.

| Method | Min | Max | Mean ± St.D. | Time | p-Value (Shapiro–Wilk) |
| SHADE-CDT | 0.6318 | 0.6502 | 0.6415 ± 0.0069 | 27.21 min | 0.1650 |
| SHADE-CDT_color | 0.6468 | 0.6808 | 0.6608 ± 0.0141 | 131.54 min | 0.0095 |