The purpose of this work is to develop an adaptive system for choosing segmentation methods depending on external conditions (in particular, the level of illumination of the field of attention).
  2.2. K-Medoids
The K-Medoids algorithm, introduced by Leonard Kaufman and Peter J. Rousseeuw together with their PAM (Partitioning Around Medoids) algorithm [15], is a clustering technique similar to K-Means. Both algorithms partition the dataset into groups so as to minimize the distance between the data points assigned to a cluster and the designated center point of that cluster. However, there are notable differences between them.
Unlike K-Means, which uses the mean of the points in a cluster as its center, K-Medoids uses actual data points (called medoids) as cluster centers. This makes the cluster centers easier to interpret, since they correspond directly to existing data points. Furthermore, K-Medoids can work with arbitrary dissimilarity measures, whereas K-Means is tied to the squared Euclidean distance, because the mean is the point that minimizes the sum of squared Euclidean distances.
An important advantage of K-Medoids is its robustness to noise and outliers. Because it minimizes a sum of (unsquared) dissimilarities between points and their medoids rather than a sum of squared Euclidean distances, a single aberrant point has much less influence on the result. This sets it apart from K-Means and makes it a valuable tool in scenarios where noise and outliers are prevalent.
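To make the difference from K-Means concrete, the following Python/NumPy sketch implements a simplified K-Medoids. It is an illustration under stated assumptions, not the scikit-learn-extra implementation used in the experiments: it uses a greedy BUILD-style initialization followed by alternating (Voronoi-iteration) updates and omits PAM's SWAP phase, and the function name `k_medoids` and its parameters are hypothetical. Every center is an index into the data, and the distance matrix `D` could be replaced by any other dissimilarity.

```python
import numpy as np


def k_medoids(X, k, max_iter=100):
    """Simplified K-Medoids: BUILD-style greedy init + alternating updates (no PAM SWAP)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean dissimilarities; any other metric could be used here.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Greedy init: start from the most central point, then repeatedly add the
    # point that most reduces the total dissimilarity to the nearest medoid.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    for _ in range(1, k):
        nearest = D[:, medoids].min(axis=1)
        cost = np.minimum(D, nearest[:, None]).sum(axis=0)
        cost[medoids] = np.inf
        medoids.append(int(np.argmin(cost)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each point to its nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total dissimilarity to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new

    labels = np.argmin(D[:, medoids], axis=1)
    total_cost = float(D[np.arange(len(X)), medoids[labels]].sum())
    return medoids, labels, total_cost
```

For two well-separated groups of points, `k_medoids(X, 2)` returns one medoid per group, and each medoid is a row index of `X` itself rather than a synthetic average.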
An implementation from the scikit-learn-extra library was used for the experiment [16].
  2.3. Fuzzy C-Means (FCM)
The FCM algorithm belongs to the fuzzy (soft) clustering methods, in which each data point can belong to more than one cluster.
This algorithm assigns each data point a membership value for every cluster based on its proximity to that cluster's center: the closer a data point is to a cluster center, the higher its membership value for that cluster. The membership values of each data point across all clusters must sum to one.
Fuzzy C-Means clustering was developed by J. C. Dunn in 1973 [17] and improved by James C. Bezdek in 1981 [18].
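Suppose that it is necessary to cluster n d-dimensional data points $x_i$ (i = 1, 2, …, n).
The algorithm returns a list of c cluster centers $V = \{v_1, \ldots, v_c\}$ and a partition matrix $U = [u_{ij}]$, $u_{ij} \in [0, 1]$, $i = 1, \ldots, n$, $j = 1, \ldots, c$, where $u_{ij}$ indicates the degree to which element $x_i$ belongs to cluster $j$. Here, $\sum_{j=1}^{c} u_{ij} = 1$ for every $i$ and $0 < \sum_{i=1}^{n} u_{ij} < n$ for every $j$. The FCM algorithm minimizes the following objective function:
$$J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \lVert x_i - v_j \rVert^{2},$$
where m > 1 is the fuzzifier, which controls the degree of fuzziness of the partition.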
An implementation from the Fuzzy C-Means library was used for the experiment [19].
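The FCM objective is usually minimized by alternating two updates: the centers become $u_{ij}^m$-weighted means of the data, and the memberships become $u_{ij} = \big[\sum_{k=1}^{c} (\lVert x_i - v_j\rVert / \lVert x_i - v_k\rVert)^{2/(m-1)}\big]^{-1}$. The NumPy sketch below illustrates this alternating scheme. It is a hypothetical minimal version (function name `fcm`, random initialization of U, tolerance `tol`), not the Fuzzy C-Means library used in the experiment.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means: alternate center and membership updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial partition matrix U (n x c) whose rows sum to one.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]          # weighted cluster centers
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
        d2 = np.fmax(d2, 1e-12)                          # guard against zero distance
        # u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Final centers and objective value J_m(U, V) for the returned partition.
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    J = float((W * d2).sum())
    return V, U, J
```

Each row of the returned `U` sums to one, which is exactly the constraint that PCM relaxes.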
  2.4. Possibilistic C-Means (PCM)
To reduce the sensitivity to outliers, Krishnapuram and Keller (1993) proposed another clustering technique, called PCM [20]. In contrast to the FCM algorithm, the membership values produced by PCM can be interpreted as “degrees of membership, compatibility, or typicality”. Degrees of typicality are used to construct prototypes that characterize subcategories of data, taking into account both the common features of category members and their distinctive features compared with other categories. The typicality values with respect to one cluster do not depend on the prototypes of the other clusters. The degree of typicality helps distinguish between very atypical and partially atypical members of a cluster [21].
The PCM algorithm relaxes the row-sum constraint of the FCM algorithm: the only constraint is that each membership value in U lies between 0 and 1 (inclusive), i.e., $0 \le u_{ij} \le 1$. These values are therefore called the typicalities of the data points in each cluster. The objective function of the PCM algorithm can be formulated as follows:
$$J_m(U, V) = \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{c} \eta_j \sum_{i=1}^{n} \left(1 - u_{ij}\right)^{m},$$
where n is the total number of samples in a given dataset; c is the number of clusters; m > 1 is a parameter that determines the degree of fuzziness of the partition; $d_{ij} = \lVert x_i - v_j \rVert$ is the distance between sample $x_i$ and centroid $v_j$; and $U = [u_{ij}]$ is the fuzzy partition matrix.
$\eta_j$ is the scale (typicality) parameter of cluster j and is calculated from the data with the following formula:
$$\eta_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \lVert x_i - v_j \rVert^{2}}{\sum_{i=1}^{n} u_{ij}^{m}},$$
where n is the total number of samples in a given dataset; m > 1 is the fuzziness parameter; $x_i$ and $v_j$ are the data attributes and cluster centroids; and $U = [u_{ij}]$ is the fuzzy partition matrix, consisting of the degrees of membership of sample $x_i$ to each cluster j.
In the case of the PCM algorithm, the membership value $u_{ij}$ is calculated from the following formula:
$$u_{ij} = \frac{1}{1 + \left( d_{ij}^{2} / \eta_j \right)^{1/(m-1)}},$$
where $d_{ij}$ is the distance and $\eta_j$ is the scale parameter.
An implementation from the scikit-c-means library was used for the experiment [22].
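A minimal NumPy sketch of the PCM iteration follows. Assumptions: the initial centers `V0` are supplied by the caller (in practice they often come from an FCM run), $\eta_j$ is estimated once from FCM-style memberships of those initial centers and then kept fixed, and the helper names `sq_dist`, `pcm`, and `typicality` are hypothetical. The `typicality` helper shows the key difference from FCM: a point far from every cluster receives low typicality for all of them instead of memberships forced to sum to one.

```python
import numpy as np


def sq_dist(X, V):
    """Squared Euclidean distances d_ij^2 between points x_i and centers v_j."""
    return ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)


def pcm(X, V0, m=2.0, max_iter=100, tol=1e-6):
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    d2 = np.fmax(sq_dist(X, V), 1e-12)
    # FCM-style memberships for the initial centers, used only to estimate eta_j.
    inv = d2 ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=1, keepdims=True)
    W = U ** m
    eta = (W * d2).sum(axis=0) / W.sum(axis=0)
    for _ in range(max_iter):
        # Typicality update: no row-sum constraint, each value lies in (0, 1].
        U = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        d2 = np.fmax(sq_dist(X, V), 1e-12)
        if shift < tol:
            break
    return V, U, eta


def typicality(x_new, V, eta, m=2.0):
    """Typicality of new points with respect to each fitted cluster."""
    d2 = sq_dist(np.atleast_2d(np.asarray(x_new, dtype=float)), V)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))
```

For a point far from both clusters, `typicality` returns values close to zero for both, whereas FCM would split its membership roughly evenly between them.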
  2.5. Possibilistic Fuzzy C-Means (PFCM)
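To obtain a more robust fuzzy clustering model, Pal, Pal, Keller, and Bezdek proposed the PFCM algorithm in 2005 [23]. PFCM avoids the coincident clusters that PCM tends to produce and, at the same time, is less sensitive to outliers than FCM. It combines the objective functions of the FCM and PCM algorithms. The objective function of the PFCM algorithm is:
$$J_{m,\eta}(U, T, V) = \sum_{j=1}^{c} \sum_{i=1}^{n} \left( a\,u_{ij}^{m} + b\,t_{ij}^{\eta} \right) \lVert x_i - v_j \rVert^{2} + \sum_{j=1}^{c} \gamma_j \sum_{i=1}^{n} \left(1 - t_{ij}\right)^{\eta},$$
subject to $\sum_{j=1}^{c} u_{ij} = 1$ and $0 \le u_{ij}, t_{ij} \le 1$, where $t_{ij}$ are the typicality values, $\gamma_j > 0$ are the scale parameters (analogous to $\eta_j$ in PCM), and a > 0 and b > 0 are constants.
The relative importance of the membership values and the typicality values is determined by the parameters a and b (Timm et al., 2004) [24].
The objective function $J_{m,\eta}$ can be minimized by $(U, T, V)$ only if the following conditions hold, provided that $\lVert x_i - v_j \rVert > 0$ for all i, j, m > 1 and η > 1, and X contains at least c distinct data points.
The degree of membership is updated according to the following formula:
$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)} \right]^{-1}.$$
The typicality is updated according to the following formula:
$$t_{ij} = \frac{1}{1 + \left( \frac{b}{\gamma_j} \lVert x_i - v_j \rVert^{2} \right)^{1/(\eta-1)}}.$$
The prototypes are updated according to the following formula:
$$v_j = \frac{\sum_{i=1}^{n} \left( a\,u_{ij}^{m} + b\,t_{ij}^{\eta} \right) x_i}{\sum_{i=1}^{n} \left( a\,u_{ij}^{m} + b\,t_{ij}^{\eta} \right)}.$$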
An implementation presented in [25] was used for the experiment.
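The three PFCM update formulas can be alternated directly. The sketch below is hypothetical: the name `pfcm`, the caller-supplied initial centers `V0`, and the choice to estimate $\gamma_j$ once, PCM-style, from FCM memberships of the initial centers are assumptions rather than details of the implementation in [25].

```python
import numpy as np


def sq_dist(X, V):
    """Squared Euclidean distances ||x_i - v_j||^2, shape (n, c)."""
    return ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)


def fcm_memberships(d2, m):
    """u_ij = [sum_k (d_ij / d_ik)^(2/(m-1))]^-1, written with squared distances."""
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def pfcm(X, V0, a=1.0, b=1.0, m=2.0, eta=2.0, max_iter=200, tol=1e-6):
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    d2 = np.fmax(sq_dist(X, V), 1e-12)
    # Assumption: gamma_j estimated once, PCM-style, from FCM memberships.
    W = fcm_memberships(d2, m) ** m
    gamma = (W * d2).sum(axis=0) / W.sum(axis=0)
    for _ in range(max_iter):
        U = fcm_memberships(d2, m)                                   # memberships
        T = 1.0 / (1.0 + (b * d2 / gamma) ** (1.0 / (eta - 1.0)))   # typicalities
        Wt = a * U ** m + b * T ** eta                               # combined weights
        V_new = (Wt.T @ X) / Wt.sum(axis=0)[:, None]                 # prototypes
        shift = np.abs(V_new - V).max()
        V = V_new
        d2 = np.fmax(sq_dist(X, V), 1e-12)
        if shift < tol:
            break
    return V, U, T, gamma
```

Raising `b` relative to `a` gives the typicality term more weight in the prototype update, which is the trade-off controlled by a and b in the text.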
  2.6. Fuzzy Possibilistic C-Means (FPCM)
Fuzzy Possibilistic C-Means (FPCM) is an extension of the classic Fuzzy C-Means (FCM) clustering algorithm. Like FCM, FPCM is a soft clustering algorithm that assigns each data point to several clusters with different degrees of membership. However, unlike FCM, FPCM takes into account additional uncertainty in the clustering process by introducing a possibilistic (typicality) term into the objective function.
In FPCM, each data point is described by a vector of membership values, each reflecting the degree to which the point belongs to a certain cluster, together with typicality values. The possibilistic term expresses that a data point belongs to a cluster not with absolute certainty but with some degree of possibility. This allows FPCM to handle noise and outliers in the data better than FCM.
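The objective function of the FPCM algorithm includes both degrees of membership and typicality, as shown in the following equation:
$$J_{m,\eta}(U, T, V) = \sum_{j=1}^{c} \sum_{i=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right) \lVert x_i - v_j \rVert^{2},$$
provided that
$$\sum_{j=1}^{c} u_{ij} = 1 \ \ \forall i, \qquad \sum_{i=1}^{n} t_{ij} = 1 \ \ \forall j, \qquad 0 \le u_{ij}, t_{ij} \le 1,$$
where m > 1 and η > 1 are the fuzziness and typicality exponents. Taking into account the given constraints and the optimization conditions of c-means, the method of Lagrange multipliers yields the following necessary conditions for an extremum of the objective function:
$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)} \right]^{-1}, \qquad t_{ij} = \left[ \sum_{l=1}^{n} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_l - v_j \rVert} \right)^{2/(\eta-1)} \right]^{-1},$$
$$v_j = \frac{\sum_{i=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right) x_i}{\sum_{i=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right)}.$$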
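The FPCM necessary conditions translate into a short alternating loop. This NumPy sketch is an illustration under assumptions (hypothetical function `fpcm`, caller-supplied initial centers `V0`, fixed iteration limit), not the implementation used in this work. It makes the two normalizations visible: memberships sum to one over clusters (rows of U), while typicalities sum to one over data points (columns of T).

```python
import numpy as np


def sq_dist(X, V):
    """Squared Euclidean distances ||x_i - v_j||^2, shape (n, c)."""
    return ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)


def fpcm(X, V0, m=2.0, eta=2.0, max_iter=200, tol=1e-6):
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(max_iter):
        d2 = np.fmax(sq_dist(X, V), 1e-12)
        # Memberships: normalized over clusters, so each row of U sums to 1.
        inv_m = d2 ** (-1.0 / (m - 1.0))
        U = inv_m / inv_m.sum(axis=1, keepdims=True)
        # Typicalities: normalized over data points, so each column of T sums to 1.
        inv_t = d2 ** (-1.0 / (eta - 1.0))
        T = inv_t / inv_t.sum(axis=0, keepdims=True)
        # Prototype update with weights u_ij^m + t_ij^eta.
        W = U ** m + T ** eta
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return V, U, T
```

Because each column of T sums to one over all n points, individual typicality values become very small for large datasets.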
  2.7. Gustafson–Kessel (GK)
The Gustafson–Kessel (GK) algorithm is a clustering algorithm that extends the well-known Fuzzy C-Means (FCM) algorithm to handle clusters of different shapes and orientations. It was proposed by D. E. Gustafson and W. C. Kessel in 1979 [26].
The algorithm returns a list of c clusters with centers $v_1, \ldots, v_c$. The main feature of the GK algorithm is the local adaptation of the distance metric to the shape of each cluster: the cluster covariance matrix is estimated, and the norm-inducing matrix of the distance is adapted accordingly. The objective function for the GK algorithm is defined as
$$J(X; U, V, \{A_j\}) = \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij}^{m} D_{ijA_j}^{2}.$$
In this algorithm, each cluster is associated with a separate norm-inducing matrix $A_j$. The matrices $A_j$ are used as optimization variables in the c-means functional, which allows each cluster to adapt the distance norm to the local topological structure of the data. The distance between the data point $x_i$ and the center of the cluster $v_j$ is
$$D_{ijA_j}^{2} = (x_i - v_j)^{\top} A_j (x_i - v_j).$$
This objective function cannot be minimized directly with respect to $A_j$, because it is linear in $A_j$ (J could be made arbitrarily small by scaling $A_j$ toward zero). To obtain a feasible solution, $A_j$ must be constrained in some way. A common way to achieve this is to fix the determinant of $A_j$:
$$\det(A_j) = \rho_j, \qquad \rho_j > 0.$$
The coefficient $\rho_j$ determines the volume of each cluster (without prior knowledge about the problem, $\rho_j = 1$ can be assumed). Using the method of Lagrange multipliers, the following expression for $A_j$ is obtained:
$$A_j = \left[ \rho_j \det(F_j) \right]^{1/d} F_j^{-1},$$
where d is the dimension of the data and $F_j$, the so-called fuzzy covariance matrix of the j-th cluster, is obtained from the formula:
$$F_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} (x_i - v_j)(x_i - v_j)^{\top}}{\sum_{i=1}^{n} u_{ij}^{m}}.$$
The initialization of the algorithm requires the same parameters as the FCM algorithm. The GK algorithm can detect hyperellipsoidal clusters of different shapes and orientations, but it requires more computation than FCM because the determinant and the inverse of the covariance matrix must be calculated for each cluster at each iteration.
An implementation presented in [27] was used for the experiment.
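The sketch below follows the GK equations directly: fuzzy covariance $F_j$, norm-inducing matrix $A_j = [\rho_j \det(F_j)]^{1/d} F_j^{-1}$, Mahalanobis-type distances, and FCM-style membership updates. It is hypothetical and simplified (the function name `gustafson_kessel`, the caller-supplied initial centers `V0`, and a tiny ridge added to $F_j$ to keep it invertible are all assumptions) and is not the implementation from [27].

```python
import numpy as np


def gustafson_kessel(X, V0, m=2.0, rho=None, max_iter=100, tol=1e-6):
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    n, d = X.shape
    c = V.shape[0]
    rho = np.ones(c) if rho is None else np.asarray(rho, dtype=float)
    # Initial memberships from Euclidean distances to the initial centers.
    d2 = np.fmax(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1), 1e-12)
    inv = d2 ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=1, keepdims=True)
    A = np.empty((c, d, d))
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        D2 = np.empty((n, c))
        for j in range(c):
            diff = X - V[j]
            # Fuzzy covariance matrix F_j.
            F = (W[:, j, None] * diff).T @ diff / W[:, j].sum()
            F += 1e-9 * np.eye(d)  # small ridge (assumption) keeps F_j invertible
            # Norm-inducing matrix with det(A_j) = rho_j.
            A[j] = (rho[j] * np.linalg.det(F)) ** (1.0 / d) * np.linalg.inv(F)
            # Squared distance (x_i - v_j)^T A_j (x_i - v_j) for every point.
            D2[:, j] = np.einsum("ik,kl,il->i", diff, A[j], diff)
        D2 = np.fmax(D2, 1e-12)
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U, A
```

Because $\det(A_j) = \rho_j$, each cluster's metric has a fixed volume but its own orientation, so a horizontal and a vertical stripe of points can each receive a matching elongated metric. The returned `A` can be checked with `np.linalg.det`.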
  2.8. Entropy-Based Fuzzy (EBF)
Yao et al. presented an Entropy-Based Fuzzy clustering algorithm in 2000 [28]. In this algorithm, the entropy values of the data points are calculated first. Then, the data point with the minimum entropy value is selected as a cluster center. Data points that are not assigned to any cluster are called outliers. Consider a set X of N data points in an M-dimensional hyperspace, where each data point $x_i$ is represented by a set of M values (i.e., $x_i = (x_{i1}, x_{i2}, \ldots, x_{iM})$). Thus, the dataset can be represented by an N × M matrix. The values of each dimension are normalized to the range [0.0–1.0]. The Euclidean distance between any two data points (for example, i and j) is defined as follows:
$$D_{ij} = \sqrt{\sum_{k=1}^{M} \left( x_{ik} - x_{jk} \right)^{2}}.$$
The entropy value between two data points is in the range [0.0–1.0]. It is very small (close to 0.0) for very close or very distant pairs of data points and very high (close to 1.0) for those data points separated by a distance close to the average distance of all pairs of data points.
The total entropy value at data point $x_i$ relative to all other data points is calculated as
$$E_i = -\sum_{j \in X,\, j \ne i} \left( S_{ij} \log_2 S_{ij} + (1 - S_{ij}) \log_2 (1 - S_{ij}) \right),$$
where $S_{ij}$ is the similarity between $x_i$ and $x_j$, normalized to the [0.0–1.0] interval. During clustering, the data point with the minimum entropy value is selected as the cluster center. The similarity between any two points (i.e., i and j) can be calculated as follows:
$$S_{ij} = e^{-\alpha D_{ij}},$$
where α is a numerical constant. Experiments with different values of α show that it should be chosen so that it works for all types of datasets, not only for particular ones. The value of α is therefore calculated under the assumption that the similarity $S_{ij}$ equals 0.5 when the distance $D_{ij}$ between two data points equals the average distance $\bar{D}$, which is calculated as
$$\bar{D} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} D_{ij}.$$
From (21), we can calculate α as
$$\alpha = \frac{-\ln 0.5}{\bar{D}} = \frac{\ln 2}{\bar{D}}.$$
So, α is determined by the data and can be calculated automatically.
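The steps described above (normalization, pairwise distances, automatic α, similarity, and total entropy) can be sketched in NumPy as follows. The clustering loop at the end is an assumption about how the minimum-entropy selection is turned into clusters: points whose similarity to the chosen center is at least a threshold `beta` join its cluster, and groups smaller than `min_size` are labelled as outliers (-1). The function name `ebf`, both parameter names, and their default values are hypothetical.

```python
import numpy as np


def ebf(X, beta=0.7, min_size=2):
    """Entropy-based fuzzy clustering sketch; beta and min_size are hypothetical."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    # Normalize each dimension to [0.0, 1.0].
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xn = (X - lo) / span
    # Pairwise Euclidean distances D_ij.
    D = np.sqrt(((Xn[:, None, :] - Xn[None, :, :]) ** 2).sum(axis=-1))
    # Average distance over all pairs i < j; alpha makes S = 0.5 at D = D_mean.
    D_mean = D[np.triu_indices(N, k=1)].mean()
    alpha = -np.log(0.5) / D_mean
    S = np.exp(-alpha * D)
    # Pairwise entropy terms; clipping avoids log(0) when S is 0 or 1.
    Sc = np.clip(S, 1e-12, 1.0 - 1e-12)
    H = -(Sc * np.log2(Sc) + (1.0 - Sc) * np.log2(1.0 - Sc))
    np.fill_diagonal(H, 0.0)   # exclude j = i
    E = H.sum(axis=1)          # total entropy E_i of each point

    labels = np.full(N, -1)
    centers = []
    remaining = np.ones(N, dtype=bool)
    while remaining.any():
        idx = np.flatnonzero(remaining)
        k = idx[np.argmin(E[idx])]        # remaining point with minimum entropy
        members = idx[S[k, idx] >= beta]  # always contains k itself (S_kk = 1)
        remaining[members] = False
        if members.size >= min_size:
            labels[members] = len(centers)
            centers.append(int(k))
        # Otherwise the group is too small and its points stay labelled -1 (outliers).
    return np.array(centers), labels, alpha, E
```

With two compact groups and one isolated point, the sketch returns two clusters and labels the isolated point as an outlier; α is computed from the data alone, as stated above.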
  2.9. Ridler–Calvard (RC)
The Ridler–Calvard method [29] is an image thresholding method. Thresholding converts a grayscale image into a binary image by dividing pixels into two groups: pixels whose values exceed a certain threshold and pixels whose values are below it.
The method is related to the idea of maximizing the interclass variance of the two groups of pixels, a measure of how well the groups are separated from each other. The threshold at which the Ridler–Calvard iteration converges equals the average of the two class means, which is the stationarity condition of the interclass variance; the method therefore seeks a threshold that (at least locally) maximizes this variance.
The Ridler–Calvard method starts from an initial threshold value and computes the mean values of the pixels above and below the threshold. It then sets the new threshold to the average of these two means and repeats the procedure until the threshold no longer changes significantly.
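The foreground and background class means at iteration k are denoted $\mu_f(T_k)$ and $\mu_b(T_k)$, respectively, and are defined mathematically as:
$$\mu_f(T_k) = \frac{\sum_{g > T_k} g\, p(g)}{\sum_{g > T_k} p(g)}, \qquad \mu_b(T_k) = \frac{\sum_{g \le T_k} g\, p(g)}{\sum_{g \le T_k} p(g)},$$
where g is the gray level value ($g = 0, 1, \ldots, L - 1$ for an image with L gray levels) and $p(g)$ is the gray-level probability mass function (PMF) of g. The PMF is calculated from the image histogram by normalizing it by the total number of pixels.
$T_{k+1}$ is the new threshold value, which is calculated by averaging $\mu_f(T_k)$ and $\mu_b(T_k)$:
$$T_{k+1} = \frac{\mu_f(T_k) + \mu_b(T_k)}{2}.$$
These operations are repeated until the difference $|T_{k+1} - T_k|$ is less than a given value ε.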
An implementation from the Mahotas library was used for the experiment [30].
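The Ridler–Calvard iteration is short enough to write out in full. This NumPy sketch follows the equations above on a gray-level histogram; the initial threshold (the mean gray level), the tolerance `eps`, the treatment of gray levels above the threshold as foreground, and the function name are assumptions, and it is not the Mahotas implementation used in the experiment.

```python
import numpy as np


def ridler_calvard(image, eps=0.5, max_iter=100):
    """Iterative intermeans (Ridler-Calvard) threshold of a grayscale image."""
    g = np.asarray(image, dtype=float).ravel()
    levels, counts = np.unique(g, return_counts=True)
    p = counts / counts.sum()              # gray-level PMF from the histogram
    T = float((levels * p).sum())          # initial guess: mean gray level (assumption)
    for _ in range(max_iter):
        fg = levels > T                    # foreground: gray levels above T
        bg = ~fg                           # background: gray levels at or below T
        if not fg.any() or not bg.any():
            break                          # one class is empty; keep the current T
        mu_f = (levels[fg] * p[fg]).sum() / p[fg].sum()
        mu_b = (levels[bg] * p[bg]).sum() / p[bg].sum()
        T_new = 0.5 * (mu_f + mu_b)        # T_{k+1} = (mu_f + mu_b) / 2
        done = abs(T_new - T) < eps
        T = T_new
        if done:
            break
    return T
```

For instance, for an image containing only the gray levels 0, 100, and 110 (one pixel each), the threshold moves from the initial mean 70 to 52.5 and then stays there; the binary image is then `image > T`.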
  2.10. Kohonen Self-Organizing Maps (SOMs)
The Self-Organizing Map (SOM), proposed by Teuvo Kohonen [31], is a specific type of artificial neural network that differs from other neural networks in its training approach. Instead of error-correction learning such as backpropagation with gradient descent, the SOM uses competitive learning.
Similar to most artificial neural networks, self-organizing maps operate in two distinct modes: learning and mapping. During the learning phase, a set of input data, known as the “input space,” is utilized to construct a reduced-dimensional representation called the “map space.” This mapping process enables the classification of additional input data using the generated map.
The map space is composed of components referred to as “nodes” or “neurons,” arranged in a two-dimensional hexagonal or rectangular grid. The number and specific locations of these nodes are predetermined based on the desired objectives of the data analysis and research.
Each node in the map space is associated with a “weight” vector, representing its position in the input space. While the nodes in the map space remain fixed, the learning process entails adjusting the weight vectors towards the input data, typically by reducing a distance metric like Euclidean distance. Importantly, this adjustment must not disrupt the topology established by the map space.
Following the training phase, the map can be employed to classify additional observations from the input space. This is achieved by identifying the node with the closest weight vector (i.e., the smallest distance metric) to the input space vector.
The primary objective of self-organizing map learning is to make different parts of the network respond similarly to certain input patterns. This partly mirrors the way visual, auditory, or other sensory information is processed in specific regions of the human cerebral cortex.
The weights of the neurons are initialized either with small random values or by sampling evenly from the subspace spanned by the two largest principal component eigenvectors of the data. The latter alternative leads to faster learning, since the initial weights already give a reasonable approximation of the SOM weights.
To train the network effectively, a considerable number of example vectors, ideally representing the expected vector types during mapping, are fed into the network. These examples are often introduced multiple times through iterations.
During training, when an example is presented to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is designated as the “best-matching unit” (BMU). The weights of the BMU and of the neurons close to it in the SOM grid are adjusted toward the input vector. The magnitude of this adjustment decreases over time and with increasing grid distance from the BMU. The weight vector $W_v$ of neuron v is updated as
$$W_v(s+1) = W_v(s) + \theta(u, v, s)\,\alpha(s)\,\big(D(t) - W_v(s)\big),$$
where s is the step index, t is the index in the training sample, u is the index of the BMU for the input vector D(t), α(s) is a monotonically decreasing learning rate, and θ(u, v, s) is a neighborhood function that depends on the grid distance between neuron u and neuron v at step s.
The neighborhood function, denoted as θ(u, v, s) or the lateral interaction function, plays a vital role in the self-organizing map. It depends on the distance between the best matching unit (BMU) neuron u and neuron v within the grid. The simplest form of the neighborhood function assigns a value of 1 to neurons that are close enough to the BMU and 0 to others. However, Gaussian functions and Ricker wavelets are also commonly used alternatives. Regardless of the specific form chosen, the neighborhood function gradually decreases over time.
During the initial stages, when the neighborhood is broad, self-organization occurs on a global scale. As the neighborhoods shrink to just a few neurons, the weights converge toward local estimates. In some implementations, both the learning coefficient α and the neighborhood function θ decrease steadily as s increases; in others, in particular those where t scans the training dataset, they decrease stepwise, once every T steps. This process is repeated for each input vector over a (usually large) number of cycles λ. Ultimately, the network associates the output nodes with groups or patterns present in the input dataset. If these patterns can be named, the names can be attached to the corresponding nodes of the trained network.
During the mapping phase, a single winning neuron is determined—the neuron whose weight vector is closest to the input vector. This determination can be made by simply calculating the Euclidean distance between the input vector and the weight vector.
An implementation from the sklearn-som library was used for the experiment [32].
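The training loop and the mapping phase described above can be sketched in a few lines of NumPy. This is a hypothetical minimal version, not the sklearn-som implementation used in the experiment: it assumes a rectangular grid, random initial weights inside the bounding box of the data, a Gaussian neighborhood θ, and a linearly decreasing learning rate α(s) and neighborhood width; the names `train_som` and `bmu` are illustrative.

```python
import numpy as np


def train_som(data, rows, cols, n_epochs=50, alpha0=0.5, sigma0=None, seed=0):
    """Minimal online SOM on a rectangular grid with a Gaussian neighborhood."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Fixed positions of the nodes in the 2-D map space.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Random initial weights inside the bounding box of the data (assumption).
    lo, hi = data.min(axis=0), data.max(axis=0)
    W = lo + rng.random((rows * cols, dim)) * (hi - lo)
    sigma0 = max(rows, cols) / 2.0 if sigma0 is None else sigma0
    total, s = n_epochs * n, 0
    for _ in range(n_epochs):
        for t in rng.permutation(n):                        # t: index in the training sample
            x = data[t]
            u = int(np.argmin(((W - x) ** 2).sum(axis=1)))  # best-matching unit
            frac = s / total
            alpha = alpha0 * (1.0 - frac)                   # decreasing learning rate alpha(s)
            sigma = max(sigma0 * (1.0 - frac), 0.05)        # shrinking neighborhood width
            grid_d2 = ((grid - grid[u]) ** 2).sum(axis=1)
            theta = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian theta(u, v, s)
            # W_v(s+1) = W_v(s) + theta(u, v, s) * alpha(s) * (D(t) - W_v(s))
            W += (theta * alpha)[:, None] * (x - W)
            s += 1
    return W, grid


def bmu(W, x):
    """Mapping phase: index of the node whose weight vector is closest to x."""
    return int(np.argmin(((W - np.asarray(x, dtype=float)) ** 2).sum(axis=1)))
```

After training, `bmu(W, x)` implements the mapping phase: each new observation is assigned to the node with the closest weight vector, so points from well-separated groups map to different nodes.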