Multi-Objective Automatic Clustering Algorithm Based on Evolutionary Multi-Tasking Optimization

Abstract: Data mining is the process of extracting hidden knowledge and potentially useful information from large volumes of incomplete, noisy, and random real-world data. Clustering algorithms based on multi-objective evolution have clear advantages over traditional single-objective methods. To further improve the performance of evolutionary multi-objective clustering algorithms, this paper proposes a multi-objective automatic clustering model based on evolutionary multi-task optimization. Building on a multi-objective clustering algorithm that automatically determines the value of k, evolutionary multi-task optimization is introduced to handle multiple clustering tasks simultaneously. A set of non-dominated clustering solutions is obtained by concurrently optimizing the overall deviation and the connectivity index. A multi-task adjacency coding based on a locus adjacency graph is designed to encode the clustered data. In addition, an evolutionary operator based on relevance learning is designed to facilitate the evolution of individuals within the population. It also facilitates information transfer between individuals with different tasks, effectively avoiding negative transfer. Finally, the proposed algorithm is tested on both artificial datasets and UCI datasets and compared with traditional clustering algorithms and other multi-objective clustering algorithms. The results verify the advantages of the proposed algorithm in clustering accuracy and convergence.


Introduction
Clustering is an indispensable technique in data mining [1]. It has attracted widespread interest in image segmentation [2], social network analysis [3], text analysis [4], and so on. Clustering is an important unsupervised classification technique [5] that is suitable for revealing the underlying distribution of data. It can divide vectors in a multidimensional space into different categories. Clustering techniques can be roughly divided into several categories, such as partition-based, density-based, grid-based, hierarchical, and model-based clustering [6].
Among the partition-based clustering algorithms, the representative algorithms are k-means (KM) [7], mixed density clustering [8], fuzzy clustering represented by fuzzy C-means (FCM) [9], and graph clustering algorithms [10]. Hierarchical clustering is usually represented as a hierarchical tree structure [11]. According to the direction in which the tree is constructed, hierarchical clustering can be divided into bottom-up [12] and top-down [13] construction methods. Traditional clustering algorithms are often limited to a single criterion evaluation function. Only by selecting appropriate clustering evaluation criteria can an effective clustering result be obtained [14]. In single-objective clustering algorithms [15], the evaluation function is often designed according to the dataset and the actual problem to be solved, and the number of clusters needs to be specified in advance. As a result, neither the cluster centers nor the number of clusters can be determined adaptively [16,17]. Since clustering is often an unsupervised task with a general lack of prior knowledge [18], coupled with the explosion of data dimensions and sparse high-dimensional datasets, it is difficult for traditional single-objective clustering algorithms to obtain efficient clustering solutions [19].
Such clustering algorithms are simple, fast, and efficient [20], but they easily fall into local optima, require the number of clusters to be specified in advance, and are not very robust [21]. With the increasing scale and dimension of real-world data, single-objective algorithms can no longer model increasingly complex problems. Therefore, it is necessary to build a multi-objective algorithm with stronger problem description ability and to seek a fast and efficient multi-objective optimization method [22]. In [23], Handl et al. proposed the multi-objective clustering algorithm MOCK and compared it with three single-objective clustering algorithms, finding that clustering can benefit from multiple objectives. Gong et al. [24] proposed an improved multi-objective clustering framework, IMCPSO, using particle swarm optimization (PSO). This framework designs a novel particle representation of clustering problems to help PSO search for clustering solutions in continuous space, and designs a leader selection strategy according to the distribution of the Pareto set, so that the algorithm can avoid falling into local optima. Its extensive experiments demonstrate the advantages of the multi-objective clustering algorithm over single-objective clustering algorithms. In [25], Wang et al. considered the number of clusters and the sum of squared distances between each data point and its cluster centroid as the objectives, and established a bi-objective clustering model, EMO-KC, to ensure that all clustering results with different k values are effectively obtained in a single run.
Different from the single-objective optimization problem, the multi-objective problem contains multiple conflicting optimization objectives [26], and a single solution cannot optimize all of them at the same time. Under the condition of satisfying the constraints, effectively balancing multiple conflicting objective values is critical to solving multi-objective optimization problems (MOPs) [27]. From the perspective of optimization, the clustering problem can be regarded as an NP-hard problem, and traditional optimization methods cannot be directly applied to solve multi-objective problems [28]. As a meta-heuristic algorithm, the evolutionary algorithm is often used to solve NP-hard problems, so multi-objective optimization methods for clustering problems become very meaningful [29]. Handl et al. [23] first proposed a multi-objective adaptive clustering algorithm, MOCK, using PESA-II as the optimization tool. The algorithm takes the intra-class homogeneity and inter-class connectivity of the clustering as optimization objectives, and initializes the population using the minimum spanning tree and k-means. In the clustering phase, MOCK adaptively adjusts the number of clusters without requiring it to be specified in advance. Finally, gap statistics are used to select the best result from the non-dominated solutions generated by the multi-objective optimization algorithm. MOCK does not require a priori knowledge of the samples and can adaptively determine the number of clusters, which makes it better than the single-objective clustering method. Hruschka et al. [30] introduced the general view of evolutionary multi-objective clustering (EMOC) in their review of evolutionary clustering algorithms. In 2012, Bong et al. [31] proposed a multi-objective clustering method applied to image segmentation. In 2013, Mukhopadhyay et al. [32] conducted a survey on multi-objective evolutionary methods for data mining, in which the authors introduced the general characteristics of EMOC and proposed some algorithms.
At present, most multi-objective clustering algorithms are designed only for datasets with a simple data distribution and can optimize only one clustering task [33]. For multi-domain datasets with multiple related clustering tasks, multi-objective clustering algorithms cannot perform well because of limitations in the expression of genotypes, the design of crossover operators, and population search strategies [34]. Simply applying a multi-objective clustering algorithm to complex clustering tasks will only increase the degree of negative transfer of genetic material between individuals and reduce the ability of the population to find the global optimal solution [35]. Therefore, this paper focuses on the positive transfer effect of evolutionary multi-task algorithms [36,37,38,39] between multiple clustering tasks. Gupta et al. [40] proposed the multifactorial evolutionary algorithm (MFEA), an evolutionary multi-task algorithm that uses a single evolutionary population to solve multiple optimization problems simultaneously by exploiting the implicit parallelism of population search.
To address the above difficulties, this paper proposes a multi-task multi-objective automatic clustering model based on the MFEA framework and a multi-objective clustering algorithm. The algorithm can solve multiple tasks at once and is therefore suited to multi-domain datasets. The overall deviation and connectivity index are used as the two objective functions of clustering to obtain a set of non-dominated solutions with respect to the clustering parameters. Moreover, a locus-based adjacency graph (LAG) encoding method is designed, together with mutation and crossover operators based on relevance learning between tasks, which transmit information between populations while reducing the impact of negative transfer on population convergence [41]. The specific contributions are as follows:
1. The idea of evolutionary multi-task learning is integrated into the framework of multi-objective clustering, and the model of a multi-task multi-objective automatic clustering algorithm (MFMOCK) is established. Through the exchange of genetic material between individuals in the population, the relevant knowledge between different tasks is implicitly transferred, so as to optimize multiple tasks at the same time;
2. Based on the locus-based adjacency graph (LAG) encoding method, an individual representation suitable for evolutionary multi-tasking is designed, which provides a basis for the exchange of genetic information between tasks;
3. A crossover operator between individuals with different skill factors (solving different clustering tasks) is designed to limit the search direction of the algorithm, reduce the adverse evolution caused by random crossover, and accelerate the convergence of each task at the same time.
The remainder of this paper is organized as follows. Section 2 introduces the technical basis of the algorithm framework, including multi-objective optimization, MOCK, the multitasking evolutionary algorithm, and model correlation learning. Section 3 introduces the specific steps and implementation details of the multi-tasking and multi-objective clustering algorithm, including the innovation and improvement of the coding method, local search strategy, and individual crossover and mutation strategy. Section 4 records the setup and results of the comparison test, and analyzes the experimental results. Section 5 concludes the paper and summarizes the proposed algorithm and the experimental results.

Background
In this section, the MOCK clustering model is firstly introduced in detail, and then the task correlation learning model is derived and its application in multi-task clustering is introduced.

Multi-Objective Optimization
Without loss of generality, the multi-objective optimization problems (MOPs) [42] discussed in this paper are formulated as minimization problems. A MOP can be expressed as follows:
$$\min F(x) = \left(f_1(x), f_2(x), \ldots, f_m(x)\right)^{T}, \quad \text{s.t. } x \in \Omega,$$
where $\Omega$ is the feasible space, $x$ is a solution of the MOP, $\mathbb{R}^m$ is the objective space, and $F : \Omega \to \mathbb{R}^m$ consists of m real-valued objective functions. In most cases, the objectives in a MOP are contradictory, which means that no point in the feasible space can minimize all objectives at the same time. Therefore, the goal of multi-objective optimization is to find the best trade-off between them. For minimization, a solution $x_u$ is better than (dominates) another solution $x_v$ if and only if
$$f_i(x_u) \le f_i(x_v) \ \ \forall i \in \{1, \ldots, m\} \quad \text{and} \quad \exists j \in \{1, \ldots, m\} : f_j(x_u) < f_j(x_v).$$
If there is no solution $x$ in $\Omega$ such that $F(x)$ dominates $F(x^*)$, we call $x^*$ a Pareto-optimal solution, and the corresponding $F(x^*)$ is called a Pareto-optimal objective vector. The objectives in a Pareto-optimal vector are related in such a way that reducing one objective leads to an increase in at least one other objective. All Pareto-optimal points constitute a set, which is called the Pareto-optimal set, and the set of corresponding Pareto-optimal objective vectors is called the Pareto-optimal frontier (PF) [43].
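The dominance relation above can be illustrated with a minimal Python sketch. The function names `dominates` and `non_dominated` are hypothetical and are not taken from the paper; all objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(fu: Sequence[float], fv: Sequence[float]) -> bool:
    """Return True if objective vector fu Pareto-dominates fv (minimization)."""
    no_worse = all(a <= b for a, b in zip(fu, fv))
    strictly_better = any(a < b for a, b in zip(fu, fv))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Four candidate objective vectors; (3, 3) is dominated by (2, 2).
front = non_dominated([(1, 4), (2, 2), (3, 3), (4, 1)])
```

This brute-force filter compares every pair of points, which is adequate for illustration but slower than the sorting strategies used in practical multi-objective evolutionary algorithms.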
Evolutionary algorithms are an important tool for solving multi-objective problems. Because of the inherent parallelism of population iteration in multi-objective evolutionary algorithms, a near-optimal solution set covering the entire Pareto front can be generated in a single run [44]. Deb et al. proposed the NSGA-II algorithm using a fast non-dominated sorting strategy [45]. The MOPSO algorithm proposed by Coello and Pulido et al. adopts a grid-based adaptive selection strategy and applies a new mutation strategy to the population evolution to ensure the diversity of solutions [46]. Zhang et al. proposed the MOEA/D algorithm, which is a multi-objective evolutionary algorithm based on decomposition [47]. Using decomposition approaches such as the Chebyshev method, the weight method, and the boundary intersection method, the algorithm decomposes the multi-objective problem into multiple single-objective sub-problems. Through the iterative optimization of these sub-problems, it alleviates the issues of large search space and slow iterative evolution in multi-objective optimization, and reduces the time complexity of the algorithm.

MOCK Clustering Model
MOCK is a multi-objective clustering algorithm that can automatically determine the number of clusters. The algorithm consists of two main stages: the clustering stage and the model selection stage.
Clustering stage: The clustering stage of MOCK is based on a multi-objective evolutionary algorithm, the Pareto envelope-based selection algorithm version 2 (PESA-II), which optimizes the two clustering objectives of overall deviation and connectivity. In this stage, the minimum spanning tree and the k-means algorithm are used to generate the initial solution set, so as to obtain clusters of different numbers and shapes. The overall deviation is calculated as follows:
$$Dev(C) = \sum_{C_k \in C} \sum_{i \in C_k} D(i, \mu_k),$$
where $C$ is the set of all clusters, $\mu_k$ is the centroid of cluster $C_k$, and $D(\cdot, \cdot)$ is the chosen distance function. The connectivity is calculated as follows:
$$Conn(C) = \sum_{i=1}^{N} \sum_{j=1}^{L} x_{i, nn_{ij}}, \qquad x_{r,s} = \begin{cases} 1/j, & \text{if } \nexists\, C_k : r \in C_k \wedge s \in C_k, \\ 0, & \text{otherwise,} \end{cases}$$
where $nn_{ij}$ is the j-th nearest neighbor of datum i, N is the size of the clustered dataset, and L is a parameter determining the number of neighbors that contribute to the connectivity measure; the penalty $x_{r,s}$ follows the definition in [23].
In MOCK, each individual is encoded using the locus-based adjacency representation, where each data point is represented by a gene, and the allele value of the gene defines the connection between the data points. The offspring are generated by combining the genetic information of the two parents using uniform crossover. In addition, based on the concept of nearest neighbors, a neighborhood-based mutation is designed to change the connection relationship of data points.
Model selection stage: The model selection stage uses gap statistics to determine the number of clusters in the dataset. In addition, the Poisson model is used to generate control data in the feature space to obtain the expected connectivity and overall deviation values of unstructured data. Then, the solution front and the control front are aligned and normalized, and the score of each point on the solution front is calculated. By analyzing the relationship between the score and the number of clusters, the local optimal solutions are identified, which are considered to be promising solutions. The global maximum may be considered as the best solution for estimation.
The specific implementation of MOCK can be found in [23].

Task Correlation Learning Models
Multi-task clustering improves the clustering performance of each task by transferring useful knowledge between related tasks [48]. In many practical applications, only part of the latent classes can be shared between tasks. If the correlation between tasks is not considered, negative transfer may occur. Zhang et al. proposed multi-task clustering with model relation correlation learning [49]. Intra-task clustering introduces symmetric non-negative matrix factorization through a linear regression model, and then clusters each task. Among tasks, the parameters of the linear regression model in each task are updated by learning the correlation of model parameters [50]. The setting of the task relevance learning algorithm is as follows. Given m tasks, each of which has its own dataset, i.e., $X^t = \{x^t_1, x^t_2, \ldots, x^t_{n_t}\} \in \mathbb{R}^{d \times n_t}$, $t = 1, \ldots, m$, where $n_t$ is the number of data points of the t-th task and d is the dimension of the feature vector. The similarity matrix of the t-th task is given by $M^t \in \mathbb{R}^{n_t \times n_t}$. The data $X^t$ of each task are divided into $h_t$ clusters, i.e., $C^t = \{C^t_1, C^t_2, \ldots, C^t_{h_t}\}$. The cluster label matrix of the t-th task is given by $Y^t \in \mathbb{R}^{n_t \times h_t}$.
The objective function of multi-task model relation correlation learning [49] is composed of two parts: intra-task clustering and inter-task correlation learning. The first part, intra-task clustering, clusters the data within each task and updates the clusters using the knowledge transferred from other tasks. It is formulated as a minimization problem with three terms. The first term is symmetric non-negative matrix factorization (SNMF), where $M^t$ is the similarity matrix of task t. The second and third terms form a linear regression model that fits the clusters in each task: the features of the input vectors are used to fit and predict the output vectors, and the term in $W^t$ is the regularization term, which prevents overfitting.
The second part is inter-task correlation learning, which learns cross-task correlation by computing the correlation of model parameters between multiple clusters in each pair of tasks. The relationship of the same cluster across different tasks is explored by learning the correlation of the model parameters corresponding to each cluster, and then the model parameters of each task can be updated using the model parameters of other tasks. This part is also formulated as a minimization problem. Its first term ensures that models that are closer to each other have a higher corresponding similarity, while its second term prevents the degenerate situation in which the similarity between the most similar models is 1 and the similarity between all other models is 0. $W^s_j$ is the linear regression parameter for the j-th cluster of task s, and $G^{ts}$ is the cluster correlation coefficient matrix between task t and task s, where $G^{ts}_{ij}$ is the correlation coefficient between $W^t_i$ and $W^s_j$.

Multi-Task Multi-Objective Automatic Clustering Algorithm
In order to make full use of the parallelism implied by a population search, and to extend multi-objective optimization to multi-tasking multi-objective optimization so that multiple similar clustering tasks are optimized at the same time, this section proposes a multi-objective clustering learning framework based on evolutionary multi-tasking optimization, which is also a multi-tasking multi-objective clustering framework. It tries to improve the clustering performance of each task while accelerating the iteration speed of each task. This section introduces the specific MFMOCK framework according to the process and technical points, and presents the complexity analysis of the algorithm. Tables 1 and 2 are the abbreviation declaration table and the nomenclature declaration table, respectively.

Framework of MFMOCK
This part gives the basic framework and specific algorithm steps of the multi-task multi-objective automatic clustering algorithm (MFMOCK). NSGA-II [45] is used as the basic framework of MOEA, and the algorithm extends the single-task multi-objective optimization algorithm into a multi-tasking evolutionary algorithm. The individual coding method, local search strategy, genetic information transmission method between tasks, and crossover and mutation operators are mainly modified for the evolutionary multi-tasking framework. The framework of MFMOCK is shown in Algorithm 1.
The m clustering tasks are optimized simultaneously, and the j-th task is denoted as $T_j$, whose objective function is defined as $f_j : X_j \to \mathbb{R}$, where $X_j$ is the search space. Then, the evolutionary multi-tasking clustering optimization proposed in this paper can be expressed as $\{x_1, \ldots, x_m\} = \arg\min \{f_1(x), \ldots, f_m(x)\}$. To solve this problem with an evolutionary algorithm in the evolutionary multi-tasking framework, each individual $p_i$, $i \in \{1, 2, \ldots, |P|\}$, in population P has a set of attribute characteristics, and $p_i$ can be decoded in the m clustering optimization problems as $x^i_1, \ldots, x^i_m$, where $x^i_1 \in X_1, \ldots, x^i_m \in X_m$. Taking individual $p_i$ as an example, its attribute characteristics are as follows:
1. Individual evaluation value: $p_i$ is evaluated on task $T_j$, and its corresponding ranking position on that task is $r^i_j$;
2. Individual fitness: The individual fitness of $p_i$ is defined as $\varphi_i = 1 / \min_{j \in \{1, \ldots, m\}} r^i_j$;
3. Skill factor: The skill factor of $p_i$ is defined as $\tau_i = \arg\min_{j \in \{1, \ldots, m\}} r^i_j$, that is, the index of the task on which the individual performs best.
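The fitness and skill factor definitions above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: each individual is given one scalar cost per task, and ranks are obtained by sorting those costs. In a multi-objective setting the ranks would instead come from a ranking procedure such as non-dominated sorting, which is not shown here. The function names are hypothetical.

```python
from typing import List, Tuple


def ranks_per_task(costs: List[List[float]]) -> List[List[int]]:
    """costs[i][j] is the (assumed scalar) cost of individual i on task j.

    Returns ranks[i][j], the 1-based ranking position of individual i on task j
    (rank 1 = lowest cost).
    """
    n, m = len(costs), len(costs[0])
    ranks = [[0] * m for _ in range(n)]
    for j in range(m):
        order = sorted(range(n), key=lambda i: costs[i][j])  # stable sort
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    return ranks


def fitness_and_skill(ranks: List[List[int]]) -> Tuple[List[float], List[int]]:
    """Scalar fitness phi_i = 1 / min_j r_ij and skill factor tau_i = argmin_j r_ij.

    Task indices are 0-based here; ties are resolved in favor of the first task.
    """
    phi = [1.0 / min(r) for r in ranks]
    tau = [r.index(min(r)) for r in ranks]
    return phi, tau


# Three individuals evaluated on two tasks (illustrative numbers only).
example_ranks = ranks_per_task([[0.1, 5.0], [0.3, 1.0], [0.2, 3.0]])
example_phi, example_tau = fitness_and_skill(example_ranks)
```

In this example the first individual is best on the first task and the second individual is best on the second task, so both receive the maximum fitness of 1, while the third individual ranks second on both tasks and receives a fitness of 0.5.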
Algorithm 1 Pseudocode for the MFMOCK algorithm.
Input: maximum number of evolution iterations $gens_{max}$, population size pop, crossover and mutation probabilities $p_c$ and $p_m$, the dataset containing m tasks.
Individuals are selected to enter LearningPool based on the evaluation results.
The parents are selected through a binary tournament.
Generation of offspring population C, refer to Algorithm 2.
Individuals are evaluated according to the skill factor.
Update the fitness φ and skill factor τ.
The next generation of individuals is selected according to φ and τ.
end while
A solution evaluation strategy based on controlled frontier distance is used to select the final solution.
Firstly, based on MOCK, MFMOCK uses the idea of evolutionary multi-tasking, in which each task contributes an additional factor that affects the evolution of the population. Moreover, the skill factor of an individual represents the index of the task that the individual is best at, and individuals with different skill factors contribute evolutionary factors to the population through chromosome crossover. Due to the unified individual coding method, the data and corresponding features in different tasks are mapped into the same search space, so that in the process of population evolution, each task also provides information for the optimization of other tasks. The unified search space means that the solutions of different tasks are contained in a unified genetic information base, which enables MFMOCK to exploit the potential genetic complementarity between multiple clustering tasks, so as to effectively discover useful genetic information and implicitly transfer it from one task to another. Secondly, because the amount of data in different clustering tasks may be inconsistent, the genotype length of individuals in different tasks obtained by the classical LAG encoding method is inconsistent, which makes it impossible for individuals in the population to cross. The multi-task adjacency representation method is used to solve this problem.
The algorithm also adopts the elite retention strategy of LearningPool, which maintains a fixed-size LearningPool containing a large number of different non-dominated solutions with different skill factors found in the search process. Using the histogram method based on density estimation, the solutions cover the whole objective space instead of gathering in the same region. At the end of each generation, the non-dominated solutions in the LearningPool are updated, new non-dominated solutions are added, and the dominated solutions are eliminated. Multi-objective clustering does not generate a single solution, but a set of clustering solutions, with different solutions corresponding to different trade-offs between the two objectives. The model selection strategy of the algorithm adopts the solution selection strategy based on the control front (CF). The control data are obtained according to the Poisson model in the feature space of the clustered data. Specifically, principal component analysis is applied to the covariance matrix of the original data to obtain the eigenvectors and eigenvalues, and then the control data are generated. Thus, an estimate of the connectivity and overall deviation of unstructured data, which do not contain the original data but are located in the same data space, is obtained. For each non-dominated solution on the actual Pareto front (PF), the distance between the solution and the points on the control front is calculated, and the non-dominated solution with the largest shortest distance is considered to be the optimal solution. The calculation formula is as follows:
$$x^{*} = \arg\max_{x \in PF} \ \min_{c \in CF} \ d(x, c),$$
where $d(\cdot, \cdot)$ denotes the distance between a solution on the PF and a point on the CF in objective space.
In addition, the crossover operator of multi-tasking correlation learning is incorporated. It not only enables individuals to exchange genetic information between tasks, but also uses the correlation learning model between tasks to solve a quadratic program over the linear regression model parameters of different tasks in the process of individual crossover, so as to extract the correlation between task models. The correlation coefficient is used to minimize the difference between the parameters of different regression models, so that related tasks have similar clustering results and the knowledge of related tasks is transferred, which is ultimately reflected in the change of individual genotypes. Moreover, as a population search strategy, the crossover operator avoids meaningless crossover between individuals with different skill factors, and accelerates the search of the population for the global optimal solution in the evolution process.
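The control-front selection rule described in this subsection (pick the non-dominated solution whose shortest distance to the control front is largest) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that both fronts are already given as lists of objective vectors on a common, normalized scale and that the distance is Euclidean. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def select_by_control_front(
    pf: List[Sequence[float]], cf: List[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return the index of the PF solution with the largest shortest distance
    to the control front, together with all shortest distances.

    Assumption: pf and cf hold objective vectors on the same normalized scale.
    """
    shortest = [min(math.dist(p, c) for c in cf) for p in pf]
    best = max(range(len(pf)), key=lambda i: shortest[i])
    return best, shortest


# Illustrative fronts: the middle PF point lies farthest from the control front.
pareto_front = [(0.0, 1.0), (0.2, 0.2), (1.0, 0.0)]
control_front = [(0.1, 1.0), (0.6, 0.6), (1.0, 0.1)]
best_index, shortest_distances = select_by_control_front(pareto_front, control_front)
```

The intuition is that a solution far from the control front captures structure that unstructured control data cannot reproduce, so it is a promising candidate for the final clustering.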

Multi-Task Adjacency Coding Based on Locus Adjacency Graph
In order to apply the evolutionary multi-tasking framework to multi-objective clustering problems, it is necessary to modify and design the population coding method, the crossover operator between individuals with different skill factors in the population, and the evolution strategy. These are introduced in detail in this subsection.
The coding method adopted in this algorithm is the multi-tasking adjacency coding method based on the locus-based adjacency graph (LAG). In the LAG, the encoding of each individual can be represented as a data connection graph and can be decoded as a clustering result. Each data point in the task is represented as a node in the graph, connected data points are assigned to the same cluster, and each individual corresponds to k clusters after decoding. For example, individual $p_i$ contains n alleles $p^j_i$, $j \in \{1, \ldots, n\}$; if the gene at position f has the value l, then the LAG edge of this gene is denoted as $p^f_i \to p^l_i$, and the data points f and l are connected and belong to the same cluster. The main advantage of the locus-based adjacency coding scheme for solving the clustering problem is that the number of clusters does not need to be fixed in advance, since it is determined automatically in the decoding step.
Figure 1 takes a clustering task with nine data points as an example. Each data point corresponds to a node i ∈ {1, 2, . . ., 9}, and the genotype of the individual represented in the figure is 2, 4, 2, 1, 5, 5, 9, 7, 8. The solid lines represent the connections between the samples, the dotted lines represent the corresponding clusters, and all the data are divided into three clusters. Each gene in the individual genotype gives the index of the node that the current node points to. In this representation, exactly one connecting line leads from each node to another node (or to the node itself).
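The decoding of the nine-point example above can be reproduced with a short Python sketch. It treats each gene as an undirected link and extracts the connected components with a union-find structure. The function name `decode_lag` and the choice of union-find are illustrative assumptions, not details given in the paper.

```python
from typing import Dict, List


def decode_lag(genotype: List[int]) -> List[int]:
    """Decode a locus-based adjacency genotype into cluster labels.

    genotype[i - 1] = l (1-based) means that node i is linked to node l.
    Returns one 0-based cluster label per node, numbered by first appearance.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(a: int) -> int:
        # Find the representative of a's component with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, gene in enumerate(genotype):
        root_i, root_l = find(i), find(gene - 1)
        if root_i != root_l:
            parent[root_i] = root_l

    labels: List[int] = []
    seen: Dict[int, int] = {}
    for i in range(n):
        root = find(i)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Genotype of the example: nodes 1-4, 5-6, and 7-9 form the three clusters.
example_labels = decode_lag([2, 4, 2, 1, 5, 5, 9, 7, 8])
```

The number of clusters is simply the number of distinct labels, which is why this representation does not need k to be fixed in advance.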
In a single multi-objective clustering task, since the number of samples is fixed, the genotype length of each individual in the population is the same, and the gene length does not need to be considered for crossover between individuals. However, in the multi-tasking environment, the number of samples of each task may differ, so when the same individual is decoded for different tasks, the required genotype length is different. Therefore, a unified encoding method, namely, the multi-tasking adjacency representation method, needs to be designed. As shown in Figure 2, the genotypes of individuals are all non-negative real numbers, so they need to be mapped into [0, 1]. When optimizing m tasks simultaneously, assume that the dimension of the j-th task is $D_j$; then the dimension of the unified search space is $D_{multitask} = \max_j D_j$. In the population initialization step, each individual is a vector of random variables of dimension $D_{multitask}$, and each gene lies within the fixed range [0, 1]. If the skill factor of individual i is $\tau_i = j$, the first $D_j$ genes of individual i are taken for the crossover, mutation, coding, and decoding operations.
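A minimal Python sketch of this unified representation is given below. The paper states that genes lie in [0, 1] and that only the first $D_j$ genes are used for task j; the exact rule for turning a real-valued gene into a node index is not specified in this section, so the rounding rule used here (floor of g·n plus one, capped at n) is an assumption made purely for illustration, as are the function names.

```python
import random
from typing import List


def init_population(pop_size: int, task_dims: List[int], seed: int = 0) -> List[List[float]]:
    """Create individuals in the unified space of dimension max(task_dims),
    with every gene drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    d_multitask = max(task_dims)
    return [[rng.random() for _ in range(d_multitask)] for _ in range(pop_size)]


def to_lag_genotype(genes: List[float], n: int) -> List[int]:
    """Take the first n genes and map each value in [0, 1] to a node index in 1..n.

    Assumed rounding rule: index = floor(g * n) + 1, capped at n.
    """
    return [min(n, int(g * n) + 1) for g in genes[:n]]


# Three tasks with 9, 5, and 7 samples share one 9-dimensional search space.
population = init_population(pop_size=6, task_dims=[9, 5, 7])
# An individual whose skill factor points to a 4-sample task uses only 4 genes.
example_genotype = to_lag_genotype([0.0, 0.5, 0.99, 1.0, 0.3], n=4)
```

Because all individuals share the same length, crossover between individuals assigned to different tasks is always well defined, and the unused trailing genes are simply ignored when decoding for a smaller task.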
In the initialization process of the algorithm, an initialization strategy based on the minimum spanning tree (MST) and k-means is mainly used. Different single-objective clustering algorithms have good clustering effects in different regions of the Pareto front: algorithms based on connectivity often generate the optimal solutions in those regions of the Pareto front with low connectivity, whereas compactness-based algorithms perform well in regions with low overall deviation. Since MST and k-means are clustering algorithms based on connectivity and compactness, respectively, the initial solution generation strategies of MST and k-means are used. Since different clustering objectives can reflect completely different optimization criteria for clustering solutions, two complementary objective functions are selected in this algorithm: one is based on compactness and the other is based on cluster connectivity. Cluster compactness is expressed by the overall deviation (Dev) of a cluster partition, which is calculated as the total distance between each data point and its corresponding cluster center:
$$Dev(C) = \sum_{C_k \in C} \sum_{i \in C_k} \delta(i, \mu_k),$$
where $C$ is the set of clusters, $\mu_k$ is the center of cluster $C_k$, and $\delta(\cdot, \cdot)$ is the distance function (Euclidean distance is chosen here). As an objective function, the overall deviation should be minimized. The other objective function, which reflects the clustering connectivity criterion, is the connectivity index (Con), which evaluates the degree to which neighboring data points are classified into the same cluster, and is formulated as follows:
$$Con(C) = \sum_{i=1}^{N} \sum_{j=1}^{L} x_{i, n_{ij}}, \qquad x_{r,s} = \begin{cases} 1/j, & \text{if } \nexists\, C_k : r \in C_k \wedge s \in C_k, \\ 0, & \text{otherwise,} \end{cases}$$
where $n_{ij}$ is the j-th nearest neighbor of datum i, N is the size of the clustered dataset, and L is a parameter that determines the number of neighbors that contribute to the connectivity; the penalty $x_{r,s}$ follows the definition in [23]. As an objective function, the connectivity index should be minimized.
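The two objectives can be computed directly from a dataset and a label vector. The sketch below is a plain-Python illustration under the assumptions that the distance is Euclidean and that a neighbor placed in a different cluster contributes the penalty 1/j used in the original MOCK definition [23]. The function names are hypothetical, and the brute-force neighbor search is only suitable for small examples.

```python
import math
from typing import List, Sequence


def overall_deviation(X: List[Sequence[float]], labels: List[int]) -> float:
    """Sum of Euclidean distances between each point and its cluster centroid."""
    dev = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == k]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dev += sum(math.dist(x, centroid) for x in members)
    return dev


def connectivity(X: List[Sequence[float]], labels: List[int], L: int) -> float:
    """Connectivity index: for every point, add 1/j whenever its j-th nearest
    neighbor (j = 1..L) is assigned to a different cluster."""
    n = len(X)
    total = 0.0
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        for rank, j in enumerate(neighbors[:L], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


# Two well-separated pairs of points (illustrative data only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
dev_good, con_good = overall_deviation(points, good), connectivity(points, good, L=1)
dev_bad, con_bad = overall_deviation(points, bad), connectivity(points, bad, L=1)
```

On this toy data the natural partition scores lower on both objectives than the partition that splits each pair, which matches the requirement that both objectives be minimized. On real data the two objectives usually conflict, since the overall deviation tends to decrease as the number of clusters grows while the connectivity tends to increase.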

Evolutionary Operators Based on Correlation Learning
According to the design principle of MFEA, two randomly selected parents must meet certain conditions before they can cross. Parents with the same cultural background (skill factor) preferentially exchange genetic material with each other. Therefore, if two randomly selected parents have the same skill factor τ, they can cross freely. If their skill factors differ, crossover is performed with the random mating probability (rmp); otherwise, mutation occurs. The pseudocode for this mating selection is given in Algorithm 2.

In the proposed algorithm, individuals with the same skill factor correspond to the same clustering task. If two parents have the same skill factor, the number of samples, the individual coding length, and the data distribution are all the same, so crossover between the individuals can be carried out according to the simulated binary crossover (SBX). In addition, transfer learning between tasks is also crucial because related knowledge exists between tasks. Although the evolutionary multi-task framework has great advantages in solving multi-tasking problems with evolutionary algorithms, the exchange of genetic material between tasks in the original MFEA cannot accelerate the convergence of the clustering tasks, and crossover between individuals with different skill factors may introduce negative transfer. It is therefore necessary to design a new crossover operator for individuals with different skill factors. The new multi-tasking crossover operator introduces the idea of learning model parameters between tasks, so that when individuals with different skill factors exchange genetic material, the direction of individual change is restricted to the direction in which clustering learning converges. The pseudocode for the operator is given in Algorithm 3.

Algorithm 3 Evolutionary operators based on correlation learning
1: Step 1: Decoding.
2: The genotypes of parent individuals P_a and P_b, whose skill factors are τ_a and τ_b, respectively, are decoded according to the number of samples in the task, and the corresponding clustering results r_a and r_b are obtained.
3: Step 2: Initialization.
4: The cluster labels h_t of the inter-task correlation learning model are initialized according to the clustering results r_a and r_b.
5: Initialize the similarity matrix M_t based on the similarity measure of the dataset (in this case, Euclidean distance).
6: The category indicator label Y_t of the task is initialized according to the k-means algorithm.
7: Two parent individuals P_a and P_b are randomly picked from the contemporary population P.
8: W_t is initialized to an all-ones matrix of size n_t × h_t.
9: Step 3: Learning inter-task parameter relevance. The task relevance parameters G^(t,s) are calculated.

When individuals with different skill factors are crossed, the sample numbers n_a and n_b of the corresponding tasks are obtained from the skill factors of the parents P_a and P_b. The first n genes of each individual are decoded by linearly mapping the genes in [0,1] to [1,n], which yields the parents' clustering results r_a and r_b. The cluster label of each sample in the task is initialized according to the clustering result, and the other related matrices are initialized according to the initialization step of Algorithm 3. In Step 3, inter-task correlation learning, the clustering results are updated through intra-task linear model learning and inter-task model parameter correlation learning. This step not only exchanges genetic information between individuals with different task preferences, but also accelerates the convergence of the tasks to which the individuals correspond. In addition, the crossover operator does not need to optimize the task correlation learning model to complete convergence; it only iterates a fixed number of times. A local search strategy is therefore implicit in individual crossover, which both limits the random evolution of individuals in the population and reduces the time complexity of the algorithm.

After the local search, new clustering results C_i and C_j are obtained, where C_i is a clustering result of the ith task and C_j is a clustering result of the jth task. C_i and C_j must then be encoded as the new offspring individuals O_a and O_b. In the encoding process, for example, each connection in P_a that is inconsistent with the clustering result C_i is disconnected, the nearest sample belonging to the same cluster is found in the nearest-neighbor matrix of the disconnected sample, and the link is redirected to that sample to form the new offspring individual. The specific operation is shown in Figure 3. Figure 3a is the genotype representation of the parent individual P_a. In the figure, sample point 3 belongs to the same cluster as sample point 2 and has a link pointing to sample point 2. Figure 3b shows the clustering result obtained after learning the inter-task model relationship in the crossover operator. In Figure 3b, sample point 3 is assigned to another cluster, so the link between sample points 3 and 2 must be disconnected and a new link for sample point 3 must be found in the new cluster. Since sample point 9 is the closest point to sample point 3 in the new cluster, gene 3 of the offspring genotype O_a is changed to 9.
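The decoding and re-encoding steps described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `decode` and `reencode` are hypothetical, indices are 0-based, the mapping from [0,1] to sample indices uses a simple floor, and the nearest same-cluster sample is found by a full distance scan rather than a precomputed nearest-neighbor matrix.

```python
import math


def decode(genes, n):
    """Map the first n genes from [0, 1] to links 0..n-1 and label connected components."""
    # Assumption: floor mapping to 0-based indices; the paper maps [0,1] to [1,n].
    links = [min(n - 1, int(g * n)) for g in genes[:n]]
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)  # samples joined by a link share a cluster

    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]
    return links, labels


def reencode(links, new_labels, points):
    """Keep links that agree with new_labels; redirect the others to the
    nearest sample of the same new cluster (self-link if the cluster is a singleton)."""
    out = list(links)
    for i, j in enumerate(links):
        if new_labels[i] == new_labels[j]:
            continue  # link is consistent with the learned clustering
        same = [k for k in range(len(points))
                if k != i and new_labels[k] == new_labels[i]]
        if same:
            out[i] = min(same, key=lambda k: math.dist(points[i], points[k]))
        else:
            out[i] = i
    return out


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
links, labels = decode([0.3, 0.1, 0.3, 0.9, 0.7], 5)
# Suppose the learning step moved sample 2 into the cluster of samples 3 and 4.
child = reencode(links, [0, 0, 1, 1, 1], points)
```

In this toy run, sample 2 originally links to sample 1; after the learned labels place it with samples 3 and 4, its gene is redirected to sample 3, the closest member of its new cluster, which mirrors the change of gene 3 to 9 in Figure 3.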
Finally, the genes in [1,n] are linearly mapped back to [0,1] to complete the crossover between individuals and the generation of offspring individuals. When parent individuals with the same skill factor are crossed, uniform crossover is used. Because there is no bias in gene ordering for individuals of the same task, uniform crossover can generate any combination of alleles from the two parents. When an individual undergoes genotype mutation, a mutation operator based on finite neighbors is used. For a dataset of size N, the search space during mutation is N^N. With the finite-neighbor mutation operator, each gene can only connect to one of its L nearest neighbors when it mutates, where L ≪ N, so the search space shrinks to L^N. This not only reduces the amount of computation during mutation, but also avoids generating many meaningless connections.

Since an individual generated in MFEA cannot perform well in all tasks, the individual can only be evaluated according to the task on which it performs best. Therefore, changes in an individual's skill factor during evolution follow the concept of vertical cultural transmission through selective imitation: the skill factor of an individual can imitate the skill factor of either of its parents.
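The finite-neighbor mutation can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `nearest_neighbors` and `mutate` and the per-gene mutation probability `rate` are hypothetical and do not come from the paper, and the neighbor lists are built by brute force.

```python
import math
import random


def nearest_neighbors(points, L):
    """For each sample, list the indices of its L nearest other samples (Euclidean)."""
    n = len(points)
    return [
        sorted((k for k in range(n) if k != i),
               key=lambda k: math.dist(points[i], points[k]))[:L]
        for i in range(n)
    ]


def mutate(links, neighbors, rate, rng):
    """Redirect each gene, with probability `rate`, to one of its L nearest neighbors."""
    return [
        rng.choice(neighbors[i]) if rng.random() < rate else j
        for i, j in enumerate(links)
    ]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
neighbors = nearest_neighbors(points, L=2)
child = mutate([1, 0, 1, 2], neighbors, rate=0.5, rng=random.Random(0))
```

Because every gene is restricted to L candidate targets, a genotype of length N can take at most L^N values. For N = 100 and L = 10, for instance, the search space shrinks from 100^100 = 10^200 to 10^100.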

Analysis of Algorithm Complexity
Consider a problem with m tasks, and assume that the data of each task has size n × d (n samples of dimension d). The time complexity of the task correlation learning step is O(n^2 d), and the time complexity of fast non-dominated sorting is O(pop^2), where pop is the size of the subpopulation. In the initialization process, the time complexity of generating the initial solutions using the MST is O(mn^2), and the time complexity of generating the initial solutions using k-means is O(n(k + 2)m/d). In addition, the time complexity of evaluating each individual is O(max{n^2, knd}), where k ∈ {1, ..., K_max} is the number of clusters. The overall time complexity of the MFMOCK algorithm is O(gen · max{pop·n, pop·k·n·d, pop·n^2·d, pop^2}), where gen is the number of generations.

Experiment Setup and Results
This section first introduces the three sets of clustering datasets, then the comparison algorithms and evaluation indicators, and then analyzes and sets the parameters. Finally, the clustering results and the convergence of the algorithms are compared and analyzed. All programs were run on a 13th Gen Intel(R) Core(TM) i9-13900K CPU (3.00 GHz) with 32 GB of RAM and were implemented in MATLAB R2023a.

Datasets
To evaluate the proposed algorithm model, three different sets of experimental data were used for testing. The characteristics of all datasets are described in Tables 3-5. The first set consisted of manually generated two-dimensional datasets, which were used to verify the robustness of the algorithm to overlap between clusters and to the size and shape of clusters. The data in these datasets are two-dimensional, normally distributed data with a fixed cluster size, mean vector, and standard deviation vector. The instance distribution of the manual datasets is shown in Figure 4.
The second set consisted of standard cluster models produced by random generators and conforming to a multivariate normal distribution. With the first random generator, the clusters in low-dimensional datasets have random orientations, whereas in higher dimensions the clusters are more spherical in shape. With the second generator, the data distribution is biased towards arbitrarily oriented ellipsoids. The generators and the datasets used in this article are available at https://github.com/garzafabre/Delta-MOCK (accessed on 14 May 2024). The third set consisted of UCI datasets, all available on the UCI website https://archive.ics.uci.edu/ (accessed on 14 May 2024).

Comparison Algorithm and Evaluation Index
The comparison algorithms in the experiment are as follows: K-means (KM) is a traditional clustering method based on Euclidean distance; single linkage (SL) is a hierarchical clustering algorithm that defines cluster proximity as the distance between the two closest points of two different clusters; average linkage (AL) is a hierarchical clustering algorithm that defines cluster proximity as the average proximity of pairs of points taken from two different clusters; spectral clustering (SC) is a clustering algorithm based on spectral graph theory that transforms the clustering problem into an optimal graph partition problem [51]; fuzzy C-means (FCM) is a soft clustering algorithm that assigns each sample a degree of membership in every cluster; MOCK is a multi-objective clustering algorithm that automatically determines the number of clusters; and the Multiple Information Exchange Multi-objective Clustering Algorithm Based on MOCK (MIE-MOCK) is a MOCK variant with a random crossover operator pool and a random mutation operator pool [52].
To evaluate the clustering results accurately, this paper uses accuracy, normalized mutual information, and the adjusted Rand index, which are standard metrics widely used for clustering. These performance indices are introduced as follows.

Clustering accuracy (ACC): Clustering accuracy finds the one-to-one relationship between cluster labels and actual class labels, and measures the extent to which each cluster contains data points from the corresponding actual class. It is defined as follows:

$$\mathrm{ACC} = \frac{1}{n}\sum_{i=1}^{n}\delta\big(l_i, \mathrm{map}(r_i)\big)$$

where r_i represents the cluster label of data x_i, l_i represents the actual class label of data x_i, n represents the total amount of data, δ(x, y) is the delta function, equal to 1 if x = y and 0 otherwise, and map(r_i) is the mapping function, which maps each cluster label r_i to an equivalent label in the dataset.

Normalized mutual information (NMI): The second metric is normalized mutual information, which is used to determine the quality of the clusters. Given the clustering results, NMI is defined as follows:

$$\mathrm{NMI} = \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} n_{i,j}\log\dfrac{n\,n_{i,j}}{n_i\,\hat{n}_j}}{\sqrt{\left(\sum_{i=1}^{c} n_i\log\dfrac{n_i}{n}\right)\left(\sum_{j=1}^{c}\hat{n}_j\log\dfrac{\hat{n}_j}{n}\right)}}$$

where n_i denotes the number of data contained in cluster C_i (1 ≤ i ≤ c), n̂_j denotes the number of data contained in the true class L_j (1 ≤ j ≤ c), and n_{i,j} denotes the number of data in the intersection of the true class L_j and cluster C_i. A larger NMI indicates a better clustering result.

Adjusted Rand index (ARI): ARI measures the agreement between two clustering solutions based on whether pairs of instances are placed in the same cluster or in different clusters, corrected for chance. It is defined as follows:

$$\mathrm{ARI} = \frac{\sum_{i=1}^{k_a}\sum_{j=1}^{k_b}\binom{n_{ij}}{2} - \left[\sum_{i=1}^{k_a}\binom{n_i}{2}\sum_{j=1}^{k_b}\binom{n_j}{2}\right]\Big/\binom{n}{2}}{\dfrac{1}{2}\left[\sum_{i=1}^{k_a}\binom{n_i}{2}+\sum_{j=1}^{k_b}\binom{n_j}{2}\right] - \left[\sum_{i=1}^{k_a}\binom{n_i}{2}\sum_{j=1}^{k_b}\binom{n_j}{2}\right]\Big/\binom{n}{2}}$$

where n_ij is the number of instances shared by cluster c_i in solution x_a and cluster c_j in solution x_b, n_i is the number of instances in cluster c_i of solution x_a, n_j is the number of instances in cluster c_j of solution x_b, and k_a and k_b are the numbers of clusters in solution x_a and solution x_b, respectively.

Hypervolume [53] is an index that conforms to the Pareto dominance concept. Its value represents the volume of the region in the objective space enclosed by the individuals in the solution set and a reference point. The hypervolume can therefore be used to evaluate the convergence and distribution of the solution set. A higher hypervolume value represents better convergence and distribution.
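The four indices above can be computed with the following self-contained Python sketch. It illustrates the standard textbook forms of these metrics and is not the code used in the experiments. Two choices are assumptions made for brevity: the ACC label mapping is found by brute force over permutations (practical only for a handful of clusters; the Hungarian algorithm is the usual choice), and the hypervolume is computed for two minimized objectives with a caller-supplied reference point. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import permutations


def acc(r, l):
    """Clustering accuracy with the best one-to-one mapping of cluster labels r to class labels l."""
    clusters, classes = sorted(set(r)), sorted(set(l))
    # Pad with None so every cluster gets a target even if clusters outnumber classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[ri] == li for ri, li in zip(r, l)))
    return best / len(r)


def nmi(r, l):
    """Normalized mutual information; returns 0.0 if either labelling has a single group."""
    n = len(r)
    ni, nj, nij = Counter(r), Counter(l), Counter(zip(r, l))
    num = sum(c * math.log(n * c / (ni[a] * nj[b])) for (a, b), c in nij.items())
    den = math.sqrt(sum(c * math.log(c / n) for c in ni.values())
                    * sum(c * math.log(c / n) for c in nj.values()))
    return num / den if den else 0.0


def ari(a, b):
    """Adjusted Rand index between two label vectors."""
    sum_ij = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_i = sum(math.comb(c, 2) for c in Counter(a).values())
    sum_j = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = sum_i * sum_j / math.comb(len(a), 2)
    max_index = (sum_i + sum_j) / 2
    if max_index == expected:  # degenerate case, e.g. both labellings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by `ref`, with both objectives minimized."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # dominated points add no area
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


pred, truth = [0, 0, 1, 1], [1, 1, 0, 0]
scores = (acc(pred, truth), nmi(pred, truth), ari(pred, truth))
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4))
```

In the toy example, the two labellings describe the same partition under different label names, so all three external indices reach their maximum, and the three-point front encloses an area of 6 with respect to the reference point (4, 4).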

Parameter Analysis and Setup
The experiments in this chapter were carried out on the datasets introduced in Section 4.1. The characteristics of each dataset are shown in Tables 3-5. The number of clusters in all experiments was set to the number of real classes in the data. Each algorithm was run independently 10 times, and the average of the final results was calculated. The proportion of correctly classified samples was taken as the clustering accuracy, and the quality of each algorithm was reflected by the accuracy and mutual information. Taking the Sizes and 50d-20c datasets as examples, Figure 5 shows how the clustering accuracy of MFMOCK changes with the random mating probability rmp and the population size pop, where the number of generations gen was set to 100. For every value of rmp, the clustering accuracy was low when the population size pop was small, and when pop was not less than 50, the clustering accuracy of MFMOCK gradually became stable. Because the data distribution of the Sizes dataset is relatively simple, with few classes and a low dimension, it was difficult to observe the effect of rmp on the clustering results. However, the results of MFMOCK on the 50d-20c dataset clearly show that, for the same population size, the clustering accuracy was higher when rmp was 0.9. Based on these parameter experiments, and considering the stability and time complexity of the MFMOCK algorithm, pop, gen, and rmp were set to 100, 100, and 0.9, respectively.

Comparison Experiment of Clustering Results
In this clustering performance test, the comparison algorithms were single-task clustering algorithms; that is, each run involved only one clustering task, and the dataset under this task conformed to the assumption of independent and identical distribution. The MFMOCK algorithm is a clustering algorithm based on evolutionary multi-tasking. To verify that MFMOCK effectively shares data features between multiple tasks and learns the clustering tasks, it was necessary to design multi-task clustering data for each test dataset. According to the distribution characteristics of each dataset, a multi-tasking auxiliary test dataset containing relevant knowledge (the data distribution of some categories was consistent) was generated. The number of categories and samples in the multi-tasking auxiliary test dataset may not be consistent with the original dataset. Moreover, random sample points were added to the auxiliary test set to test the multi-tasking clustering learning performance of the algorithm.
In this experiment, each algorithm was run 10 times, and the average accuracy of the clustering results under the optimal parameters was reported. This section compares common single-objective and multi-objective clustering algorithms, namely KM, SL, AL, SC, FCM, MOCK, and MIE-MOCK. The results are shown in Tables 6-8, where the performance indicators Acc and NMI are given in percentage units, and the ARI index ranges over [−1, 1].
Table 6 shows the clustering results of all algorithms on the manually generated datasets. This test mainly examines the clustering performance of each algorithm on datasets with different shapes and distributions. As shown in the table, the classical algorithms had different preferences for datasets with different shapes. The KM algorithm performed excellently on the Square and Sizes datasets, and the single linkage algorithm performed well on the Smile and Long datasets. Because its objective functions are the overall deviation and the connectivity index, the MFMOCK algorithm makes no assumptions about the shape of the clusters, so it can detect clusters of arbitrary shape. However, it performed poorly on datasets in which data of different categories overlap. In addition, the MFMOCK algorithm produced higher-quality clustering results than the other algorithms on the Sizes and Triangle datasets. For the Square dataset, because the dataset is relatively simple and the amount of data and the number of categories are small, the accuracy of the classical clustering algorithms is already high, and the accuracy improvement of the MOCK-like algorithms based on evolutionary algorithms was not obvious. Overall, the clustering results showed that the MFMOCK algorithm had a partial improvement in clustering performance compared with the traditional single-objective clustering algorithms.

Table 7 shows the results on random datasets with specific dimensions and numbers of cluster categories, generated by the random generator to test the clustering performance of the algorithms under different combinations of dimension and number of categories. In the manually generated datasets, the data distribution presented many geometric shapes that rarely appear in practical applications, and the individual variables were uncorrelated with each other, so the second, randomly generated set of datasets was used. Compared with the manual datasets, the random datasets contained test data with higher dimensions and more categories: the dimension varied from 2 to 100, and the number of cluster categories varied from 4 to 40. These datasets were also used to test whether the clustering advantages of MFMOCK on the manual datasets could be extended to more realistic, general, high-dimensional, large-scale datasets. The experimental results showed that the classical single-objective clustering algorithms (e.g., KM and SC) clustered better when the data had few dimensions and few categories. When the dimension was increased to 10 and the number of classes was 10 or 20, the performance of classical single-objective clustering decreased significantly compared with the evolutionary multi-objective clustering algorithms. As the sample dimension and the number of categories continued to increase, MFMOCK showed excellent clustering performance compared with single-task evolutionary multi-objective clustering algorithms such as MOCK and MIE-MOCK. Compared with low-dimensional datasets with few classes, MFMOCK showed a more obvious performance improvement on high-dimensional clustering problems with a large number of classes. Introducing evolutionary multi-tasking made the evolutionary clustering algorithm perform better on complex clustering datasets.

Table 8 compares the experimental results of the algorithms on seven UCI datasets in terms of accuracy, mutual information, and adjusted Rand index, where Append is short for the Appendicitis dataset and Aggregate is short for the Aggregation dataset. Across the indicators, MFMOCK had the best results among the comparison algorithms on four datasets: Thyroid, Aggregate, Cancer, and Wine. For the Zoo dataset, MFMOCK was about 0.02 worse than the best algorithm, AL, in terms of clustering results, but it was the best on the ARI index. For the Jain dataset, the clustering results of MFMOCK were about 0.05 worse than those of the best algorithm, MOCK, and its other indicators were also lower than those of MOCK. For NMI and ARI, MFMOCK was higher than the other algorithms on most datasets, and on the datasets where its clustering performance was suboptimal, the gap between MFMOCK and the best algorithm was small, i.e., within 0.07.

The two tasks in dataset 1, Square and Sizes, had the same number of samples, feature dimensions, and classes. Both had spherical Gaussian clusters in their sample distribution, but because they were manually generated, their distributions differed. Figure 6a shows the convergence trend of the hypervolume values of MFMOCK and MOCK on dataset 1. In the initial evolution stage, the curve of MFMOCK was better than that of MOCK, showing better overall convergence. Since MOCK and MFMOCK are completely consistent in the encoding of population individuals and the design of crossover and mutation operators, the performance improvement can be attributed to the ability to transfer relevant clustering knowledge between tasks through implicit genetic transfer, and the ability to mine similar data features through multi-tasking correlation learning. The hypervolume value of MFMOCK was better than that of MOCK in most cases, so the Pareto solution set of MFMOCK was more evenly distributed. In clustering problems, this area usually reflects the distribution and degree of separation of different clusters, and a larger value may indicate higher separation between clusters. In addition, hypervolume also accounts for the convergence and diversity of the solution set. In clustering analysis, this means that the algorithm can find clusters of different shapes and sizes while keeping the center points of these clusters as close as possible to the real situation. These advantages led MFMOCK to achieve better performance on clustering problems.

In terms of the quality of the solutions generated during multi-tasking evolution, the results in Table 10 show that the MFMOCK algorithm performed well on the multi-tasking clustering of the manual datasets: the coefficients of variation of the overall deviation and the connectivity index in the two tasks were within 0.02, and the final clustering solution was within 0.03 of the objective function value of the true classification. This illustrates the excellent performance of MFMOCK in clustering the manually generated datasets.

The two tasks in dataset 2 were the 100d-10c and 100d-40c datasets, which shared only the feature dimension and differed in the number of samples and classes. As shown in Figure 6b, this simulated a multi-domain test environment with different numbers of categories and inconsistent data distributions. The convergence trend of the function value in the figure shows the superior convergence of MFMOCK compared with MOCK. Table 11 shows the performance of the MFMOCK algorithm on dataset 2: the algorithm clustered task 2 well, with a coefficient of variation of the objective value within 0.025 and a deviation between the true value and the final value of about 0.02. Due to the complexity of the data and categories, the clustering performance of the algorithm was worse in task 1 than in task 2.
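The coefficient of variation quoted for the objective values is the ratio of the standard deviation to the mean over repeated runs. A minimal sketch follows; whether the paper uses the sample or the population standard deviation is not stated, so the sample version used here is an assumption, and the numbers are made-up placeholders rather than results from the paper.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean of the values."""
    mean = statistics.fmean(values)
    return statistics.stdev(values) / abs(mean)


# Placeholder objective values from hypothetical repeated runs (not from the paper).
runs = [101.0, 99.0, 100.5, 99.5, 100.0]
cv = coefficient_of_variation(runs)
```

A value within 0.02 means that the run-to-run spread of an objective is at most 2% of its mean, which is the sense in which the final solutions are described as stable.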
As shown in Table 12, the two tasks in dataset 3 were the 50d-20c-1 and 50d-20c-2 datasets, which differed only in the number of samples and had the same feature dimensions and number of classes, as well as a similar data distribution because they were generated by the same random generator. As shown in Figure 6c, this simulated a multi-domain test environment with the same number of categories but an inconsistent data distribution. Compared with MOCK, MFMOCK showed good convergence speed and clustering accuracy.

In MFEA, knowledge is transferred without regard to the relationship between tasks, and evolutionary multi-tasking does not necessarily guarantee that the performance of every task improves, because not all genetic transfers are useful. While certain tasks are positively affected by the implicit genetic transfer available during multi-tasking, other tasks may be negatively affected. In the MFMOCK algorithm proposed in this chapter, for multiple related multi-objective clustering problems, the strategy of random crossover between different individuals was abandoned, and multi-tasking correlation learning was introduced into the crossover operator. Knowledge was transferred only between tasks with a strong correlation, which reduced the possibility of negative transfer. The experimental results show that the algorithm is effective in transferring useful genetic information and accelerating convergence.

Conclusions
In this paper, a multi-objective clustering framework based on evolutionary multi-tasking is proposed. By introducing MFEA into the evolutionary multi-objective optimization algorithm, evolutionary multi-objective clustering is extended from single-task learning to multi-tasking learning, and similar knowledge in different clustering tasks is shared through the implicit search parallelism of the population. First, by building an evolutionary multi-tasking framework, the algorithm improves the coding method of population individuals on the basis of MOCK, which provides a basis for individuals with different skill factors to exchange genetic material. In addition, a new crossover operator between individuals was designed, and the idea of correlation learning between tasks was introduced into the crossover operator. By controlling the number of iterations of the learning algorithm, the crossover operator plays the role of a local search, so that individuals representing different clustering tasks can transfer relevant knowledge between tasks when exchanging genetic material and evolve towards the convergence direction of the clustering algorithm. Limiting the search direction of individual evolution accelerates the evolution of the population. Comparative experiments were carried out on manual, randomly generated, and UCI datasets, and they verified the multi-tasking learning ability and the final clustering effect of the model, as well as the feasibility and effectiveness of evolutionary multi-tasking in accelerating multi-objective clustering learning. In the future, we will study more objective functions and verify the proposed framework on more complex datasets. Moreover, we plan to improve the performance of MFMOCK to solve more diverse clustering problems in the real world. In addition, we will focus on studying more efficient individual coding methods to improve the optimization efficiency of MFMOCK.

Algorithm 2 Mating selection among individuals
1: Two parent individuals P_a and P_b are randomly picked from the contemporary population P.
2: Generate a random number rand between 0 and 1.
3: if τ_a = τ_b then
4: Parent individuals P_a and P_b perform intra-task crossover to produce offspring individuals C_a and C_b.
5: else if rand < rmp then
6: Parent individuals P_a and P_b perform inter-task crossover according to Algorithm 3 to produce offspring individuals C_a and C_b.
7: else
8: Mutation of parent individual P_a produces offspring individual C_a.
9: Mutation of parent individual P_b produces offspring individual C_b.
10: end if

Algorithm 3 (continued)
[...] label matrix Y_t.
18: end for
19: Get the clustering results C_i and C_j of task τ_a and task τ_b.
20: The genotypes of P_a and P_b are updated according to the clustering results C_i and C_j to obtain the offspring individuals O_a and O_b.
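The branching logic of the mating selection in Algorithm 2 can be sketched in Python as follows. The operators are passed in as callables because their details are described elsewhere in the paper; the function name `mate`, its signature, and the returned operator tag are illustrative assumptions rather than part of the original algorithm.

```python
import random


def mate(pa, pb, tau_a, tau_b, rmp, intra_cx, inter_cx, mutate, rng=random):
    """Return (offspring_a, offspring_b, operator_used) following Algorithm 2."""
    rand = rng.random()  # step 2: random number in [0, 1)
    if tau_a == tau_b:
        # Same skill factor: intra-task crossover.
        ca, cb = intra_cx(pa, pb)
        return ca, cb, "intra"
    if rand < rmp:
        # Different skill factors: inter-task crossover (Algorithm 3) with probability rmp.
        ca, cb = inter_cx(pa, pb)
        return ca, cb, "inter"
    # Otherwise each parent is mutated independently.
    return mutate(pa), mutate(pb), "mutation"


# Placeholder operators, used only to exercise the branches.
swap = lambda a, b: (b, a)
keep = lambda a, b: (a, b)
reverse = lambda g: g[::-1]
result = mate([1, 2], [3, 4], 0, 1, 0.9, swap, keep, reverse, random.Random(0))
```

With rmp = 0.9, the value chosen in the parameter analysis, parents from different tasks undergo inter-task crossover in roughly nine out of ten matings and are mutated otherwise.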

Figure 3. Schematic diagram of the genotype update.

Figure 5. Plot of the variation of parameters pop and gen in (a) Sizes and (b) 50d-20c.

Table 4. Characteristics of a randomly generated dataset.

Table 6. Clustering results for the manually generated dataset.

Table 7. Clustering results for randomly generated datasets.

Table 8. Clustering results on UCI datasets.

Table 9. Comparison of algorithm test datasets.

Table 10. Average performance of MFMOCK on dataset 1 problems.

Table 11. Average performance of MFMOCK on dataset 2 problems.

Table 12. Average performance of MFMOCK on dataset 3 problems.