1. Introduction
The analysis of complex networks in real-world applications such as social, biological, metabolic, and paper citation networks is receiving more attention from researchers and experts [
1,
2]. The structure and function of a real-world network can be studied by graph features such as small-world effect, power-law, or network transitivity [
1,
2]. An important task in most real-world networks is finding their hidden structures. Community detection (CD) identifies these structures in a complex network, where the density of edges inside a community is higher than the density of edges leaving it. Because the members of a community are more similar to one another than to outsiders, community detection can be used as a tool for analyzing the structure of complex networks [3]. CD plays a significant role in social network analysis, including identifying friendship groups, analyzing relationships, identifying influential people, detecting terrorist attacks, predicting links, and identifying classes in COVID-19 datasets [
1,
4,
5].
Each network is mathematically represented by a graph consisting of nodes and edges, in which the nodes are connected to each other by edges. Communities in a complex network can be detected using different criteria, such as betweenness [
2], modularity [
6], node similarity [
7], normalized cut [
8], and partition entropy [
9]. In addition, some algorithms detect the communities by approximate methods such as label propagation [
10], in which a label is first allocated to each node and the labels of randomly selected nodes are then propagated to other nodes. This process continues until every node holds the label shared by most of its neighbors. In addition, many community detection approaches determine communities by optimizing modularity, which was proposed by Newman in 2004 [
11]. In this approach, algorithms attempt to reach the optimal value of the modularity by different methods [
5,
12,
13,
14].
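The label propagation idea mentioned above can be illustrated with a minimal sketch. It follows one common asynchronous variant (every node starts with its own label and repeatedly adopts a label that is most frequent among its neighbors) and is not necessarily the exact procedure of [10]. The function name `label_propagation`, the adjacency-dictionary input, and the `max_sweeps` and `seed` parameters are assumptions made for illustration.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_sweeps=100, seed=0):
    """Assign a community label to each node of an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors
    (hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # each node starts with its own label
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # node already holds a most-frequent neighbor label
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # every node agrees with the majority of its neighbors
    return labels
```

Nodes that end up with the same label form one community. The stopping rule mirrors the condition stated above: the sweep loop ends once no node needs to change its label.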
Generally, detecting communities with modularity maximization in a network can be considered as an optimization problem [
15]. Metaheuristic algorithms have been shown to be effective approaches for solving complex problems in a reasonable time when compared with exact methods [
16]. A wide variety of metaheuristic algorithms have been introduced such as differential evolution (DE) [
17], particle swarm optimization (PSO) [
18], cuckoo search (CS) [
19], grey wolf optimizer (GWO) [
20], salp swarm algorithm (SSA) [
21], whale optimization algorithm (WOA) [
22], and multi-verse optimizer (MVO) [
23]. Owing to the challenges and complexities that arise in real-world problems, there is still a need to propose new algorithms or enhance existing ones [
24,
25,
26,
27,
28,
29,
30,
31,
32]. Although these algorithms are well suited to problems with continuous search spaces, some algorithms, such as the genetic algorithm (GA) [
33] and ant colony optimization (ACO) [
34] were proposed to solve problems over discrete spaces. In addition, different methods were employed to develop the discrete version of a continuous algorithm [
35]. Metaheuristic algorithms have been applied to complex problems in different applications such as parameter identification of solar cells [
36], feature selection [
37,
38,
39,
40,
41], scheduling and planning [
42,
43,
44], disease diagnosis [
45,
46], clustering [
47], medical applications [
48,
49,
50], industrial applications [
51,
52,
53,
54,
55], and engineering optimization [
56,
57].
To use metaheuristic algorithms in community detection, each solution must be modeled according to the requirements of the CD problem, such that each solution can be represented as an N-dimensional vector with discrete values. The dimension of this vector equals the number of network nodes, and the value of each element depends on the solution representation used. Two representations used for the solution vectors are the label-based representation and the locus-based adjacency representation (LAR) [
14]. In addition, the objective function plays a critical role in CD: it is the measure by which metaheuristic algorithms judge the quality of the detected communities. Modularity [
6], community score [
58], modularity density [
59], and partition entropy [
9] are some of the objective functions that have been introduced for the CD problem. The community detection problem has been solved using metaheuristic algorithms that imitate natural phenomena, such as those in [
60,
61,
62].
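As an illustration of the representation and the objective function discussed above, the following sketch decodes a locus-based adjacency vector into community labels and evaluates Newman modularity for the result. It assumes nodes are numbered 0 to N−1, that gene i holds the index of one neighbor of node i, and that the network is an undirected, unweighted edge list. The helper names `decode_lar` and `modularity` are hypothetical and are not taken from the paper.

```python
def decode_lar(genes):
    """Decode a LAR vector: genes[i] = j links node i to node j.

    Connected components of these links are the communities.
    Returns a list of community labels, one per node.
    """
    parent = list(range(len(genes)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genes):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(genes))]


def modularity(edges, labels):
    """Newman modularity Q = sum_c [ l_c / m - (d_c / (2m))^2 ].

    l_c: number of edges inside community c, d_c: total degree of c,
    m: number of edges in the network.
    """
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, the vector `[1, 2, 0, 4, 5, 3]` decodes to two communities of three nodes each. Because the number of communities falls out of the decoding step, it does not have to be fixed in advance.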
The moth–flame optimization (MFO) [
63] algorithm was designed to solve continuous optimization problems and is inspired by the navigation mechanism that moths use to fly at night. Moths fly toward the moon in a straight line by maintaining a fixed angle, which is an effective mechanism for navigating long distances. However, when a moth flies toward a nearby artificial light, the same mechanism leads to a spiral flight path. This behavior is mathematically modeled in the MFO algorithm to solve global optimization problems. The MFO algorithm has been used in different applications such as feature selection [
64], software defect prediction (SDP) [
65], economic dispatch problem [
66], optimal power flow [
67], gene selection [
68], classification [
69], image segmentation [
70], and photovoltaic energy generation system [
71]. Although MFO has been used to solve many problems, it still has shortcomings, such as a lack of population diversity [
72], imbalance between exploitation and exploration [
73], and premature convergence [
74]. To improve the performance of the canonical MFO, enhanced or hybrid variants have been proposed, such as those in [
74,
75].
The main purpose of this paper is to adapt a continuous metaheuristic algorithm so that it can solve community detection and provide competitive results. To this end, a discrete moth–flame optimization algorithm for community detection (DMFO-CD) is proposed. First, the representation of the canonical MFO is altered so that it can be applied to the discrete community detection problem by using the locus-based adjacency representation (LAR). The initial population of solution vectors is created using LAR, which detects the communities without any prior knowledge. Then, the modularity function is used as the evaluation criterion to calculate the fitness of the solution vectors. Next, the DMFO-CD movement strategy is obtained by altering the canonical MFO movement strategy such that the main concept of MFO is maintained while the strategy becomes suitable for community detection. Finally, after iterating the movement, evaluation, and update of the solution vectors, the detected communities are obtained.
To adapt MFO to community detection, the positions of moths and flames are modeled as solution vectors represented using LAR. The initialization process considers the network structure and the relations between the nodes. The movement strategy then moves the moths’ solution vectors around the flames and updates their positions. This movement is accomplished by an adapted strategy consisting of a single-point crossover between the moth’s solution vector and its corresponding flame to imitate the distance calculation, a two-point crossover between the calculated distance and the corresponding flame to perform the movement, and a single-point neighbor-based mutation to increase the exploration ability. To validate the proposed DMFO-CD algorithm, a set of experiments was conducted on eleven real-world networks.
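The three operators named above can be sketched on LAR vectors as follows. This is only an illustration under stated assumptions: the cut points are drawn uniformly at random, the first parent supplies the segments outside the cut, and the mutation replaces one randomly chosen gene with a random neighbor of that node. The function names and the `adjacency` list-of-lists input are hypothetical, and the operators actually used by DMFO-CD may differ in these details.

```python
import random


def single_point_crossover(a, b, rng):
    """Take the head of a and the tail of b around one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def two_point_crossover(a, b, rng):
    """Replace the segment of a between two random cut points with b's segment."""
    lo, hi = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:lo] + b[lo:hi] + a[hi:]


def neighbor_mutation(genes, adjacency, rng):
    """Point one randomly chosen node at a random neighbor of that node."""
    i = rng.randrange(len(genes))
    mutated = list(genes)
    mutated[i] = rng.choice(adjacency[i])
    return mutated


def move_moth(moth, flame, adjacency, rng):
    """One discrete move of a moth toward its flame (assumed operator order)."""
    distance = single_point_crossover(moth, flame, rng)  # stands in for the distance
    moved = two_point_crossover(distance, flame, rng)    # movement around the flame
    return neighbor_mutation(moved, adjacency, rng)      # exploration step
```

Because every gene in the result is copied from a valid parent or drawn from the node's neighbor list, the output remains a valid LAR vector whenever both inputs are valid.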
The results were compared with five well-known algorithms in community detection, including a discrete particle swarm optimization with particle diversity and mutation (DPSO-PDM) [
13], a genetic algorithm for community detection (GA-Net) [
58], a genetic algorithm for detecting communities in large-scale complex networks (GACD) [
14], an enhanced genetic algorithm for community detection (EGACD) [
60], and a multi-objective evolutionary clustering algorithm (DECS) [
76]. The performance of the proposed DMFO-CD algorithm was evaluated in terms of important evaluation criteria: modularity, normalized mutual information (NMI), and the number of detected communities. Moreover, the proposed algorithm was also statistically analyzed by Friedman [
77] and Wilcoxon signed-rank tests [
78]. The comparison of results indicates that DMFO-CD is able to detect the correct number of communities with better modularity than the other comparative algorithms.
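For readers unfamiliar with NMI, the following sketch computes it for two partitions given as label lists. It assumes the commonly used normalization NMI = 2·I(A;B) / (H(A) + H(B)). The exact definition used in the experiments is not stated in this section, so this choice and the function name `nmi` are assumptions.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = entropy(count_a) + entropy(count_b)
    if denom == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mutual_info / denom
```

The value is 1 when the detected partition matches the ground truth up to a renaming of labels, and 0 when the two partitions are statistically independent.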
The rest of this paper is organized as follows: Section 2 presents a summary of the relevant works on community detection. Section 3 presents the mathematical model of the MFO algorithm. Section 4 describes the proposed DMFO-CD algorithm. The experimental evaluation of DMFO-CD is presented in Section 5. Finally, the conclusion and future work are given in Section 6.
2. Related Work
Owing to their acceptable performance in solving complicated real-world problems, metaheuristics have been broadly used to find communities in complex networks. As shown in Figure 1, metaheuristic algorithms can be classified by their inspiration into three categories [26]: evolutionary, swarm intelligence, and physics-based algorithms. In the related literature, almost all algorithms from the evolutionary category have been used to solve the community detection problem. Swarm intelligence algorithms, despite their simplicity, have been applied to this problem less often. In the following, some representative metaheuristic algorithms that have been used to find communities in complex networks are described.
Evolutionary algorithms are inspired by the biological evolution process of the species in nature [
24]. A population of individuals is iteratively processed by applying mutation, crossover, and selection operators to improve the individuals. The genetic algorithm (GA) is one of the well-known algorithms in this category, which was inspired by Darwin’s biological evolution theory. In [
79], GA was employed to find communities by optimizing the Newman modularity. The proposed algorithm introduces a one-way crossover operation in which two chromosomes are selected as parents and a node is randomly selected from one of them. The community label of that node is then determined, all nodes with the same label are found, and their labels are copied to the other parent. In [58], GA-Net was proposed by Pizzuti; it is one of the state-of-the-art algorithms in community detection. GA-Net detects communities using GA with the community score (CS) as the objective function. CS measures the density of edges in each community, and a better partitioning yields a higher CS.
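The one-way crossover described above for [79] can be sketched on label-based chromosomes as follows. This is a reading of the prose description only, not the cited implementation: the function name `one_way_crossover` is hypothetical, and the sketch does not handle a transferred label that already denotes a different community in the destination parent.

```python
import random


def one_way_crossover(source, destination, rng):
    """Copy one whole community from source into a copy of destination.

    source, destination: lists where position i holds the community
    label of node i (label-based representation).
    """
    node = rng.randrange(len(source))  # randomly selected node
    label = source[node]               # its community label in the source parent
    child = list(destination)
    for i, source_label in enumerate(source):
        if source_label == label:
            child[i] = label           # transfer the community to the other parent
    return child
```

The operation is called one-way because information flows only from the source parent to the destination parent, so a single child is produced per application.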
Shi et al. [
14] proposed GACD, which introduces a genetic representation for use in community detection called the locus-based adjacency representation (LAR). The authors also used simple crossover and mutation operators based on this representation. In another work, Moradi et al. [60] proposed an enhanced genetic algorithm for community detection named EGACD, which adds a local search strategy. The strategy is intended to improve the accuracy and speed up the convergence of the GACD algorithm. In EGACD, LAR is used to represent the individuals, and the modularity index is applied to calculate the fitness. In [62], the GAOMA-net algorithm was proposed with a special representation in which a memory and a specific depth are dedicated to the network nodes. The memory values are then moved in depth by object migrating automata, and the genes are evolved using GA. GAOMA-net can overcome GA’s premature convergence and accelerate convergence. Liu et al. [76] proposed DECS to detect communities in evolving networks by adapting a genetic algorithm to community detection.
The second category is swarm intelligence algorithms, which imitate animal behaviors such as the movement of bird flocks, the echolocation of bats, or the navigation mechanism of moths at night. One of the most popular algorithms in this category is particle swarm optimization (PSO) [
18], which imitates the behavior of bird flocks. In this algorithm, each bird is considered as a particle that is moved by its current, local-best, and global-best positions. Rahimi et al. [
61] proposed a multi-objective particle swarm optimization algorithm for community detection in complex networks named MOPSO-Net, in which the PSO algorithm is adapted as a discrete algorithm by a two-point crossover. First, a crossover is performed between the current position and the local-best position; then, a two-point crossover is performed between the resulting position and the global-best position. Li et al. [
13] developed an algorithm called DPSO-PDM with improvements in PSO that controls the motion of each particle relative to its difference from global best. With this strategy, when the particle diversity decreases, the algorithm tries to increase it and vice versa. Li et al. [
80] proposed DESSO/CD, which is a hybridized version of an improved DE and social spider optimization (SSO) [
81] algorithm. In the proposed algorithm, the population is initialized and moved by the SSO algorithm, the similarity of nodes is used as a local fitness function, and the population is further improved by the improved DE.
Liu et al. [
82] proposed the DMFO algorithm, a new algorithm for clustering, which has an aim equivalent to that of community detection. They adapted MFO by redesigning its movement strategies for a discrete algorithm and used kernel k-means and ratio cut as the objective functions of a multi-objective formulation. Zhao et al. [83] proposed ICSC, an improved cuckoo search (CS) algorithm for detecting communities in protein–protein interaction networks. Zhang et al. [84] proposed a new algorithm named WOCDA for community detection by changing the motion equation of WOA. The movement strategy of WOA is adapted through three operations: updating the node label with the label held by most of its neighbors, a one-way crossover, and updating the node label with a random neighbor’s label.
The third category comprises physics-based algorithms, which are inspired by physical rules in nature. Guendouz et al. [85] proposed a new algorithm that applies the black hole optimization algorithm to the CD problem. In this algorithm, an initialization step, two new strategies, and an evolution step enhance the performance. Liu et al. [86] proposed the EMACD algorithm, an evolutionary algorithm based on a membrane system for solving the community detection problem. Kumar et al. [87] used graph embedding to obtain a low-level vector representation that preserves the topological features of the network; the communities are then detected by a gravitational search algorithm and k-means.
3. The MFO Algorithm
The moth–flame optimization (MFO) algorithm was proposed by Mirjalili in 2015 [63] and is inspired by the navigation mechanism of moths in nature. By maintaining a fixed angle with the moon, moths can fly long distances in a straight line at night, a mechanism called transverse orientation. This mechanism is effective only when the light source is far away; flying toward nearby lights causes moths to move in a spiral path, as shown in Figure 2. In the MFO algorithm, moths update their positions to reach the optimum solution by moving toward the flames in a spiral path. MFO is a population-based algorithm in which the positions of moths and flames are stored in the N × D matrices M and F, as shown in Equations (1) and (2).
\[ M = \begin{bmatrix} m_{1,1} & \cdots & m_{1,D} \\ \vdots & \ddots & \vdots \\ m_{N,1} & \cdots & m_{N,D} \end{bmatrix} \tag{1} \]
\[ F = \begin{bmatrix} f_{1,1} & \cdots & f_{1,D} \\ \vdots & \ddots & \vdots \\ f_{N,1} & \cdots & f_{N,D} \end{bmatrix} \tag{2} \]
where N is the population size and D is the dimension of the problem. In the first iteration, F is the moth population sorted by fitness. In subsequent iterations, M and F are merged and sorted by fitness, and the first N solutions are taken as the new F. Once the flames are identified, each moth is assigned to a flame and its position is updated with the logarithmic spiral shown in Equation (3).
\[ M_i = S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j \tag{3} \]
where S is the spiral function, D_i is the distance between the i-th moth and the j-th flame, as defined in Equation (4), t is a random number in the range [−1, 1], and b is a constant that defines the shape of the spiral.
\[ D_i = \left| F_j - M_i \right| \tag{4} \]
To increase the exploitation ability of MFO, the number of flames is decreased over the course of iterations, as calculated by Equation (5).
\[ \text{flame number} = \mathrm{round}\left( N - l \cdot \frac{N - 1}{T} \right) \tag{5} \]
where T is the maximum number of iterations and l is the current iteration number.
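The spiral update and the flame-count schedule described above can be sketched for the continuous MFO as follows. The sketch assumes the standard MFO formulation, in which a moth moves to D_i·e^{bt}·cos(2πt) + F_j with D_i = |F_j − M_i| and the flame count is round(N − l·(N − 1)/T). Drawing a separate t for each dimension and the default b = 1 are assumptions of this sketch, and the function names are illustrative.

```python
import math
import random


def flame_count(n_moths, iteration, max_iterations):
    """Number of flames kept at the given iteration (decreases from N to 1)."""
    return round(n_moths - iteration * (n_moths - 1) / max_iterations)


def spiral_update(moth, flame, b=1.0, rng=random):
    """Move one moth around its flame along a logarithmic spiral.

    moth, flame: lists of floats of equal length (one value per dimension).
    b: spiral shape constant. t is drawn uniformly from [-1, 1] per
    dimension, which is an assumption of this sketch.
    """
    new_position = []
    for m, f in zip(moth, flame):
        t = rng.uniform(-1.0, 1.0)
        distance = abs(f - m)  # distance between moth and flame in this dimension
        new_position.append(distance * math.exp(b * t) * math.cos(2 * math.pi * t) + f)
    return new_position
```

With N = 30 moths and T = 100 iterations, the schedule starts with 30 flames and ends with a single flame, so late iterations concentrate the search around the best solution found. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which can differ from other rounding conventions.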