Phase Transition in Ant Colony Optimization

: Ant colony optimization (ACO) is a stochastic optimization algorithm inspired by the foraging behavior of ants. We investigate a simplified computational model of ACO, wherein ants sequentially engage in binary decision-making tasks, leaving pheromone trails contingent upon their choices. The quantity of pheromone left is the number of correct answers. We scrutinize the impact of a salient parameter in the ACO algorithm, specifically, the exponent α , which governs the pheromone levels in the stochastic choice function. In the absence of pheromone evaporation, the system is accurately modeled as a multivariate nonlinear Pólya urn, undergoing phase transition as α varies. The probability of selecting the correct answer for each question asymptotically approaches the stable fixed point of the nonlinear Pólya urn. The system exhibits dual stable fixed points for α ≥ α c and a singular stable fixed point for α < α c where α c is the critical value. When pheromone evaporates over a time scale τ , the phase transition does not occur and leads to a bimodal stationary distribution of probabilities for α ≥ α c and a monomodal distribution for α < α c .


Introduction
Sociophysics emerged in the 1970s and has evolved into a captivating research field within statistical physics [1,2].In particular, herding behavior, or the inclination to follow the majority, has captured the attention of many researchers due to its pivotal role in understanding social phenomena [3][4][5][6][7].Various probabilistic models have been proposed to describe herding behavior, with one notable example being the ant recruitment model [8,9].This model explains the intermittent oscillation observed in ants when they are presented with two identical food sources [10,11].When ants choose a food source from among two food sources, the ant recruitment model incorporates a straightforward herding mechanism in which a randomly selected ant chooses one of the two based on the number of ants that have already made the same choice.Scouts play a crucial role by exploring the terrain to locate food sources [12,13].When a scout discovers food, it returns to the nest, leaving a pheromone trail in its wake.Other ants are drawn to these pheromone marks and consequently become recruited to forage at the food source.
Ant colony optimization (ACO) is a model-based meta-heuristic inspired by the foraging behavior of ants in their search for the shortest path to food sources [14][15][16][17][18].While ants may not be highly intelligent individually, they collectively find the shortest path by following pheromone trails left by their fellow ants.The optimal path is determined by the route on which the maximum number of ants travel.Consider a classic problem known as the traveling salesman problem (TSP), which involves finding the shortest possible route that visits each city represented as vertices in a given graph exactly once and returning to the origin city [19].In ACO, ants make decisions about their next city to visit based on a concept called "pheromone".Pheromone represents the preference for a particular choice and is collaboratively learned by the ants during their search process [15].In the context of the TSP defined on a graph, pheromone values are typically associated with edges between cities, reflecting the preference for traveling from one city to another along the corresponding edges.These pheromone values are learned through a reinforcement strategy, where each ant reinforces its chosen paths based on the quality of the solution constructed.This quality is often determined using the inverse of the total length of the route.ACO has found successful applications in various industrial and academic constraint optimization problems and has become one of the most popular methods [20][21][22][23].
ACO has seen significant improvements, and modern ACO algorithms deviate substantially from the original ACO [24].The fundamental modification lies in controlling the diversity of solutions and achieving convergence [25,26].In this context, "convergence" refers to the tendency of ants to cluster around similar solutions in the neighborhood and ultimately converge toward the same solution.Early convergence to a small region of the search space leaves large enough sections of the search space unexplored and fails to find suitable solutions.On the other hand, quite slow convergence means that the computational cost required to reach acceptable solutions is high, rendering the search inefficient.Diversity control aims to prevent complete convergence by slowing down the search process.
Many algorithms have been proposed for controlling the diversity of the ACO algorithm.One of the diversity control mechanisms involves modifying the probabilistic decision function [24,27].Bernd Meyer studied the influence of α, the exponent on the pheromone level in the selection function, and suggested that α qualitatively determines diversity and convergence behavior.Additionally, Meyer introduced a dynamic α that changes throughout the search process to enhance search efficiency, a technique known as α-"annealing".Meyer also emphasized the significance of noise in ACO using stochastic differential equations in both static and dynamic environments [24,[28][29][30].Ants respond to a two-choice question, and the noisy communication among ants prevents them from selecting suboptimal choice.In this paper, we study a simple model of ACO in which ants sequentially answer a series of two-choice quizzes.We investigate the phase transition and the qualitative change of the convergence behavior by varying α.We start from the discrete time formulation of the problem.Our approach can be straightforwardly generalized to ACO for many-body problems, allowing us to derive the model parameters of the continuous time approach concretely.Additionally, there are various possibilities for the continuous time limit, apart from Wiener noise [31].In Section 2, we introduce a model and derive stochastic differential equations (SDEs) using diffusion approximation.In Section 3, we investigate the time evolution and examine the effect of α on the convergence properties of the solutions.Section 4 provides a summary of the results.Appendix A explains the estimation of the initial conditions for the SDEs.

Method and Definitions
There are N two-choice questions, each of which is answered by quite a large number of ants sequentially [32].These questions are labeled by n = 1, . . ., N. The answer provided by the tth ant is denoted as X(n, t) ∈ {0, 1}, where X(n, t) = 1 indicates a correct answer, and X(n, t) = 0 indicates an incorrect answer.In ACO, multiple ants typically search for the optimal solution simultaneously in each iteration.However, in our model, only one ant conducts the search.Each ant receives 1 point for a correct answer.After ant t has answered all N questions, the total points (TPs) earned by the ant can be calculated using the following equation: Ant t deposits pheromones on their answer X(n, t) ∈ {0, 1}.The amount of the pheromones is TP(t).We assume that the pheromones evaporate and decrease by e −1/τ per unit time, where τ represents the time scale of the pheromone evaporation.
Ant t + 1 observes the total values of the pheromones associated with question m for each choice X(m, t + 1) = {0, 1}.We denote the total value of pheromones that remains on all the questions after ant t has answered, Then, the remaining pheromone on X(m, s The remaining pheromone on X(m, s) = 0, 1 ≤ s ≤ t is given by S(t) − S(m, t).
Ants are not highly intelligent, and the probability of them making the correct choice in the two-choice questions by themselves is 1/2.The information provided by TPs gives them an indirect clue about the correct choice.If TP(s) > N/2, the posterior probability for X(m, s) = 1 is larger than 1/2.Similarly, if S(m, t) is greater than S(t)/2, the posterior probability for X(m, t + 1) = 1 is greater than 1/2.In ACO, a decision function with a positive parameters α and K is introduced that uses the values of the pheromones as follows: Here, the exponent α determines the response of the choice to the values of the pheromones.As K is positive, one can avoid S(m, t) = 0 and S(m, t) = S(t) becoming the absorbing states of the process.However, instead of K, we adopt the following form for the choice function: Here, ϵ > 0 is a small positive constant to avoid the absorbing states S(m, t) = 0 and S(m, t) = S(t) of the stochastic process [8].
We denote the ratio of the remaining pheromones on the correct choices as Z(m, t), Z(m, t) ≡ S(m, t) S(t) .
We divide both the denominator and numerator of Equation ( 3) by S α (t), and the probability of the correct choice X(m, t + 1) = 1 is expressed as Here, f (z) is defined as We note that f (1/2) = 1/2 and f (1 We also introduce the discount factor D(t) and the ratio of correct answers Z(t) as

Stochastic Differential Equations
First, let us derive the recursive relation for S(t) and D(t).According to the definition, D(t + 1) and S(t + 1) obey the next recursive relations If τ is finite, for t ≫ τ ≫ 1, we have D(t) ≃ Nτ.In the limit τ → ∞, the pheromone does not evaporate, and one has D(t) = Nt.
For N ≫ 1, we can replace the sum of X(i, s), i = 1, . . ., N, by the sum of the expected values using the law of large numbers; then: Then, The recursive relation for Z(t) is ∆Z(t) ≡ Z(t + 1) − Z(t) is given as In the continuous time limit dt = 1 → 0, one obtains: One sees that Z(t) converges to f (Z(i, t)).When the pheromones evaporate and τ < ∞, D(t) ≃ Nτ, and the prefactor of the differential equation is 1/τ.If one assumes that the dynamics of Z(i, t) are faster than those of Z(t) (adiabatic approximation), the time scale of the convergence is given by τ as f (Z(i, t)) − Z(t) ∝ e −t/τ .When the pheromone does not evaporate and τ → ∞, D(t) ≃ Nt.The prefactor of the differential equation is 1/t, and f (Z(i, t)) − Z(t) ∝ t −1 .The convergence becomes quite slow.
Next, let us study the dynamics of Z(m, t).The recursive relation for S(m, t) is Z(m, t + 1) is then estimated as Using S(t + 1) = D(t + 1)Z(t + 1), one obtains: We denote the history of Z(s), {Z(i, s)}, s = 1, . . ., t, as H t , and the conditional expected value of ∆Z(m, t) is estimated as Likewise, the conditional variance of ∆Z(m, t) can be approximated as Here, we neglect the subleading terms in 1

2
. We read the drift and diffusion term from the results and the SDEs are Here, W(t) is the Wiener process.Equations ( 5) and ( 7) describe the dynamics of the system.The system can be described as a multivariate Pólya urn process.We note that in Equation ( 6) breaks the Z 2 symmetry of the system.If one holds.Z(m, t) and 1 − Z(m, t) obey the same dynamics, and we call the symmetry Z 2 symmetry.The term is always positive and drives Z(m, t) in the positive direction.As the term is proportional to 1/N, the strength of the Z 2 -symmetry-breaking field becomes smaller as N becomes larger.

Results
We analyze the SDEs given in Equation ( 7) and investigate the convergence properties of Z(m, t).As the convergence behavior relies on the initial value of Z(m, t = t 0 ) in the context of the nonlinear Pólya urn model, we commence by examining the initial distribution of Z(m, t).

Initial Distribution of {Z(m, t)}
We assume that ants adopt α = 0 and do not respond to the values of the pheromones for t = 1, • • • , t 0 .The ants answer the questions independently, and P(X(i, t) = 1) = 1/2.We estimate Z(t) and Z(m, t) for τ < t ≤ t 0 as follows: where N(a, b) denotes the normal distribution with the mean a and the variance b.The details of the calculations are given in Appendix A. If τ is finite, one has D h (t) ≃ Nτ/2 for t ≫ τ ≫ 1.In the limit τ → ∞, the pheromone does not evaporate and then D h (t) = Nt.The essential differences between Z(t) and Z(m, t) include a shift in the expected value by 1/2N and the presence of a factor of N in the numerator of the variance of Z(m, t).The shift of 1/2N arises from the finding that X(m, s) in S(m, t) is 1, which is larger than E[X(i, s)] = 1/2 for i ̸ = m.The value of the pheromone contains information about the correct choice, leading to E[S(m, t)] > 1  2 E[S(t)].However, in the "cheating" process, the variables X(i, s), are combined by X(m, s), as in Equation ( 2), resulting in a larger variance for S(m, t).The factor of N in the numerator of the variance of Z(m, t) is a consequence of this combination process.
From the distribution of Z(m, t 0 ), one can determine the values of τ or t 0 that guarantee that Z(m, t 0 ) is greater than 0.5 for the limits t ≫ τ ≫ 1 and τ → ∞, respectively.With a confidence level of 1%, τ and t 0 should satisfy the following conditions:

Case with τ → ∞
We take the limit τ → ∞ in Equations ( 5) and (7).We replace D(t + 1) and D h (t + 1) with N(t + 1) and N(t + 1), respectively.This results in the following equations: The initial conditions of Z(t) and Z(m, t) at t = t 0 are as follows: In the adiabatic approximation, where the time development of Z(m, t) is much faster than that of Z(t), the structure of the SDE for each Z(m, t), where m = 1, • • • , N, is the same as that of a nonlinear P'olya urn [33,34].The probability for Z(m, t) to converge to a stable solution of the following equation is positive [35], Here, a stable (unstable) solution of Equation ( 9) means that the curve of the left-hand-side of the equation crosses the diagonal curve y = z in in the downward (upward) direction [35].
We denote the stable and unstable solutions as z s and z u , respectively.In the limit N → ∞, the positive driving force 1− f (z) in Equation ( 9) disappears.
The dotted lines in Figure 2 show the solutions versus α for the Z 2 -symmetric case.For α < α c , z = 1/2 (black dotted line) is the stable solution.For α > α c , z = 1/2 (gray dotted line) becomes unstable (z u = 1/2) and two stable solutions z s1 , z s2 depart from z = 1/2 continuously with α > α c .The stable solution to which Z(m, t) converges depends on the initial value of Z(m, t 0 ).In general, if Z(m, t 0 ) is greater (smaller) than 1/2, the probability of the convergence to 1 is greater (smaller) than 1/2.z u determines the "attractive domains" for the stable solutions z s1 , z s2 .The susceptibility of the expected value of Z(m, t) to the initial value Z(m, t 0 ) is the order parameter of the nonlinear Pólya urn [33,34].As the order parameter is proportional to the difference in the two stable states z s1 , z s2 , the order parameter is a continuous function of α, and the phase transition is continuous.In the Z 2 -asymmetric case (b ̸ = 0), for α < α c , there is a stable solution, z s , in the z-range (0.5, 1).z s increases with α; at some critical value, α c , of α, the curve f (z)(1 + b(1 − f (z)) becomes tangential to the diagonal at some z t .z t is known as a touchpoint and to be stable [36].As α continues to increase beyond α c , two significant changes occur: a new stable solution, z s1 , emerges from the touchpoint z t , while an unstable solution, z u , also becomes apparent.
When α < α c , only one stable solution, z s , exists within the range 1/2 < z s < 1. Conversely, for α > α c , the specific stable solution to which Z(m, t) converges depends on the initial values of Z(m, t 0 ).At α = α c , both the stable fixed point z s and the touchpoint z t remain stable.The solution to which Z(m, t) converges is determined by the initial values of Z(m, t 0 ) in this case as well.For α > α c , the situation mirrors that of α = α c .Once α ≥ α c , the order parameter becomes positive, and the phase transition becomes discontinuous.
Figure 3 shows the results of the numerical studies in the limit τ → ∞.We sampled a trajectory of Z(m, t) and Z(t) for 1 ≤ t ≤ 10 9 with N = 10 2 and ϵ = 0.01. Figure 3 presents the distribution of Z(m, t 0 ) for two different values of t 0 , namely, t 0 ∈ 10 3 , 10 6 .The mean value of Z(m, t 0 ) is approximately 1/2 + 1/2N, which aligns with the theoretical predictions.The variance of Z(m, t 0 ) is given by 1/4t 0 , so the variance for t 0 = 10 3 is about 10 3 times larger than that for t 0 = 10 6 .Consequently, if t 0 = 10 3 is chosen, a significant proportion of m ∈ 1, • • • , N has Z(m, t 0 ) < z u ≃ 0.5, resulting in a high probability that Z(m, t) converges to z ′ s < 1/2 for α = 2.0.In such cases, Z(t) cannot reach 1 due to the convergence of Z(m, t) to z ′ s .On the other hand, if t 0 = 10 5 is set, the ratio of m ∈ 1, • • • , N with Z(m, t 0 ) < z u ≃ 0.5 is zero, ensuring that Z(m, t) always converges to z s > 1/2.As a result, Z(t) monotonically increases toward 1 for α = 2.0.For α = 1.0 < α c , where only one stable state, z s ≃ 1, exists, Z(m, t) consistently converges to z s .
One can see that Z(t) monotonically approaches z s with time for both t 0 = 10 3 and t 0 = 10 5 cases within the range t ≤ 10 9 .In the case of α = 0.5, where z s ≃ 0.5, Z(t) experiences relatively little change.

Case with τ < ∞
In the case where τ is finite, let us make the assumption that t ≫ τ ≫ 1 and replace D(t + 1) with Nτ in Equations ( 5) and ( 7).This leads to the following equations: The initial conditions for Z(t 0 ) and Z(m, t 0 ) at t 0 ≫ τ are as follows: The dynamics of Z(m, t), m = 1, . . ., N, are coupled through Z(t) and f (Z(i, t)).To simplify and analyze this coupled system, we focus on the stationary state of Z(m, t) in the limit t → ∞.
Z(m, t) are expected to fluctuate around the stable fixed points of f (z) in the stationary state.As was observed in Section 3.2 for the case of τ → ∞, for α < α c , there is only one stable fixed point, and for α ≥ α c , two stable fixed points exist, one of which is near one.The stationary distribution is unimodal for α < α c and bimodal for α ≥ α c .Let us denote the stationary distribution and the mean value of Z(m, t) as P st (z) and µ st , respectively.As f (Z(m, t)) is the probability for X(m, t + 1) = 1, one can assume f (Z(i, t) = Z(t) = µ st in the stationary state.The SDEs in Equation ( 10) can be simplified as follows when replacing f (Z(i, t)) and Z(t) with µ st : The stationary state with reflecting boundary conditions is determined by a potential solution [37], which can be expressed as: The second term, 2τ/Nµ st , arises from the Z 2 symmetry-breaking field and causes a shift in the stationary distribution in the positive direction.
Figure 4 shows P st (z) (13) for ϵ = 0.01 and N = 10 2 ; µ st is chosen so that the mean value of P st (z) coincides with The pair of the parameters (α, µ st ) are (0.0, 0.50), (0.5, 0.51), (0.9, 0.54), (0.99, 0.67), (1/0.99,0.74), and (2.0, 0.54).As α increases, the peak position shifts in the positive zdirection, which can be expected by the dependence of the stable solution z s on α in Figure 2. If α = 1/(1 − ϵ), the peak appears at z = 1, since there is only one stable fixed point near z = 1 in Figure 1.When α = 2, there are two stable fixed point, and the stationary distribution is bimodal.In order to derive the dependence of µ st and the variance of P st (z) on α, let us assume that Z(m, t) fluctuates around µ st ≃ 1 2 for α < α c .We linearize f (z) in the vicinity of z = 1/2 as Let us also approximate B 2 (z) as µ st (1 − µ st ) = 1/4, then P st (z) becomes In the case with α = 0, the ants do not observe the information of the pheromones and decide by themselves.The expected value and the variance are consistent with the results (11) for the initial state.The expected value and the variance increase with α for 0 ≤ α < 1/(1 − ϵ).
The shape of the stationary distribution changes from the monomodal shape for α < α c to the bimodal shape for α ≥ α c . Figure 5 shows the stationary distribution P st (z) of Z(m, t) = z for α ∈ {0, 0.5, 0.9, 0.99, 1/0.99, 2.0}, ϵ = 0.01, τ = 100 and N = 10 2 .We also plot P st (z) in Equation ( 13) with solid-line curves.Except the α = 2.0 case, the numerical results agree with the theoretical ones.As α increases from 0 to 1/0.99, the mean value and the variance of Z(m, t) increases.For α = 1/0.99, the distribution of Z(m, t) has a peak at z = 1.The distribution becomes bimodal and has two peaks near z = 0 and z = 1 for α = 2.For α = 2.0, P st (z) becomes bimodal and the equilibration time to reach the stationary state becomes quite long.We beleive the latter is the reason for the discrepancy between the numerical and theoretical results.

Discussion
In the presented paper, we studied a simple model for ACO and the convergence properties of the solutions.Ants answer many two-choice questions in sequence and deposit pheromone as they choose.As the amount of the pheromones is the number of correct answers, the following ants can receive information or hints about the correct choices.We showed that the model reduces to a multivariate nonlinear Pólya urn process and the pheromones break the Z 2 symmetry of the process.By varying the exponent α of the decision function of the ants, there occurs a phase transition about the convergence of the probability of choosing the correct answer for each question in the limit τ → ∞.For τ < ∞, the change in the stationary distribution between the monomodal and the bimodal shape occurs as we vary α.
Previous studies adopted values of α = 1 or smaller in solving actual optimization problems, like the TSP [24].In α-annealing, α increases gradually, as shown in Ref. [24].In our study, we showed that the duration of the period α = 0 should be long enough to ensure that the initial value of Z(m, t) is in the attractive domain of the suitable stable state (Z(m, t) > z u ) in the case when τ = ∞.Subsequently, with α ≥ 1 in effect, Z(t) converges to a value close to 1.In the case of τ < ∞, the timescale for pheromone evaporation, represented by τ, shown to be sufficiently long to maintain the same initial conditions.However, Z(m, t) does not converge to a specific value; instead, it follows a stationary distribution that exhibits both bimodal and monomodal shapes depending on the value of α.To achieve a distribution of Z(t) with a prominent peak near z = 1, the α-annealing process is an effective strategy.Both the stable solution z s and the distribution of Z(m, t) suggest that after a lengthy enough period with α = 0, it is advantageous to gradually increase α from 1.However, the efficiency of the annealing process depends on the specific problem being addressed.Future studies are considered to clarify on an efficient α-annealing schedule.

Figure 2 .
Figure 2. The stable, z s , and unstable, z u , solutions for b and ϵ parameters, as indicated.See text for details.

Figure 3 .
Figure 3. Left: the initial distribution of Z(m, t 0 ) for the limit τ → ∞. α = 0, ϵ = 0.01 and N = 10 2 are used.Right: Z(t) versus t for different α and t 0 as indicated.Gray areas show the standard deviation of Z(t) which is quite large for α = 2.0, t 0 = 10 3 and smaller or compatible with the line widths for other cases.See text for details.

Figure 4 .
Figure 4.The stationary distribution, P st (13), for N = 10 2 and ϵ and α parameters as indicated.See text for details.