Model-Based Prediction of an Effective Adhesion Parameter Guiding Multi-Type Cell Segregation

The process of cell-sorting is essential for development and maintenance of tissues. Mathematical modeling can provide the means to analyze the consequences of different hypotheses about the underlying mechanisms. With the Differential Adhesion Hypothesis, Steinberg proposed that cell-sorting is determined by quantitative differences in cell-type-specific intercellular adhesion strengths. An implementation of the Differential Adhesion Hypothesis is the Differential Migration Model by Voss-Böhme and Deutsch. There, an effective adhesion parameter was derived analytically for systems with two cell types, which predicts the asymptotic sorting pattern. However, the existence and form of such a parameter for more than two cell types is unclear. Here, we generalize analytically the concept of an effective adhesion parameter to three and more cell types and demonstrate its existence numerically for three cell types based on in silico time-series data that is produced by a cellular-automaton implementation of the Differential Migration Model. Additionally, we classify the segregation behavior using statistical learning methods and show that the estimated effective adhesion parameter for three cell types matches our analytical prediction. Finally, we demonstrate that the effective adhesion parameter can resolve a recent dispute about the impact of interfacial adhesion, cortical tension and heterotypic repulsion on cell segregation.


Introduction
During the development of multicellular organisms, mechanical forces affect the internal states of cells as well as the interaction between cells, and they are, therefore, an integral part of all morphogenetic processes [1]. These forces are typically driven by molecular motors and transmitted via cytoskeleton elements and adhesion molecules within the cells and between them. Together, molecular motor driven movement and force transmission via adhesion complexes constitute two major self-organizing phenomena that drive tissue morphogenesis.
Since it constitutes a paradigmatic process for tissue morphogenesis, cell-sorting has found much attention both from experimental and theoretical research. It describes the tissue-scale segregation phenomenon that is observed when mixed heterotypic cell populations unmix into spatially confined homotypic cell clusters [2,3]. Over many decades this process has been studied in a number of in-vitro cell-sorting experiments for many different cell types [2][3][4][5][6][7]. To resolve which types of intercellular interaction guide individual mobile cells to find their homotypic neighbors, several theoretical hypotheses have been proposed and studied. Steinberg put forward the Differential Adhesion Hypothesis (DAH) which exploits the similarity of cell segregation with the unmixing of immiscible fluids such as water and oil. It states that cell-sorting is driven by the minimization of tissue surface tensions which result from quantitative differences in the strengths of cell-type specific intercellular adhesion [8]. However, Harris (1976) [9] remarked that tissue surface tension can result from several cellular mechanisms besides differential adhesion and suggested the differential contractility of cells as a major driver of cell-sorting. This is also acknowledged by the Differential Interfacial Tension Hypothesis (DITH) of Brodland and Chen (2000) [10]. There, the effects of intercellular adhesion and cellular contractility at cell-cell contacts are subsumed to the concept of differential interfacial tension at homoand heterotypic cell contacts which determines the degree of mutual attachment between cells and thus drives cell-sorting. Canty et al. (2017) [11] demonstrated experimentally that repulsion at heterotypic cell-cell contacts with little to no contribution from adhesive or tensile differences between cell types can lead to cell-sorting and tissue separation. They propose the High Heterotypic Interfacial Tension Hypothesis (HIT) which asserts that heterotypic repulsion creates a situation where tension is strongly increased at heterotypic contacts compared to homotypic contacts and constitutes the major driver of cell-sorting and tissue separation.
The competing explanatory models DAH, DITH and HIT are unified within the Differential Migration Model (DMM) of Voss-Böhme and Deutsch (2010) [12]. They established that all the above hypotheses have in common that cell-cell contacts affect type-specifically the migration properties of the involved cells and parametrized this effect, independent of its specific nature -adhesion or repulsion, cortical tension or even signaling cascades. For the segregation of two cell types, the existence and form of an effective adhesion parameter (EAP) has been derived mathematically, which predicts the asymptotic sorting behavior [12].
However, while most previous studies focus on the simplest case of two cell types [7,13,14], the case of three and more cell types is more relevant in real biological systems. Studies of cell segregation and pattern formation with three or more cell types are still rare [15]. Although the DMM can be directly generalized to more than two cell types, it has only been studied for the minimal case of two cell types. In particular, the existence and form of the EAP for systems with three or more cell types remains an open question.
We study numerically the impact of the EAP and related parameters on the degree of tissue separation in terms of clear tissue boundaries as well as the separation time scale for mixtures of two cell types. We present new arguments to analytically predict the form of the EAP, thus generalizing the concept of the EAP to mixtures of arbitrary many cell types. We show numerically for three cell types that the analytically predicted EAP guides the asymptotic sorting behavior in the DMM, i.e., the EAP determines the asymptotic value of the normalized sum of heterotypic contacts. The form of the EAP on the asymptotic segregation behavior is independently quantified via two statistical learning methods, Support Vector Machines (SVM) and Logistic Regression model (Logit) [16]. The form estimated by the statistical learning methods matches our analytical prediction both for two and three cell types. Finally, we demonstrate that the EAP can explain different segregation behaviors in two cell-type Cellular Potts models which was previously attributed to the competing hypotheses DAH/DITH and HIT [11].

Differential Migration Model
The DMM is a special probabilistic cellular automaton (PCA) based on Voss-Böhme and Deutsch [12], which has been analyzed for two cell types. Here, we present the key features of this model for the case of arbitrary many cell types: Lattice: The PCA works on a squared lattice S := {0, 1, . . . , L} 2 with periodic boundaries and we set L = 25 throughout our numerical analysis. Cells: To emulate biological cells, every lattice site x ∈ S is assigned a cell type w ∈ W of all possible cell types W := {0, 1, . . . } in the system. In this way each cell regardless of its type, has approximately the same size and occupies exactly one lattice site. The number of cells of each type remains constant during the sorting process.
Configurations: A configuration η t of the lattice S represents the state of the model at time t. It belongs to the set of all possible configurations X := W S = {η : S → W}. Migration: A configuration η is changed by a cell position switch η → η xy involving two adjacent lattice sites x and y with x, y ∈ S: Position switches between two cells of the same type do not change the configuration and are therefore neglected. Thus, neighboring cells situated at lattice sites x and y only switch their positions if they are heterotypic, i.e., η(x) = η(y). Differential adhesion: We assume that stronger bonds to neighboring cells hinder cell motility. Accordingly, a cell switch η → η xy occurs with rate c(x, y, η). The rate c(x, y, η) depends on the parameters β η(x)η(z) and β η(y)η(z) , which represent the binding strengths between cells at lattice sites x and y to positions from the von-Neumann-1 neighborhoods N(x) and N(y), see Figure 1. The von-Neumann-1 neighborhood of a lattice site corresponds to all neighboring lattice sites with Manhattan distance one. The cell switch rate of the two adjacent lattice sites x, y ∈ S with x ∈ N(y) and y ∈ N(x) is as follows: otherwise .
(2) Figure 1. Example of a cell switch. (a) A cell switch at a straight interface between two clusters of type w = 0 (purple) and w = 1 (turquoise) has rate c = exp[−3(β 00 + β 11 ) − 2β 01 ] according to Equation (2). (b) A single cell of type w = 0 inside a cluster of cells of type w = 1 switches with rate c = exp[−3β 11 − 5β 01 ]. The cell switch between two adjacent cells is highlighted (gray double arrows).
Notice that the particular mechanism which affects the cellular motility on intercellular contact is not specified. The parameters β ij can be interpreted as repulsion, if β ij < 0, enhancing cellular motility, and it can be interpreted as binding strength resulting from the interplay of adhesion and relaxed cortical tension at cell-cell contact, if β ij > 0, inhibiting cellular motility. The details on the numerical implementation of this PCA are elaborated in the Appendix A. Depending on how many cell types are considered the number of intercellular adhesion parameters varies. For this, the vector β (N) of all intercellular adhesion parameters occurring in a system with N cell types is introduced, for instance β (N=2) := (β 00 , β 11 , β 01 ) T for 2-cell-type systems and β (N=3) := (β 00 , β 11 , β 22 , β 01 , β 02 , β 12 ) T for 3-cell-type systems.
For a 2-cell-type model, an effective adhesion parameter (EAP) β * (N=2) := β 00 + β 11 − 2β 01 was analytically predicted in Voss-Böhme and Deutsch [12]. The EAP determines the asymptotic cell segregation behavior of systems with N = 2 cell types regardless of the value combinations of β (2) . The higher the EAP the more segregated the cell types become. For simulation of the DMM model a Gillespie-related algorithm is used where independent cell switches, see Equation (1), are chosen and the waiting times between these events are calculated based on the cell switch rates, see Equation (2), within lattice configuration η. For details about this algorithm, see Appendix A, Algorithm A1. Each numerical simulation yields a time-series of lattice configurations η t , since every cell switch, i.e., whenever two neighboring cells switch their position, see Equation (1), results in a new configuration. The variable t refers to the number of cell switches performed.

Order Indicator
To quantify the level of cell segregation an order indicator is introduced that quantifies the level of cell segregation of a lattice configuration η. The order indicator ω(η) is the normalized sum of homotypic cell-cell contacts within a configuration η. This indicator increases the more cells of the same type are clustered together. In detail, the set of all homotypic von-Neumann-1 neighbors of a cell of type η(x) at position x ∈ S is given by With n = (x, η) := |H(x, η)|, the total sum of homotypic connections is defined as The highest amount d max and the lowest amount d min of homotypic lattice site connections are reached in the case of a fully sorted and chessboard-patterned configuration, respectively. The exact values for 2-and 3-cell-type systems are listed in Appendix B. With d max and d min , the order indicator value ω(η) for configuration η is defined as the normalization of d(η): Thus, ω(η) takes values in [0, 1], where configurations with almost perfect sorted patterns, such as in Figure 2I, yield ω(η) values above 0.95. In the case of chessboard configurations as in Figure 2III, the order indicator values are below 0.05. For a random lattice configuration, the order indicator value is ω(η) ≈ 0.5, in the case of a 2-cell-type system, and it is ω(η) ≈ 0.33 for a 3-cell-type system.
With the order indicator ω(η), a time-series of configurations η t can be converted into a time-series of order indicator values ω(η t ), in short ω(t). Furthermore, the asymptotic cell segregation level for a simulation is estimated as the average over the last 10% of the time-series ω(t) and is denoted ω, see Figure 2 for an illustration. for three different cell-type systems, each corresponding to a distinct asymptotic segregation: a cell population with sorted patterns as in configuration I (blue), a not fully cell-type segregated population as in II (orange) and a cell type mixture population with chessboard pattern as in III (green). Configuration 0 illustrates the random start configuration η t=0 with an initial order indicator value ω(t = 0) ≈ 0.5 (marked as red square at t = 0). The simulation termination condition is t = 312499 cell switches (red vertical line). The asymptotic value ω (indicated for each case as black horizontal line at the end) is computed as the average over the last 10% of cell switches (gray area). (b) Analogous for examples of 3-cell-type systems, starting with random configuration, i.e., initial order indicator value ω(t = 0) ≈ 0.33. The three different sets of intercellular adhesion parameters β (3)

Cell System Parameters for Two Cell Types
We present a new simple argument for the form of the effective adhesion parameter β * (2) guiding the asymptotic behavior in the DMM for two cell types [12] to generalize it to systems with arbitrary many cell types. For this, we introduce two additional system parameters β s (2) and β ∆ (2) besides β * (2) , which determine how changes to the intercellular adhesion parameters β (2) affect the asymptotic cell segregation behavior as well as the dynamics. Although the parameters β s (2) and β * (2) set the temporal scale of the waiting times and the asymptotic level of segregation, respectively, we argue that the parame-ter β ∆ (2) determines the number of cell switches required to reach the asymptotic level of segregation.
Indeed, the addition of a constant θ to all intercellular adhesion parameters simultaneously like rescales the cell switch rates c β (x, y, η) in the DMM by a factor exp(−8θ), since independent of the actual configuration η ∈ X and the considered lattice sites x, y ∈ S. This factor does not alter the relation between the cell switch rates, but only rescales all waiting times between cell switches by exp(−8θ). Thus, the spatio-temporal order of the cell sorting dynamics and the asymptotic behavior are not affected. In order to parametrize this rescaling, a scaling parameter β s is introduced, where the functional, which maps β (2) → β s (2) , is given as a scalar product with the vector a s = (1, 1, 1) T .
Secondly, we notice that the choice which cell type is denoted by type w = 0 and which one by w = 1 is arbitrary and the dynamics in the model is invariant against relabeling of the cell types. This invariance is reflected in the cell switch rates c(x, y, η) as well as in the order indicator ω, which quantifies the level of segregation of a configuration η as sum of homotypic lattice site connections. Accordingly, the functional which maps the intercellular adhesion parameters β (2) to the effective adhesion parameter β * (2) must reflect this invariance as well. This implies that an effective adhesion parameter β * (2) , which controls the asymptotic sorting behavior, assuming it exists and is determined by a linear functional on the intercellular adhesion parameters, has the form β * (2) = a β 00 + β 11 + bβ 01 = β (2) , a * with two real-valued constants a, b and a vector a * = (a, a, b) T introduced analogously to Equation (6) to express the functional. The constants a, b are set by the condition that the asymptotic level of segregation and thus the differential adhesion parameter β * (2) must not be affected by the temporal scaling parameter β s (2) , which implies Without loss of generality, we can neglect an additional offset and factor in Equation (7) and choose a = 1 for which follows We present an additional heuristic argument to make the assumption of a linear dependency in Equation (7) plausible and to estimate that the critical value of the effective adhesion parameter, at which the system remains randomly mixed, should be zero: Consider two limit scenarios of a cell switch, first two cells at a straight interface between clusters and secondly a single cell of one type moving inside a cluster of opposite type. As illustrated in Figure 1a, two cells of different type at the interface switch their position and thus cause a less segregated configuration with a rate c mix = exp[−3(β 00 + β 11 ) − 2β 01 ]. In contrast, a cell of type w = 0 inside a cluster of type w = 1 switches position with a given neighbor with rate c unmix, 0 = exp[−3β 11 − 5β 01 ], see Figure 1b. Analogously, for a single cell of type w = 1 in a cluster of type w = 1 the rate is c unmix, 1 = exp[−3β 00 − 5β 01 ]. Both types of switches are necessary to move a single cell of opposite type out of a cluster. Since the former switch with rate c mix reduces the level of segregation while the latter two switches with rates c unmix, 0 , c unmix, 1 are required to increase segregation, the ratio between these types of switches c unmix, 0 /c mix · c unmix, 1 /c mix = exp[3(β 00 + β 11 − 2β 01 )] should determine the level of the asymptotic segregation. In particular, the point at which the system remains mixed is expected where mixing and unmixing rates are equal, i.e., where the ratio of the rates becomes 1. Since the exponential function is monotonous, it is sufficient to focus on the exponent, which is a multiple of the effective adhesion parameter β * (2) = β 00 + β 11 − 2β 01 . Thus, the critical value 1 of the ratio of rates translates into a critical parameter β * (2) = 0 for which a two-cell-type system is expected to remain mixed. After establishing the system parameters β s (2) and β * (2) , which are defined via scalar products with the corresponding vectors a s , a * , we point out that there is a third system parameter β ∆ (2) defined by the vector perpendicular to a s and a * β ∆ The parameter β ∆ (2) quantifies the difference between the homotypic adhesion strengths of the two cell types. It becomes zero if both types have the same strength of homotypic adhesion. We use the absolute value in the definition of the parameter β ∆ (2) as we are not interested in which cell type has the stronger homotypic adhesion. We find numerically that the parameter β ∆ (2) affects how many cell switches are required to reach the asymptotic level of segregation, see Figure 3. For a fixed effective adhesion parameter β * (2) = 3, which means for the same asymptotic level of cell segregation, Figure 3 shows the number of cell switches required to reach certain thresholds of cell segregation, here ω ≥ 0.7, ≥0.8 and ≥0.9. This number increases with increasing β ∆ (2) independently of the chosen thresholds. The convergence parameter slows down the sorting process. For each β ∆ (2) value a different intercellular adhesion parameter β (2) with β * (2) = 3.0 is used and every simulation is repeated three times, see black lines. When a simulation reached an order indicator value ω ≥ 0.7, ≥0.8 and ≥0.9 for the first time the corresponding cell switch number is marked with a blue square, green dot and orange triangle. Each simulation has a random start configuration η t=0 with an initial order indicator value ω(η t=0 ) ≈ 0.5 and the intercellular adhesion parameters for each convergence speed parameter β ∆ This numerical observation is supported by a heuristic argument: Consider again the limit scenarios of a cell switch, i.e., two cells at a straight interface between clusters which switches with a particular neighbor with rate c mix = exp[−3(β 00 + β 11 ) − 2β 01 ] and a single cell of one type within a cluster of opposite type which switches with rate c unmix, 0 = exp[−3β 11 − 5β 01 ], or c unmix, 1 = exp[−3β 00 − 5β 01 ] respectively. In the symmetric case of equal homotypic adhesion parameters β 00 = β 11 , which implies β ∆ (2) = 0 and c unmix, 0 = c unmix, 1 , cell switches of the two single cells in either cluster are equally frequent and these switches occur in a certain ratio with cell switches at the interface. If the symmetry is broken by increasing β ∆ (2) at constant β * (2) and β s (2) , for instance β 00 > β 11 , implying that c unmix, 0 > c unmix, 1 , the single cell of type w = 0 is favored to leave the cluster of opposite type in comparison to the symmetric case and in particular compared to the other single cell of type w = 1. Thus, segregation is now limited by the slower process of the single cell of type w = 1 moving out of clusters of type w = 0. Moreover, the rate c mix of a cell switch at the interface is constant if only the parameter β ∆ (2) is altered. Thus, in between two switches of a single cell of type w = 1 (rate c unmix, 1 ) out of a type w = 0 cluster there are more cell switches at the interface (rate c mix ) than in the symmetric case. Please note that these back-and-forth switches at the interface do not progress the segregation but increase the number of total cell switches. This scenario illustrates how an increase of the parameter β ∆ (2) increases the number of cell switches required to reach the asymptotic level of segregation. Thus, we denote β ∆ (2) as convergence speed parameter. Please note that the convergence speed parameter β ∆ (2) in Equation (10) is proportional to the Euclidean distance between β (2) and the related symmetric adhesion parameters β sym (2) with the same values of β * (2) and β s (2) . Indeed, for a given vector of adhesion parameters β (2) = (β 00 , β 11 , β 01 ) T with system parameters β s (2) , β * (2) , and β ∆ (2) , the corresponding symmetric adhesion parameters β sym (2) = 1 2 [β 00 + β 11 ], 1 2 [β 00 + β 11 ], β 01 have the same system parameters β s (2) and β * (2) but a convergence speed parameter of zero. The Euclid distance of the adhesion parameters β (2) to this symmetric case β sym (2) is Since the convergence speed parameter is zero, the symmetric case β sym (2) leads to the fastest convergence to the asymptotic level of segregation for the given system parameters β s (2) , β * 2) . In contrast to the definition of the convergence speed parameter β ∆ (2) in Equation (10), the definition of β ∆ (2) via Equation (11) can be generalized to N cell types, see below.

The Effective Adhesion Parameter for Arbitrary Number of Cell Types
For a mix of more than two cell types neither the form nor the existence of an effective adhesion parameter are known. Assuming that for an arbitrary number N of cell types an effective adhesion parameter β * (N) exists, we postulate its form based on the generalization of our arguments for the case of two cell types. We also postulate the form of the convergence speed parameter β ∆ (N) as introduced for two cell types. The scaling parameter is directly generalized from Equation (6) to N cell types and only scales all waiting times between cell switches as in the case of two cell types.
The invariance against relabeling of the cell types applies as well to N cell types which implies, analogously to Equation (7), where the constants a, b are set again by the condition Choosing again a = 1, the effective adhesion parameter has the form with corresponding vector a * . For N = 2 this equals the result for two cell types from Equation (9) and for N = 3 this takes the form In contrast to the case of two cell types, the space perpendicular to the vectors a s and a * for N > 2 is not one-but [N + N(N − 1)/2 − 2]-dimensional, i.e., two-dimensional for N = 3. Thus, the definition of a convergence speed parameter β ∆ (N) , analogous to Equation (10), is ambiguous. One option is to consider all pairs of cell types and their corresponding convergence speed parameters, e.g., |β 00 − β 11 |, |β 00 − β 22 |, and |β 11 − β 22 | for N = 3. Since the segregation of all cell types requires the segregation of each pair of cell types, these pair-wise parameters should predict the convergence speed on subsets of intercellular adhesion parameters analogously to Figure 3. For instance, when two vectors of intercellular adhesion parameters are equal except they differ in one of these pair-wise convergence speed parameters, the one with the smaller parameter leads to a faster convergence. However, these pair-wise parameters do not allow comparison if several of them differ between two vectors of adhesion parameters and the influence of the heterotypic parameters is not even considered. Thus, the convergence speed parameter β ∆ (N) is instead defined by the generalization of Equation (11) . . ,β hom ,β het , . . . ,β het We postulate, in analogy to the case of two cell types, that the parameter β ∆ (N) determines the number of cell switches required to reach the asymptotic level of segregation for a given effective adhesion parameter β * (N) .

Numerical Evidence for the Effective Adhesion Parameter
We test our analytical prediction for the effective adhesion parameter for more than two cell types, by comparison with numerical simulations of the DMM model for two and three cell types. We find in both cases that the proposed effective adhesion parameter determines the asymptotic level of cell segregation. However, there are deviations due to the limited simulation time, from which the asymptotic level of the order indicator ω is extrapolated. This is revealed by additionally linking the simulation results to the convergence speed parameter β ∆ , introduced above. For two cell types, we observe that for small values of β ∆ (2) , which refers to simulations which converge quickly to their asymptotic value, the dependency between β * (2) and ω agrees almost perfectly with the analytically proven prediction. Deviations from the analytical prediction are the most pronounced the larger the convergence speed parameter is. For three cell types, we observe an analogous relation between deviations from the postulated dependency on the effective adhesion parameter β * (3) and a convergence speed parameter β ∆ (3) as introduced above.

The Asymptotic Level of Cell Segregation Depends on the Effective Adhesion Parameter
At first the dependence of the level of asymptotic cell segregation ω on the effective adhesion parameter β * is tested. For the 2-cell-type, the effective adhesion parameter clearly determines the asymptotic sorting state, see dark blue points in Figure 4a, i.e., ω increases with increasing β * (2) , as predicted by Voss-Böhme and Deutsch [12], and all configurations stay randomly mixed at β * (2) = 0, i.e., ω(β * = 0) = 0.5 (see black dashed lines), as predicted by the heuristic argument above. The monotonic dependence of the asymptotic order parameter ω on the proposed effective parameter β * (2) is supported by a value of 0.81 of the spearman rank correlation coefficient (Kendall rank correlation coefficient of 0.58). Additionally, Figure 4a shows the asserted influence of the convergence speed parameter β ∆ (2) on level of cell segregation dynamics. More precisely, the higher the β ∆ (2) value is the slower is the sorting process and the closer the respective ω value remains to ω(t = 0) ≈ 0.5 until the end of the simulation. For these intercellular adhesion parameters β (2) , the asymptotic value of the order indicator has not yet been reached, which is why they are scattered above and below the increasing line, indicated by the dark blue points in Figure 4a. Note, that also the effective adhesion parameter β * shows impact on the amount of cell switches needed for a system to reach its asymptotic cell segregation ω. This is further investigated in Appendix D.
Analogous results for the case of three cell types with the postulated effective adhesion parameter are shown in Figure 4b. Qualitatively the same functional relation between the hypothesized effective adhesion parameter β * (3) and ω is visible as in the case of two cell types, see dark blue points in Figure 4b. This includes the prediction that all configurations stay randomly mixed at β * (3) = 0, i.e., ω(β * = 0) = 0.33 (see black dashed lines). The monotonic dependence of the asymptotic order parameter ω on the proposed effective parameter β * (3) is supported by a value of 0.79 of the Spearman rank correlation coefficient (Kendall rank correlation coefficient of 0.53), which is of similar quality as in the case of two cell types. Based on the analogy of the numerical results for two and three cell types, we conclude that β * (3) predicts the level of sorting reached asymptotically such as β * (2) does in the 2-cell-type case. Correspondingly, the asymptotic cell segregation exhibits the postulated influence of the convergence speed parameter β ∆ (3) . Note that the restriction in Figure 4b to simulations with an β ∆ (3) < 3.0 ensures a sufficient sampling of data points with smaller β ∆ (3) parameter values. Smaller β ∆ values result in more simulations reaching their asymptotic cell segregation.

Figure 5.
A plane with the analytically derived normal vector separates data points respective to their estimated asymptotic level of cell segregation. The points refer to the same data as in Figure 4a. The coloration represents the level of asymptotic cell segregation ω. The analytically predicted plane (gray rhomb in the middle) with the normal vector a * is E : β 00 + β 11 − 2β 01 = β (2) , a * = 0.
We present analogous analysis for 4-and 5-cell-type systems, see Appendix C, Figure A1 and find a strong support of the analytically predicted effective adhesion parameter as well.

Estimating the Effective Adhesion Parameter Using Statistical Learning Methods
We confirmed qualitatively the compliance between our prediction and the numerical data in the previous section. Now we confirm additionally our prediction by comparing the analytical effective adhesion parameters for two and three cell types to the corresponding estimates obtained via two statistical learning methods, Support Vector Machines (SVM) and Logistic Regression (Logit-Model) [16]. More precisely, for the general case of N cell types, the learning methods provide a hyperplane equation of the effective adhesion parameter β * (N) . Therefore, we quantify statistically the deviation from our analytical prediction. This is done for β * (2) and β * (3) based on the numerical data ω(β) referred to in the previous section and displayed in Figure 4a,b, respectively. Both methods are suited for this task because they fit the linear dependency of the parameter β * on the intercellular adhesion parameters β, are well documented and directly available from libraries such as scikit-learn for Python [17], and, in contrast to some other approaches like deep learning, both quantify the factors that lead to classification.
The learning methods provide a hyperplane equation as a linear predictor to separate classified data points. Thus, the numerical data points ω(β) are first classified based on their asymptotic order indicator ω and a classification threshold. As thresholds, we chose the random configurations, i.e. ω ≥ 0.5 for two cell types and ω ≥ 0.33 for three cell types. Due to the symmetry of the effective adhesion parameter β * , see Equations (7) and (13), the number of coefficients, which must be estimated, can be reduced to the constant a for the homotypic and the constant b for the heterotypic adhesion parameters. The resulting hyperplane equations for both 2-cell-type and 3-cell-type cases are presented in the headlines of Table 1a,b.  For the 2-cell-type case in Table 1a, both relative coefficients, for homotypic a/a ≈ 1.00 and heterotypic b/a ≈ −1.99 intercellular adhesion parameters, are close to their predicted values 1 and −2, respectively. The relative intercept i/a, i.e. the value of β * (2) which corresponds to the chosen threshold ω = 0.5, is for both learning methods near zero, as predicted by our heuristic argument. These predictions of the SVM and the Logit-Model are of high quality as indicated by model accuracy values close to one. This quantitative result of the statistical learning methods in Table 1a support the analytical prediction for β * (2) = β 00 + β 11 − 2β 01 . Analogous for three cell types, the results of Table 1b support the analytical results by estimating β * (3) = β 00 + β 11 + β 22 − β 01 − β 02 − β 12 with convincing model accuracy of about 0.99 for both learning methods. The relative intercept i/a, i.e., the value of β * (3) which corresponds to the chosen threshold ω = 0.33, is for both learning methods close to zero, as predicted by our heuristic argument. We attribute the deviation of the relative intercept i/a from zero as well as the lower model accuracy compared to two cell types to insufficient convergence of the order indicators as predicted by the convergence speed β ∆ (3) . We present analogous analysis for 4-and 5-cell-type systems, see Appendix C, Table A2 and again find a strong support of the analytically predicted parameters defining the effective adhesion parameter.

The Impact of Interfacial Tension, Adhesion or Repulsion on Cell Segregation
For two cell types, the EAP resolves previous discussions about the impact of interfacial tension on cell segregation compared to interfacial adhesion or repulsion [11]: We predict that a higher level of segregation is reached when the EAP is large, which can be achieved by both heterotypic repulsion or differential adhesion. In fact, transforming the relative contact tensions used in ref. [11] to the corresponding EAP by identifying high tensions T ij in the experiments and the Cellular Potts model (CPM) with low adhesion parameters β ij = −T ij in the DMM, we can predict the experimental and numerical observations there, see Figure 6. In particular, we find that low lengths of heterotypic interfaces (LHI) are observed for high EAP values, analogous to the DMM. Figure 6 shows that the EAP is a better predictor for the asymptotic segregation behavior than the classification according to the DAH/DITH and HIT hypotheses.  [11] from an upper bound of 8000 of all lattice site connections, the EAP is computed from the relative contact tensions T ij used there: β * (2) = −T 00 − T 11 + 2T 01 = β 00 + β 11 − 2β 01 . In contrast, Canty et al. [11] propose that the asymptotic level of segregation is determined by a discrete number of scenarios into which all relative contact energies can be sorted. These scenarios are: Differential Adhesion Hypothesis / Differential Interfacial Tension Hypothesis (DAH/DITH), control (Ctrl), two cases of High Heterotypic Interfacial Tension Hypothesis (HIT), ectoderm-mesoderm energies (E-M), negative control (N). The prediction in [11] was that HIT leads to the highest level of segregation, while E-M, N, Ctrl and DAH/DITH lead to lower segregation.
Thus, we propose that rather than differentiating between intercellular adhesion and contact tension, the combined effect of intercellular contact on cellular motility, which is quantified by the EAP, should be focused on. Furthermore, the correct prediction of the asymptotic level of segregation in the CPM simulations of Ref. [11] using the effective adhesion parameter β * derived for the DMM suggests that our results may be directly applicable to CPM models of cell segregation. This is plausible due to the analogous structure of the exponent of the cell switch rates in the DMM and the energy functional in the CPM.

Discussion
We study analytically and numerically the differential migration model of Voss-Böhme and Deutsch (2010) [12], which is a cell-based model incorporating differential hypothesis [8], differential interfacial hypothesis [10], and High Heterotypic Interfacial Tension Hypothesis [11] within a unified framework. We generalize the existence and form of an effective adhesion parameter (EAP) guiding the asymptotic level of segregation, which is already known for systems with two cell types [12], to systems with an arbitrary number of cell types. We additionally predict the critical value of this effective adhesion parameter at which the system remains randomly mixed. For the case of two and three cell types, we confirm these theoretical predictions numerically and quantify the form of the effective adhesion parameter independently using statistical learning methods. The analogous results for 4-and 5-cell types suggest that our findings are valid in systems with even higher number of cell types. For two cell types, we show that the EAP resolves previous discussions about the impact of interfacial tension on cell segregation compared to interfacial adhesion or repulsion [11]. Thus, we propose that rather than differentiating between intercellular adhesion and contact tension, the combined effect of intercellular contact on cellular motility, which is quantified by the EAP, should be focused on.
Most previous studies focus on the simplest case of two cell types [7,13,14]. Although three and more cell types are more relevant in real biological systems, studies of cell segregation and pattern formation with three or more cell types are still rare [15]. By analytically and numerically demonstrating the existence of cell segregation for three cell types and characterizing its dynamics, we extent results from two cell types to the more general case. The differential migration model has recently been applied successfully to reproduce experimental cell segregation data for two cell types [18]. In particular, the match between model and experimental observation demonstrated that contact tension and adhesion are sufficient to explain segregation data without additional mechanisms of collective motion. Analogously, our analytical and numerical results can be applied to cell segregation experiments with three or more cell types.
If we consider a lattice S = {0, 1, . . . , L} 2 then the ideal case of a fully sorted configuration consists of stripes for each of the N cell types, each with a height of L N and a length of L. In this scenario the highest amount d max of homotypic lattice site connections is d max = 2 · L 2 − n · N, since every lattice site has two unique homotypic connections except for those at cell-type interfaces. These only have one homotypic connection. For the same ideal scenario, a chessboard configuration where no homotypic connections exist can be achieved for the right lattice side length L. Therefore, d min = 0.
In practice, a discrete lattice with 25 2 lattice sites cannot be separated into even stripes. Due to this imperfection, small deviations from these values occur in data. In the case of 2-cell-type systems, a lattice side of L = 25 causes d min = 2 · L = 50 because every cortical lattice site has one homotypic connection due to periodicity.
In other publications such as Canty et al. (2017) [11], the level of cell segregation and clustering is quantified by the length of the heterotypic interface, here we use the amount of homotypic lattice site connections, see Equation (3).

Appendix C. Numerical Evidence for the Effective Adhesion Parameter with Extended
Cell-Type Number Figure A1. The level of the asymptotic cell segregation is predicted by the effective adhesion parameter for 4-and 5-cell-type systems. (a) Each point corresponds to a simulation whose intercellular adhesion parameter is drawn uniform, i.e., β (4) ∼ U[−4, 4] 10 with an additional condition such as β ∆ (4) < 3 , see Equation (17), to reduce scattering for visibility, see text of Figure 4 for details. The start configuration is random, i.e., initial order indicator ω(t = 0) ≈ 0.25. The curve ω(β * (4) ) intersects with the predicted point ω(β * (4) = 0) = 0.25 (highlighted by black dashed lines). A data point represents the asymptotic estimate ω of this simulation after ≈ 300000 cell switches, along with the corresponding effective adhesion parameter β * (4) and convergence parameter β ∆ (4) (color bar). (b) Analogous representation for the data points of 5-cell-type systems, starting with random configurations, i.e., initial order indicator values ω(t = 0) ≈ 0.20. The curve ω(β * (5) ) intersects with the predicted point ω(β * (5) = 0) = 0.20 (highlighted by black dashed lines). The intercellular adhesion parameters are also drawn uniform, i.e., β (5) ∼ U[−2.5, 2.5] 15 with the additional condition β ∆ (5) < 3. Table A2. The effective adhesion parameter is confirmed by statistical learning methods for 4 and 5 cell types. (a) For the estimation of the effective adhesion parameter β * (4) , 2000 simulations are conducted under the same conditions as described in Figure A1a. After simulation, each data point is classified: ω > 0.25 as class 1, else class −1. The classified data points are used by the SVM and Logit-Model to estimate the homotypic a and the heterotypic b coefficient as well as the intercept i according to the table head. The model accuracy is tested via 5-fold cross-validation. (b) Analogous for the 5-cell-type systems the simulation conditions are described in Figure A1b. For data point classification applies: ω > 0.20 as class 1, else class −1. For details on the estimation process as well as the model accuracy test, see Appendix E.  Figure A2 shows that the larger the EAP β * is the more cell switches are needed before a simulation reaches its asymptotic segregation state which is a fully sorted pattern in the case of β * (2) = 12 and β * (2) = 24 according to the DMM, see Section 2.1. This effect is visible from about β * > 4 and the cause is still unknown. Figure A2. The effective adhesion parameter also influences the convergence speed exemplified with 2-cell-type systems. In the case of β * (2) = 12 (violet) the intercellular adhesion parameters are (2.0, 2.0, −4.0) and for β * (2) = 24 (green) they are (4.0, 4.0, −8.0). Both simulations have a random start configuration, i.e., initial order indicator ω(t = 0) ≈ 0.5 and run for about 7.2 million cell switches. The asymptotic estimate ω for the last 10% of ≈ 300000 cell switches (gray area) is written on the corresponding time-series.