1. Introduction
Finding and explaining patterns in the ebb and flow of people in a public gathering have been challenging tasks for the mathematically inclined social scientist [1,2]. People constantly join and leave groups, so at any moment, the gathering appears as a collection of social clusters. The stochastic models proposed in the 1960s to explain the size distribution of these free-forming or casual groups ignore any prior knowledge of the individuals present in the gathering and use only a couple of parameters to represent people’s average tendencies to join and leave groups [3,4]. Here, we evaluate the suitability of these models to reproduce the empirical size distribution of casual groups.
In fact, the equilibrium group-size distributions resulting from these null models, viz., the zero-truncated Poisson distribution and the logarithmic distribution, describe well the observed size distributions of collections of small groups, such as pedestrians on a sidewalk, playgroups in a playground and shopping groups [3]. However, these distributions were derived using a mean-field approximation to solve equations for the expected number of groups of a given size, which prompted the criticism that they were not actual outcomes of the models but artifacts of the approximation scheme [4,5,6]. Here, we attempt to settle this long-standing (and likely forgotten) issue using Gillespie’s stochastic algorithm to exactly simulate the group dynamics [7,8].
In addition, we extend the null models proposed in the 1960s to consider the situation where the appeal of a group of size $i$ to isolates (i.e., individuals who are not members of any group) is proportional to $i^{\alpha}$, where $\alpha$ is the attractiveness exponent. Large negative values of $\alpha$ describe situations where a predominance of couples and isolates is expected, whereas large positive values of $\alpha$ foster the formation of a single large group that coexists with isolates. In particular, the truncated Poisson distribution is obtained for $\alpha = 0$ [3], and the logarithmic distribution is obtained for $\alpha = 1$ [4], which have been the only cases studied so far. The stochastic simulations of the group dynamics indicate that the mean-field approximation yields the exact results for $\alpha \le 1$ and a large population size $N$. However, the approximation fails for $\alpha > 1$ since it violates the fixed-population constraint.
The variation in the attractiveness exponent $\alpha$ allows the modeling of collections of small, temporary groups as well as of large, stable groups. We find that these two scenarios are separated by a discontinuous phase transition at a critical value $\alpha = \alpha_c$. The probability of observing large groups vanishes exponentially with increasing group size $i$ for $\alpha < \alpha_c$. For $\alpha > \alpha_c$, the probability mass function concentrates around $i = 1$ and $i = N - n_1$, where $n_1$ is the number of isolates. Hence, the acquisition-and-loss process of group dynamics does not produce the power-law decay for the probability of finding large groups observed in face-to-face interaction networks [9]. Long-tailed group-size distributions are outputs of agent-based models where the individuals are ascribed distinct degrees of attractiveness [10,11], so it seems that some knowledge of the individuals present in the gathering is necessary to produce these power-law distributions. In fact, the natural tendency of people to gravitate toward others who share similar interests or backgrounds is an important factor in explaining the formation of social groups [12,13,14]. Nevertheless, it is noteworthy that the stochastic null models can explain the empirical size distribution of collections of small groups.
A more recent and fruitful approach to the characterization of social groups or, more precisely, social networks (networks of friends or other acquaintances) is based on the complex networks framework [15]. In particular, a group of individuals (nodes) with a high density of internal links but with a comparatively lower density of external links is called a community [16,17]. Communities are ubiquitous in social and biological systems and are believed to represent real social groupings assembled by interest or background. We note that, although there are a variety of public repositories of animal social networks (see, e.g., [18,19]), the detection of communities in the real world as well as in artificial networks is a challenging computational task [20,21]. From an evolutionary perspective, the community organization of social networks is considered optimal if it boosts communication and decision making at the group level while keeping a minimum number of connections between individuals [22,23]. However, social network communities are relatively stable groups and thus are not good models for fleeting casual groups, which can be described by the less popular face-to-face interaction networks [24].
The rest of this paper is organized as follows: In Section 2, we describe the group dynamics and derive the exact equations for the expected number of groups of a given size. In Section 3, we present a brief overview of Gillespie’s algorithm and study the group dynamics using this stochastic simulation algorithm. In Section 4, we solve the equations derived in Section 2 for the equilibrium regime using the mean-field approximation and present explicit analytical expressions for the cases $\alpha = 0$, $\alpha = 1$ and $\alpha \to -\infty$ of the attractiveness exponent $\alpha$, which are then compared with the simulation results. In Section 5, we study the equilibrium regime for $\alpha > 1$ using stochastic simulations and show that in the limit $N \to \infty$, there is a discontinuous phase transition separating the scenarios where the variance of the group size is finite and where it diverges linearly with increasing $N$. Finally, in Section 6, we review our main results and present some concluding remarks.
2. The Model
We consider a fixed number of individuals $N$ that organize themselves into a variable number of groups of size $i = 1, \ldots, N$ in a closed system. We denote by $n_i(t)$ the number of groups of size $i$ at time $t$. These random variables satisfy the constraint $\sum_{i=1}^{N} i\, n_i(t) = N$ and determine the total number of groups in the system, viz., $M(t) = \sum_{i=1}^{N} n_i(t)$, which is also a random variable. The processes of joining and leaving the groups are as follows.
Each individual in a group of size $i > 1$ has a probability $\mu\, \delta t$ of leaving the group during the time interval $\delta t$. When an individual leaves a group, it becomes an isolate, i.e., a group of size $i = 1$. An isolate has a probability $\lambda\, \delta t$ of joining a group (including other isolates) in the time interval $\delta t$.
We assume that the attractiveness of a group of size $i$ to isolates is proportional to $i^{\alpha}$, where $\alpha$ is the attractiveness exponent. The case $\alpha = 0$ describes the situation where isolates join any group in the system at the same rate, whatever its size [3]. For $\alpha > 0$, we have a contagious scenario that favors the formation of large groups [4], whereas, for $\alpha < 0$, we have an aversion scenario that disfavors the formation of groups. In the 1960s, approximate analytical expressions for the expected values of the number of groups of each size were derived for the cases $\alpha = 0$ and $\alpha = 1$ only [3,4].
The understanding of the processes by which groups acquire and lose individuals is facilitated if we write down the conditional expected values of the random variables
given that the system is in the state
at time
t. Let us begin with the conditional expectation of the number of isolates,
where we have omitted the dependence of the variables
on
t that appear on the right-hand side (RHS) of the equation. In addition, we have introduced the notation
so that
and
. The third term on the RHS of Equation (
1) takes into account the fact that any individual who is not isolated may become an isolate with a probability of
. The second term corrects the third term by reckoning with the fact that whenever an individual leaves a group of two individuals, two isolates are created. The fourth term on the RHS of Equation (
1) accounts for the event that an isolate joins any group with
individuals, whereas the fifth term accounts for the aggregation of two isolates.
Next, we consider the conditional expectation of the number of couples, viz.,
The second and third terms on the RHS of Equation (
3) account for the facts that a group of two individuals is created when an individual leaves a group of three individuals, and it is destroyed when an individual leaves a group of two individuals. The fourth term accounts for the event that isolates attracted by groups of two individuals produce groups of three individuals, and the fifth term accounts for the joining of two isolates to produce a group of two individuals.
For groups with
individuals, we can write the general expression
with
. The interpretation of the terms in this equation follows straightforwardly from the interpretations of the terms in Equations (
1) and (
3).
We stress that Equations (
1), (
3) and (
4) for the conditional expectations of
are exact. Adding these equations yields the conditional expectation for the number of groups,
from which we can see that the isolates play a key role in driving the casual group dynamics.
Averaging Equations (
1), (
3)–(
5) over the states
and taking the limit
yield
for
, and
where
, and we have introduced the notation
. We note that by rescaling the time
, these equations depend on the aggregation and disaggregation rates only through their ratio,
Of course, Equations (
6)–(
8) do not form a closed set of equations since there are quantities (e.g.,
) that are left undefined. Somewhat surprisingly, however, in the equilibrium regime, i.e.,
for
, Equation (
9) yields the exact mean number of isolates,
which does not depend on the attractiveness exponent
.
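The equilibrium number of isolates can be recovered from a simple rate balance. What follows is a sketch under stated assumptions rather than a reproduction of the numbered equations: the symbols $\lambda$ (joining rate per isolate), $\mu$ (leaving rate per group member), $\kappa = \lambda/\mu$, $n_1$ (number of isolates) and $M$ (number of groups) are notation chosen for this sketch. Every aggregation event removes one group and every departure creates one, so the mean number of groups obeys:

```latex
\begin{align}
\frac{d\langle M\rangle}{dt}
  &= -\lambda\,\langle n_1\rangle + \mu\,\bigl(N-\langle n_1\rangle\bigr), \\
0 &= -\lambda\,\langle n_1\rangle + \mu\,\bigl(N-\langle n_1\rangle\bigr)
  \quad\Longrightarrow\quad
  \langle n_1\rangle = \frac{\mu N}{\lambda+\mu} = \frac{N}{1+\kappa},
  \qquad \kappa = \frac{\lambda}{\mu}.
\end{align}
```

Neither rate in this balance involves the attractiveness exponent, which is why the equilibrium number of isolates is insensitive to it.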
3. The Gillespie Algorithm
Here, we offer a brief overview of Gillespie’s algorithm for simulating continuous-time stochastic models [
7,
8]. In the time interval
, the probability that aggregation occurs is
, and the probability that an individual leaves a group is
. Since these two events decrease and increase the number of groups by one unity, respectively, their probabilities appear on the RHS of Equation (
5). Given the state
at time
t, the probability that the next event will occur in the infinitesimal time interval
is
, where
is the exponential distribution,
and
is the total rate of events. The event that occurs in the time interval
is an aggregation with a probability
and a disaggregation with a probability
. In the case that aggregation occurs, there are two possibilities: an isolate can join a group of size
, which is an event that happens with a probability
or two isolates can join together to form a couple, which has a probability
In the case that disaggregation occurs in the time interval
a single individual leaves a group of size
, which is an event that happens with a probability
In sum, given the state of the system
at time
t, the stochastic simulation of the casual-group model begins with the choice of the time
when the next event will occur using the distribution in (
12), followed by the choice of the type of event—aggregation with a probability
and disaggregation with a probability
. Finally, the specific aggregation and disaggregation events are chosen with probabilities given by Equations (
14)–(
16). We note that, with the exception of the determination of the time of the next event, the aggregation and disaggregation rates always appear in the form of the ratio
. This numerical algorithm produces the exact trajectories of the states
, which, when properly averaged over many independent runs, offers the only way to verify the validity of the approximation schemes used to solve high-dimensional master equations [
7,
8].
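The simulation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures. The function name, the argument names (`join_rate`, `leave_rate`, `alpha`) and the normalization of the joining weights are assumptions: an isolate is taken to pick a group of size $i \ge 2$ with weight $n_i\, i^{\alpha}$ and another isolate with weight $n_1 - 1$, and a departing individual is drawn uniformly among all non-isolated individuals.

```python
import random


def gillespie_casual_groups(N, join_rate, leave_rate, alpha, t_max, seed=None):
    """Simulate the casual-group dynamics up to time t_max (illustrative sketch).

    Returns (t, n), where t is the time of the last event and n[i] is the
    number of groups of size i (n[0] is unused).
    """
    if N < 2:
        raise ValueError("need at least two individuals")
    rng = random.Random(seed)
    n = [0] * (N + 1)
    n[1] = N                      # initial condition: everyone is an isolate
    t = 0.0
    while True:
        n1 = n[1]
        agg = join_rate * n1              # total rate of aggregation events
        dis = leave_rate * (N - n1)       # total rate of disaggregation events
        total = agg + dis
        dt = rng.expovariate(total)       # exponentially distributed waiting time
        if t + dt > t_max:
            break
        t += dt
        if rng.random() * total < agg:
            # Aggregation: one isolate joins a group chosen with weight i**alpha.
            # The joining isolate cannot choose itself, hence n[1] - 1 for i = 1.
            sizes = [i for i in range(1, N) if n[i] > 0]
            weights = [(n[i] - (1 if i == 1 else 0)) * float(i) ** alpha
                       for i in sizes]
            target = rng.choices(sizes, weights=weights)[0]
            n[1] -= 1             # the isolate that joins
            n[target] -= 1        # the group (or isolate) that is joined
            n[target + 1] += 1
        else:
            # Disaggregation: a uniformly chosen non-isolated individual leaves,
            # so a group of size i is hit with probability i * n[i] / (N - n1).
            sizes = [i for i in range(2, N + 1) if n[i] > 0]
            weights = [i * n[i] for i in sizes]
            size = rng.choices(sizes, weights=weights)[0]
            n[size] -= 1
            n[size - 1] += 1      # the remainder of the group
            n[1] += 1             # the individual who left
    return t, n
```

Averaging `n[1] / N` over many calls with different seeds gives an estimate of the mean density of isolates. The list scans make each event cost a time proportional to $N$, which is acceptable for a sketch but would need a smarter data structure for very large populations.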
Here, we use the bracket notation
to indicate the average over independent runs. In fact, because the number of runs is very large (typically
), we can safely equate the average over independent runs to the expected values of the random variables
, hence the choice of the same bracket notation used in
Section 2.
Figure 1 shows the time evolution of the mean density of isolates,
, and the mean number of individuals per group
for different values of the attractiveness exponent
. At time
, all
N individuals are isolates (i.e.,
and
). Interestingly, although the mean density of isolates does not depend on
in the equilibrium regime, as shown in Equation (
11), the transient regime is affected by the attractiveness exponent: the increase in
favors the production of isolates. This is so because
hinders the formation of couples, which requires the annihilation of two isolates, whereas the formation of groups of size
requires the annihilation of only one isolate. In addition, an increase in
increases the transient period, as well as the mean size of the groups.
Figure 2 shows the effect of the population size
N on the density of isolates and on the mean group size for
. The results are qualitatively similar to other values of the attractiveness exponent. We note the remarkable unresponsiveness of the density of isolates to changes in
N. In fact, although the decrease in
N results in a reduction in the rate of events
, given by Equation (
13), the effect of a single event on the density of isolates is enhanced. These two effects compensate for each other, resulting in a system size invariance of
. However, the mean group size decreases with increasing
N and converges very rapidly to the infinite system size limit. For
, this limit is described very well by the analytical approximation for the equilibrium solutions of Equations (
6)–(
8) that we will derive in the next section.
4. Analytical Approximation for the Equilibrium Regime
Here, we derive the controversial approximate analytical results for the expectations
that motivated the present contribution [
4,
5,
6]. In the following, we will focus only on the equilibrium regime, i.e.,
for
. To solve the equilibrium equations for
with
, we make the assumption
where
f is an arbitrary rational function. Such a strong assumption is valid if the random variables
are self-averaging, i.e.,
for all
i in the limit
. This neglect of fluctuations is the basis of the popular mean-field approximation of statistical physics [
25]. With this assumption, we can easily write
for
in terms of
and
,
where
, and
is given by Equation (
11). These equations must be solved self-consistently: for an arbitrary value of
, we calculate
for
, which we then use to update the estimate of
. The process is repeated until convergence. At this point, we can already see that the assumption in (
17) leads to nonphysical results for
, where
(see Equation (
11)), since it implies
for
. This breakdown of the mean-field approximation is expected because a necessary condition for the self-averaging property to hold is that
for large
N.
In addition, and more importantly, Equation (
18) holds for
only. Although for finite
N, the self-consistent strategy yields a solution for any value of
, in which
scales with
, the solution does not satisfy the constraint
for
. The reason is that Equation (
18) yields a non-negligible value for
, resulting in a net flow of individuals to nonphysical group sizes and the consequent violation of the constant-population-size constraint. This effect is negligible for
because
is vanishingly small (provided that
N is not too small), so the flow of individuals to nonphysical regions is inconsequential. We stress that the exact simulations of the group dynamics result in negligible values of
for any
, as we will show in
Section 5.
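The self-consistent calculation and the violation of the population constraint can be illustrated numerically. The sketch below rests on an assumed form of the mean-field balance, namely that the flow from size $i$ to $i+1$, proportional to $\kappa\, n_1\, i^{\alpha} n_i / S$ with $S = \sum_j j^{\alpha} n_j$, equals the reverse flow $(i+1)\, n_{i+1}$. With $x = \kappa n_1 / S$ this gives $n_i = n_1 x^{i-1} [(i-1)!]^{\alpha} / i!$, and self-consistency reduces to $\sum_{i=1}^{N} x^{i} (i!)^{\alpha-1} = \kappa$. The function names and the symbol $\kappa$ (ratio of joining to leaving rates) are choices made for this sketch, and bisection on $\ln x$ is swapped in for the fixed-point iteration described in the text because it converges for any parameter values.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mean_field_sizes(N, kappa, alpha):
    """Mean-field estimates of the number of groups of size 1..N (index 0 = size 1)."""
    n1 = N / (1.0 + kappa)                              # equilibrium isolates
    lgam = [math.lgamma(i + 1) for i in range(N + 1)]   # lgam[i] = log(i!)
    log_kappa = math.log(kappa)

    def log_F(log_x):
        # log of sum_{i=1}^{N} x**i * (i!)**(alpha - 1)
        return log_sum_exp([i * log_x + (alpha - 1.0) * lgam[i]
                            for i in range(1, N + 1)])

    hi = log_kappa            # F(x) >= x, so the root satisfies x <= kappa
    lo = hi - 1.0
    while log_F(lo) > log_kappa:
        lo -= 1.0
    for _ in range(200):      # bisection on log x
        mid = 0.5 * (lo + hi)
        if log_F(mid) > log_kappa:
            hi = mid
        else:
            lo = mid
    log_x = 0.5 * (lo + hi)
    return [math.exp(math.log(n1) + (i - 1) * log_x
                     + alpha * lgam[i - 1] - lgam[i])
            for i in range(1, N + 1)]


def total_population(sizes):
    """Number of individuals implied by a list of group counts."""
    return sum((k + 1) * count for k, count in enumerate(sizes))
```

Under these assumptions, `total_population(mean_field_sizes(200, 1.0, 0.0))` and the same call with `alpha=1.0` return the population size to high accuracy, whereas `alpha=2.0` returns a markedly smaller number, which is the constraint violation discussed above.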
A relevant quantity that is usually observed in empirical investigations [
1,
2,
3] is the mean fraction of groups of size
at equilibrium, defined as
where the approximation is justified by the assumption in (
17). Of course,
can be interpreted as the probability of observing a group of size
i. It is interesting that empirical studies typically clump together groups of the same size that are observed on many different occasions (e.g., pedestrians on a sidewalk during Spring mornings in Eugene, Oregon [
3]), so they report the total number of observations of groups of a given size. Summing over the different sizes yields the total number of groups observed. Hence, the ratio between the two averages
is actually the correct measure to describe the empirical results. However, in stochastic simulations, we calculate the ratios
for each run and then average the results over the many independent runs, so we measure
.
In the following, we present explicit analytical expressions of this fraction for the attractiveness exponents $\alpha = 0$ and $\alpha = 1$ and for the limit $\alpha \to -\infty$. In addition, we present the numerical solution of Equation (18) obtained with the self-consistent method for general $\alpha$.
4.1. Case $\alpha = 0$
In this case,
, and Equation (
18) reduces to
for
, with
from which we obtain an explicit expression for the mean number of groups in the equilibrium regime,
In deriving Equation (
21), we have assumed that
in order to carry out the sum over the group sizes, whereas, in deriving Equation (
22), we have assumed that
is on the order of
N, which means that
. As already pointed out, these are the necessary conditions for the validity of the self-averaging property that underlies the mean-field approximation.
Hence,
where
, which we identify as the zero-truncated Poisson distribution. Interestingly, this distribution fits a wide variety of data of small groups [
3]. The mean and the variance of the group size are
and
Figure 3 exhibits the comparison between the stochastic simulations and the truncated Poisson distribution (
23) for
and
. As expected, the mean-field approximation fails to describe the size distribution for
, but for
, its predictions are indistinguishable from the simulation results. This finding validates the use of that approximation, provided that the number of individuals is not too small.
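For readers who want to evaluate the zero-truncated Poisson distribution numerically, a short sketch follows. The probability mass function and its moments are the standard ones for a Poisson law conditioned on values of at least one. The link between the Poisson parameter and the rates, `theta = log(1 + kappa)` with `kappa` the ratio of joining to leaving rates, is an assumption that follows from a mean-field balance with size-independent attractiveness; the function and variable names are illustrative.

```python
import math


def zero_truncated_poisson_pmf(i, theta):
    """P(group size = i) for i >= 1 under a Poisson law conditioned on i >= 1."""
    if i < 1:
        return 0.0
    log_p = i * math.log(theta) - math.lgamma(i + 1) - math.log(math.expm1(theta))
    return math.exp(log_p)


def zero_truncated_poisson_moments(theta):
    """Mean and variance of the zero-truncated Poisson distribution."""
    mean = theta / (-math.expm1(-theta))      # theta / (1 - exp(-theta))
    variance = mean * (1.0 + theta - mean)
    return mean, variance


kappa = 1.0                     # hypothetical ratio of joining to leaving rates
theta = math.log1p(kappa)       # assumed mean-field relation (see lead-in)
probs = [zero_truncated_poisson_pmf(i, theta) for i in range(1, 60)]
mean, variance = zero_truncated_poisson_moments(theta)
```

With this assumed parameterization, the mean group size equals $(1+\kappa)\ln(1+\kappa)/\kappa$, which can be compared directly with simulation averages.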
4.2. Case $\alpha = 1$
In this case,
, and Equation (
18) reduces to
with
given in Equation (
11). As before, assuming that
and
, we obtain an explicit expression for the mean number of groups:
Thus, the fraction
of groups of size
is
which we identify as the logarithmic distribution used to model relative species abundance [
26]. Hence, the mean group size is
and the variance of the group size is
Figure 4 exhibits the comparison between the stochastic simulations and the logarithmic distribution (
28) for
and
. As before, the results validate the use of the mean-field approximation if the number of individuals is not too small.
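The logarithmic distribution is equally easy to evaluate. The sketch below uses the standard form $p_i = -q^{i} / [\, i \ln(1-q)]$ with $0 < q < 1$. The identification `q = kappa / (1 + kappa)`, with `kappa` the ratio of joining to leaving rates, is an assumption derived from a mean-field balance with attractiveness proportional to group size; the names are illustrative.

```python
import math


def logarithmic_pmf(i, q):
    """P(group size = i) for i >= 1 under the logarithmic (log-series) law."""
    if i < 1:
        return 0.0
    return -(q ** i) / (i * math.log1p(-q))


def logarithmic_moments(q):
    """Mean and variance of the logarithmic distribution."""
    a = -1.0 / math.log1p(-q)                 # normalization constant
    mean = a * q / (1.0 - q)
    variance = a * q * (1.0 - a * q) / (1.0 - q) ** 2
    return mean, variance


kappa = 1.0                     # hypothetical ratio of joining to leaving rates
q = kappa / (1.0 + kappa)       # assumed mean-field relation (see lead-in)
probs = [logarithmic_pmf(i, q) for i in range(1, 200)]
mean, variance = logarithmic_moments(q)
```

Under the assumed relation, the mean group size is $\kappa / \ln(1+\kappa)$, so larger joining rates shift weight toward larger groups while the tail still decays exponentially.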
4.3. The Limit $\alpha \to -\infty$
In the limit
, we have
,
and
for
. Hence, the fraction of groups of size
i is
and
for
.
Figure 5 shows the simulation results for
and
. We have verified that
for
in the simulations. As before, although the mean-field approximation fails for
, it yields the exact result for
.
The mean group size is
and the variance of the group size is
The limit
is instructive because we can solve Equations (
6)–(
8) exactly for any value of
using the fact that
for
. We find
where
. From this equation, we can easily obtain
and
. The reason that Equations (
31) and (
32) are only approximate and thus fail to fit the data for
in
Figure 5 is that, although we can calculate
and
exactly for all
, we do not know how to calculate
, which is the quantity measured in the simulations.
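The couples-and-isolates limit can be worked out by hand from the equilibrium number of isolates. The following sketch assumes the notation $\kappa$ for the ratio of joining to leaving rates and $\langle n_1\rangle = N/(1+\kappa)$ for the equilibrium number of isolates; it uses ratios of averages, so it describes the large-$N$ behavior rather than the average of ratios measured in the simulations.

```latex
\begin{align}
\langle n_1\rangle &= \frac{N}{1+\kappa}, &
\langle n_2\rangle &= \frac{N-\langle n_1\rangle}{2} = \frac{\kappa N}{2(1+\kappa)}, \\
p_1 &= \frac{\langle n_1\rangle}{\langle n_1\rangle+\langle n_2\rangle} = \frac{2}{2+\kappa}, &
p_2 &= \frac{\kappa}{2+\kappa}, \\
\text{mean} &= p_1 + 2p_2 = \frac{2(1+\kappa)}{2+\kappa}, &
\text{variance} &= p_1 p_2 = \frac{2\kappa}{(2+\kappa)^2}.
\end{align}
```

Both moments stay finite and independent of $N$, as expected when no group can grow beyond two members.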
4.4. General $\alpha$
Except for the three cases discussed before, it is not possible to obtain explicit analytical expressions for
because we cannot carry out the summation necessary to compute
in a closed form. However, the use of the self-consistent method allows us to easily obtain these quantities numerically. Since we have already established that the mean-field approximation is very accurate, even for
, in
Figure 6, we present only the approximate theoretical results for
and
.
As expected, decreasing the value of the attractiveness exponent decreases the mean and the variance of the group sizes. We note that these quantities have explicit analytical expressions in the cases $\alpha = 0$ (Equations (24) and (25)), $\alpha = 1$ (Equations (29) and (30)) and $\alpha \to -\infty$ (Equations (33) and (34)).
5. Equilibrium Regime for $\alpha > 1$
This is by far the most interesting situation because of the complete failure of the mean-field approximation. As already pointed out, the reason is that the solution of Equation (
18) violates the fixed-population constraint for
. However, we can still obtain some useful analytical information by considering the limit of very large
. In this limit, the system is composed of
isolates and a single group of
individuals on average. Hence, the mean number of groups is
, and we have
and
so that the mean group size is finite, but the variance diverges in the limit
. The mean size of the large group is
Interestingly,
Figure 7 shows that this scenario—a single large group coexisting with isolates—describes the case
very well for large
N.
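The single-large-group scenario can be made quantitative with a short calculation. As a sketch under assumed notation ($\kappa$ for the ratio of joining to leaving rates, $L$ for the size of the large group), and using ratios of averages, the group-size moments are:

```latex
\begin{align}
\langle n_1\rangle &= \frac{N}{1+\kappa}, \qquad
L = N-\langle n_1\rangle = \frac{\kappa N}{1+\kappa}, \qquad
\langle M\rangle = 1+\frac{N}{1+\kappa}, \\
\text{mean} &= \frac{N}{\langle M\rangle} = \frac{N(1+\kappa)}{N+1+\kappa}
  \;\xrightarrow[N\to\infty]{}\; 1+\kappa, \\
\text{variance} &= \frac{\langle n_1\rangle + L^{2}}{\langle M\rangle}
  - \left(\frac{N}{\langle M\rangle}\right)^{2}
  \;\approx\; \frac{\kappa^{2}}{1+\kappa}\,N \quad \text{for large } N.
\end{align}
```

The mean stays finite because isolates vastly outnumber the single large group, while the variance grows linearly with $N$ because that one group contributes a term of order $L^{2}/\langle M\rangle$.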
Figure 8 corroborates this finding by showing that
tends to a bimodal distribution characterized by sharp peaks at
and
in the limit of large
N. In addition, this figure shows that
(or
) is negligibly small for
, in disagreement with the prediction (
17) of the mean-field approximation.
In order to better understand the transition between the equilibrium regime characterized by a finite variance
and the regime where
diverges linearly with
N as
, in
Figure 9, we show the influence of the attractiveness exponent
on
and
. The results indicate the existence of a discontinuous transition between these two regimes that takes place at a critical value
. In addition,
, so the regime of infinite variance but finite
is not perfectly described by the
scenario. In particular, for
, we find
. This estimate was obtained by considering population sizes up to
and noticing that, for
, the variance
tends to a fixed value, whereas, for
, it increases with
N. As
increases, we find that
. We note that from the statistical physics perspective, the scaled variance
is the order parameter of the casual-group model since it is zero for
and nonzero otherwise.
Somewhat disappointingly, the acquisition-and-loss process underlying our model does not produce a power-law decay for the probability of finding large groups. In fact, Figure 10 shows that in the vicinity of the transition point $\alpha_c$ of the attractiveness exponent $\alpha$, where a scale-free behavior is more likely to be observed [27], the probability of observing large groups vanishes exponentially with increasing group size for $\alpha < \alpha_c$ or exhibits two peaks for $\alpha > \alpha_c$. It remains a challenge to find a simple acquisition-and-loss process that leads to a power-law distribution of group sizes as observed in face-to-face interaction networks [10,11].
6. Discussion
The distribution of sizes is likely the simplest quantitative information we can derive from the observation of freely forming groups. Of course, if the interrelations of the people present were known a priori, we could most certainly predict the formation and composition of some groups. However, here, we follow an alternative and more fruitful approach that ignores any prior knowledge of the individuals present and attempts to explain the observations using stochastic models characterized by a few parameters that represent people’s average tendencies to join and leave a group of a certain size [3,4]. Because the models considered here do not take into account individual idiosyncrasies, we refer to them as null models.
The null models assume that the total number of individuals $N$ is fixed, i.e., that the system is closed, but this assumption is rarely satisfied in field studies of casual groups. For instance, the number of pedestrians on a sidewalk observed on distinct days varies greatly, and it is likely bounded by the city population. However, by taking the limit $N \to \infty$, the fixed-population constraint becomes inconsequential. Of course, when considering this limit, we must focus only on the ratios of the number of groups, as applied in Equation (19). In fact, this was the approach used in the pioneering paper that introduced the mathematical modeling of casual groups [5]. In any event, our results indicate that even for
, the group-size distribution is practically indistinguishable from the distribution derived in the infinite population limit. Regarding the connection between the field studies and the mathematical models, it is assumed that the acquisition-and-loss process takes place at the time of the formation of the groups and that the system is at equilibrium at the moment of the observation. In addition, it is assumed that the observation happens on a time scale that is much faster than the acquisition-and-loss process, so the groups maintain their sizes during the period of observation [3].
Here, we extend previous models of casual groups by assuming that the appeal of a group of size $i$ to isolates is proportional to $i^{\alpha}$, where $\alpha$ is the attractiveness exponent. The control of the appeal of groups of different sizes to isolates, which is obtained by tuning the exponent $\alpha$, allows us to consider some interesting scenarios. For instance, large negative values of $\alpha$ could describe people on the sidewalk walking to a dance party, where a predominance of couples is expected, which cannot be explained by the truncated Poisson distribution (see [28] for an alternative, more complex model of couples and isolates). Large positive values of $\alpha$ result in a bimodal distribution of group sizes, corresponding to a scenario where a single large group coexists with a number of isolates. Interestingly, in both cases, the proportion between the number of individuals in the large group or the number of individuals forming couples and the number of isolates is given by the ratio between the rates of aggregation and disaggregation.
Our main result is that the mean-field approximation used to derive the distribution of group sizes in the case that the attractiveness of a group does not depend on its size (i.e., attractiveness exponent $\alpha = 0$) and in the case that it increases linearly with the group size (i.e., $\alpha = 1$) actually yields the exact result for $N \to \infty$. This conclusion, which is drawn from the agreement between the exact stochastic simulations of the group dynamics and the mean-field results, dismisses the suspicion of the inadequacy of the mean-field approximation to describe the equilibrium size distribution of casual groups [6]. (Of course, neither Gillespie’s algorithm [7,8] nor the computational resources to implement it were available in the 1960s to settle this issue.) In fact, for $\alpha \le 1$, the mean-field approximation yields very good predictions, even for a relatively small population size. However, the approximation fails spectacularly for $\alpha > 1$, since it violates the fixed-population constraint. In this case, Gillespie’s stochastic simulation algorithm emerges as the only resource to study the dynamics of casual groups.
In addition, we find that the variation in the attractiveness exponent $\alpha$ produces scenarios where the group sizes are typically small, which is the situation addressed in the literature on casual groups [2], and scenarios where most of the population is confined to a single group. The latter scenario corresponds to the large and stable groups formed by gregarious animals, whose sizes are determined by a variety of selective pressures, such as defense against predation and foraging success [29]; the cognitive load that constrains the number of individuals with whom it is possible to maintain stable relationships [30]; competence in problem solving [31]; and individual distance preservation [32]. Remarkably, in our model, these distinct scenarios are separated by a discontinuous phase transition that takes place at a critical value $\alpha = \alpha_c$, indicating that both types of aggregation behavior can be explained by the same underlying acquisition-and-loss process.
The biological and sociological implication of the success of the null models to produce the empirical distribution of sizes for small groups (i.e., the truncated Poisson distribution) is that prior knowledge of the individuals present in the gathering, as well as individual idiosyncrasies, is not necessary to explain the size distribution of casual groups.
There are at least two research avenues to pursue in order to further improve our understanding of the fleeting clusters of people observed in public gatherings. First, different forms of group attractiveness to isolates can be explored so as to fit the data available from the SocioPatterns collaboration [24], which suggests that the group-size distribution decays as a power law for large group sizes [9,10,11]. Second, the individual-based model that reproduces the SocioPatterns collaboration data [10,11] can be used to fit the small groups’ data available in the seminal works on casual groups [2,3]. If any of these pursuits is successful, one would be able to reproduce all available data on casual groups with a single model.