Selected Topics of Social Physics: Equilibrium Systems

The present review is based on the lectures that the author has been giving for several years at the Swiss Federal Institute of Technology in Zurich (ETH Zurich). Being bound by the lecture format, the selection of the material is, by necessity, limited and motivated by the author's research interests. The paper gives an introduction to the physics of social systems, providing the main definitions and notions used in the modeling of these systems. The behavior of social systems is illustrated by several simple typical models. The present part considers equilibrium systems. Nonequilibrium systems will be presented in the second part of the lectures. The style of the paper combines the features of a tutorial and a survey, which, on the one hand, makes it easy to read for nonspecialists aiming at grasping the basics of social physics and, on the other hand, describes several rather recent original models containing new ideas that could be of interest to experienced researchers in the field.


Introduction
Nowadays physics methods and models are widely used in social science for characterizing the behavior of different social systems. It is useful to stress that physics provides methods and models for describing social systems, but it does not pretend to replace social science. By a social system one means a bound collective of a number of interacting agents. These can be various human societies, starting with families, professional groups, members of organizations, financial markets, country populations, etc. Or these can be animal societies forming biological systems, e.g. wolf packs, fish shoals, bird flocks, horse herds, swarms of bees, ant colonies, and the like. Thus the number of agents in a society is more than one and usually is much larger than one (N ≫ 1).
The collective of agents in a society is bound in the sense of having some features uniting the society members. For example, the members of a society can participate in joint activity, or can share similar goals or beliefs. The common features uniting the members make the collective stable or metastable, that is, bound for sufficiently long time that is much larger than the interaction time between the members. The agent interactions can be physical, economic, financial, and so on. Societies can be formed by force, as in the army and in prison, or can be self-organized.
Social systems pertain to the class of complex systems. A system is complex if some of its properties are qualitatively different from the agglomerated properties of its parts. Complex systems can be structured, consisting of parts that are complex systems themselves. For instance, the human world is composed of countries that, clearly, also are complex systems. In that sense, social systems usually are hierarchical, containing several levels of complex systems. For example, the human world is made up of countries that are composed of social groups that form organizations that include individuals whose bodies are made of biological cells and whose brains making decisions are extremely complex systems.
To represent a complex system, the society needs to have the following properties. The number of agents composing the society is to be large, N ≫ 1, although this is not sufficient. If the members of the society are independent, so that the overall system can be characterized by a set of independent agents, this does not make such a system complex. In a complex society, its features are not just a sum of the features of separate members. Thus bees can fly, but a bee colony is not just a swarm of flying bees. A group of people walking separately from each other is not a social system. A society becomes complex due to its agents' interactions and mutual relations, which cause the nonadditivity of the society features. Strictly speaking, there is no generally accepted mathematical measure of social system complexity [1][2][3][4][5]. Complexity is rather a qualitative notion assuming the following: a society is complex if it is stable and consists of many interacting agents, so that the typical society features are not just an arithmetic average of the features of separate individuals.
For example, an ant colony is a complex system, since the behavior of separate ants is regulated by their interactions and distribution of jobs: thus ant queens lay eggs, worker ants build the ant-hill and feed the queen, and winged males mate with queens and die. Another example of a complex system is a bee colony, where a queen produces eggs, workers clean out the cells, remove debris, feed the brood, care for the queen, build combs, guard the entrance, ventilate the hive, and forage for nectar, pollen, propolis, and water, while drones fertilize the queen and die upon mating.
There are three interconnected problems in the description of complex social systems: how to model a society, how to investigate the model, and how it would be possible to regulate the society behavior. A model is to be simple and at the same time realistic. Too simple models may not describe reality, but too complicated models can distort reality because of the accumulation of errors in an excessively complicated description. Sometimes it happens that "less is more" [6]. So, it is desirable that models be not trivial, although not overcomplicated. In this review, first, the basic ideas are described allowing one to understand the general principles of constructing society models, so that, on the one hand, the model would not be overcomplicated and, on the other hand, could catch the characteristic features of the considered society. Second, the main methods of investigating the system behavior are studied. Third, conditions are discussed making it possible to find the desired properties of the society and the related optimal parameters of the models.
The material of the article is based on the lectures that the author has been giving for several years at the Swiss Federal Institute of Technology in Zürich (ETH Zürich). Overall, the content constitutes a one-year course consisting of two parts, one devoted to equilibrium (or quasi-equilibrium) systems, and the other to nonequilibrium systems. Following this natural separation, the presentation of the material is also split into the corresponding parts. The present part deals with equilibrium social systems. The next part will consider nonequilibrium social systems [7].
Being bound by the lecture format, the content of the review is, by necessity, limited. The choice of the presented material is based on the research interests of the author.
The layout of the present part is as follows. In Sec. 2, the principle of minimal information is discussed playing the pivotal role in establishing probability distributions for equilibrium and quasi-equilibrium systems. Section 3 studies some simple typical models of equilibrium social systems. Section 4 gives the basics of collective decision making in a society. Finally, Sec. 5 concludes.

Principle of Minimal Information
Practically all models intending to describe the behavior of equilibrium or quasi-equilibrium complex systems start with some extremization principles. These principles are widespread in science as well as in life. Just as a joke, we may say that all life follows the extremization principle that can be formulated as "minimum of labor and maximum of pleasure".
To model social systems in complicated situations, when not all information is available, it is customary to resort to a probabilistic description, defining the probability distribution from the principle of minimal information. This principle helps to develop an optimal description of a social system with a limited amount of information on its properties. The information is always limited, since there are many agents and, in addition, their actions often are not absolutely rational. Experimental studies of brain activity clearly demonstrate that only a finite amount of information can be successfully processed by living beings [8]. The rationality of social individuals is always bounded [9]. Also, there exists the random influence of the environment. Moreover, not all information is even necessary for a correct description, while excessive information can lead to the accumulation of errors and incorrect conclusions.

Information Entropy
Let us consider a social system of N agents enumerated by the index j = 1, 2, . . . , N. Generally, the number of agents can depend on time. Each j-th agent is marked by a set of characteristics
$$\sigma_j = \{ \sigma_{j\alpha} : \ \alpha = 1, 2, \ldots, M_j \} . \qquad (2.1)$$
The collection of the characteristics of all agents makes up the society configuration set
$$\mathbb{X} = \{ \sigma_j : \ j = 1, 2, \ldots, N \} . \qquad (2.2)$$
The probability distribution ρ(σ) over the variables σ ∈ X is to be normalized,
$$\sum_\sigma \rho(\sigma) = 1 , \qquad (2.3)$$
where the sum over σ implies the summation over all admissible values of all agent characteristics,
$$\sum_\sigma \equiv \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} .$$
If all variables are equiprobable, then the distribution is uniform,
$$\rho(\sigma) = \frac{1}{N_{tot}} , \qquad N_{tot} = \prod_{j=1}^{N} M_j ,$$
with N_tot being the total number of admissible configurations. A statistical social system is the triple
$$\{ N, \ \mathbb{X}, \ \rho(\sigma) \} . \qquad (2.4)$$
However, the probability distribution needs to be defined. For this purpose, one introduces the information entropy S, which is a measure of the system uncertainty, that is, a measure of unaccessible information. It is a functional of the distribution over the society characteristics,
$$S = S[\, \rho(\sigma) \,] , \qquad (2.5)$$
enjoying the following properties.
(i) Continuity. The entropy is a continuous functional of the distribution, such that for a small variation of the latter, the entropy variation is small.
(ii) Monotonicity. For the uniform distribution, the entropy is a monotonically increasing function of the number of admissible configurations N_tot.
(iii) Additivity. If a statistical system with the set of variables σ can be separated into two mutually independent subsystems with the variable sets σ1 and σ2, such that
$$\rho(\sigma) = \rho_1(\sigma_1) \, \rho_2(\sigma_2) ,$$
then the entropy of the total system is the sum of the subsystem entropies,
$$S[\, \rho \,] = S[\, \rho_1 \,] + S[\, \rho_2 \,] . \qquad (2.8)$$
Shannon theorem [10]. The unique functional satisfying the above conditions, up to a positive constant factor, has the form
$$S = - \sum_\sigma \rho(\sigma) \ln \rho(\sigma) . \qquad (2.9)$$
Here the natural logarithm is used, but, generally, it is not important which logarithm base is employed, since the entropy is defined up to a constant factor. The information entropy coincides with the Gibbs entropy in statistical mechanics [11,12]. For the uniform distribution, we have
$$S = \ln N_{tot} ,$$
which, clearly, is a monotonically increasing function of N_tot. Generally, the entropy is in the range
$$0 \leq S \leq \ln N_{tot} .$$
It is zero when there is just a single agent with a single characteristic, so that N_tot = 1 and the distribution is trivial, ρ(σ) = 1, hence S = 0. When a statistical system, with a distribution ρ(σ), is initially characterized by a trial likelihood distribution ρ0(σ), then the form
$$S[\, \rho \, \| \, \rho_0 \,] = \sum_\sigma \rho(\sigma) \, \ln \frac{\rho(\sigma)}{\rho_0(\sigma)} \qquad (2.13)$$
is called the Kullback-Leibler information gain, or relative entropy, or Kullback-Leibler divergence [13,14].
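These definitions are straightforward to verify numerically. The following minimal Python sketch (an illustration added for this tutorial, with hypothetical four-configuration distributions) computes the entropy (2.9) and the Kullback-Leibler information gain (2.13):

```python
import math

def entropy(rho):
    """Information entropy S = -sum_sigma rho ln rho, Eq. (2.9)."""
    return -sum(p * math.log(p) for p in rho if p > 0.0)

def information_gain(rho, rho0):
    """Kullback-Leibler information gain, Eq. (2.13)."""
    return sum(p * math.log(p / q) for p, q in zip(rho, rho0) if p > 0.0)

# Uniform distribution over N_tot = 4 configurations: S = ln 4
uniform = [0.25] * 4
print(entropy(uniform))

# A peaked distribution has lower entropy: 0 <= S <= ln N_tot
peaked = [0.7, 0.1, 0.1, 0.1]
print(entropy(peaked) < entropy(uniform))            # True

# The information gain relative to the trial distribution is non-negative
print(information_gain(peaked, uniform) >= 0.0)      # True
```

The uniform distribution saturates the upper bound S = ln N_tot, while any peaked distribution has lower entropy and a non-negative information gain relative to the uniform trial distribution.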

Information Functional
The information functional has to include the information gain (2.13). In addition, we should not forget that the probability distribution is to be normalized, as in Eq. (2.3), hence
$$\sum_\sigma \rho(\sigma) - 1 = 0 . \qquad (2.14)$$
There can exist additional information defining some average quantities C_i by the condition
$$\sum_\sigma C_i(\sigma) \, \rho(\sigma) = C_i . \qquad (2.15)$$
Overall, the information functional takes the form
$$I[\, \rho \,] = \sum_\sigma \rho(\sigma) \, \ln \frac{\rho(\sigma)}{\rho_0(\sigma)} \; + \; \lambda_0 \left[ \sum_\sigma \rho(\sigma) - 1 \right] \; + \; \sum_i \lambda_i \left[ \sum_\sigma C_i(\sigma) \, \rho(\sigma) - C_i \right] . \qquad (2.16)$$
The use of the Kullback-Leibler information gain for deriving probability distributions is justified by the Shore-Johnson theorem [15].
Shore-Johnson theorem [15]. There exists only one distribution satisfying consistency conditions, and this distribution is uniquely defined by the minimum of the Kullback-Leibler information gain, under given constraints.
In the information functional (2.16), the coefficients λ 0 and λ i are the Lagrange multipliers, whose variation yields the normalization condition (2.14) and the expectation conditions (2.15).
Principle of minimal information. The probability distribution of an equilibrium statistical system (2.4) is defined as the minimizer of the information functional (2.16), thus satisfying the conditions
$$\frac{\delta I[\, \rho \,]}{\delta \rho(\sigma)} = 0 , \qquad \frac{\delta^2 I[\, \rho \,]}{\delta \rho(\sigma)^2} > 0 .$$
The minimization conditions give
$$\ln \frac{\rho(\sigma)}{\rho_0(\sigma)} + 1 + \lambda_0 + \sum_i \lambda_i C_i(\sigma) = 0 ,$$
which leads to the distribution
$$\rho(\sigma) = \rho_0(\sigma) \, \exp\left\{ - (1 + \lambda_0) - \sum_i \lambda_i C_i(\sigma) \right\} .$$
If there is no preliminary information on the trial distribution properties, so that all states are equiprobable, this implies that the trial distribution is uniform, ρ0(σ) = 1/N_tot. Then the sought distribution reads as
$$\rho(\sigma) = \frac{1}{Z} \, \exp\left\{ - \sum_i \lambda_i C_i(\sigma) \right\} , \qquad (2.19)$$
with the normalization factor, called partition function,
$$Z = \sum_\sigma \exp\left\{ - \sum_i \lambda_i C_i(\sigma) \right\} . \qquad (2.20)$$
In statistical mechanics, this is called the Gibbs distribution [11,12]. The principle of minimal information is equivalent to the conditional maximization of entropy. As stated by Jaynes, the Gibbs distribution can be used for any complex system whose description is based on information theory [16,17].
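The construction of the Gibbs distribution can be illustrated numerically. In the sketch below (hypothetical characteristics and target average, chosen for illustration only), the Lagrange multiplier enforcing a single expected-value condition (2.15) is found by bisection, yielding the distribution (2.19):

```python
import math

def gibbs(lmbda, C):
    """Gibbs distribution (2.19): rho proportional to exp(-lambda * C(sigma))."""
    weights = [math.exp(-lmbda * c) for c in C]
    Z = sum(weights)                      # partition function, Eq. (2.20)
    return [w / Z for w in weights]

def solve_lambda(C, C_target, lo=-50.0, hi=50.0):
    """Find the Lagrange multiplier enforcing <C> = C_target by bisection;
    the average <C>(lambda) decreases monotonically in lambda."""
    def avg(lmbda):
        rho = gibbs(lmbda, C)
        return sum(p * c for p, c in zip(rho, C))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if avg(mid) > C_target:
            lo = mid          # average too large: increase lambda
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Characteristic values C(sigma) for four configurations and a required average
C = [0.0, 1.0, 2.0, 3.0]
lmbda = solve_lambda(C, C_target=1.0)
rho = gibbs(lmbda, C)
print(sum(rho))                                # normalization (2.14)
print(sum(p * c for p, c in zip(rho, C)))      # constraint (2.15), close to 1.0
```

The resulting distribution satisfies both the normalization condition and the imposed expected-value condition, as the principle of minimal information requires.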
It may happen that the probability distribution ρ(σ, x) depends on a parameter x that has not been uniquely prescribed. In such a case, this parameter can be chosen so as to follow the principle of minimal information. To this end, substituting the distribution (2.19) into the information functional (2.16) results in
$$I(x) = \ln N_{tot} - \ln Z(x) - \sum_i \lambda_i C_i , \qquad (2.21)$$
where
$$I_R(x) \equiv - \ln Z(x) - \sum_i \lambda_i C_i \qquad (2.22)$$
is the relative information. Keeping in mind that the average quantities C_i are fixed, we see that the minimization of the information functional with respect to the parameter x is equivalent to the minimization of the relative information (2.22), since
$$\frac{\partial I(x)}{\partial x} = \frac{\partial I_R(x)}{\partial x} .$$
In that way, the probability distribution can be uniquely defined [18].

Representative Ensembles
When the probability distribution is defined by the principle of minimal information, this does not mean that it is useful to have little information on the system. On the contrary, it is necessary to include into the information functional all available relevant information on the system, in the form of the expected-value conditions. The principle of minimal information shows how to obtain an optimal description of a complex system while possessing only minimal information on the latter. All available important information on the system must be taken into account. Thus a statistical system is described by the triple (2.4) consisting of the N system members, the society set X, and the probability distribution ρ(σ). The pair {X, ρ(σ)} is called a statistical ensemble. Observables are represented by real functions A(σ) = A*(σ), whose averages ⟨A(σ)⟩ are the observable quantities that can be measured. The collection {⟨A(σ)⟩} of all available observable quantities is a statistical state. A statistical ensemble is termed representative when it provides the correct description of the system statistical state, so that the theoretical expectation values of observable quantities accurately describe the corresponding measured values. For this, it is necessary to include into the information functional all relevant information in the form of additional constraints. Only then does the minimization of the information functional produce a correct probability distribution.
The idea of representative ensembles goes back to Gibbs [11,12]. Their importance was emphasized by Ter Haar [19,20]. The necessity of employing representative ensembles for obtaining reliable theoretical estimates has been analyzed, and the way the use of non-representative ensembles leads to incorrect results has been discussed in detail [21][22][23][24].

Arrow of Time
In general, the probability distribution ρ(σ, t) can depend on time t. An important question is: why is time assumed to always increase? One usually connects this with the second law of thermodynamics, according to which the entropy of an isolated system left to spontaneous evolution cannot decrease [25]. However, strictly speaking, by the Liouville theorem, the entropy of a closed system remains constant in time (see, e.g., [26]). It is possible to infer that the arrow of time appears in quasi-isolated systems due to their stochastic instability [27][28][29][30][31]. Here we show that there is a very simple way of demonstrating that the irreversibility of time can be connected with the non-decrease of the information gain. First, we need to recall the Gibbs inequality.
Gibbs inequality. For two non-negative functions A(σ) ≥ 0 and B(σ) ≥ 0, one has:
$$\sum_\sigma A(\sigma) \, \ln \frac{A(\sigma)}{B(\sigma)} \; \geq \; \sum_\sigma \left[ A(\sigma) - B(\sigma) \right] . \qquad (2.25)$$
Proof. The proof follows from the inequality
$$\ln x \leq x - 1 \qquad (x > 0) ,$$
applied to x = B(σ)/A(σ).
Consequence. The information gain, caused by the transition from the probability ρ0(σ) to ρ(σ), is non-negative:
$$\sum_\sigma \rho(\sigma) \, \ln \frac{\rho(\sigma)}{\rho_0(\sigma)} \; \geq \; 0 ,$$
since both distributions are normalized to one. More examples of useful inequalities in information theory can be found in [32]. Now let us consider the natural change of the probability distribution with time, starting from the initial value ρ(σ, 0) to the final ρ(σ, t) at time t. Then the following statement is valid.
Non-decrease of information gain. The information gain, due to the evolution of the probability distribution from the initial distribution ρ(σ, 0) to the final ρ(σ, t), does not decrease:
$$\sum_\sigma \rho(\sigma, t) \, \ln \frac{\rho(\sigma, t)}{\rho(\sigma, 0)} \; \geq \; 0 .$$
Thus the non-decrease of the information gain with time can be connected with the direction of time.

Equilibrium Social Systems
This section introduces the main notions required for modeling equilibrium social systems and studies some simple statistical models. In the long run, social systems are, strictly speaking, nonequilibrium. However, if the society does not experience external shocks for a period of time much longer than the typical interaction time between the society members, this society can be treated as equilibrium for that period of time. More details on the so-called social physics can be found in Refs. [33][34][35][36][37][38].

Free Energy
Equilibrium social systems can be treated by statistical theory, as a particular application of the above notions of information theory and the principle of minimal information. Following the notation of the previous section, let us consider a society of N members, where each member is associated with the characteristics (2.1). The considered society occupies a volume V.
Our aim is to study almost isolated societies, with self-organization due to their internal properties. Of course, no real society can be absolutely isolated from its surroundings. The influence of the surroundings is treated as stationary random perturbations, or stationary noise. The perturbations can be produced by other societies, or by natural causes, such as earthquakes, floods, droughts, epidemics, etc. The noise is considered stationary, which implies that an equilibrium situation is assumed. The influence of noise on the society is measured by temperature T, which is the measure of the noise intensity. In the limiting cases, the absence of noise implies T = 0, and extremely strong noise means T = ∞. One often uses the inverse temperature
$$\beta \equiv \frac{1}{T} ,$$
which can be interpreted as the measure of the society isolation from random noise. Respectively, β = 0 means no isolation and extreme noise, while β = ∞ implies complete isolation and no noise. An equilibrium system is conveniently characterized by a functional H(σ) termed Hamiltonian, or harm. The expected value of the Hamiltonian is the society energy
$$E = \langle \, H(\sigma) \, \rangle = \sum_\sigma H(\sigma) \, \rho(\sigma) ,$$
which is also called society cost. In some economic and financial applications it is termed disagreement or dissatisfaction, since the higher energy corresponds to the more excited society. The energy of noise is TS, with S being the information entropy. The free energy is the part of the society energy due to the society itself, without the noise energy:
$$F = E - TS .$$
In the limiting case of zero temperature, when there is no noise, the free energy coincides with the society energy,
$$F = E \qquad (T = 0) .$$
The probability distribution over the society characteristics is defined by the principle of minimal information, which in the present notation gives
$$\rho(\sigma) = \frac{1}{Z} \, e^{-\beta H(\sigma)} , \qquad Z = \sum_\sigma e^{-\beta H(\sigma)} ,$$
so that the free energy is expressed through the partition function,
$$F = -T \, \ln Z . \qquad (3.8)$$
It is easy to check that the entropy S can be represented as
$$S = - \frac{\partial F}{\partial T} .$$
Society pressure is defined as
$$p = - \frac{\partial F}{\partial V} .$$
This relation implies that the region occupied by a society can be changed because of pressure.
Population density is
$$\rho \equiv \frac{N}{V} .$$
Combining (2.22) and (3.8) gives the equality
$$I_R(x) = \beta \left[ F(x) - E \right] .$$
Hence the information functional (2.21) and the free energy (3.8) are connected:
$$I(x) = \beta F(x) + \ln N_{tot} - \beta E .$$
Since the values C_i are fixed, we have
$$\frac{\partial I(x)}{\partial x} = \beta \, \frac{\partial F(x)}{\partial x} .$$
Therefore the minimization of the information functional over a parameter x is equivalent to the minimization of the free energy over this parameter:
$$\min_x I(x) \; \longleftrightarrow \; \min_x F(x) .$$
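The relations between the energy, entropy, and free energy are easy to check numerically. The following sketch (with hypothetical harms of a four-configuration society) builds the Gibbs distribution and verifies that F = E − TS holds together with F = −T ln Z:

```python
import math

def thermodynamics(H, T):
    """Gibbs distribution at temperature T for configuration harms H;
    returns the society energy E, information entropy S, and
    free energy F = -T ln Z, Eq. (3.8)."""
    beta = 1.0 / T
    weights = [math.exp(-beta * h) for h in H]
    Z = sum(weights)                               # partition function
    rho = [w / Z for w in weights]
    E = sum(p * h for p, h in zip(rho, H))         # society energy (cost)
    S = -sum(p * math.log(p) for p in rho)         # information entropy
    F = -T * math.log(Z)                           # free energy
    return E, S, F

H = [0.0, 1.0, 1.0, 2.0]    # harms of four hypothetical configurations
T = 0.5
E, S, F = thermodynamics(H, T)
print(abs(F - (E - T * S)) < 1e-10)    # True: F = E - T S
```

The identity F = E − TS follows directly from the Gibbs form of the distribution, so the check holds for any choice of harms and any positive temperature.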

Society Stability
Considering society models, it is important to make sure that the society is stable. A society is stable if small variations of parameters do not drive it far from its initial state. Among several admissible statistical states, the system chooses the one providing the minimal free energy. The system is stable with respect to the variation of parameters if its free energy is minimal, so that
$$\delta F = 0 , \qquad \delta^2 F > 0 .$$
The variation of the intensity of noise, that is of temperature, under fixed volume, is characterized by the specific heat
$$C_V \equiv \frac{1}{N} \, \frac{\partial E}{\partial T} = - \frac{T}{N} \, \frac{\partial^2 F}{\partial T^2} .$$
With the free energy (3.8), this takes the form
$$C_V = \frac{\langle \, (\Delta H)^2 \, \rangle}{N T^2} ,$$
where the variation of the Hamiltonian H = H(σ) means
$$\Delta H \equiv H - \langle \, H \, \rangle .$$
The average ⟨(ΔH)²⟩ is non-negative; hence the specific heat has to be non-negative. The variation of volume, under fixed temperature, is characterized by the isothermal compressibility
$$\kappa_T \equiv - \frac{1}{V} \, \frac{\partial V}{\partial p} .$$
The condition of stability requires that the specific heat, as well as the compressibility, be positive and finite,
$$0 \leq C_V < \infty , \qquad 0 \leq \kappa_T < \infty .$$
In this way, the principle of minimal information agrees with the minimum of the free energy, which, in turn, implies the society stability.
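The equality of the fluctuation form of the specific heat and the temperature derivative of the energy can be verified numerically. In the sketch below (hypothetical harms; the overall factor 1/N is omitted, which does not affect the comparison), both expressions are computed for a small configuration set:

```python
import math

def moments(H, T):
    """Energy and energy variance in the Gibbs state at temperature T."""
    weights = [math.exp(-h / T) for h in H]
    Z = sum(weights)
    rho = [w / Z for w in weights]
    E = sum(p * h for p, h in zip(rho, H))
    var = sum(p * (h - E) ** 2 for p, h in zip(rho, H))
    return E, var

H = [0.0, 1.0, 2.0, 5.0]    # harms of hypothetical configurations
T, dT = 1.0, 1e-5

# Specific heat from the fluctuation formula: C = <(Delta H)^2> / T^2
E, var = moments(H, T)
c_fluct = var / T**2

# The same quantity from the numerical derivative dE/dT
E_plus, _ = moments(H, T + dT)
E_minus, _ = moments(H, T - dT)
c_deriv = (E_plus - E_minus) / (2 * dT)

print(c_fluct > 0.0)                    # True: stability requires C >= 0
print(abs(c_fluct - c_deriv) < 1e-5)    # True: the two expressions agree
```

The agreement of the two expressions is a finite-system check of the relation between energy fluctuations and the response to temperature variations.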
As an example of society instabilities, it is possible to mention disintegration of a country as a result of a war or because of external economic pressure. Another example is bankruptcy of a firm caused by changed financial conditions.

Practical Approaches
In order to accomplish quantitative investigations of society properties, two different approaches are employed: the network approach and the typical-agent approach.
The network approach, also called multi-agent modeling, is based on the following assumptions: (i) The considered society consists of agents, or nodes, that are fixed at the sites of a spatial (usually two-dimensional) lattice.
(ii) The agents interact with each other when they are close to each other, usually the nearest-neighbor interactions are considered.
(iii) Because of rather complicated calculations, as a rule, one has to resort to numerical modeling with computers.
There exist many examples of networks, such as electric-current networks, models of magnetic and ferroelectric materials, neuron networks in brains, computer networks, and so on.
The results of the network approach depend on the dimensionality of the lattice modeling the society (whether one-, two-, or three-dimensional lattices are considered), the lattice geometry (cubic, triangular, or another structure), and on the interaction type and range (long-range, short-range, mid-range). The network approach is appropriate for small or simply structured systems, with agents that can be treated as fixed at spatial points.
However, complex societies, such as human and biological societies, are not composed of agents tied to a spatial lattice; usually their members are not attached to any fixed spatial locations or sites. The agent interactions are not of nearest-neighbor type and can be independent of the distance between the agents. Interactions in a society can be direct, with fixed neighbors, or can involve changing neighbors. Nowadays interactions that do not depend on distance are widespread, such as those through phone, Skype, WhatsApp, Telegram, and the like. There exist indirect interactions through letters or e-mails, by reading newspapers and books, by listening to the radio, or by watching television. The majority of interactions are long-range, not solely with nearest neighbors. Other biological societies also interact at large distances, by means of their voices and smells.
Summarizing, complex societies, like human or other biological societies, are formed by agents that are not fixed at spatial locations and can interact at long distances. This kind of society is better described by the typical-agent approach. In this approach, one reduces the problem to the consideration of the behavior of typical agents representing a kind of an average member of the society. For example, the typical interaction of agents, having the characteristics σ_i and σ_j, and described by the term σ_i σ_j, is transformed to the expression
$$\sigma_i \sigma_j \; \rightarrow \; \sigma_i \langle \, \sigma_j \, \rangle + \langle \, \sigma_i \, \rangle \sigma_j - \langle \, \sigma_i \, \rangle \langle \, \sigma_j \, \rangle . \qquad (3.22)$$
Thus, for equal averages ⟨σ_i⟩ = ⟨σ_j⟩ = s, the representation (3.22) becomes
$$\sigma_i \sigma_j \; \rightarrow \; ( \sigma_i + \sigma_j ) \, s - s^2 ,$$
and the average interaction reads as
$$\langle \, \sigma_i \sigma_j \, \rangle = \langle \, \sigma_i \, \rangle \langle \, \sigma_j \, \rangle = s^2 .$$
Then the description of the interaction is reduced to the consideration of typical agents subject to the average influence of other typical agents. For complex social systems, the typical-agent approach is not merely simpler but is more correct.
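The quality of the typical-agent decoupling can be probed on a small, fully connected society by exact enumeration. In the sketch below (the 1/N scaling of the couplings is an illustrative choice that keeps the effective intensity per agent equal to J), the exact pair average ⟨σ_i σ_j⟩ is computed for comparison with the factorized typical-agent value s²:

```python
import itertools
import math

def exact_pair_average(N, J, T):
    """Exact <sigma_i sigma_j> for a fully connected society of N agents
    with H = -(J/2N) sum_{i!=j} sigma_i sigma_j, by direct enumeration."""
    beta = 1.0 / T
    Z, corr = 0.0, 0.0
    for config in itertools.product((-1, 1), repeat=N):
        m = sum(config)
        # sum_{i != j} sigma_i sigma_j = m^2 - N
        energy = -J * (m * m - N) / (2.0 * N)
        w = math.exp(-beta * energy)
        Z += w
        corr += w * config[0] * config[1]       # one typical pair of agents
    return corr / Z

# Deep in the ordered phase the agents are almost perfectly aligned,
# consistent with the typical-agent value s^2 close to 1
print(exact_pair_average(N=10, J=1.0, T=0.2) > 0.95)    # True

# Far above the critical temperature the correlation is small,
# approaching the disordered typical-agent value s^2 = 0
print(exact_pair_average(N=10, J=1.0, T=3.0) < 0.2)     # True
```

In both regimes the exact pair average approaches the factorized value of the typical-agent approach, which becomes exact for long-range interactions in the limit of large N.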

Society Transitions
Usually, one is interested not in the characteristics of single agents but in the general behavior of a society. For this purpose, one considers the arithmetic mean characteristic
$$\sigma \equiv \frac{1}{N} \sum_{j=1}^{N} \sigma_j .$$
The observable quantity is the average characteristic
$$s \equiv \langle \, \sigma \, \rangle . \qquad (3.27)$$
The general behavior of a society can be associated with its average characteristics. When a property of the characteristic qualitatively changes, one speaks of a social phase transition. For example, it may happen that under such a social transition the average characteristic varies between zero and nonzero values. Conditionally, one can name the social state with |s| > 0 an ordered state, while the state where s = 0 a disordered state. In that case, the average characteristic (3.27) is termed the order parameter. The following types of transitions can occur when a system parameter, for instance temperature, varies.
First-order transition. This type of transition happens when the order parameter at some point T_0 changes between nonzero and zero by a discontinuous jump:
$$\lim_{T \to T_0 - 0} s(T) \neq 0 , \qquad s(T) = 0 \quad (T > T_0) .$$
Discontinuous social transitions can be associated with revolutions.
Second-order transition. Then the order parameter at a critical point T_c changes between nonzero and zero continuously:
$$\lim_{T \to T_c - 0} s(T) = 0 , \qquad s(T) \neq 0 \quad (T < T_c) , \qquad s(T) = 0 \quad (T \geq T_c) .$$
Continuous transitions describe evolutions.
Gradual crossover. The order parameter does not become zero at a finite point, but at some crossover point T_c it strongly diminishes, tending to zero only in the limit of large T:
$$s(T) \neq 0 \quad (T < \infty) , \qquad s(T) \to 0 \quad (T \to \infty) .$$
This transition corresponds to a smooth evolution.
There can occur more unusual situations, when a society is not completely equilibrium [39], but for the description of equilibrium societies, the above three types of social transitions are sufficient.

Yes-No Model
In the present section, we consider a simple model that is well known in statistical physics, where it is called the Ising model [40][41][42][43]. This model is also often used in various applications to financial and economic problems [44]. We need this model in order to illustrate in action the notions introduced in the previous sections, to exemplify the terminology associated with social systems, and to prepare the ground for studying more complicated models in the following sections.
Suppose each agent of a society can have just two features, which are opinions that can be termed "yes" and "no". Generally, this can be any decision with two alternatives. For instance, this can be voting for or against a candidate in elections, supporting or rejecting a suggestion in a referendum, buying or selling stocks in a market, etc. These two alternatives can be represented by the binary variable taking two possible values, e.g.,
$$\sigma_j = \pm 1 .$$
The interaction, or mutual influence, between two members of the society writes as
$$H_{ij} = - J_{ij} \, \sigma_i \sigma_j ,$$
with the value J_ij = J_ji being the intensity of the interaction. The case of agreement (collaboration) or disagreement (competition) of the members, respectively, corresponds to the values
$$J_{ij} > 0 \;\; (\mathrm{agreement, \ collaboration}) , \qquad J_{ij} < 0 \;\; (\mathrm{disagreement, \ competition}) .$$
The terminology comes from the fact that, under agreement, when J_ij > 0, the interaction energy is minimal for coinciding σ_i and σ_j, while in the case of disagreement, when J_ij < 0, the interaction energy is minimal for opposite σ_i and σ_j. When there is mutual agreement and both agents vote in the same way, either both "yes" or both "no", the interaction energy is lower than when the agents would vote differently. The Hamiltonian of the system is
$$H = - \frac{1}{2} \sum_{i \neq j} J_{ij} \, \sigma_i \sigma_j .$$
In physics, this is called the Ising model (see the history of the model in [45]).
In the typical-agent approach, employing the decoupling (3.22) and defining the effective interaction intensity J ≡ Σ_j J_ij, assumed to be the same for each agent, the Hamiltonian reads as
$$H = \frac{NJ}{2} \, s^2 - J s \sum_j \sigma_j .$$
Using the equalities
$$\langle \, \sigma_j \, \rangle = s , \qquad \sum_{\sigma_j = \pm 1} e^{\beta J s \sigma_j} = 2 \cosh( \beta J s ) ,$$
we get the probability distribution
$$\rho(\sigma_j) = \frac{e^{\beta J s \sigma_j}}{2 \cosh( \beta J s )} .$$
For the order parameter (3.27), we obtain the equation
$$s = \tanh\left( \frac{J s}{T} \right) . \qquad (3.38)$$
When there is mutual disagreement between the agents, so that J < 0, the sole solution for the order parameter is s = 0, which means a disordered state. However, when the members of the society are in mutual agreement, so that J > 0, there are two solutions of Eq. (3.38). One solution is s = 0, but the other solution is nonzero for T < T_c = J. In order to choose the stable solution, we need to find out which of them minimizes the free energy.
It is convenient to work with the dimensionless reduced free energy F(s) ≡ F/(NJ) and to measure temperature in units of J, keeping in mind a positive J. Then we have
$$F(s) = \frac{s^2}{2} - T \, \ln \left[ 2 \cosh\left( \frac{s}{T} \right) \right] .$$
The condition of the free-energy extremum,
$$\frac{\partial F(s)}{\partial s} = 0 ,$$
gives the order-parameter equation
$$s = \tanh\left( \frac{s}{T} \right) , \qquad (3.39)$$
which coincides with Eq. (3.38) in the reduced units. The condition of the free-energy minimum,
$$\frac{\partial^2 F(s)}{\partial s^2} > 0 ,$$
holds true for the nonzero s. It is also not difficult to check that
$$F(s) < F(0) \qquad (0 < T < T_c) ,$$
where F(0) = −T ln 2. Hence, for temperatures lower than the critical temperature T_c = J > 0, the stable state corresponds to the nonzero value of the order parameter s. At zero temperature (T = 0), with collaborating agents (J > 0), the society is completely ordered,
$$s = \pm 1 \qquad (T = 0) ,$$
although with an unspecified decision, either all deciding "yes" or all choosing the option "no". It is easy to notice that for any T below T_c there are two solutions, positive and negative, both corresponding to the same free energy F(s) = F(−s), which does not depend on the sign of the order parameter. This means that with equal probability the society can vote "yes" as well as "no". In that sense, the situation is degenerate.
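The order-parameter equation and the free-energy comparison can be reproduced numerically. The following sketch (fixed-point iteration is an implementation choice; temperature is in units of J) solves s = tanh(s/T) and checks that below T_c = 1 the ordered solution has the lower free energy:

```python
import math

def order_parameter(T, s0=1.0, tol=1e-12, max_iter=10000):
    """Solve the self-consistency equation s = tanh(s/T) by fixed-point
    iteration (temperature T in units of J, mutual agreement J > 0)."""
    s = s0
    for _ in range(max_iter):
        s_new = math.tanh(s / T)
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

def reduced_free_energy(s, T):
    """Reduced free energy F(s) = s^2/2 - T ln[2 cosh(s/T)]."""
    return 0.5 * s * s - T * math.log(2.0 * math.cosh(s / T))

# Below the critical temperature T_c = 1 the ordered state is stable
s = order_parameter(T=0.5)
print(s > 0.0)                                                       # True
print(reduced_free_energy(s, 0.5) < reduced_free_energy(0.0, 0.5))   # True

# Above T_c only the disordered solution s = 0 survives
print(abs(order_parameter(T=1.5)) < 1e-6)                            # True
```

Starting the iteration from a negative initial value yields the solution of the opposite sign with the same free energy, reflecting the yes-no degeneracy discussed above.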

Enforced Ordering
The yes-no degeneracy can be lifted by imposing an ordering force acting on the members of the society. The ordering force, or regulation force, represents different regulations, such as governmental rules and laws, as well as the society traditions and habits. The society Hamiltonian, including the ordering force, takes the form
$$H = - \frac{1}{2} \sum_{i \neq j} J_{ij} \, \sigma_i \sigma_j - \sum_j h_j \sigma_j .$$
In what follows, we assume that the regulation is uniform, that is, the same regulations are applied to all members of the society. This translates into the condition that the ordering force is the same for all agents, such that
$$h_j = h .$$
In other words, the laws are the same for everyone.
Resorting to the typical-agent approach yields the Hamiltonian
$$H = \frac{NJ}{2} \, s^2 - ( J s + h ) \sum_j \sigma_j$$
and the order parameter
$$s = \tanh[\, \beta \, ( J s + h ) \,] . \qquad (3.51)$$
We keep in mind the case of mutual agreement, where J > 0.
In the typical-agent approach, the reduced free energy becomes
$$F(s) = \frac{s^2}{2} - T \, \ln \left[ 2 \cosh\left( \frac{s + h}{T} \right) \right] ,$$
where temperature, as well as the force h, is measured in units of J. The order parameter (3.51) is given by the equation
$$s = \tanh\left( \frac{s + h}{T} \right) . \qquad (3.53)$$
The order parameter can also be defined as the derivative
$$s = - \frac{\partial F}{\partial h} .$$
How the society responds to the imposed regulations is described by the susceptibility
$$\chi \equiv \frac{\partial s}{\partial h} ,$$
which can be represented as
$$\chi = - \frac{\partial^2 F}{\partial h^2} .$$
Thus the susceptibility is positive, which means that the imposed regulations increase the order. Explicitly, we find
$$\chi = \frac{1 - s^2}{T - ( 1 - s^2 )} .$$
From the order-parameter equation (3.53), it is seen that the sign of the order parameter is prescribed by the sign of the ordering force, so that s > 0 for h > 0 and s < 0 for h < 0. By choosing the sign of h, it is possible to enforce either the ordering "yes" or the ordering "no". For concreteness, we consider h > 0. If the ordering force is extremely strong, then
$$s \to 1 \qquad (h \to \infty) .$$
In that way, the imposed ordering force increases the order in the society and makes it more stable. For illustration, let us consider the case of no noise, hence T = 0. Then the reduced free energy coincides with the reduced energy, which at the ordered state s = 1 equals
$$F = E = - \frac{1}{2} - h \qquad (T = 0) .$$
As is seen, the imposed force diminishes the society energy, which suggests that the society should be more stable.
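The effect of the ordering force is easy to explore numerically. In the sketch below (units of J; the finite-difference step is an implementation choice), the equation s = tanh((s + h)/T) is solved by fixed-point iteration and the susceptibility χ = ∂s/∂h is estimated:

```python
import math

def order_parameter(T, h, s0=0.0, n_iter=20000):
    """Solve s = tanh((s + h)/T) (units of J) by fixed-point iteration."""
    s = s0
    for _ in range(n_iter):
        s = math.tanh((s + h) / T)
    return s

def susceptibility(T, h, ds=1e-6):
    """chi = ds/dh, estimated by a symmetric finite difference."""
    return (order_parameter(T, h + ds) - order_parameter(T, h - ds)) / (2 * ds)

# Even above T_c = 1 the force induces order: s > 0 for h > 0
s = order_parameter(T=1.5, h=0.2)
print(s > 0.0)                                   # True

# The susceptibility is positive: regulations increase the order
print(susceptibility(T=1.5, h=0.2) > 0.0)        # True

# An extremely strong force saturates the order parameter, s -> 1
print(order_parameter(T=1.5, h=50.0) > 0.999)    # True
```

The numerically estimated susceptibility can be compared with the explicit expression χ = (1 − s²)/(T − 1 + s²) obtained by differentiating the order-parameter equation.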

Command Economy
From the previous section, it looks as if the stricter the regulations, the more ordered the society. That is, the larger the force h, the larger the order parameter s. The seeming conclusion could be that it is profitable to make the regulations as stringent as possible. Is this so? Let us consider as an example an economic society. The most strictly regulated type of economic organization is a command economy, or centrally planned economy. Is such an overregulated economy really the most efficient one?
The basic points of command economy are: (i) A centralized government owns most means of production and businesses.
(ii) Government controls production levels and distribution quotas.
(iii) Government controls all prices and salaries.
The proclaimed advantages of a command economy assume that regulatory decisions are made for the benefit of the whole society, that there is no large economic inequality, and that the economy is more stable. However, in reality the proclaimed catchwords confront a host of problems:
1. It is not always well defined what the benefits of the society are. Governmental decision makers often take decisions in their own favor, not in favor of the society, presenting their egoistic goals as the society's objective needs.
2. It is practically impossible to formulate correct plans for all goods far into the future. Constant shortages of necessary goods and surpluses of unnecessary goods are the rule. The economy is in permanent crisis.
3. Since the conditions of long-term future production cannot be exactly predicted, the plans are never accomplished. This makes it necessary to constantly correct the plans, with additional spending on the corrections.
4. Since everything is planned in advance, it is practically impossible to introduce innovations that have not been planned. As a result, technological retardation is inevitable.
5. What kind of science to develop is also planned. Some sciences are mistakenly declared unnecessary or wrong; examples are cybernetics and genetics in the Soviet Union. This results in irreparable harm to the economy.
6. It is impossible to distribute wealth absolutely fairly. Those who distribute always take more for themselves. Consequently, people are dissatisfied. Bureaucratic corruption flourishes, leading to enormous economic losses.
7. Because wealth is rigidly distributed by the government, there is no reason to work hard. Labor becomes inefficient, with low productivity.
8. The necessity of maintaining huge planning institutions consumes a large amount of economic means. Ineffective planning reduces the planners to the state of parasites, merely wasting resources.
9. The necessity of having a large number of controllers for implementing economic plans and monitoring their accomplishment likewise turns such controllers into parasites.
10. Suppression of economic freedom, ascribed to the needs of the state but usually needed for protecting the privileges of the country's rulers, kills people's motivation to work well.
11. Because of the total frustration of the suppressed citizens, it becomes necessary to maintain excessively large police and regulating services supervising the society.
12. To maintain order and punish those who object, hesitate, or could even be potentially dangerous, the government organizes massive suppression of people, which results in large economic losses.
13. To realize unreasonable plans, the government practices mass arrests of innocent people in order to create slave labor. But slavery is not economically efficient.
14. To distract the population's discontent from economic failures, the necessity arises of inventing enemies, which requires a large army consuming a substantial amount of the country's wealth.
15. In order to persuade people that everything is all right, it is necessary to organize propaganda through the mass media, which results in ineffective spending of resources.
More details on societies with command economies can be found in the literature [46]. Because of the many factors making the economy of an overregulated society ineffective, it does not seem plausible that unlimited enforcement of regulations could result in an ideally ordered society. It appears that the model of a regulated society in the previous section does not take into account some factors preventing the indefinite increase of order through overregulation.

Regulation Cost
The problem with regulation is that it requires the use of society resources. Regulations are costly. The stricter the regulations, the more resources are required. All regulations imposed on a society are produced by a part of the same society, that is, by the society itself: some members of the society act on other members. The regulation cost is the cost the society has to pay in order to introduce the desired regulations. With the above in mind, the regulation cost can be modeled by the expression The coefficients A_ij describe the regulation efficiency. By order of magnitude, they are proportional to the interactions between the agents J_ij, since regulations need to overcome mutual interactions in order to impose restrictions. Note that J_ij and A_ij are different types of interactions: one is a direct interaction, not related to the process of ordering, while the other is an interaction inducing the ordering in the presence of an additional force. Different types of interactions, in general, are different. Thus the total Hamiltonian becomes We again assume that the force acting on the society members is uniform in the sense of equality (3.48). Now the order parameter (3.51) differs from the expression This expression (3.62) plays the role of the total order parameter, contrary to s, which in the present case is a partial order parameter.
In the typical-agent approach, Hamiltonian (3.61) reads as The reduced free energy takes the form The parameters α and h are assumed to be positive. The order parameter s = ⟨σ_j⟩ can be obtained from the condition leading to the equation However, when the Hamiltonian contains a term quadratic in the ordering force, the quantity s does not define the total order, which is given by form (3.62), yielding the total order parameter The susceptibility is defined by the derivative resulting in the expression which differs from the derivative of the partial order parameter.
The behavior of the social system at small and large regulating force shows how the order parameters and free energy vary. The analysis of the asymptotic behavior demonstrates the relation between the system stability and its order. This relation can be rather complicated, depending on the system parameters.
At small ordering force and weak noise, when 0 < T < 1, the order parameter s behaves as where s_0 is the solution to the equation The total order parameter M at small ordering force and weak noise behaves as which shows that the order parameters can either increase or decrease with rising h, depending on the system parameters.
The free energy at small ordering force and weak noise is For T > 1 and weak h, the order parameters are The free energy reads as When the regulating force is strong, then, at any temperature, we have the order parameters and the free energy In the absence of noise, that is, when T = 0, the free energy equals the energy. Considering the reduced energy at zero T, we obtain As is seen, the energy (3.83), with regulations switched on, first decreases, making the society more stable. Then it reaches a minimum, after which it increases, making the society less and less stable. Thus, in the absence of noise, the optimal regulation, making the society most stable, corresponds to h = 1/α, where the energy (3.85) is minimal. This implies that some amount of regulation is useful, stabilizing the society. However, the society should not be overregulated.
Overregulation can make the society unstable.
In the presence of noise, the situation is more complicated, depending on the noise strength and the value of the regulating force. The peculiarity of the system behavior is illustrated in Figs. 1 and 2. Figure 1 shows the behavior of the free energy F as a function of the regulation force h for different temperatures (noise intensities), with T < 1 in Fig. 1a and T > 1 in Fig. 1b. Figure 2 shows the order parameter s as a function of the regulation force h for different temperatures, low (T < 1) and high (T > 1). As we see, at low temperature the free energy first decreases and then increases, similarly to the behavior of the energy (3.82), while the order parameter first increases but later decreases. That is, there exists an optimal regulation strength at which the society is the most stable and at the same time the most ordered. In the presence of strong noise, the free energy decreases with h, while the order parameter first increases but then decreases. So the regulating force stabilizes the society, but improves the order only up to a limited value of h; at a very strong force the order diminishes.
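The zero-noise conclusion above can be checked directly. The sketch below assumes the quoted T = 0 form of the reduced energy, E(h) = −1/2 − h + (α/2)h², whose minimum lies at the optimal regulation strength h = 1/α; the value of the regulation-cost parameter α is purely illustrative:

```python
import numpy as np

alpha = 0.5  # regulation-cost parameter (illustrative value)

def energy_T0(h, alpha):
    """Reduced energy at zero noise (T = 0) in the ordered state s = 1:
    E(h) = -1/2 - h + (alpha/2) h^2  (assumed form of Eq. (3.83))."""
    return -0.5 - h + 0.5 * alpha * h ** 2

h = np.linspace(0, 6, 100001)
E = energy_T0(h, alpha)
h_opt = h[np.argmin(E)]
# The minimum lies at h = 1/alpha: moderate regulation lowers the energy
# (stabilizes the society), while h > 1/alpha raises it back (overregulation).
```

On this grid, `h_opt` coincides with 1/α to within the grid spacing, reproducing the statement that some regulation stabilizes the society while overregulation destabilizes it.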

Fluctuating Groups
It is a usual situation that society members separate into several groups with different properties, and the number of members in a group is not constant but varies with time.
For example, these could be groups of opponents of the government, workers on strike, opposition organizations, and the like. Different groups of agents can be described by different order parameters. The groups can be localized in some spatial parts of the society or may move through the whole society space. The groups are not permanent in their agent numbers and do not necessarily exist forever; they can arise and vanish. In that sense, the groups are fluctuating: they can appear, change, and then disappear. Let the number of agents in a fluctuating group be N_f and the characteristic time during which they do not essentially change be t_f. Fluctuating groups are mesoscopic in at least one of the following senses.

Mesoscopic in size:
1 ≪ N_f ≪ N: the number N_f is much larger than one, so that the group order parameter can be defined, and N_f is much smaller than the total number N of agents in the society, so that the set can be classified as a group inside that society.

Mesoscopic in time:
t_int ≪ t_f ≪ t_exp, where t_int is the characteristic interaction time between the agents and t_exp is the observation (experiment) time during which the society is studied. The time t_f has to be much larger than the interaction time in order that the groups as such could be formed, and t_f has to be much smaller than the observation time for the groups to be classified as fluctuating. The general method of describing this kind of society, containing fluctuating groups, is presented in this section. Let us consider a snapshot of the society, where the spatial location of a j-th agent is denoted by a vector a_j. The overall society containing all its members is given by the collection G = {a_j : j = 1, 2, . . . , N} . (3.89) The society consists of several groups, each characterized by its specific feature. The features are enumerated by the index f = 1, 2, . . .. A group with an f-th feature is The union of all groups forms the whole society The spatial locations of the groups are represented by the manifold indicator functions The collection of all indicator functions, showing the society configuration, is denoted as The indicator functions possess the properties the first of which means that each agent pertains to one of the groups, while the second gives the number of agents in an f-th group.
In a realistic society, the agents can change their locations, so that the location vector a_j = a_j(t) depends on time t. Hence the society configuration is also a function of time. Since t_exp ≫ t_int, the observable quantities describing the society correspond to the double average, over the system variables and over time, (3.96) The motion of the groups inside the society is usually so complicated that it is neither possible nor reasonable to follow the detailed movements of all agents, and the agent locations can be treated as random. Then it is possible to interchange the averaging over time with the averaging over the random society configurations by the rule (3.97) Then we need to realize the functional integration over the manifold indicator functions. We will not describe the mathematical details of the integration, which can be found in the review articles [47,48], but will present the results. The probability distribution ρ(σ, ξ) of a heterophase society, depending on the configuration ξ, can be derived following Sec. 2.2. The probability is to be normalized, Σ_σ ∫ ρ(σ, ξ) Dξ = 1. The information functional takes the form Minimizing the information functional and assuming the uniformity of the trial distribution ρ_0(σ, ξ) = const yields the probability The effective Hamiltonian takes the form in which the probability that a j-th agent pertains to an f-th group is In other words, this is the fraction of the society agents belonging to the f-th group. Of course, the normalization conditions are valid, The observable quantities are given by the averages with the effective probability distributions (3.113) and the partition functions The symbol of direct summation in Eq. (3.106) is used instead of the simple sum in order to stress that the summands are defined on different configuration sets, with the total society configuration set being the tensor product This is the general approach for describing heterophase societies with fluctuating groups.
The approach reduces the consideration of quasi-equilibrium systems with mesoscopic group fluctuations to the description of effective equilibrium systems with a renormalized effective Hamiltonian. The mathematics of accomplishing functional integration over manifold indicator functions is described in reviews [21,47,48]. More mathematical details are presented in Refs. [49][50][51]. In the following section, we give an example of applying this approach.
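The bookkeeping of manifold indicator functions can be illustrated numerically. In this hypothetical random snapshot, each agent carries a group label; the indicators ξ_f(a_j) ∈ {0, 1} then satisfy the two normalization properties stated above (each agent pertains to one group; the sum over agents gives the group size N_f), and the group fractions w_f sum to one:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_groups = 1000, 3

# Hypothetical snapshot: each agent j belongs to exactly one group f.
labels = rng.integers(0, n_groups, size=N)

# Manifold indicator functions: xi[f, j] = 1 if agent j is in group f, else 0.
xi = np.array([(labels == f).astype(int) for f in range(n_groups)])

assert np.all(xi.sum(axis=0) == 1)   # each agent pertains to one group
N_f = xi.sum(axis=1)                 # number of agents in each group
w_f = N_f / N                        # group fractions (probabilities)
assert abs(w_f.sum() - 1.0) < 1e-12  # fractions are normalized
```

In the heterophase approach, it is these fractions w_f that enter the renormalized effective Hamiltonian after the averaging over group configurations.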

Self-Organized Disorder
Let us consider the yes-no model of Sec. 3.5, modified so as to take account of fluctuating groups. For simplicity, we keep in mind two groups, one called ordered and the other disordered. The ordered group consists of agents agreeing with each other and is characterized by a nonzero order parameter, while the disordered group is described by a zero order parameter, to be defined below.
Each j-th agent from an f-th group is characterized by the features σ_{fj}. In the yes-no model, these features are described by the variable σ_{fj} = ±1. The total feature set for the whole system is the configuration set (3.116).
In addition to the interaction J ij > 0, typical for self-organization in the yes-no model, we need to include other interactions that would describe some disagreements between the agents. We denote the disordering interaction by U ij .
In the snapshot picture, the Hamiltonian with ordering and disordering interactions reads as Here the manifold indicator functions show the belonging of the agents to the corresponding groups.
After averaging over the group configurations, we obtain the effective Hamiltonian The order parameter for the f-th group is which translates into the equation Let the first group be ordered, so that while the second group is disordered in the sense that In the case of two groups, it is convenient to denote the fractions of agents in the ordered and disordered groups as Then the necessary condition for the free-energy minimum is This results in the equation in which the notation is used: Taking into account that the disordered group is described by the zero order parameter s_2 = 0, we obtain the fraction of agents in the ordered group (3.128). Due to the probability definition 0 ≤ w ≤ 1, and because 0 ≤ s_1 ≤ 1, expression (3.128) exists only for sufficiently strong disordering interactions, such that u ≥ 1.
Note that the positivity of the second derivative of the free energy leads to the inequality which yields the condition u > 1/2. The society reduced energy is For simplicity, let us study the case of no noise, so that T = 0. Then s_1 = 1 and the energy becomes while the fraction of ordered agents is Thus the society energy is Now we have to decide which society is more stable: the one containing fluctuating disordered groups or the one without them. The more stable society is that whose free energy is lower. In the case of very small noise, we need to compare the corresponding expected energies.
If no fluctuating groups were allowed, the ordered fraction would be exactly one (w ≡ 1). The energy of a completely ordered society is where w ≡ 1. If the whole society contained only disordered agents (w ≡ 0), the society energy would be The society with disorder group fluctuations is more stable when its energy is lower. Comparing the above expressions, we see that Thus, for competing interactions satisfying the condition u > 1, the energy of the society with fluctuating disorder is lower than that of the perfectly ordered society, which, in turn, is lower than that of the disordered society. That is, a completely ordered society is more stable than a disordered society, but the most stable is a society with an admixture of fluctuating disorder. Thus disorder fluctuations make the society more stable; in that sense the disorder is self-organized. The total order parameter is defined as the average of the expression When there is no noise, that is, at zero temperature, we get Concluding: under sufficiently strong competition between the society agents, social disorder groups appear spontaneously. When the disorder groups appear, they diminish the total society order. However, the existence of the disorder groups makes the society more stable. This example shows that it is not always useful to try to realize complete order in a society, since the presence of some disorder can make the society more stable. Sometimes, in order to be more stable, societies generate self-organized disorder.
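The zero-noise comparison above can be verified with a short computation. The sketch assumes a mean-field heterophase energy of the form E(w) = −w²/2 + (u/2)[w² + (1−w)²], an assumed reconstruction that reproduces the conditions u ≥ 1 and u > 1/2 quoted above; here w is the ordered fraction and u the reduced disordering interaction:

```python
import numpy as np

u = 1.5  # reduced disordering interaction (must exceed 1)

def energy(w, u):
    """Reduced T = 0 energy of a society with ordered fraction w (assumed
    mean-field heterophase form): the ordered group (s1 = 1) contributes
    -w^2/2, the disordering interaction (u/2)[w^2 + (1 - w)^2]."""
    return -0.5 * w**2 + 0.5 * u * (w**2 + (1 - w)**2)

w = np.linspace(0, 1, 100001)
E = energy(w, u)
w_star = w[np.argmin(E)]  # optimal ordered fraction, w* = u/(2u - 1)

E_mixed, E_ordered, E_disordered = E.min(), energy(1.0, u), energy(0.0, u)
# E_mixed < E_ordered < E_disordered: an admixture of fluctuating
# disorder lowers the energy below that of the perfectly ordered society.
```

Minimizing this form gives w* = u/(2u − 1), which lies in [0, 1] precisely when u ≥ 1, in line with the existence condition for expression (3.128).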

Coexistence of Populations
Very often social systems consist of several groups of rather different people, such that the groups are more or less stable and on average do not essentially change for a long time, contrary to the fluctuating groups considered in the previous section. Each of the groups can be characterized by a different typical agent. There exist numerous examples of such societies. Thus a society forming a country very often includes groups of different nationalities or religions.
In a society composing a financial market, there are the groups of fundamentalists and chartists. Fundamentalists base their decisions on market fundamentals, such as interest rates, growth or decline of the economy, a company's performance, etc. Fundamentalists expect asset prices to move towards their fundamental values; hence they buy or sell assets that are assumed to be undervalued or, respectively, overvalued. Chartists, or technical analysts, look for patterns and trends in past market prices and base their decisions on extrapolation of these patterns. There exist as well groups of contrarians, who buy or sell contrary to the trend followed by the majority. Different groups of people, or other living beings, having differing features are called populations. Let a society, inhabiting the volume V, contain several different populations enumerated by the index n = 1, 2, . . .. An n-th population contains N_n members. In what follows, we consider a general approach describing whether the populations can or cannot coexist with each other. The populations can be of any origin. For concreteness, we can keep in mind different population groups living in a country.
The populations can either peacefully live together in the whole country, which can be called a mixed society, or can possess the tendency of separating from each other, thus having no intention of joint coexistence. Our aim is to understand how to quantify the conditions under which different populations prefer to live together and under which they wish to separate.
According to the general law, the more stable society is that whose free energy is lower. That is, a mixed society, living in the same country, is more stable than separated populations, existing apart from each other, if the free energy F_mix of the mixed society is lower than the free energy F_sep of the separated society, (3.141) The free energies can be represented as Therefore, denoting the entropy of mixing we have the condition of stability of the united country: We need to write down the energies of the mixed and separated populations. For this purpose, let us denote the density of an n-th population by ρ_n(r) and the interaction between the members of an m-th and an n-th population by Φ_mn(r). Then the interaction energy for the mixed populations can be represented as Assuming a uniform distribution of the people across the country implies the uniform densities Thus the interaction energy of a country with mixed populations reads as where the quantity describes the average interaction strength between the members of an m-th and an n-th population. Now let us consider the case of separated populations, when each population lives in its separate location characterized by the volume V_n. Then the energy of a separated country is the sum Keeping again in mind that each population inside its part of the country is uniformly distributed gives the uniform densities which reduce the energy of a separated country to Any two different populations, though separated, exert pressure on each other through their common boundary. If the pressure of one of them is larger than that of the other, there is no equilibrium between the populations, and nonequilibrium movements, such as invasions and wars, can arise. The equilibrium coexistence of two separated populations implies the equality of their pressures: By its meaning, this equality can be called the no-war condition.
For not too strong noise, such that we find the no-war condition in the form (3.155). This shows that two neighboring populations have no war, being in equilibrium with each other, only when the signs of the effective interactions of the members inside each population are the same and the interactions satisfy the above conditions. An m-th and an n-th population can be in equilibrium only when their internal interactions Φ_mm and Φ_nn are both either positive or negative. If for one population Φ_mm > 0, while for the members of the other population Φ_nn < 0, then there can be no equilibrium between such neighboring populations.
Taking account of the no-war condition (3.155) makes it possible to rewrite the energy of the separated country in the form Then the stability condition (3.144) for the mixed society yields the inequality Using the identity where ρ = N/V is the average density of the total population, the sufficient condition of stability for the mixed country becomes The entropy of mixing can be represented as Thus we obtain the condition of stability for the country with mixed populations, as compared to the country where the populations are separated,
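The no-war condition can be stated compactly. With the uniform-density energy E_n = Φ_nn N_n²/(2V_n) of an n-th population, its pressure is p_n = −∂E_n/∂V_n = Φ_nn ρ_n²/2 (an assumed reconstruction of the forms above). Equality of pressures then fixes the admissible density ratio of two neighboring populations, and it has no solution when their internal interactions have opposite signs:

```python
import math

def no_war_density_ratio(phi_mm, phi_nn):
    """No-war (pressure-equality) condition between two separated populations:
    phi_mm * rho_m**2 / 2 == phi_nn * rho_n**2 / 2, with the pressure form
    assumed from the uniform-density energy E_n = Phi_nn N_n^2 / (2 V_n).
    Returns the equilibrium ratio rho_n / rho_m, or None when no equilibrium
    exists (internal interactions of opposite signs)."""
    if phi_mm * phi_nn <= 0:
        return None  # opposite-sign internal interactions: no equilibrium
    return math.sqrt(phi_mm / phi_nn)

ratio = no_war_density_ratio(2.0, 0.5)        # both positive: equilibrium exists
impossible = no_war_density_ratio(2.0, -0.5)  # mixed signs: no equilibrium
```

This makes explicit the statement above: equilibrium between neighbors requires Φ_mm and Φ_nn to be both positive or both negative.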

Forced Coexistence
From history we know that empires are usually kept united not merely by the economic advantage of the countries composing them but also by force. Why is this possible, and what happens when the required force becomes too strong? Is it possible to keep different populations together inside one empire by increasing force? To answer these questions, it is necessary to consider the case of imposed regulations, including the regulation cost. Let the regulation force acting on a member of an n-th population be f_n(r). The energy of a society with mixed populations, including the force applied for keeping different populations together, can be written as where the last term characterizes the cost of supporting the regulation force, with A_mn(r) being the strength of the interaction between the members of the society induced by the applied force. It is possible to assume that the population densities are uniform and the force is constant, so that Then the society energy reads as When the society is disintegrated into several independent countries, there is no need to apply force; hence the energy of a society with separated populations is given by (3.156). Using the identity and following the same way as in the previous section, we obtain the sufficient condition for the society with mixed populations For short, this condition can be named the condition of empire stability. If it does not hold, the empire disintegrates into separate countries. Condition (3.166) shows that switching on the regulating force first increases the right-hand side of the inequality, thus making the mixed society more stable. However, too strong a force makes the left-hand side of the inequality larger, thus leading to the instability of the empire and its disintegration. Hence, by enforcing reasonable regulations, it is possible to maintain the coexistence of populations even when, without such an enforcement, the country would disintegrate. However, the regulating force should not be too strong.
This explains why, for some time, an empire can exist as a united family of different populations. But when the enforcement becomes too costly, it turns unbearable, and disintegration becomes inevitable. Empires do not exist forever.
An empire, with forced coexistence of different populations, is practically always less stable than several independent countries, with their own populations and weaker regulation forces. However, unions of different populations can also exist when they are kept together not by force but by their collaborative interactions. A prominent example is Switzerland, where the German, French, and Italian cantons peacefully coexist, not being forced, but due to the advantage of their mutual interactions.

Collective Decisions in a Society
The models considered above allow us to describe general structures and states of social systems whose members exhibit specific behavior. Thus in the yes-no model, the members are assumed to take decisions, either "yes" or "no", with regard to some problem, say voting in elections or buying something. As is evident, the process underlying any action is the process of taking decisions. It is possible to say that practically all properties of a social system are caused by the decisions taken by its members. This is why it is so important to have a general understanding of how decisions are made. In the present chapter, we consider some models describing how the members of a society take decisions. Collective decisions are based on decisions made by individuals interacting with each other. Therefore, first of all it is necessary to understand how decisions are taken by individuals and then to model how they interact with each other in reaching a collective decision. Moreover, there are deep parallels between the processes of making decisions by individuals and by groups, since even individual decision making is a kind of collective decision making, accomplished by the neurons of a brain. Since the notions of decision-making theory are less known to a physically oriented audience, we give below an overview of the main literature and describe the basic points of the theory.

General Overview
Nowadays, the dominant theory describing the individual behavior of decision makers is expected utility theory. This theory was introduced by Bernoulli [52] when investigating the so-called St. Petersburg paradox. Von Neumann and Morgenstern [53] axiomatized the theory, and Savage [54] integrated into it the notion of subjective probability. The power of the theory was demonstrated by Arrow [55], Pratt [56], and Rothschild and Stiglitz [57,58] in studies of risk aversion. The flexibility of the theory for characterizing the attitudes of decision makers toward risk was illustrated by Friedman and Savage [59] and Markowitz [60]. Expected utility theory has provided the mathematical basis for several fields of economics, finance, and management, including the theory of games, the theory of investment, and the theory of search [61][62][63][64][65][66][67][68][69][70].
Despite many successful applications of expected utility theory, quite a number of researchers have discovered a large body of evidence that decision makers, both human and animal, often do not obey the prescriptions of the theory and depart from it in a rather systematic way [71]. There appeared numerous publications, beginning with Allais [72], Edwards [73,74], and Ellsberg [75], which experimentally confirmed systematic deviations from the prescriptions of expected utility theory, leading to a number of paradoxes.
In order to avoid the paradoxes, there have been many attempts to change expected utility theory, resulting in what have been named non-expected utility theories. There are a number of such theories, among which we mention just a few of the most known approaches: prospect theory [73,76,85], weighted-utility theory [86][87][88], regret theory [89], optimism-pessimism theory [90], dual-utility theory [91], ordinal-independence theory [92], and quadratic-probability theory [93]. More detailed information on this topic can be found in the recent reviews by Camerer et al. [94] and Machina [95].
Despite the numerous attempts at modifying expected utility theory, none of the suggested modifications can explain all the paradoxes, as has been shown by Safra and Segal [96]. The best that could be achieved is a fit interpreting just one or a couple of paradoxes, with the other paradoxes remaining unexplained. Moreover, spoiling the structure of expected utility theory results in the appearance of several inconsistencies. Accomplishing a detailed analysis, Al-Najjar and Weinstein [97,98] concluded that any variation of expected utility theory "ends up creating more paradoxes and inconsistencies than it resolves".
An original interpretation was advanced by Bohr [99][100][101], who suggested that psychological processes could be described by resorting to quantum notions, such as interference and complementarity. Von Neumann [102] mentioned that the theory of quantum measurements could be interpreted as decision theory. These ideas were developed in quantum decision theory [103,104], on the basis of which the classical decision-making paradoxes could be explained. It has also been shown [105,106] that quantum decision theory can be reformulated in classical language, employing no quantum formulas.
When decision makers interact with each other, the process of decision making becomes collective [107][108][109][110][111][112]. Then each individual takes decisions based not solely on personal deliberations, but also, to some extent, on imitating the actions of other members of the society. Sometimes this imitation grows to the level of a herding effect. In the present chapter, we describe the main steps of how the process of decision making develops, from individual decisions to collective decisions taken by a network of society members.

Utility Function
The primary elements in a problem of choice are events, also called outcomes, consequences, or payoffs, denoted as x. One considers a set of outcomes, also called a set of payoffs, a consumer set, or a field of events X = {x_i : i = 1, 2, . . . , N}. Generally, the payoffs x_i can be either finite or infinite, and the payoff set itself can be finite or infinite. The mathematically correct definition of infinity is through the limit of a sequence. Payoffs or outcomes have to be measured in a common system of units, which is accomplished through a utility function u(x), also called an elementary utility function, pleasure function, satisfaction function, or profit function, u(x) : X → R. The utility function has to satisfy the following properties: (i) It has to be nondecreasing, so that If (4.2) is a strict inequality, the function is termed strictly increasing.
(ii) It is often (although not always) taken to be concave, so that where It is called strictly concave if (4.3) is a strict inequality.
A twice-differentiable function is nondecreasing if u′(x) ≥ 0, and it is concave when u″(x) ≤ 0. The derivative u′(x) defines the marginal utility function. According to the above properties, the marginal utility does not increase.
An important property of a utility function is its risk aversion. The degree of absolute risk aversion is measured [56] by the quantity r(x) = −u″(x)/u′(x). The coefficient of relative risk aversion [55,56] is defined as R(x) = −x u″(x)/u′(x). For a concave utility function, the degree of risk aversion is non-negative, hence the utility function is risk averse, r(x) ≥ 0. This means that with increasing x, the growth rate of the utility function u(x) does not increase.
In practice, one often employs a linear utility function

u(x) = kx (k > 0) .

For this function, u'(x) = k, u''(x) = 0, hence the degree of risk aversion is zero, r(x) = 0. Another commonly used form is a power-law utility function

u(x) = k x^α (k > 0, 0 < α < 1) ,

assuming x ≥ 0. For this case, u'(x) = αk x^(α−1) and u''(x) = α(α − 1)k x^(α−2). The degree of risk aversion diminishes with increasing x as

r(x) = (1 − α)/x .

One more example is the logarithmic utility function

u(x) = k ln(1 + x) ,

again keeping in mind x ≥ 0. Then u'(x) = k/(1 + x) and u''(x) = −k/(1 + x)². Hence the degree of risk aversion diminishes with the increasing payoff x,

r(x) = 1/(1 + x) .

Sometimes one uses an exponential utility function

u(x) = c(1 − e^(−kx)) (c > 0, k > 0) ,

for which u'(x) = ck e^(−kx), u''(x) = −ck² e^(−kx), and the degree of risk aversion is constant, r(x) = k.
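The four example utilities above can be checked numerically. The sketch below is an illustration, not part of the original text; the parameter values are chosen arbitrarily. It estimates the degree of absolute risk aversion r(x) = −u''(x)/u'(x) by finite differences and compares it with the analytic values stated above.

```python
import math

def risk_aversion(u, x, h=1e-4):
    """Numerical degree of absolute risk aversion r(x) = -u''(x)/u'(x)."""
    u1 = (u(x + h) - u(x - h)) / (2 * h)           # central first derivative
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / h**2   # central second derivative
    return -u2 / u1

# Example utility functions from the text; k, c, alpha are illustrative choices
examples = {
    "linear u = 2x":           lambda x: 2.0 * x,                  # r(x) = 0
    "power u = x^0.5":         lambda x: x ** 0.5,                 # r(x) = (1 - alpha)/x
    "logarithmic u = ln(1+x)": lambda x: math.log(1.0 + x),        # r(x) = 1/(1 + x)
    "exponential, k = 0.5":    lambda x: 1.0 - math.exp(-0.5 * x), # r(x) = k
}

for name, u in examples.items():
    print(name, "-> r(2) =", round(risk_aversion(u, 2.0), 4))
```

At x = 2, the analytic values are 0, 0.25, 1/3, and 0.5, respectively, which the finite-difference estimates reproduce to four decimals.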
In the above examples, the utility function is twice differentiable, so that u'(x) and u''(x) exist; it is nondecreasing and concave, hence risk averse; it is non-negative for non-negative x; and it is normalized to zero, u(0) = 0, implying that the utility of nothing is zero.
Some of the utility functions exemplified above are defined only for positive payoffs x > 0. In the case of losses, payoffs are negative. Then one defines different utility functions for gains and losses, say, u g (x) for gains and u l (x) for losses. Of course, these should coincide at zero, so that u g (0) = u l (0).
It is also useful to mention that there are two types of utility, cardinal and ordinal. Cardinal utility can be precisely measured, and the magnitude of the measurement is meaningful, similarly to how distance is measured in meters, time in hours, or weight in kilograms. For ordinal utility, the precise magnitude is not important; only the ordering of different utilities is meaningful.

Expected Utility
The notion of expected utility was introduced by Bernoulli [52] and axiomatic utility theory was formulated by von Neumann and Morgenstern [53]. The definitions below follow the classical exposition of von Neumann and Morgenstern [53].
One defines a probability measure over the set of payoffs,

{p(x_i) ∈ [0, 1] : i = 1, 2, . . . , N} , (4.10)

with the probabilities p(x_i) normalized to one,

Σ_{i=1}^{N} p(x_i) = 1 .

The probabilities can be objective, prescribed by a rule [53], or they can be subjective probabilities, evaluated by a decision maker [54].
The probability distribution over a set of payoffs is termed a lottery,

L = {x_i, p(x_i) : i = 1, 2, . . . , N} .

A compound lottery is a lottery whose outcomes are other lotteries. For two lotteries L_1 and L_2, the compound lottery is the linear combination

L = αL_1 + (1 − α)L_2 ,

where α ∈ [0, 1]. The lottery mean is the lottery expected value

M(L) = Σ_{i=1}^{N} x_i p(x_i) . (4.14)

The lottery volatility, lottery spread, or lottery dispersion is the variance

var(L) = Σ_{i=1}^{N} [x_i − M(L)]² p(x_i) .

The lottery volatility or lottery dispersion is a measure of the lottery uncertainty. The expected utility of a lottery is the functional

U(L) = Σ_{i=1}^{N} u(x_i) p(x_i) .

The expected utility is proportional to the lottery mean in the case of a linear utility function.
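As a concrete illustration (the payoff and probability values below are invented for the example), the lottery mean, variance, and the compound of two lotteries can be computed directly from the definitions above:

```python
payoffs = [0.0, 50.0, 100.0]       # payoff set x_i
p1 = [0.2, 0.5, 0.3]               # lottery L1: probabilities, normalized to one
p2 = [0.6, 0.3, 0.1]               # lottery L2

def lottery_mean(payoffs, probs):
    """Lottery expected value: sum_i x_i p(x_i)."""
    return sum(x * p for x, p in zip(payoffs, probs))

def lottery_variance(payoffs, probs):
    """Lottery dispersion: sum_i (x_i - mean)^2 p(x_i)."""
    m = lottery_mean(payoffs, probs)
    return sum((x - m) ** 2 * p for x, p in zip(payoffs, probs))

# Compound lottery alpha*L1 + (1 - alpha)*L2: a convex combination of distributions
alpha = 0.4
compound = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

print(round(lottery_mean(payoffs, p1), 6))      # 55.0
print(round(lottery_variance(payoffs, p1), 6))  # 1225.0
print(round(sum(compound), 6))                  # 1.0 -- still normalized
```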
Lotteries are ordered through their expected utilities, which implies the following definitions. A lottery L_1 is said to be not preferred to L_2, written L_1 ≤ L_2, if and only if U(L_1) ≤ U(L_2); the lotteries are equivalent, L_1 = L_2, if and only if U(L_1) = U(L_2). Expected utility satisfies the following properties.
(i) Completeness. For any two lotteries L_1 and L_2, one of the conditions is valid:

L_1 ≤ L_2 or L_1 ≥ L_2 . (4.20)

(ii) Transitivity. For any three lotteries, such that L_1 ≤ L_2 and L_2 ≤ L_3, it follows that L_1 ≤ L_3.
(iii) Continuity. For any three lotteries, ordered so that L_1 ≤ L_2 ≤ L_3, there exists α ∈ [0, 1] for which

αL_1 + (1 − α)L_3 = L_2 .

(iv) Independence. For any L_1 ≤ L_2, arbitrary L_3, and any α ∈ [0, 1],

αL_1 + (1 − α)L_3 ≤ αL_2 + (1 − α)L_3 .

These properties follow directly from the properties of the utility function described above.
The standard decision-making process proceeds through the following steps. Defining the utility function u(x), one calculates the expected lottery utilities U(L_n). Comparing the values U(L_n), one selects the largest among them. A lottery L* is named optimal when it is preferred to all others from the given choice of lotteries. A lottery is optimal if and only if its expected utility is the largest:

U(L*) = max_n U(L_n) . (4.24)

The decision maker is assumed to choose an optimal lottery, maximizing the related expected utility.
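A minimal sketch of this selection procedure, with an invented pair of lotteries and a square-root utility chosen for illustration: a risk-averse agent may prefer a certain payoff to a gamble with a higher mean.

```python
import math

u = lambda x: math.sqrt(x)   # concave, hence risk-averse, utility

# Two hypothetical lotteries: (payoffs, probabilities)
lotteries = {
    "safe":  ([36.0], [1.0]),              # certain payoff of 36
    "risky": ([0.0, 100.0], [0.5, 0.5]),   # mean 50, but widely spread
}

def expected_utility(u, payoffs, probs):
    """U(L) = sum_i u(x_i) p(x_i)."""
    return sum(u(x) * p for x, p in zip(payoffs, probs))

best = max(lotteries, key=lambda name: expected_utility(u, *lotteries[name]))
print(best)   # "safe": U = 6.0 beats the gamble's U = 5.0 despite the lower mean
```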

Time Preference
Sometimes people need to compare present goods that are available for use at the present time with future goods that are defined as present expectations of goods becoming available at some date in the future. Time preference is the insight that people prefer present goods to future goods [84,113].
Mathematical description of the time preference effect can be done as follows. Let us consider at time t = 0 a lottery

L = {x_i, p(x_i) : i = 1, 2, . . . , N} .

With a utility function u(x) at time t = 0, the expected utility of the lottery is

U(L) = Σ_{i=1}^{N} u(x_i) p(x_i) .

The lottery, expected at time t > 0, has the form

L(t) = {x_i(t), p(x_i(t)) : i = 1, 2, . . . , N} .

Denoting the utility function at time t > 0 as u(x(t), t), we have the expected utility of L(t) as

U(L(t)) = Σ_{i=1}^{N} u(x_i(t), t) p(x_i(t)) .

According to the meaning of time preference, the same goods at a future time are valued lower than at the present time, just because any goods can be used during the interval of time [0, t].
In the case of money, its value increases with time, since it can bring additional profit through an interest rate; hence the utility of a fixed amount of money decreases with time. This can be formalized as the inequality

u(x_i, t) ≤ u(x_i, 0) ≡ u(x_i) .

It is possible to introduce a discount function D(x, t) by the relation

u(x_i, t) = D(x_i, t) u(x_i) ,

with the evident condition D(x_i, 0) = 1. In this way, if the payoffs at the present time t = 0 and at a future time t > 0 are the same, then the present lottery is preferable over the future one,

U(L(t)) ≤ U(L) .

In particular, if the discount function is uniform with respect to the payoffs, then

U(L(t)) = U(L) D(t) , (4.35)

where D(0) = 1.
In decision making, one uses the following discount functions.
Power-law discount function:

D(t) = (1 + r)^(−t) , r > 0 . (4.36)

Exponential discount function:

D(t) = e^(−γt) , γ > 0 . (4.37)

Note that the power-law discount function (4.36) is equivalent to the exponential form (4.37), since they are related by the reparametrization 1 + r = e^γ, γ = ln(1 + r).
Hyperbolic discount function:

D(t) = (1 + t/τ)^(−γ) ,

with positive parameters γ and τ. A detailed review of the effect of time preference is given in [84].
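These discount functions can be compared numerically. In the sketch below (illustrative only; the hyperbolic form D(t) = (1 + t/τ)^(−γ) is an assumed reading of the two-parameter family with γ and τ), the power-law and exponential forms coincide under the reparametrization 1 + r = e^γ, while the hyperbolic form decays more slowly at long times.

```python
import math

def power_discount(t, r):
    """Power-law discounting: D(t) = (1 + r)^(-t)."""
    return (1.0 + r) ** (-t)

def exp_discount(t, gamma):
    """Exponential discounting: D(t) = exp(-gamma t)."""
    return math.exp(-gamma * t)

def hyperbolic_discount(t, gamma, tau):
    """Assumed hyperbolic form: D(t) = (1 + t/tau)^(-gamma)."""
    return (1.0 + t / tau) ** (-gamma)

r = 0.05
gamma = math.log(1.0 + r)        # reparametrization 1 + r = e^gamma
for t in (0.0, 1.0, 10.0, 50.0):
    # the two representations are identical up to rounding
    assert abs(power_discount(t, r) - exp_discount(t, gamma)) < 1e-12
    print(t, round(exp_discount(t, gamma), 4),
          round(hyperbolic_discount(t, gamma, 1.0), 4))
```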

Stochastic Utility
There exists a number of factors that influence decision making. These factors are random, or stochastic, because of which the related approach to decision making is named stochastic. Such factors include, for instance, the external conditions under which the decision is made: the weather, the situation in the country, relations with other people, the opinions of others, and so on. Stochastic decision theory assumes that all these interrelated factors can be characterized by some variables ξ, called states of nature. The total collection of the states of nature forms the nature set {ξ}. The variables ξ are random, stochastic. It is supposed that a probability measure μ(ξ) is given on the nature set {ξ}. The utility function u(x, ξ) becomes a random variable, and the probability of payoffs p(x, ξ) is also a random variable.
In that way, a lottery becomes a stochastic lottery

L(ξ) = {x_i, p(x_i, ξ) : i = 1, 2, . . . , N} .

Respectively, we come to a stochastic utility

U(L, ξ) = Σ_{i=1}^{N} u(x_i, ξ) p(x_i, ξ) .

Since the states of nature are random, one needs to average over these states, thus coming to the expected utility

U(L) ≡ ∫ U(L, ξ) dμ(ξ) .

One then chooses the lottery with the largest expected utility, U(L*). This approach, however, confronts several serious difficulties.
(i) It is not clear how the random variables should be incorporated into lotteries.
(ii) Calculations become rather complicated.
(iii) The nature set is not fixed, generally, depending on time.
(iv) The nature set also can depend on the set of payoffs.
(v) The explicit form of the nature states probability is not known and has to be postulated.
More details on stochastic utility theory can be found, e.g., in the books [114,115].
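Despite these difficulties, the averaging over states of nature is straightforward to illustrate by Monte Carlo sampling. The sketch below invents a one-dimensional nature state ξ uniform on [0, 1], a hypothetical state-dependent utility u(x, ξ) = ξ√x, and state-dependent payoff probabilities; none of these particular forms come from the text.

```python
import random

random.seed(1)
payoffs = [0.0, 100.0]

def stochastic_utility(x, xi):
    """Hypothetical state-dependent utility u(x, xi)."""
    return xi * x ** 0.5

def stochastic_probs(xi):
    """Hypothetical state-dependent payoff probabilities p(x, xi)."""
    p_win = 0.4 + 0.2 * xi               # the nature state shifts the odds
    return [1.0 - p_win, p_win]

def expected_utility_over_nature(n_samples=10_000):
    """Monte Carlo estimate of U(L) = integral of U(L, xi) d mu(xi)."""
    total = 0.0
    for _ in range(n_samples):
        xi = random.random()             # mu: uniform measure on [0, 1]
        total += sum(stochastic_utility(x, xi) * p
                     for x, p in zip(payoffs, stochastic_probs(xi)))
    return total / n_samples

print(expected_utility_over_nature())    # analytic value: 10*(0.4/2 + 0.2/3) = 8/3
```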

Affective Decisions
People make decisions based not merely on rational grounds, by calculating utility, but are also affected by emotions, which are irrational. Several attempts have been made to take account of emotions in decision making by modifying expected utility, which is equivalent to some variants of non-expected utility models [91,116,117]. Here we present the main points of a probabilistic approach to taking emotions into account. This approach was first formulated by resorting to techniques of quantum theory [103,104]; however, it was later shown [105,106] that it can be reformulated in classical terms, without invoking any quantum expressions. The basics of the probabilistic affective decision theory are as follows.
The aim of any decision making is to choose an alternative from a set of several alternatives. Let the set of alternatives be denoted as

A = {A_n : n = 1, 2, . . . , N_A} . (4.44)

Each alternative from this set is assumed to be equipped with a probability p(A_n), with the normalization condition

Σ_{n=1}^{N_A} p(A_n) = 1 , 0 ≤ p(A_n) ≤ 1 .

This probability shows how probable it is that the alternative A_n is chosen. An alternative A_1 is said to be stochastically preferable to A_2 if and only if

p(A_1) > p(A_2) .

An alternative A_opt is stochastically optimal if its probability is maximal,

p(A_opt) = max_n p(A_n) .

The usefulness of an alternative is characterized by a utility factor f(A_n), whose form is to be prescribed by normative rules. This factor gives the probability of choosing an alternative A_n on the basis of a rational understanding of its utility. The standard probability normalization is applied:

Σ_{n=1}^{N_A} f(A_n) = 1 , 0 ≤ f(A_n) ≤ 1 .

One says that an alternative A_1 is more useful than A_2 if and only if

f(A_1) > f(A_2) .

The probability p(A_n) of an alternative A_n is a functional of the related utility factor f(A_n) and of an attraction factor q(A_n), such that the rational boundary condition

p(A_n) = f(A_n) for q(A_n) = 0 (4.55)

be valid, when, in the absence of emotions, the probability of an alternative coincides with the rational utility factor. The simplest form of the probability functional satisfying the rational boundary condition (4.55) is prescribed by the superposition axiom

p(A_n) = f(A_n) + q(A_n) . (4.56)

The trial distribution f_0(A_n) can be taken following the Luce rule [118][119][120],

f_0(A_n) = a_n / Σ_n a_n , (4.60)

where a_n is the attribute of the alternative A_n, having the form a_n = u(A_n) for semi-positive utility functionals and a_n = 1/|u(A_n)| for negative utility functionals [106,121,122].
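A minimal numerical sketch of the superposition p(A_n) = f(A_n) + q(A_n), with the Luce rule supplying the utility factor; the attribute values and attraction factors below are invented for illustration.

```python
# Hypothetical non-negative attributes a_n of three alternatives
attributes = [3.0, 2.0, 1.0]

# Luce rule: f(A_n) = a_n / sum_m a_m -- a normalized utility factor
total = sum(attributes)
f = [a / total for a in attributes]

# Invented attraction factors: subjective, summing to zero
q = [0.15, -0.05, -0.10]
assert abs(sum(q)) < 1e-12

# Superposition: p(A_n) = f(A_n) + q(A_n)
p = [fn + qn for fn, qn in zip(f, q)]
assert abs(sum(p) - 1.0) < 1e-12          # p remains a probability distribution
print([round(pn, 4) for pn in p])          # the emotionally attractive A_1 gains weight
```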

Wisdom of Crowds
Wisdom of crowds is the notion that large groups of people are collectively smarter than the individual members of the same groups. This concerns any kind of problem solving and decision making. The justification of this idea rests on the understanding that the viewpoint of an individual can be inherently biased, being influenced by individual emotions and prejudices, whereas averaging the knowledge of a crowd eliminates the noise of subjective biases and emotions, thus producing a wiser aggregate result [123][124][125][126].
Talking about the wisdom of crowds, one keeps in mind the following crowd features: (i) The crowd should have a diversity of opinions. (ii) Each personal opinion should remain independent of those around, not being influenced by anyone else. (iii) Each individual in the crowd should make their own decision based solely on their individual knowledge. These conditions exclude the situation when the crowd members consult with each other and mimic the actions of their neighbors, which can lead to herding effects; the latter are considered in the next section. Let us enumerate the members of a crowd, or a society, by the index j = 1, 2, . . . , N. According to the previous section, each member of the considered group chooses an alternative A_n with the probability

p_j(A_n) = f_j(A_n) + q_j(A_n) ,

under the standard normalization condition

Σ_{n=1}^{N_A} p_j(A_n) = 1 , 0 ≤ p_j(A_n) ≤ 1 .

The aggregate opinion implies arithmetic averaging over the society members, which yields the average probability

p(A_n) = (1/N) Σ_{j=1}^{N} p_j(A_n) ,

composed of the superposition of the average utility factor

f(A_n) = (1/N) Σ_{j=1}^{N} f_j(A_n) , (4.65)

and the average attraction factor

q(A_n) = (1/N) Σ_{j=1}^{N} q_j(A_n) ,

thus coming to expression (4.56).
The utility factor is prescribed by a rational evaluation of the utility of the considered alternatives, hence depends weakly on subjective emotions. This means that f_j(A_n) is approximately the same for any group member, which is equivalent to the condition

f_j(A_n) ≅ f(A_n) (j = 1, 2, . . . , N) . (4.67)

Notice that the normalization conditions remain valid. The attraction factor, on the contrary, is subjective, being essentially influenced by the agent's emotions. In that sense, the attraction factor is a random quantity. It is random for several reasons. First, its randomness is due to the evident variability of the emotions experienced by different people. Second, the emotions of even the same person vary at different times. And third, emotions randomly influence choice due to the generic variability and local instability of neural networks in the brain, as has been found in numerous psychological and neurophysiological studies [127][128][129][130][131][132][133][134][135].
Nevertheless, despite the intrinsic randomness of the attraction factor, some of its aggregate properties are well defined. First of all, the alternation law is satisfied:

Σ_{n=1}^{N_A} q(A_n) = 0 .

This follows directly from equations (4.63) and (4.68). Expression (4.63) also tells us that the attraction factor for a j-th society member is in the range

−f_j(A_n) ≤ q_j(A_n) ≤ 1 − f_j(A_n) .

Non-informative priors for the attraction factors can be estimated by means of the related arithmetic averages. Recall that, if a quantity y lies in an interval [a, b], its arithmetic average is y = (a + b)/2. And if the interval limits are in the ranges a_1 ≤ a ≤ a_2 and b_1 ≤ b ≤ b_2, then they are expressed through their averages a = (a_1 + a_2)/2 and b = (b_1 + b_2)/2. Keeping in mind that 0 < q_+(A_n) < 1 − f(A_n) and −f(A_n) < q_−(A_n) < 0, while 0 ≤ f(A_n) ≤ 1, we obtain the quarter law

q_+(A_n) = 1/4 , q_−(A_n) = −1/4 .

Employing these non-informative priors for the aggregate attraction factors, one can estimate the probability of alternatives, averaged over the crowd, as

p(A_n) = f(A_n) ± 1/4 ,

depending on whether the alternatives on average are attractive or not. The quarter law has been found to be in very good agreement with empirical data [106,136].
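A sketch of the non-informative estimate described above; the derivation comment restates the averaging argument in the text, and the clipping to [0, 1] is an added safeguard, not part of the original formulation.

```python
def estimated_probability(f, attractive):
    """Quarter-law prior: p(A_n) is approximately f(A_n) +/- 1/4.

    q+ lies in (0, 1 - f) and q- in (-f, 0); with f averaging to 1/2,
    the interval midpoints average to +1/4 and -1/4, respectively.
    The result is clipped to [0, 1] to remain a valid probability.
    """
    q = 0.25 if attractive else -0.25
    return min(1.0, max(0.0, f + q))

print(estimated_probability(0.5, True))    # 0.75
print(estimated_probability(0.5, False))   # 0.25
```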

Herding Effect
In the previous section, the process of decision making by a crowd of independent agents was considered. Generally, however, the members of a society interact with each other, which can result in drastic changes in the agents' behavior, such as the occurrence of a herding effect [137][138][139][140][141][142][143]. There can exist two kinds of interactions between society members that lead to collective effects such as herding. First, the members can mimic the actions of others, simply replicating their behavior. Second, the members can communicate through information exchange. Strictly speaking, the correct description of collective effects arising from agents' interactions requires studying temporal processes, simply because collective effects need time to form and develop. The consideration of temporal collective effects is beyond the scope of the present article. However, due to their importance for social systems, we delineate the principal points of how collective interactions can be incorporated into affective decision theory. Details can be found in Refs. [144][145][146].
Multistep decision theory deals with quantities depending on time. The probability that an agent j chooses an alternative A_n at time t is p_j(A_n, t), with the normalization

Σ_{n=1}^{N_A} p_j(A_n, t) = 1 , 0 ≤ p_j(A_n, t) ≤ 1 .

The utility factor becomes f_j(A_n, t), with the normalization

Σ_{n=1}^{N_A} f_j(A_n, t) = 1 , 0 ≤ f_j(A_n, t) ≤ 1 , (4.79)

and the attraction factor reads q_j(A_n, t), satisfying the conditions

Σ_{n=1}^{N_A} q_j(A_n, t) = 0 , −1 ≤ q_j(A_n, t) ≤ 1 .

Taking into account the tendency of society members to replicate the actions of others defines the probability dynamics, in which each agent j mixes its own affective choice with the choices of the other members, the weight of replication being given by a replication parameter ε_j satisfying the condition

0 ≤ ε_j ≤ 1 (j = 1, 2, . . . , N) . (4.82)

As explained above, there are two types of agent interactions in a society: replication of the actions of other members and exchange of information. The latter is known to attenuate the influence of emotions, which results in the attenuation of the attraction factor. This attenuation of the emotion influence due to agent interactions is well confirmed by empirical observations [147][148][149][150][151][152][153][154][155][156][157][158]. The attenuation of the attraction factor is described [144][145][146][158] by the form

q_j(A_n, t) = q_j(A_n) exp{−M_j(t)} , (4.83)

where q_j(A_n) is the attraction factor of an agent j in the absence of social interactions, and M_j(t) is the amount of information received by the moment of time t by an agent j. The quantity M_j(t), which for short can be called memory, writes as

M_j(t) = Σ_{i=1}^{N} Σ_{t'=0}^{t} J_ji(t, t') μ_ji(t') .

Here J_ji(t, t') is the information transfer function from an agent i to the agent j in the interval of time from t' to t, and μ_ji(t) is the Kullback-Leibler information gain received by the agent j from the agent i at time t,

μ_ji(t) = Σ_{n=1}^{N_A} p_j(A_n, t) ln [ p_j(A_n, t) / p_i(A_n, t) ] . (4.85)

The interaction of society members through replication and information exchange results in a rich variety of behavior types, including herding, periodic cycles, and chaotic fluctuations, depending on the society parameters. A detailed analysis of multistep decision making by intelligent members of a society is given in Refs. [144][145][146].
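The scheme above can be sketched in a small simulation. Since the explicit update rule (4.81) is not reproduced in the text, the code below assumes a simple replication form, p_j(A_n, t+1) = (1 − ε_j)[f_j(A_n) + q_j(A_n, t)] + ε_j × (average of p_i(A_n, t) over the other agents i), together with the attenuation (4.83) using a unit information-transfer function; all parameter values are invented.

```python
import math

def kl_gain(p_j, p_i):
    """Kullback-Leibler information gain mu_ji = sum_n p_j ln(p_j / p_i)."""
    return sum(a * math.log(a / b) for a, b in zip(p_j, p_i))

f   = [[0.6, 0.4], [0.5, 0.5]]     # utility factors of two agents, two alternatives
q0  = [[0.2, -0.2], [-0.1, 0.1]]   # attraction factors without social interaction
eps = [0.3, 0.3]                    # replication parameters, 0 <= eps_j <= 1
p   = [[f[j][n] + q0[j][n] for n in range(2)] for j in range(2)]
M   = [0.0, 0.0]                    # accumulated information ("memory")

for t in range(50):
    # memory grows by the information gained from the other agent (J = 1 assumed)
    M = [M[j] + sum(kl_gain(p[j], p[i]) for i in range(2) if i != j)
         for j in range(2)]
    # attraction attenuates as q_j(t) = q_j(0) exp(-M_j(t))
    q = [[q0[j][n] * math.exp(-M[j]) for n in range(2)] for j in range(2)]
    # assumed replication dynamics: own affective choice mixed with the others'
    p = [[(1 - eps[j]) * (f[j][n] + q[j][n])
          + eps[j] * sum(p[i][n] for i in range(2) if i != j)
          for n in range(2)] for j in range(2)]

for pj in p:
    assert abs(sum(pj) - 1.0) < 1e-9   # normalization is preserved by the update
print(p)
```

With the emotions attenuated, the probabilities relax toward a mixture of the two rational utility factors; a larger ε_j pulls the agents' choices closer together, which is the herding tendency.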

Conclusion
In the present part of the lectures, the principle of minimal information is formulated, which gives the key for constructing probability distributions for equilibrium and quasi-equilibrium social systems. Several simple examples, based on the yes-no model, are considered. Despite their seeming simplicity, the models allow one to describe rather nontrivial effects, including the role of regulation cost and the existence of fluctuating groups inside a society. Quite surprisingly, it turns out that the occurrence of self-organized disorder can make a society more stable. The model of coexisting populations explains when such populations can mix and live peacefully in the same country, and when the coexistence becomes unstable, so that the country separates into several pieces, with different populations in different locations. Since behind all actions of any society there are decisions of the society members, the basics of the probabilistic affective decision theory are delineated.
Hopefully, the material of the above survey gives the reader a feeling for the wide possibilities of applying mathematical models of physics to the description of social systems, even when limited to equilibrium systems. The presented content is based on the lectures that the author has given during several years at the Swiss Federal Institute of Technology in Zürich (ETH Zürich). Being restricted by lecture frames, it is certainly impossible to present the numerous existing approaches and models of social systems, which can be found in the cited literature. This is why the topics touched upon are by necessity limited. The choice of material is motivated by the interests of the author, which, not surprisingly, are often connected with the development of the theories and models he has been involved in.
The following part of the lectures will be devoted to nonequilibrium systems studying various evolution equations and dynamical effects.
Funding: This research received no external funding.