This section reports a mathematical formulation that can be considered a first attempt to evaluate how the interaction among individuals impacts the dynamics of awareness in the context of decision making processes. The starting point is a framework embedding the different phenomena that impact the awareness-raising process, with a focus on the single individual in isolation. Then, a reasonable formulation is introduced to describe the contribution to the dynamics of the single agent provided by the interactions with the others, exploiting a network and a suitable coupling function.
2.1. Modeling Individual Awareness
To provide a mathematical model embedding the basic mechanisms of awareness raising, we start from the general definition of a deterministic finite-state automaton as a quintuple $(\Sigma, Q, q_0, \delta, F)$, where $\Sigma$ is the input alphabet, consisting of a finite non-empty set of symbols, $Q$ is a finite non-empty set of states, $q_0 \in Q$ is an initial state, $\delta$ is a transition function, and $F \subseteq Q$ is the set of final states. Then we focus on the framework of Markov Decision Processes (MDPs), which embed a Markov chain with the addition of inputs (actions) and costs/rewards, assigned according to a reward function $r$ [17,18].
The MDP is then defined as a tuple $(S, A, P, r)$, where $S$ is a finite set of feasible states, called the state space, $A$ is a finite set of available actions, namely the action space, $P$ is a stochastic transition function defining the probability of shifting from a state $s$ to a state $s'$ by choosing the input $a$, and $r$ is the function defining the reward incurred.
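In the standard MDP notation, recalled here for convenience, these objects can be written as:

\[
P(s' \mid s, a) = \Pr\left(s_{t+1} = s' \mid s_t = s,\ a_t = a\right), \qquad r : S \times A \rightarrow \mathbb{R}.
\]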
A mathematical model grounded on MDPs, describing the basic elements that impact the individual's reasoning and analyzing how they are related to the process of increasing personal awareness, is shown in Figure 1, which reports a graphical scheme of the mathematical model in which the individual mechanisms introduced in [2,19] (sub-scheme with green border) and the networked system of interconnected individuals (complete scheme) are highlighted.
In particular, the single-individual model considers a discrete and finite time horizon of length $T$, in which each time-epoch, $t$, corresponds to a moment of making a decision, $a_t$, under the conventional assumption that the more analytical the choice, the higher the value of $a_t$, and, vice versa, the more intuitive the choice, the lower the value of $a_t$. The state, $s_t$, of the individual at each time $t$ is a representation of their level of awareness, incorporating all past experiences. Therefore, awareness is considered to be a process—mathematically a sequence of states—involving the decision maker's (DM's) experiences, filtered by their perspectives, beliefs, and values. Moreover, in an MDP we have at the same time the presence of a choice of the DM and uncertainty about its outcomes, given by uncontrollable external factors and represented in our framework by a stochastic variable $\omega_t$, as always happens in our decisions. This model is a good trade-off between realism and simplicity: broad enough to account for realistic sequential decision making problems while simple enough to be understood and applied by different kinds of practitioners. Each individual possesses a reasoning propensity, $\rho$,
which embeds the specific attitude in processing the information about the problem and represents the trade-off between two diametrically opposed reasoning modalities: intuitive ($\rho = 0$) and analytical ($\rho = 1$), assuming in this way that both are always involved, in different amounts, in any decision [20,21]. The reasoning propensity affects the policy, $\pi$, of the individual. A policy is a function prescribing to the DM the action to take for each possible state at any time instant, and it is represented by a matrix of dimension $|S| \times T$. Therefore, the policy results in the decision:
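In a minimal sketch, with the policy indexed by state and time as described above, this prescription reads:

\[
a_t = \pi\left(s_t, t\right).
\]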
The choice, driven by the policy as expressed in Equation (1), leads to two results: the DM receives a reward, defined by a reward function $r$, and the system evolves to a possibly new state according to a stochastic transition function $P$ defined as:
Equation (2) characterizes the dynamics of the process of self-knowledge, indicating how the future level of awareness of the individual depends on the current state, the choices they make, and their outcome, which is subject to some uncertainty represented by $\omega_t$. We assume, for simplicity, that by deciding $a_t$, the state $s_t$ can increase ($s_{t+1} > s_t$), decrease ($s_{t+1} < s_t$), or remain the same ($s_{t+1} = s_t$), according to a certain probability expressed by the transition probability function $P$. In particular, we define a stationary probability $p_s$, constant for all $a_t$, a forward probability $p_f$ as a function of $a_t$, and a backward probability $p_b$ computed starting from the first two as $p_b = 1 - p_s - p_f$, since, as probabilities, their sum must be equal to one. The presence of uncertainty affecting the outcomes of the decision, given by uncontrollable elements in the environment, makes the state evolution and the reward sequence stochastic. The reward function incorporates the assumption that wellbeing increases with the state, which in turn indicates the level of awareness, so that living with a higher level of awareness produces a higher global wellbeing for the individual. On the other hand, it introduces a component related to information costs, which are higher the more analytical the choice of the individual is, i.e., in our convention, the closer $a_t$ is to 1 (see Appendix A for more details on the reward function $r$).
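Consistently with this description, the three-outcome transition structure can be sketched, in the notation just introduced, as:

\[
P\left(s_{t+1} \mid s_t, a_t\right) =
\begin{cases}
p_f(a_t) & \text{if } s_{t+1} > s_t,\\
p_s & \text{if } s_{t+1} = s_t,\\
p_b = 1 - p_f(a_t) - p_s & \text{if } s_{t+1} < s_t.
\end{cases}
\]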
The individual is focused on the maximization of their wellbeing, i.e., of the sequence of rewards they incur over time, embedding a directionality typical of human goal-directed behaviors. The optimal decisions are obtained by maximizing the sequence of expected rewards, according to the following problem:
The last reward the DM incurs at the last time-epoch $T$, namely the term $r_T(s_T)$ in Equation (3), is fixed a priori because it is used as the starting point for the reconstruction of the optimal policy in the optimization process described below, implemented by applying a backward induction method. We assume that the terminal benefit the DM incurs at the final time increases with the value of the final state $s_T$. Moreover, a factor $\lambda$, weighting future rewards, has been introduced.
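With $\lambda$ weighting future rewards, the finite-horizon objective can be sketched (assuming standard discounting and leaving the dependence on the stochastic variable implicit in the expectation) as:

\[
\max_{\pi}\ \mathbb{E}\left[\sum_{t=0}^{T-1} \lambda^{t}\, r\left(s_t, a_t\right) + r_T\left(s_T\right)\right],
\]

where the expectation is taken over the stochastic transitions governed by $P$.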
The maximization problem expressed in Equation (3) is recursively solved by implementing a Dynamic Programming algorithm, where the original problem is separated into a recursive series of easier sub-problems considering a shorter time horizon from $\tau$ to $T$ and given the step-by-step initial state $s_\tau$, representing the state at time $\tau$. The expected reward function to optimize at each stage is explicitly described as:
where the coefficients $\gamma_1$, $\gamma_2$, and $\gamma_3$ weight the cases of increasing, maintaining constant, or decreasing the state. These parameters allow us to take into account different attitudes of the DM, for example penalizing the eventuality of decreasing awareness.
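For concreteness, the following minimal Python sketch implements a backward induction of this kind on a discretized state space; the functional forms of p_forward and reward, as well as all numerical values, are hypothetical placeholders rather than the specific functions of Appendix A, and the $\gamma$ coefficients of Equation (4) are omitted for brevity.

    import numpy as np

    T = 20                                   # length of the time horizon
    states = np.linspace(0.0, 1.0, 21)       # discretized awareness levels in [0, 1]
    actions = np.linspace(0.0, 1.0, 11)      # decisions from intuitive (0) to analytical (1)
    p_stat = 0.2                             # stationary probability (assumed value)
    lam = 0.9                                # factor weighting future rewards

    def p_forward(a):
        # hypothetical: more analytical choices make an increase of the state more likely
        return (1.0 - p_stat) * a

    def reward(s, a):
        # hypothetical: wellbeing grows with the state, analytical choices carry a cost
        return s - 0.5 * a ** 2

    V = states.copy()                        # terminal benefit, increasing in the final state
    policy = np.zeros((T, len(states)))      # optimal decision for each time epoch and state
    for t in reversed(range(T)):
        V_new = np.empty_like(V)
        for i, s in enumerate(states):
            up, down = min(i + 1, len(states) - 1), max(i - 1, 0)   # saturate at the borders
            q = [reward(s, a)
                 + lam * (p_forward(a) * V[up]
                          + p_stat * V[i]
                          + (1.0 - p_forward(a) - p_stat) * V[down])
                 for a in actions]
            best = int(np.argmax(q))
            V_new[i], policy[t, i] = q[best], actions[best]
        V = V_new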
The optimization process, intended to maximize the sequence of rewards expressed in Equation (4), embeds in itself an element of the individual's self-observation, which determines the action/decision to be made. This feedback component allows the DM to build a policy by having knowledge of the form of the transition probability function and the current state at each time epoch. In this way, the DM can overcome their usual, automatic process by shifting from their habitual to a new, aware policy, which results from the reward maximization process, thus mitigating the mechanistic tendencies of the individual [
2]. Because of the dependence of the reward on the level of awareness, this maximization also models the fact that the increase of personal awareness follows a focus on this specific objective, emerging from personal effort and motivation [
22].
It is possible to include in the model the individual’s immediate emotions [
23,
24,
25], considering also that the role played by emotions could depend on the present level of awareness of the individual. For example, at a low level of awareness, emotions prevail over individual reasoning, so the DM is driven in their choices by the search for instant satisfaction. In this condition, the future (intended as long-term perspectives) will have very little weight on their choice: the DM is enslaved by the immediate reward, a condition which could turn out to be highly damaging. This 'immediate reward' dynamic loses its importance when the individual reaches a high level of awareness. In the mathematical model, emotions affect the function $\lambda$, showing how the individual differently weighs the future with respect to the present and taking into account also the dependence of the function $\lambda$ on the state $s_t$.
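Purely as an illustration of such a dependence (this is not the specific form used in the simulations), the weighting of future rewards could be taken to grow with the level of awareness, e.g.,

\[
\lambda(s_t) = \lambda_{\min} + \left(\lambda_{\max} - \lambda_{\min}\right) s_t, \qquad 0 \le \lambda_{\min} \le \lambda_{\max} \le 1,
\]

so that a poorly aware individual ($s_t$ close to 0) gives almost no weight to future rewards, while a highly aware one ($s_t$ close to 1) weighs them almost fully.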
Once the generic structure of the model has been defined, it is possible to assign specific forms to the different model functions to perform numerical simulations aimed at studying the emergence of behaviors and dynamics. The different functions used are reported in
Appendix A.
Figure 2 reports a comparison of the aware policy with respect to a habitual policy, referred to as the most basic and simplest mechanism governing an individual's usual decisions. It assumes that decisions spring from habits, i.e., automatic, non-conscious mechanisms, represented in the model by the individual reasoning propensity $\rho$. Furthermore, it is fixed and constant in time, with the only addition of a noise term $\eta_t$, as reported in Equation (5), normally distributed with mean zero and variance $\sigma^2$ (where $\sigma$ is fixed to 0.08 in the simulations). This noise takes into account a source of uncertainty, mainly due to external factors, influencing the effective choice and shifting it around the reasoning propensity of the individual.
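Consistently with this description, the habitual choice of Equation (5) can be sketched, in the notation introduced above, as:

\[
a_t = \rho + \eta_t, \qquad \eta_t \sim \mathcal{N}\!\left(0, \sigma^{2}\right).
\]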
In contrast, the aware policy is embodied by the multifaceted optimization process described in Equations (3) and (4).
Figure 2 shows, first of all, the effect of applying the aware policy, in terms of a high final level of the state (near 1) starting from a low initial state (Figure 2a) and of the speed in reaching the maximum state level starting from a high initial state (Figure 2b). The shift from a habitual to an aware policy improves the dynamics of the state (Figure 2c,d), except when the initial level of awareness is high enough (see Figure 2d) and after a long time horizon (the similarity at the first time instant in Figure 2c,d is due to the equal initial state assumptions). The benefit of the shift increases as $\rho$ grows from zero to 0.5, where there is a discontinuity for which the difference between the two situations is negligible, and then symmetrically decreases as $\rho$ grows further from 0.6 to 1. This fact is due to the symmetrical form assumed for the theoretical probability functions defined for an analytical and an intuitive individual, as expressed in Equations (8) and (A1), respectively. Finally, panels (e) and (f) of Figure 2 highlight how an improvement of the state dynamics does not necessarily correspond to an increase in the reward, $r$, because the model also considers a cost component related to the level of analyticity adopted in the decision, so that an individual can reach a high state level but also incur some losses, simply indicating a high cost due to excessive use of analytic reasoning.
2.2. Interactions and Awareness: A Network Model
The framework presented so far describes the emergence of awareness by considering the decision making process of a single agent taken in isolation. However, it lacks an essential dimension: how the interaction with other agents can impact the awareness dynamics of each single agent. The novelty of the present work is specifically to establish a mathematical formulation of the interactions among agents with the aim of studying the awareness-raising phenomenon. This interaction can be modeled through a network, also called a graph in the mathematical literature, that is, a collection of vertices (nodes) joined by links (edges) [26,27]. When time series of behavioral data are available for each node, the links can be derived from pairwise partial correlation coefficients, mutual entropy, and so on. Furthermore, conditional probabilities computed over some relevant variable, or distances between pairs of nodes over a metric space, can be used as quantifiers of edge strength [28]. Since the model developed in this study is based on theoretical assumptions as well as behavioral and psychological evidence of prototypical cases, the connections are undirected and weighted uniformly in the population. Estimating specific values for these edges is crucial; to this aim, suitable surveys are under development by the authors of the present paper.
According to the common notation applied in the mathematical literature, a graph is defined by giving the set of $n$ nodes and the set of $m$ edges linking them. The effective meaning of nodes and edges depends on the specific context in which the graph is applied: in our case, the nodes are the agents [29], and the edges model a relationship among them, in the sense that each agent is influenced directly by its neighbors, i.e., all the other agents it is connected to, and indirectly by all the other agents, without any specific dependence on distance. There are many ways to represent a network mathematically, but typically it is described by the adjacency matrix $A$, in which each element $A_{ij}$ indicates the presence or absence of a link between nodes $i$ and $j$ and, possibly, the weight of that connection [26].
The kind of network we are now considering has at most one edge between the same pair of vertices, as opposed to a multi-edge graph; moreover, there are no edges that connect vertices to themselves, the so-called self-edges or self-loops. A network of this type, which has neither self-edges nor multi-edges, is called a simple network or simple graph. The network considered here is undirected and unweighted, so the adjacency matrix has all zeros on the diagonal (no self-loops) and is symmetric. All the elements $A_{ij}$ are equal to 1 or 0 to indicate the presence or absence, respectively, of an interaction between nodes $i$ and $j$; they represent simple on/off connections among different agents.
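For illustration, such a simple, undirected, unweighted network can be encoded (here with a hypothetical set of five agents and links) as a symmetric 0/1 adjacency matrix with zero diagonal:

    import numpy as np

    n = 5                                                   # number of agents (nodes)
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]        # hypothetical undirected links

    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1                               # symmetric: simple on/off connections
    # no self-loops: the diagonal of A stays equal to zero

    degrees = A.sum(axis=1)                                 # number of neighbors of each node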
As a future development, it might be interesting to consider a weighted graph by explicitly defining a certain strength/weight of each edge connecting nodes $i$ and $j$, represented by the value of $A_{ij}$. It might be equally interesting to consider a directed graph, for which the adjacency matrix is not symmetric, exploiting the fact that a connection between $i$ and $j$ does not have the same weight in both directions.
The aim of this work is to evaluate the mutual impact of individuals’ decisions on their respective awareness dynamics. To mathematically analyze this phenomenon, it is necessary to introduce a dependence among individuals characterizing their interactions [
30,
31]. A possible way to introduce this interaction is to define the state evolution of each agent as related to the states of the other agents in the network. From a mathematical point of view, we introduce a new interaction term, $I^{i}_{t}$, so that the state evolution of each agent $i$ is governed by the function:
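Assuming an additive coupling, and writing the single-agent update of Equation (2) in compact form as a function $f$ of the current state, the decision, and the stochastic variable, a minimal sketch of such a state evolution is:

\[
s^{i}_{t+1} = f\left(s^{i}_{t}, a^{i}_{t}, \omega^{i}_{t}\right) + I^{i}_{t},
\]

where $I^{i}_{t}$ is the interaction term collecting the contribution of the neighbors of agent $i$.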
For the interaction term, we can assume that there exists an imitative behavior driving people to behave like their neighbors. The same consideration is applied in the scientific literature to describe the mechanism of “gossip”, in which people tend to copy their friends: they increase the amount they talk about a topic if their friends are talking about it more than they are, and decrease it if their friends are talking about it less [
26].
This imitative behavior can be represented by setting:
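A minimal sketch of such an imitation term, in the notation introduced above, is:

\[
I^{i}_{t} \;\propto\; \sum_{j=1}^{n} A_{ij}\, \sin\!\left(s^{j}_{t} - s^{i}_{t}\right).
\]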
This is the coupling form first introduced to describe the synchronization of Kuramoto oscillators [
32,
33].
If we consider, for Equation (7), the simple linear case for which $\sin(x) \simeq x$, then Equation (6) reads as follows:
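A form consistent with the description that follows, with $\alpha$ and $\beta$ as assumed names for the two constant positive parameters and $f$ as in the sketch above, is:

\[
s^{i}_{t+1} = f\left(s^{i}_{t}, a^{i}_{t}, \omega^{i}_{t}\right) + \frac{1}{k_i}\sum_{j=1}^{n} A_{ij}\left(\beta\, s^{j}_{t} - \alpha\, s^{i}_{t}\right),
\]

where $k_i$ is the degree of node $i$.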
This indicates that the state evolution of agent $i$ depends on its previous state ($s^{i}_{t}$), on the stochastic process, $\omega^{i}_{t}$, and on an interaction term that describes the dynamics of all the nodes linked to node $i$. This term is the summation of all the differences between the state value of the considered node and those of the nodes to which it is connected, tuned using two constant positive parameters. These parameters reflect how the present state of the individual is weighted with respect to those of the neighbors (considering, in this first case, an equal coefficient for all the neighbors of node $i$). When the interaction term present in Equation (8) is null, the state evolves as in the single-individual awareness model. Finally, the summation is modulated by a term $1/k_i$, where $k_i$ is the degree of node $i$ (i.e., the number of its neighbors).
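As a numerical counterpart of the sketch above (again with alpha and beta as hypothetical names for the two positive parameters, and with single_agent_step as a placeholder for the individual dynamics of Equation (2)), one step of the coupled evolution can be coded as:

    import numpy as np

    def single_agent_step(s, a, rng):
        # placeholder for the single-individual dynamics of Equation (2):
        # the state moves up, stays, or moves down with decision-dependent probabilities
        p_stat = 0.2
        p_fwd = (1.0 - p_stat) * a
        u = rng.random()
        step = 0.05 if u < p_fwd else (0.0 if u < p_fwd + p_stat else -0.05)
        return s + step

    def networked_step(s, a, A, alpha, beta, rng):
        # one step of the coupled evolution with linearized (identity) coupling
        k = np.maximum(A.sum(axis=1), 1)                     # degree of each node
        coupling = (A * (beta * s[None, :] - alpha * s[:, None])).sum(axis=1) / k
        own = np.array([single_agent_step(si, ai, rng) for si, ai in zip(s, a)])
        return np.clip(own + coupling, 0.0, 1.0)             # awareness stays in [0, 1]

    # usage: five agents starting at awareness 0.3, all choosing a fairly analytical a = 0.7
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]])
    rng = np.random.default_rng(0)
    s_next = networked_step(np.full(5, 0.3), np.full(5, 0.7), A, alpha=0.1, beta=0.1, rng=rng)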
Recalling that the state value $s^{i}_{t}$ is defined as the level of awareness of agent $i$ at time $t$, we can appreciate the meaning of the proposed equation. The form of the equation encompasses the fact that neighbors of $i$ with higher state values have a positive impact on the dynamics of node $i$, while neighbors with lower values have a negative impact on it, i.e., agents with a higher or lower level of awareness can help or damage, respectively, the level of awareness of node $i$.