Modeling Bimodal Social Networks Subject to the Recommendation with the Cold Start User-Item Model

Abstract: This paper describes the modeling of social networks subject to recommendation. The Cold Start User-Item Model (CSUIM) of a bipartite graph is considered, which simulates bipartite graph growth based on several parameters. An algorithm is proposed to compute the parameters of this model so that the generated graph has desired properties. The primary desired property is that the generated graph has graph metrics similar to those of a given real network; the next is that the graph growth process changes under recommendations. The meaning of the CSUI model parameters in the recommendation process is described. We make several simulations generating networks from the CSUI model to verify its theoretical properties, and the proposed methods are also tested on real-life networks. We show that the CSUIM model of bipartite graphs is very flexible and can be applied to many different problems, and that the parameters of this model can be easily obtained from an unknown bipartite graph.


Introduction
A social network often means the social structure between actors, which are generally individuals or individual organizations. It shows relationships of various types, ranging from random acquaintances to close relationships, or flows of objects (e.g., information, goods, money, signals, intermediates in the production cycle) between members of the community [1].
Social network analysis (SNA) is focused on mapping and measuring relationships and information flows between people, their groups, organizations, or other entities in transforming information and/or knowledge. SNA attempts to make a prediction on the basis of the characteristics of the network as a whole entity, the properties of individual nodes based on network structure, and so forth. The subject of the research can be a complete social network, or parts of it can be related to a specific node.
Nowadays, graphs are used to model various interesting real-world phenomena. Much interest of researchers has been attracted by social networks in which one can distinguish two types of objects, such as users and items, and where only relationships between a user and an item are of interest. They can be modeled via bipartite graphs, that is, graphs in which edges exist only between two disjoint subsets of vertices. For example, in the case of customer data, there are two modalities: users and products. There are no edges between users within the user set, and there are no edges between products within the product set; an edge between a user and a product means that the user bought this product. Such graphs can be utilized to recommend products to users. Another example is an Internet forum. In this case, there are two modalities: forum users and the forum threads in which the users participate. Some linear dependencies for parameter estimation are investigated in Sections 7 and 8. In Section 9, a method for computing the parameters from a graph is proposed. In Section 10, experimental results on parameter recovery and model quality are presented. Section 11 contains some concluding remarks.

Related Work
Much research effort has been devoted to the qualitative description of the real-world phenomena of uni-modal social networks. Barabási [6] coped with the impact of the removal of a few super-connected nodes, or hubs. Albert and Barabási [7] present a statistical approach to modeling random graphs, small worlds, scale-free networks, and evolving networks, and the interplay between topology and the network's robustness against failures and attacks.
Lin et al. [8] used network history to predict communities in the current state, exploiting node degrees, modularity (as defined by Newman et al. [9]), and their own soft modularity.
Leskovec et al. [10] characterized the statistical and structural properties of communities as a function of network size and conductance.
Leskovec et al. [11,12] investigated the phenomenon of real graphs densifying over time, and shrinking of the average distance between nodes. They attempted to explain the phenomenon by models of "Community Guided Attachment" (CGA) and a more complex "Forest Fire Model" (FFM).
While there are many publications concerning uni-modal social network growth models, bimodal ones are investigated far more rarely, though, as [13] shows, they are important for product recommendation and rating prediction, and, as [14] (Section 6.4.4) reports, they may be used to investigate models of scientific paper co-authorship.
Publications such as [15,16] propose models for new-edge prediction. The paper of Lavia et al. [17] is the closest in spirit to our work. They considered the Netflix competition database and offered an explanation for the hardness of prediction. The authors proposed a growth model of an item-rating network based on a mixture of preferential and uniform attachment that not only reproduces the asymptotic degree distribution, but also agrees with the Netflix data in several time-dependent topological properties.
Our research differs from this in that we are considering a much more complex model, where both items and users can perform the edge attachment, and a bouncing mechanism for modeling the impact of local recommendation is included.

Graph Models for Unimodal Networks
In this section, the most popular graph generators are presented, also called graph models. First, let us recall an important measure of graphs, which is frequently used when evaluating the quality of various graph models of social networks.
Many empirical graphs are well modeled by small-world networks (see [18]). For example, social networks, the connectivity of the Internet, wikis such as Wikipedia, and gene networks all exhibit small-world network characteristics.
Therefore, in the literature, a couple of measures have been proposed to determine whether a graph is a small-world network. The most popular of them is the so-called local clustering coefficient (LCC); for a vertex i, it is defined as

LCC(i) = 2 |{(a, b) ∈ E : a, b ∈ N_1(i)}| / (k_i (k_i − 1)),

where E is the set of all edges, V is the set of all vertices, a, b ∈ V are vertices, N_1(i) is the set of direct neighbors of i, and k_i is the degree of vertex i, that is, the number of edges incident to this vertex. For the whole graph G, the clustering coefficient is just LCC(G) = (∑_{i∈V} LCC(i)) / |V|.

The Erdős–Rényi model (see [19]) is defined by two parameters: the number of nodes, n, and the probability p that an edge exists between a pair of nodes. This mechanism of node connection is called uniform attachment (UA). In this model, the node degree distribution follows the exponential distribution, and the local clustering coefficient satisfies LCC ∼ n^(−1).

The Barabási–Albert model (see [7]) uses a "preferential attachment" (PA) mechanism for creating connections between nodes. A graph is initialized with a connected graph with m_0 nodes. In the following steps, each node is connected to m existing nodes in such a way that the probability of connection is proportional to the number of links that the existing nodes already have. This way of connecting nodes causes the node degree distribution to follow a power-law distribution P(k) ∼ k^(−3), with LCC ∼ n^(−3/4).

Liu's model (see [20]) was one of the first attempts at combining the uniform attachment and preferential attachment mechanisms. The authors proposed a parameter of intensification, mediating between UA and PA.
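The local clustering coefficient can be computed directly from an adjacency structure; below is a minimal Python sketch (the adjacency-set representation and function names are our choices, not from the paper):

```python
from itertools import combinations

def lcc(adj, i):
    """Local clustering coefficient of vertex i.
    adj: dict mapping each vertex to the set of its neighbors."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; 0 by convention
    # count edges among the neighbors of i
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def lcc_graph(adj):
    """Average LCC over all vertices of the graph."""
    return sum(lcc(adj, i) for i in adj) / len(adj)
```

For a triangle, every vertex has LCC equal to 1; for a path of three vertices, the middle vertex has LCC equal to 0.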
The models mentioned previously have a serious drawback: the LCC value does not depend on the graph parameters. More flexible models were proposed by Vázquez (see [21]) and, independently, by White (see [22]), in which LCC may be modified by changing graph parameters. In the Vázquez model, the idea is based on random walks (also called surfing) and a recursive search for generating networks.
In the random walk model, the walk starts at a random node, follows links, and for each visited node, with some probability, an edge is created between the visited node and the new node. It can be shown that such a model generates graphs with a power-law degree distribution with an exponent greater than or equal to 2 (see [23]).
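The random-walk attachment just described can be sketched as follows (a simplified illustrative variant: a fixed walk length, one guaranteed edge per new node, and illustrative parameter defaults are all our assumptions):

```python
import random

def random_walk_growth(n, walk_len=5, p=0.5, seed=None):
    """Grow an undirected graph by random-walk ("surfing") attachment:
    each new node starts a walk at a random existing node and, at each
    visited node, creates an edge to it with probability p."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                 # seed graph: a single edge
    for new in range(2, n):
        adj[new] = set()
        current = rng.randrange(new)       # uniform starting point
        for _ in range(walk_len):
            if current != new and rng.random() < p:
                adj[new].add(current)      # attach to the visited node
                adj[current].add(new)
            current = rng.choice(sorted(adj[current]))  # follow a link
        if not adj[new]:                   # ensure at least one edge
            target = rng.randrange(new)
            adj[new].add(target)
            adj[target].add(new)
    return adj
```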

Graph Models for Bimodal Networks
A bimodal network is understood as a network connecting two varieties of objects (like authors and their papers, employees and their firms, tourists and museums, etc.) [24]. Other names for such a network are a bipartite, 2-partite, or 2-mode network. These networks can be modeled by bipartite graphs. A bipartite graph has the form G = (U ∪ V, E), where U ∩ V = ∅ and E ⊆ U × V. That is, the vertices of the bipartite graph can be divided into two disjoint sets, U and V, such that every edge connects a vertex in U with a vertex in V; that is, U and V are independent sets. These sets represent, for example, customers and products. If a customer u_i buys a product v_j, there is an edge between vertices u_i and v_j. Thus, there are no edges between customers or between items: they cannot buy each other. In the case of an Internet forum, one also has two modalities: one for users and the other for the threads the users participate in. Many other kinds of bipartite networks occur in real life [25].
Although bimodal graphs are a specific subclass of uni-modal graphs, the models mentioned in Section 2.1 are not appropriate when one wants to model bipartite graphs of bimodal social networks. In a bipartite graph, for all vertices a, b in the same modality set, there are no edges between them, so one always gets LCC = 0. This is a severe problem when studying bipartite graphs because, on the one hand, one has no means of looking at the fundamental small-world phenomena, and on the other hand, it is an obstacle to adopting traditional graph generators for the bipartite case. Therefore, in [5], another metric suitable for measuring clustering tendency was proposed, the bipartite local clustering coefficient (BLCC):

BLCC(n) = 1 − |N_2(n)| / ∑_{m ∈ N_1(n)} (k_m − 1),

where W is the set of all vertices and N_s(n) is the set of neighbors of vertex n ∈ W which are s ≥ 1 steps away. In other words, N_s(n) = {a ∈ W : K(n, a) = s}, where K(i, j) is the minimal distance (number of edges) between vertices i and j. In [5], it is shown that the graph metrics LCC and BLCC behave similarly in classical graphs.

Typically, in a social network, a small number of vertices have many direct neighbors, and a large number of vertices have a small number of direct neighbors. Social networks contain clusters with a high density of connections. This network property is called transitivity, and it says that if nodes a and b have a common neighbor, then this influences the probability of the existence of an edge between a and b. The critical difference between unimodal and bimodal networks led to the development of separate models for the bimodal case. Let us mention a few.
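The BLCC can be computed directly from an adjacency structure; here is a minimal sketch, assuming the form BLCC(n) = 1 − |N_2(n)| / ∑_{m ∈ N_1(n)} (k_m − 1) consistent with the definitions of N_s(n) above (the adjacency-set representation is our choice):

```python
def blcc(adj, n):
    """Bipartite local clustering coefficient of vertex n, assuming
    BLCC(n) = 1 - |N2(n)| / sum_{m in N1(n)} (deg(m) - 1).
    adj: dict mapping each vertex to the set of its neighbors."""
    n1 = adj[n]                      # direct neighbors, N1(n)
    n2 = set()                       # vertices exactly 2 steps away
    for m in n1:
        n2 |= adj[m]
    n2 -= n1 | {n}
    denom = sum(len(adj[m]) - 1 for m in n1)
    return (1.0 - len(n2) / denom) if denom else 0.0
```

For the complete bipartite graph K_{2,2}, each user vertex sees one second neighbor out of two possible, giving BLCC = 0.5.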
There are two main approaches to modeling bipartite graphs: the iterative growth method and the configuration method. The iterative growth method, which simulates the growth of a network, is better suited to the recommendation process, as shown in [5]. In the configuration method, one gives an estimated general description of the network, and from this one constructs the final state of the network. The description usually contains the number of nodes in each modality, the probability density function (PDF) of the node degrees, and the number of edges. One then creates the nodes in each modality without edges, and creates the edges by sampling their endpoints from the node degree PDF.
In [4], Guillaume and Latapy presented a method of transforming a bipartite graph into a classical (uni-modal) network and of reversing this transformation. Unfortunately, the reverse transformation is not unique and, moreover, the retrieval of the bipartite structure is computationally hard; the authors pointed out that the computation of the largest clique containing a given link may be very expensive (it is NP-complete). Birmele [2] builds a bipartite graph model from existing uni-modal graph models using retrieval of the bipartite structure from classical graphs. In [3], Zheleva et al. analyzed the evolution of groups in an affiliation network, which has two modalities: users, and the groups to which the users belong. In their co-evolution model, groups can disappear and merge; this model is not appropriate in the case of recommending items to users, as items do not merge. In [26], Lattanzi and Sivakumar proposed a different model of the affiliation network as a bipartite graph. Their model for the evolving affiliation network and the consequent social network incorporates elements of preferential attachment and edge copying. They analyze the most basic folding rule, namely, replacing each society node in the affiliation network by a complete graph on its actors in the folded graph. The drawback of their models is that, given a social network (or another large graph), it is not at all clear how one can test the hypothesis that it was formed by the folding of an affiliation network. The general problem of deciding, given a graph G on a set Q of vertices, whether it was obtained by folding an affiliation network on vertex sets Q and U, where |U| = O(|Q|), is NP-complete.
All previously mentioned generators have an iterative growth mechanism. The common limitation of those generators is that they generate bipartite graphs with the power-law or uniform distribution of vertex degrees. Additionally, none of these models contains a parameter which controls the transitivity property. The previous approaches also have a significant drawback: configuration methods and methods based on retrieval of a bipartite structure decrease the bipartite local clustering coefficient (BLCC) compared to iterative methods. This means that models derived by those methods fit the real-world structures worse than those estimated by iterative methods. Moreover, in the pessimistic case, one deals with the NP-complete problem, so some approximations are needed. As these models suffered from various drawbacks, in [5], another model was proposed, which is characterized in Section 3 and which is the subject of our current investigation.

Recommender Systems for Bimodal Networks
Bimodal networks appear as a natural setting for a recommendation system, where objects of one modality are recommended to objects of the other modality. We have already mentioned the works [5,26] addressing modeling for recommendations in this setting; however, there are many more. Ahmedi et al. (see [24]) derived recommendations from a network associating tourists with points of interest, based on vertex and edge labeling combined with some ranking or centrality function. He et al. (see [27]) proposed to predict item popularity and to recommend items to users based on a specific version of the PageRank technique (eigenvectors of a special connectivity matrix); user preferences are expressed as weights. Shi et al. (see [28]) used a combination of content-based and collaborative filtering for recommendation, where the method of combining both recommenders was developed by learning the weights of the components from previous prediction accuracy. Cheng et al. (see [29]) exploited a matrix factorization model based on reviews and preferences. Ozsoy (see [30]) proposed to use the word2vec technique, originally developed for seeking words occurring in similar contexts. This approach replaces words with users and items, creating a kind of word2vec representation of the item-user graph; recommendations are based on the similarity of objects in this representation. Vasile et al. (see [31]) proposed, based on the same word2vec technology, recommendations of items (products) based on their context (other products), as well as on some textual information (content-based support). Kang and Yu (see [32]) developed a soft-constraint-based online LDA algorithm for community recommendation, which adapts a technique used for document processing to the collaborative filtering setting. A user is represented as a "document" that is a probability distribution over latent topics, and each topic is represented as a probability distribution over communities.
The number of users' posts within each community forms the foundation for the estimation of the latent topics, and an online LDA algorithm is applied for this purpose. Communities are recommended based on the conditional distribution of a community given the user "document". Liu et al. (see [33]) developed a recommendation method enriching online LDA with probabilistic matrix factorization. Other application cases are reviewed in [34].
The current paper proposes a framework that differs from the approaches just mentioned: they attempt to make recommendations based on the current state of the network, whereas in the approach presented in this paper, the history of the network is modeled. That is, the predictions relate to the evolution of the network, and not to a recommendation suggested to a particular user at a given snapshot.

CSUIM Bipartite Graph Generator
The bimodal graph generator presented in [5] is more flexible than the graph generators mentioned in Section 2.2, though it cannot generate a disconnected graph with desired properties. Its advantage is the capability of creating graphs with a broader range of clustering behavior via the so-called bouncing mechanism, which is an adaptation of the surfing mechanism in classical graphs (see [21]). The bouncing mechanism is applied only to the edges that were created by preferential attachment.
In the CSUIM, we consider a graph with the set of vertices W = U ∪ V, U ∩ V = ∅, where the set U is called "users" and the set V is called "items". We consider both the uniform attachment, where incoming nodes form links to existing nodes selected uniformly at random, and the preferential attachment, where probabilities are assigned proportionally to the degrees of the existing nodes (see [35]).
The generator has seven parameters:
1. m - the initial number of edges (the initial number of vertices is then 2m);
2. δ - the probability that a new vertex v added to the graph in iteration t is a user (v ∈ U), so 1 − δ is the probability that the new vertex v is an item (v ∈ V);
3. d_u - the number of edges added from a user-type vertex in one iteration (the number of items bought by a single new user);
4. d_v - the number of edges added from an item-type vertex in one iteration (the number of users that bought the same new item);
5. α - the probability of item preferential attachment (1 − α - the probability of item uniform attachment);
6. β - the probability of user preferential attachment (1 − β - the probability of user uniform attachment);
7. γ - the fraction of preferentially attached edges that are created using the bouncing mechanism.
The Cold Start User-Item Model (CSUIM) creates a node in the set of users with probability δ, and in the set of items with probability 1 − δ. The newly created node is connected with nodes of the opposite modality: a user-type node is connected with d_u items, and an item-type node with d_v user-type nodes. To find the nodes to which the newly added node will be connected, we use the two mechanisms briefly described in Section 2.1: "uniform attachment" (UA) and "preferential attachment" (PA). PA is chosen with probability α when a new user selects items, and with probability β when a new item selects users; otherwise, nodes are selected by UA. When PA is selected, a fraction γ of the edges is attached by the bouncing mechanism. More details of the bouncing mechanism are given after the description of the CSUIM algorithm.
The procedure for generating synthetic bipartite graphs is outlined in Algorithm 1.

Algorithm 1 Cold Start User-Item Model.
Step 1. Initialize the graph with m edges (we have 2m vertices).
Step 2. Add a new vertex to the graph of type user with probability δ, otherwise of type item.
Step 3. Choose the neighbors to join the new vertex to, according to the following rules: Step 3a. If the new node is an item, then add d_v edges from this node to user-type vertices, using the preferential attachment mechanism (with probability β) or uniform attachment (otherwise).
Step 3b. If the new node is a user, then add d_u edges from this node to item-type vertices, using the preferential attachment mechanism (with probability α) or uniform attachment (otherwise).
Step 3c. Consider the newly added vertex v_0 and the edges added from it by preferential attachment (the nodes u_i and v_i below are from different modalities). Select a fraction γ of those end nodes. For each selected node u_1, pick at random one of its neighbors, v_2; from the randomly selected node v_2, again select one of its neighbors, u_3, at random. Connect the new node v_0 to the node u_3 selected in this way, instead of to the original node u_1 obtained by preferential attachment.
Step 4. Repeat Steps 2 and 3 T times.
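The steps above can be sketched in compact Python (a purely illustrative, unoptimized implementation; the data-structure choices and the edge-endpoint sampling used for the preferential draw are our assumptions):

```python
import random

def csuim(m, delta, d_u, d_v, alpha, beta, gamma, T, seed=None):
    """Generate a bipartite graph following the CSUI model (sketch).
    Returns an adjacency dict; node ids are ('u', i) or ('v', i)."""
    rng = random.Random(seed)
    adj = {}
    # Step 1: initialize with m disjoint edges (2m vertices)
    for i in range(m):
        u, v = ('u', i), ('v', i)
        adj[u], adj[v] = {v}, {u}
    nu, nv = m, m
    for _ in range(T):
        # Step 2: the new vertex is a user with probability delta
        if rng.random() < delta:
            new, side, d, p_pa = ('u', nu), 'v', d_u, alpha
            nu += 1
        else:
            new, side, d, p_pa = ('v', nv), 'u', d_v, beta
            nv += 1
        candidates = [x for x in adj if x[0] == side]
        # edge endpoints of the target modality, for preferential sampling
        endpoints = [x for x in candidates for _ in adj[x]]
        adj[new] = set()
        targets = []
        for _ in range(d):
            if rng.random() < p_pa:              # preferential attachment
                t = rng.choice(endpoints)
                if rng.random() < gamma:         # Step 3c: bouncing
                    v2 = rng.choice(sorted(adj[t]))   # neighbor of t
                    t = rng.choice(sorted(adj[v2]))   # neighbor of v2
            else:                                # uniform attachment
                t = rng.choice(candidates)
            targets.append(t)
        for t in targets:                        # duplicates collapse in the set
            adj[new].add(t)
            adj[t].add(new)
    return adj
```

Note that the two-hop bounce always ends in the same modality as the original target, so the graph stays bipartite.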
Step 3c emulates the behavior called recommendation. One can imagine that a customer who is going to buy a product encounters another consumer who has already purchased it and who recommends another product instead. The first consumer changes his/her mind and follows this recommendation with probability γ. By varying this parameter, one can observe what happens when people are more or less amenable to recommendation.
Selecting products by uniform attachment simulates consumers who do not bother about which product to choose. Preferential attachment simulates consumers who look for products on their own (e.g., dresses not seen frequently on the street). Note that this model of graph growth simulates a very special kind of purchase behavior, namely, the behavior of only new consumers and new products. Despite its limited applicability, the model is very important because it concentrates on a very hard part of the recommendation process called the "cold start". Cold start refers to the issue that the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information. Recommender systems form a specific type of information filtering (IF) technique that attempts to present information items (e.g., movies, music, books, news, images, web pages) that are likely to be of interest to the user. Typically, a recommender system compares the user's profile to some reference characteristics. These characteristics may come from the information item (the content-based approach) or from the user's social environment (the collaborative filtering approach). More detailed specifics of this hard problem and some solutions are presented in [36,37].
It is easy to see that after t iterations with the bouncing mechanism disabled (γ = 0), we have |U(t)| = m + δt vertices of type user and |V(t)| = m + (1 − δ)t vertices of type item. The average number of edges attached in one iteration is

η = δ d_u + (1 − δ) d_v.

After a large number of iterations, we can skip the m initial edges in further calculations. Thus, we can show that the average numbers of vertices of type user and of type item depend only on the iteration t and on δ, and do not depend on m, d_v, or d_u, while the total number of edges depends only on d_v, d_u, and δ. This is not good news, because it means we cannot use these quantities to estimate all parameters of the generator, in particular β, α, and γ.
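These counts follow directly from the formulas above and can be sketched with simple arithmetic (a hypothetical helper for illustration only):

```python
def expected_counts(m, delta, d_u, d_v, t):
    """Expected node counts after t iterations (bouncing disabled),
    and the average number of edges attached per iteration."""
    users = m + delta * t            # |U(t)| = m + delta * t
    items = m + (1 - delta) * t      # |V(t)| = m + (1 - delta) * t
    eta = delta * d_u + (1 - delta) * d_v  # avg. edges per iteration
    return users, items, eta
```

For example, with m = 10, δ = 0.3, d_u = 2, d_v = 3, and t = 1000 iterations, one expects about 310 users, 710 items, and 2.7 new edges per iteration.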
In the next section, an approach and a method of parameter extraction based not on the current state but rather on the dynamics of the network is presented.

Motivation for Proposed Approach
One can ask: how does one estimate the generator parameters? Any approach to network parameter estimation should be based on observable quantities that can be turned into the model parameters.
In the model described above, we can essentially observe the nodes and their interconnection, as well as their statistics (like degree distributions and/or clustering coefficients) as a source of information for parameter estimation.
At least three types of approaches can be considered:
• Analytical;
• Machine learning; and
• Brute force.
An analytical approach would mean establishing a closed-form model for some of the observables, such as node degree distribution in both modalities and an attempt to solve it analytically for the parameters. As we will see in the next section, the differential equation for the node degree distribution is not simple to solve, and only approximate solutions are known in the literature for particular settings of the variables, even in the simple case of no recommendation (γ = 0).
A machine-learning approach was applied in [38], but there seems to be no simple relationship to be extracted via machine learning.
Finally, a brute-force approach would be to slice the space of parameters, generate a sample for each slice of the parameter space, compute the observables from the sample, and choose the parameter set for which the sample is closest to the real graph.
Ultimately, we follow the last path; however, we simplify the process by exploiting independences between some effects of the parameters, as well as some simplifying implications of the theoretical models.
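The brute-force path can be sketched as a plain grid search; `generate` and `observables` below are hypothetical placeholders for the model generator and the measured graph statistics, and the L2 distance is our illustrative choice:

```python
from itertools import product

def fit_by_grid(target_obs, generate, observables, grids):
    """Pick the parameter combination whose generated sample has
    observables closest (squared L2 distance) to the real graph's.
    grids: dict mapping parameter name -> list of candidate values."""
    names = sorted(grids)
    best, best_dist = None, float('inf')
    for combo in product(*(grids[n] for n in names)):
        params = dict(zip(names, combo))
        obs = observables(generate(**params))
        dist = sum((a - b) ** 2 for a, b in zip(obs, target_obs))
        if dist < best_dist:
            best, best_dist = params, dist
    return best
```

With a deterministic toy generator that simply echoes its parameters, the search recovers the target values exactly.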

Theoretical Node Degree Distribution Models
As indicated in the previous section, in our approach, we measure some characteristics of a network to estimate the parameters of the model. One of the most important properties of a network is the node degree distribution for each modality. We consider the probability that a node has degree k at some moment in time t and denote it as p k (t). Variable t can be interpreted as a number of iterations made while generating a graph from the model.
Let us concentrate on CSUIM (see Algorithm 1 in Section 3) when there is no bouncing mechanism, i.e., when γ = 0. Let ζ_g represent the rate at which new nodes are introduced in modality g (items or users); that is, ζ_g ∆t nodes of modality g are added in a time interval of duration ∆t. In the current model, ζ_users = δ and ζ_items = 1 − δ (on average, one node is added in a single time interval). Let N_{k,g}(t) denote the expected number of nodes of modality g whose degree is k at time t, and let ḡ denote the modality opposite to g. Let us consider multiple attachments: each new node of modality g that is introduced chooses θ_g existing nodes (θ_users = d_u, θ_items = d_v) of the opposite modality ḡ. Let θ_{n,g} denote the number of nodes attached to a new node of modality g using non-preferential (uniform) attachment, and θ_{p,g} the number of nodes attached using preferential attachment, so that θ_g = θ_{n,g} + θ_{p,g}. Following the argument from [35], one can see that the node degree distribution over time is governed by the equation

Ṅ_{k,g}(t) = ζ_ḡ θ_{n,ḡ} (N_{k−1,g}(t) − N_{k,g}(t)) / ∑_j N_{j,g}(t) + ζ_ḡ θ_{p,ḡ} ((k − 1) N_{k−1,g}(t) − k N_{k,g}(t)) / ∑_j j N_{j,g}(t) + ζ_g δ_{k,θ_g},

with ∑_k k N_{k,g}(t) = (ζ_g θ_g + ζ_ḡ θ_ḡ) t.
In [35], the solutions for the extreme cases θ_g = θ_{n,g} (pure uniform attachment) and θ_g = θ_{p,g} (pure preferential attachment) were found. It turns out that for t tending to infinity, the node degree distributions are governed approximately by an exponential distribution and a power-law distribution, respectively.
The change of N_{k,g}(t) for pure uniform attachment is given by

Ṅ_{k,g}(t) = ζ_ḡ θ_ḡ (N_{k−1,g}(t) − N_{k,g}(t)) / ∑_j N_{j,g}(t) + ζ_g δ_{k,θ_g},

and the approximate solution tends to an exponential degree distribution as time goes to infinity. In the case of preferential attachment, each newly attached node adds one to N_{θ_g,g} at that instant. Then, N_{k,g}(t) evolves according to the equation

Ṅ_{k,g}(t) = ζ_ḡ θ_ḡ ((k − 1) N_{k−1,g}(t) − k N_{k,g}(t)) / ∑_j j N_{j,g}(t) + ζ_g δ_{k,θ_g}.

However, no mixed case was considered in [35]. The mixed case was treated in [5], though only for large k. It turns out that the distribution in the mixed case is approximately a power law, though with a complex exponent. The formula in [5] is derived by relaxing the degree to a real positive number and defining a probability density function over degrees. In our notation, Φ{k_g(t) < k} is the probability that a vertex of modality g has degree k_g less than the threshold value k, and η = d_u δ + (1 − δ) d_v is the average number of edges attached in one iteration. In our paper [38], we tried to extract the α and β coefficients from the resulting distribution formula (9), but it did not match the experimental distribution well.
Therefore, we sought an alternative, which is presented in the sections below.

Our Approach to Parameter Estimation
The model from the previous section, though difficult enough for an analytical solution, still constitutes a substantial simplification: γ is set to zero, time tends to infinity, and k is assumed to be large.
So, first of all, why should we assume that γ = 0? If there is no bouncing, then one can easily see that the node degree distributions of the two modalities are independent of one another, so they can be considered separately. Also, as α and β apparently influence one or the other modality's degree distribution, we can expect that both can be estimated separately. This reduces the search space drastically, but what will happen if γ > 0? In this case, "under a stable distribution", two nodes of, say, user type will pick up item nodes from approximately the same distribution. So if bouncing occurs, it is equally likely that a node of degree k will increase its degree instead of a node of degree l as the other way around. Thus, we can expect that under "modest" values of γ, the marginal degree distributions of both modalities remain unchanged, and a model with γ = 0 is justifiable for them.
However, we can easily guess that γ will impact the clustering measures, so after estimating α and β, we can estimate γ separately.

A Linear Relationship to Obtain α and β
We would like to demonstrate how the probability p_k of a node having degree k changes in CSUIM. With respect to the definition of probability from Equation (9), Figure 1a,b depicts the dependency between ln(p_k) and ln k for fixed values of α or β, depending on the modality. It turns out that for small k (consuming most of the probability mass) and fixed α (β), the value of ln(p_k) decreases nearly linearly with ln k. We can also see that when we add more edges to a node (ten times more), the linear character of the relation between ln(p_k) and ln k is preserved. This gives us an insight into setting up the values of d_u and d_v. [Figure 1: ln(P(k)) versus ln k for β ∈ {0.001, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99}.] The same dependency occurs when we consider simulations with the CSUIM (see Figure 2a,b). We can see that the linear relation between ln P(k) and ln k almost does not change when we fix β and change the value of α from 0 to 0.99. Note that Equation (11) does not contain α. This experiment shows that not only in theory, but also in practice, computing β does not depend on the value of α. Therefore, we looked at the relationship between α (analogously for β) and the slope of the straight line approximating the relationship between ln(p_k) and ln k, and drew it for various values of α (β). We see that for a wide range of values of α (β), this relationship is linear, both for the theoretical and the simulation models.
This insight led us to the algorithms for the identification of α and β described below. How can it be explained that α and β depend linearly on the degree distribution exponent? As already mentioned, when α or β (for the respective modality) is set to 1, we are dealing with pure preferential attachment for that modality, and the degree distribution follows a power law, whereas when it is set to 0, an exponential distribution is followed. For values in-between, we are dealing with a kind of mixture of both (which does not seem to be a simple one).
If we take the formula ln P(k)/ln k and draw it for various values of k as a function of β, we will see that over a large range of values there is a nearly linear relationship. This result is shown in Figure 3 [curves for k = 3, 4, ..., 10]. Therefore, we exploited it for the estimation of β. In the CSUI model, when β grows, the probability of connecting the new link by preferential attachment grows as well. Thus, we can approximate the distribution of vertex degrees by a power-law distribution fitted to the experimental degree distribution and compute the exponent of this distribution. We have

P(k) ≈ C k^(−a).

After applying the ln function to both sides, we get:

ln P(k) = ln C − a ln k.

We see in Figure 4 that, theoretically, for different combinations of d_u and d_v, the parameter β has a linear relationship with the exponent a (the coefficient in Equation (11)) of the power-law distribution of vertex degrees. This observation provides us with an algorithm for estimating the β parameter. Analogously, we can estimate the α parameter from the exponent of the degree distribution of vertices in the item modality. Moreover, when β grows to 1 (pure preferential attachment), we get the desired power-law distribution of node degrees, P(k) ∝ k^(−3). Preferential attachment yields a power-law distribution of node degrees with a "heavy tail", and uniform attachment an exponential distribution with a "light tail". As demonstrated in [39], an empirical mixture of these two distributions can be approximated with a power-law distribution. Therefore, linear regression analysis has sometimes been used to evaluate the fit of the power-law distribution to data and to estimate the value of the exponent. This way, one can also obtain the mixture parameter, α. The rationale behind this approach is that the heavy-tailed distribution dominates the exponential distribution for nodes of higher degree.
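The regression-based exponent estimation just described can be sketched as follows (an illustrative implementation; the function name, the minimum-degree cutoff, and the plain least-squares fit are our choices):

```python
import math

def powerlaw_exponent(degrees, k_min=1):
    """Estimate the exponent a in P(k) ~ k^(-a) by ordinary least
    squares on the log-log empirical degree distribution (the simple,
    biased estimator discussed in the text)."""
    counts = {}
    for k in degrees:
        if k >= k_min:
            counts[k] = counts.get(k, 0) + 1
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]              # ln k
    ys = [math.log(c / total) for c in counts.values()]  # ln P(k)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return -slope  # a is minus the slope of ln P(k) vs. ln k
```

On a sample whose degree frequencies are exactly proportional to k^(−2), the fit recovers the exponent 2.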
This technique produces biased estimates (see [39]). As we see in the experiments, it is unreliable for low values of α (β), below 0.1; see Figure 5a (ln P(k) vs. ln k) and Figure 5b (ln P(k) vs. k).
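The regression-based exponent estimation described above can be sketched as follows. This is an illustrative helper, not the paper's code; the function name is ours, and the empirical distribution is built only from degrees that actually occur:

```python
import numpy as np

def powerlaw_exponent(degrees):
    """Estimate a in P(k) ~ k^(-a) by linear regression of ln P(k) on ln k."""
    ks, counts = np.unique(degrees, return_counts=True)  # only existing degrees
    log_k = np.log(ks)
    log_pk = np.log(counts / counts.sum())               # empirical ln P(k)
    slope, _ = np.polyfit(log_k, log_pk, 1)              # ln P(k) = slope*ln k + c
    return -slope                                        # exponent a = -slope

# Synthetic check: degree counts proportional to k^(-3)
ks = np.arange(1, 40)
counts = np.round(1e6 * ks.astype(float) ** -3).astype(int)
degrees = np.repeat(ks, counts)
print(powerlaw_exponent(degrees))  # close to 3
```

As noted above, for low α (β) the mixture deviates from a pure power law and such an estimate becomes unreliable.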

A Linear Relationship for γ
The bouncing parameter of the graph model may be used to model the behavior of users vulnerable to recommendations. We found that this parameter is linearly correlated with a graph metric called "optimal modularity" (see [9]).
Modularity is a measure of the quality of a clustering of nodes in a graph (we describe it briefly below). Optimal modularity is the modularity of the clustering of nodes for which the modularity is the highest among all node clusterings of a given graph. It is known that finding the optimal modularity is an NP-hard task; therefore, various greedy algorithms without an approximation guarantee exist. So, in fact, this term should be called "the optimal modularity for algorithm X", and here we mean the optimal modularity computed by the algorithm described in [9].
Modularity is the fraction of the edges that fall within the given groups (clusters) minus the expected such fraction if edges were distributed at random. The value of the modularity lies in the range [-1/2, 1]. It is positive if the number of edges within groups exceeds the number expected on the basis of chance. Examples of graph clusterings with positive and negative modularity values are shown in Figure 6a,b, respectively. The upper boundary (modularity = 1) is approached if one has a multitude of complete graphs. For a given division of the network's vertices into some clusters (called groups, communities, or modules), modularity reflects the concentration of nodes within modules compared to a random distribution of links between all nodes regardless of modules. There are many ways to express the modularity. In our approach, we compute Newman's modularity (see [9]) as follows:

Q = (1/(2m)) Σ_ij [A_ij - k_i k_j/(2m)] δ_K(c_i, c_j),

where A_ij represents the adjacency matrix (A_ij = 1 when there is an edge between nodes i and j and 0 otherwise), k_i = Σ_j A_ij is the sum of the weights of the edges attached to the vertex i, c_i is the community to which the vertex i is assigned, δ_K is the Kronecker delta (δ_K(u, v) = 1 iff u = v and 0 otherwise), and m = (1/2) Σ_ij A_ij. The above formula can also be expressed as the quotient of the number of edges inside communities and the total number of edges, minus the sum of squares of the shares of edges that have at least one end in each community.
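As a concrete illustration of Newman's modularity described above, a minimal dense-matrix implementation might look like this (the function name and matrix representation are our own choices):

```python
import numpy as np

def modularity(A, communities):
    """Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                    # (weighted) node degrees, row sums
    two_m = A.sum()                      # 2m: each undirected edge counted twice
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]      # Kronecker delta on community labels
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Two disjoint triangles, each forming its own community
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))  # 0.5
```

Putting all six nodes into one community gives Q = 0, matching the intuition that the clustering then carries no information beyond chance.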
In our computations of the optimal modularity, communities are obtained based on Newman's modularity concept. The algorithm runs as follows: initially, each node constitutes its own community; then, nodes are moved between neighboring communities until a stopping criterion is reached. The obtained communities receive distinct identifiers called modularity classes. A node is moved to the community of one of its neighbors if this increases the modularity of the entire network. At each step, the node giving the maximum modularity gain is selected. The process terminates when no gain in modularity can be achieved.
Part of the Newman's algorithm efficiency (see [9]) results from the fact that the gain in modularity ∆Q obtained by moving an isolated node i into a community C can easily be computed by:

∆Q = [(Σ_in + k_i,in)/(2m) - ((Σ_tot + k_i)/(2m))²] - [Σ_in/(2m) - (Σ_tot/(2m))² - (k_i/(2m))²],

where Σ_in is the sum of the weights of the links inside C, Σ_tot is the sum of the weights of the links incident to nodes in C, k_i is the sum of the weights of the links incident to node i, k_i,in is the sum of the weights of the links from i to nodes in C, and m is the sum of the weights of all the links in the network. A similar expression is used to evaluate the change of modularity when i is removed from its community. Therefore, in practice, one evaluates the change of modularity by first removing i from its community and then moving it into a neighboring community.
To sum up, Newman's optimal modularity tells us a very important thing: how much our graph differs from a random one. In a fully random graph, edges are attached to nodes at random from some distributions. The bouncing parameter γ of the CSUI model introduces a kind of dependence in node linking, since both ends of an edge are selected. The value of γ represents the fraction of edges attached in a preferential way that were created using the bouncing mechanism. The greater the value of γ, the stronger the dependence in creating links in the graph; and the stronger such a dependence is, the greater the value of the modularity.
Let us return to the step in the CSUI model where a new node is added and the bouncing mechanism is active. Let us consider bouncing from the newly created user vertex u (see Figure 7). First, the bouncing algorithm selects an item vertex i. From this vertex, we can go further to the user modality through edges added in previous steps, either by an edge added in one of the previous iterations when adding a user node or when adding an item node. From the fact that we deal with a power-law distribution of vertex degrees, we know that most of the distribution mass is on vertices with the smallest degrees. Thus, it is most probable that we go through the edge added when adding a user node u_2 and from this node to an item node i_k, which is the end node of the edge e. Thus, we have created a new edge, (u, i_k). So the probability of creating the edge (u, i_k) is:

P((u, i_k)) = P(i|u) · P(u_2|i) · P(i_k|u_2). (14)

Figure 7 shows an example of creating a new edge (red dashed line) from the new node u using the bouncing mechanism; the directed arrows indicate the consecutive steps of the bouncing mechanism in an undirected bipartite graph.
Equation (14) can be written in this form because, if we add a new vertex u to the graph, the outgoing edges from this node are independent of each other. Because the node i has a low degree, most of the outgoing links from i_k are independent, and analogously, most of the user nodes are of low degree, so their outgoing links are independent. In general, after a "sufficient" time during the further evolution of the network, we get P(i_k|u_2) = P(u_2|i_k), so we have:

P((u, i_k)) = P(i|u) · P(u_2|i) · P(u_2|i_k). (15)

Thus, the bouncing mechanism does not change the distribution on most degrees (the small ones) and can be considered separately from the α and β parameters.
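The bouncing walk itself (an item, then a random user neighbor, then one of that user's items) can be sketched in a few lines. The adjacency-dict representation and the function name are our own illustration, not the paper's implementation:

```python
import random

def bounce(adj, i):
    """One bounce from item i: i -> random user u2 -> random item i_k.

    adj maps each vertex to the list of its neighbors in the bipartite
    graph; the new edge (u, i_k) is then attached to the returned item.
    """
    u2 = random.choice(adj[i])    # step to the user modality
    i_k = random.choice(adj[u2])  # step back to the item modality
    return i_k

# Tiny example: users u1, u2 and items i1, i2
adj = {"u1": ["i1"], "u2": ["i1", "i2"],
       "i1": ["u1", "u2"], "i2": ["u2"]}
print(bounce(adj, "i1"))  # one of "i1", "i2"
```

Because both steps of the walk follow existing edges, high-degree users are reached more often, which is what couples bouncing to preferential attachment.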
On the other hand, modularity measures the change of the distribution of edge placement in a graph compared to random placement. The edges mentioned before are placed almost randomly, and they have no influence on the modularity value. However, there are other combinations of placement in bouncing: some edges are added when we add edges both from u_2 and from u. In this case, the edges are not independent, because adding an edge from u_2 to i increases the probability of adding an edge from u to i. Thus, the independence of the distribution is distorted, which implies a change of modularity. Therefore, we conclude that there may be a way to identify the bouncing parameter from the modularity, and we determine this relationship empirically. In Figure 8a, we see that this relationship seems to be linear even for small values of α and β. Unfortunately, when we add more edges in one step, the linear relation gets weaker; see Figure 8b.

Parameter Estimation
Here, we estimate the parameters of the model based on a few observable network properties. The estimations are based on the theoretical relationships, presented in the previous sections, between model parameters and metrics of the generated network. In this section, we propose algorithms for the computation of all CSUI model parameters. First, we describe the retrieval of parameters δ, m, d_u, and d_v. Then, we propose two algorithms. The first estimates the α and β parameters using the distribution of node degrees in each modality and linear regression. The second uses the modularity measure and linear regression for the computation of the γ parameter.

Parameter δ
Theoretical equations from the previous section, after some modification, are useful for estimating the parameters of a bipartite graph generator. The simplest one is δ, which is the probability that a new vertex v added to the graph in iteration t is a user, v ∈ U, so 1 - δ is the probability that the new vertex v is an item, v ∈ V. It can be estimated directly as

δ = |U| / (|U| + |V|), (16)

where |U| is the cardinality of the set of user nodes and |V| that of the set of item nodes.

Parameters d u , d v and m
There are two approaches to obtaining d u and d v . The first and simplest one is to set d u as the minimal degree in the user set, and to analogously set d v as the minimal degree in the item set.
The second way is more complicated. The average number of edges attached in one iteration, η, is easy to estimate from the graph as η = |E| / |U ∪ V|, where |E| is the total number of edges in the graph; δ is computed as in the previous subsection. Most of the vertex degree distribution mass is on the lower degrees k, so we can perform the integer minimization of d_u + d_v under the additional restriction d_u δ + (1 - δ) d_v - η = 0; that is, we fix some d_u and compute the value d_v from

d_v = (η - δ d_u) / (1 - δ).

Another way is the brute force approach, which is based on the vertex degree distribution in each modality. The parameter m is the number of initial edges; it must be at least max(d_u, d_v). A better way is to set it to d_u + d_v, because this speeds up the few initial steps when there are few nodes in the graph.
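Under our reading of the constrained minimization above, a small search for (d_u, d_v) could look like this; the search bound and the tie-breaking rule (smallest d_u + d_v among pairs closest to η) are our assumptions:

```python
def estimate_du_dv(n_users, n_items, n_edges, d_max=20):
    """Pick integer (d_u, d_v) so that d_u*delta + (1-delta)*d_v is closest
    to eta = |E| / |U u V|, preferring the smallest d_u + d_v."""
    n = n_users + n_items
    delta = n_users / n               # probability that a new node is a user
    eta = n_edges / n                 # average edges attached per iteration
    pairs = [(du, dv) for du in range(1, d_max + 1)
                      for dv in range(1, d_max + 1)]
    return min(pairs,
               key=lambda p: (abs(p[0] * delta + (1 - delta) * p[1] - eta),
                              p[0] + p[1]))

# 100 users, 100 items, 300 edges: delta = 0.5, eta = 1.5
print(estimate_du_dv(100, 100, 300))  # (1, 2): 0.5*1 + 0.5*2 = 1.5
```

For real graphs η is rarely matched exactly by integers, which is why the closest pair is taken rather than an exact solution.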

Calculations of α and β
In Section 7, we showed theoretical linear relationships for obtaining α and β. Therefore, we can compute α and β from the linear models

α = a_1 · exp_item + a_0,

where a_1, a_0 are constants calculated from the linear regression model and exp_item is the exponent of the power-law distribution of node degree in the item modality. Analogously, for the β parameter, we obtain

β = b_1 · exp_user + b_0,

where b_1, b_0 are constants calculated from the linear regression model and exp_user is the exponent of the power-law distribution of node degree in the user modality. Technical details of computing the exponent of the power-law distribution of node degree are given in Sections 9.4 and 9.5.

Calculations of log pk and log k
In this section, we show how to compute an empirical power-law distribution. For each modality, we have a two-dimensional array deg_k[max_k][2], where max_k is the maximal degree in the considered modality. Based on this array, we compute two arrays: log_pk, which contains the logarithm of the probability that a vertex has degree k, and log_k, with the logarithm of the vertex degree k. The pseudocode is shown in Algorithm 3. It is important for Algorithm 4 from Section 9.5 that these arrays contain only node degrees that actually occur in the graph.

Calculations of Power-Law Exponent
For each modality in the graph, we compute the exponent of the power-law distribution in the following manner. We have the two arrays computed in Section 9.4: log_pk, which contains the logarithm of the probability that a vertex has degree k, and log_k, with the logarithm of the vertex degree k. From those arrays, we compute the power-law exponent exp of the node degree distribution in Algorithm 4.

Algorithm 2 Computation of α and β.
Input: Bipartite graph G = U ∪ V, where U ∩ V = ∅. We call U the user set and V the item set.
Step 1. Compute exponent exp user of degree distribution of user node set U, and analogously, exp item of item node set V.
Step 2. Compute δ from Equation (16), and d_u and d_v as described in Section 9.2.
Step 3. Create a grid of pairs (α_i, β_j) with values in [0, 1].
Step 4. For each pair (α_i, β_j), generate a bipartite graph with these parameters, setting δ, d_u, and d_v as computed in Sections 9.1 and 9.2 and setting γ to zero. From the generated graph, compute the exponent exp_user_ij of the degree distribution of the user set and, analogously, exp_item_ij of the item set, as shown in Section 9.5.
Step 5. For the data set D α consisting of pairs (α i , exp item ij ), perform linear regression creating model α with the response vector α and one predictor variable, exp item .
Step 6. For the data set D β consisting of pairs (β j , exp user ij ), perform linear regression creating model β with response vector β and one predictor variable exp user .
Step 7. Predict α value from model α based on exp item obtained from graph G.
Step 8. Predict β value from model β based on exp user obtained from graph G.
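Steps 5-8 reduce to ordinary least squares on the simulated (parameter, exponent) pairs. A sketch with placeholder numbers follows; in the real algorithm, the pairs come from graphs generated in Step 4, and the observed exponent is measured on the input graph G:

```python
import numpy as np

# Placeholder simulation output: (beta_j, exp_user_ij) pairs (Step 4 results)
betas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
exps = np.array([1.75, 2.05, 2.35, 2.65, 2.95])  # assumed linear trend

# Step 6: linear regression beta = b1 * exp_user + b0
b1, b0 = np.polyfit(exps, betas, 1)

# Step 8: predict beta for the exponent measured on the input graph G
exp_user_observed = 2.50                          # hypothetical measurement
beta_hat = b1 * exp_user_observed + b0
print(round(beta_hat, 3))  # 0.6
```

Step 5/Step 7 for α are identical with exp_item in place of exp_user.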

Algorithm 3
Computation of arrays log pk and log k .
Step 1. Count the vertices of degrees 1, ..., max_k that exist in the graph and store them in the array deg_k[i][2], where deg_k[i][1] holds the value of k and deg_k[i][2] contains the number of vertices with degree k.
Step 2. Get the count of vertices of the considered modality as mod count .
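A direct transcription of Algorithm 3 in Python (using a dict in place of the deg_k array; the function name is ours) could be:

```python
import math
from collections import Counter

def log_arrays(degrees):
    """Return (log_k, log_pk) over the node degrees that actually occur."""
    counts = Counter(degrees)        # Step 1: count vertices of each degree
    mod_count = len(degrees)         # Step 2: vertex count of the modality
    ks = sorted(counts)              # only existing degrees, in order
    log_k = [math.log(k) for k in ks]
    log_pk = [math.log(counts[k] / mod_count) for k in ks]
    return log_k, log_pk

# Degrees 1, 1, 1, 2, 4 give three distinct degrees: 1, 2, 4
log_k, log_pk = log_arrays([1, 1, 1, 2, 4])
print(log_k)  # [0.0, ln 2, ln 4]
```

Skipping degrees with zero count matters, since log of zero probability is undefined and would break the regression in Algorithm 4.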

Calculations of γ Parameter
As we show in Section 10.1, this relation is well approximated by a linear model to some extent. If α, β ∈ [0.1, 0.9] and d_u, d_v ≤ 5, then the bouncing parameter can be predicted quite well from the simple linear model

γ = pb_1 · modularity + pb_0,

where pb_1, pb_0 are constants calculated from the linear regression model. On this basis, we constructed Algorithm 5.

Algorithm 4
Computation of power-law exponent exp.
Step 1. Fit a linear model to the data: log_k = l_1 · log_pk + l_0.
Step 2. The returned model has two coefficients: the intercept l_0 and the attribute coefficient l_1.
Step 3. Get the coefficient of the attribute log_pk and save it in the variable exp.

Algorithm 5 Computation of γ.
Input: Bipartite graph G = U ∪ V, where U ∩ V = ∅, U-user set and V-item set.
Step 1. Compute the modularity of G and the remaining parameters (δ, d_u, d_v, α, β) as described in the previous sections.
Step 2. Create a grid of γ_i values.
Step 3. For each γ_i, generate a graph from the model and compute its modularity.
Step 4. Make a dataset D_γ containing the γ_i values and the corresponding modularity values.
Step 5. Fit a linear regression model model_γ with the response vector γ and one predictor variable, modularity.
Step 6. Predict the γ value from model γ based on modularity from graph G.

Experimental Results
Here, we present the experimental results on parameter recovery and model quality. We performed several simulations to validate the theoretical relations involving parameters α and β described in Section 7 and parameter γ described in Section 8. Those simulations are presented in Section 10.1. After verifying the theoretical properties, we performed parameter estimation experiments, testing how well the parameters of the CSUI model can be recovered from several real networks. The networks generated from the CSUI model and the real networks were compared based on a number of metrics described in Section 10.2.

Validity of Parameter Recovery Models
In this section, we present several simulations generating networks from the CSUI model to verify the theoretical properties. Experiments with α and β were made based on Algorithm 2. Experimental results in Figure 9a,b show that the α and β parameters do not depend on each other. Model m1 contains two variables: β and exp_item in Figure 9a, α and exp_user in Figure 9b. Model m2 contains only one variable: exp_item in Figure 9a and exp_user in Figure 9b. On top of each plot, the p-value of the ANOVA test of the difference between models m1 and m2 is given. The adjusted R-squared values for models m1 and m2 in Figure 9a are 0.94, and the p-value of the ANOVA test is 0.82. The adjusted R-squared values for models m1 and m2 in Figure 9b are around 0.86, and the p-value of the ANOVA test is 0.63. The p-values of the ANOVA tests are greater than 0.05, so at this significance level there is no statistically significant difference. Thus, the α parameter does not depend on the β parameter in Figure 9a, and the β parameter does not depend on the α parameter in Figure 9b. Moreover, with more iterations (see Figure 10a,b), this independence gets stronger, as indicated by a greater p-value of the ANOVA test. The 2D plots of data obtained from the experiment are given in Figure 11a,b for 5000 and 50,000 iterations, respectively. We can see that with more iterations, the spread of points for different values of the α parameter at the same value of β gets smaller, which gives a better prediction of the parameter β. Thus, we can predict the two parameters separately. Simulations with the γ parameter were made based on Algorithm 5. Plots of data obtained from the experiment for 10,000 iterations and different values of α and β are given in Figure 12. We can see an almost ideal fit (adjusted R-squared above 0.98).

Retrieval of Parameters
In our experiments, we used several topical fora from the StackExchange data dump from December 2011. This database is available online and licensed under the Creative Commons BY-SA 3.0 license. Stack Exchange is a fast-growing network of question-and-answer sites on diverse topics, from software programming to cooking, photography, and gaming. We analyzed databases from the following forums: bicycles, databaseadministrators, drupalanswers, itsecurity, physics, texlatex, theoreticalcomputerscience, unixlinux, webapplications, webmasters, and wordpress. From these data, a bipartite graph for each dataset was created. In one modality there were users; in the other, topics. An edge was created when a user participated in a topic by writing a post in it. The edge between a user and a topic was created only once. We interpreted the network structure as an undirected graph with no edge weights.
Due to the limitations of the CSUI model, we took into consideration only the giant component (GC), i.e., the biggest connected component of the graph. In real-world graphs, the GC often contains 70% or more of the whole graph and drives the growth of the network. From the created bipartite graphs, we calculated several graph and model properties and compared them to those of an artificial graph generated from the CSUI model. The metrics used in the experiments are:
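Extracting the giant component is a plain breadth-first search over connected components; a self-contained sketch (the adjacency-dict representation and function name are our own choices):

```python
from collections import deque

def giant_component(adj):
    """Return the node set of the largest connected component of an
    undirected graph given as {node: [neighbors]}."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                      # BFS over one component
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Bipartite toy graph: a 4-node component and an isolated user-item pair
adj = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"],
       "u3": ["i3"], "i3": ["u3"]}
print(sorted(giant_component(adj)))  # ['i1', 'i2', 'u1', 'u2']
```

All metrics below are then computed on this node subset and the edges induced by it.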

1. Total Nodes - the total number of nodes in GC.
2. Total Edges - the total number of edges in GC.
3. Average Degree - the average node degree in GC.
4. Diameter - the maximal distance between all pairs of nodes in GC.
5. Radius - the radius of GC. The radius r of a graph is the minimum eccentricity of any vertex, r = min_v ε(v). The eccentricity ε(v) of a vertex v is the greatest geodesic distance between v and any other vertex.
6. Average Path Length - the average number of steps along the shortest paths for all possible pairs of network nodes. It is a measure of the efficiency of information or mass transport on a network.
7. Number Of Shortest Paths - the number of shortest paths in GC.
8. Communities Number - the number of communities from Newman's modularity algorithm in GC. More details in Section 8.
9. Density - measures how close the network is to a complete graph. A complete graph has all possible edges and density equal to 1.
10. Modularity - Newman's modularity described in Section 8.
11. Avg Item Clustering - the average value of BLCC for the item modality based on Equation (2).
12. Avg User Clustering - the average value of BLCC for the user modality based on Equation (2).
13. UsersCount - the number of nodes in the user modality.
14. ItemsCount - the number of nodes in the item modality.
15. User Average Degree - the average user node degree.
16. Item Average Degree - the average item node degree.
17. gen alpha - the value of parameter α from the CSUI model. Computation is based on Algorithm 2 from Section 9.3. In column "Graph", it is computed based on the real graph; in column "Model", it is computed based on a network generated from the CSUI model. This value lies in the interval [0, 1]. We give the exact value from the linear model for demonstration purposes.
18. gen beta - the value of parameter β from the CSUI model. Computation is based on Algorithm 2 from Section 9.3. Interpretation as for the gen alpha metric.
19. gen p add user - the value of parameter δ from the CSUI model. Computation is based on Section 9.1.
20. gen p bouncing - the value of parameter γ from the CSUI model. Computation is based on Algorithm 5 from Section 9.6.
21. ExpUserCoeff - the exponent of the power-law distribution of node degree in the user modality. Computation based on Section 9.5.
22. ExpItemCoeff - the exponent of the power-law distribution of node degree in the item modality. Computation based on Section 9.5.
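The radius and diameter metrics above can be computed by repeated BFS on the (unweighted) giant component; a small sketch, with names of our own choosing:

```python
from collections import deque

def eccentricity(adj, v):
    """Greatest BFS distance from v to any vertex reachable from it."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def radius_and_diameter(adj):
    """Radius = min eccentricity, diameter = max (connected graph assumed)."""
    eccs = [eccentricity(adj, v) for v in adj]
    return min(eccs), max(eccs)

# Path a - b - c: eccentricities are (2, 1, 2)
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(radius_and_diameter(adj))  # (1, 2)
```

Restricting the computation to the giant component is what makes every eccentricity finite.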
We extracted the graph parameters as shown in Section 9. It turned out (see Tables 1-5) that the most crucial parameters were d_u and d_v. The values of these two parameters determine how similar the graph generated by the model will be to the real one. We used two methods for finding optimal values of d_u and d_v: discrete optimization and the brute force approach described in Section 9.2. The brute force approach gave the best results in half of the cases.

Conclusions
The Cold Start User-Item Model (CSUIM) of bipartite graphs is very flexible and can be applied to many different problems. In this article, we showed that the parameters of the CSUI model can be obtained easily from an unknown bipartite graph. We presented several algorithms to estimate the most important parameters: δ, α, β, and γ. We also gave some advice on setting the remaining parameters: m, d_u, and d_v. The experimental results showed that the CSUI model can be applied, to some extent, to modeling bipartite graphs of users and user posts.
Moreover, we gave a theoretical basis for estimating parameters α and β based on the degree distribution in each of the modalities. We showed that for small k (which carries most of the probability mass) and fixed α (or β), the value of ln(p_k) decreases nearly linearly with ln k. The experiments presented in this paper showed that computing α does not depend on the β value, and vice versa, not only in theory but also in practice. The sampling for the linear regression models can easily be parallelized for more efficient computations. We also found that the bouncing parameter γ is linearly correlated with Newman's optimal modularity. Experiments made on real-world graphs showed that, from these theoretical relationships, the CSUI model parameters α, β, and γ can be extracted quite well.
An in-depth analysis of the CSUI model provides an essential guide for future research concerning the creation of disconnected graphs. In general, this is a hard problem, and to simplify it, we modeled only the giant component of the analyzed graph. Although the CSUI model can produce disconnected graphs through the first initialization step, it can only merge disconnected components and does not produce (split off) new components, as happens in real-world networks.

Conflicts of Interest:
The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations and symbols are used in this manuscript:

CSUIM - Cold Start User-Item Model
PDF - probability density function
PA - preferential attachment
UA - uniform attachment
m - the initial number of edges; the initial number of vertices is 2m
δ - the probability that a new vertex v added to the graph in iteration t is a user, v ∈ U, so 1 - δ is the probability that v is an item, v ∈ V
d_u - the number of edges added from a vertex of user type in one iteration (the number of items bought by a single new user)
d_v - the number of edges added from a vertex of item type in one iteration (the number of users who bought the same new item)
α - the probability of item preferential attachment; 1 - α is the probability of item uniform attachment
β - the probability of user preferential attachment; 1 - β is the probability of user uniform attachment
γ - the fraction of edges attached in a preferential way that were created using the bouncing mechanism
η - η = d_u δ + (1 - δ) d_v, the average number of edges attached in one iteration