Adaptation of Random Binomial Graphs for Testing Network Flow Problems Algorithms

Abstract: Algorithms for network flow problems, such as maximum flow, minimum cost flow, and multi-commodity flow problems, are continuously developed and improved, so random network generators become indispensable for simulating the functionality and for testing the correctness and the execution speed of these algorithms. For this purpose, in this paper, the well-known Erdős–Rényi model is adapted to generate random flow (transportation) networks. The developed algorithm is fast and is based on the natural property that a flow can be decomposed into directed elementary s-t paths and cycles. The proposed algorithm can therefore be used to quickly build a vast number of networks, as well as large-scale networks, especially designed for s-t flows.


Introduction
There are many applications of the network flow problems, e.g., electrical, water, or gas supply networks, vehicle routing and transportation, wireless networks, data mining, airline scheduling, project selection, image segmentation, network reliability, multi-camera scene reconstruction, security of statistical data, gene function prediction, open-pit mining, distributed computing, network connectivity, network intrusion detection, finance models, baseball elimination, etc. [1,2]. Algorithms for network flow problems are continuously developed and improved. Consequently, it is very important to have a tool for creating networks with which to test the correctness of new algorithms and to compare their execution time with that of existing ones.
In the literature, several methods for building random graphs have been proposed. Erdős and Rényi introduced random binomial graphs in [3]. These random graphs are generated based on the values of two parameters: n (the number of nodes) and p ∈ [0,1] (the probability of introducing any edge in the graph). These kinds of random networks have been applied to Zagreb indices, the general sum-connectivity index, the general inverse sum indeg index, and the general first geometric-arithmetic index [4]. In a network generated in this manner, the source may communicate poorly with the sink or not communicate with it at all. An algorithm for generating simple random graphs with a given degree sequence was developed in [5]. Using this algorithm, an asymptotically uniform random graph with a given degree sequence is generated very quickly (in almost linear time). In Reference [6], Barabási and Albert introduced their model (BA), an algorithm based on the preferential attachment mechanism for generating random scale-free networks. Networks generated in this manner have real applications to the Internet, citation networks, the World Wide Web, and some social networks. The algorithm starts with a network having m0 given nodes. Nodes are then introduced into the network sequentially. Each newly added node is connected to m ≤ m0 existing nodes with a probability proportional to the number of connections that the previously added nodes already have. The probability pi of connecting a new node to node i is:
$$p_i = \frac{k_i}{\sum_j k_j}, \tag{1}$$
where ki is the so-called degree of node i. The denominator from Equation (1) is twice the existing number of edges from the network. Penrose introduced the so-called random geometric graph (RGG), which is an undirected geometric graph with randomly sampled nodes. To generate such a graph, a uniform distribution over the underlying space [0, 1)^d is used, where d is the dimension of the space [7].
The idea behind generating an RGG is that two nodes are linked only if the distance between them is less than a given parameter r ∈ (0, 1). Therefore, r and n determine how an RGG is generated. Very recently, RGGs have been successfully applied in nanomaterials [8]. Waxman generalized RGGs by considering a probabilistic connection function [9].
The existing results in the literature deal with specific graphs that are either not general enough or not suitable for network flow problems. In this paper, a new idea for generating random networks is therefore proposed; it has the advantages of being fast and of being based on the natural property that a flow can be decomposed into directed elementary paths and cycles. Consequently, the networks generated in this manner are suitable for testing the correctness and the time efficiency of algorithms for network flow problems such as the minimum cost flow, maximum flow, and multi-commodity flow problems. The maximum flow problem is to find a flow from the source to the sink having the maximum possible value. Increasingly efficient algorithms have recently been developed to solve this problem [10][11][12]. Together with the maximum flow, the minimum cut can also be calculated [13]. The minimum cost flow problem is to find a flow having minimum cost from the supply nodes to the demand nodes. The best algorithm currently known for this problem was developed recently [14]. The multi-commodity flow problem involves flow demands for multiple commodities between different source and sink nodes. The best currently known algorithm for this problem is due to Karakostas [15]. There are other flow problems for which the algorithms can still be improved, e.g., the inverse generalized maximum flow problems under sum-type distances, which are proved to be NP-hard [16].
In this paper, the Erdős–Rényi model is adapted to generate random flow networks. The paper is organized as follows. The flow decomposition into directed s-t paths and directed cycles is presented in Section 2, from which an algorithm for generating random networks is deduced. The more general case of networks having multiple sources and sinks is also studied. The algorithm is tested both on CPU and on CUDA in Section 3. In Section 4, some conclusions are discussed.

Flow Decomposition into Elementary Flows
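Let G = (V, E, s, t, u, c) be an s-t directed network. V is a set containing n > 0 vertices (nodes), and E is a set of m ≥ 0 so-called arcs (directed edges); each arc a = (i, j) ∈ E connects two nodes i and j from V, s is a special node called the source, and t is a node called the sink. In G, we define the capacity function $u : E \to \mathbb{N}^*$ and, respectively, the cost function $c$ on the arcs. A feasible flow $f$ in G satisfies the capacity constraints
$$0 \le f(i,j) \le u(i,j), \quad \forall (i,j) \in E, \tag{2}$$
and the flow conservation conditions
$$\sum_{j:(i,j)\in E} f(i,j) - \sum_{j:(j,i)\in E} f(j,i) = \begin{cases} v_f, & i = s,\\ -v_f, & i = t,\\ 0, & i \in V \setminus \{s,t\}, \end{cases} \tag{3}$$
where $v_f \ge 0$ is the so-called value of the flow f.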
A feasible flow f can be decomposed into two feasible flows, f1 and f2, which is denoted by $f = f_1 + f_2$, if $f(i,j) = f_1(i,j) + f_2(i,j)$ for every arc (i, j) ∈ E. A directed cycle is a directed path for which the first node is equal to the last one, i.e., C = (u1, u2, …, uk = u1) is a directed cycle. A directed cycle is elementary if it does not pass through any node twice, except for the first node. A flow f is called elementary if it is 0 on all the arcs of the network except for the arcs of a directed s-t path or of a directed cycle, where it is equal to a value v > 0.
Reference [13] presents the flow decomposition theorem (Theorem 1): a feasible flow can be decomposed into elementary flows along directed elementary s-t paths and cycles.
Proof of Theorem 1. The proof can be found in [13]. □
A direct consequence of Theorem 1 is that a flow can be decomposed into at most n + m elementary flows.
To illustrate the idea behind Theorem 1, in Figure 1, we present a flow f in a network G. The flow f is feasible since it satisfies both the conditions (2) and (3). The value of the flow f is vf = 5. One possible decomposition of the flow f into elementary flows is given by the f1, f2, and f3 flows corresponding, respectively, to the paths P1 = (1, 2, 5), P2 = (1,3,4,5), and the cycle C = (2,3,4,2). The value of the flow f1 is 3 and is equal to the value on the path P1. The value of the flow f2 is 2 and is equal to the value on the path P2. The value of the flow f3 is 0, but the value on the cycle C is equal to 5.
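The decomposition described above can be checked numerically. The following is a minimal C++ sketch, not part of the original source code: it rebuilds a flow by summing the three elementary flows named in the text (P1 with value 3, P2 with value 2, C with value 5) and tests flow conservation. The arc values are derived from that stated decomposition rather than read from Figure 1, and the names `Flow`, `addElementaryFlow`, `netOutflow`, and `exampleFlow` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Arc = std::pair<int, int>;   // (tail, head)
using Flow = std::map<Arc, int>;   // flow value on each arc

// Adds `value` units of flow on every arc between consecutive nodes of `walk`.
// For a cycle, the walk repeats its first node at the end.
void addElementaryFlow(Flow& f, const std::vector<int>& walk, int value) {
    assert(walk.size() >= 2 && value > 0);
    for (std::size_t k = 0; k + 1 < walk.size(); ++k) {
        f[Arc(walk[k], walk[k + 1])] += value;
    }
}

// Net outflow of `node`: flow leaving the node minus flow entering it.
int netOutflow(const Flow& f, int node) {
    int net = 0;
    for (const auto& entry : f) {
        if (entry.first.first == node) net += entry.second;
        if (entry.first.second == node) net -= entry.second;
    }
    return net;
}

// Sums the three elementary flows of the example (assumed arc values).
Flow exampleFlow() {
    Flow f;
    addElementaryFlow(f, {1, 2, 5}, 3);     // f1 on path P1
    addElementaryFlow(f, {1, 3, 4, 5}, 2);  // f2 on path P2
    addElementaryFlow(f, {2, 3, 4, 2}, 5);  // f3 on cycle C
    return f;
}
```

With these values, the net outflow of the source (node 1) is 3 + 2 = 5, the sink (node 5) receives 5 units, and every intermediate node is balanced; the cycle adds flow on its arcs without changing the value of the flow.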

Algorithm for Generating Random s-t Flow Networks
Correctness and time efficiency comparisons of algorithms for network flow problems are important when new methods are elaborated. To do that, a fast and reliable tool is needed to generate random networks, starting with simple ones and continuing with large-scale networks. We develop a method based on the Erdős-Rényi model using the idea of Theorem 1 to create such a tool. Since a flow can be decomposed into elementary flows, a natural approach is to generate random directed elementary s-t paths and cycles.
We now present a first algorithm (Algorithm 1, ARDEP1) to generate a random directed elementary s-t path. In ARDEP1, without loss of generality, the source s has index 0 and the sink t has index n-1. The algorithm builds a path starting from s. At each iteration, a new node that was not previously added to the path is randomly selected and appended to the end of the path. Each time a new node v is appended to the path, the arc (u, v) is added to the network, i.e., the value of the adjacency matrix ma is set to 1 at position (u, v), where u is the node previously added to the path. The algorithm ends when the sink node is added to the path.
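The description of ARDEP1 can be sketched in C++ as follows. This is an illustrative reading of the text, not the authors' Algorithm 1: the function name `ardep1`, the `Matrix` alias, the use of `std::mt19937`, and the linear scan for the k-th unused node are assumptions.

```cpp
#include <cassert>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // adjacency matrix "ma"

// Builds a random elementary path from s = 0 to t = n-1, marks its arcs
// in `ma`, and returns the sequence of visited nodes.
std::vector<int> ardep1(Matrix& ma, std::mt19937& rng) {
    const int n = static_cast<int>(ma.size());
    assert(n >= 2);
    const int s = 0, t = n - 1;
    std::vector<bool> used(n, false);
    used[s] = true;
    std::vector<int> path{s};
    int u = s;
    int remaining = n - 1;  // nodes not yet on the path (t is among them)
    while (u != t) {
        // Choose the k-th node among those not yet on the path.
        std::uniform_int_distribution<int> pick(0, remaining - 1);
        int k = pick(rng);
        int v = 0;
        for (;; ++v) {
            if (!used[v]) {
                if (k == 0) break;
                --k;
            }
        }
        used[v] = true;
        --remaining;
        ma[u][v] = 1;  // add the arc (u, v) to the network
        path.push_back(v);
        u = v;
    }
    return path;
}
```

Each iteration scans up to n nodes and there are at most n - 1 iterations, which matches the quadratic behaviour expected from this first version.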
Next, we present Algorithm 2 (ARDEC1) to generate a random directed elementary cycle:
/* choose a random node u0 */
u0 = random(0, n-1);
/* only node u0 is initially part of the cycle */
for each node j other than u0 do cyclenode[j] = false; end for;
cyclenode[u0] = true;
/* build the random cycle */
u = u0;
for j = 0 to n-1 do
    /* choose a random index k of the next node to be added to the cycle */
    k = random(0, n-j-1);
    l = 0;
    /* find node v as the k-th node out of the not before chosen nodes */
    …
In ARDEC1, a cycle is built starting with a randomly chosen node u0. At each iteration, a new node that is not already part of the cycle is randomly selected and added to the cycle. Each time a new node v is introduced into the cycle, the arc (u, v) is also added to the network, where u is the node previously added to the cycle. The algorithm ends when the node u0 is added again to the cycle.
The algorithms ARDEP1 and ARDEC1 build directed elementary s-t paths and cycles in a natural way. Their time complexity is O(n²), since finding the k-th node not yet chosen takes O(n) time in each of the at most n iterations. These two algorithms could be used together to build random networks. However, we present a faster approach below.
Richard Durstenfeld proposed an algorithm to randomly generate a permutation [17]. In Algorithm 3 (ASVN), we propose a similar but simpler approach to generate a shuffled vector of nodes having the indexes between istart and iend. ASVN starts with a vector containing all the nodes with indexes from istart to iend. This vector is then shuffled by repeatedly choosing two random nodes from the vector and interchanging their positions. These interchanges are executed n times, where n is the length of the vector. The following theorem concerns the quality of the obtained shuffled vector:
Theorem 2. Any permutation of the vector of nodes has an equal probability of being generated by ASVN.
Proof of Theorem 2. Let us suppose we have n values that have to be generated using ASVN. The initial vector of nodes contains n distinct values. There are n random swapping operations applied to the vector. We shall prove that any permutation of the initial values can be obtained this way.
Let p = (p1, p2, …, pn) be a permutation of the initial values istart, istart+1, …, iend. Algorithm 4 (ANP) transforms the initial vector nodes into p using n swapping operations. ASVN performs n random swapping operations on nodes, so there is always a chance for ASVN to generate p from nodes. The probability of generating p from nodes using ASVN is 1/n!, and, since the total number of possible permutations is n!, it follows that any permutation of the vector nodes has an equal chance of being generated using ASVN. □
We now introduce two new methods to randomly generate directed elementary s-t paths and cycles using ASVN.
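Since both methods rely on the shuffling step, a minimal C++ sketch of it may be useful first. The function `asvn` follows the textual description of ASVN (n interchanges of two randomly chosen positions); whether the two positions may coincide is not stated in the text, and this sketch assumes they may. For comparison, `durstenfeld` implements the shuffle of [17], which is known to give every permutation probability exactly 1/n! and can serve as a reference when checking the output of ASVN empirically. Both function names and the use of `std::mt19937` are assumptions.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// ASVN as described in the text: start from istart..iend and perform
// n interchanges of two randomly chosen positions (n = vector length).
std::vector<int> asvn(int istart, int iend, std::mt19937& rng) {
    assert(istart <= iend);
    std::vector<int> nodes(iend - istart + 1);
    std::iota(nodes.begin(), nodes.end(), istart);
    const int n = static_cast<int>(nodes.size());
    std::uniform_int_distribution<int> pos(0, n - 1);
    for (int k = 0; k < n; ++k) {
        const int a = pos(rng);
        const int b = pos(rng);  // assumption: a == b is allowed
        std::swap(nodes[a], nodes[b]);
    }
    return nodes;
}

// Durstenfeld's shuffle [17]: position i is swapped with a position
// drawn uniformly from 0..i, for i = n-1 down to 1.
std::vector<int> durstenfeld(int istart, int iend, std::mt19937& rng) {
    assert(istart <= iend);
    std::vector<int> nodes(iend - istart + 1);
    std::iota(nodes.begin(), nodes.end(), istart);
    for (int i = static_cast<int>(nodes.size()) - 1; i > 0; --i) {
        std::uniform_int_distribution<int> pos(0, i);
        std::swap(nodes[i], nodes[pos(rng)]);
    }
    return nodes;
}
```

Both functions run in O(n) time and return a permutation of the indexes istart, …, iend, so either can feed the path and cycle generators that follow.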

Algorithm 5. Algorithm Random s-t Directed Elementary Path v2 (ARDEP2)
/* efficiently generate a shuffled vector of nodes without s and t */
ASVN(1, n-2);
s = 0; t = n-1;
/* randomly generate the length of the path */
lpath = random(2, n);
/* add the arcs given by the first lpath nodes of the shuffled vector to the network */
m_ma[s][nodes[1] …
In Algorithm 5, ARDEP2 first randomly generates the length of the path. Then, lpath-2 nodes are taken from the shuffled vector of nodes and, together with the source and the sink, form the path. In Algorithm 6, ARDEC2 takes lcycle nodes from the shuffled vector of nodes and generates a cycle.
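The two shuffled-vector generators can be sketched in C++ as follows. This is a hedged illustration rather than the authors' code: `std::shuffle` stands in for ASVN, the names `shuffledNodes`, `ardep2`, and `ardec2` are hypothetical, and, because the details of Algorithm 6 are not available here, the sketch assumes that a cycle may use any node (including s and t) and that lcycle is drawn from [2, n].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // adjacency matrix "ma"

// Shuffled vector of the indexes istart..iend (empty if istart > iend).
// std::shuffle is a swapped-in replacement for ASVN.
std::vector<int> shuffledNodes(int istart, int iend, std::mt19937& rng) {
    std::vector<int> nodes(std::max(0, iend - istart + 1));
    std::iota(nodes.begin(), nodes.end(), istart);
    std::shuffle(nodes.begin(), nodes.end(), rng);
    return nodes;
}

// ARDEP2 sketch: s, then lpath-2 shuffled inner nodes, then t.
std::vector<int> ardep2(Matrix& ma, std::mt19937& rng) {
    const int n = static_cast<int>(ma.size());
    assert(n >= 2);
    const int s = 0, t = n - 1;
    const std::vector<int> inner = shuffledNodes(1, n - 2, rng);
    std::uniform_int_distribution<int> len(2, n);
    const int lpath = len(rng);  // number of nodes on the path
    std::vector<int> path{s};
    path.insert(path.end(), inner.begin(), inner.begin() + (lpath - 2));
    path.push_back(t);
    for (std::size_t k = 0; k + 1 < path.size(); ++k) {
        ma[path[k]][path[k + 1]] = 1;
    }
    return path;
}

// ARDEC2 sketch: the first lcycle shuffled nodes, closed into a cycle.
std::vector<int> ardec2(Matrix& ma, std::mt19937& rng) {
    const int n = static_cast<int>(ma.size());
    assert(n >= 2);
    const std::vector<int> all = shuffledNodes(0, n - 1, rng);
    std::uniform_int_distribution<int> len(2, n);
    const int lcycle = len(rng);  // assumed range for the cycle length
    std::vector<int> cycle(all.begin(), all.begin() + lcycle);
    for (int k = 0; k < lcycle; ++k) {
        ma[cycle[k]][cycle[(k + 1) % lcycle]] = 1;
    }
    return cycle;
}
```

Because the shuffle and the arc insertion are both linear in n, each call costs O(n), in contrast to the quadratic first versions.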
Below, we introduce Algorithm 7 for generating a random flow network.

Algorithm 7. Algorithm Generating Random s-t Flow Network (AGRFN)
Input: p, npath, ncycle, minu, maxu, minc, maxc;
/* generate "npath" random paths */
for k = 1 to npath do ARDEP2; end for;
/* generate "ncycle" random cycles */
for k = 1 to ncycle do ARDEC2; end for;
/* generate the adjacency lists "la" using the adjacency matrix "ma" */
for i = 0 to n do la[i] = null; end for;
/* randomly attach capacities and costs to the arcs when they are added to "la" */
for i = 0 to n do
    for j = 0 to n do
        …
Before starting AGRFN, the adjacency matrix ma is set to 0. The algorithm builds ma and then the adjacency lists la using ma.
After the directed s-t paths and directed cycles are built, arcs are randomly added to the network using the Erdős–Rényi model. According to this model, the probability of adding a new arc is p ∈ [0,1]. Consequently, in AGRFN, for each pair of nodes (i, j), i ≠ j, such that ma[i, j] = 0, i.e., (i, j) is not currently an arc in the network, a random integer is generated in the interval [0, 999] using the function random, and, if this value is less than p · 1000, the arc (i, j) is added to the network.
The capacities of the arcs are randomly generated between the given values minu and maxu. The costs on the arcs are also randomly generated between minc and maxc. There are more parameters for some flow problems such as lower bounds [13,18], modification limits for capacities [19,20], arc resistance [21,22], or gain factor [16,23]. These values can also be randomly generated on arcs.
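The last stage of AGRFN, as described in the two paragraphs above, can be sketched in C++ as follows. This is an assumption-laden illustration and not the code of Source code S1: the struct `ArcData`, the function `attachArcs`, and the use of `std::mt19937` are hypothetical, and capacities and costs are assumed to be integers drawn uniformly from the closed intervals [minu, maxu] and [minc, maxc].

```cpp
#include <cassert>
#include <random>
#include <vector>

struct ArcData {
    int head;      // end node j of the arc (i, j)
    int capacity;  // drawn from [minu, maxu]
    int cost;      // drawn from [minc, maxc]
};

using Matrix = std::vector<std::vector<int>>;       // adjacency matrix "ma"
using AdjLists = std::vector<std::vector<ArcData>>;  // adjacency lists "la"

// Adds Erdős–Rényi arcs with probability p to `ma`, then builds the
// adjacency lists with random capacities and costs.
AdjLists attachArcs(Matrix& ma, double p, int minu, int maxu,
                    int minc, int maxc, std::mt19937& rng) {
    assert(0.0 <= p && p <= 1.0 && minu <= maxu && minc <= maxc);
    const int n = static_cast<int>(ma.size());
    std::uniform_int_distribution<int> permille(0, 999);
    std::uniform_int_distribution<int> cap(minu, maxu);
    std::uniform_int_distribution<int> cost(minc, maxc);
    AdjLists la(n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;  // no self-loops
            // Arc not present yet: add it with probability p.
            if (ma[i][j] == 0 && permille(rng) < p * 1000) ma[i][j] = 1;
            if (ma[i][j] == 1) {
                ArcData arc{j, cap(rng), cost(rng)};
                la[i].push_back(arc);
            }
        }
    }
    return la;
}
```

Calling `attachArcs` with p = 0 keeps only the arcs already placed by the path and cycle generators, while p = 1 produces the complete directed graph; values in between control how dense the network becomes around the guaranteed s-t paths.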

Theorem 3. The time complexity of AGRFN is O(n · max{npath, ncycle, n}).
Proof of Theorem 3. The time complexity of generating an s-t path using ARDEP2 or a cycle using ARDEC2 is O(n). Consequently, the adjacency matrix ma is generated in O(max{npath, ncycle} · n) time, and, since generating the adjacency lists takes O(n²) time, the time complexity of the algorithm is O(n · max{npath, ncycle, n}). □
Usually, it is enough to consider the number of paths and the number of cycles to be less than the number of nodes. So, in practice, the time complexity is likely to be O(n²).
The time complexity from Theorem 3 can be improved if the generation of the paths, of the cycles, and of the adjacency lists is parallelized. The computations in the algorithm are elementary and involve only integer values, so AGRFN can be naturally parallelized on GPUs. Since the speed of generating large-scale random networks is essential, improving the running time by parallelization can play an important role. Considering a total of g GPUs, the generation of the paths and cycles can be divided into max{1, (npath+ncycle)/g} groups. The generation of the adjacency lists can also be divided into max{1, n/g} groups. So, the time complexity of the parallel implementation of AGRFN on GPUs is O(n · max{npath, ncycle, n}/g).
The C++ source code for generating random networks using AGRFN can be found in Appendix A (Source code S1).

The Case of Multiple Sources and Multiple Sinks
There are situations when networks having multiple source and sink nodes have to be generated. We shall show how AGRFN can be adapted for this more general case.
Let G = (V, E, S, T, u, c) be a directed network, where S = {s1, s2, …, sns} is the set of ns ≥ 1 sources and T = {t1, t2, …, tnt} is the set of nt ≥ 1 sink nodes. G is equivalent to an s-t network G' = (V', E', s, t, u', c') obtained by introducing a super-source s, a super-sink t, and the arcs (s, si) and (tj, t), where si ∈ S, i ∈ {1, 2, …, ns}, and tj ∈ T, j ∈ {1, 2, …, nt} [13]. The capacities and costs for the newly introduced arcs are irrelevant at this point. Using AGRFN, a random network G' is built. In the end, these arcs are eliminated together with the nodes s and t, so that a random network G with multiple source and sink nodes is obtained. Algorithm 8 (AGRFNMSS) implements this idea:
… [t] = 1; end for;
/* call AGRFN to generate a random network */
AGRFN(p, npath, ncycle, minu, maxu, minc, maxc);
/* come back to the initial network having multiple sources and sinks */
…
In Algorithm 8, the nodes s and t are eliminated from V, and the adjacency matrix ma is modified accordingly.
The time complexity of AGRFNMSS is the same as that of AGRFN, since introducing and eliminating s, t, and their incident arcs takes at most O(n²) additional time.
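The two bookkeeping steps around the call to AGRFN can be sketched in C++ as follows. This is only an illustration under stated assumptions: the functions `addSuperNodes` and `removeSuperNodes` are hypothetical, the super-source is placed at index 0 and the super-sink at index n+1 (so that the extended network follows the convention s = 0, t = last index), and an original node v is stored at index v+1 in the extended matrix.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // adjacency matrix "ma"

// Builds the (n+2) x (n+2) matrix of G': index 0 is the super-source s,
// index n+1 is the super-sink t, and original node v becomes v+1.
Matrix addSuperNodes(int n, const std::vector<int>& sources,
                     const std::vector<int>& sinks) {
    assert(n >= 1 && !sources.empty() && !sinks.empty());
    Matrix ext(n + 2, std::vector<int>(n + 2, 0));
    for (int si : sources) {
        assert(0 <= si && si < n);
        ext[0][si + 1] = 1;  // arc (s, si)
    }
    for (int tj : sinks) {
        assert(0 <= tj && tj < n);
        ext[tj + 1][n + 1] = 1;  // arc (tj, t)
    }
    return ext;
}

// Drops the super-source, the super-sink, and all their incident arcs,
// returning the n x n matrix of the original node set.
Matrix removeSuperNodes(const Matrix& ext) {
    const int n = static_cast<int>(ext.size()) - 2;
    assert(n >= 0);
    Matrix ma(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            ma[i][j] = ext[i + 1][j + 1];
        }
    }
    return ma;
}
```

A generator in the style of AGRFN would be run on the extended matrix between the two calls; copying the inner block takes O(n²) time, which stays within the bound of Theorem 3.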

Results and Discussions
In Figure 3, three networks having 6, 20, and 100 nodes, respectively, were generated and displayed. For the first network, 3 paths and 2 cycles were generated. For the second network, 10 paths and 2 cycles were generated, and for the last network, 20 paths and 10 cycles were generated.
Several tests were performed to measure the generation time as the scale of the random networks increases, with the number of nodes ranging between 10 and 10,000. As expected, and as shown in Table 1, the number of nodes, together with the number of considered paths and cycles, directly influences the speed of the network generation. An Asus ROG Strix G17 G712LV, Intel Core i7-10750H up to 5.10 GHz processor, 16GB RAM, NVIDIA GeForce RTX 2060 6GB GDDR6 with 1920 CUDA cores was used. The tests showed that parallelization becomes more and more effective as the dimension of the networks increases. The parallelization was implemented using CUDA programming on the GPU. Each path and each cycle was created on a different thread. Additionally, the creation of the adjacency lists from the adjacency matrix was parallelized, the list for each node being obtained on a different thread. For small networks (fewer than 50 nodes), it is better to use the CPU implementation of the algorithm, but when the networks have more than 50 nodes, the CUDA implementation is preferred, resulting in a clear speed-up of up to 19 times over the CPU implementation. The speed-up was calculated as the ratio between the CPU and the CUDA execution times. The best speed-up was obtained for large-scale networks having thousands of nodes (Table 1). In Figure 4, the speed-up evolution for generating networks of different dimensions is presented.

Conclusions
We developed a fast and reliable algorithm called AGRFN to randomly generate networks. The resulting networks can be used to test the correctness and efficiency of algorithms developed for network flow problems, e.g., minimum cost flow, maximum flow, or multi-commodity flow problems. The CUDA-parallelized version of AGRFN proved to be up to 19 times faster than the CPU version when large-scale networks need to be generated.
As a further development, other problems on specific networks could be identified for which AGRFN can be adapted.