# Benford Networks

## Abstract

## 1. Introduction

## 2. Formal Definitions

**Definition 1.**

**Remark 1.**

**Definition 2.**

**Remark 2.**

**Definition 3.**

**Definition 4.**

**Definition 5.**

**Definition 6.**

**Definition 7.**

**Remark 3.**

**Definition 8.**

**Definition 9.**

**Definition 10.**

**Remark 4.**

**Remark 5.**

## 3. Algorithms for Simulating BNs

#### 3.1. A Fast Algorithm for a BN with Maximal/Minimal Assortativity

`1. initialize a network with N nodes and 0 edges`

`2. assign each node its degree so as to fulfill the BL`

`3. Until each degree is reached:`

`select the beginning and end of each edge`

`1. create an NxN matrix A with each element equal to 0`

`2. create a vector v of length N storing the degree of each node`

`3. Until each degree is reached:`

`select i, j, and set A(i,j)=A(j,i)=1`

`1. create an NxN matrix A with each element equal to 0`

`2. create a vector v of length N`

`assigning the degree in descending order`

`3. for each node i=1,...,N`

`until its node degree v(i) is reached:`

`match the other end j of each edge`

`with the first available node`
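
The greedy matching above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes N = 100 nodes with degrees 1 to 9 and the node counts 30, 18, 12, 10, 8, 7, 6, 5, 4 listed for the leading digits. It also assumes that "first available" means the nearest later node for maximal assortativity and the lowest-degree end of the list for minimal assortativity. The names `benford_degrees` and `fast_greedy_network` are hypothetical. The function returns the unfilled degree of every node as well, because the sketch does not guarantee that every degree is reached.

```python
def benford_degrees():
    """Degree sequence for N = 100, in descending order.

    Assumption: degrees 1..9 with 30, 18, 12, 10, 8, 7, 6, 5, 4 nodes each.
    """
    counts = {1: 30, 2: 18, 3: 12, 4: 10, 5: 8, 6: 7, 7: 6, 8: 5, 9: 4}
    degrees = []
    for d in range(9, 0, -1):
        degrees.extend([d] * counts[d])
    return degrees


def fast_greedy_network(degrees, maximal=True):
    """Greedy matching; returns (adjacency matrix A, unfilled degree per node)."""
    n = len(degrees)
    A = [[0] * n for _ in range(n)]
    residual = list(degrees)  # edges each node still needs
    for i in range(n):
        # Maximal assortativity: try the nearest (similar-degree) nodes first.
        # Minimal assortativity: try the lowest-degree nodes first.
        candidates = range(i + 1, n) if maximal else range(n - 1, i, -1)
        for j in candidates:
            if residual[i] == 0:
                break
            if residual[j] > 0 and A[i][j] == 0:
                A[i][j] = A[j][i] = 1
                residual[i] -= 1
                residual[j] -= 1
    return A, residual


A_min, left_min = fast_greedy_network(benford_degrees(), maximal=False)
# Node 4 (index 3) ends up linked to the last three 1-connected nodes
# and to six 2-connected nodes.
print([j + 1 for j in range(100) if A_min[3][j]])
```

Only later nodes are scanned for each node i because, in this greedy order, any earlier node that still needs edges has already tried to connect to i.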

**Remark 6.**

**Remark 7.**

**Remark 8.**

**Remark 9.**

#### 3.2. The BN as a Function of the Density of the Network

#### 3.2.1. Analysis of the Range of Densities of BNs

#### 3.2.2. Rewiring Algorithm

`1. start from a random seed network with the required density`

`2. while the network is not a BN`

`(or the maximal number of trials is reached)`

`2.a select a link for the rewire`

`2.b if the rewire produces a network closer to a BN:`

`then accept the rewire`

`otherwise skip`

`end`

`3. store the distances from a BN`

`4. report the data in a figure`
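
A rough Python sketch of steps 1 to 3 of this rewiring loop is given below. Several pieces are assumptions made only for illustration: the distance `distance_to_bn` is a stand-in (sum of absolute gaps between leading-digit frequencies of the degrees and the Benford probabilities) and is not the distance defined in this paper; the stopping tolerance `tol`, the choice of moving one randomly chosen endpoint of a link, and the function names are likewise hypothetical. Plotting (step 4) is left out.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def distance_to_bn(degrees):
    """Stand-in distance (an assumption, not the paper's definition)."""
    n = len(degrees)
    counts = [0] * 9
    for k in degrees:
        if k > 0:  # degree-0 nodes have no leading digit
            counts[int(str(k)[0]) - 1] += 1
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD))


def rewire_towards_bn(n, n_edges, max_trials=20000, tol=0.02, seed=0):
    rng = random.Random(seed)
    # Step 1: random seed network with the requested number of links.
    edge_set = set()
    while len(edge_set) < n_edges:
        i, j = sorted(rng.sample(range(n), 2))
        edge_set.add((i, j))
    edge_list = sorted(edge_set)
    degrees = [0] * n
    for i, j in edge_list:
        degrees[i] += 1
        degrees[j] += 1
    history = [distance_to_bn(degrees)]  # step 3: stored distances
    # Step 2: rewire until close enough to a BN or out of trials.
    for _ in range(max_trials):
        if history[-1] <= tol:
            break
        idx = rng.randrange(len(edge_list))  # 2.a select a link
        keep, drop = edge_list[idx]
        if rng.random() < 0.5:
            keep, drop = drop, keep
        k = rng.randrange(n)  # proposed new endpoint replacing `drop`
        new = (min(keep, k), max(keep, k))
        if k == keep or new in edge_set:
            continue  # would create a self-loop or a duplicate link
        trial = list(degrees)
        trial[drop] -= 1
        trial[k] += 1
        d = distance_to_bn(trial)
        if d < history[-1]:  # 2.b accept only if closer to a BN
            edge_set.remove(edge_list[idx])
            edge_set.add(new)
            edge_list[idx] = new
            degrees = trial
            history.append(d)
    return edge_set, history


edges, history = rewire_towards_bn(n=100, n_edges=171)
print(len(edges), history[0], history[-1])
```

Each accepted move keeps the number of links, and hence the density, unchanged, so only the degree distribution is reshaped.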

**Remark 10.**

**Remark 11.**

`1. select two links of a BN network`

`2. if the swap increases (decreases) the assortativity,`

`then accept the swap`
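
One possible reading of this swap step, sketched in Python: two links (a, b) and (c, d) are replaced by (a, d) and (c, b), which leaves every node degree unchanged, so a BN stays a BN. The sketch assumes that assortativity means the Pearson correlation between the degrees at the two ends of each link; the function names `assortativity` and `swap_step` are hypothetical.

```python
import random


def assortativity(edges, degrees):
    """Pearson correlation of end-point degrees, each link counted both ways."""
    xs, ys = [], []
    for i, j in edges:
        xs += [degrees[i], degrees[j]]
        ys += [degrees[j], degrees[i]]
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        return 0.0  # all end-point degrees equal: report 0 by convention
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    return cov / var


def swap_step(edges, degrees, rng, increase=True):
    """Try one degree-preserving swap on `edges` (a set of sorted pairs)."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)  # 1. select two links
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible re-pairings
    if len({a, b, c, d}) < 4:
        return False  # the links share a node
    new1 = (min(a, d), max(a, d))
    new2 = (min(c, b), max(c, b))
    if new1 in edges or new2 in edges:
        return False  # would duplicate an existing link
    old = {(min(a, b), max(a, b)), (min(c, d), max(c, d))}
    trial = (edges - old) | {new1, new2}
    before = assortativity(edges, degrees)
    after = assortativity(trial, degrees)
    # 2. accept only if the assortativity moves in the requested direction
    if (after > before) if increase else (after < before):
        edges.clear()
        edges.update(trial)
        return True
    return False


# A star is perfectly disassortative under this definition.
print(assortativity({(0, 1), (0, 2), (0, 3)}, [3, 1, 1, 1]))
```

Passing `increase=False` gives the "decreases" variant mentioned in parentheses in step 2.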

#### 3.2.3. An Intermediate Algorithm for the Immediate Construction of a BN and Random Rewiring

## 4. A New Definition of the Distance to a BN

**Definition 11.**

**Remark 12.**

**Remark 13.**

**Remark 14.**

**Remark 15.**

## 5. Discussion and Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Example of the histogram of the node degree of datasets from SNAP: collaboration networks from ArXiv (Astrophysics (AstroPh), Condensed Matter (CondMat), General Relativity (GrQc), High Energy Physics (HepPh), and High Energy Physics Theory (HepTh)) and a network from Facebook.

**Figure 2.** Maximal assortative network provided by our proposed fast algorithm. Nodes with a similar degree tend to be connected among themselves. In the figure, this is very evident in the group of 1-connected units (nodes ranging from 71 to 100), in the group of 2-connected units (nodes ranging from 53 to 70), in the group of 3-connected units (nodes ranging from 41 to 52), and in the group of 4-connected units (nodes ranging from 31 to 40).

**Figure 3.** Example of the constraint on the number of edges: if node a has four edges, then node b must have four edges. Node a cannot have three edges if all the other nodes have four edges.

**Figure 4.** Minimal assortative network provided by the fast algorithm. Nodes 1, 2, and 3 have degree 9 and are connected to a total of 27 of the 1-connected units. There are 30 units with only one connection, so node 4, which is the last node with degree 9, connects to the remaining three 1-connected units and to six nodes of the group of 2-connected nodes. Therefore, the figure does not show a star for node 4. The next group of nodes (from 5 to 9) have eight connections and are linked first to the 2-connected nodes, then to the 3-connected nodes, and so on.

**Figure 5.** Figure corresponding to Table 4. The mean assortativity is shown as a function of the density. The error bars show the distance between the minimal and maximal assortativity.

Leading Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
$p(\cdot)$ | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |
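
For reference, the percentages in this table agree with the standard statement of Benford's law for the leading digit $d$; the symbol $d$ is introduced here only for this illustration:

$$p(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, 9.$$

For example, $p(1) = \log_{10} 2 \approx 0.301$ and $p(9) = \log_{10}(10/9) \approx 0.046$. The nine probabilities sum to $\log_{10} 10 = 1$, because the sum of the logarithms telescopes.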

**Table 2.** Analysis of the datasets: the first columns report the description and the numbers of nodes and edges, while the last column shows the distance $d(\cdot,BN)$.

Description | Nodes | Edges | $d(\cdot,\mathrm{BN})$ |
---|---|---|---|
Astro Physics | 18,772 | 198,110 | 0.3725 |
Condensed Matter | 23,133 | 93,497 | 0.5009 |
General Relativity | 5242 | 14,496 | 0.8502 |
High Energy Physics | 12,008 | 118,521 | 0.5657 |
High Energy Physics Theory | 9877 | 25,998 | 0.7923 |
Facebook 2 | 1034 | 54,015 | 0.0907 |

Leading Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
number of nodes | 30 | 18 | 12 | 10 | 8 | 7 | 6 | 5 | 4 |
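
The node counts above sum to 100. The rounding rule used to obtain them is not stated here; one rule that happens to reproduce them is largest-remainder apportionment of the Benford probabilities, sketched below. The function name `benford_node_counts` is hypothetical, and the choice of rule is an assumption.

```python
import math


def benford_node_counts(n):
    """Split n nodes over leading digits 1..9 by largest remainders."""
    quotas = [n * math.log10(1 + 1 / d) for d in range(1, 10)]
    counts = [math.floor(q) for q in quotas]
    leftover = n - sum(counts)
    # Give the remaining nodes to the digits with the largest fractional parts.
    order = sorted(range(9), key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


print(benford_node_counts(100))  # [30, 18, 12, 10, 8, 7, 6, 5, 4]
```

Plain rounding of each probability would give 5 nodes for digit 9 and a total of 101, which is why some correction step is needed for N = 100.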

**Table 4.** Densities calculated as percentages of the total number of links used to run the simulations on rewiring to achieve a BN. The first percentages differ by only 0.01 in order to fine-tune the threshold of the BN. The values above 0.1 differ by 0.1 because the increase in distance from a BN follows a relatively stable path. The minimal, maximal, and mean assortativity are shown, and correspond to the plot in Figure 5.

density | 0.01 | 0.02 | 0.03 | 0.034 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
---|---|---|---|---|---|---|---|---|---|---|
mean assortativity | 0 | 0 | 0.013 | 0.03 | 0.05 | 0.045 | 0.004 | −0.011 | −0.016 | 0.151 |
min assortativity | −0.167 | 0.006 | 0.006 | 0.00 | −0.117 | −0.105 | −0.179 | −0.052 | 0.00 | 0.00 |
max assortativity | 0 | 0.027 | 0.191 | 0.070 | 0.36 | 0.385 | 0.215 | 0.484 | 0.314 | 0.289 |

density | 0.1 | 0.2 | 0.3 | 0.4 | 0.436 |
---|---|---|---|---|---|
mean assortativity | −0.087 | 0.002 | −0.022 | 0.003 | 0.00 |
min assortativity | −0.118 | −0.05 | −0.147 | −0.210 | −0.085 |
max assortativity | 0.136 | 0.191 | 0.162 | 0.171 | 0.020 |
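
The two non-round densities in the table, 0.034 and 0.436, can be related to simple degree counts. The sketch below is an interpretation, not a statement from the source: it assumes 100 nodes with the leading-digit counts 30, 18, 12, 10, 8, 7, 6, 5, 4, an undirected network without self-loops, and that each node takes either the smallest degree with its leading digit (1 to 9) or the largest such degree below 100 (19, 29, ..., 99). It does not check that the resulting degree sequences can actually be realized as a network.

```python
counts = [30, 18, 12, 10, 8, 7, 6, 5, 4]  # nodes per leading digit 1..9
n = sum(counts)                            # 100 nodes
possible_links = n * (n - 1) // 2          # 4950 node pairs


def density(degree_for_digit):
    """Density when every node with leading digit d has degree_for_digit(d)."""
    total_degree = sum(c * degree_for_digit(d)
                       for d, c in enumerate(counts, start=1))
    return (total_degree / 2) / possible_links  # each link adds 2 to the total


low = density(lambda d: d)            # degrees 1, 2, ..., 9   -> 171 links
high = density(lambda d: 10 * d + 9)  # degrees 19, 29, ..., 99 -> 2160 links
print(round(low, 4), round(high, 4))  # 0.0345 0.4364
```

Under these assumptions the two values line up with the columns 0.034 and 0.436 of the table.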

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

de Kok, R.; Rotundo, G.
Benford Networks. *Stats* **2022**, *5*, 934-947.
https://doi.org/10.3390/stats5040054
