Article

Benford’s Law and Transport Infrastructure: The Analysis of the Main Road Network’s Higher-Level Segments in the EU

by Monika Ivanova ¹,*, Erika Feckova Skrabulakova ², Ales Jandera ², Zuzana Sarosiova ² and Tomas Skovranek ²

¹ Faculty of Humanities and Natural Sciences, University of Presov, 17. Novembra 1, 08116 Presov, Slovakia
² Faculty of Mining, Ecology, Process Control and Geotechnologies, Technical University of Kosice, Nemcovej 3, 04200 Kosice, Slovakia
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2025, 14(11), 450; https://doi.org/10.3390/ijgi14110450 (registering DOI)
Submission received: 19 September 2025 / Revised: 31 October 2025 / Accepted: 14 November 2025 / Published: 15 November 2025

Abstract

Benford’s Law, also known as the First-Digit Law, describes the non-uniform distribution of leading digits in many naturally occurring datasets. This phenomenon can be observed in data such as financial transactions, tax records, or demographic indicators, but the application of Benford’s Law to transport infrastructure data remains largely underexplored. As interest in using statistical distributions to identify spatial and regional patterns grows, this paper explores the applicability of Benford’s Law to anthropogenic geographic data, in particular whether the lengths of higher-level segments of the main road network across European Union member states follow Benford’s Law. To evaluate the conformity of the data from all European Union countries with Benford’s distribution, Pearson’s χ² goodness-of-fit test, the p-value, and the Kolmogorov–Smirnov test were used. The results consistently show low χ² values and high p-values, indicating strong agreement between observed and expected distributions. The relationship between the distribution of higher-level segment lengths and the leading digits of these lengths was studied as well. The findings suggest that the length distribution of the main road networks’ higher-level segments closely follows Benford’s Law, emphasizing its potential as a simple yet effective tool for assessing the reliability and consistency of geographic and infrastructure datasets within the European context.

1. Introduction

Benford’s Law is a mathematical phenomenon that describes the counterintuitive distribution of leading digits in many naturally occurring datasets. According to this law, not all digits from 1 to 9 are equally likely to appear as the leading digit. The digit 1 appears as the leading digit in about 30% of cases, with the probability decreasing as the leading digit increases.
Although it might initially seem like a purely mathematical curiosity, Benford’s Law occurs surprisingly often in nature. Joannes-Boyau et al. [1] showed that the distances travelled by hurricanes since the 1840s satisfy Benford’s Law. Sambridge et al. [2] demonstrated that the leading-digit distributions of earthquake depths adhere closely to Benford’s Law. More recent studies confirmed compliance for earthquake magnitudes [3] and for recurrence times of seismic events [4]. Benford’s Law can also identify quantum phase transitions, in a manner similar to its application in detecting earthquakes [5]. Consequently, despite their fundamentally different physical origins, both seismic activity and quantum cooperative phenomena can be analysed using comparable techniques. The law is evident across a wide range of astronomical datasets as well. The intensity of gamma rays detected on Earth by the Fermi space telescope and the rotational frequencies of pulsars (spinning stellar remnants) both satisfy Benford’s Law [6]. The distances of galaxies and stars satisfy Benford’s Law too [7], as do the masses of exoplanets [2]. The law does not apply universally, yet it recurs in a remarkably wide range of datasets. Seismic signals produced by debris flows satisfy Benford’s Law, whereas those caused by ambient noise (e.g., seismic data from rockfalls) do not [8]. It is also well established that Benford’s Law applies to many human-generated datasets, such as financial and economic data and bank transactions [9], accounting records [10], tax data [11], and gross domestic product figures [12]. Socio-geographical distances between cities [13] follow this distribution as well. For an overview of applications and exercises, see, e.g., [14] and the references therein.
The appearance of Benford’s distribution in data from so many natural domains is related to the fact that natural processes often span several orders of magnitude and exhibit multiplicative or exponential tendencies.
Recent studies have shown that Benford’s Law can reveal spatial and regional patterns in geographic datasets [15], but its application within the field of geographical information remains relatively underexplored [16]. In the context of spatial analysis, the significance of complying with Benford’s Law can be seen in its role as an indicator of the naturalness, quality, and integrity of spatial data. Compliance with Benford’s Law may indicate that the data in question were generated naturally rather than through artificial partitioning. The law helps in detecting spatial anomalies or inconsistencies in data processing. Benford’s Law can also serve as a diagnostic tool during data preprocessing prior to spatial modeling, machine learning, or geostatistical analyses. In contrast to well-established geographic principles such as Tobler’s laws [17,18,19], Benford’s Law has received limited attention regarding spatial data, particularly in the context of infrastructure and transport networks.
A notable exception demonstrating the applicability of Benford’s Law in a spatial context is the study by Kopczewska and Kopczewski [13], which examines the geographical distribution of cities and their populations. In that work, the authors explore the detection of a hidden order that drives the observed spatial heterogeneity. They view the geographical system of cities as a complex, nonlinear, three-dimensional network that is neither random nor independent, but instead exhibits strong spatial heterogeneity. They explain this hidden order through the lens of Benford’s Law. Since Benford’s Law holds for natural datasets, they argue that spatial phenomena such as natural urbanization tend to conform to Benford’s distribution, whereas spatial processes governed by artificial organizational rules (e.g., political boundaries) deviate from it. By comparing the level of conformity to Benford’s distribution, the authors show that the mutual 3D socio-geographical distances between populations and cities in most countries follow Benford’s Law, indicating that city geolocations exhibit a natural spatial distribution. Based on an analysis of historical settlement patterns, they conclude that the geolocation of cities and inhabitants worldwide has followed an evolutionary process leading to a natural spatial arrangement consistent with Benford’s Law.
A comparison of the distributions of several numerical properties of geographic objects with Benford’s distribution is presented by Mocnik [16], who investigates whether the numerical attributes of geographic entities conform to Benford’s Law. The results show that the examined numerical properties correspond to Benford’s Law to a certain extent, with only small differences between different types of geographic objects. The spatial patterns of deviations from Benford’s Law are similar for some aspects but differ considerably for others. This finding suggests that there is no general rule determining which numerical properties, geographic objects, or spatial regions comply with Benford’s Law. In this sense, a more detailed examination of Benford’s Law in the context of geographic information and across various datasets is a logical step toward its future application.
A new line of research on Benford’s Law in the spatial context is also outlined by Fernandes, Ciardhuáin, and Antunes [20], who investigated whether network traffic data, such as packet sizes and connection counts, conform to Benford’s and Zipf’s laws. They found that integrating Benford’s test with several different metrics can effectively identify anomalies in data flow (e.g., cyberattacks). Although this work does not concern road infrastructure, the underlying principle is analogous to road traffic flow.
In the broader context of spatial analysis, data applicability, and validation related to the use of Benford’s Law, several open questions (research gaps) remain to be addressed. Most existing works analyse the distribution of digits but rarely investigate whether deviations from Benford’s Law exhibit spatial autocorrelation. Such an approach could reveal spatial “hotspots” of unreliable data within a territory [1,16].
Kopczewska and Kopczewski [13] and Szabó et al. [21] have pointed out that the literature lacks reference values, or “baselines”, of Benford’s distribution for different types of geographic data (e.g., road distances, parcel areas, traffic intensities). Without such benchmarks, it is difficult to interpret what constitutes a significant deviation.
Yang et al. [22] and Fernandes et al. [20] have tested Benford’s distribution on time series data (e.g., sediment motion), but no research has yet examined the spatio-temporal behavior of Benford conformity.
Mansouri et al. [23] and Joannes-Boyau et al. [1] emphasised that Benford tests are currently applied to only a single dataset at a time. In practice, however, traffic data originate from multiple sensors, GPS devices, and systems, and research applying Benford’s analysis to assess consistency between different sources of traffic data is lacking. Moreover, various studies employ different metrics, thresholds, and sample sizes.
In the context of spatial data, there is still no unified methodological framework defining when a dataset is sufficiently large and suitable for Benford’s analysis [23,24].
Despite the considerable number of research gaps that have emerged in recent years, we focused on addressing one of them, namely the gap highlighted by Kopczewska and Kopczewski [13]: the lack of studies examining the spatial application of Benford’s Law to infrastructure and transportation networks. This gap motivated us to investigate whether anthropogenic activities also generate datasets that follow this law. Our work was partly inspired by Mocnik’s study [16], in which the author argues for the relevance of research that analyses subsets of OpenStreetMap data to determine which parts conform to Benford’s Law and which do not.
In this paper, we examine whether the outcomes of anthropogenic geographic activities, specifically transport infrastructure, conform to Benford’s Law. The lengths of higher-level segments of the main road network of 27 European Union member states were used in the analysis. To assess whether the segment length data from all European Union countries conform to Benford’s distribution, Pearson’s χ² goodness-of-fit test and the p-value were employed. The findings consistently indicate low χ² values and high p-values, suggesting strong alignment between observed and expected distributions. The Kolmogorov–Smirnov test was used to verify these results. Moreover, the relationship between the higher-level segment lengths and the leading digits of these lengths was analysed.

2. Methodology

2.1. Benford’s Law

Benford’s Law describes the counterintuitive observation that in many naturally occurring datasets, the leading digit $d \in \{1, 2, \ldots, 9\}$ is not uniformly distributed. Instead, smaller digits appear more frequently, with their probabilities of occurrence decreasing logarithmically. Although Simon Newcomb was the first to observe this phenomenon in the 19th century [25], the law is named after the physicist Frank Benford, who systematically studied this distribution across a wide range of datasets [26]. It is also known as the Newcomb–Benford Law [27] or the First-Digit Law [28].
This phenomenon is quantified by the probability distribution formula [29]:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right),$$
where $P(d)$ is the probability that a number has the leading digit $d$, as shown in Table 1.
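As a quick numerical check of the formula, the following Python sketch (an illustration, not the authors' code; the function name is hypothetical) tabulates $P(d)$ for $d = 1, \ldots, 9$. The probabilities sum to one because $\sum_{d=1}^{9} \log_{10}\frac{d+1}{d}$ telescopes to $\log_{10} 10 = 1$.

```python
import math

def benford_probabilities() -> dict[int, float]:
    """Return P(d) = log10(1 + 1/d) for the leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in benford_probabilities().items():
        print(f"{d}: {p:.4f}")
```

Printed to four decimals, the values run from 0.3010 for d = 1 down to 0.0458 for d = 9, matching the "about 30%" share of the digit 1 mentioned in the Introduction.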
Benford’s Law usually appears in datasets that span several orders of magnitude and are not constrained by artificially imposed boundaries or uniform scales. Although it is seemingly simple, it reveals a hidden order within the numerical chaos of the real world, even in areas where one might not expect it. A remarkable feature of Benford’s Law is that it is scale-invariant, which means that it holds regardless of the units in which the data are expressed. For example, whether financial transactions are recorded in dollars, euros, or yen, or distances are measured in meters or miles, the distribution of the leading digits remains consistent with Benford’s pattern [14]. This unit independence makes the law extremely powerful for cross-domain and cross-context analyses, as it ensures that the observed phenomenon reflects essential properties of the data rather than arbitrary choices of measurement. Its possible applications therefore offer a new perspective on the quality and consistency of available data, enabling robust detection of irregularities and hidden structures across diverse fields.

2.2. Data Extraction and Processing

The input dataset was obtained from OpenStreetMap (OSM) [30], a collaborative project that provides freely accessible geographic information, via the Overpass API, which allows geospatial data to be queried and extracted according to specific attributes. In this study, all ways tagged as highway=motorway, highway=trunk, or highway=primary within the geographic boundaries of the European Union (EU) were selected. The query included all nodes defining each way, thereby capturing the full geometry of each road segment, including the bridges and tunnels associated with these road types. The resulting data were exported in GeoJSON format [31], a widely supported lightweight format that preserves both spatial and attribute information, for subsequent processing.
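The tag-based selection can be reproduced offline on an exported file. The Python sketch below is illustrative rather than the authors' query: it assumes, as a hypothetical layout, that each GeoJSON feature stores its OSM tags directly in `properties` (e.g. `{"highway": "trunk", ...}`), and it keeps only line features whose `highway` value is exactly `motorway`, `trunk`, or `primary`. The names `MAIN_ROAD_CLASSES` and `select_main_roads` are hypothetical.

```python
import json

# Road classes selected in the study (exact matches; *_link roads are not included).
MAIN_ROAD_CLASSES = {"motorway", "trunk", "primary"}

def select_main_roads(geojson_text: str) -> list[dict]:
    """Return line features tagged with one of the main-road classes.

    Assumes (hypothetically) that OSM tags are stored flat in each
    feature's "properties" object.
    """
    collection = json.loads(geojson_text)
    selected = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}
        if geometry.get("type") not in {"LineString", "MultiLineString"}:
            continue  # skip points and polygons
        if properties.get("highway") in MAIN_ROAD_CLASSES:
            selected.append(feature)
    return selected
```

If an export nests tags under a separate key instead, only the `properties.get("highway")` lookup would need to change.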

2.2.1. Creation and Aggregation of Higher-Level Segments

Each way from OSM is divided into individual segments, defined as the straight lines between two consecutive nodes. A node is one of the core elements of OSM. It consists of a single point in space defined by its latitude, longitude, node identifier (ID), and optionally altitude.
To address the complexity and dynamic nature of OSM data, where a single physical route may be represented by multiple ways or where ways may have been split or merged over time, a rigorous aggregation and cleaning process was implemented to form higher-level segments (HILSs):
1. Initial HILS Aggregation (Based on OSM ID). Raw OSM segments sharing the same OSM way ID were first aggregated into pre-final HILSs. This simplified aggregation served as the base segment unit.
2. Contiguity and Classification Check. The pre-final HILSs were then subjected to topological and functional validation. Pre-final HILSs were further aggregated into a final HILS only if they were geometrically contiguous and kept a consistent functional classification (i.e., the same highway tag) at their junction nodes. This step effectively filtered out segments that were incorrectly tagged or where the road type changed. It also mitigated the issue of unstable OSM IDs by ensuring that each final HILS represents one logical, uninterrupted path of a specific functional road class.
3. Noise and Edge Cases. Minor segments or those forming topological noise (e.g., duplicated segments or segments with erroneous tagging) were identified through a spatial filtering routine and removed, so that the analysis focused only on the coherent main road network.
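The first two aggregation steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each straight segment is a hypothetical `(way_id, highway, start_node, end_node, length_m)` record listed in node order, step 1 sums segments per way ID, and step 2 merges ways with a union-find structure only at nodes where exactly two ways of the same `highway` class meet, so branching junctions and class changes end a HILS. The noise-filtering step is omitted.

```python
from collections import defaultdict

def aggregate_hils(segments):
    """Aggregate straight OSM segments into HILS lengths (simplified sketch).

    segments: iterable of hypothetical records
        (way_id, highway, start_node, end_node, length_m),
    listed in node order within each way.
    Returns the sorted list of HILS lengths in metres.
    """
    # Step 1: pre-final HILSs = all segments sharing one OSM way ID.
    way_length = defaultdict(float)
    way_class = {}
    way_ends = {}
    for way_id, highway, start, end, length in segments:
        way_length[way_id] += length
        way_class[way_id] = highway
        if way_id not in way_ends:
            way_ends[way_id] = [start, end]
        else:
            way_ends[way_id][1] = end  # last node seen so far

    # Union-find over pre-final HILSs.
    parent = {w: w for w in way_length}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Index pre-final HILSs by (end node, road class).
    touching = defaultdict(list)
    for w, (first, last) in way_ends.items():
        for node in {first, last}:
            touching[(node, way_class[w])].append(w)

    # Step 2: merge only where exactly two ways of the same class meet,
    # i.e. a simple continuation; branching junctions end a HILS.
    for ways_here in touching.values():
        if len(ways_here) == 2:
            a, b = find(ways_here[0]), find(ways_here[1])
            if a != b:
                parent[b] = a

    totals = defaultdict(float)
    for w, length in way_length.items():
        totals[find(w)] += length
    return sorted(totals.values())
```

Keying the index by (node, class) is what enforces the "consistent functional classification" condition: a motorway ending where a trunk road begins is never merged.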
For the purposes of this study, all motorway, trunk, and primary road HILSs are collectively referred to as HILSs of the main road network. This terminology provided a clear and unambiguous definition of the set of roads under consideration, encompassing the highest-level road categories in OSM across the EU.

2.2.2. Length Computation and Projection Distortion

To compute the lengths of HILSs, the GeoJSON data were imported into the MATLAB 2024b environment [32] using geotable structures, and filenames were sanitised for structured storage. The road segments’ geometries were processed using the Mapping Toolbox [33]. All geometries were projected into the Web Mercator coordinate reference system (EPSG:3857) so that distance computations were performed in meters. Multi-part geometries were identified using NaN delimiters separating the coordinate sequences. For each contiguous part, latitude and longitude coordinates were transformed into projected coordinates $(x, y)$, and the Euclidean distance between consecutive points was calculated as:
$$E_i = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}.$$
The total length of each segment was then obtained by summing these distances:
$$L = \sum_i E_i,$$
and recorded both in meters and in converted units (feet). The lengths of all segments with the same ID were subsequently summed to obtain the total length of each HILS, and the results were stored in structured variables for subsequent statistical analysis. The same analysis was performed for the data in meters and in feet.
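The length computation can be sketched in Python without MATLAB. The sketch assumes the standard spherical Web Mercator forward equations with radius 6,378,137 m and takes (lon, lat) pairs in degrees; it sums the Euclidean steps $E_i$ to obtain $L$. As an assumption about how the latitude correction described in the next paragraph might be applied, it also returns a length in which each step is divided by $k = 1/\cos\phi$, with $\phi$ taken as the step's mean latitude. The function names are hypothetical.

```python
import math

R_WEB_MERCATOR = 6378137.0  # sphere radius used by EPSG:3857, in metres
METRES_PER_FOOT = 0.3048    # international foot

def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Forward spherical Web Mercator projection (EPSG:3857)."""
    x = R_WEB_MERCATOR * math.radians(lon_deg)
    y = R_WEB_MERCATOR * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def polyline_length(coords: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (Mercator length, scale-corrected length) in metres.

    coords: (lon, lat) pairs in degrees for one contiguous part.
    Each step E_i is divided by k = 1/cos(phi), with phi assumed to be
    the mean latitude of the step's two endpoints.
    """
    raw = corrected = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(coords, coords[1:]):
        x1, y1 = to_web_mercator(lon1, lat1)
        x2, y2 = to_web_mercator(lon2, lat2)
        step = math.hypot(x2 - x1, y2 - y1)  # E_i
        raw += step
        corrected += step * math.cos(math.radians((lat1 + lat2) / 2))
    return raw, corrected
```

For one degree of longitude along the 60th parallel, the sketch gives about 111,319 m before and about 55,660 m after the correction; a length in feet is obtained by dividing by `METRES_PER_FOOT`. Applying the factor per step rather than once per HILS limits the error for segments that run a long way north to south.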
Because the lengths were computed in the Web Mercator projection (EPSG:3857), which introduces significant scale distortion at the latitudes ($\phi$) typical of the European continent (35°N to 70°N), a correction was required. The scale factor ($k$) by which Web Mercator inflates lengths is given by the secant of the latitude:
$$k = \frac{1}{\cos\phi} = \sec\phi.$$
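To give a sense of the size of this distortion, evaluating $k$ at the ends of the stated latitude range gives (a worked example, not a result from the paper):

$$k(35^\circ) = \frac{1}{\cos 35^\circ} \approx \frac{1}{0.819} \approx 1.22, \qquad k(70^\circ) = \frac{1}{\cos 70^\circ} \approx \frac{1}{0.342} \approx 2.92.$$

An uncorrected Web Mercator length therefore overstates the ground length by about 22% at 35°N and by a factor of almost 3 at 70°N. The corrected length is $L_{\text{Merc}}/k = L_{\text{Merc}}\cos\phi$.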
After the extracted HILS lengths were corrected by the scale factor $k$, the leading digits of the lengths of individual HILSs of the main road network were identified for each EU member state, and their frequency distribution was determined and converted into percentages to enable comparative analysis. The data processing workflow is depicted in Figure 1.
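Leading-digit extraction and the percentage distribution can be sketched in Python; the function names are illustrative. Because Benford's Law concerns the first significant digit, the sketch reads it from scientific notation, which also handles lengths below 1 m.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive, finite number."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("leading digit is defined here only for positive finite values")
    # Scientific notation puts the first significant digit first, e.g. '1.23400000000000e+02'.
    # Values within rounding distance below a power of ten are reported with digit 1.
    return int(f"{x:.14e}"[0])

def digit_percentages(lengths) -> dict[int, float]:
    """Percentage of lengths whose leading digit is d, for d = 1..9."""
    counts = Counter(leading_digit(v) for v in lengths)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no lengths given")
    return {d: 100.0 * counts.get(d, 0) / n for d in range(1, 10)}
```

For example, lengths of 12 m, 150 m, 230 m, and 9.5 m give 50% for the digit 1 and 25% each for the digits 2 and 9.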

2.3. Metrics

To test the conformity of the road-length data from all EU countries with Benford’s distribution, Pearson’s χ² goodness-of-fit test, the p-value, and the Kolmogorov–Smirnov test were used. More concretely, we test whether the observed frequency distribution ($O_d$) of the leading digits of HILS lengths deviates significantly from the expected distribution ($E_d$) derived from Benford’s Law.
Pearson’s χ² goodness-of-fit test is a standard statistical method for assessing the agreement between observed and expected distributions and is therefore suitable for hypothesis testing [34]. The χ² statistic is calculated as [34]:
$$\chi^2 = \sum_{d=1}^{c} \frac{(O_d - E_d)^2}{E_d},$$
where $O_d$ and $E_d$ are the observed and expected frequencies for the leading digit $d$, and $c$ is the number of categories (here, the nine leading digits). Lower χ² values indicate better adherence to the expected distribution.
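A minimal Python sketch of the statistic (hypothetical function names) takes the observed counts of the nine leading digits and derives the expected counts $E_d = n \log_{10}(1 + 1/d)$ from their total $n$. Note that the formula is defined on absolute frequencies: evaluating it on percentages instead rescales the statistic by a factor of $100/n$, and the usual critical values (about 15.507 for 8 degrees of freedom at the 0.05 level) apply to the count-based version.

```python
import math

def benford_expected_counts(n: int) -> list[float]:
    """Expected counts E_d = n * log10(1 + 1/d) for d = 1..9."""
    return [n * math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_statistic(observed: list[int]) -> float:
    """Pearson chi-square statistic of observed leading-digit counts vs. Benford."""
    if len(observed) != 9:
        raise ValueError("expected counts for the nine leading digits 1..9")
    expected = benford_expected_counts(sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Counts that match Benford's proportions up to rounding give a statistic close to zero, whereas uniformly distributed leading digits give a value far above the critical value.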
The p-value is a statistical measure used in hypothesis testing to quantify the strength of evidence against the null hypothesis. It is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true. For the χ² test, the p-value is calculated from the χ² statistic using the relation [35]:
$$p\text{-value} = 1 - F_{\chi^2}(\chi^2; f),$$
where $F_{\chi^2}$ is the cumulative distribution function of the χ² distribution and $f$ is the number of degrees of freedom. This function gives the probability that a χ²-distributed random variable is less than or equal to $\chi^2$. The cumulative distribution function of the χ² distribution [36] is implemented in most statistical software.
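Because $f = 9 - 1 = 8$ is even, the χ² survival function $1 - F_{\chi^2}(\chi^2; f)$ has an elementary closed form, so the p-value can be sketched with the Python standard library alone. In practice a statistics package (e.g., `scipy.stats.chi2.sf`) would normally be used; the function below is an illustrative stand-in with a hypothetical name.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, which is exact
    for even degrees of freedom (e.g. df = 8 for nine leading digits).
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total
```

With the critical value 15.507 for 8 degrees of freedom, the function returns approximately 0.05, as expected.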
The Kolmogorov–Smirnov test (K-S test) is a nonparametric statistical procedure commonly employed for comparing distributions [37]. The one-sample K-S test was applied to assess whether the data deviate from a specified theoretical distribution by comparing the empirical distribution function of the sample with the expected distribution function of that theoretical model.
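For nine discrete categories, the one-sample K-S statistic $D$ reduces to the largest absolute difference between the empirical and Benford cumulative proportions, where the Benford cumulative probability up to digit $d$ telescopes to $\log_{10}(d + 1)$. The Python sketch below (hypothetical function name) computes $D$ from observed leading-digit proportions. Standard K-S critical values assume a continuous distribution and tend to be conservative for discrete data such as leading digits, so they should be applied with care.

```python
import math

def ks_statistic_benford(proportions: list[float]) -> float:
    """Maximum absolute gap between the empirical and Benford cumulative
    distributions of the leading digits 1..9 (one-sample K-S statistic D)."""
    if len(proportions) != 9:
        raise ValueError("expected proportions for the nine leading digits 1..9")
    emp_cdf = 0.0
    d_max = 0.0
    for d, p in enumerate(proportions, start=1):
        emp_cdf += p
        benford_cdf = math.log10(d + 1)  # sum of log10(1 + 1/j) for j <= d
        d_max = max(d_max, abs(emp_cdf - benford_cdf))
    return d_max
```

For uniformly distributed leading digits, for example, the largest gap occurs at $d = 3$ and equals $\log_{10} 4 - 1/3 \approx 0.269$.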

3. Experiments and Discussion on the Results

The χ² statistic, supported by the p-value, and the K-S test were evaluated in this study. Furthermore, to gain a more comprehensive understanding of the relationship between the distribution of the leading digits of HILS lengths of the main road network and the actual lengths of these HILSs, a graphical comparison and assessment were conducted.

3.1. χ² Statistic, Kolmogorov–Smirnov Test, and p-Value

To perform hypothesis testing, the following null hypothesis was formulated:
“The leading digits of the lengths of the HILSs of the main road network of EU member states follow Benford’s Law.”
To evaluate the hypothesis, the χ² statistic, the K-S test, and the p-value were calculated at a 0.05 significance level. Any deviations from the expected distribution were analysed and interpreted.
Calculating the p-value from the χ² statistic is essential for deciding whether to reject the null hypothesis: if p-value ≥ 0.05, the data are considered consistent with Benford’s Law and the null hypothesis is not rejected, whereas if p-value < 0.05, the data are considered inconsistent with Benford’s Law and the null hypothesis is rejected.
One could argue that the χ² test performs well mainly for small to medium sample sizes, as it becomes very sensitive for extremely large datasets (as is the case here). In such cases, even minor deviations that are practically insignificant can lead to the rejection of the null hypothesis. Therefore, for large datasets, alternative measures such as the Euclidean distance [38], the Mean Absolute Deviation [39], or the Kolmogorov–Smirnov test [37] are often recommended. These metrics assess the magnitude of deviation between the empirical and theoretical distributions [40]. However, distance measures such as the Euclidean distance and the Mean Absolute Deviation do not provide a formal statistical test (one that produces a p-value), and it is not always clear what constitutes an acceptably small deviation from Benford’s Law. This lack of clarity was a decisive factor in choosing the χ² test with the p-value in this study. To obtain additional validation of the results, the one-sample Kolmogorov–Smirnov test was used as well.
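For reference, the two distance measures mentioned above can be sketched in Python as follows, computed on leading-digit proportions (the function names are hypothetical, and the Euclidean distance is left unnormalised). Neither returns a p-value, which is the limitation noted in the text.

```python
import math

def benford_proportions() -> list[float]:
    """Benford proportions log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def mean_absolute_deviation(observed: list[float]) -> float:
    """Mean absolute deviation between observed and Benford proportions (digits 1..9)."""
    expected = benford_proportions()
    return sum(abs(o - e) for o, e in zip(observed, expected)) / 9

def euclidean_distance(observed: list[float]) -> float:
    """Unnormalised Euclidean distance between observed and Benford proportions."""
    expected = benford_proportions()
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)))
```

Both measures are zero for an exact Benford distribution. For uniformly distributed leading digits, the mean absolute deviation is about 0.060.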
In the initial stage of the experiment, the subset data_distrib, which contained information on the frequency distributions of the HILSs of the main road networks for all 27 EU member states, was visualised using bar charts (see Figure 2, Figure 3 and Figure 4), offering a clear overview of the observed frequencies.
Subsequently, the χ² values were calculated using the data_distrib subset. For 8 degrees of freedom (nine possible leading digits, 1–9, minus one) and a significance level of 0.05, the critical χ² value is approximately 15.507 [41]; lower χ² values indicate better conformity with the expected distribution, in this case Benford’s distribution.
The χ² statistics listed in Table 2 confirm a high level of conformity with Benford’s distribution across all 27 EU countries, with χ² values ranging from 0.0449 to 1.5100. The lowest values were recorded for Belgium (0.0449), Poland (0.0511), Germany (0.0685), Austria (0.0918), Cyprus (0.0927), and the Netherlands (0.0952); these are more than one order of magnitude lower (by a factor of roughly 12 to 34) than those for Malta (1.5100) and Portugal (1.1101), and about one order of magnitude lower than those for the remaining 19 countries. Malta and Portugal were the only two countries with a χ² value greater than 1.
The p-value ranged from 0.9829 to 1.0000; for 17 countries (Austria, Belgium, Croatia, Cyprus, Czechia, Estonia, Finland, France, Germany, Greece, Lithuania, Luxembourg, the Netherlands, Poland, Romania, Slovakia, Sweden), conformity with Benford’s Law was so high that the p-value did not differ from 1 at the reported precision (see Table 2).
The K-S statistic ranged from 0.0071 to 0.0349. The lowest values were recorded for Cyprus (0.0071), Belgium (0.0074), the Netherlands (0.0082), and Poland (0.0097), while the values for the remaining states were considerably higher. The highest values were observed for Slovenia (0.0349), Malta (0.0333), and Portugal (0.0319). At the 0.05 significance level, the K-S test did not reveal a statistically significant difference between the empirical distribution of the leading digits in data_distrib and Benford’s distribution, as indicated by the K-S values listed in Table 2 and supported by the χ² statistics and corresponding p-values. These results indicate conformity with Benford’s Law.
The scale invariance of Benford’s Law was tested as well. Because Benford’s Law is scale-invariant, the empirical leading-digit distribution is ideally expected to remain effectively unchanged after unit conversion. We therefore computed the leading-digit histogram for both representations (meters and feet) and quantified its agreement with Benford’s distribution; both representations satisfy this condition (see Figure 2, Figure 3 and Figure 4).
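The unit-conversion check can be illustrated with synthetic data. The Python sketch below does not use the study's data: it generates hypothetical lengths spread evenly on a logarithmic scale over three whole decades (10 m to 10 km), which follow Benford's Law by construction. It then converts them to feet (1 ft = 0.3048 m) and reports, for each unit, the largest absolute gap between the observed and Benford proportions. All names are illustrative.

```python
import math
from collections import Counter

FEET_PER_METRE = 1 / 0.3048

def leading_digit(x: float) -> int:
    # First significant digit, read from scientific notation.
    return int(f"{x:.14e}"[0])

def digit_proportions(values) -> list[float]:
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    return [counts.get(d, 0) / n for d in range(1, 10)]

def log_uniform_lengths(n: int = 90_000, decades: int = 3, start_m: float = 10.0) -> list[float]:
    # Hypothetical synthetic lengths spaced evenly in log10 over whole decades;
    # such data follow Benford's Law by construction.
    return [start_m * 10 ** (decades * (k + 0.5) / n) for k in range(n)]

metres = log_uniform_lengths()
feet = [v * FEET_PER_METRE for v in metres]
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for label, values in (("metres", metres), ("feet", feet)):
    gap = max(abs(o - e) for o, e in zip(digit_proportions(values), benford))
    print(f"{label}: largest deviation from Benford = {gap:.5f}")
```

Both deviations stay well below 0.001. Multiplying by a constant only shifts every value by the same amount on the logarithmic scale, which leaves a distribution that is uniform over whole decades unchanged.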

3.2. The Higher-Level Segments’ Length Distribution

The relationship between the distribution of the lengths of HILSs and the leading digits of these lengths was examined country by country. The situation in each country is illustrated in Figure 5, Figure 6 and Figure 7. The x-axis represents leading digits of the lengths of the main road networks’ HILSs, the y-axis shows the decimal logarithms of these lengths, and the z-axis indicates the frequencies of HILSs of a given length associated with a given leading digit.
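The quantity plotted in these figures can be tabulated by pairing each length's leading digit with a logarithmic length bin. The Python sketch below is illustrative only. The bin width of 0.1 decade is an assumption, since the binning used for the figures is not stated, and the function name is hypothetical.

```python
import math
from collections import Counter

def digit_by_magnitude(lengths, bins_per_decade: int = 10) -> Counter:
    """Count HILSs per (leading digit, log10-length bin) pair.

    Keys are (d, b), where b is the lower edge of a log10 bin of width
    1/bins_per_decade; e.g. b = 1.0 covers lengths from 10 m to about 12.6 m.
    """
    table = Counter()
    for v in lengths:
        if v <= 0:
            raise ValueError("lengths must be positive")
        d = int(f"{v:.14e}"[0])  # leading significant digit
        b = math.floor(math.log10(v) * bins_per_decade) / bins_per_decade
        table[(d, b)] += 1
    return table
```

Summing the counts over $b$ for each $d$ recovers the leading-digit histogram, and summing over $d$ for each $b$ recovers the histogram of log lengths.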
These figures show that in all observed countries, HILSs of the main road network with lengths of the order of tens of meters predominate in most, if not all, of the nine categories defined by leading digits. The highest frequencies occur between the values 1 and 2 on the logarithmic axis, which corresponds to lengths of the order of tens of meters: a value of 1 represents 10¹ = 10 m and a value of 2 represents 10² = 100 m. The peaks of the curves obtained as cross-sections of the 3D plots parallel to the yz-plane therefore fall within the range of approximately 10 to 100 m, indicating that most of the observed HILS lengths in each category are of this magnitude. In Ireland (see Figure 6), however, the number of HILSs with lengths in the range of hundreds of meters is relatively high, considering the total number of HILSs analysed in that country. This phenomenon can also be observed, to a lesser extent, in Croatia (see Figure 6). Conversely, a relatively small proportion of main road network HILSs with lengths in the hundreds of meters, relative to the total number of analysed HILSs, is particularly notable in Germany, Cyprus, and Luxembourg. This trend is also partially evident in Hungary, the Netherlands, Bulgaria, and Portugal. In countries where very short HILSs prevail and longer ones are rare, this may indicate a high density of the road network, but also the risk of traffic being “transferred” across sections where longer direct routes are missing. Conversely, countries such as Ireland, with a higher proportion of long HILSs, may benefit from faster interregional accessibility but have weaker fine-grained connectivity at the local level.
Depending on the size of each country and the development of its transport infrastructure, the number of HILSs of the most frequent length typically ranges from several thousand to tens of thousands. The only exceptions are the island states Cyprus and Malta. The shapes of the charts for the individual countries depicted in Figure 5, Figure 6 and Figure 7 are remarkably similar, although the charts for Malta, Ireland, and Latvia differ slightly from the other profiles. Cross-sections of the 3D plots taken parallel to the xz-plane mimic Benford’s distribution, while those parallel to the yz-plane resemble a normal distribution. The fact that the lengths of HILSs of the main road network in the analysed countries follow Benford’s distribution suggests that the formation of main road networks gives rise to patterns typically observed in naturally occurring datasets, indicating a certain universal character of these processes across different geographical and institutional contexts.

3.3. Practical Applications of the Findings

While this paper observes that the lengths of HILSs in EU countries follow Benford’s Law, the underlying causal mechanism merits deeper discussion. Previous works have shown that Benford-like distributions commonly arise in datasets governed by multiplicative processes, scale invariance, and combinations of heterogeneous influences [42,43]. In the case of road infrastructure, segment lengths are affected by diverse factors (terrain, funding availability, urbanisation patterns, political decisions), and these factors do not act additively but rather interact multiplicatively over time. This is consistent with the known result that the product of independent random variables tends, under fairly general conditions on their distributions, to converge to Benford’s distribution. Moreover, the planning and growth of road networks often occur across multiple spatial and temporal scales without a natural unit of measurement, creating a scale-invariant context in which Benford’s Law is statistically favored [44]. Thus, while no single deterministic model dictates road lengths, their emergence from a multi-factorial, multiplicative, and scale-free development process helps explain their convergence to Benford’s distribution.
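The multiplicative argument can be illustrated with a small simulation. The Python sketch below is a toy model, not a model of road construction: each hypothetical "length" is the product of 20 independent factors drawn uniformly from (0, 1], standing in for interacting influences. The shares of the leading digits are then compared with $\log_{10}(1 + 1/d)$. The function name and parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter

def product_leading_digit_shares(n_samples: int = 50_000, n_factors: int = 20,
                                 seed: int = 42) -> list[float]:
    """Leading-digit shares (d = 1..9) of products of independent random factors."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        product = 1.0
        for _ in range(n_factors):
            product *= 1.0 - rng.random()  # uniform on (0, 1], never zero
        counts[int(f"{product:.14e}"[0])] += 1  # leading significant digit
    return [counts[d] / n_samples for d in range(1, 10)]

if __name__ == "__main__":
    for d, share in enumerate(product_leading_digit_shares(), start=1):
        print(f"{d}: simulated {share:.4f}  Benford {math.log10(1 + 1 / d):.4f}")
```

With these settings each simulated share should lie within about 0.01 of the Benford value (for d = 1, close to 0.301), even though no single factor is Benford-distributed.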
Accordingly, HILSs demonstrate an inherent conformity with Benford’s Law, mainly because they meet the key requirement of multiplicative growth across various scales. This conformity arises from the statistical regularity created by the interaction of multiple independent factors (policy-making, planning, investment, terrain) over time, which drives the HILS length distribution towards a logarithmic, Benford-like pattern.
However, this natural conformity is often disrupted by external, human-influenced factors such as political decisions, intentional rounding practices, or legislative cutoffs. Therefore, the measurable deviation from Benford’s Law should not be seen as definitive proof of data fraud but rather as a helpful supplementary tool for identifying anomalies and maintaining quality. A high or consistent pattern of deviation in a specific nation often indicates the influence of systematic, human-imposed constraints that compromise the data’s inherent integrity. This deviation may point to a highly centralised, rigid planning regime, manifested through political decisions or legislative cutoffs, that introduces non-multiplicative limits on infrastructure development. Alternatively, these subtle national differences could signal nation-specific methodological issues in data collection or reporting, such as intentional rounding or specific cutoffs mandated by national agencies. In any case, the measurable deviation acts as a clear indicator of external human factors requiring further investigation into the specific causes, whether fraudulent manipulation, methodological issues, or specific national planning policies.
The discovery that the examined infrastructure elements of the transportation network conform to Benford’s Law is significant from several perspectives. If new or updated data on HILS lengths deviate from the expected distribution, this may indicate possible typographical errors during data entry, inconsistencies between different data sources (e.g., national vs. European registries), or potential data alterations (in project reports, official records, public procurement, planning documents, or for budgetary or project-related purposes). In this context, internal database consistency checks based on Benford’s Law offer a cost-effective way to flag potentially problematic data and help auditors focus their attention where it is most needed, without requiring manual inspection of all records.
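As a sketch of such a screening rule, the Python code below applies the χ² statistic to named batches of lengths (for example, per region or per data delivery; the batch names and function names are hypothetical). It flags any batch whose statistic exceeds the critical value of about 15.507 for 8 degrees of freedom at the 0.05 level. In keeping with the caveats above, a flag marks a batch for review rather than proving an error, and for very large batches the χ² test may also flag practically insignificant deviations.

```python
import math
from collections import Counter

CRITICAL_CHI2_DF8_ALPHA05 = 15.507  # 8 degrees of freedom, alpha = 0.05

def chi2_vs_benford(lengths) -> float:
    """Pearson chi-square of leading-digit counts against Benford's Law."""
    counts = Counter(int(f"{v:.14e}"[0]) for v in lengths)  # leading digits
    n = sum(counts.values())
    total = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        total += (counts.get(d, 0) - expected) ** 2 / expected
    return total

def flag_batches(batches: dict[str, list[float]]) -> list[str]:
    """Names of batches whose leading digits deviate significantly from Benford."""
    return [name for name, lengths in batches.items()
            if chi2_vs_benford(lengths) > CRITICAL_CHI2_DF8_ALPHA05]
```

A batch in which every length was rounded to 100 m would be flagged immediately, whereas lengths spread smoothly over several orders of magnitude would pass.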
Benford’s Law can also serve as a quick quality-control test. Inaccurate length calculations may stem, for instance, from inappropriate algorithms, such as applying Euclidean instead of geodetic distance, or from GPS coordinates that were projected incorrectly or without proper conversion to metric units. Segment lengths can also be distorted by sparse GPS point sampling, excessive filtering, or snapping. Because a Benford-based check can expose such problems early, it can reduce the costs of later corrections.
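To make the Euclidean-versus-geodetic pitfall concrete, the sketch below contrasts a great-circle (haversine) length with the naive planar length of raw latitude/longitude degrees. The haversine formula assumes a spherical Earth with a mean radius of 6371.0088 km, so it is only a stand-in for a true ellipsoidal geodetic distance; it is not the study's own processing pipeline, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(coords) -> float:
    """Sum of great-circle distances between consecutive (lat, lon) vertices."""
    return sum(haversine_km(*p, *q) for p, q in zip(coords, coords[1:]))


def naive_degree_length(coords) -> float:
    """Deliberately WRONG: planar Euclidean length of raw (lat, lon) degrees."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


# Hypothetical three-vertex segment (latitude, longitude in degrees).
segment = [(48.20, 16.30), (48.20, 16.80), (48.50, 17.10)]
print(f"great-circle length: {polyline_length_km(segment):.1f} km")
print(f"naive degree length: {naive_degree_length(segment):.3f} (degrees, not km)")
```

The naive value is not even in metric units, and a degree of longitude shrinks with latitude, so lengths computed this way are both mis-scaled and direction-dependent.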
Finally, when roads are extended, shortened, or reorganized, Benford’s Law can serve as a basis for detecting infrastructure change. A loss of conformity can signal structural changes in the network, enabling automatic identification of modifications or real-time monitoring of how construction activities affect the structure of the road network.
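One way such monitoring could work, sketched under assumptions: compare the first-digit distribution of an earlier network snapshot with a later one and raise an alarm when their total variation distance exceeds a tolerance. Both the 0.02 tolerance and the synthetic snapshots (a later batch of 600 identical segments of 5.0 length units added to a Benford-like network) are hypothetical.

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number."""
    return int(f"{abs(x):e}"[0])


def digit_distribution(values) -> list:
    """Proportions of leading digits 1..9 among the positive values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no positive values")
    return [counts[d] / n for d in range(1, 10)]


def detect_change(old_lengths, new_lengths, tolerance: float = 0.02):
    """Return (changed, distance) for two snapshots of segment lengths.

    `distance` is the total variation distance between the two first-digit
    distributions; `tolerance` is a hypothetical alarm level.
    """
    p = digit_distribution(old_lengths)
    q = digit_distribution(new_lengths)
    distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return distance > tolerance, distance


old_snapshot = [10 ** (i / 1000) for i in range(3000)]  # synthetic, Benford-like
new_snapshot = old_snapshot + [5.0] * 600               # hypothetical batch of equal segments
changed, distance = detect_change(old_snapshot, new_snapshot)
print(changed, round(distance, 3))  # True 0.153
```

Total variation distance stays between 0 and 1 regardless of network size; comparing each snapshot's χ² against Benford's distribution would be an alternative design.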
The analysis of length distributions can also help identify peripheral or underserved regions. Regional strategies can then determine whether it is more appropriate to supplement the network with shorter connecting segments, which improve accessibility for smaller settlements, or with longer routes, which provide more efficient long-distance connections.

4. Conclusions

Tobler’s laws of geography express fundamental principles of spatial relationships. Benford’s Law, by contrast, concerns the statistical distribution of numerical data and offers a novel perspective for analysing patterns in datasets such as those describing transport infrastructure.
This research aims to bridge the gap between spatial theory and statistical data analysis by applying Benford’s Law to transport infrastructure data. It offers a new tool for validating data quality, identifying anomalies, and revealing underlying structural patterns in transport networks, which can ultimately support improved planning and decision-making processes.
The lengths of higher-level segments of the main road network in the 27 European Union member states were used in the analysis. To assess whether these length data conform to Benford’s distribution in every EU country, a χ² test was performed, and the resulting p-value was used to evaluate the significance of the results. The Kolmogorov–Smirnov test provided additional validation of the reliability of the results.
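The χ² and Kolmogorov–Smirnov computations described above can be sketched in standard-library Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the input is the vector of observed first-digit counts, computes Pearson’s statistic on counts, takes 9 − 1 = 8 degrees of freedom for the p-value (via the closed-form chi-square tail for even degrees of freedom), and reports the K-S statistic as the largest gap between the observed and Benford cumulative proportions. The example counts are hypothetical.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):  # adds (x/2)^i / i! for i = 1 .. df/2 - 1
        term *= half / i
        total += term
    return math.exp(-half) * total


def benford_tests(counts):
    """Pearson chi-square, its p-value (8 df), and a K-S style statistic."""
    n = sum(counts)
    expected = [n * prob for prob in BENFORD]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    p_value = chi2_sf_even_df(chi2, df=8)
    # Largest absolute gap between observed and Benford cumulative proportions.
    ks, cum_obs, cum_exp = 0.0, 0.0, 0.0
    for o, prob in zip(counts, BENFORD):
        cum_obs += o / n
        cum_exp += prob
        ks = max(ks, abs(cum_obs - cum_exp))
    return chi2, p_value, ks


chi2, p, ks = benford_tests([301, 176, 125, 97, 79, 67, 58, 51, 46])  # hypothetical counts
print(f"chi2={chi2:.4f}  p={p:.4f}  KS={ks:.4f}")
```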
Overall, it can be concluded that EU member states have relatively well-developed main road networks, although we recognise that their quality may vary between countries. The higher-level segments of the main road network in each EU state adhere very well to Benford’s Law.
Although the χ² test is known to be very sensitive, often rejecting the null hypothesis because of minor discrepancies, this was not the case in this study. In all examined instances, the test yielded low χ² values and p-values well above the standard significance level of 0.05. These results were supported by the Kolmogorov–Smirnov test, which showed minimal differences between the observed and expected cumulative frequencies, indicating strong conformity with Benford’s Law. Therefore, there is no reason to reject the null hypothesis: “The leading digits of the HILSs of the main road network of EU member states follow Benford’s Law.” Rather, the results are fully consistent with it, as the empirical data align closely with the theoretical distribution predicted by Benford’s Law. This strong conformity holds even for a large dataset, which reinforces the robustness of the results and the reliability of the applied statistical methods.
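The sensitivity of the χ² test to sample size follows directly from its definition. Writing the observed and expected counts for digit d as O_d = N·p̂_d and E_d = N·p_d, where N is the number of segments, p̂_d the observed proportion, and p_d = log10(1 + 1/d) the Benford proportion:

```latex
\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d}
       = \sum_{d=1}^{9} \frac{(N\hat{p}_d - N p_d)^2}{N p_d}
       = N \sum_{d=1}^{9} \frac{(\hat{p}_d - p_d)^2}{p_d}
```

For fixed proportional deviations, χ² therefore grows linearly with N, so a discrepancy that is insignificant for a few hundred segments can become significant for a large network.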
In summary, the analysis reveals that despite national differences in the distribution of higher-level segment lengths, the overall structure of the main road networks across the observed countries exhibits a striking degree of similarity.
The findings of this study offer several practically applicable insights in the fields of transport policy, spatial planning, and geospatial data management. For policymakers and regional planners, the identification of universal, self-organizing patterns in the development of road infrastructure provides a valuable foundation for more efficient and targeted investment in transport networks. The observed convergence in the length characteristics of main road segments across EU member states, including strong conformity with Benford’s Law, indicates the presence of naturally occurring organizational principles that transcend national planning decisions. This supports the case for enhanced regional cooperation and the development of common transport policies within the EU framework, such as through funding mechanisms like the Connecting Europe Facility (CEF).
The combination of spatial theory and statistical analysis enables the modelling of infrastructure development and the forecasting of the consequences of various investment strategies. In cases of divergence between countries, planners may investigate the underlying causes of such anomalies, whether rooted in historical, geographic, or institutional differences.
If road segments across different countries follow similar statistical distributions, it suggests that infrastructure development occurs in a relatively balanced manner. This insight may serve as an argument for or against further intervention by EU-level institutions. At the same time, it provides a reference framework for evaluating planned infrastructure projects, where significant deviations from the expected distribution may point to systemic imbalances, planning deficiencies, or specific contextual factors that require further investigation.
Beyond planning implications, this study also has important relevance for the field of data quality assessment and validation. In this context, Benford’s Law emerges as an effective tool for automated auditing of large geospatial datasets, particularly where manual validation is impractical. The ability to detect suspicious anomalies, inconsistencies, or artificial manipulations in data is critical, especially in environments that rely on open data and transparency in public administration, where data quality directly affects the reliability of analytical outputs and policy decisions.
Finally, the integration of spatial theories, such as Tobler’s laws, with data-driven statistical approaches like Benford’s Law opens new possibilities for interdisciplinary research into complex geographic systems. This approach not only deepens our understanding of the spatial structure and evolution of transport networks but also supports more transparent, adaptive, and evidence-based decision-making in spatial development and public investment.
Future studies could build on this approach by incorporating additional datasets or exploring similar applications across other fields.

Author Contributions

Conceptualization, Monika Ivanova and Erika Feckova Skrabulakova; methodology, Erika Feckova Skrabulakova, Tomas Skovranek and Ales Jandera; validation, Erika Feckova Skrabulakova, Ales Jandera and Tomas Skovranek; formal analysis, Monika Ivanova and Erika Feckova Skrabulakova; investigation, Zuzana Sarosiova; resources, Erika Feckova Skrabulakova, Ales Jandera and Zuzana Sarosiova; data curation, Monika Ivanova; writing—original draft preparation, Zuzana Sarosiova; writing—review and editing, Erika Feckova Skrabulakova and Tomas Skovranek; visualization, Ales Jandera; supervision, Monika Ivanova and Erika Feckova Skrabulakova; project administration, Zuzana Sarosiova; funding acquisition, Monika Ivanova and Tomas Skovranek. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Slovak Research and Development Agency under the contracts No. APVV-18-0526 and APVV-22-0508. This research was also funded by the Scientific Grant Agency of the Ministry of Education, Research, Development and Youth of Slovak Republic under the grants VEGA 1/0674/23 and VEGA 1/0100/22, and by the Cultural and Educational Grant Agency of the Ministry of Education, Research, Development and Youth of Slovak Republic under the grant KEGA 006TUKE-4/2024.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Joannes-Boyau, R.; Bodin, T.; Scheffers, A.; Sambridge, M.; May, S.M. Using Benford’s law to investigate Natural Hazard dataset homogeneity. Sci. Rep. 2015, 5, 12046.
  2. Sambridge, M.; Tkalčić, H.; Jackson, A. Benford’s Law in the Natural Sciences. Geophys. Res. Lett. 2010, 37, L22301.
  3. Ayyıldız, N.; Karadeniz, E.; İskenderoğlu, Ö. Deprem Büyüklüklerinin Benford Yasası’na Uygunluğu: Kahramanmaraş Depremleri Örneği [Conformity of Earthquake Magnitudes to Benford’s Law: The Example of the Kahramanmaraş Earthquakes]. Türk Deprem Araştırma Dergisi 2023, 5, 22–32.
  4. Sottili, G.; Palladino, D.M.; Giaccio, B.; Messina, P. Benford’s Law in time series analysis of seismic clusters. Math. Geosci. 2012, 44, 619–634.
  5. De, A.S.; Sen, U. Benford’s law detects quantum phase transitions similarly as earthquakes. Europhys. Lett. 2011, 95, 50008.
  6. Courtland, R. Curious Mathematical Law Is Rife in Nature. Available online: https://www.elsevier.com/ (accessed on 19 September 2025).
  7. Alexopoulos, T.; Leontsinis, S. Benford’s law in astronomy. J. Astrophys. Astron. 2014, 35, 639–648.
  8. Zhou, Q.; Tang, H.; Turowski, J.M.; Braun, J.; Dietze, M.; Walter, F.; Yang, C.-J.; Lagarde, S. Benford’s law as debris flow detector in seismic signals. J. Geophys. Res. Earth Surf. 2024, 129, e2024JF007691.
  9. Grammatikos, T.; Papanikolaou, N. Applying Benford’s Law to Detect Fraudulent Practices in the Banking Industry; University of Sussex: Brighton, UK, 2015.
  10. Harb, E.G.; Nasrallah, N.; El Khoury, R.; Hussainey, K. Applying Benford’s law to detect accounting data manipulation in the pre- and post-financial engineering periods. J. Appl. Account. Res. 2023, 24, 745–768.
  11. Demir, B.; Javorcik, B. Trade policy changes, tax evasion and Benford’s law. J. Dev. Econ. 2020, 144, 102456.
  12. Nye, J.; Moul, C. The political economy of numbers: On the application of Benford’s law to international macroeconomic statistics. BE J. Macroecon. 2007, 7, 1–14.
  13. Kopczewska, K.; Kopczewski, T. Natural spatial pattern—When mutual socio-geo distances between cities follow Benford’s law. PLoS ONE 2022, 17, e0276450.
  14. Miller, S.J. Benford’s Law; Princeton University Press: Princeton, NJ, USA, 2015.
  15. Perazzoni, F.; Bacelar-Nicolau, P.; Painho, M. Geointelligence against illegal deforestation and timber laundering in the Brazilian Amazon. ISPRS Int. J. Geo-Inf. 2020, 9, 398.
  16. Mocnik, F.-B. Benford’s law and geographical information—The example of OpenStreetMap. Int. J. Geogr. Inf. Sci. 2021, 35, 1746–1772.
  17. Tobler, W. On the first law of geography: A reply. Ann. Assoc. Am. Geogr. 2004, 94, 304–310.
  18. Tobler, W. Linear pycnophylactic reallocation: Comment on a paper by D. Martin. Int. J. Geogr. Inf. Sci. 1999, 13, 85–90.
  19. Miller, H.J. Tobler’s first law and spatial analysis. Ann. Assoc. Am. Geogr. 2004, 94, 284–289.
  20. Fernandes, P.; O Ciardhuáin, S.; Antunes, M. Unveiling malicious network flows using Benford’s law. Mathematics 2024, 12, 2299.
  21. Szabo, J.K.; Forti, L.R.; Callaghan, C.T. Large biodiversity datasets conform to Benford’s law: Implications for assessing sampling heterogeneity. Biol. Conserv. 2023, 280, 109982.
  22. Yang, C.-J.; Turowski, J.M.; Zhou, Q.; Nativ, R.; Tang, H.; Chang, J.-M.; Chen, W.-S. Measuring bedload motion time at second resolution using Benford’s law on acoustic data. Earth Space Sci. 2024, 11, e2023EA003416.
  23. Mansouri, E.; Mostajabi, A.; Schulz, W.; Diendorfer, G.; Rubinstein, M.; Rachidi, F. On the Use of Benford’s Law to assess the quality of the Data provided by lightning locating systems. Atmosphere 2022, 13, 552.
  24. Barabesi, L.; Cerioli, A.; Di Marzio, M. Statistical models and the Benford hypothesis: A unified framework. TEST 2023, 32, 1479–1507.
  25. Newcomb, S. Note on the frequency of use of the different digits in natural numbers. Am. J. Math. 1881, 4, 39–40.
  26. Benford, F. The law of anomalous numbers. Proc. Am. Philos. Soc. 1938, 78, 551–572.
  27. Kreiner, W.A. On the Newcomb-Benford Law. Z. Naturforsch. A 2003, 58, 618–622.
  28. Raimi, R.A. The first digit problem. Am. Math. Mon. 1976, 83, 521–538.
  29. Berger, A.; Hill, T.P. The mathematics of Benford’s law: A primer. Stat. Methods Appl. 2021, 30, 779–795.
  30. OpenStreetMap. Available online: https://www.openstreetmap.org (accessed on 19 September 2025).
  31. GeoJSON. Available online: https://geojson.org/ (accessed on 19 September 2025).
  32. MATLAB. Available online: https://www.mathworks.com/products/matlab.html (accessed on 19 September 2025).
  33. Mapping Toolbox. Available online: https://www.mathworks.com/products/mapping.html (accessed on 19 September 2025).
  34. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dubl. Philos. Mag. J. Sci. 1900, 50, 157–175.
  35. Casella, G.; Berger, R. Statistical Inference; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024.
  36. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables; Courier Corporation: New York, NY, USA, 1965.
  37. Massey, F.J., Jr. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Stat. Assoc. 1951, 46, 68–78.
  38. Joenssen, D.W. edist.benftest: Euclidean Distance Test for Benford’s Law. BenfordTests: Statistical Tests for Evaluating Conformity to Benford’s Law, (Version 1.2.0) [R Package]. CRAN; 2015. Available online: https://cran.r-project.org/package=BenfordTests (accessed on 19 September 2025).
  39. Elsayed, K.M.T. Mean absolute deviation: Analysis and applications. Int. J. Bus. Stat. Anal. 2015, 2, 63–74.
  40. Wackerly, D.D.; Mendenhall, W.; Scheaffer, R.L. Mathematical Statistics with Applications, 7th ed.; Thomson Brooks/Cole: Belmont, CA, USA, 2008.
  41. da Silva Azevedo, C.; Gonçalves, R.F.; Gava, V.L.; de Mesquita Spinola, M. A Benford’s law based method for fraud detection using R Library. MethodsX 2021, 8, 101575.
  42. Hill, T.P. A statistical derivation of the significant-digit law. Stat. Sci. 1995, 10, 354–363.
  43. Pietronero, L.; Tosatti, E.; Tosatti, V.; Vespignani, A. Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf. Phys. A Stat. Mech. Its Appl. 2001, 293, 297–304.
  44. Fewster, R.M. A simple explanation of Benford’s Law. Am. Stat. 2009, 63, 26–32.
Figure 1. Workflow of the HILS data processing and Benford’s analysis. Left: OSM extraction, segmentation, and cleaning. Right: projection, length computation, scale correction, leading-digit extraction, Benford’s testing, and final classification of results.
Figure 2. Leading digit distribution for the lengths of HILSs of the main road network in Austria.
Figure 3. Leading digit distribution for the lengths of HILSs of the main road network for 12 selected EU member states.
Figure 4. Leading digit distribution for the lengths of HILSs of the main road network for 14 selected EU member states.
Figure 5. The HILSs’ length distribution in Austria.
Figure 6. The HILSs’ length distribution for 15 selected EU member states.
Figure 7. The HILSs’ length distribution for 11 selected EU member states.
Table 1. Benford’s distribution.
Digit   Expected Frequency (%)
1       30.1
2       17.6
3       12.5
4       9.7
5       7.9
6       6.7
7       5.8
8       5.1
9       4.6
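The Expected Frequency column of Table 1 follows from Benford’s first-digit formula P(d) = log10(1 + 1/d). A minimal Python check, which prints the same rounded percentages as the table:

```python
import math


def benford_first_digit(d: int) -> float:
    """Benford probability that the first significant digit equals d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be between 1 and 9")
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, f"{100 * benford_first_digit(d):.1f}")
```

The nine probabilities telescope to log10(10) = 1, so they form a proper distribution.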
Table 2. Values of χ², p-value and K-S test.
Country          χ²        p-Value   K-S Value
Austria          0.0918    1.0000    0.0112
Belgium          0.0449    1.0000    0.0074
Bulgaria         0.5166    0.9998    0.0255
Croatia          0.2522    1.0000    0.0201
Cyprus           0.0927    1.0000    0.0071
Czechia          0.1171    1.0000    0.0142
Denmark          0.3931    0.9999    0.0132
Estonia          0.2238    1.0000    0.0052
Finland          0.2705    1.0000    0.0206
France           0.2876    1.0000    0.0147
Germany          0.0685    1.0000    0.0103
Greece           0.2104    1.0000    0.0137
Hungary          0.3910    0.9999    0.0279
Ireland          0.4676    0.9999    0.0250
Italy            0.8212    0.9991    0.0284
Latvia           0.6841    0.9996    0.0203
Lithuania        0.3346    1.0000    0.0152
Luxembourg       0.3327    1.0000    0.0166
Malta            1.5100    0.9829    0.0333
The Netherlands  0.0952    1.0000    0.0082
Poland           0.0511    1.0000    0.0097
Portugal         1.1101    0.9975    0.0319
Romania          0.1175    1.0000    0.0152
Slovakia         0.1783    1.0000    0.0107
Slovenia         0.9058    0.9988    0.0349
Spain            0.7868    0.9993    0.0266
Sweden           0.1865    1.0000    0.0180
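The p-values in Table 2 can be cross-checked against the reported χ² values. Assuming 9 − 1 = 8 degrees of freedom (nine possible leading digits), the chi-square upper-tail probability has a simple closed form, sketched below; for the three rows shown, the results are consistent with the reported p-values. The function name is illustrative, not taken from the study.

```python
import math


def chi2_sf_df8(x: float) -> float:
    """P(X > x) for a chi-square variable with 8 degrees of freedom (closed form)."""
    h = x / 2
    return math.exp(-h) * (1 + h + h ** 2 / 2 + h ** 3 / 6)


# chi-square values copied from Table 2
for country, chi2 in [("Portugal", 1.1101), ("Slovenia", 0.9058), ("Italy", 0.8212)]:
    print(country, f"{chi2_sf_df8(chi2):.4f}")
```

With 8 degrees of freedom, any χ² below about 2.7 already gives p > 0.95, which is why all entries in the table sit close to 1.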
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ivanova, M.; Feckova Skrabulakova, E.; Jandera, A.; Sarosiova, Z.; Skovranek, T. Benford’s Law and Transport Infrastructure: The Analysis of the Main Road Network’s Higher-Level Segments in the EU. ISPRS Int. J. Geo-Inf. 2025, 14, 450. https://doi.org/10.3390/ijgi14110450

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
