#### 3.1. Study Area, Data, and Methods

The validity and rationality of mathematical models can be verified and evaluated with empirical observations. In fact, the success of the natural sciences rests heavily on their emphasis on the interplay between quantifiable data and models [16]. Four systems of cities in Europe and the U.S.A. can be employed to test the hierarchical scaling laws and the related models of cities. Jiang and his co-workers [38,39] proposed the concept of the natural city and developed a new approach to measuring objective city sizes using street nodes or blocks. In urban geography, a city can be defined as a large settlement that provides some kind of service functions to the surrounding areas. A natural city, by contrast, is a human settlement defined by landscape rather than by service functions. The natural cities proposed by Jiang and his co-workers [38,39] can be understood through two basic principles of geography: one is the man-land relation, and the other is the distance-decay effect. For urban form and growth, the man-land relation can be expressed by the allometric scaling relation between urban population and land [6]. Human activities and city population size are reflected in urban land use. On the other hand, the density of human activity in an urban region decreases with distance from the center to the periphery [4,6]. Thus, according to the distance-decay law, the boundary lines of urban population activities can be identified. The urban boundary can be termed the “urban envelope” [4,40]. In terms of the man-land allometric relation, an urban envelope represents an urban place and reflects the city size, so each envelope can be treated as the boundary of a natural city. The key lies in how to identify urban boundaries. Based on remote sensing images or digital maps, at least three approaches have been developed to determine urban envelopes for cities. The first is the city clustering algorithm (CCA) proposed by Rozenfeld and his co-workers [41,42], the second is the method of clustering street nodes/blocks advanced by Jiang and Jia [38], and the third is the fractal-based method presented by Tannier and his co-workers [43]. In this paper, the natural cities are extracted by means of the method proposed by Jiang and Jia [38]. Using this approach, we can obtain large datasets of natural cities. Compared with cities in the usual sense, the rank-size distributions of natural cities are very robust and have a longer scaling range. In recent years, Jiang and his co-workers have developed new approaches, such as the head-tail index, to identify natural cities [44,45].

The urban block is a familiar concept, and street nodes are defined as street intersections and ends. Using an algorithm for identifying urban boundaries, Jiang’s research group was able to delineate the boundaries of natural cities and obtain their areal extents. Urban area can thus be determined from a city’s areal extent, which contains a large number of street blocks or nodes. The number of street nodes is significantly correlated with the population size of cities. The city data are extracted from the massive volunteered geographic information in OpenStreetMap databases through data-intensive computing, and four datasets on the cities of the U.S.A., Britain (U.K.), France, and Germany are available. The process of identifying natural cities is in effect a spatial search, and the number of cities is determined automatically by the search technique. Under the same technical criterion of spatial search, the numbers of natural cities extracted from different countries may be very different. The reason lies in the different geographical conditions, which result in great differences in the spatial patterns of urban development. In Britain and France, natural cities correspond well to cities in the usual sense, while in Germany, natural cities differ significantly from traditional ones.

The empirical analysis can start by investigating Zipf’s distribution, which, as pointed out above, can be regarded as a signature of hierarchies with cascade structure. If cities in a region follow Zipf’s law, they can be organized into a self-similar hierarchy [7,30]. It has been shown that the cities in the four countries follow Zipf’s law [38,39]. Applying the generalized $2^n$ rule to the above-mentioned datasets, we can create four self-similar hierarchies of European and U.S. cities. Suppose that these systems of cities follow the pure Zipf’s law. Then the cities in each country can be reorganized into a hierarchy with cascade structure. Zipf’s law cannot be directly derived by entropy maximization, but the hierarchical scaling laws can be derived by means of this approach. Curry once tried to derive Zipf’s law using the idea of entropy maximization [33], but his result is actually a three-parameter exponential function rather than a power function [7].

Table 2 is presented to illustrate the operational process of hierarchical reconstruction (two Supplementary Material files are provided to show how to process the data and estimate the scaling exponents; see Files S1 and S2).
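The reconstruction step can be illustrated with a minimal sketch. It is not the procedure of Files S1 and S2; it only assumes that cities are ranked by size and then assigned to levels holding 1, 2, 4, … cities, with the average size computed per level. The function name `build_hierarchy`, the dictionary keys, and the synthetic Zipf sizes are all hypothetical and chosen for illustration.

```python
from statistics import mean

def build_hierarchy(sizes, ratio=2):
    """Group ranked city sizes into levels of 1, ratio, ratio**2, ... cities.

    Hypothetical helper: returns one dict per level with the expected and
    actual number of cities and the average size of the level.
    """
    ranked = sorted(sizes, reverse=True)
    levels = []
    start, count = 0, 1
    while start < len(ranked):
        members = ranked[start:start + count]
        levels.append({
            "level": len(levels) + 1,
            "expected": count,               # number required by the rule
            "number": len(members),          # number actually available
            "mean_size": mean(members),
            "complete": len(members) == count,  # False marks a lame-duck class
        })
        start += count
        count *= ratio
    return levels

# Synthetic sizes following a pure Zipf law, P(k) = P1 / k (illustrative only).
zipf_sizes = [1000.0 / k for k in range(1, 101)]
for row in build_hierarchy(zipf_sizes):
    print(row["level"], row["number"], round(row["mean_size"], 2), row["complete"])
```

The average urban area of each level can be obtained in the same way by averaging the areas of the cities assigned to that level.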

Several algorithms can be adopted to estimate the scaling exponents. The most common ones include the least squares method (LSM) [37], the maximum likelihood method (MLM) [46,47], and the major axis method (MAM) [26,48]. In recent years, the MLM has often been used to identify power-law distributions, and it is frequently regarded as the best available approach to estimating power exponents. In fact, the power-law relations in this work are based on exponential functions and are converted into log-linear relations. It has been demonstrated that if the observations come from an exponential family and mild conditions are satisfied, the least-squares estimates are identical to the maximum-likelihood estimates [49]. Moreover, if the errors of a linear model are normally distributed, the least squares estimators are also identical to the maximum likelihood estimators. All in all, the function of an algorithm is to estimate the parameter values of a mathematical model rather than to judge the form of the model’s expression. Any algorithm has its advantages and disadvantages, its sphere of application, and its conditions of applicability. The precondition for applying the MLM to observational data is that the variables satisfy a joint normal distribution. Unfortunately, for human systems such as cities, the observational data do not always meet this condition. In this case, the LSM is an advisable approach to estimating power exponent values [23,37]. The models’ parameters are therefore estimated by least squares calculations.
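As a rough illustration of the least squares calculation on a log-log scale, the sketch below fits $\ln y = \ln c + k \ln x$ by ordinary least squares. The function name `fit_power_law` and the synthetic level data are assumptions made for the example; they are not taken from the paper’s tables or supplementary files.

```python
import math

def fit_power_law(x, y):
    """Ordinary least squares on ln y = ln c + k ln x.

    Returns (c, k, r_squared). Hypothetical helper for illustration.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx                    # slope of the log-log line
    ln_c = my - k * mx               # intercept of the log-log line
    r2 = sxy ** 2 / (sxx * syy)      # squared correlation coefficient
    return math.exp(ln_c), k, r2

# Synthetic levels (illustrative): N_m = 2**(m-1), P_m = 1000 * 2**(1-m),
# which implies N = 1000 * P**(-1), i.e., a size exponent D = 1.
m_values = range(1, 9)
N = [2.0 ** (m - 1) for m in m_values]
P = [1000.0 * 2.0 ** (1 - m) for m in m_values]

mu, slope, r2 = fit_power_law(P, N)
D = -slope   # the scaling exponent is the negative of the log-log slope
print(f"mu = {mu:.3f}, D = {D:.3f}, R^2 = {r2:.4f}")
```

With real data, the goodness of fit would fall below 1, and the points outside the scaling range would be excluded before fitting.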

#### 3.2. Results and Findings

The systems of cities in the U.S.A., U.K., France, and Germany can be well described with hierarchical scaling equations. In light of the generalized $2^n$ principle expressed by Equations (1) and (2), we can organize the cities in each country into a hierarchy with cascade structure. The city number in the $m$th level is $N_m = 1, 2, 4, \ldots, 2^{m-1}, \ldots$ The numbers of levels in the urban hierarchies of the four countries are 15, 11, 11, and 13, respectively. The last levels are lame-duck classes because they contain too few cities to fill the level. Based on the hierarchical structure, we can calculate the average city size $P_m$ and the corresponding average urban area $A_m$ at each level (Table 3). The city numbers in different classes are set according to the $2^n$ rule and satisfy Equation (1). It is easy to verify that city size $P_m$ and urban area $A_m$ follow exponential distributions and satisfy Equations (2) and (3), respectively, but the lame-duck classes are outliers because they lack enough cities. Strictly speaking, the first class is usually an outlier as well, because the largest city is often an exception [7]. In fact, a mathematical law always becomes ineffective when the scale of measurement is too large or too small.

The exponential distributions of city size and urban area result in power-law relations between city number, size, and area. The exceptional values in the exponential laws often show up on the log-log plots for the power laws. In fact, if the scale is too large or too small, a power-law relation always breaks down [6,50]. Thus the extreme classes always form exceptional points, and there exists a scaling range between the two extremes. For U.S. cities, the last class of cities departs from the trend line and forms an outlier, but the first class of cities is normal (Figure 3). For the British, French, and German cities, both the first and last classes are exceptional values (Figure 4, Figure 5 and Figure 6). For comparability, the first class of the U.S. hierarchy of cities is also treated as an outlier, which does not influence the results and conclusions significantly. Removing the first and last data points as outliers yields the scaling ranges for the relations between city number and city size or urban area. All the data points within the scaling range follow a power law and show a double logarithmic linear relationship. In particular, the influence of the primate distribution of city sizes on the hierarchical scaling patterns is weak. In urban geography, city size distributions are divided into two groups: the rank-size distribution and the primate distribution [6]. In short, if the first and last classes are set aside, the relation between city size and number can be described with Equation (4), and the relation between urban area and city number can be described with Equation (5). Fitting Equations (4) and (5) to the datasets in Table 3, we can estimate the parameters by least squares calculation. The scaling exponent values are close to 1, and the $d$ value (area exponent) is slightly greater than the $D$ value (size exponent). The ratio of $D$ to $d$ can be termed the fractal dimension quotient of urban hierarchies. As indicated above, if $D$ approaches 1, the total “population” of the $m$th level approaches a constant $S_1$. Despite the fact that there are always many more small cities than large ones [29,38,45], the product of average size and city number in each class seems to be invariant. This is reminiscent of the work of Auerbach, who asserted that the product of the population size of city class $i$, $P_i$, and the rank of class $i$ among all classes ordered by population size, $R_i$, approaches a constant $K$, i.e., $P_i R_i = K$ [51]. The difference is that Auerbach’s finding is a special case of Zipf’s formula, which represents the restrictive rank-size rule rather than a hierarchy with cascade structure. In our context, the total size in the $m$th level of the self-similar hierarchy, $T_m = N_m S_m$, approaches a constant, i.e., $T_m \to$ constant, which suggests a conservation law. The conservation law implies some type of symmetry [6,52]. In this study, the conservation is associated with hierarchical scaling symmetry.
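The step from the exponential laws to the power law, and from there to the constant level total, can be written out briefly. The forms below are assumptions, since Equations (1), (2), and (4) are not restated in this section: the number law and size law are taken to be geometric in $m$ with ratios $r_n$ and $r_s$ (labels introduced here), and $S_m$ stands for the average size of level $m$.

```latex
\begin{aligned}
N_m &= N_1 r_n^{\,m-1}, \qquad S_m = S_1 r_s^{\,1-m}
  && \text{(assumed exponential laws)} \\
N_m &= \mu S_m^{-D}, \qquad D = \frac{\ln r_n}{\ln r_s}, \quad \mu = N_1 S_1^{D}
  && \text{(eliminating } m\text{)} \\
T_m &= N_m S_m = \mu S_m^{\,1-D} \;\longrightarrow\; N_1 S_1 = S_1
  && (D \to 1,\ N_1 = 1)
\end{aligned}
```

Under these assumptions, the level total $T_m$ depends on $m$ only through the factor $S_m^{1-D}$, so the closer $D$ is to 1, the more nearly constant the product of city number and average size becomes.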

Since Zipf’s law is a signature of the self-similar hierarchy of cities, two distributions related to the rank-size distribution should be discussed here. The first issue is the relationship between Zipf’s distribution and the lognormal distribution. Where city-size distributions are concerned, if we do not identify the scaling range, the rank-size relation often satisfies a lognormal distribution rather than a power-law distribution; however, if the scaling range is taken into consideration, the power-law relation is always clearer than the lognormal relation. In order to reveal the power-law relations of urban hierarchies, the data points at the two extremes should be removed as outliers. The second issue is the relationship between the rank-size distribution and the primate distribution. The city size distributions of both Britain and France are regarded as primate distributions. However, according to Figure 4 and Figure 5, the primate distribution does not seem to represent an independent type. The large cities in Britain and France take on the character of a primate distribution because London and Paris are two global cities [6]. The primate distribution has an impact on the log-log relations between city number, size, and urban area. However, this influence is not significant for hierarchical scaling relations based on large datasets. This seems to suggest that, compared with the rank-size law, the primate law represents a local rule rather than a global principle of city size distributions.

The relationships between city number and city size or urban area are a pair of fractal dimension relations, from which an allometric scaling relation between city size and urban area follows. Using the data displayed in Table 3, we can estimate the allometric scaling exponent values. In keeping with the exponential models and fractal models mentioned above, the first and last classes are treated as outliers so that the allometric parameters and fractal parameters are comparable with one another. The allometric scaling of the hierarchies of cities in the four European and American countries is clear and convincing. For U.S. cities, all the data points follow the allometric scaling law; for the cities of the U.K., France, and Germany, the last levels, i.e., the lame-duck classes, are exceptional points (Figure 7). The main results are shown in Table 4, which also shows how the data were processed and what effect the processing has.
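The link between the allometric exponent and the two fractal parameters can be checked numerically with a minimal sketch. The level averages below are synthetic and are not the values of Table 3; the helper `loglog_slope`, the ratios 2 and 1.9, and the coefficients 5000 and 800 are assumptions for illustration. The allometric relation is assumed to take the form $A_m \propto P_m^{\,b}$.

```python
import math

def loglog_slope(x, y):
    """Least squares slope of ln y against ln x (hypothetical helper)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

# Synthetic hierarchy of 10 levels (illustrative values only).
m_values = range(1, 11)
N = [2.0 ** (m - 1) for m in m_values]           # city number per level
P = [5000.0 * 2.0 ** (1 - m) for m in m_values]  # average city size
A = [800.0 * 1.9 ** (1 - m) for m in m_values]   # average urban area

# Drop the first and last classes, which the text treats as outliers.
N_in, P_in, A_in = N[1:-1], P[1:-1], A[1:-1]

D = -loglog_slope(P_in, N_in)   # size exponent from N ~ P**(-D)
d = -loglog_slope(A_in, N_in)   # area exponent from N ~ A**(-d)
b = loglog_slope(P_in, A_in)    # allometric exponent from A ~ P**b

print(f"D = {D:.4f}, d = {d:.4f}, D/d = {D / d:.4f}, b = {b:.4f}")
```

For noise-free synthetic data, the fitted $b$ coincides with the quotient $D/d$; with empirical data, the two are only expected to be close.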

The four study areas, the U.S.A., the U.K., France, and Germany, are all developed countries, and their levels of urbanization are near their respective capacity values, i.e., the upper limits. The allometric scaling properties of these urban hierarchies are as follows. First, the allometric scaling exponent is close to but less than 1. This suggests that the relative growth rate of city size is slightly less than that of urban area. When a city is small, its population density is low, the per capita land use is large, and the city expands fast in two-dimensional space. As the city grows, the population distribution becomes more and more concentrated and urban buildings begin to develop upward; thus the per capita land consumption becomes smaller and smaller, and intensification of urban land use emerges. As a result, the allometric scaling exponent $b \le 1$. Second, the allometric scaling exponent is equivalent to the fractal dimension quotient. In theory, the allometric exponent is the ratio of the fractal dimension of the urban population size distribution to that of the urban area size distribution. Where empirical analysis is concerned, the allometric exponent is close to the fractal dimension ratio. Generally speaking, for developing systems of cities, the fractal dimension of the population size distribution is significantly less than that of the area size distribution. The allometric scaling exponent values fall between 2/3 and 1, and tend to approach 0.85 [6,31,53]. However, for developed cities, population growth and land use expansion reach a final equilibrium, and the difference between the dimensions of the two types of size distribution is not significant. Therefore, the allometric scaling exponent is close to 1. Otherwise, a system will lose its balance [54]. The state of entropy-maximizing balance indicates the most suitable scaling exponent value; for example, the Zipf exponent is $q = 1$ [7,27,28], and the urban area-population allometric scaling exponent is $b = 1$ or 0.85 [31,53]. If the two maximum entropy processes are seriously misaligned, the scaling exponent values will be abnormal. For instance, the Zipf exponent $q$ must fall between 1/2 and 2, i.e., $1/2 \le q \le 2$ [55], and the allometric scaling exponent $b$ must fall between 2/3 and 1, i.e., $2/3 \le b \le 1$ [31]; otherwise, the state of entropy balance between the city size distribution and the city frequency distribution is seriously damaged. According to the calculation results, for the natural cities in the developed countries, the three entropy maximization processes are approximately in step with each other and are in a state of balance.