An Empirical Evaluation of the Critical Population Size for “Knowledge Spillover” Cities in China: The Significance of 10 Million

Gao, Xiaohui; Chen, Qinghua; Zhou, Ya; Huang, Siyu; Shi, Yi; Li, Xiaomeng

doi:10.3390/urbansci9070245

by Xiaohui Gao ¹, Qinghua Chen ¹, Ya Zhou ¹,*,†, Siyu Huang ¹,², Yi Shi ³ and Xiaomeng Li ¹,*,†

¹ School of Systems Science, Beijing Normal University, Beijing 100875, China
² School of Business, Guangxi University, Nanning 530004, China
³ China Population and Development Research Centre, Beijing 100081, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.

Urban Sci. 2025, 9(7), 245; https://doi.org/10.3390/urbansci9070245

Submission received: 31 March 2025 / Revised: 13 June 2025 / Accepted: 16 June 2025 / Published: 27 June 2025

Abstract

In advanced countries such as the USA and China, some cities are characterized by “knowledge spillover industries”, which play crucial roles in driving innovation, entrepreneurship, and economic growth. However, the excessive expansion of megacities in China has led to the overabsorption of labour from other cities. The unchecked growth of individual megacities causes metropolitan malaise and regional imbalance, further limiting the emergence of new “knowledge spillover” cities, which is detrimental to overall economic development. This study analyses China’s employment population structure to identify the critical population size required for the formation of “knowledge spillover” cities. The results show that 10 million is the unique threshold: cities with populations above this size show significantly more prominent “knowledge spillover” industries than smaller cities. Therefore, a population base of approximately 10 million is essential for these cities to thrive. This result suggests that China should pay more attention to the construction of urban agglomerations as geographic or administrative units to better distribute resources and promote balanced regional development. This approach can help foster the emergence of more “knowledge spillover” cities, thereby enhancing national innovation capacity and economic growth.

Keywords:

population; urban development; knowledge spillover; industrial employment; innovative cities

1. Introduction

The emergence of cities is a symbol of the maturity and civilization of human beings [1,2]. Inevitable urbanization processes result in the accumulation of various material resources, such as capital, materials, commodities, and labour. The concentration and proper distribution of resources reduce various production and living costs, improve the social efficiency of a city, and promote the further concentration and distribution of material resources. As the economy and technology develop and progress, more emerging industries and employment opportunities appear in cities. This situation directly leads to rapidly expanding urban size [3] and complex economic activities [4].

The growth of the urban population is not random but follows a structured pattern [5]. Different industries have varying contributions to urban economic growth and exhibit different employment growth rates [6,7]. In urban scaling theory, the relationship between employment in different industries and the urban population is often described by the power-law function $Y_i \sim N^{\beta_i}$ [8,9], where $N$ represents the city population, and $Y_i$ represents the employment in industry $i$. Industries that exhibit superlinear growth in employment relative to the urban population ($\beta_i > 1$) are generally considered to be driven by “knowledge spillovers”, whereas urban infrastructure industries ($\beta_i < 1$) show the opposite trend [8,10]. This relationship suggests that the population dependencies during urbanization may underpin the urban economic structure and promote its evolution [5], as the output and employment share of knowledge spillover industries continue to increase compared with those of infrastructure industries [11,12]. The employment structure of metropolitan regions has evolved in the postindustrialization era [13,14,15]. Cities with more pronounced production advantages in “knowledge spillover” industries (with $\beta_i > 1$) are referred to as “knowledge spillover” cities. Some scholars argue that the emergence of “knowledge spillover” cities is contingent upon a sufficiently large population. The most recent quantitative analyses indicate that in the United States, “knowledge spillover” cities begin to emerge only when the population exceeds 1.2 million [7].

China’s large population and abundant regional labour flows [16] provide a strong basis for rapid urbanization [17,18] and the construction of several international metropolises [19,20]. Moreover, China’s urbanization process has exhibited a shift from manufacturing to service industries as well as the gradual emergence of knowledge spillover industries as key drivers of urban growth. However, some megacities, even those with populations exceeding those of major international metropolises in developed countries, still have dominant industries that are basic in nature and contribute limited value to GDP growth. This situation has sparked debates among scholars regarding whether China should restrict the size of its superlarge cities [21,22]. The emergence of “knowledge spillover” cities, which are expected to drive economic innovation and sustainable growth, has been hindered. This issue is characterized by the excessive expansion of a few megacities, which not only leads to “metropolitan malaise” and unbalanced regional development but also limits the emergence of other “knowledge spillover” cities [5,23]. Addressing this issue requires an analysis of the coevolutionary patterns between urban population size and industrial structure transformation.

This research examines the mechanism by which population distribution characteristics promote urban evolution in China, with a focus on the emergence of “knowledge spillover” cities. Hypothesis testing confirms that cities with populations exceeding 10 million have significantly more prominent “knowledge spillover” industries than those with populations below 10 million, and that this threshold value is unique. The study identifies the population threshold and labour conditions necessary for the emergence of “knowledge spillover” cities and demonstrates that the longitudinal changes in urban economies follow a universal process governed by changes in population size. In doing so, it seeks to clarify the underlying mechanisms driving urban development and to provide insights into fostering a more balanced and sustainable urbanization process in China.

2. Literature Review

The labour force and contributing individuals (producers, consumers, and members of society in the urban area) are the most fundamental resources for any urban area. Generally, the population distribution among urban areas is uneven. Let $N$ be the urban population and $P(N)$ its distribution function. Urban populations typically follow a Pareto distribution $P(N) = C N^{-(\alpha + 1)}$ [24,25], Zipf’s law, or a log-normal distribution $P(N) = C N^{-1} \exp\left(-\frac{(\ln N - \mu)^{2}}{2 \sigma^{2}}\right)$ [26,27,28]. Under Zipf’s law, urban areas are ranked by population size in descending order, and the population of the urban area ranked $r$ is proportional to $r^{-1}$. For example, the largest urban area has twice the population of the second-largest urban area, three times that of the third-largest urban area, and so on. Although Zipf’s law is not expressed as a probability density function, it also indicates that the population of top-ranked urban areas is much larger than that of lower-ranked urban areas. These distributions all feature a negative exponent of $N$, suggesting that population concentration results in a few large metropolises standing out from many smaller urban areas [29,30]. However, the quality of urbanization and industrial structure differ significantly among urban areas of varying sizes [5]. According to Zhao (2017), the shift from manufacturing to the service industry as the leading economic sector is an important reason for the continuous growth of urban areas [31]. However, not all expanding urban areas can leverage “knowledge spillovers” to become centres of superlinear growth industries and thus gain competitive advantages. Therefore, understanding how the distribution characteristics of urban populations influence the evolution of urban areas and how urban expansion can be effectively managed to better support the economic structure and foster “knowledge spillover” urban areas are questions of great interest to scholars and policy-makers.

Scholars have explored the general patterns by which increasing urban population sizes drive structural transformations in urban areas. Friedmann (2006) notes that disparities in urban size lead to great inconsistencies between the quality of spatial development in urbanization and the quality of social development [32]. According to Frank and Balland (2018, 2020), small urban areas heavily rely on manual labour, whereas large urban areas rely on cognitive labour [4,33]. The authors suggest that as the urban population increases, industries transform from labour- to capital- and knowledge-technology-intensive or from agriculture- to manufacturing- and modern-services-led [11]. Recent quantitative analyses in the United States show that “knowledge spillover” urban areas emerge only when the population exceeds 1.2 million [7].

China’s early rapid urbanization process [34,35] and subsequent “polarization” problem [36,37,38] have inspired comprehensive analyses of the mechanisms by which population promotes urban evolution. China’s large population base and substantial migrant population have enabled the rapid formation of multiple international metropolises. In 2020–2021, China had 21 urban areas with populations exceeding 5 million [39] and 17 urban areas with populations exceeding 10 million [40]. However, too few “knowledge spillover” urban areas have emerged. Therefore, it is necessary to investigate whether similar general patterns exist in China, where population size drives urban transformation, and to identify the population threshold for such transformations. This research demonstrates that the longitudinal change in urban economies follows a universal process governed by changes in population size.

In summary, while significant progress has been made in understanding the relationship between urban population dynamics and economic evolution, gaps remain in identifying the specific mechanisms and thresholds that drive the emergence of knowledge spillover in urban areas. Future research should focus on bridging these gaps, particularly in the context of China’s unique urbanization trajectory. The remainder of this study is arranged as follows. Section 3 introduces the data source and methods. Section 4 describes the scaling characteristics of employment in different industries and shows that urban areas have different comparative advantages, which is the basic assumption used to discuss urban development and innovation. It then analyses the evolution of knowledge spillover industries and illustrates the labour demand and its limitations for “knowledge spillover” urban areas in China. Section 5 provides the conclusion and discussion.

3. Data & Methods

3.1. Data Sources and Preprocessing

The dataset encompasses the urban population of cities in China at the prefecture-level and higher, along with their employment across various industries. Given the substantial floating population in China, studies on urban evolution typically utilize resident population data. The resident population accurately reflects the mobility characteristics of the current Chinese population and provides a precise depiction of urbanization levels on the basis of resident population standards. For example, China’s census data are based on the resident population.

3.1.1. Resident Population Data for Cities from 2004 to 2019

In January 2004, the National Bureau of Statistics mandated that all provinces, autonomous regions, and municipalities calculate the gross regional product (GRP) per capita using the resident population data. (According to the spirit of the 28th executive meeting of the State Council, the National Bureau of Statistics issued the Notice on Improving and Standardizing Regional GDP Accounting on 6 January 2004, which required provinces, autonomous regions, and municipalities to uniformly calculate per capita GDP using the resident population (i.e., the registered population minus the outflow population of more than half a year plus the inflow population).) Owing to the lack of publicly available and reliable resident population data, the urban population of each city was estimated using the formula $\mathrm{GRP} / \mathrm{GRP}_{\text{per capita}}$ from 2004 onwards. The effectiveness of this estimation method is discussed. A reliability analysis of the estimated data for four typical cities is provided in Appendix A.1.2.
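The estimate described above is a single division, but the units must agree. The following is a minimal sketch; the function name and the example figures are hypothetical and are not taken from the Yearbook.

```python
def estimate_resident_population(grp, grp_per_capita):
    """Estimate the resident population as GRP divided by GRP per capita.

    Both arguments must be expressed in the same currency unit (for
    example yuan), so that the quotient is a head count.
    """
    if grp_per_capita <= 0:
        raise ValueError("GRP per capita must be positive")
    return grp / grp_per_capita


# Hypothetical city: GRP of 1.2 trillion yuan, GRP per capita of 100,000 yuan.
population = estimate_resident_population(1.2e12, 1.0e5)
print(f"{population:,.0f}")
```

If the source tables report GRP in units of 10,000 yuan and GRP per capita in yuan, the GRP column would need to be rescaled before the division.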

The GRP and GRP per capita data are sourced from the “China City Statistical Yearbook” [41]. Since the Yearbook ceased reporting industry-specific employment data for prefecture-level cities starting in 2020, the dataset is confined to the period up to 2019. To extend the dataset to 2020, data from the 7th National Population Census were incorporated. Given that the census is conducted decennially and that the employment population distribution in prefecture-level cities was significantly influenced in the short term by the COVID-19 pandemic in 2020, the supplementary data for 2020 are presented exclusively in Appendix A.2.1. Given the altered characteristics of the urban population distribution, the implications for the conditions under which “knowledge spillover” cities emerged in 2020 are examined.

3.1.2. 280 Cities and 19 Industries

As of 2019, China had 4 municipalities and 293 prefecture-level cities. The employment data for 19 industries for the period 2004–2019 were selected from the “China City Statistical Yearbook” [41]. The industries are defined according to the Industry Classification of the National Economy (GB/T 4754) [42]. However, employment data for certain industries were missing in some cities. Although these cases are rare, they required rigorous preprocessing. Interpolation was employed for non-endpoint missing values, and regression prediction was used to interpolate endpoint vacancy values (details in Appendix A.2.2).
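The two-part gap-filling rule can be sketched as follows. This is only an illustration under stated assumptions: interior gaps are filled by linear interpolation and endpoint gaps by a straight-line OLS fit on the year, whereas the exact procedures are those of Appendix A.2.2. The function name and the sample series are hypothetical, and the years are assumed to be sorted in ascending order.

```python
def fill_missing(years, values):
    """Fill None entries in an annual series.

    Interior gaps: linear interpolation between the nearest known neighbours.
    Endpoint gaps: prediction from an OLS line fitted to the known points.
    """
    known = [(t, v) for t, v in zip(years, values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed values")
    n = len(known)
    mean_t = sum(t for t, _ in known) / n
    mean_v = sum(v for _, v in known) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in known)
             / sum((t - mean_t) ** 2 for t, _ in known))
    intercept = mean_v - slope * mean_t
    first_t, last_t = known[0][0], known[-1][0]

    filled = []
    for t, v in zip(years, values):
        if v is not None:
            filled.append(v)
        elif t < first_t or t > last_t:
            # Endpoint gap: use the fitted regression line.
            filled.append(intercept + slope * t)
        else:
            # Interior gap: interpolate between the nearest known neighbours.
            t0, v0 = max((k for k in known if k[0] < t), key=lambda k: k[0])
            t1, v1 = min((k for k in known if k[0] > t), key=lambda k: k[0])
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled


# Hypothetical employment series (in 10,000 persons) with two gaps.
years = [2004, 2005, 2006, 2007, 2008]
employment = [None, 10.0, None, 14.0, 16.0]
filled = fill_missing(years, employment)
print(filled)
```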

The detailed data are provided in Table 1. To ensure statistical consistency, 18 cities lacking a significant amount of data were excluded. Specifically, 16 cities lacked GRP data for several years, which poses problems for obtaining permanent resident population data (Sansha, Zhangzhou, Bijie, Zunyi, Tongren, Lasa, Rikaze, Changdu, Linzhi, Shannan, Naqu, Longnan, Haidong, Zhongwei, Tulufan, and Hami), and 2 cities lacked all employment population data for certain industries (Ziyang and Hengshui). Additionally, Laiwu was merged with Jinan in January 2019, and the city was retained in the analysis to maintain consistency. The effective sample thus includes 276 prefecture-level cities and 4 municipalities directly under the central government, totalling 280 cities. According to the classification of China’s four major economic regions (see Table A5 in Appendix C.4), there are 86 cities in the eastern region, 80 cities in the central region, 80 cities in the western region, and 34 cities in the northeast region.

3.2. Investigate How the Comparative Advantage Evolves with Population Size

For city $c$, the revealed comparative advantage ($RCA$) of industry $i$ in all employment is quantified as [7,47,48]

$$RCA_{ci} = \frac{Y_{ci} / \sum_{i} Y_{ci}}{\sum_{c} Y_{ci} / \sum_{c,i} Y_{ci}}, \tag{1}$$

where $Y_{ci}$ denotes the employment of industry $i$ in city $c$. Specifically, an industry $i$ is considered characteristic in city $c$ if $RCA_{ci} > 1$, whereas $RCA_{ci} < 1$ indicates a lack of specialization [7].
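Equation (1) compares an industry's share of a city's employment with its share of national employment. A minimal sketch of the computation is given below; the nested-dictionary layout, the function name, and the employment counts are illustrative assumptions.

```python
def revealed_comparative_advantage(employment):
    """Compute RCA_ci (Equation (1)) from a {city: {industry: employment}} mapping."""
    city_totals = {c: sum(row.values()) for c, row in employment.items()}
    industry_totals = {}
    for row in employment.values():
        for industry, y in row.items():
            industry_totals[industry] = industry_totals.get(industry, 0.0) + y
    grand_total = sum(city_totals.values())

    rca = {}
    for c, row in employment.items():
        rca[c] = {}
        for industry, y in row.items():
            city_share = y / city_totals[c]                        # numerator of Eq. (1)
            national_share = industry_totals[industry] / grand_total  # denominator of Eq. (1)
            rca[c][industry] = city_share / national_share
    return rca


# Hypothetical employment counts for two cities and two industries.
employment = {
    "city_a": {"finance": 30, "construction": 70},
    "city_b": {"finance": 10, "construction": 90},
}
rca = revealed_comparative_advantage(employment)
for city, row in rca.items():
    print(city, {k: round(v, 3) for k, v in row.items()})
```

In this toy example finance accounts for 30% of employment in `city_a` against 20% overall, so finance is characteristic there ($RCA > 1$), while construction is not.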

To verify that a population of 10 million is the appropriate threshold for “knowledge spillover” cities, in addition to plotting the evolution curves of $RCA$ for different industry types with respect to city size and making intuitive judgements on the basis of these curves, this paper further conducted hypothesis testing to provide a more rigorous validation and explanation. This approach aims to substantiate the significance of the 10 million population scale in the context of urban development in China.

Define the sets

$$G_{1} = \{N_{c} < p_{1}\}, \qquad G_{2} = \{p_{2} \leq N_{c} < p_{3}\},$$

where $N_c$ is the population of city $c$.

Null Hypothesis ($H_0$): The proportion of cities with an $RCA$ value greater than 1 is the same for cities in $G_1$ and cities in $G_2$.

Alternative Hypothesis ($H_1$): The proportion of cities with an $RCA$ value greater than 1 is higher for cities in $G_2$ than for cities in $G_1$.

Mathematically, these hypotheses can be expressed as

$$\begin{aligned} H_{0} &: \theta_{G_{1}} = \theta_{G_{2}} \\ H_{1} &: \theta_{G_{1}} < \theta_{G_{2}} \end{aligned}$$

where $\theta_{G_1}$ and $\theta_{G_2}$ denote the proportions of cities with $RCA$ values greater than 1 in sets $G_1$ and $G_2$, respectively.

To test the above hypotheses, we use a logistic regression model to predict the probability that the $RCA$ value is greater than 1. The steps are as follows:

1. Construct the Logistic Regression Model:

Divide the population size into two groups: $N_c < p_1$ and $p_2 \leq N_c < p_3$. Assume that $y$ is a binary variable indicating whether the $RCA$ value is greater than 1 (1 if greater than 1, 0 otherwise). The logistic regression model can be expressed as:

$$\log\left(\frac{P(y = 1)}{1 - P(y = 1)}\right) = \beta_{0} + \beta_{1} \cdot G$$

where $\beta_0$ is the intercept, $\beta_1$ is the coefficient for the population size group, and $G$ is a categorical variable representing the population size group: 0 for $G_1$ and 1 for $G_2$.

2. Fit the Logistic Regression Model:

Use maximum likelihood estimation (MLE) to fit the logistic regression model. The model outputs the coefficients, standard errors, z values, and p values for each independent variable.

3. Output the Results:

Output the summary of the logistic regression model, including the coefficients, standard errors, z values, and p values. Check if the p value for the population size group is less than the significance level (0.01).

If $\beta_1$ is significantly greater than 0 (p value less than 0.01), cities with a population between $p_2$ and $p_3$ have a significantly higher proportion of $RCA$ values greater than 1 compared to cities with a population less than $p_1$. Confidence intervals not including 0 further support the conclusion that population size has a significant effect on the proportion of $RCA$ values greater than 1.

To ensure that the sample sizes of the two groups are the same and to avoid bias due to different group sizes, we performed stratified sampling by randomly selecting an equal number of samples from each group. This balanced sampling approach helps to eliminate potential biases in the analysis.
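The procedure above can be sketched without a statistics package, because a logistic regression with a single binary regressor has a closed-form maximum likelihood solution: $\beta_1$ is the log odds ratio of the 2×2 table of group against outcome, and its Wald standard error is the square root of the sum of the reciprocal cell counts. The sketch below relies on that identity rather than on an iterative fit, reports a one-sided p value to match $H_1$, and uses hypothetical function names and indicator data; it is not the authors' code.

```python
import math
import random


def balanced_sample(g1, g2, seed=0):
    """Randomly draw equally many observations from each group."""
    rng = random.Random(seed)
    n = min(len(g1), len(g2))
    return rng.sample(g1, n), rng.sample(g2, n)


def logit_group_test(y1, y2):
    """One-sided Wald test of beta_1 > 0 in logit P(y=1) = beta_0 + beta_1 * G.

    y1 and y2 are lists of 0/1 indicators (1 if RCA > 1) for G_1 and G_2.
    """
    a, b = sum(y1), len(y1) - sum(y1)  # G_1: RCA > 1, RCA <= 1
    c, d = sum(y2), len(y2) - sum(y2)  # G_2: RCA > 1, RCA <= 1
    if min(a, b, c, d) == 0:
        raise ValueError("closed form needs all four cells to be non-empty")
    beta0 = math.log(a / b)                 # log odds in G_1
    beta1 = math.log(c / d) - beta0         # log odds ratio G_2 vs G_1
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta1 / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return beta0, beta1, se, z, p_one_sided


# Hypothetical indicators: 1 if RCA > 1 for a city-industry pair, else 0.
g1_flags = [1] * 12 + [0] * 48  # cities with N_c < p_1
g2_flags = [1] * 35 + [0] * 15  # cities with p_2 <= N_c < p_3
s1, s2 = balanced_sample(g1_flags, g2_flags)
beta0, beta1, se, z, p_value = logit_group_test(s1, s2)
print(f"beta_1 = {beta1:.3f}, z = {z:.2f}, reject H0 at 0.01: {p_value < 0.01}")
```

A general-purpose logistic regression routine would typically report a two-sided p value for $\beta_1$, which is twice the one-sided value whenever $z > 0$.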

In practical applications, whether this difference is significant also needs to be judged in conjunction with the specific research background and domain knowledge. For example, if an $RCA$ value greater than 1 indicates that a certain industry has a competitive advantage in a certain city, then an increase in population size might lead to a significant increase in competitive advantage.

3.3. The Comparative Advantage of Industries and Its Critical Point Analysis

Some studies use scaling exponents to explore employment distribution and reflect the technical level of industries [11,12]. Other scholars analyse a city’s growth model by examining the distribution of the urban population using scaling exponents [49,50]. However, few studies consider urban population and employment together. Some scholars explored the relationships between urban size growth and urban innovation using the scaling exponent $\beta$ [7]. The authors reported that superlinearity ($\beta > 1$) is typically associated with knowledge spillover industries, whereas sublinear scaling ($\beta < 1$) is often attributed to infrastructure industries [7,10]. The corresponding scaling relation between employment and urban population is

$$Y_{ci} \approx Y_{io} N_{c}^{\beta_{i}}. \tag{2}$$

Although a power-law distribution is common in urban populations and other socioeconomic systems [51,52,53], not all countries or regions follow this pattern [54,55]. Some of the literature suggests that China’s urban population does not follow a power-law distribution [49] but obeys a log-normal distribution [56]. Here, on the basis of the data for urban residents, it is verified that China’s urban population satisfies the log-normal distribution (see Figure A2 in Appendix A.1.3). In the range of relatively small or large $N$, it can be assumed that the distribution function is linear under double logarithmic coordinates, as $P(N) \sim N^{-\gamma}$ in $N \in [N_{\min}, N_{\max}]$, where $\gamma$ is the coefficient of the explanatory variable [48,57,58].

The function of comparative advantage $RCA$ is then derived as follows, with the details provided in Appendix B.1 and Appendix B.2:

$$RCA(\beta, N) = N^{\beta - 1} \left(\frac{\beta - \gamma + 1}{N_{\max}^{\beta - \gamma + 1} - N_{\min}^{\beta - \gamma + 1}}\right). \tag{3}$$

According to Equation (3), the change in $RCA$ with respect to the scaling exponent $\beta$ and population $N$ is quantified. This calculation allows for the analysis of the universal development path and labour demand of cities in China by comparing the evolution of the employment advantages among industries.

In physics, a critical point $(\beta^{*}, N^{*})$ of the function $RCA(\beta, N)$ is a point in the function’s domain at which it is either not holomorphic or the derivative is equal to zero [59]. On the surface of the graph of the function $RCA(\beta, N)$ (a two-dimensional surface over $\beta$ and $N$), the critical point is defined by $\frac{\partial RCA}{\partial \beta} = 0$ and $\frac{\partial RCA}{\partial N} = 0$. This definition indicates that, to first order, the value of $RCA$ remains unchanged however the two parameters vary near this point. Such a point is also called a stationary point in mathematics. Whether the critical point is stable depends on whether the point is a minimum or a maximum. For a one-dimensional function, the critical point is relatively easy to judge. For example, for the function $f(x) = x^{2} - 3$, $x^{*} = 0$ is a critical point with $\frac{\partial f}{\partial x} = 0$, and it is stable with $\frac{\partial^{2} f}{\partial x^{2}} > 0$, indicating a minimum. The critical point of a multidimensional function is more complex because the behaviour along each dimension may differ.

In this study, on the basis of empirical analysis, the critical points are identified as saddle points, with the Hessian matrix having one positive eigenvalue and one negative eigenvalue. Through a schematic diagram of this two-dimensional surface, the evolution of $RCA$ for different industrial employment with parameter changes is analysed (see Figure 1). When $N > N^{*}$, for industries with $\beta > \beta^{*} = 1$, $RCA$ increases with increasing $\beta$, while for industries with $\beta < \beta^{*} = 1$, $RCA$ decreases with the growth in $\beta$. This result demonstrates that $N^{*}$ is an important critical point. The advantages of knowledge spillover industries can be realized only when the urban population reaches a certain scale.
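For a single window $[N_{\min}, N_{\max}]$ with one exponent $\gamma$, the critical point of Equation (3) can be written down explicitly. Since $\partial RCA / \partial N = (\beta - 1) N^{\beta - 2} g(\beta)$, with $g(\beta)$ the bracketed factor in Equation (3), this derivative vanishes only at $\beta^{*} = 1$. Setting $\partial RCA / \partial \beta = 0$ at $\beta = 1$ then gives $\ln N^{*} = \frac{N_{\max}^{k} \ln N_{\max} - N_{\min}^{k} \ln N_{\min}}{N_{\max}^{k} - N_{\min}^{k}} - \frac{1}{k}$ with $k = 2 - \gamma$. The sketch below checks this numerically; the parameter values are hypothetical and the single-window setting is a simplifying assumption, so the resulting $N^{*}$ is not the paper's estimate.

```python
import math


def rca_model(beta, n, gamma, n_min, n_max):
    """Evaluate Equation (3)."""
    k = beta - gamma + 1.0
    return n ** (beta - 1.0) * k / (n_max ** k - n_min ** k)


def critical_population(gamma, n_min, n_max):
    """Population N* at which dRCA/dbeta = 0 on the line beta = 1."""
    k = 2.0 - gamma  # beta - gamma + 1 evaluated at beta = 1
    if k == 0.0:
        raise ValueError("gamma = 2 makes Equation (3) singular at beta = 1")
    a, b = n_max ** k, n_min ** k
    log_n_star = (a * math.log(n_max) - b * math.log(n_min)) / (a - b) - 1.0 / k
    return math.exp(log_n_star)


# Hypothetical distribution parameters (not estimates from the paper).
GAMMA, N_MIN, N_MAX = 1.5, 1e5, 3e7
n_star = critical_population(GAMMA, N_MIN, N_MAX)

# Central finite differences of Equation (3) at (beta, N) = (1, N*).
h = 1e-5
d_beta = (rca_model(1 + h, n_star, GAMMA, N_MIN, N_MAX)
          - rca_model(1 - h, n_star, GAMMA, N_MIN, N_MAX)) / (2 * h)
d_n = (rca_model(1.0, 1.01 * n_star, GAMMA, N_MIN, N_MAX)
       - rca_model(1.0, 0.99 * n_star, GAMMA, N_MIN, N_MAX)) / (0.02 * n_star)
scale = rca_model(1.0, n_star, GAMMA, N_MIN, N_MAX)
print(f"N* = {n_star:.3e}, relative dRCA/dbeta = {d_beta / scale:.1e}, dRCA/dN = {d_n}")
```

At this point $\partial^{2} RCA / \partial N^{2} = 0$ while the mixed derivative equals $g(1)/N^{*} \neq 0$, so the Hessian determinant is $-(g(1)/N^{*})^{2} < 0$, which is the saddle-point signature described in the text.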

4. Results

4.1. Distribution and Evolution of Scale Characteristics

4.1.1. Evolution of Scale Characteristics

The hypothesis posits that the employment population in different industries $Y_{ci}$ and the urban population $N_c$ adhere to a power-law relationship, described by $Y_{ci} \approx Y_{io} N_{c}^{\beta_{i}}$. Here, $\beta_i$ varies across industries, reflecting different scaling effects. To test this hypothesis, a log-linearized model $\ln(Y_{ci}) = \ln(Y_{io}) + \beta_i \cdot \ln(N_c)$ is employed for the regression analysis. Additionally, a null model $\ln(Y_{ci}) = \ln(Y_{io})$ is introduced for comparison, which assumes that the employment population is determined solely by a constant term and is independent of the urban population. Both the actual and the null models are fitted using ordinary least squares (OLS), and an F test is conducted to compare them. If the p value of the F test is less than the significance level (e.g., 0.05), the actual model significantly outperforms the null model, thereby supporting the hypothesis that a power-law relationship exists between the employment and the urban populations, with distinct $\beta_i$ values for different industries.
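The comparison of the log-linear model with the intercept-only null model can be sketched as follows. The function name and the synthetic data are hypothetical. The sketch returns the F statistic only; the p value would be read from the F distribution with $(1, m - 2)$ degrees of freedom for $m$ cities, for instance with a survival function such as `scipy.stats.f.sf`, which is left out here to keep the block free of external dependencies.

```python
import math


def fit_scaling(populations, employment):
    """OLS fit of ln(Y) = ln(Y0) + beta * ln(N).

    Returns (ln_y0, beta, r2, f_stat), where f_stat compares the fit with
    the intercept-only null model ln(Y) = ln(Y0).
    """
    x = [math.log(n) for n in populations]
    y = [math.log(v) for v in employment]
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    ln_y0 = mean_y - beta * mean_x
    rss_full = sum((yi - ln_y0 - beta * xi) ** 2 for xi, yi in zip(x, y))
    rss_null = sum((yi - mean_y) ** 2 for yi in y)  # intercept-only model
    r2 = 1.0 - rss_full / rss_null
    f_stat = (rss_null - rss_full) / (rss_full / (m - 2))
    return ln_y0, beta, r2, f_stat


# Synthetic data: a superlinear industry (true exponent 1.2) with multiplicative noise.
populations = [5e5, 1e6, 2e6, 5e6, 1e7, 2e7]
noise = [1.10, 0.90, 1.05, 0.95, 1.08, 0.93]
employment = [0.01 * n ** 1.2 * e for n, e in zip(populations, noise)]

ln_y0, beta, r2, f_stat = fit_scaling(populations, employment)
print(f"beta = {beta:.3f}, R^2 = {r2:.3f}, F = {f_stat:.1f}")
```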

For the period 2004–2020, only the scaling relationships in the agriculture and mining industries are insignificant. This result indicates that most industries have a logarithmic linear relationship between employment $Y_{ci}$ and the urban population $N_c$, supporting the hypothesis. For two special industries, agriculture and mining, the scaling characteristics are not considered in subsequent research. Figure 2 shows that in China, approximately 50% of industries have $\beta_i > 1$. The distribution of $\beta_i$ was wider in 2019 than it was in 2004. Specifically, in 2004, $\beta_i$ fell within a narrower range of [0.7, 1.1], whereas in 2019, the range expanded to [0.7, 1.4]. The resident service, public facilities, and manufacturing industries are representative of those whose $\beta_i$ increased rapidly. There was no obvious decrease in $\beta_i$ from 2004 to 2019.

Different factor endowments and technology levels cause the same industry to have varying $\beta_i$ across different countries. Although scaling characteristics cannot be used to judge the advantages of an industry between China and the United States, $\beta_i > 1$ indicates that the employment structure of that industry in large cities is more advantageous. The economies of scale in large cities show that industries with $\beta_i > 1$ tend to be more important and often represent the driving force of the country’s economic development. Therefore, the development of “knowledge spillover” cities should focus more on industries with $\beta_i > 1$.

The analysis compares the changes in scaling characteristics across industries and shows their longitudinal dynamics from 2004 to 2019, with manufacturing, finance, education, and public administration as four typical examples (see Figure A5 in Appendix A.2.3). The results show that the manufacturing industry has become more concentrated and that the scaling effect has strengthened; the education and public services industries have become more decentralized, with more equal resource allocation; and the financial industry has maintained a relatively stable relationship. The regression results for 2019, including $\beta_i$ and $R^2$, indicate that the scaling relation captures the patterns of most cities.

4.1.2. Distribution of Scale Characteristics in Different Cities

Cities of different sizes have various advantageous industries. On the basis of the average population during 2004–2019 (omitting 2009 because of its instability), 280 cities were divided into three groups: small, medium, and large cities (each comprising approximately one-third of the total). Figure 3 shows the probability distributions of the characteristic industries in the three groups. For all cities, more of the characteristic industries exhibit sublinear scaling with $\beta < 1$. For the scaling exponent $\beta$, the mean value ($\mu$) of large cities is greater than that of medium cities, and the $\mu$ of medium cities is greater than that of small cities. However, the standard deviation follows the opposite trend.

This analysis confirms that the advantageous industries in large cities are more likely to be knowledge spillover industries, whereas small cities have more significant infrastructure industries.

4.2. Labour Demand of “Knowledge Spillover” Development

4.2.1. Evolution of Knowledge Spillover Industries

The literature indicates that population dependencies may underpin the urban economic structure and its evolution, as small cities heavily rely on infrastructure industries, whereas large cities rely on knowledge spillover industries [4,33,60]. This finding is shown in Section 4.1.2. Furthermore, some scholars have analysed data from the United States and propose that knowledge spillover industries (referred to as cognitive industries) become characteristic only when the urban population reaches a certain scale [7]. Does this rule also apply to other economies? Here, Chinese cities of different sizes are shown to have distinct characteristic industries. Although industrial scaling characteristics vary across countries, do they follow similar universal laws in “knowledge spillover” development? As a populous country with rapid economic development, China’s urban evolution has attracted widespread attention. Chinese cities with characteristic knowledge spillover industries are defined as “knowledge spillover” cities, and the paths of “knowledge spillover” development and labour demand are studied using both empirical analysis and theoretical models.

Population size is not used to define “knowledge spillover” cities here. However, the subsequent analysis shows that the structural advantages of knowledge spillover industries often appear only in cities with sufficiently large populations, under the premises that the urban population of the country follows a log-normal distribution and that the employment structures of infrastructure and knowledge spillover industries exhibit different growth (or scale) characteristics. In China in particular, cities in which knowledge spillover industries hold an advantage over infrastructure industries often require a population of more than 10 million. This is a universal rule for the emergence of “knowledge spillover” cities in China, as detailed below.

Figure 4 shows the average $RCA$ of 19 industries in cities with different population sizes in 2019. Here, the line colour indicates the industry $\beta$: blue for infrastructure industries, red for knowledge spillover industries, and grey for agriculture and mining industries. For clarity, the average evolutionary trends are highlighted for several typical knowledge spillover and infrastructure industries. On the basis of the previous assumption, only the effective region where the urban population and employment share a logarithmic linear relationship is discussed, i.e., the region between $N_{\min}$ and $N_{\max}$.

First, for small- and medium-scale cities in China, infrastructure industries (in blue) are usually more characteristic. With the expansion of the urban scale, the $RCA$ of knowledge spillover industries (in red) increases, whereas the $RCA$ of infrastructure industries gradually decreases. When the population size reaches $10^{7}$, the comparative advantage of cities shifts to knowledge spillover industries, whose $RCA$ values exceed those of infrastructure industries. This result is similar to the urban development pattern in the United States, where the emergence of “knowledge spillover” cities is based on a certain population size. “Knowledge spillover” cities in the United States require less labour than Chinese cities do, with a threshold of approximately $1.2 \times 10^{6}$ [7].

The results of hypothesis testing provide a more rigorous validation and explanation for 2019 (Table 2). Here, the analysis is confined exclusively to industries with $β_{i} > 1$. In the initial experiments, $H_{0}$ is rejected, indicating that for those industries with $β > 1$, the proportion of cities with $RCA > 1$ is higher for cities in $G_{2}$ ($p_{2} \leq N_{c} < p_{3}$) than it is for cities in $G_{1}$ ($N_{c} < p_{1}$). For the other experimental groups, $H_{0}$ cannot be rejected, implying that the significance of “knowledge spillover” industries does not differ significantly between the two groups.

Overall, “knowledge spillover” industries are significantly more prominent in cities with a population greater than 10 million than in those with a population less than 10 million. Among cities with populations less than 10 million, no boundary produces a significant difference between the two city groups. Therefore, the significance of 10 million as a critical threshold for the prominence of “knowledge spillover” industries is further clarified. This property held from 2004 to 2019 (Table A3).

After this phenomenon was observed from the empirical data, it was analysed with a theoretical model. On the basis of the analysis of Equation (3), two critical points were identified, both of which are saddle points. Especially at the critical point where the urban population is approximately $1.07 \times 10^{7}$, the advantages of knowledge spillover industries become prominent, with $RCA > 1$. Meanwhile, the advantages of infrastructure industries disappear, with $RCA < 1$. The competitive strengths of these two types of industries reverse at this point.

Here, $RCA$ has a maximum in one dimension and a minimum in the other dimension. The details are provided in Appendix B.3. According to the theoretical model, around the critical point, the $RCA$ of knowledge spillover industries increases with urban population growth, whereas that of infrastructure industries decreases. Moreover, crossing the critical point indicates that the significance of knowledge spillover industries exceeds that of infrastructure industries. This result also implies that the emergence of “knowledge spillover” cities is based on a certain urban population size, such as $10^{7}$ in China.

Therefore, both the empirical data and the theoretical model identify and verify the critical points, confirming the theoretical hypothesis and deduction process and revealing several statistical rules for a comparative advantage in China’s urban evolution.

4.2.2. Comparison with the Optimal City Size Model

The critical point in China is approximately $10^{7}$, whereas in the United States, it is approximately $1.2 \times 10^{6}$. As analysed in Section 4.2.4, in addition to the differences in absolute population (i.e., the various $N_{min}$ and $N_{max}$ in different countries), this difference stems from the distribution characteristics of urban size in different countries, as indicated by $γ$. This idea is similar to traditional optimal city size theory [61]. For a single city, finding the optimal size is a typical optimization problem that considers economic externalities or agglomeration benefits as well as commuting and rental costs [62,63,64]. The optimal size of multiple cities in an urban system is usually measured by the rank-size rule [17,65,66], which is typically described by the Pareto distribution, Zipf’s law, some improved models [24,25,67], or log-normal distributions [27,28].

Some scholars have calculated the optimal size of a single city in China and the United States. From the perspective of maximizing net income, Wang and Xia noted that the optimal city size in China is approximately $10^{7}$ [68]. Carlino studied the relationship between agglomeration economies and the increasing coefficient of scale returns from the perspective of increasing returns to scale and noted that the optimal city size in the United States is approximately $3.4 \times 10^{6}$ [69]. Using Carlino’s method as a reference, Jin noted that the optimal city size of Beijing, Shanghai, and Tianjin is approximately $10^{7}$ [70]. Of course, some studies that incorporate environmental pollution and energy efficiency suggest that the optimal city size obtained is significantly smaller than that suggested by previous research and that the optimal urban size in China should not exceed $0.5 \times 10^{7}$ [71,72]. The literature considers output as a whole, without discussing the dependence and impact of changes in output structure on city size.

Because there are many empirical models of the optimal city size and they cannot all be tested individually, this study takes one representative single-city model as an example for comparison with our empirical results. Using this model, the study quantifies the optimal city size in China and the United States that maximizes benefits through the analysis of personal income and expenditure [61]. The results show that, on the basis of empirical data from 2019, the optimal city size in the United States is $1.22 \times 10^{6}$, whereas that in China is $1.13 \times 10^{7}$. The optimal size model differs from the method used in this study, but both yield similar quantitative results (details in Appendix C.3).

In the optimal size model, $ϵ$ represents the exponential rate of per capita GDP growth with urban population size. Both $γ$ and $ϵ$ describe the population-based scale effect and urban endogenous agglomeration benefits from different perspectives. China has a smaller $γ$ and a larger $ϵ$ (see Table A4 in Appendix C.3). A smaller $γ$ indicates that the population is more concentrated in a few large cities, whereas a larger $ϵ$ indicates that per capita output or personal income is relatively higher in large-scale cities. These factors make China’s optimal city size much larger than that of the United States. Because the population is highly concentrated in a few large cities, fewer Chinese cities can cross the critical transition point compared to cities in the United States.

The traditional optimal city size model describes the benefits of urban growth through economies of scale and analyses the problems associated with congestion costs. Adding to the discussion on changes in economic structure with urban scale growth would provide a valuable supplement and verification to the theory of optimal city size. The employment structure reflects a city’s production characteristics. The predominance of knowledge spillover industries often means higher per capita output under the same size. Thus, the evolution of the employment structure is related to changes in a city’s optimal scale, as illustrated by the consistency of the results derived from the two models.

4.2.3. Limitations of Urban Innovation in China

For the cities located near the critical point, industries with $β > 1$ significantly increase, whereas those with $β < 1$ decrease. Therefore, cities near $N_{2}^{*}$ are selected to analyse their evolution trends. As shown in Figure 5, some of these cities are on the verge of surpassing $N_{2}^{*}$ and have the potential to transform into “knowledge spillover” cities; these include Guangzhou, Baoding, Shijiazhuang, Suzhou, Linyi, Shenzhen, Chengdu, and Wuhan. In addition, three cities (Beijing, Shanghai, and Chongqing) have already surpassed the critical point in terms of population size, despite not being located near it. Here, subinnovative cities refer to the emergence stage of the innovation cycle. Compared with subinnovative cities that have leading technologies and highly skilled jobs, cities in the “knowledge spillover” stage have more mature technologies and a greater density of skilled jobs [11]. Moreover, 94.3% of cities in China still need to further improve their technical levels and work proficiency to achieve “knowledge spillover” development.

Figure 6 analyses the geographical distribution of cities near the critical point. Dark red indicates cities that have crossed $N_{2}^{*}$, whereas light red indicates cities with population gaps of less than 20% from $N_{2}^{*}$. In terms of spatial distribution, all “knowledge spillover” cities are concentrated east of the Hu Huanyong Line, which indicates regions with relatively high population density in China. Moreover, most of the dark and light red cities are located in China’s major urban agglomerations, including the Central Plains, Jing-Jin-Ji, Shandong Peninsula, Yangtze River Delta, Yu-Rong, Middle Yangtze, and Greater Bay areas (marked here).

This result indicates that the proportion of Chinese cities crossing or near the critical point is relatively low. By comparison, according to the 2019 statistics from the United States Census Bureau [45], 49 of the 384 metropolitan areas have populations exceeding the critical population size of 1.2 million [7], accounting for 12.8%. Therefore, China does not have enough cities likely to complete the transformation from subinnovative to “knowledge spillover” cities. Moreover, most of these cities are located near existing “knowledge spillover” cities. When these cities complete the transformation, they can form urban agglomerations of a certain scale, which will facilitate further regional expansion.

China is a populous country. In terms of absolute population size, most Chinese cities have larger populations than their U.S. counterparts do. Even in 2019, 256 cities at the prefecture level and above had populations exceeding the critical point of the United States (i.e., 1.2 million). However, owing to differences in urban population distribution and industrial employment scaling characteristics, the demand for population resources in “knowledge spillover” cities varies significantly between the two countries. This situation is related to factors such as production technology and other economic conditions. Only 16 cities in China meet the population demand for “knowledge spillover” development, and the innovation rate of population demand is far lower than that of the United States.

4.2.4. Influence of Population Distribution Characteristics

In addition, to compare economies of different population sizes, this study theoretically analyses the changes in $\log (N^{*} / N_{max})$ according to different values of $γ$. Here, $γ$ represents the distribution characteristic of the urban population in a country and has a significant effect on the proportion of “knowledge spillover” cities. The $γ$ of the United States is less than 2 [7], whereas the $γ$ of China is located in the range $[1.10, 1.35]$. Consequently, $\log (N^{*} / N_{max})$ is closer to 0 in China than it is in the United States. Therefore, the population demand of China’s “knowledge spillover” cities is closer to that of the largest cities, indicating a limited number of such cities in China (Figure 7b).

In addition, the trend of urban population growth in China is not encouraging [73]. China’s population has already experienced three consecutive years of negative growth [74]. According to the latest data, the total population peaked in 2022 and has since decreased. In 2024, the total population decreased by 1.39 million compared with that in the previous year, with a natural growth rate of −0.99‰.

Furthermore, according to the latest projections, China’s working-age population is expected to continue to decrease at an accelerated pace, with a projected decrease of 200 million people by 2050 [75]. The slow or even negative growth of China’s population will lead to significant changes in population structure, which will greatly impact urban innovation capabilities [76]. Moreover, China’s large cities face challenges related to industrial agglomeration and innovation agglomeration [77].

5. Conclusions and Suggestions

Exploring the conditions for the emergence of cities dominated by “knowledge spillover” industries is of great significance. These cities leverage superlinear growth factors to achieve advanced development and utilize resources more efficiently. Focusing on China, this study identifies a development path in urban evolution and the labour demand characteristics of “knowledge spillover” cities through the analysis of industrial employment and comparative advantage. It uncovers a critical population size.

5.1. The Comparative Advantage of “Knowledge Spillover” Industries Requires a Critical Urban Population Size

The relationship between industry employment and the urban population reveals distinct growth patterns for different industries. Superlinear growth industries, such as “knowledge spillover” industries, accelerate their growth in larger cities, which can be interpreted as a more efficient aggregation of resources. This phenomenon is reflected in the statistical patterns of the urban population and number of employees in prefecture-level cities in China. The analysis from a modelling perspective further elucidates the underlying mechanisms involved.

The log-normal distribution of the urban population indicates that only a small number of large-scale cities exist. Industries have different scaling effects: sublinear growth industries typically dominate in smaller cities, whereas superlinear growth industries thrive in larger cities. The comparative advantage of superlinear over sublinear growth industries emerges when the urban population reaches a certain scale. This critical population size varies across economies, depending on urban population distribution characteristics and the scaling properties of different industries.

5.2. The Critical Population Size of 10 Million for China

The specific superlinear growth industries vary across countries due to differences in economic endowments, production technologies, and other factors. For example, in the United States, administrative and financial services exhibit superlinear growth characteristics, whereas in China, manufacturing and retail trade are more prominent. The underlying principle is the same: each country has industries that can grow at an accelerated rate, aggregating resources more efficiently and driving economic development more effectively. These industries can thrive as “knowledge spillover” and dominant sectors only in cities that reach a certain population size.

The labour demand for “knowledge spillover” cities varies across economies, and in China, it is $1.0 \times 10^{7}$. It is influenced by the overall population size and the characteristics of the urban population distribution. China’s urban population distribution has a large negative power index, indicating a relative insufficiency in the number of large cities. Although China is a populous country, only 5.7% of its cities have become “knowledge spillover” cities.

5.3. Future Prospects for China

The critical population size of 10 million in China represents a significant threshold. However, this threshold is particularly high given the current trends in China’s demographic and urban development. China’s population growth has been slowing, and the further expansion of megacities could lead to several challenges. The expansion of large cities may result in urban problems such as traffic congestion, housing shortages, and environmental pressures, commonly referred to as “metropolitan malaise”. The population concentration in a few megacities could exacerbate the imbalance in the urban population distribution, potentially increasing the threshold.

To address these challenges and leverage the advantages of superlinear growth industries, China needs to adopt a balanced urban development strategy. This strategy should focus on promoting the growth of medium-sized cities and improving the connectivity and integration of urban agglomerations. By doing so, China can create a more balanced urban population distribution, reduce the pressure on megacities, and provide more opportunities for the emergence of knowledge spillover cities. Additionally, policies should be designed to encourage the development of superlinear growth industries in a wider range of cities, ensuring that the benefits of these industries are more evenly distributed across the country.

Author Contributions

X.G. (first author): data curation, formal analysis, writing—original draft. Q.C.: supervision, methodology, and validation. Y.Z. (corresponding author 1): data curation, supervision, and writing—review and editing. S.H. (coauthor): conceptualization, formal analysis, methodology, and writing—original draft. Y.S. (coauthor): conceptualization, supervision, and validation. X.L. (corresponding author 2): formal analysis, supervision, writing—original draft, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Social Science Foundation (22BRK021) and the Interdisciplinary Construction Project of Beijing Normal University.

Data Availability Statement

The data can be downloaded through the links in Table 1. All data used in this study were derived from publicly available datasets.

Acknowledgments

We appreciate the comments and helpful suggestions from Dahui Wang, Zengru Di and Handong Li.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Data Sources and Statistical Analyses

Appendix A.1. Cities and Industries

Appendix A.1.1. Cities in China

In China, one of the administrative divisions is the city, which is typically categorized into three types on the basis of different administrative statuses: 1. Municipalities directly under the central government, which are part of provincial-level administrative regions. 2. Prefecture-level cities, which belong to prefecture-level administrative regions. 3. County-level cities, which are part of county-level administrative regions. By the end of 2020, China had a total of 685 cities, including 4 municipalities directly under the central government, 293 prefecture-level cities, and 388 county-level cities. Since the definition of urban agglomeration in China is not unified, urban planning and development are still implemented by the municipal governments of municipalities directly under the central government, prefecture-level cities, and county-level cities. This study primarily refers to the administrative divisions of cities by the Chinese government, as they more closely align with the realities in China. Prefecture-level cities are commonly used as scale units for analysing the urban economy in China. Additionally, the data in this study adopt the statistical scale of “the whole city”, which includes urban areas, counties, and cities, covering the entire range of prefecture-level cities, rather than including only urban areas.

Appendix A.1.2. Municipalities and Prefecture-Level Cities

As of 2019, China had 4 municipalities directly under the central government and 293 prefecture-level cities, for a total of 297 cities. For this study, we selected 280 samples. Specifically, 16 cities were excluded due to the lack of gross regional product (GRP) data for several years, which also affected the availability of resident population data (these cities are Sansha, Zhangzhou, Bijie, Zunyi, Tongren, Lasa, Rikaze, Changdu, Linzhi, Shannan, Naqu, Longnan, Haidong, Zhongwei, Turpan, and Hami). Additionally, 2 cities were excluded because of the absence of employment population data for certain industries (Ziyang and Hengshui), giving 18 excluded cities in total. Furthermore, Laiwu was merged with Jinan in January 2019, but the city was retained in the analysis to maintain consistency. Considering the completeness of the data, this study ultimately used 280 municipalities and prefecture-level cities.

The resident population can fully reflect the mobility characteristics of the current Chinese population and accurately depict the urbanization level on the basis of the resident population standard. In the spirit of the 28th Executive Meeting of the State Council, the National Bureau of Statistics issued the Notice on Improving and Standardizing Regional GDP Accounting on 6 January 2004. This notice requires provinces, autonomous regions, and municipalities in the future to uniformly calculate per capita GDP using the resident population (i.e., the registered population minus the outflow population of more than half a year plus the inflow population).

To estimate the urban population of each city, we used the formula ${GRP} / {GRP}_{per\text{-}capita}$, as many cities do not publicly release local resident population data. We selected four representative cities and compared the published urban population data with the estimated results. The results passed the t test (with sig. = 0.0107), indicating no significant difference between the two groups of data. In the linear fitting analysis, the adjusted $R^{2}$ value was 0.9237 (see Figure A1).

Figure A1. Reliability of the estimated data (in 10 thousand). Source: Estimated data from the “China City Statistical Yearbook” (2019) [41] and published data from the statistical bureaus of cities.

Appendix A.1.3. 19 Industries

According to the latest classification (GB/T 4754-2017) [42], China’s industries are divided into 20 categories. The industry related to “international organizations” is not included in economic statistics. Thus, this study focuses on the remaining 19 industries.

Details of the data are shown in Table 1. China’s urban population follows a log-normal distribution in 2019 (Figure A2), and this characteristic persists from 2004 to 2019.

Figure A2. Distribution characteristics of the urban population in China. (a) Lognormal distribution of the urban population in China. (b) Approximate scaling exponent of Chinese cities in 2019. Source: Data from the “China City Statistical Yearbook” (2019) [41].

Figure A3. Average comparative advantage of Chinese cities of different sizes (2020). Each line represents the $RCA$ of an industry in different cities, coloured by the industry scaling characteristic, $β$. A value above $y = 1$ indicates that the industry is significant in cities of this size. Source: Data from the 7th National Population Census (2020), National Bureau of Statistics, China [78].

Appendix A.2. Data Sources

Appendix A.2.1. Explanation of the Time Interval of the Data

Since the “China City Statistical Yearbook” ceased reporting industry-specific employment data for prefecture-level cities starting in 2020, our dataset is confined to the period up to 2019. Additionally, Figure A4 shows the resident population of prefecture-level cities from the “China City Statistical Yearbook” for the years 2019–2024. After the Shapiro–Wilk test was conducted, these distributions were determined to follow a log-normal distribution, and the probability density functions of the log-normal distribution were fitted. Specifically, a comparative analysis was conducted on the distribution characteristics near $10^{7}$. The temporary increase in the number of cities with populations less than $10^{7}$ from 2020 to 2021 gradually disappeared from 2022 to 2024, and the data returned to the characteristics observed in 2019. Therefore, it is appropriate that the analysis in the main text concludes with the year 2019.

Figure A4. The probability density of the resident population (2019–2024). Source: Data from the “China City Statistical Yearbook” (2019–2024) [41].

The population census in China is conducted every ten years. In 2020, data were released from the seventh population census, which include employment population statistics by industry for prefecture-level cities. These data are used to conduct an analysis for the year 2020. Owing to the differences in datasets, to ensure the validity and scientific nature of the study, this content has been included only in Appendix C.1.

Appendix A.2.2. Data Interpolation Method

If data for a certain industry are missing in only a few years, the approach can be divided into two cases: (a) For missing data in non-endpoint years (for example, within the time range of 2004–2019, the endpoint years are 2004 and 2019, whereas 2005–2018 are non-endpoint years), the linear interpolation method is used. This method requires that the data for the years adjacent to the missing value are not null. (b) For missing data in the endpoint years, linear regression is used. Here, the evolutionary trend of the data can be quantified, making linear regression suitable for estimating endpoint values.
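The two cases above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the function name `fill_missing`, the use of `None` for missing values, and the sample series are all hypothetical, and the endpoint regression is assumed to be an ordinary least-squares line fitted to the observed years.

```python
def fill_missing(years, values):
    """Return a copy of `values` with None entries filled where possible."""
    filled = list(values)
    last = len(values) - 1

    # Case (a): interior gaps, filled by linear interpolation between the
    # two adjacent years, which must both be observed.
    for k in range(1, last):
        left, right = values[k - 1], values[k + 1]
        if values[k] is None and left is not None and right is not None:
            weight = (years[k] - years[k - 1]) / (years[k + 1] - years[k - 1])
            filled[k] = left + weight * (right - left)

    # Case (b): endpoint gaps, estimated from a least-squares linear trend
    # fitted to the originally observed points only (an assumption here).
    observed = [(y, v) for y, v in zip(years, values) if v is not None]
    if len(observed) >= 2:
        mean_year = sum(y for y, _ in observed) / len(observed)
        mean_val = sum(v for _, v in observed) / len(observed)
        sxx = sum((y - mean_year) ** 2 for y, _ in observed)
        sxy = sum((y - mean_year) * (v - mean_val) for y, v in observed)
        slope = sxy / sxx
        for k in (0, last):
            if values[k] is None:
                filled[k] = mean_val + slope * (years[k] - mean_year)
    return filled


# Hypothetical series: the first year and one interior year are missing.
years = [2004, 2005, 2006, 2007, 2008]
series = [None, 10.0, None, 14.0, 16.0]
print(fill_missing(years, series))
```

Interior gaps with a missing neighbour are deliberately left as `None`, mirroring the requirement that adjacent years be available.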

Appendix A.2.3. Logarithmic Linear Correlation of the Population and Employment in Specific Industries

In Figure A5, each grey dot represents a city, with the x-coordinate indicating the urban population and the y-coordinate indicating employment. The grey arrow indicates the directional change in each city’s position from 2004 to 2019. The red (blue) line represents the fitted relationship in 2004 (2019). If the blue line is steeper than the red line, it indicates an increase in $β_{i}$, as seen in the manufacturing industry. Conversely, a flatter blue line indicates a decline in the scaling characteristics, as observed in the education and public administration industries. For the financial industry, $β_{i}$ remains relatively stable, whereas the overall employment across cities has increased. This result suggests that the manufacturing industry has become more concentrated, with a stronger scaling effect; the education and public administration industries have become more decentralized, reflecting a more equal allocation of resources; and the financial industry has maintained a relatively stable relationship. The regression results for 2019, including $β_{i}$ and $R^{2}$ (see Table A1), indicate that the scaling relationship captures the patterns of most cities.

Figure A5. Relationship between the population and employment in specific industries. (a) Manufacturing. (b) Finance. (c) Education. (d) Public Administration.

Appendix A.2.4. The Different β_i Values for China and the United States

The same industry can exhibit different properties in different countries. Figure A6 shows that China and the United States have distinct scaling exponent characteristics. Here, each dot represents an industry. The size of the dot and its label font indicate the employment scale of the industry in China. The x-coordinate shows the average $β$ of the industry in the United States from 2004 to 2019 [7], and the y-coordinate shows the corresponding value in China during the same period. The red line represents the $y = x$ line. Industries above the red line have relatively larger scaling exponents in China, whereas those below the red line have smaller scaling exponents in China. This result indicates that China and the United States have quite different industrial scaling characteristics.

In the lower right region of Figure A6, administrative services, finance, and the arts are shown to be superlinear industries in the United States but sublinear industries in China. In the upper left region, manufacturing, retail trade, and health care are shown to be superlinear industries in China but sublinear industries in the United States.

Figure A6. Correlation between scaling exponents in China and the United States.

Appendix B. The Theoretical Model and Derivation Process

Appendix B.1. The Scaling Laws for Cities

Zipf distributions of city sizes have been widely recognized [24,79]. In the past decade, most urban indicators have been determined to follow the ubiquitous scaling law [9]:

Y (t) \approx Y_{0} (t) N {(t)}^{β} .

(A1)

Here, $N (t)$ represents the population size of a city at time $t$; $Y_{0} (t)$ is a time-dependent normalization constant; and $Y (t)$ can be different types of urban indicators. The literature indicates that $β < 1$ represents the sublinear regime, which is associated with economies of scale in the surface area. In contrast, $β > 1$ represents the superlinear regime, which is associated with outcomes from social interactions, such as R&D employment, inventors, supercreatives, and income.

Some scholars have used scaling laws to analyse the effects of urban spatial concentration on economic development. The scholars observed that scaling exponents can be explained by the level of complexity of an activity, which may explain why complex economic activities are more concentrated in large cities [4,80]. These activities include research papers, patents, occupations, and industries. Later studies used employment data from different industries as urban indicators to analyse the evolution of infrastructure and knowledge spillover industries on the basis of data from the United States [7,33].

The scaling exponents are calculated using Equation (A1), and Table A1 shows the results for 2019.

Table A1. Scaling exponents in different industries.

Industry	$β$	$p_{reg}$	$R^{2}$	$p_{F}$
Agriculture, forestry, animal husbandry and fishery	0.24	0.05	0.01	0.05
Mining	0.09	0.68	0.00	0.85
Manufacturing	1.32 ^**	0.00	0.52	0.00
Production and supply of electricity, heating, gas, and water	0.73 ^**	0.00	0.36	0.00
Construction industry	1.35 ^**	0.00	0.53	0.00
Wholesale and retail	1.35 ^**	0.00	0.61	0.00
Transportation, warehousing, and postal services	1.16 ^**	0.00	0.56	0.00
Accommodation and catering industry	1.29 ^**	0.00	0.44	0.00
Information transmission, computing and services, and software	1.24 ^**	0.00	0.54	0.00
Finance	1.01 ^**	0.00	0.57	0.00
Real estate industry	1.26 ^**	0.00	0.53	0.00
Rent	1.21 ^**	0.00	0.49	0.00
Scientific research, technical services and geological survey	1.26 ^**	0.00	0.51	0.00
Public facilities management industry	1.25 ^**	0.00	0.51	0.00
Residential services, repair and other services	1.38 ^**	0.00	0.41	0.00
Educational Services	1.05 ^**	0.00	0.93	0.00
Health care and social work	0.98 ^**	0.00	0.88	0.00
Culture, sports and entertainment	1.03 ^**	0.00	0.53	0.00
Public administration, social security and social organization	0.76 ^**	0.00	0.80	0.00

^** indicates industries that are significant in every year at the 95% confidence level. $p_{reg}$: p value for the regression coefficient. $p_{F}$: p value from the F-test, indicating the overall significance of the regression model. Source: Data from the “China City Statistical Yearbook” (2004–2019) [41].
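As a rough illustration of how a scaling exponent of the kind reported in Table A1 can be estimated from Equation (A1), the sketch below fits a straight line to the log-transformed population and employment values. It assumes an ordinary least-squares fit in log-log space, which may differ in detail from the procedure used for the table; the function name `fit_scaling_exponent` and the synthetic data are hypothetical.

```python
import math


def fit_scaling_exponent(populations, employment):
    """Fit ln Y = ln Y0 + beta * ln N by least squares.

    Returns (beta, Y0, R^2) for one industry in one year.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in employment]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope = scaling exponent
    y0 = math.exp(mean_y - beta * mean_x)  # normalization constant
    r_squared = sxy ** 2 / (sxx * syy)     # goodness of fit
    return beta, y0, r_squared


# Synthetic, noise-free data generated with beta = 1.3 (illustrative only).
populations = [5e5, 1e6, 2e6, 5e6, 1e7, 2e7]
employment = [0.01 * n ** 1.3 for n in populations]
beta, y0, r2 = fit_scaling_exponent(populations, employment)
print(f"beta = {beta:.3f}, Y0 = {y0:.4f}, R^2 = {r2:.3f}")
```

An exponent above 1 marks the industry as superlinear and one below 1 as sublinear, in the sense used throughout the text.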

Appendix B.2. The Comparative Advantage Function

$Y_{c i}$ is the employment of industry $i$ in city $c$. The comparative advantage of industry $i$ in city $c$ is defined as

{RCA}_{c i} = \frac{Y_{c i} / \sum_{i} Y_{c i}}{\sum_{c} Y_{c i} / \sum_{c, i} Y_{c i}} .

(A2)
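Equation (A2) can be evaluated directly from a city-by-industry employment matrix. The following is a minimal sketch with hypothetical names (`rca_matrix`, `employment`) and made-up numbers; it only restates the ratio of a city's industry share to the national industry share.

```python
def rca_matrix(employment):
    """Compute RCA[c][i] from employment[c][i] (rows: cities, columns: industries)."""
    city_totals = [sum(row) for row in employment]            # sum over i of Y_ci
    n_industries = len(employment[0])
    industry_totals = [sum(row[i] for row in employment)      # sum over c of Y_ci
                       for i in range(n_industries)]
    grand_total = sum(city_totals)                            # sum over c, i of Y_ci
    return [
        [(row[i] / city_totals[c]) / (industry_totals[i] / grand_total)
         for i in range(n_industries)]
        for c, row in enumerate(employment)
    ]


# Two hypothetical cities and two hypothetical industries.
employment = [
    [30.0, 10.0],
    [20.0, 40.0],
]
for row in rca_matrix(employment):
    print([round(value, 3) for value in row])
```

In this toy example the first industry is characteristic of the first city and the second industry of the second city, since those are the entries with a value above 1.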

An industry is called characteristic or significant if ${RCA}_{c i} > 1$. Employment $Y_{c i}$ scales with city population $N_{c}$ as a power law. With

Y_{c i} \approx Y_{i o} N_{c}^{β_{i}},

(A3)

the following can be obtained:

\begin{aligned} {RCA}_{c i} & = \frac{Y_{c i}}{\sum_{i} Y_{c i}} / \frac{\sum_{c} Y_{c i}}{\sum_{c, i} Y_{c i}} \\ & \sim \frac{Y_{i o} N_{c}^{β_{i}}}{\sum_{i} Y_{c i} \sum_{c} Y_{i o} N_{c}^{β_{i}}} \\ & = \frac{N_{c}^{β_{i}}}{\sum_{i} Y_{c i} \sum_{c} N_{c}^{β_{i}}} . \end{aligned}

(A4)

Figure A2 shows that $N_{c}$ is log-normally distributed according to

P (N) = \frac{1}{N \sqrt{2 π} σ} e^{- \frac{{(\ln N - μ)}^{2}}{2 σ^{2}}},

(A5)

which is consistent with the conclusion of [81]. For $N_{min} < N < N_{max}$, the distribution can be approximated by

P (N) \propto N^{- γ}, \quad β_{i} \neq γ - 1,

(A6)

\begin{aligned} \sum_{c} N_{c}^{β_{i}} & \simeq \int_{N_{min}}^{N_{max}} P (N) N^{β_{i}} d N \\ & \propto \int_{N_{min}}^{N_{max}} N^{- γ} N^{β_{i}} d N \\ & = \int_{N_{min}}^{N_{max}} N^{β_{i} - γ} d N \\ & = \frac{1}{β_{i} - γ + 1} (N_{max}^{β_{i} - γ + 1} - N_{min}^{β_{i} - γ + 1}) . \end{aligned}

(A7)
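The last step of Equation (A7) can be checked numerically. The sketch below compares the closed form with a midpoint-rule integration in $u = \ln N$. The function names and the input values ($β_{i} = 1.2$, $γ = 1.3$, and the size range) are illustrative assumptions, not fitted quantities.

```python
import math


def power_sum_closed_form(beta, gamma, n_min, n_max):
    """Closed form of the integral of N**(beta - gamma) over [n_min, n_max], as in (A7)."""
    k = beta - gamma + 1.0
    if k == 0.0:
        raise ValueError("beta = gamma - 1 is excluded by (A6)")
    return (n_max**k - n_min**k) / k


def power_sum_numeric(beta, gamma, n_min, n_max, steps=20000):
    """Midpoint rule in u = ln N, using N**(beta - gamma) dN = exp(k*u) du."""
    k = beta - gamma + 1.0
    lo, hi = math.log(n_min), math.log(n_max)
    du = (hi - lo) / steps
    return sum(math.exp(k * (lo + (j + 0.5) * du)) for j in range(steps)) * du


# Illustrative inputs (assumptions, not fitted values).
closed = power_sum_closed_form(1.2, 1.3, 4.0e6, 2.1e7)
numeric = power_sum_numeric(1.2, 1.3, 4.0e6, 2.1e7)
print(closed, numeric)
```

The two values agree to many digits, which is all the check is meant to show. The normalisation constant of $P (N)$ is dropped in both, as in Equation (A7).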

With the hypothesis of $\sum_{i} Y_{c i} \sim N_{c}$, we obtain

RCA (β, N) \sim N^{β - 1} (\frac{β - γ + 1}{N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}}) .

(A8)

Appendix B.3. Changes in Comparative Advantage

Changes with N.

$\frac{\partial RCA}{\partial N} = (β - 1) N^{β - 2} (\frac{β - γ + 1}{N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}})$

(A9)

when $β > 1$ , $\frac{\partial RCA}{\partial N} > 0$ ; when $β < 1$ , $\frac{\partial RCA}{\partial N} < 0$ ; and when $β = 1$ , $\frac{\partial RCA}{\partial N} = 0$ .
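The sign pattern of Equation (A9) can be illustrated by evaluating the closed form (A8) on a few city sizes. This is a sketch under assumed inputs: the values of $γ$, $N_{min}$, $N_{max}$ and the grid of sizes are chosen for illustration only, and `rca_closed_form` is a hypothetical helper name.

```python
def rca_closed_form(beta, n, gamma, n_min, n_max):
    """RCA(beta, N) up to a constant factor, following Equation (A8)."""
    k = beta - gamma + 1.0  # must be non-zero, see (A6)
    return n ** (beta - 1.0) * k / (n_max**k - n_min**k)


# Illustrative inputs (assumptions): power-law exponent and size range.
GAMMA, N_MIN, N_MAX = 1.3, 4.0e6, 2.1e7
sizes = [5.0e6, 1.0e7, 2.0e7]

for beta in (1.2, 1.0, 0.8):
    values = [rca_closed_form(beta, n, GAMMA, N_MIN, N_MAX) for n in sizes]
    # beta > 1: values rise with N; beta = 1: constant; beta < 1: values fall.
    print(beta, values)
```

The prefactor $(β - γ + 1) / (N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1})$ is positive whenever $β \neq γ - 1$, so the sign of $\partial RCA / \partial N$ is set by $β - 1$ alone.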
Changes with $β$ .

$\frac{\partial RCA}{\partial β} = \frac{(β - γ + 1) N^{β - 1}}{N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}} \left[ \ln N + C (β) \right]$

(A10)

where $C (β) = \frac{1}{β - γ + 1} - \frac{N_{max}^{β - γ + 1} \ln N_{max} - N_{min}^{β - γ + 1} \ln N_{min}}{N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}}$ . Setting $\partial RCA / \partial β = 0$ , that is, $\ln N + C (β) = 0$ , gives the critical population size

$N^{*} = \exp \left[ \frac{N_{max}^{β - γ + 1} \ln N_{max} - N_{min}^{β - γ + 1} \ln N_{min}}{N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}} - \frac{1}{β - γ + 1} \right]$

(A11)

In 2019,
$N_{1 max} = 2.5 \times 10^{6}$ , $N_{1 min} = 6.0 \times 10^{5}$ , and $γ_{1} = - 1.1093$ ; when $β^{*} = 1$ , $N_{1}^{*} \approx 1.86 \times 10^{6}$ . When $N_{1}^{*} < N < N_{1 max}$ , $\partial RCA / \partial β > 0$ ; when $N_{1 min} < N < N_{1}^{*}$ , $\partial RCA / \partial β < 0$ ; and when $N = N_{1}^{*}$ , $\partial RCA / \partial β = 0$ .
$N_{2 max} = 2.1 \times 10^{7}$ , $N_{2 min} = 4.0 \times 10^{6}$ , and $γ_{2} = 1.3211$ ; when $β^{*} = 1$ , $N_{2}^{*} \approx 1.07 \times 10^{7}$ . When $N_{2 min} < N < N_{2}^{*}$ , $\partial RCA / \partial β < 0$ ; when $N_{2}^{*} < N < N_{2 max}$ , $\partial RCA / \partial β > 0$ ; and when $N = N_{2}^{*}$ , $\partial RCA / \partial β = 0$ .
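The second critical point can be reproduced from Equation (A11) with the rounded 2019 inputs quoted above. The sketch below is not the authors' code: the function names are hypothetical, and the finite-difference derivative is added only as an independent check of the sign change in $\partial RCA / \partial β$.

```python
import math


def critical_population(beta, gamma, n_min, n_max):
    """N* from Equation (A11), where dRCA/dbeta changes sign."""
    k = beta - gamma + 1.0
    top, bottom = n_max**k, n_min**k
    weighted_log = (top * math.log(n_max) - bottom * math.log(n_min)) / (top - bottom)
    return math.exp(weighted_log - 1.0 / k)


def rca_closed_form(beta, n, gamma, n_min, n_max):
    """RCA(beta, N) up to a constant factor, following Equation (A8)."""
    k = beta - gamma + 1.0
    return n ** (beta - 1.0) * k / (n_max**k - n_min**k)


def d_rca_d_beta(beta, n, gamma, n_min, n_max, h=1e-5):
    """Central finite difference in beta (an independent check, not (A10))."""
    up = rca_closed_form(beta + h, n, gamma, n_min, n_max)
    down = rca_closed_form(beta - h, n, gamma, n_min, n_max)
    return (up - down) / (2.0 * h)


# Second critical point, rounded 2019 values quoted in the text.
GAMMA_2, N2_MIN, N2_MAX = 1.3211, 4.0e6, 2.1e7
n2_star = critical_population(1.0, GAMMA_2, N2_MIN, N2_MAX)
print(f"N2* = {n2_star:.3g}")

# Below N2* the derivative is negative, above it positive.
print(d_rca_d_beta(1.0, 5.0e6, GAMMA_2, N2_MIN, N2_MAX) < 0)
print(d_rca_d_beta(1.0, 2.0e7, GAMMA_2, N2_MIN, N2_MAX) > 0)
```

With these rounded inputs the function returns roughly $1.07 \times 10^{7}$, in line with the reported $N_{2}^{*}$. Small deviations from reported values can be expected whenever the printed bounds and exponents are rounded.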

Appendix C. Other Results

Appendix C.1. The Derivation Process for the 2020 Data

On the basis of the data from the seventh population census, the derivation process for 2020 is as follows:

In 2020,
$N_{1 max} = 2.5 \times 10^{6}$ , $N_{1 min} = 6.0 \times 10^{5}$ , and $γ_{1} = - 1.0275$ ; when $β^{*} = 1$ , $N_{1}^{*} \approx 1.83 \times 10^{6}$ . When $N_{1}^{*} < N < N_{1 max}$ , $\partial RCA / \partial β > 0$ ; when $N_{1 min} < N < N_{1}^{*}$ , $\partial RCA / \partial β < 0$ ; and when $N = N_{1}^{*}$ , $\partial RCA / \partial β = 0$ .
$N_{2 max} = 2.1 \times 10^{7}$ , $N_{2 min} = 3.5 \times 10^{6}$ , and $γ_{2} = 1.6559$ ; when $β^{*} = 1$ , $N_{2}^{*} \approx 9.39 \times 10^{6}$ . When $N_{2 min} < N < N_{2}^{*}$ , $\partial RCA / \partial β < 0$ ; when $N_{2}^{*} < N < N_{2 max}$ , $\partial RCA / \partial β > 0$ ; and when $N = N_{2}^{*}$ , $\partial RCA / \partial β = 0$ .

Figure A7. Distribution characteristics of the urban population in China. (a) Lognormal distribution of the urban population in China. (b) Approximate scaling exponent of Chinese cities in 2020. Source: Data from the 7th National Population Census (2020), National Bureau of Statistics, China [78].

As shown in Figure A8, although the urban population data for 2019 and 2020 were sourced from different databases, the correlation between the two is high, indicating that the overall data quality is good. Additionally, some notable changes can be observed. For example, near the second critical point (the red shaded area on the y-axis), although the number of cities with population growth is still less than that with population decline, the total population increase in this group of cities is relatively large. This result suggests that in 2020, population growth was more pronounced in secondary cities than it was in the top three cities. The population distribution of cities with populations of approximately 10 million became relatively more even, which led to an increase in $γ$. This change slightly decreased the threshold at which large cities can surpass the knowledge spillover barrier.

Figure A8. Comparison of urban population data for 2019 and 2020 from different databases. Source: 2020 data from the 7th National Population Census (2020) [78], National Bureau of Statistics, China; 2019 data from the “China City Statistical Yearbook” [41].

Appendix C.2. Labour Demand of “Knowledge Spillover” Cities in China

According to the fitting results, critical thresholds $N_{1}^{*}$ and $N_{2}^{*}$ can be obtained for transitioning to subinnovative and supinnovative economies. These thresholds help us understand the development paths and labour demands of “knowledge spillover” cities in China. The calculated results are shown in Table A2. Additionally, on the basis of an empirical analysis, the critical points are saddle points, with the Hessian matrix having one positive eigenvalue and one negative eigenvalue.
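The saddle-point character can also be seen analytically from the closed form (A8). The following is a sketch under that closed form, not a reproduction of the empirical analysis. It uses the shorthand $g (β) = (β - γ + 1) / (N_{max}^{β - γ + 1} - N_{min}^{β - γ + 1}) > 0$, introduced here for brevity, so that $RCA = N^{β - 1} g (β)$, and evaluates the second derivatives at the critical point $β = 1$, $N = N^{*}$:

```latex
\begin{aligned}
\frac{\partial^{2} RCA}{\partial N^{2}} &= (β - 1)(β - 2) N^{β - 3} g (β) = 0 \quad \text{at } β = 1, \\
\left. \frac{\partial^{2} RCA}{\partial N \, \partial β} \right|_{β = 1,\, N = N^{*}} &= \frac{g (1)}{N^{*}} \neq 0, \\
\det H &= \frac{\partial^{2} RCA}{\partial N^{2}} \frac{\partial^{2} RCA}{\partial β^{2}} - \left( \frac{\partial^{2} RCA}{\partial N \, \partial β} \right)^{2} = - \left( \frac{g (1)}{N^{*}} \right)^{2} < 0 .
\end{aligned}
```

A negative determinant of the $2 \times 2$ Hessian means that its two eigenvalues have opposite signs, which is the saddle-point condition stated above.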

Table A2. Calculation results of $N_{1}^{*}$ and $N_{2}^{*}$ (in people).

Year	$N_{1}^{*}$	$N_{2}^{*}$
2004	$1.61 \times 10^{6}$	$8.38 \times 10^{6}$
2005	$1.61 \times 10^{6}$	$8.36 \times 10^{6}$
2006	$1.65 \times 10^{6}$	$8.41 \times 10^{6}$
2007	$1.58 \times 10^{6}$	$8.40 \times 10^{6}$
2008	$1.80 \times 10^{6}$	$8.16 \times 10^{6}$
2009	$1.80 \times 10^{6}$	$9.67 \times 10^{6}$
2010	$1.82 \times 10^{6}$	$9.91 \times 10^{6}$
2011	$1.79 \times 10^{6}$	$1.00 \times 10^{7}$
2012	$1.78 \times 10^{6}$	$1.02 \times 10^{7}$
2013	$1.79 \times 10^{6}$	$1.00 \times 10^{7}$
2014	$1.77 \times 10^{6}$	$1.01 \times 10^{7}$
2015	$1.79 \times 10^{6}$	$1.02 \times 10^{7}$
2016	$1.79 \times 10^{6}$	$1.05 \times 10^{7}$
2017	$1.84 \times 10^{6}$	$1.08 \times 10^{7}$
2018	$1.81 \times 10^{6}$	$1.02 \times 10^{7}$
2019	$1.86 \times 10^{6}$	$1.07 \times 10^{7}$
2020	$1.83 \times 10^{6}$	$9.39 \times 10^{6}$

Table A3. The results of hypothesis testing in 2004–2020.

Year	$p_{1}$ (million)	$p_{2}$ (million)	$p_{3}$ (million)	LLR p-value	Population $p > |z|$
2004	8	9	∞	0.002 *	0.000 *
2005	8	9	∞	0.000 *	0.000 *
2006	8	9	∞	0.000 *	0.000 *
2007	8	9	∞	0.000 *	0.001 *
2008	8	9	∞	0.008 *	0.010 *
2009	8	10	∞	0.001 *	0.003 *
2010	8	10	∞	0.007 *	0.009 *
2011	8	10	∞	0.000 *	0.001 *
2012	7	10	∞	0.005 *	0.008 *
2013	10	11	∞	0.002 *	0.005 *
2014	10	11	∞	0.003 *	0.005 *
2015	10	11	∞	0.001 *	0.002 *
2016	10	11	∞	0.005 *	0.007 *
2017	10	11	∞	0.002 *	0.004 *
2018	10	11	∞	0.002 *	0.003 *
2019	10	11	∞	0.007 *	0.008 *
2020	8	10	∞	0.009 *	0.010 *

(*) indicates a significance level of 0.01.

Appendix C.3. Comparison with the Optimal City Size

$N_{2}^{*}$ in China is approximately $10^{7}$ people, whereas that in the United States is approximately $1.2 \times 10^{6}$ people. As analysed in the previous section, the gap in absolute population (that is, the different $N_{min}$ and $N_{max}$ in different countries) is due to the distribution characteristics of city size in each country, such as $γ$. In the traditional theory of optimal city size, for a single city, the optimal city size is a typical optimization problem that comprehensively considers economic externalities or agglomeration utility as well as commuting or rental costs [61,63]. Some scholars have expanded the scope of social costs to include environmental protection [17,64] or have measured the optimal size of multiple cities in the urban system using the rank-size rule [66].

From the perspective of maximizing the net income of city size, Wang noted that the optimal city size in China should be approximately $10^{7}$ people [68]. Carlino analysed the relationship between agglomeration economies and increasing returns to scale and reported that the optimal city size in the United States is approximately $3.4 \times 10^{6}$ people [69]. Using Carlino’s method for reference, Jin reported that the optimal city size of Beijing, Shanghai and Tianjin was approximately $10^{7}$ people [70]. When environmental pollution and energy efficiency are considered, the optimal city size is significantly smaller than in previous studies and does not exceed $0.5 \times 10^{7}$ people in China [72]. However, these studies take output as a whole without analysing how changes in the output structure depend on and affect city size.

Using a typical optimal size model for individual cities, this study quantifies the optimal city size in China and the United States by analysing personal income and expenditure [61]. On the basis of empirical data from 2019, the optimal city size in the United States is $1.22 \times 10^{6}$ people, whereas that in China is $1.13 \times 10^{7}$ people. Although the optimal size model differs from our method, both approaches yield similar quantitative results.

In the optimal size model, $ϵ$ represents the exponent of the relationship between per capita GDP growth and urban population size. Both $γ$ and $ϵ$ describe the population-based scale effect and the benefit of urban endogenous agglomeration, each from a different perspective. China has a smaller $γ$ and a larger $ϵ$ (Table A4). A smaller $γ$ indicates that the population is more concentrated in a few large cities, whereas a larger $ϵ$ suggests that per capita output or personal income is relatively higher in large-scale cities. These factors make China’s optimal city size much larger than that of the United States. Because the population is highly concentrated in a few large cities, the number of cities in China that can cross the second critical transition point is smaller than it is in the United States.

The traditional optimal city size model can be used to describe the benefits of urban growth and economies of scale as well as to analyse problems related to congestion costs. Thus, the model serves as a useful supplement and validation for the theory of optimal city size. The employment structure reflects a city’s production characteristics, and the predominance of highly agglomerating industries often means that, for cities of the same size, per capita output will be greater. Therefore, the evolution of the employment structure is related to changes in a city’s optimal scale, which can be illustrated through the consistency of the results of the two models.

Consider an urban system with heterogeneous cities, where workers benefit from agglomeration economies that translate into higher outputs. This study proposes that the gross output (per capita) in city i with population $n (i)$ is given by [61]

P (i) = a \cdot n (i)^{ϵ},

(A12)

where $a > 0$ represents exogenous city productivity and $n (i)^{ϵ}$ represents the benefit of endogenous agglomeration outside the representative firm. Urban dwellers bear commuting costs $n (i)^{ρ}$ and land rents $ρ \cdot n (i)^{ρ}$, as follows:

C (i) = (1 + ρ) \cdot n (i)^{ρ} .

(A13)

The optimization function is

\max_{n (i)} U = a \cdot n (i)^{ϵ} - (1 + ρ) \cdot n (i)^{ρ} .

(A14)

This study uses GDP (per capita) and sales of commodities (per capita) as proxies for personal income $P (i)$ and expenditures $C (i)$, respectively, for the year 2019. The details of the data are shown in Table 1. While the data on personal consumption expenditures are only available at the state level, the personal income data are available for individual cities. Given the strong correlation between personal income and personal consumption at the state level (Pearson correlation coefficient = 0.9983, sig. = 0.00), this study estimates expenditures on personal consumption for each city.

In addition, because the value of a is related to the price unit, this study normalizes the per capita output of each city to ensure comparability between China and the United States. Specifically, the data are adjusted so that the mean per capita output is the same for both countries. The results are shown in Table A4. The optimal city size $n^{*}$ calculated using the optimal city size theory is very close to the second critical point identified by our theoretical model.

Table A4. Optimum city size of China and the United States in 2019.

	a	$ϵ$	$ρ$	$n^{*}$
China	245.471	0.1064	0.3528	1.13 × 10⁷
United States	131.978	0.0994	0.3393	1.22 × 10⁶
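The $n^{*}$ column of Table A4 can be recovered from the other three columns. The closed form used below comes from setting the derivative of $U$ in Equation (A14) to zero, which gives $n^{ρ - ϵ} = a ϵ / [ρ (1 + ρ)]$. This expression is not written out in the text, so it should be read as an assumption that happens to reproduce the table. The function name is hypothetical.

```python
def optimal_city_size(a, eps, rho):
    """Maximiser of U(n) = a*n**eps - (1 + rho)*n**rho, assuming rho > eps > 0.

    The first-order condition a*eps*n**(eps - 1) = rho*(1 + rho)*n**(rho - 1)
    rearranges to n**(rho - eps) = a*eps / (rho*(1 + rho)).
    """
    return (a * eps / (rho * (1.0 + rho))) ** (1.0 / (rho - eps))


# Parameter values as printed in Table A4 (2019).
n_china = optimal_city_size(a=245.471, eps=0.1064, rho=0.3528)
n_us = optimal_city_size(a=131.978, eps=0.0994, rho=0.3393)
print(f"China: {n_china:.3g}, United States: {n_us:.3g}")
```

Because $ρ > ϵ$ in both rows, marginal costs eventually outgrow marginal benefits, so the stationary point is a maximum. The result is sensitive to the exponent $1 / (ρ - ϵ) \approx 4$, which is why the normalisation of a matters for the comparison between the two countries.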

Appendix C.4. Differences in Cities’ Economic Regions

Because the scaling exponent is influenced by regional development imbalances in China, these cities are divided into groups on the basis of their geographical locations to compare the regional differences in scaling exponents (Table A5).

Table A5. China’s four major economic regions.

Region	Provinces
Eastern Region	Beijing, Tianjin, Hebei, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, Hainan, Taiwan, Hong Kong SAR, Macao SAR
Central Region	Shanxi, Anhui, Jiangxi, Henan, Hubei, Hunan
Western Region	Inner Mongolia Autonomous Region, Guangxi Zhuang Autonomous Region, Chongqing Municipality, Sichuan, Guizhou, Yunnan, Tibet Autonomous Region, Shaanxi, Gansu, Qinghai, Ningxia Hui Autonomous Region, Xinjiang Uygur Autonomous Region
Northeast Region	Liaoning, Jilin, Heilongjiang

Classification based on the standards of the National Bureau of Statistics of China.

As Figure A9 shows, during a five-year observation period, the scaling characteristics of the four regions generally increased. Notably, the average scaling characteristics of all of the regions in 2009 decreased sharply. This decline was due to a decrease in employment in 2009 compared with 2007 and 2008, while the urban population continued to grow, resulting in a lower $β$. Although the eastern region had the highest $β$, its growth slowed in subsequent years. In contrast, the scaling characteristics of the central region have accelerated in recent years, indicating that the gap between the central and eastern regions is gradually narrowing. The western region, which followed the central region, had a smaller average scaling characteristic but still showed an increasing trend.

Unlike other regions, Northeast China has scaling characteristics that not only lag but also exhibit slow growth, indicating that employment is shrinking in high-agglomerating industries. This trend is partly due to the rapid development and construction of the eastern coastal areas, which has caused the industrial centre of gravity to shift southwards. As a result, emerging industries in economically developed areas have replaced some dominant industries in Northeast China. During industrial restructuring in Northeast China, emerging industries started late [82]. The traditional pillar industries occupied significant resources, causing emerging industries to grow slowly and be disadvantaged in terms of nationwide competition. This situation has led to the relative “absence” of high-agglomerating and emerging industries in Northeast China.

Figure A9. Differences in the scaling characteristics in different regions. (a) The scaling characteristic distribution (the distribution of $β$ with RCA $> 1$) of all cities in the region is considered every five years. From top to bottom in the figure, they are the eastern, central, western, and northeast regions. (b) Different colours represent the four economic regions in China. The average scaling characteristic of the city is its average $β$ of all characteristic industries (with RCA $> 1$). The average scaling characteristic of the region is the average of all cities’ scaling characteristics. The shaded part represents the $95 \%$ confidence interval of the cities in the region. All calculations are based on data from the same year.

Appendix C.5. The Recapitulation of Industries

In addition to observing the evolution of industries in different cities, this study compares the time series at the city scale and its related factors over the observed period. These changes are captured in the recapitulation of various industries, as shown in Equation (A15).

Δ ln Y_{c i} (t) \approx Δ ln Y_{i o} (t) + {\hat{β}}_{i} Δ ln N_{c} (t),

(A15)

Here, $Δ ln Y_{c i}$ represents the total longitudinal change in employment, whereas ${\hat{β}}_{i} Δ ln N_{c}$ represents the change in employment associated with population size changes (i.e., scaled growth) between the starting year (2004) and the ending year (2019). The regression of $Δ ln Y_{c i}$ on $Δ ln N_{c}$ yields the empirical scaled growth coefficient ${\hat{β}}_{i}$ and the nationwide trend $Δ ln {\hat{Y}}_{i o}$. Specifically, ${\hat{β}}_{i}$ denotes the longitudinal scaling effect of population change on employment.
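The regression in Equation (A15) is an ordinary least-squares fit across cities for one industry, with the slope read as ${\hat{β}}_{i}$ and the intercept as the nationwide trend. The sketch below assumes plain OLS, since the estimator is not specified in more detail here, and uses synthetic log-changes constructed to lie exactly on a line. The function name and data are hypothetical.

```python
def fit_scaled_growth(delta_ln_n, delta_ln_y):
    """OLS of delta ln Y_ci on delta ln N_c across cities for one industry.

    Returns (beta_hat, trend): the slope is the scaled growth coefficient and
    the intercept is the nationwide trend, as in Equation (A15).
    """
    m = len(delta_ln_n)
    mean_x = sum(delta_ln_n) / m
    mean_y = sum(delta_ln_y) / m
    sxx = sum((x - mean_x) ** 2 for x in delta_ln_n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delta_ln_n, delta_ln_y))
    beta_hat = sxy / sxx
    return beta_hat, mean_y - beta_hat * mean_x


# Synthetic example: four cities whose employment change follows
# delta ln Y = 0.3 + 1.2 * delta ln N exactly (made-up numbers).
delta_ln_n = [0.05, 0.10, 0.20, 0.30]
delta_ln_y = [0.3 + 1.2 * x for x in delta_ln_n]
beta_hat, trend = fit_scaled_growth(delta_ln_n, delta_ln_y)
print(beta_hat, trend)
```

On real data the points scatter around the line, and the residuals capture city-specific changes that neither the nationwide trend nor population growth explains.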

The scaling exponents of the cross-sectional change can be obtained from annual data, and the scaling exponents of the longitudinal change (i.e., the scaled growth coefficient ${\hat{β}}_{i}$) can be obtained from the difference between the beginning and ending years. However, whether these two results are consistent requires further discussion. The recapitulation score $S_{i}$ quantitatively estimates the consistency between them:

S_{i} = \frac{1}{T} \sum_{t = 1}^{T} (1 - |\frac{{\hat{β}}_{i} - β_{i} (t)}{β_{i} (t)}|),

(A16)

Here, ${\hat{β}}_{i}$ is the scaled growth coefficient, and $β_{i} (t)$ is the cross-sectional scaling exponent in the t-th year, which is calculated via Equation (A1). If population changes are perfectly correlated with employment changes, the scaled growth coefficient is expected to equal the scaling exponent, resulting in a recapitulation score of $S_{i} = 1$. Conversely, a recapitulation score of zero indicates that population changes have no effect on employment.
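Equation (A16) translates directly into a short function. The sketch below uses made-up exponents to show the two limiting cases described above and one case in which the score becomes negative. The function name is hypothetical.

```python
def recapitulation_score(beta_hat, beta_by_year):
    """Mean of 1 - |(beta_hat - beta_t) / beta_t| over yearly exponents, as in (A16)."""
    terms = [1.0 - abs((beta_hat - b) / b) for b in beta_by_year]
    return sum(terms) / len(terms)


# Made-up exponents for illustration only.
perfect = recapitulation_score(1.1, [1.1, 1.1, 1.1])  # longitudinal equals cross-sectional
no_effect = recapitulation_score(0.0, [1.1, 0.9])     # population change has no effect
opposite = recapitulation_score(-1.0, [1.0])          # longitudinal sign is reversed
print(perfect, no_effect, opposite)
```

The score is not bounded below: whenever ${\hat{β}}_{i}$ deviates from $β_{i} (t)$ by more than $|β_{i} (t)|$, the corresponding term is negative, which is how a negative entry in a table of scores can arise.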

Table A6 shows the recapitulation scores for various industries. Most industries have recapitulation scores higher than 0.5, indicating that their evolutionary paths during urban development are typical and have reference value. However, for industries with low recapitulation scores, such as public facilities, construction, and health, the scaling characteristics should be analysed on an annual basis rather than through average values alone.

Table A6. Recapitulation score.

Industry	Score
Manufacturing	0.75 ^*
Production and supply of electricity, heating, gas, and water	0.93 ^*
Construction industry	0.42
Wholesale and retail	0.68 ^*
Transportation, warehousing, and postal services	0.83 ^*
Accommodation and catering industry	0.85 ^*
Information transmission, computing services, and software	0.67 ^*
Finance	0.61 ^*
Real estate industry	0.91 ^*
Rent	0.83 ^*
Scientific research, technical services, and geological survey	0.77 ^*
Public facilities management industry	−0.91
Residential services, repair, and other services	0.73 ^*
Educational services	0.83 ^*
Health care and social work	0.42
Culture, sports, and entertainment	0.82 ^*
Public administration, social security, and social organization	0.60 ^*

^* indicates industries with $S_{i} > 0.5$.

References

Bettencourt, L.M.; West, G. A unified theory of urban living. Nature 2010, 467, 912–913.
Chen, M.; Zhang, H.; Liu, W.; Zhang, W. The global pattern of urbanization and economic growth: Evidence from the last three decades. PLoS ONE 2014, 9, e103799.
United Nations. World Urbanization Prospects: The 2014 Revision; United Nations Department of Economic and Social Affairs Population Division: New York, NY, USA, 2015; Volume 41.
Balland, P.A.; Jara-Figueroa, C.; Petralia, S.G.; Steijn, M.P.; Rigby, D.L.; Hidalgo, C.A. Complex economic activities concentrate in large cities. Nat. Hum. Behav. 2020, 4, 248–254.
Xu, H.; Jiao, M. City size, industrial structure and urbanization quality—A case study of the Yangtze River Delta urban agglomeration in China. Land Use Policy 2021, 111, 105735.
Zheng, S.; Du, R. How does urban agglomeration integration promote entrepreneurship in China? Evidence from regional human capital spillovers and market integration. Cities 2020, 97, 102529.
Hong, I.; Frank, M.R.; Rahwan, I.; Jung, W.S.; Youn, H. The universal pathway to innovative urban economies. Sci. Adv. 2020, 6, eaba4934.
Bettencourt, L.M.; Lobo, J.; Helbing, D.; Kühnert, C.; West, G.B. Growth, innovation, scaling, and the pace of life in cities. Proc. Natl. Acad. Sci. USA 2007, 104, 7301–7306.
Bettencourt, L.M. The Origins of Scaling in Cities. Science 2013, 340, 1438–1441.
Bettencourt, L.M.; Lobo, J.; West, G.B. Why are large cities faster? Universal scaling and self-similarity in urban organization and dynamics. Eur. Phys. J. B 2008, 63, 285–293.
Pumain, D.; Paulus, F.; Vacchiani-Marcuzzo, C.; Lobo, J. An evolutionary theory for interpreting urban scaling laws. Cybergeo Eur. J. Geogr. 2006, 2006, 1–20.
Youn, H.; Bettencourt, L.; Lobo, J.; Strumsky, D.; Samaniego, H.; West, G. Scaling and universality in urban economic diversification. J. R. Soc. Interface 2016, 13, 20150937.
Misra, S.B. Revisiting Rural Non-farm Sector Employment in India: Trends from 1993-94 to 2023-24. Indian J. Hum. Dev. 2025, 09737030251322830.
Ge, P.; Sun, W.; Zhao, Z. Employment structure in China from 1990 to 2015. J. Econ. Behav. Organ. 2021, 185, 168–190.
Arif, I. Productive knowledge, economic sophistication, and labor share. World Dev. 2021, 139, 105303.
Li, X.; Huang, S.; Chen, Q. Analyzing the driving and dragging force in China’s inter-provincial migration flows. Int. J. Mod. Phys. C 2019, 30, 1940015.
Chen, Y. The evolution of Zipf’s law indicative of city development. Phys. A Stat. Mech. Its Appl. 2016, 443, 555–567.
Taylor, J.R. The China dream is an urban dream: Assessing the CPC’s national new-type urbanization plan. J. Chin. Political Sci. 2015, 20, 107–120.
Ye, X.; Xie, Y. Re-examination of Zipf’s law and urban dynamic in China: A regional approach. Ann. Reg. Sci. 2012, 49, 135–156.
Guan, X.; Wei, H.; Lu, S.; Dai, Q.; Su, H. Assessment on the urbanization strategy in China: Achievements, challenges and reflections. Habitat Int. 2018, 71, 97–109.
Chen, X.; Du, W. Too big or too small? The threshold effects of city size on regional pollution in China. Int. J. Environ. Res. Public Health 2022, 19, 2184.
Zhou, Y.; Xu, H.; Wang, Y.; Li, C.; Luo, Q.; Chen, S. Promoting coordinated spatial governance of mega-cities in China via spatial organization of metropolitan areas. Front. Urban Rural Plan. 2024, 2, 24.
Zheng, D.; Dong, S.; Lin, C. The Necessity and Control Strategy of “Medium Density” in Metropolis. Int. Urban Plan 2021, 36, 1–9.
Zipf, G.K. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology; Ravenio Books: London, UK, 2016.
Hu, Y.; Connor, D.S.; Stuhlmacher, M.; Peng, J.; Turner Ii, B. More urbanization, more polarization: Evidence from two decades of urban expansion in China. npj Urban Sustain. 2024, 4, 33.
Bee, M.; Riccaboni, M.; Schiavo, S. The size distribution of US cities: Not Pareto, even in the tail. Econ. Lett. 2013, 120, 232–237.
Berry, B.J.; Okulicz-Kozaryn, A. The city size distribution debate: Resolution for US urban regions and megalopolitan areas. Cities 2012, 29, S17–S23.
Li, H.; Wei, Y.D.; Ning, Y. Spatial and temporal evolution of urban systems in China during rapid urbanization. Sustainability 2016, 8, 651.
Mori, T.; Smith, T.E.; Hsu, W.T. Common power laws for cities and spatial fractal structures. Proc. Natl. Acad. Sci. USA 2020, 117, 6469–6475.
Ribeiro, H.V.; Oehlers, M.; Moreno-Monroy, A.I.; Kropp, J.P.; Rybski, D. Association between population distribution and urban GDP scaling. PLoS ONE 2021, 16, e0245771.
Zhao, S.X.; Guo, N.S.; Li, C.L.K.; Smith, C. Megacities, the world’s largest cities unleashed: Major trends and dynamics in contemporary global urban development. World Dev. 2017, 98, 257–289.
Friedmann, J. Four theses in the study of China’s urbanization. Int. J. Urban Reg. Res. 2006, 30, 440–451.
Frank, M.R.; Sun, L.; Cebrian, M.; Youn, H.; Rahwan, I. Small cities face greater impact from automation. J. R. Soc. Interface 2018, 15, 20170946.
Bai, X.; Shi, P.; Liu, Y. Society: Realizing China’s urban dream. Nat. News 2014, 509, 158.
Wang, X.R.; Hui, E.C.M.; Choguill, C.; Jia, S.H. The new urbanization policy in China: Which way forward? Habitat Int. 2015, 47, 279–284.
Xu, Z.; Zhu, N. City size distribution in China: Are large cities dominant? Urban Stud. 2009, 46, 2159–2185.
Gao, B.; Huang, Q.; He, C.; Dou, Y. Similarities and differences of city-size distributions in three main urban agglomerations of China from 1992 to 2015: A comparative study based on nighttime light data. J. Geogr. Sci. 2017, 27, 533–545.
Fang, L.; Li, P.; Song, S. China’s development policies and city size distribution: An analysis based on Zipf’s law. Urban Stud. 2017, 54, 2818–2834.
Cai, E.; Zhang, S.; Chen, W.; Li, L. Spatio–temporal dynamics and human–land synergistic relationship of urban expansion in Chinese megacities. Heliyon 2023, 9, e19872.
ECNS. 17 Chinese Cities Have a Population of over 10 Million in 2021. 2022. Available online: https://www.ecns.cn/news/cns-wire/2022-05-26/detail-ihaytawr8118445.shtml (accessed on 21 May 2025).
National Bureau of Statistics of China. China City Statistical Yearbook; Annual Publications from 2004 to 2020 Accessed from China Economic and Social Big Data Research Platform. Available online: https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2025020156&pinyinCode=YZGCA (accessed on 25 May 2025).
GB/T 4754-2017; Industrial Classification for National Economic Activities. National Bureau of Statistics of China: Beijing, China, 2017. Available online: https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=A703F0E23DD165A5A1318679F312D158 (accessed on 21 May 2025).
National Bureau of Statistics of China. China Statistical Yearbook for Regional Economy; Accessed from China Economic and Social Big Data Research Platform. Available online: https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2015070200&pinyinCode=YZXDR (accessed on 25 May 2025).
U.S. Bureau of Economic Analysis. GDP by County, Metro, and Other Areas; Data Released on December 4, 2024. Available online: https://www.bea.gov/data/gdp/gdp-county-metro-and-other-areas (accessed on 25 May 2025).
U.S. Census Bureau, Population Division. Metropolitan and Micropolitan Statistical Areas Population Totals and Components of Change: 2010–2019; Data Covering the Period from 2010 to 2019. Available online: https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html (accessed on 25 May 2025).
U.S. Bureau of Economic Analysis. Regional Price Parities and Implicit Regional Price Deflators, by Metropolitan Area, 2020; Data in Millions of Constant (2012) Dollars. Available online: https://www.bea.gov/data/consumer-spending/real-consumer-spending-state (accessed on 25 May 2025).
Liesner, H. The European Common Market and British Industry. Econ. J. 1958, 68, 302–316.
Levy, M. Gibrat’s Law for (All) Cities: Comment. Am. Econ. Rev. 2009, 99, 1672–1675.
Deng, Z.; Fan, H. Study on the law of urban population size distribution in China. Chin. J. Popul. Sci. 2016, 4, 48–60.
Chen, D.; Yan, Z.; Wang, W. Urban population size, industrial agglomeration model and urban innovation: Empirical evidence from 271 cities at prefecture level and above. Chin. J. Popul. Sci. 2020, 34, 27–40.
Dong, K. Research on Urban Power Law Distribution and Urban Allometric Scaling. Ph.D. Thesis, University of Chinese Academy of Sciences, Beijing, China, 2019.
Blank, A.; Solomon, S. Power laws in cities population, financial markets and internet sites (scaling in systems with a variable number of components). Phys. A Stat. Mech. Its Appl. 2000, 287, 279–288.
Gabaix, X. Power laws in economics: An introduction. J. Econ. Perspect. 2016, 30, 185–206.
González-Val, R.; Lanaspa, L.; Sanz-Gracia, F. New evidence on Gibrat’s law for cities. Urban Stud. 2014, 51, 93–115.
Lalanne, A. Zipf’s law and Canadian urban growth. Urban Stud. 2014, 51, 1725–1740.
Wei, S.; Sun, N.; Jiang, Y. Applicability of Zipf’s law and Gibrat’s law in urban size distribution in China. J. World Econ. 2018, 96–120.
Eeckhout, J. Gibrat’s Law for (All) Cities. Am. Econ. Rev. 2004, 94, 1429–1451.
Malevergne, Y.; Pisarenko, V.; Sornette, D. Testing the Pareto against the lognormal distributions with the uniformly most powerful unbiased test applied to the distribution of cities. Phys. Rev. E 2011, 83, 036111.
Rabinowitz, P.H. (Ed.) Minimax Methods in Critical Point Theory with Applications to Differential Equations; Number 65; American Mathematical Society: Providence, RI, USA, 1986.
Michaels, G.; Rauch, J.; Redding, S.J. Task Specialization in U.S. Cities from 1880 to 2000. J. Eur. Econ. Assoc. 2019, 17, 754–798.
Albouy, D.; Behrens, K.; Robert-Nicoud, F.; Seegert, N. The optimal distribution of population across cities. J. Urban Econ. 2019, 110, 102–113.
Henderson, J.V. Optimum city size: The external diseconomy question. J. Political Econ. 1974, 82, 373–388.
Capello, R.; Camagni, R. Beyond optimal city size: An evaluation of alternative urban growth patterns. Urban Stud. 2000, 37, 1479–1496.
Mizutani, F.; Tanaka, T.; Nakayama, N. Estimation of optimal metropolitan size in Japan with consideration of social costs. Empir. Econ. 2015, 48, 1713–1730.
Giesen, K.; Südekum, J. Zipf’s law for cities in the regions and the country. J. Econ. Geogr. 2011, 11, 667–686.
Jiang, B.; Yin, J.; Liu, Q. Zipf’s law for all the natural cities around the world. Int. J. Geogr. Inf. Sci. 2015, 29, 498–522.
Verbavatz, V.; Barthelemy, M. The growth equation of cities. Nature 2020, 587, 397–401.
Wang, X. Urbanization Path and City Scale in China: An Economic Analysis. Econ. Res. J. 2010, 10, 20–32.
Carlino, G. Manufacturing agglomeration economies as returns to scale: A production function approach. In Papers of the Regional Science Association; Springer: Berlin/Heidelberg, Germany, 1982; Volume 50, pp. 95–108.
Jin, X. Theories on the Optimum Scales of Cities and Empirical Study: Taking the Example of the Three Municipalities. Shanghai Econ. Rev. 2004, 2004, 35–43.
Zhang, Y. The Empirical Study of Optimal City Size in China: The Perspective of Economic Growth. Shanghai Econ. Rev. 2009, 5, 31–38.
Jie, Z.; Yang, X. Optimal city size in China: An extended empirical study from the perspective of energy consumption. China City Plan. Rev. 2017, 26, 22–29.
Sun, W.; Jones, J.; Gamber, M. A Turning Point for China’s Population: No Child and Long Illness. Aging Dis. 2023, 14, 1950–1952.
Wang, P. Population Decline Narrows, Population Quality Continues to Improve. National Bureau of Statistics of China. 2025. Available online: https://www.stats.gov.cn/sj/sjjd/202501/t20250117_1958337.html (accessed on 21 May 2025).
United Nations. World Population Aging; Technical Report; Department of Economic and Social Affairs, Population Division: New York, NY, USA, 2017. [Google Scholar]
Liang, J. Prospects for China’s drive for innovation: From the perspective of demographics. In China and the West; Edward Elgar Publishing: Cheltenham, UK, 2021; pp. 135–147.
Yu, X.; Yi, T. Analysis on the Influencing Mechanism of Population Agglomeration, Industry Agglomeration and Innovation Agglomeration in Chinese Megacities. Commer. Res. 2020, 62, 145–152.
National Bureau of Statistics of China. 7th National Population Census (2020). Available online: https://www.stats.gov.cn/sj/pcsj/rkpc/7rp/indexch.htm (accessed on 25 May 2025).
Jiang, B.; Jia, T. Zipf’s law for all the natural cities in the United States: A geospatial perspective. Int. J. Geogr. Inf. Sci. 2011, 25, 1269–1281.
Hidalgo, C.A.; Hausmann, R. The building blocks of economic complexity. Proc. Natl. Acad. Sci. USA 2009, 106, 10570–10575.
Hausmann, R.; Hidalgo, C.A. The network structure of economic output. J. Econ. Growth 2011, 16, 309–342.
Zhao, R.; Wang, Y. Analysis on the causes of frequent economic recession in Northeast China—The change from “industry absence” to “system solidification”. Soc. Sci. Front. 2017, 2, 48–57.

Figure 1. Schematic diagram of the two-dimensional surface of $RCA$ as a function of $\beta$ and $N$. The colour represents the value of the RCA, increasing from blue to orange. The asterisk (*) marks a critical point $(\beta^{*}, N^{*})$.

Figure 2. Time series of scaling exponents of industries in China.
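
Figure 2 tracks the scaling exponent $\beta$ of each industry over time. As a rough illustration of how such an exponent is commonly estimated in the urban-scaling literature, the sketch below fits log employment against log population by ordinary least squares, assuming the power-law form $Y = Y_{0} N^{\beta}$. The function name `scaling_exponent` and the synthetic city data are assumptions made for illustration; this is not the authors' code, and their exact estimation procedure may differ.

```python
import numpy as np


def scaling_exponent(population, employment):
    """Return (beta, log_y0) from an OLS fit of log(employment) on log(population).

    Assumes the power-law form employment = y0 * population**beta.
    """
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(employment, dtype=float))
    # np.polyfit returns coefficients from highest degree down: slope, then intercept.
    beta, log_y0 = np.polyfit(log_n, log_y, deg=1)
    return beta, log_y0


# Hypothetical synthetic cities that follow Y = 0.01 * N**1.2 exactly
# (illustrative values only, not yearbook data).
population = np.array([5e5, 1e6, 5e6, 1e7, 2e7])
employment = 0.01 * population ** 1.2

beta, log_y0 = scaling_exponent(population, employment)
print(f"beta = {beta:.3f}")
```

Repeating such a fit for each industry and each year would give one exponent per industry-year pair, which is the kind of series the figure plots.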

Figure 3. The characteristic distribution of significant industries in cities of different sizes.

Figure 4. Average comparative advantage of Chinese cities of different sizes. Each line represents the $RCA$ of one industry across cities of different sizes, and the colour indicates the scaling exponent $\beta$ of that industry. A value above $y = 1$ indicates that the industry is significant in cities of this size. (a) shows all industries across the entire range of urban scales, while (b) shows only the typical “knowledge spillover” and “infrastructure” industries near a population of $10^{7}$. Source: Data from the “China City Statistical Yearbook” (2019) [41].
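
The caption of Figure 4 treats an industry as significant in a city when its $RCA$ exceeds 1. A minimal sketch of that test follows, assuming the standard Balassa form of revealed comparative advantage: an industry's employment share within a city divided by its share across all cities. The function name `rca` and the toy employment matrix are hypothetical, and the paper's own comparative advantage function may be defined differently.

```python
import numpy as np


def rca(employment):
    """Balassa-style RCA for a cities-by-industries employment matrix."""
    emp = np.asarray(employment, dtype=float)
    # Share of each industry within each city (rows sum to 1).
    city_share = emp / emp.sum(axis=1, keepdims=True)
    # Share of each industry across all cities combined.
    overall_share = emp.sum(axis=0) / emp.sum()
    return city_share / overall_share


# Hypothetical toy data: 3 cities (rows) x 2 industries (columns).
toy = np.array([
    [80.0, 20.0],
    [50.0, 50.0],
    [20.0, 80.0],
])

values = rca(toy)
significant = values > 1.0  # the "above y = 1" criterion from the caption
print(values)
print(significant)
```

In this toy case the first industry is significant only in the first city and the second industry only in the third city, while the middle city sits exactly at the threshold for both.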

Figure 5. Cities near the critical point. $N_{2}^{*}$ denotes the approximate average of the critical points. Different colours represent different cities. Cities near this value are selected to observe how their populations change over time. Source: Data from the “China City Statistical Yearbook” (2004–2019) [41].

Figure 6. The distribution of cities with populations near the critical point in 2019. The darkest red represents districts larger than the critical point, the lighter red represents districts within $20\%$ below the critical point, the lightest colour represents districts below $80\%$ of the critical population size, and white areas have no statistical data. Source: Data from the “China City Statistical Yearbook” (2019) [41].

Figure 7. (a) Average $\beta$ of the largest cities in China. (b) Changes in $\log (N^{*} / N_{max})$ according to different $\gamma$.

Table 1. Data description.

Variable	Description	Data Source
Employment in China	Employment in 19 industries in cities at prefecture level and above; annual data for 2004–2019 from the “China City Statistical Yearbook” [41]. Unit: 10,000 people.	https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2025020156&pinyinCode=YZGCA (accessed on 20 May 2025)
GRP and GRP per capita in China	Gross regional product and gross regional product per capita in cities at prefecture level and above in China; annual data for 2004–2012 and 2014–2019 are from the “China City Statistical Yearbook” [41], and data for 2013 are from the “China Statistical Yearbook for Regional Economy” [43]. Unit: yuan.	https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2025020156&pinyinCode=YZGCA (accessed on 20 May 2025); https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2015070200&pinyinCode=YZXDR (accessed on 20 May 2025)
Sales of commodities in China	Total retail sales of consumer goods plus total sales of commodities of enterprises above designated size in wholesale and retail trades, for cities at prefecture level and above in China; annual data for 2019 from the “China City Statistical Yearbook” [41]. Unit: 10,000 yuan.	https://data.oversea.cnki.net/en/trade/yearBook/single?zcode=Z009&id=N2025020156&pinyinCode=YZGCA (accessed on 20 May 2025)
GDP (per capita) in the United States	CAGDP1 gross domestic product (GDP) summary by metropolitan area, from the U.S. Bureau of Economic Analysis [44]; annual data for 2004–2019. Unit: thousands of chained 2012 dollars.	https://www.bea.gov/data/gdp/gdp-county-metro-and-other-areas (accessed on 20 May 2025)
Population in the United States	Annual estimates of the resident population for metropolitan statistical areas in the United States, from the U.S. Census Bureau, Population Division [45]; annual data for 2019. Unit: people.	https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html (accessed on 20 May 2025)
Sales of commodities in the United States	Real personal consumption expenditures by state and real personal income by metropolitan area, from the U.S. Bureau of Economic Analysis [46]; annual data for 2019. Unit: millions of constant (2012) dollars.	https://www.bea.gov/data/consumer-spending/real-consumer-spending-state (accessed on 20 May 2025)

Table 2. The hypothesis testing results in 2019.

No.	$p_{1}$ (million)	$p_{2}$ (million)	$p_{3}$ (million)	LLR p-value	Population $p > |z|$
1	10	11	∞	0.007 *	0.008 *
2	6	7	10	0.205	0.206
3	5	6	10	0.012	0.012
4	4	5	10	0.022	0.023
5	3	4	10	0.061	0.062
6	2	3	10	0.539	0.539

(*) indicates a significance level of 0.01.


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


