Population Density or Population Size. Which Factor Determines Urban Traffic Congestion?

Abstract: A large number of articles have documented that as the population density of cities increases, car use declines and public transit use rises. These articles have had a significant impact in promoting high-density compact urban development to mitigate traffic congestion. Another approach, followed by other researchers, used the urban scaling model to indicate that traffic congestion increases as the population size of cities increases, thus generating a possibly contradictory result. Therefore, this study examines the role of both density and population size in traffic congestion in 164 global cities using the Stochastic Impacts by Regression on Population, Affluence and Technology (STIRPAT) model. We divide the 164 cities into two subgroups of 66 low-density cities and 98 high-density cities for analysis. The findings from the subgroup analysis indicate a clear-cut difference between the critical role of density in low-density cities and the exclusive role of population size in high-density cities. Furthermore, using a threshold regression model, the 164 cities are divided into two regions of large- and small-population cities to determine the population scale advantage of traffic congestion. Our findings highlight the importance of including analysis of subgroups based on density and/or population size in future studies of traffic congestion.


Introduction
Traffic congestion in large global cities continues to worsen resulting in high economic cost, lost time, accidents, pollution, and many other negative effects for billions of urban inhabitants [1][2][3][4]. Research on the causes and mitigation of traffic congestion has produced a very large number of publications particularly over the last three decades [5,6]. The two best known papers, by Newman and Kenworthy (1989) and Kenworthy and Laube (1999), have found a negative correlation between traffic congestion and population density [7,8]. They also found a positive correlation between density and transit use. In other words, high density in cities may discourage car ownership in favor of using public transportation systems. High-density cities may also require shorter driving distances.
Numerous follow-up studies using aggregate or disaggregate samples of cities have had a significant impact in promoting high-density compact urban development to mitigate traffic congestion and pollution. Using an aggregate sample allows the researcher to analyze the overall pattern of traffic congestion for all cities in one sample without differentiating heterogeneous characteristics such as density, population, and income. An aggregate sample can also be categorized into several disaggregate samples of cities with homogeneous characteristics such as density and population. Analysis of disaggregate samples often yields results that may be more relevant to individual cities with unique characteristics.
As for the key determinant of traffic congestion, a common approach is to use the population density of a city, while another approach is to use population size. Using density as the key determinant has often revealed a negative relationship between traffic congestion and the density of cities [7,8]. In contrast, using population size as the key determinant has often shown a positive relationship between congestion and the population size of cities. Similarly, using an aggregate versus a disaggregate sample has often provided contradictory evidence on the relationship between traffic congestion and cities. This research attempts to resolve these contradictory results by using both aggregate and disaggregate samples of cities. In other words, both population density and population size are used simultaneously as the key determinants of traffic congestion.
We used traffic congestion data available from TomTom [9]. TomTom [9] created the traffic index (TI) in 2012 to provide an annual benchmark, making it possible to evaluate congestion levels globally in an objective way. In TI 2017, coverage was extended to 390 cities in 48 countries over six continents. Congestion levels are measured as the percentage of increase in overall travel times compared to non-congested, free-flow travel time. For example, a congestion level of 50% means that overall travel time is 50% greater than free-flow travel time. According to Cohn [10], travel times are calculated on all road segments in the top five functional road classes in each urban area. The overall TI value for a city is the weighted average percentage of extra travel time for drivers in that city over 24 h, 7 days a week for the given calendar period. The TI includes 164 large global cities with complete data on the traffic congestion index, population, income per capita, and population density.
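As a minimal illustration of the congestion-level definition above, the following sketch computes extra travel time as a percentage of free-flow travel time. The function name and the sample travel times are hypothetical; they are not taken from TomTom's data, and the sketch does not reproduce TomTom's weighting across road segments.

```python
def congestion_level(actual_minutes: float, free_flow_minutes: float) -> float:
    """Return extra travel time as a percentage of free-flow travel time."""
    if free_flow_minutes <= 0:
        raise ValueError("free-flow travel time must be positive")
    return (actual_minutes - free_flow_minutes) / free_flow_minutes * 100.0


# Hypothetical trip: 30 minutes in free flow, 45 minutes in observed traffic.
level = congestion_level(actual_minutes=45.0, free_flow_minutes=30.0)
print(f"Congestion level: {level:.0f}%")  # prints "Congestion level: 50%"
```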
The sample of 164 cities obtained from the TomTom TI was divided into two disaggregate samples: (1) 66 U.S.-Canadian low-density cities and (2) 98 high-density cities in 14 countries, primarily in Europe. We used a cross-sectional multivariate regression model with population size, population density, and income as independent variables. We also examined the population scaling advantage of traffic congestion between the subgroup of large-population cities and the subgroup of small-population cities. The threshold regression model was used to divide the 164 cities into 46 cities whose population size was larger than the threshold value and 118 cities whose population size was less than the threshold value.
The remainder of this paper is organized into four sections. Section 2 is a brief literature review on urban congestion in relation to vehicle and transit usage. The data and method of analysis are presented in Section 3, followed by an analysis of results in Section 4. Finally, conclusions, implications, and limitations of this research are presented in Section 5.

Aggregate and Disaggregate Analysis of Traffic Congestion
The relationship between urban form and traffic congestion or congestion-generated pollution has been studied extensively over several decades, specifically after the publication of two well-known aggregate studies by Newman and Kenworthy [7] and Kenworthy and Laube [8]. With an aggregate sample of 46 global cities from several regions including North America, Asia, and Europe, Kenworthy and Laube [8] confirmed the earlier finding that as population density increases, car use declines (R² = 0.838) and transit use rises (R² = 0.757). These two papers have prompted numerous follow-up studies using both aggregate and disaggregate samples of cities and have had a significant impact in promoting the concept of high-density compact development to mitigate car use, traffic congestion, and pollution. Representative survey articles include [5,11].
Two methods of analysis have been used in traditional aggregate studies. The first group of articles continues to use population density as the key determining factor in multivariate regression analysis. For example, Karathodorou et al. [12] and Sue [13] found statistically significant negative density coefficients of −0.229 and −0.064, respectively, for fuel consumption, supporting the earlier findings by Newman and Kenworthy [7]. They determined that a 1% increase in density is expected to decrease car fuel consumption by 0.229% and 0.064%, respectively, all else being equal. Ye et al. [14] and Gudipudi et al. [15] also supported the impact of city density on decreased car use and lower carbon emissions.
The second aggregate approach uses the urban scaling model, which states that traffic congestion, car use, and pollution emissions may be described by a power-law function [16,17]. For 942 U.S. urban centers, Fragkias et al. [18] established a power function exponent of 1.0 for CO₂ emissions as a function of population size, indicating a constant return to scale. In contrast, Oliveira et al. [19] found superlinear scaling with population size (an exponent of 1.46), representing an increasing return to scale, when the U.S. cities were defined as connected urban spaces. Muller and Jha [20] found a sublinear exponent of 0.75 for local air pollution for U.S. metropolitan areas, indicating a decreasing scale trend.
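To make the scaling exponents above concrete, the following sketch assumes the generic power-law form Y = Y0 · N^β and shows what each quoted exponent implies when population doubles. The function names and the prefactor are illustrative assumptions, and the three exponents refer to different outcome measures in the cited studies, so the comparison is only about the shape of the scaling.

```python
def scaled_quantity(population: float, exponent: float, prefactor: float = 1.0) -> float:
    """Power-law scaling: Y = prefactor * population ** exponent."""
    return prefactor * population ** exponent


def scaling_regime(exponent: float, tol: float = 1e-9) -> str:
    """Classify an exponent as sublinear, linear, or superlinear."""
    if exponent > 1.0 + tol:
        return "superlinear"
    if exponent < 1.0 - tol:
        return "sublinear"
    return "linear"


# Exponents quoted in the text; the prefactor is arbitrary and cancels in the ratio.
quoted = [("Fragkias et al.", 1.0), ("Oliveira et al.", 1.46), ("Muller and Jha", 0.75)]
for label, beta in quoted:
    ratio = scaled_quantity(2_000_000, beta) / scaled_quantity(1_000_000, beta)
    print(f"{label}: beta={beta} ({scaling_regime(beta)}), "
          f"doubling population multiplies Y by {ratio:.2f}")
```

An exponent above 1 means the outcome grows faster than population, while an exponent below 1 means larger cities generate proportionately less of it.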
For traffic congestion, Louf and Barthelemy [21] derived a population exponent of 1.27 for a variation of total delay hours due to congestion and 1.262 for excess CO₂ emissions due to congestion for 101 U.S. cities. Barthelemy [22] also established a scale exponent of 1.58 (r² = 0.96) for total delay from congestion during peak hours as a function of the population size of cities. Barthelemy's [22] data on almost 300 urban areas globally came from TomTom in 2016.
Kenworthy and Laube's [8] aggregate study did not explicitly pursue the possibility that the driving behavior of people in dense cities may be different from that of people in less dense cities, even though their data supported this possibility. Their sample of 13 U.S. and 11 European cities clearly showed that the 1990 average density of the European subgroup was about 3.5 times higher, at 49.9 persons per hectare versus just 14.2 in the North American subgroup. In contrast, the average car use in the North American subgroup was almost 2.5 times higher, at 11,155 km per capita versus just 4517 km for the European subgroup. The difference in driving behavior in cities with different densities has been subjected to rigorous quantitative analysis in the context of residential self-selection of driving in disaggregate studies, which we elaborate on next.
One group of disaggregate studies continues to use density as the key determinant. These studies use household data to control for observable driving differences between households living in low- and high-density areas [23][24][25][26]. For example, Brownstone and Golob [25] found that residential density influences car use. Comparing two households that are similar in all aspects except residential density, a lower density of 1000 housing units per square mile, which is about 40% of the mean value of 2500 housing units, implies a positive difference of 1200 miles per year or 4.8% more miles. Duranton and Turner [26] also found that a 10% increase in "residential density" leads to a 0.7% to 1% decline in driving when the effects of other variables are held constant. "Residential density" is defined as the density of residents and jobs within a 10-km radius of where a driver lives. Other measures of urban form do not have a measurable effect on driving behavior. Policy implications from these studies suggest that it would be more costly to reduce the driving difference through a densification process than through congestion pricing or increased gasoline taxes.
The second group of disaggregate studies has used population as the key determinant. For example, Chang et al. [27] compared a group of cities with less than 1 million inhabitants, which showed strong superlinear scaling with an exponent of 1.7, with another group of cities with more than 3 million inhabitants, which showed linear scaling with an exponent of 1.0. Lu et al.'s [28] disaggregate study on pollution concluded that the rate of population growth can aggravate air pollution, but the impact is heterogeneous. In megacities, the rate of population growth can improve air quality, whereas in small- and medium-size cities, the rate of population growth can aggravate air pollution.
Finally, in an attempt to combine both population size and density to examine the effects on urban CO₂ emissions, Ribeiro et al. [29] proposed a generalized logistic model. The model can generate decreasing or increasing returns to scale, but population size always has a greater effect on emissions than population density. They also suggested a possible use of the IPAT model, which can use both population size and density simultaneously as predictors.

Comparison of Transit Capacity and Usage: The U.S. versus Europe
We analyzed the data by separating the cities into two groups: 66 North American (Canada and the United States) cities and 98 other (mostly European) cities. The separation is based on the expectation that the effects of the independent variables (density, population, and income) on the dependent variable would differ between the groups.
The primary factor that moderates the effects of the independent variables on the dependent variable is the public transit system [30]. In general, there is a strong correlation between population density and transit system usage. However, high density does not necessarily result in increases in transit capacity and transit use. A well-established transit system is expected to mitigate the effects of density, population size, and income on traffic congestion by absorbing the demand for automobile use. There is sufficient evidence that transit capacity and use are significantly lower in U.S. cities than in European cities.
Comparing transit capacity and transit usage between 11 metropolitan cities in Europe and 13 metropolitan U.S. cities, Kenworthy and Laube [8] demonstrated a much greater transit dependency in the European cities than in the U.S. cities. As Table 1 shows, overall transit capacity, measured as transit service kilometers (km) per person, was 3.28 times higher in the European cities (92 km) than in the American cities (28 km). The transit service km per person includes rail service km as well. Rail service intensity, measured as rail service km per urban hectare (ha), shows an even greater difference: the European cities had 23.86 times higher intensity, at 3651 km versus just 153 km in the U.S. cities. As for actual passenger travel using transit systems, the European cities captured a far greater share, 22.6% versus a tiny 3.1% in the U.S. cities, which represents 7.27 times higher transit usage in the European cities. Another transit usage measure is the percentage of commuting workers using transit during peak hours. Once again, 38.8% of commuters used transit systems in the European cities versus just 9.0% in the U.S. cities, representing 4.31 times higher usage in the European cities. In short, transit dependence is much greater in the European cities than in the U.S. cities, whereas automobile dependence is dominant in the U.S. cities.
The gap in transit dependency between U.S. and European cities has widened. For example, Freemark [31], in a comparative analysis using data for the 2002-2018 period, reported that transit ridership in U.S. cities declined by 15% during 2010-2018, while ridership in their French counterparts increased by 18% during the same period. With a few exceptions such as New York, Boston, and Houston, people in U.S. cities tend to prefer using their own cars to using public transit. Freemark [31] attributes the decline of transit in the U.S. to factors such as low gas prices, cultural influences, and economic differences from European cities. In contrast, the largest French cities have developed effective tram and bus services. Public transit systems in French cities were designed to serve large and dense cities, and a large amount of resources has been invested in developing pedestrian-friendly facilities. As additional evidence, Figure 1 shows a clear demarcation between U.S. and Western European cities in terms of transit ridership per capita [32].

Method and Data
Ribeiro et al. [29] suggested that the well-known environmental principle of I = PAT might be used as the analytical model [33][34][35]. The IPAT (I = PAT) model is a multiplicative approach to assessing the role of population (P), affluence (A), and technology (T) in environmental impact (I). The model is not an equation but rather an identity. The IPAT model is still useful in determining which factors (driving forces) are most damaging to the environment.
The IPAT model was extended into the stochastic impacts by regression on population, affluence and technology (STIRPAT) model, which makes it possible to estimate the proportional change in environmental impact for a given proportional change in population.
The STIRPAT model is defined as

$$I_i = a P_i^{b} A_i^{c} T_i^{d} e_i \qquad (1)$$

where a is a constant; b, c, and d are the exponents of P, A, and T, respectively, that are to be estimated; and e is the residual or error term. Subscript i denotes the cross-sectional units, namely cities in this paper.
To ease the task of estimating the exponents, Equation (1) is converted into the log-log form of Equation (2) by taking the natural log of both sides:

$$\ln I_i = \ln a + b \ln P_i + c \ln A_i + d \ln T_i + \ln e_i \qquad (2)$$
The natural log is helpful because it converts the multiplicative model into a linear one, so that the coefficients can be interpreted as percentage changes. For example, b can be viewed as the population elasticity, which measures the percentage change in environmental impact resulting from a one percent change in population.
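The elasticity reading of b can be made explicit with a short derivation. This is a sketch that assumes the standard log-log STIRPAT form, with A, T, and the error term held fixed; the subscripts "old" and "new" are introduced here only for illustration.

```latex
\begin{align}
\ln I &= \ln a + b \ln P + c \ln A + d \ln T + \ln e \\
\frac{\partial \ln I}{\partial \ln P} &= b \\
\frac{I_{\text{new}}}{I_{\text{old}}} &= \left(\frac{P_{\text{new}}}{P_{\text{old}}}\right)^{b}
\quad\Longrightarrow\quad
\%\Delta I \approx b \cdot \%\Delta P \quad \text{for small changes in } P
\end{align}
```

For a 1% increase in P the exact ratio is 1.01 raised to the power b, which is very close to 1 + b/100 when b is well below 1 in magnitude.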
The STIRPAT model has been used to examine the relationship between population size and CO₂ emissions [29][36][37][38][39][40]. The STIRPAT model has also been used to examine the impact of population, income, and/or technology in other areas such as material footprint, human ecological footprint, and environmental efficiency of well-being [41][42][43].
Although the STIRPAT model has not been used in the analysis of traffic congestion, its use may be conceptually appropriate. The reason is that TI scores, like CO₂ emissions or other environmental and ecological measures, may be greatly influenced by underlying elements such as population size, income level, and technology. Another reason for using the STIRPAT model is the ready availability of the necessary data.
As for the measure of technology, there is no consensus on a single measure [44]. According to Cole and Neumayer [37], technology is a broad term that is intended to reflect technological, cultural, and institutional determinants of environmental impact. For example, Uddin et al. [45] used the urbanization ratio, measured as the percentage of the population living in urban areas, to represent technology in their STIRPAT model. Wang et al. [46] also used the urbanization ratio, together with energy intensity, as the technology factor in their STIRPAT model.
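For this study, population density (PD) is used to represent the technology factor in the STIRPAT model, together with the population size of a city as P and the income per capita of a city as A, as shown in Equation (3):

$$\ln TI_i = \ln a + b \ln P_i + c \ln A_i + d \ln PD_i + \ln e_i \qquad (3)$$

where TI_i is the traffic index of city i.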
Equation (3) was estimated by ordinary least squares in a cross-sectional multivariate regression corrected for heteroskedasticity. Equation (3) was used to analyze the determinants of traffic congestion for the aggregate group of 164 cities and for the subgroups of 66 cities and 98 cities.
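The estimation step can be sketched as follows. This is a minimal, standard-library illustration of ordinary least squares on a log-log model with three regressors, run on synthetic data generated from assumed exponents; it is not the authors' code or data. It reports point estimates only and omits the heteroskedasticity correction mentioned in the text (if that correction is of the robust-standard-error type, it would change the standard errors rather than the coefficients).

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic cities (NOT the paper's data), generated from assumed exponents.
rng = random.Random(0)
true_ln_a, true_b, true_c, true_d = 1.0, 0.15, -0.20, 0.25
rows, y = [], []
for _ in range(164):
    ln_p = rng.uniform(12.0, 16.0)   # log population
    ln_a = rng.uniform(10.0, 11.5)   # log income per capita
    ln_pd = rng.uniform(5.0, 9.0)    # log population density
    noise = rng.gauss(0.0, 0.05)     # log error term
    rows.append([1.0, ln_p, ln_a, ln_pd])
    y.append(true_ln_a + true_b * ln_p + true_c * ln_a + true_d * ln_pd + noise)

intercept, b_hat, c_hat, d_hat = ols(rows, y)
print(f"b (population) = {b_hat:.3f}, c (income) = {c_hat:.3f}, d (density) = {d_hat:.3f}")
```

With real data, a statistics package that reports heteroskedasticity-robust standard errors would be the more practical choice; the hand-rolled solver here only keeps the sketch dependency-free.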
To avoid errors caused by an artificial division of population subgroups, this research uses Hansen's [47] threshold regression model to test the threshold effect of population size on the traffic index. The single threshold regression model consists of Equations (4) and (5):

$$Y_i = \theta_1' x_i + e_i, \quad q_i \le \gamma \qquad (4)$$

$$Y_i = \theta_2' x_i + e_i, \quad q_i > \gamma \qquad (5)$$

where i represents the unit of analysis, which is a city; Y_i represents the dependent variable, the traffic index; x_i represents the explanatory variables of population size (P), income per capita (I), and population density (PD); θ_1 and θ_2 represent the parameters to be estimated; q_i represents the threshold variable, population size, used to split the sample into subgroups; γ represents the threshold value; and e_i represents the error term. According to the variables selected in this research, the threshold model is expressed in Equations (6) and (7):

$$\ln TI_i = \theta_{10} + \theta_{11} \ln P_i + \theta_{12} \ln I_i + \theta_{13} \ln PD_i + e_i, \quad q_i \le \gamma \qquad (6)$$

$$\ln TI_i = \theta_{20} + \theta_{21} \ln P_i + \theta_{22} \ln I_i + \theta_{23} \ln PD_i + e_i, \quad q_i > \gamma \qquad (7)$$

We then combine Equations (6) and (7) using a dummy variable, written as 1(·), which takes the value of one when the condition in parentheses is met and zero otherwise:

$$\ln TI_i = \left(\theta_{10} + \theta_{11} \ln P_i + \theta_{12} \ln I_i + \theta_{13} \ln PD_i\right) 1(q_i \le \gamma) + \left(\theta_{20} + \theta_{21} \ln P_i + \theta_{22} \ln I_i + \theta_{23} \ln PD_i\right) 1(q_i > \gamma) + e_i$$

This combined equation is used as the estimation equation of this research. The generalized threshold panel model has been used extensively in the fields of energy consumption, renewable energy development, carbon emissions, and the effect of regional technological innovation level on sustainable development [48][49][50][51][52][53].
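The idea behind a single-threshold regression can be sketched with a grid search: for each candidate threshold, fit a separate least-squares line in each regime and keep the threshold with the smallest pooled sum of squared residuals. The sketch below is a simplified, assumption-laden illustration with one regressor that doubles as the threshold variable and synthetic noise-free data. It is not the authors' estimation code, and it omits the inference on the threshold that Hansen's method provides.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope, sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, ssr


def estimate_threshold(q, y, min_obs=5):
    """Grid-search the threshold gamma that minimizes the pooled SSR of two regimes."""
    best = None
    for gamma in sorted(set(q)):
        low = [(qi, yi) for qi, yi in zip(q, y) if qi <= gamma]
        high = [(qi, yi) for qi, yi in zip(q, y) if qi > gamma]
        if len(low) < min_obs or len(high) < min_obs:
            continue  # keep enough observations in each regime
        _, slope_low, ssr_low = fit_line(*zip(*low))
        _, slope_high, ssr_high = fit_line(*zip(*high))
        total = ssr_low + ssr_high
        if best is None or total < best[0]:
            best = (total, gamma, slope_low, slope_high)
    return best[1], best[2], best[3]


# Synthetic, noise-free data (NOT the paper's data): the slope is 0.20 for
# q <= 14.55 and 0.15 above it, with q playing the role of log population.
q = [(120 + i) / 10 for i in range(41)]  # 12.0, 12.1, ..., 16.0
y = [1.0 + 0.20 * qi if qi <= 14.55 else 2.0 + 0.15 * qi for qi in q]

gamma_hat, slope_low, slope_high = estimate_threshold(q, y)
print(f"threshold = {gamma_hat:.1f}, slope below = {slope_low:.2f}, slope above = {slope_high:.2f}")
```

Because candidate thresholds are taken from the observed values, the estimate is the largest observed value that still belongs to the lower regime.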
For data, the TomTom traffic index was downloaded from [9]. The 2016 TomTom traffic index covered 390 cities in 48 countries. For the multivariate analysis, GDP per city as well as GDP per capita per city for 280 cities in the world for 2010 were downloaded from the World Cities Report 2016 by UN-Habitat: http://wcr.unhabitat.org/ (accessed on 20 December 2020). By dividing GDP per city by GDP per capita per city, we obtained population figures for 2010. Population density data were downloaded from the "Demographia World Urban Areas and Population Projections 10th Annual Edition 6.1: 2010.7" at http://www.demographia.com/db-worldua2010.pdf (accessed on 20 December 2020).
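The population figures described above follow from a single division. A minimal sketch, using a hypothetical city whose values are not taken from the World Cities Report:

```python
def implied_population(city_gdp: float, gdp_per_capita: float) -> float:
    """Population implied by total GDP and GDP per capita (same currency units)."""
    if gdp_per_capita <= 0:
        raise ValueError("GDP per capita must be positive")
    return city_gdp / gdp_per_capita


# Hypothetical city: USD 100 billion total GDP and USD 50,000 GDP per capita.
population = implied_population(city_gdp=100e9, gdp_per_capita=50_000.0)
print(f"Implied population: {population:,.0f}")  # prints "Implied population: 2,000,000"
```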
We were able to generate data for 164 cities with complete matching data sets of the traffic index, income per capita, population size, and population density for this analysis. Figures 2-4 display the 66 cities in the U.S., the 90 cities in Europe, and the 8 cities in Australia and Mexico, respectively. The location of each city on the map is identified by a number. Furthermore, each city has its traffic index rank listed. For example, Akron in the U.S. is numbered 1 on the map and has its traffic index ranked T163 among the 164 cities. In addition, Figures 5-8 provide the rank distribution of the 164 cities in traffic index, density, population, and income per capita. Table A1.
The averaged traffic index for the 98 cities was again higher at 29.3 over 20.5 for the 66 cities. As for income per capita, the average for the 66 cities was higher at $52,048 over $39,636 for the 98 cities. Finally, the averaged population size for the 66 cities was again higher at 2.598 million over 1.994 million for the 98 cities.

Analysis of Results
The correlation analysis of the dependent variable (traffic index) and the three independent variables (population, income, and density) is shown in Table 2. The results indicate that there was no statistically significant correlation between population size and population density. The results of the multivariate regression of the STIRPAT model on the traffic congestion index, shown in Table 3, indicate that all three variables generated coefficients that were statistically significant at less than the 1% level for the aggregate group of 164 cities, with a density coefficient of 0.242, a population coefficient of 0.155, and an income coefficient of −0.20. These coefficients indicate that a 1% increase in population density is expected to increase traffic congestion by 0.242% when the effects of population and income are held constant for the 164 cities. A 1% increase in population is expected to increase traffic congestion by 0.155%. In other words, the impact of density is greater than the impact of population size for the 164 cities. However, a 1% increase in income is expected to reduce traffic congestion by 0.2%, holding the effects of population and density constant. The results of the same multivariate regression of the STIRPAT model on traffic congestion for the two disaggregate subgroups displayed a substantial difference. For the 98 high-density cities, primarily from Europe, only the population coefficient was statistically significant at less than the 1% level, at 0.155, which was identical to the population coefficient for the whole group of 164 cities. The density coefficient displayed a marginal value of −0.004, while the income coefficient was −0.137.
In contrast, the 66 low-density cities in the U.S. and Canada had both density and population coefficients that were statistically significant at less than the 1% level, while their income coefficient failed to meet the statistical test of significance. In fact, the impacts of density and population became greater, at 0.311 and 0.207, respectively, compared to the coefficients for the whole group of 164 cities. On the other hand, the income coefficient took a negative value of −0.140 while failing to meet the statistical test of significance.
In short, the results from the aggregate group of 164 cities and the first disaggregate group of low density 66 cities indicated that density was the key determinant whereas the second disaggregate group of high density 98 cities indicated that population was the exclusive determinant for traffic congestion, as shown in Figure 9.
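To illustrate how the aggregate-sample elasticities reported above translate into predicted changes, the following sketch applies the log-log relation to percentage changes in the three drivers. The function name and the scenario are hypothetical, and the calculation ignores estimation uncertainty.

```python
# Aggregate-sample elasticities quoted in the text (164 cities).
ELASTICITIES = {"density": 0.242, "population": 0.155, "income": -0.20}


def congestion_change_pct(changes_pct: dict) -> float:
    """Percent change in the traffic index implied by percent changes in the drivers.

    Uses the exact log-log relation: TI_new / TI_old = prod((1 + change) ** elasticity).
    """
    ratio = 1.0
    for name, pct in changes_pct.items():
        ratio *= (1.0 + pct / 100.0) ** ELASTICITIES[name]
    return (ratio - 1.0) * 100.0


# A 1% change in one driver reproduces its elasticity almost exactly.
print(f"density +1%: {congestion_change_pct({'density': 1.0}):.3f}%")
# Hypothetical scenario: density +10%, population +5%, income unchanged.
print(f"scenario: {congestion_change_pct({'density': 10.0, 'population': 5.0}):.2f}%")
```

For larger changes the exact power-law calculation diverges slightly from simply multiplying the elasticity by the percentage change, which is why the sketch compounds the ratios instead.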
To pursue the question of urban scaling of traffic congestion [18][19][20][21][22][27], the population sizes of the whole group of 164 cities were analyzed using threshold panel regression [54]. The results shown in Table 4 indicate that the optimal number of thresholds was just one, with the threshold value of ln14.675 or the population size of 2.138 million inhabitants. In other words, the aggregate group of 164 cities was divided into Region 1, with 46 cities of more than 2.138 million population, and Region 2, with 118 cities of less than 2.138 million population.
In Region 1, all three independent variables generated statistically significant coefficients, at less than the 1% level for the income and density coefficients and at less than the 5% level for the population coefficient. However, only the density and population coefficients in Region 2 met the statistical test of significance at less than the 1% level. Comparing the population coefficient of 0.149 for the 46 large-population cities with that of 0.209 for the 118 smaller-population cities supported a moderate scale advantage for the subgroup of large-population cities. To explain, a 1% increase in population size in the larger-population subgroup (Region 1) generated only a 0.149% increase in traffic congestion, while a 1% increase in population size in the smaller-population subgroup would generate a 0.209% increase, supporting the urban scale advantage in traffic congestion, as shown in Figure 10.
It is interesting to note that the density coefficient also displayed an urban scale advantage, in that a 1% increase in density generated a smaller increase in congestion, at 0.277%, in the large-population subgroup compared with 0.327% in the small-population subgroup. The same urban scaling advantage existed for income, in that a 1% increase in income generated a larger reduction in congestion, at 0.349%, in the large-population subgroup compared with 0.2% for the aggregate group of 164 cities. Finally, between density and population, density had a somewhat greater impact on traffic congestion in the large-population subgroup as well as in the small-population subgroup, supporting the earlier finding from the whole group of 164 cities.

Conclusions
The key findings of this research can be summarized as follows. First, the results of the multivariate regression on traffic congestion using the STIRPAT model for the aggregate group of 164 cities from 16 countries showed density to be the most influential factor, with a coefficient of 0.242 compared with the population coefficient of 0.155. On the other hand, the income coefficient of −0.20 indicated that congestion declined as income per capita in cities increased. All three coefficients met the statistical test of significance.
Second, the results from the same STIRPAT analysis for the disaggregate group of 66 low-density US-Canadian cities supported the results from the aggregate group with even higher density coefficient at 0.311 and a population coefficient at 0.207, both meeting the statistical test of significance. However, the income coefficient with a value of −0.140 failed to meet the statistical test of significance.
Third, the results for the other disaggregate group of 98 high-density cities, primarily from Europe, showed that population, with a coefficient of 0.155, was the only influential factor for traffic congestion. The density and income coefficients failed to meet the statistical test of significance. In addition, the density coefficient had a very small magnitude of 0.004.
Fourth, the use of threshold panel regression enabled the division of the whole group of 164 cities into two regional subgroups of 46 larger-population cities and the remaining 118 smaller-population cities. Comparing the population coefficients between the subgroup of 46 larger-population cities and the subgroup of 118 smaller-population cities, the same 1% increase in population resulted in 0.149% and 0.209% increases in traffic congestion, respectively. Put another way, the larger city would generate about 29% less additional traffic congestion from the same 1% increase in population size than the smaller city (1 − 0.149/0.209 ≈ 0.29).
Fifth, as for the difference of impact among the three variables, the large population subgroup of 46 cities displayed the order of impact from income at −0.349, followed by density at +0.227 and population at 0.149. The smaller population subgroup of 118 cities displayed the order of impact from density at +0.327 followed by population at +0.209.
In short, combining the STIRPAT model and threshold panel regression, we have analyzed traffic congestion in the aggregate group of 164 cities together with four disaggregate groups of 66 low-density cities, 98 high-density cities, 46 large-population cities, and 118 small-population cities. Among the three independent variables of density, population, and income, a statistically significant impact of income was limited to two cases: the aggregate group and the disaggregate group of large-population cities. Therefore, focusing on the relative impact of the remaining variables of density and population, population had the primary impact on congestion in only one case, that of the 98 high-density cities. In the remaining four cases, density had a greater impact than population on the level of traffic congestion.
These findings from the aggregate group indicate that traffic congestion in cities is influenced by varying density and population sizes. The findings from the disaggregate samples highlighted a clear difference in the critical role of density in low-density cities and the exclusive role of population size in high-density cities. Further threshold analysis by population-based subgroups in the aggregate group established the existence of both a population and density scale advantage to traffic congestion.
This study has several implications for researchers and city planners. For evaluation and planning of traffic congestion, individual cities may want to begin their analysis with an aggregate sample of cities and use all three determinants of density, population, and income since this research established statistically significant coefficients for all three factors. Next, if a disaggregate analysis between cities with low versus high density is possible, then a high-density city may wish to focus on analyzing a disaggregate group of high-density cities and use population and other determinants, because an additional increase of density in highly compact cities may have little impact on traffic congestion as suggested by Kenworthy and Laube [8] and many others. In contrast, the population scale advantage of congestion indicates that cities with a larger population may experience proportionately less congestion.
A low-density city, however, may wish to focus on a disaggregate group of low-density cities and use density as the primary determinant together with population and other variables. This research estimated that the size of the density coefficient was nearly 82 percent larger than the population coefficient from the 64 low-density cities (0.351/0.193). However, this study also indicates that using both density and population as determinants is preferred, as both variables met the statistical test of significance.
In summary, the use of aggregate analysis where high- and low-density cities are combined may generate both density and population coefficients, which may not always be appropriate for evaluating traffic congestion in individual cities with either low or high density. In other words, the results from an aggregate study with a large cross-country sample need to be verified with appropriately designed disaggregate studies before developing specific policy implications for an individual city or country. Furthermore, the empirical findings of a population scale advantage in traffic congestion provide valuable supporting evidence for earlier studies of urban scaling of traffic congestion [21,22,27].
This study has several limitations that provide opportunities for future studies. Due to the unavailability of relevant data, this study was unable to account for several other possible determining factors for traffic congestion. These other factors may include automobile density, percent of the commuting population, traffic control systems, traffic regulations, road conditions, weather, and many other factors that vary across cities in multiple countries. The use of the TomTom traffic index may be another limitation in that the index cannot be verified by other data sources.
Despite these limitations, this research has established the role of both population density and population size of cities as important determining factors in traffic congestion. Equally important, it is essential to use disaggregate samples based on density and/or population in future studies of traffic congestion.

Acknowledgments:
The authors are very grateful to an anonymous reviewer, who suggested many ideas to improve the quality of the paper. The competent and sincere help provided by research assistants Su Min Kim and Young Eun Kim at the Gachon Center for Convergence Research is also appreciated.

Conflicts of Interest:
The authors declare no conflict of interest.