Article

Fast Identification of Urban Sprawl Based on K-Means Clustering with Population Density and Local Spatial Entropy

1
Department of Urban Planning, School of Urban Design, Wuhan University, Wuhan 430072, China
2
Department of Graphics and Digital Technology, School of Urban Design, Wuhan University, Wuhan 430072, China
*
Author to whom correspondence should be addressed.
Sustainability 2018, 10(8), 2683; https://doi.org/10.3390/su10082683
Submission received: 28 June 2018 / Revised: 25 July 2018 / Accepted: 27 July 2018 / Published: 31 July 2018
(This article belongs to the Section Sustainable Urban and Rural Development)

Abstract

Because urban sprawl has been shown to jeopardize the sustainability of cities, identifying it is essential for urban studies. Previous studies tend to use increasingly complicated variables to recognize urban sprawl while still retaining an element of uncertainty; this paper instead proposes a simplified model for identifying urban sprawl patterns. The working theory is based on a diagrammatic interpretation of the classic urban spatial structure patterns of the Chicago School. The method is K-means clustering with gridded population density and local spatial entropy. The results, together with a comparison between open population data and mobile phone data, support the assumption and further indicate that the accuracy of the source population data limits the precision of the output identification. This article concludes that urban sprawl is mainly dominated by population and the unevenness of the surroundings. Moreover, the Floating Catchment Area (FCA) local spatial entropy method presented in this research integrates Shannon entropy, Tobler’s first law of geography and the Moore neighborhood, improving on the spatial homogeneity and locality of Batty’s Spatial Entropy model, which can only be used at a general scale.

1. Introduction

Urban spatial expansion is described as a process in which the city encroaches on surrounding open space and extends the territory of urban areas, owing to its growing population, increasing incomes and decreasing commuting costs [1]. As a disfavored type of expansion, urban sprawl refers to the excessive spatial growth of cities, which has negative consequences such as rapid loss of farmland, despoiled ecological systems and swelling traffic congestion in both the West and the East [2,3,4,5,6,7,8,9,10]. These consequences have led to a wide range of economic [2], social [3], health [4] and environmental problems [11], jeopardizing the sustainability of cities and becoming a major concern for urban studies. Urban sprawl has attracted considerable attention from various perspectives, such as its definition, measurement, causes, effects, simulation and solutions [12,13,14,15,16,17,18]; however, it still has no universal definition [19,20,21], which makes it very difficult to quantify and model [22].
Most methods for identifying urban sprawl can be categorized as relative or absolute [23]. Relative measures compare urban sprawl indicators across different times, usually with population density data or satellite maps [23,24,25]; they are often limited by data resolution and face problems in defining and recognizing urban built-up areas. Relative measurements allow analysts to conclude whether the study area is sprawling or not, while absolute assessments clearly distinguish compact cities from sprawled ones [26], using spatial metrics from landscape ecology and multiple statistical indices such as employment, resource consumption, population, living quality and architectural aesthetics. Similarly, the lack of a threshold value between harmful sprawl and benign expansion still challenges the outcomes of such methods. In contrast to the uncertainty of such a threshold, the entropy method gives a clear value for urban sprawl at a general scale [27,28,29,30,31], because entropy measures how unevenly particles are distributed in a system. Bhatta argues that the entropy method is more spatial, robust and static than others [20,23]. However, the entropy method cannot provide a spatial result that indicates which area is sprawling. This article therefore argues that combining the entropy method with urban sprawl indices is an ideal way of identifying urban sprawl.
As Ewing claims, people “know it when they see it” [32] or can “identify urban sprawl based on their judgment” [33]. A simple and universally accepted sprawl indicator therefore deserves exploration, one that could be implemented for a wide range of locations and explained straightforwardly to decision makers [34]. Although the driving forces underlying urban sprawl differ throughout the world [35,36], there is a well-accepted common understanding of the spatial features of urban sprawl: leapfrog development, low density and unlimited outward expansion [8]. Phelps and Silva indicate that urban interstices at four geographic scales influence the measurement of urban sprawl, and that the so-called empty lands are not inert at all [37]. Desalvo and Qing argue that the undeveloped land around an average dwelling could be the main determinant of sprawl, based on tests of the traditional monocentric model in which population density plays a dominant role [38].
Based on a reinterpretation of the urban spatial structure models of the Chicago School with a gridded population density map, this paper concludes that there are five combination patterns of spatial population density in Moore neighborhood units, which help to identify whether the central unit is in a sprawling area. This paper therefore presents a simplified method for measuring urban sprawl based on K-means clustering, using only the population density of gridded units and their local spatial entropy, which measures how unevenly population density is distributed among the surrounding units. In a case study of Wuhan, China, the result clearly shows the urban expansion pattern of Wuhan and identifies the sprawling area.
The paper is organized as follows. This first section reviews research on methods of measuring urban sprawl. Section 2 provides the methodology for identifying urban expansion with the population density map and FCA local spatial entropy, followed by the introduction of the case study area and source data in Section 3. Section 4 presents the findings of the urban sprawl analysis, and Section 5 discusses them further. The final section summarizes the main findings and reflects on the methodology.

2. Methods

2.1. Urban Expansion Diagrams of Gridded Population

Batty indicated that urban modeling and planning methods have evolved with the development of cities [39], as the monocentric city grew into a multicentric city connected with regional cities, weaving a more complicated system within global socioeconomic networks. Classical urban spatial structure diagrams were first revealed by the Thünen ring model in Isolated State theory and then by Ernest Burgess’s Concentric Zone Model [40]. More importantly, Alonso’s Bid Rent curve theory explained the monocentric city model through the distance decay of population density away from the city center [41]. As the city keeps expanding outward, Hoyt’s Sector Model [42] and Harris and Ullman’s Multiple Nuclei Model [43] indicate that the transportation system and emerging subcenters generate sector and multi-nuclei patterns, which still locally obey the distance decay of population density near the expanding core or corridor. According to the common understanding of urban sprawl, a sprawling unit is leapfrog and discontinuous from the main city, with a population density lower than that of the central area but still higher than that of its surroundings. Thus, based on the different patterns of population density distribution, the three classical urban spatial structure models of the Chicago School, plus urban sprawl as a fourth pattern, can be summarized in one diagram (Figure 1).
In the gridded population density diagram of Figure 1, it is obvious that population density plays the key role in urban expansion analysis, which is consistent with what Ewing and Angel claimed [32,33]. From a Moore neighborhood perspective, this study focuses on the land unit and its neighbor units (NU), which can be categorized into five population density distribution patterns: central area (CA), edge area (EA), rural area (RA), sprawl area (SA) and inner ecological area (IA) (Figure 2). Rather than using the population density of the land unit as a single variable, as much related research does, this categorization takes the surrounding conditions into account.
CA units, located in the main urban area, have a relatively high population density, as do their surrounding units, showing high evenness in the local neighborhood. Similarly, RA units show the same even state but with a lower population density. As SA is defined as a unit discontinuous from the main urban area, it has a higher population density than its neighbor units, showing obvious unevenness in its local area. EA and IA show an intermediate state between the unevenness of SA and the evenness of CA/RA, and the EA units can be seen as the natural expansion of the urban core. This article proposes to categorize all units by their population density and by the evenness of their local situation, which includes the unit and its surrounding neighbor units. If grouped into three, the urban area, rural area and expanding area can be recognized, where the expanding area contains the edge area and the sprawl area. If grouped into five, all the above unit types should be distinguished (Table 1).
As the population is already available in the gridded population map, this experiment only needs to calculate the local evenness and define the grouping method, for which this study uses FCA local spatial entropy and K-means, respectively.

2.2. FCA Local Spatial Entropy

The concept of entropy was first defined in thermodynamics by Clausius, describing the energy that is unable to do work during energy transfer. Boltzmann then used entropy as a measure of disorder in statistical thermodynamics; the concept was later introduced into information theory and defined as uncertainty by Shannon [30]. By definition, Shannon entropy measures the evenness of a probability distribution. For a probability set $K = (P_1, P_2, \ldots, P_k)$, wherein $\sum_i P_i = 1$, the evenness of K can be calculated as:
$$H(k) = -\sum_{i=1}^{k} P_i \log P_i \tag{1}$$
H(k) ranges from 0 to $\log k$: when one of the $P_i$ equals 1, H(k) = 0; when all the $P_i$ equal 1/k, H(k) reaches its maximum value of $\log k$. In other words, H(k) depends on the number of elements k, so when comparing entropy values among data sets of different sizes, the relative entropy is used instead, written as:
$$R(k) = \frac{-\sum_{i=1}^{k} P_i \log P_i}{\log k} \tag{2}$$
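As a hedged illustration of the two formulas above, the following minimal Python sketch computes the Shannon entropy and the relative entropy of a set of population counts. The function names, the use of natural logarithms, the treatment of an all-zero set as entropy 0, and the sample values are assumptions made here for illustration; they are not taken from the study.

```python
import math


def shannon_entropy(values):
    """Shannon entropy H = -sum(p_i * ln p_i) of non-negative counts."""
    total = sum(values)
    if total <= 0:
        # Assumption: a set with no population is given entropy 0.
        return 0.0
    # Zero counts are skipped, following the convention 0 * log 0 = 0.
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log(p) for p in probs)


def relative_entropy(values):
    """Entropy divided by its maximum ln k, so results lie in [0, 1]."""
    k = len(values)
    if k < 2:
        return 0.0  # ln 1 = 0, so the ratio is undefined for a single value
    return shannon_entropy(values) / math.log(k)


# Illustrative (made-up) population counts for nine units.
even = [100] * 9            # perfectly even distribution
peaked = [800] + [25] * 8   # one dominant unit among sparse neighbors

h_even = shannon_entropy(even)
r_even = relative_entropy(even)
r_peaked = relative_entropy(peaked)
```

In this sketch the even set reaches the maximum relative entropy, while the peaked set, which resembles a populated unit among sparse neighbors, scores lower.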
To apply the entropy method to land units and calculate local evenness, this study uses a method similar to the floating catchment area (FCA) method: it searches the surrounding area within a threshold distance of land unit m and calculates the Shannon entropy of the units inside as the evenness value of the central unit m (Figure 3).
Although Batty introduced entropy theory into space-related studies and defined it as Spatial Entropy [27], spatial entropy can only compute the overall evenness of a population distribution and has seldom been used locally. Tobler’s First Law of Geography points out that the nearer land units are to each other, the more related their attributes are. Because the FCA method with Shannon entropy applies the spatial correlation rule of Tobler’s law and can measure uncertainty locally, this study defines this method as Local Spatial Entropy.
The local spatial entropy of gridded units can be calculated in ArcGIS in three steps: (1) use FEATURE TO POINT to generate a centroid for every unit; (2) use POINT DISTANCE with a threshold search radius to generate a local matrix for each centroid, ensuring that only the surrounding 8 centroids are included; (3) calculate the local spatial entropy of the central point from the Shannon entropy (Equations (1) and (2)) using the population of all 9 centroids.
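The three steps above can be sketched outside ArcGIS as a moving 3 × 3 window over a regular grid. The Python below is only an illustration under stated assumptions: the grid is a plain list of rows, border cells simply use the neighbors that exist, windows with no population are assigned 0, and the function name and sample grids are hypothetical. It is not the ArcGIS workflow used in the study.

```python
import math


def local_relative_entropy(grid):
    """Relative Shannon entropy of each cell's Moore neighborhood.

    grid: list of equal-length rows of non-negative population counts.
    Each window holds the cell and its existing neighbors (9 cells in the
    interior, 6 on an edge, 4 in a corner of a rectangular grid).
    """
    rows, cols = len(grid), len(grid[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            total = sum(window)
            k = len(window)
            if total <= 0 or k < 2:
                continue  # assumption: empty or single-cell windows stay 0
            entropy = -sum(
                (v / total) * math.log(v / total) for v in window if v > 0
            )
            result[r][c] = entropy / math.log(k)  # normalize by ln k
    return result


# Made-up grids: an even area and one populated cell among empty cells.
uniform = [[50] * 4 for _ in range(4)]
isolated = [[0, 0, 0], [0, 900, 0], [0, 0, 0]]

uniform_entropy = local_relative_entropy(uniform)
isolated_entropy = local_relative_entropy(isolated)
```

An even neighborhood gives a value of 1 for every cell, including border cells, whereas the isolated populated cell gives a value of 0 at the center, the most uneven case.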

2.3. K-Means Clustering

Since urban sprawl is to be identified by clustering on population density and local spatial entropy, a handy tool suitable for area classification is preferred. Although many statistical packages exist, the commonly used K-means clustering was chosen according to Vickers’s comparison of cluster analysis methods [44]. K-means clustering partitions units so that the differences among the units within a group, summed over all groups, are minimized [45]. Furthermore, the K-means algorithm suits situations in which the number of clusters has already been designated [46], as here, where the previous assumption divides the cells into 3 or 5 groups. In addition, the K-means algorithm is distance-based: it takes distance as the measure of similarity, so the closer two objects are, the more similar they are. The algorithm considers a cluster to be composed of objects that are close together, so compact and independent clusters are the ultimate target. When the data set X is divided into k groups using Euclidean distance, the objective of the clustering algorithm is to minimize the Sum of the Squared Error (SSE):
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} (x - C_i)^2 \tag{3}$$
where k is the number of clusters and x is a value in the i-th cluster, whose mean is $C_i$.
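To make the SSE objective concrete, here is a minimal pure-Python sketch of the standard K-means iteration (Lloyd's algorithm) on two-variable units. It is not the ArcGIS implementation: the deterministic choice of starting centers, the function names, and the sample points (made-up values assumed to be already rescaled to a comparable range) are all illustrative assumptions.

```python
import math


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, iterations=100):
    """Basic Lloyd iteration; start centers are evenly spaced input points."""
    centers = [tuple(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return assign(points, centers), centers


def sse(points, labels, centers):
    """Sum of squared Euclidean distances from points to their centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Made-up (population density, local entropy) pairs, both scaled to 0..1.
points = [
    (0.90, 0.95), (0.85, 0.90), (0.95, 0.92),  # dense and even
    (0.05, 0.96), (0.02, 0.93), (0.04, 0.98),  # sparse and even
    (0.40, 0.50), (0.45, 0.55), (0.35, 0.45),  # intermediate and uneven
]
labels, centers = kmeans(points, k=3)
total_sse = sse(points, labels, centers)
```

Because K-means is distance-based, variables on very different scales (people per square kilometer versus an entropy between 0 and 1) would normally need rescaling before clustering; the sample points above assume this has been done.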
Admittedly, if the number of clusters is unknown, the uncertainty in defining it is the main challenge for K-means clustering. Normally, the higher the Pseudo F-statistic, the better the number of clusters; however, the number at which the F-statistic reaches its maximum can be a local optimum rather than a global one. That is to say, as k increases, the sample partitioning becomes more refined and the SSE becomes smaller. When k is smaller than the ideal number of clusters, the SSE decreases sharply as k grows; when k is larger than that, the SSE decreases slowly. This pattern forms an elbow shape in the plot of SSE against k, or of F-statistic against k (Figure 4), and choosing k from it is called the Elbow method [47]. When using the F-statistic, it is therefore better to select the k value at the elbow point, or slightly larger, rather than at the maximum.
K-means clustering can be performed in ArcGIS with a Pseudo F-statistic report: the Group Analysis tool in the toolbox has a built-in K-means module, used when NO SPATIAL CONSTRAINT is set for the Spatial Constraints parameter, which makes the process more convenient.
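The elbow can be read from a plot, but a simple numeric heuristic shows the idea. The sketch below picks the k at which the drop in SSE slows down the most (the largest second difference). This heuristic, the function name, and the SSE values are assumptions made for illustration; the study itself reads the elbow from the charts.

```python
def elbow_k(sse_by_k):
    """Return the k with the largest bend in the SSE curve.

    sse_by_k[i] is the SSE obtained with k = i + 1 clusters. The bend at k
    is (SSE[k-1] - SSE[k]) - (SSE[k] - SSE[k+1]): a large value means the
    SSE fell sharply up to k and only slowly afterwards.
    """
    if len(sse_by_k) < 3:
        raise ValueError("need SSE for at least three consecutive k values")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(sse_by_k) - 1):
        bend = (sse_by_k[i - 1] - sse_by_k[i]) - (sse_by_k[i] - sse_by_k[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up SSE values for k = 1..6: steep drops up to k = 3, then flat.
illustrative_sse = [100.0, 60.0, 20.0, 18.0, 17.0, 16.0]
chosen_k = elbow_k(illustrative_sse)
```

With these illustrative values the heuristic selects k = 3, the point after which adding clusters barely reduces the SSE.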
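For readers unfamiliar with the Pseudo F-statistic, one common definition is the Calinski-Harabasz ratio of between-group to within-group variation. The form below is offered as an assumption about what such a report computes, not as a statement of the ArcGIS implementation; n (the number of units) and SST (the total sum of squared deviations from the overall mean) are symbols introduced here, while k and SSE are as defined for K-means above.

```latex
R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}}, \qquad
F_{\text{pseudo}} = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}
```

Under this definition, a larger value indicates groups that are internally compact and well separated from each other, which is why it is read together with the elbow of the SSE curve.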

3. Data

3.1. Study Area

The study area is the city of Wuhan, located in central China (Figure 5). As the capital of Hubei province and one of the nine National Central Cities of China, Wuhan is the most populous city in Central China. The city has abundant mountain and water resources and is divided into the “Three Towns” of Wuchang, Hankow and Hanyang by the Yangtze River and the Han River. Constrained by these complicated geographic conditions, Wuhan keeps expanding along the Yangtze River and the main inter-provincial roads; the mountain area in the north and the lake area in the south strongly affect the expansion. As in other Chinese cities [35,36], the urban expansion of Wuhan has been largely driven by the government, creating more enclaves outside the urban core.

3.2. Open Gridded Population Data of GeodataCn

The gridded population dataset for 2010 was obtained from the National Earth System Science Data Sharing Infrastructure of China (http://www.geodata.cn). It covers mainland China at a spatial resolution of 1 km and was produced with a land use regression model based on county-level census data and national land use data at a scale of 1:100,000. The land use data contain urban and rural built-up areas, grassland, forest land, etc.; built-up area plays an important role in disaggregating the population, so GeodataCn represents a static population density that combines work time and non-work time. GeodataCn has been tested as having an estimation accuracy above 50% at the national scale [48], which may result in uncertainty at the city level.

3.3. Gridded Population Generated by Mobile Phone Data

To further verify the assumption, this study used a gridded population dataset generated from mobile phone data. The phone call records were provided by a partner telecommunication operator with a market share of about 60%, and have been verified as proportionally representing the whole population distribution in Wuhan [49]. Mobile phone data from 7,300,000 users in Wuhan in November 2015 were used. The data were pre-processed to eliminate all privacy-related information. With a similar land use regression model and the two-step floating catchment area method (2SFCA), the work-time population was disaggregated to gridded land units, and the accuracy of the gridded population was verified with an adjusted R² of about 80% [50]. The 2015 mobile phone data provide a verification of the 2010 urban sprawl pattern and a comparison of data sources with different accuracy (Figure 6). Theoretically, the work-time population is better suited to identifying the structure of urban expansion; however, residential areas with few people in the daytime may influence the identification of urban sprawl.

4. Results

The population density map of Wuhan was extracted from the GeodataCn gridded population map and symbolized; the white areas inside the map are units with no population, most of which are rivers, lakes and mountains. Local evenness of the grid units was then mapped by calculating the local spatial entropy based on population. However, the units along the border of Wuhan showed obviously lower values than nearby units, because border units have fewer neighbor units, which makes k and log k smaller. The relative entropy was therefore calculated to remove the influence of k, the number of nearby units. The resulting maps show that the relative entropy gives a more continuous distribution of evenness values than the original Shannon entropy and, in particular, that the border effect is eliminated (Figure 7). Applying Group Analysis in ArcGIS to the population and local spatial entropy values, the original population density map was categorized into three groups and five groups (Figure 8), with summary reports (Table 2 and Table 3).
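The border effect can be made concrete with a short calculation. Assuming natural logarithms and a rectangular grid (both are assumptions for illustration; the boundary of Wuhan is irregular), an interior unit has a 9-cell window, an edge unit 6 cells and a corner unit 4 cells, so even a perfectly even neighborhood yields different maximum entropies:

```latex
H_{\max}(9) = \ln 9 \approx 2.197, \qquad
H_{\max}(6) = \ln 6 \approx 1.792, \qquad
H_{\max}(4) = \ln 4 \approx 1.386
```

Dividing each value by the logarithm of its own window size, as the relative entropy does, maps all three cases to 1, so an even border neighborhood is no longer scored lower than an even interior one.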
Table 2 shows that in the 3-group clustering, the green units in the left picture of Figure 8 share a relatively high mean population density of 14,278 people/km² and a high relative entropy of 0.94, reflecting the features of the Central Area. The gray units, with a lower mean population density of 449 people/km² and a relative entropy of 0.96, reflect the features of the Rural Area. The red units are then the expanding area, and the urban sprawl area can readily be distinguished from the edge area by whether a unit is contiguous with the central area, which is in line with Ewing’s claim of “knowing it by seeing it.”
Table 3 shows that in the 5-group clustering, the green and yellow units in the right picture of Figure 8 reflect the features of the central area, with high population and entropy values, and can be recognized as urban area and suburban area. The other groups have lower population but different entropy values: the red units reflect the expanding pattern of the edge area, the gray units reflect the pattern of the rural area, and the purple units can be taken to be in the sprawling area.
Although Figure 9 indicates that the Pseudo F-statistic reaches its maximum at a group number of 13, the 3-group and 5-group clusterings are preferred according to the Elbow method, because 3 is the ideal k value at the elbow point. Meanwhile, the 5-group clustering also shows a relatively high F-statistic value and R², with overlapping median and mean values. In conclusion, the 3-group clustering can indicate the expanding area, while the 5-group clustering, with higher R² values of 0.94 and 0.84, can further point out the units in the urban sprawl area.
Comparing the grouping result map of Figure 8 with the right image of Figure 6, in other words the expanding trend recognized in 2010 with the existing population distribution in 2015, shows that the expanding area near the main core has turned into central area, while the areas located far from the center remain undeveloped. The comparison confirms that the methodology can be used to identify the pattern of urban sprawl.
This study applied the same method to the 2015 gridded population of Wuhan generated from mobile phone data, clustering the land units into 3 and 5 groups (Figure 10) with output summary tables (Table 4 and Table 5). Table 4 and Table 5 show the same pattern as Table 2 and Table 3: group 1 is located in the rural area, and groups 4 and 5 represent the central area. The red units in Group 3 indicate the expanding area, while in the 5-group clustering the red and purple units show the expanding pattern. Interestingly, Figure 11 indicates that the ideal group number is 3, the same as the previous result for the GeodataCn population.
To further illustrate the differences and similarities between the two datasets, a four-quadrant diagram was drawn with the mean population and local spatial entropy of all groups (Figure 12). As the number of clusters increases from 3 to 5, both datasets show three features: relatively stable rural units with high entropy values, subdivision of the urban units, and subdivision of the expanding units. The suburban units, separated out from the urban units, have a relatively large mean population. Meanwhile, the expanding units are divided into two groups, one with lower local spatial entropy and larger population, the other with higher entropy and smaller population; both could be identified as sprawling units under different definitions of urban sprawl. In this article, the scattered units are defined as sprawling ones; thus both the east and west sides of Wuhan show an obvious sprawling trend, with most sprawling units located along the main roads. By contrast, the north side of Wuhan shows a classic Hoyt sector pattern, a strip or belt expansion.
A detailed expansion map of Wuhan provided by the local planning bureau, with built-up area information for 2010 and 2015 (Figure 13), was used to further test the result of the proposed method. Both most of the strip-growth pattern in the north of Wuhan and the expansion areas on the east and south sides overlap with it, which shows the applicability of the proposed fast identification method. Moreover, the scattered sprawling units detected in this article give more information about the urban sprawl pattern of Wuhan. Generally, Wuhan is expanding along the Yangtze River, while development to the south and north is restrained by ecological barriers of mountains and lakes. As the east side of Wuhan has been planned as a new city with a high-tech industrial park, the sprawl there can be seen as the result of a top-down driving force. However, it is noteworthy that a massive sprawling area also emerges in the west, which lies in the plains near Hanyang, home to a well-known automobile industrial park.

5. Discussion

Our research suggests that gridded population can be the only source data needed to identify the pattern of urban sprawl, using K-means clustering based on population and local spatial entropy. These findings are understandable because grid units in different locations have corresponding levels of population and of unevenness in their surroundings; urban sprawl areas show a relatively high population and low local spatial entropy, different from other units, which makes them easy to identify.
The ideal cluster number of 3 at the Pseudo F-statistic elbow point for both the GeodataCn and the mobile phone data further verifies that K-means clustering is suitable for distinguishing the expanding land units from the urban and rural areas. On the other hand, the coincidence that the Pseudo F-statistic reaches its maximum value at the elbow point in Figure 9 indicates that the higher resolution population data provides stronger support for the method.
These results agree with the statements of Ewing [32] and Angel [33] that an urban sprawl pattern can be recognized by seeing it. Moreover, the two variables of population and local spatial entropy support what Desalvo and Qing [38] and Burchfield [51] have pointed out, namely that the determinants of urban sprawl are population and the amount of undeveloped land around an average dwelling. However, the procedure and data used in this research are simpler and more straightforward than those of previous studies. This approach corresponds to the classical urban spatial structure theory of the Chicago School and is highly consistent with the Alonso-Muth-Mills work, which explained the importance of population density in identifying urban structures [52]. On the other hand, the interstices among the scattered sprawling units confirm the pending status of undeveloped land in urban expansion [37], which should be further explored.
As the comparison of the outputs from open population data and mobile phone data shows, the findings of this study are limited by the resolution and accuracy of the data source. Moreover, the FCA local spatial entropy method applied in this research only considers the surrounding 8 units; further study is needed on which threshold distance should be selected.

6. Conclusions

This study proposes identifying urban sprawl by K-means clustering using only gridded population and its local spatial entropy, with the Elbow method used to verify the group number. The experimental results support this assumption and further indicate that the role of population density deserves more exploration. The other important contribution of this study is the application of FCA local spatial entropy, which combines Shannon entropy, Tobler’s first law of geography and the Moore neighborhood, focusing on the measurement of local spatial unevenness.
The present study focuses on measurement based on human activities rather than on the physical environment emphasized by most remote sensing methods, which may cause it to fail in identifying the typical sprawling pattern of ghost towns [53], where few people live in newly developed cities. Another limitation is that the accuracy of the population data may influence the discrimination of urban sprawl units. However, with the rapid development of the urban data environment, population data will become more precise and closer to real time, so the fast identification method for urban sprawl could help governments and planners make quicker and more flexible decisions. In future studies, comparisons among cities, or among sprawling areas within the same city-region, could be conducted to explore the driving forces underlying urban sprawl.

Author Contributions

L.L. and H.W. conceived and designed the experiments; L.L. performed the experiments; Z.P. and J.Z. acquired and analyzed the data; Y.Y. and H.J. contributed reagents/materials/analysis tools; H.W. and L.L. wrote the paper.

Funding

The research is funded by China Postdoctoral Science Foundation (No. 2016M600609); National Science Fund for Young Scholars (No. 51708425); and China Postdoctoral Science Foundation (No. 2016M602357).

Acknowledgments

The open population data is supported by “National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://www.geodata.cn)”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Brueckner, J.K. Urban Sprawl: Diagnosis and Remedies. Int. Reg. Sci. Rev. 2000, 23, 160–171.
  2. Vietz, J.G.; Rutherfurd, I.D.; Walsh, C.J.; Chee, Y.E.; Hatt, B.E. The Unaccounted Costs of Conventional Urban Development: Protecting Stream Systems in an Age of Urban Sprawl. In Proceedings of the Australian Stream Management Conference, Townsville, QLD, Australia, 31 July 2014.
  3. Heckman, C.J. Public Parks and Shady Areas in Times of Climate Change, Urban Sprawl, and Obesity. Am. J. Public Health 2017, 107, 1856–1858.
  4. Frumkin, H. Urban Sprawl and Public Health. Public Health Rep. 2002, 117, 201.
  5. Wu, F.; Xu, J.; Yeh, A.G. Urban Development in Post-Reform China: State, Market, and Space; Routledge: Abingdon, UK, 2006.
  6. Christiansen, P.; Loftsgarden, T. Drivers Behind Urban Sprawl in Europe. TØI Rep. 2011, 1136, 2011.
  7. Sturm, R.; Cohen, D.A. Suburban Sprawl and Physical and Mental Health. Public Health 2004, 118, 488–496.
  8. Burchell, W.R.; Mukherji, S. Conventional Development Versus Managed Growth: The Costs of Sprawl. Am. J. Public Health 2003, 93, 1534–1540.
  9. Alberti, M. The Effects of Urban Patterns on Ecosystem Function. Int. Reg. Sci. Rev. 2005, 28, 168–192.
  10. Zhang, Y.; Yang, Z.; Li, W. Analyses of Urban Ecosystem Based on Information Entropy. Ecol. Model. 2006, 197, 1–12.
  11. Wang, H.; Ning, X.; Zhu, W.; Li, F. Comprehensive Evaluation of Urban Sprawl on Ecological Environment Using Multi-Source Data: A Case Study of Beijing. ISPRS—International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2015, XLI-B8, 1073-77. Available online: https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLI-B8/1073/2016/isprs-archives-XLI-B8-1073-2016.pdf (accessed on 29 July 2018).
  12. Encarnação, S.; Gaudiano, M.; Santos, F.C.; Tenedório, J.A.; Pacheco, J.M. Urban Dynamics, Fractals and Generalized Entropy. Entropy 2013, 15, 2679–2697.
  13. Sullivan, C.W.; Lovell, S.T. Improving the Visual Quality of Commercial Development at the Rural–Urban Fringe. Land. Urban Plan. 2006, 77, 152–166.
  14. Burchell, W.R.; Shad, N.A.; Listokin, D.; Phillips, H.; Downs, A.; Seskin, S.; Davis, J.S.; Moore, T.; Helton, D.; Gall, M. The Costs of Sprawl-Revisited; Transportation Research Board: Washington, DC, USA, 1998.
  15. Ewing, R.H. Characteristics, Causes, and Effects of Sprawl: A Literature Review. In Urban Ecology; Springer: Boston, MA, USA, 2008; pp. 519–535.
  16. Frenkel, A.; Ashkenazi, M. The Integrated Sprawl Index: Measuring the Urban Landscape in Israel. Ann. Reg. Sci. 2008, 42, 99–121.
  17. Knaap, G.; Talen, E.; Olshansky, R.; Forrest, C. Government Policy and Urban Sprawl; Illinois Department of Natural Resources, Office of Realty and Environmental Planning: Springfield, IL, USA, 2000.
  18. Tsai, Y.-H. Quantifying Urban Form: Compactness Versus ‘Sprawl’. Urban Stud. 2005, 42, 141–161.
  19. Yue, W.; Zhang, L.; Liu, Y. Measuring Sprawl in Large Chinese Cities Along the Yangtze River Via Combined Single and Multidimensional Metrics. Habitat Int. 2016, 57, 43–52.
  20. Bhatta, B. Analysis of Urban Growth and Sprawl from Remote Sensing Data, Advances in Geographic Information Science; Springer: Berlin/Heidelberg, Germany, 2010.
  21. Galster, G.; Hanson, R.; Ratcliffe, M.R.; Wolman, H.; Coleman, S.; Freihage, J. Wrestling Sprawl to the Ground: Defining and Measuring an Elusive Concept. Hous. Policy Debate 2001, 12, 681–717.
  22. Wilson, H.E.; Hurd, J.D.; Civco, D.L.; Prisloe, M.P.; Arnold, C. Development of a Geospatial Model to Quantify, Describe and Map Urban Growth. Remote Sens. Environ. 2003, 86, 275–285.
  23. Bhatta, B.; Saraswati, S.; Bandyopadhyay, D. Quantifying the Degree-of-Freedom, Degree-of-Sprawl, and Degree-of-Goodness of Urban Growth from Remote Sensing Data. Appl. Geogr. 2010, 30, 96–111.
  24. Singh, B. Urban Growth Using Shannon’s Entropy: A Case Study of Rohtak City. Int. J. Adv. Remote Sens. Gis 2014, 3, 544–552.
  25. Torrens, P.M. A Toolkit for Measuring Sprawl. Appl. Spat. Anal. Policy 2008, 1, 5–36.
  26. Al-Sharif, A.A.A.; Pradhan, B.; Abdullahi, S. Urban Sprawl Assessment; Springer International Publishing: Cham, Switzerland, 2017.
  27. Batty, M. Spatial Entropy. Geogr. Anal. 1974, 6, 1–31.
  28. Batty, M. Entropy in Spatial Aggregation. Geogr. Anal. 1976, 8, 1–21.
  29. Batty, M.; Morphet, R.; Masucci, P.; Stanilov, K. Entropy, Complexity, and Spatial Information. J. Geogr. Syst. 2014, 16, 363–385.
  30. Cabral, P.; Augusto, G.; Tewolde, M.; Araya, Y. Entropy in Urban Systems. Entropy 2013, 15, 5223–5236.
  31. Yeh, A.G.O. Measurement and Monitoring of Urban Sprawl in a Rapidly Growing Region Using Entropy. Photogramm. Eng. Remote Sens. 2001, 67, 83–90.
  32. Ewing, R.H. Characteristics, Causes, and Effects of Sprawl: A Literature Review. In Urban Ecology: An International Perspective on the Interaction between Humans and Nature; Marzluff, J.M., Shulenberger, E., Endlicher, W., Alberti, M., Bradley, G., Ryan, C., ZumBrunnen, C., Simon, U., Eds.; Springer: Boston, MA, USA, 2008; pp. 519–535.
  33. Angel, S.; Parent, J.; Civco, D. Urban Sprawl Metrics: An Analysis of Global Urban Expansion Using Gis. In Proceedings of the ASPRS 2007 Annual Conference, Tampa, FL, USA, 7–11 May 2007. [Google Scholar]
  34. Aurambout, P.J.; Barranco, R.; Lavalle, C. Towards a Simpler Characterization of Urban Sprawl across Urban Areas in Europe. Land 2018, 7, 33. [Google Scholar] [CrossRef]
  35. Liu, Y.; Fan, P.; Yue, W.; Song, Y. Impacts of Land Finance on Urban Sprawl in China: The Case of Chongqing. Land Use Policy 2018, 72, 420–432. [Google Scholar] [CrossRef]
  36. Tian, L.; Li, Y.; Yan, Y.; Wang, B. Measuring Urban Sprawl and Exploring the Role Planning Plays: A Shanghai Case Study. Land Use Policy 2017, 67, 426–435. [Google Scholar] [CrossRef]
  37. Phelps, A.N.; Silva, C. Mind the Gaps! A Research Agenda for Urban Interstices. Urban Stud. 2018, 55. [Google Scholar] [CrossRef]
  38. Desalvo, S.J.; Su, Q. The Determinants of Urban Sprawl: Theory and Estimation. Int. J. Urban Sci. 2018, 1–17. [Google Scholar] [CrossRef]
  39. Batty, M. Fifty Years of Urban Modeling: Macro-Statics to Micro-Dynamics; Physica-Verlag HD: Heidelberg, Germany, 2008. [Google Scholar]
  40. Burgess, E.W. The Growth of the City: An Introduction to a Research Project. City 2008, 18, 71–78. [Google Scholar]
  41. Alonso, W. Location and Land Use; Harvard University Press: Cambridge, MA, USA, 1964. [Google Scholar]
  42. United States Federal Housing Administration; Hoyt, H. The Structure and Growth of Residential Neighborhoods in American Cities. Development 1941, 19, 453–454. [Google Scholar]
  43. Harris, D.C.; Ullman, E.L. The Nature of Cities. Ann. Am. Acad. Political Soc. Sci. 1945, 242, 7–17. [Google Scholar] [CrossRef]
  44. Dan, V.; Rees, P. Creating the Uk National Statistics 2001 Output Area Classification. J. R. Stat. Soc. 2007, 170, 379–403. [Google Scholar]
  45. Harris, R.; Sleight, P.; Webber, R. Geodemographics, Gis and Neighbourhood Targeting. J. Direct Data Digit. Mark. Pract. 2007, 8, 364–368. [Google Scholar]
  46. Everitt, S.B.; Dunn, G.; Everitt, B.S.; Dunn, G. Cluster Analysis; Wiley: New York, NY, USA, 2011. [Google Scholar]
  47. Ketchen, J.D.; Shook, C.L. The Application of Cluster Analysis in Strategic Management Research: An Analysis and Critique. Strateg. Manag. J. 1996, 17, 441–458. [Google Scholar] [CrossRef]
  48. Bai, Z.; Wang, J.; Wang, M.; Gao, M.; Sun, J. Accuracy Assessment of Multi-Source Gridded Population Distribution Datasets in China. Sustainability 2018, 10, 1363. [Google Scholar] [CrossRef]
  49. Wu, H.; Liu, L.; Yu, Y.; Peng, Z. Evaluation and Planning of Urban Green Space Distribution Based on Mobile Phone Data and Two-Step Floating Catchment Area Method. Sustainability 2018, 10, 214. [Google Scholar] [CrossRef]
  50. Liu, L.; Peng, Z.; Wu, H.; Jiao, H.; Yu, Y. Exploring Urban Spatial Feature with Dasymetric Mapping Based on Mobile Phone Data and Lur-2sfcae Method. Sustainability 2018, 10, 2432. [Google Scholar] [CrossRef]
  51. Burchfield, M.; Overman, H.G.; Puga, D.; Turner, M.A. Causes of Sprawl: A Portrait from Space. Q. J. Econ. 2006, 121, 587–633. [Google Scholar] [CrossRef]
  52. Takahashi, T. Location Competition in an Alonso–Mills–Muth City. Reg. Sci. Urban Econ. 2014, 48, 82–93. [Google Scholar] [CrossRef]
  53. Ge, W.; Yang, H.; Zhu, X.; Ma, M.; Yang, Y. Ghost City Extraction and Rate Estimation in China Based on Npp-Viirs Night-Time Light Data. ISPRS Int. J. Geo-Inf. 2018, 7, 219. [Google Scholar] [CrossRef]
Figure 1. Urban expansion diagrams based on gridded population distribution.
Figure 2. Population Density Distribution Pattern of Land Unit and Neighbor Units.
Figure 3. Batty’s Spatial Entropy and FCA local spatial Entropy.
Figure 4. Elbow Method for determining the number of clusters.
Figure 5. Location of Wuhan, Hubei province in China.
Figure 6. Gridded Population Density Map.
Figure 7. Calculation of Local Spatial Entropy.
Figure 8. K-means Clustering in Group 3 and Group 5 of Population 2010.
Figure 9. Pseudo F-Statistic Plot of Different Groups.
Figure 10. K-means Clustering in Group 3 and Group 5 of population 2015.
Figure 11. Pseudo F-Statistic Plot of Different Groups.
Figure 12. Mean value of Population and Local Spatial Entropy.
Figure 13. Comparison with Built-up area changes.
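The pseudo F-statistic plotted in Figures 9 and 11 can be illustrated with a minimal sketch. It assumes the common definition as the ratio of between-group to within-group variance, each divided by its degrees of freedom; this section does not spell the formula out, so that reading is an assumption. The `kmeans` helper, its farthest-point start, and the toy points are hypothetical and chosen only so that the run is reproducible; they are not the authors' implementation or data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, k, iters=100):
    """Plain K-means with a deterministic farthest-point start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return labels

def pseudo_f(points, labels):
    """(between-group SS / (k - 1)) / (within-group SS / (n - k))."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    grand = mean_point(points)
    between = sum(len(g) * sq_dist(mean_point(g), grand) for g in groups.values())
    within = sum(sq_dist(p, mean_point(g)) for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))

# Toy data: three tight, well-separated blobs of four points each.
toy = [(x + dx, y + dy)
       for x, y in [(0, 0), (10, 0), (0, 10)]
       for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]

# Score each candidate number of groups and keep the one with the largest F.
scores = {k: pseudo_f(toy, kmeans(toy, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
```

On this toy input the statistic is largest for three groups, which is the kind of peak such a plot is read for.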
Table 1. Groups with Population Density and Local Evenness.

| Location | Population Density: Land unit | Population Density: Surrounding units | Local Evenness | Group 3 | Group 5 |
| --- | --- | --- | --- | --- | --- |
| CA | High/Middle | High/Middle | High | 1 | 1 |
| EA | Middle | Middle/Low | Middle | 3 | 2 |
| RA | Low | Low | High | 2 | 3 |
| SA | High/Middle | Low | Low | 3 | 4 |
| IA | Low | High/Middle/Low | Middle | 3 | 5 |
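The relation between density pattern and local evenness in Table 1 can be illustrated with a minimal sketch. It assumes that local evenness is a Shannon entropy over a cell's 3×3 Moore neighbourhood, normalised by the logarithm of the number of cells so that it falls between 0 and 1. It deliberately omits any distance weighting, so it is a simplification rather than the paper's FCA formula. The function name, the toy grids, and the convention for empty neighbourhoods are all hypothetical.

```python
import math

def local_relative_entropy(grid, row, col):
    """Normalised Shannon entropy of the 3x3 Moore neighbourhood of one cell.

    Assumption: unweighted shares p_i = x_i / sum(x); neighbourhoods are
    simply truncated at the grid border.
    """
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    total = sum(values)
    if total == 0 or len(values) < 2:
        return 1.0  # assumed convention: an empty neighbourhood counts as even
    entropy = -sum((v / total) * math.log(v / total) for v in values if v > 0)
    return entropy / math.log(len(values))

# High density everywhere (CA-like row of Table 1): evenness close to 1.
compact = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
# Low density everywhere (RA-like row): also even, so evenness close to 1.
rural = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
# Dense land unit surrounded by sparse units (SA-like row): evenness well below 1.
sprawl = [[10, 10, 10], [10, 900, 10], [10, 10, 10]]

evenness = {name: local_relative_entropy(g, 1, 1)
            for name, g in [("compact", compact), ("rural", rural), ("sprawl", sprawl)]}
```

Because the compact and rural patterns are both even, entropy alone cannot separate them; the sketch shows why Table 1 pairs evenness with population density.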
Table 2. Variable-Wise Summary of Group 3 with K-means Algorithm.

Population (R² = 0.88)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 449.6209 | 412.15 | 1 | 6529 | 0.3741 |
| 2 | 764.2076 | 1335.67 | 1 | 10,834 | 0.6207 |
| 3 | 12,738.2156 | 2947.59 | 6529 | 17,453 | 0.6259 |
| Total | 976.4951 | 2515.30 | 1 | 17,453 | 1.0000 |

Relative Entropy (R² = 0.68)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.9542 | 0.0416 | 0.8407 | 1.0000 | 0.2155 |
| 2 | 0.7293 | 0.1009 | 0.2610 | 0.8521 | 0.7999 |
| 3 | 0.9125 | 0.0780 | 0.5963 | 0.9999 | 0.9999 |
| Total | 0.9141 | 0.1022 | 0.2610 | 1.0000 | 1.0000 |
Table 3. Variable-Wise Summary of Group 5 with K-means Algorithm.

Population (R² = 0.88)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 449.7935 | 249.65 | 1 | 4087 | 0.2341 |
| 2 | 402.3769 | 567.8729 | 1 | 3693 | 0.2116 |
| 3 | 501.9186 | 760.2594 | 1 | 5689 | 0.3259 |
| 4 | 6880.2183 | 2224.644 | 3550 | 12,781 | 0.5289 |
| 5 | 14,278.1528 | 1757.8904 | 9700 | 17,453 | 0.4442 |
| Total | 976.4951 | 2515.3013 | 1 | 17,453 | 1 |

Relative Entropy (R² = 0.68)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.9693 | 0.0251 | 0.9073 | 1.0000 | 0.1254 |
| 2 | 0.8455 | 0.0456 | 0.7414 | 0.9075 | 0.2248 |
| 3 | 0.6366 | 0.0885 | 0.261 | 0.7405 | 0.6489 |
| 4 | 0.8089 | 0.0832 | 0.3401 | 0.9913 | 0.8813 |
| 5 | 0.9454 | 0.0492 | 0.7684 | 0.9999 | 0.3133 |
| Total | 0.9141 | 0.1022 | 0.261 | 1.0000 | 1.0000 |
Table 4. Variable-Wise Summary of Group 3 with K-means Algorithm.

Population (R² = 0.88)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 599.6339 | 573.3265 | 1 | 6218 | 0.1901 |
| 2 | 1129.0176 | 1466.5203 | 1 | 7529 | 0.2302 |
| 3 | 11,069.4944 | 3536.1484 | 5664 | 32,708 | 0.8268 |
| Total | 1190.7837 | 2524.6185 | 1 | 32,708 | 1 |

Relative Entropy (R² = 0.68)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.9748 | 0.0336 | 0.8309 | 1 | 0.1706 |
| 2 | 0.6904 | 0.1209 | 0.0091 | 0.8463 | 0.8449 |
| 3 | 0.8277 | 0.1294 | 0.3063 | 0.9973 | 0.6973 |
| Total | 0.9336 | 0.112 | 0.0091 | 1 | 1 |
Table 5. Variable-Wise Summary of Group 5 with K-means Algorithm.

Population (R² = 0.88)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 558.6441 | 397.6774 | 1 | 4578 | 0.1399 |
| 2 | 757.1577 | 821.7557 | 1 | 4048 | 0.1238 |
| 3 | 1345.1096 | 1883.1429 | 1 | 10,416 | 0.3184 |
| 4 | 7135.8354 | 1867.0962 | 3752 | 10,466 | 0.2053 |
| 5 | 13,846.3772 | 2696.6543 | 10,480 | 32,708 | 0.6796 |
| Total | 1190.7837 | 2524.6185 | 1 | 32,708 | 1 |

Relative Entropy (R² = 0.68)

| Group | Mean | Std. Dev. | Min | Max | Share |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.9809 | 0.0217 | 0.8881 | 1 | 0.1129 |
| 2 | 0.7968 | 0.0611 | 0.6739 | 0.8917 | 0.2198 |
| 3 | 0.5585 | 0.1057 | 0.0091 | 0.6902 | 0.6873 |
| 4 | 0.8447 | 0.1045 | 0.5504 | 0.9983 | 0.452 |
| 5 | 0.8278 | 0.1335 | 0.3063 | 0.9973 | 0.6973 |
| Total | 0.9336 | 0.112 | 0.0091 | 1 | 1 |
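
The columns of Tables 2–5 can be reproduced in outline with a minimal sketch. It assumes that Share is the group's value range divided by the overall range, which agrees with the tabulated figures (for Group 1 in Table 2, (6529 − 1)/(17,453 − 1) ≈ 0.3741). It also assumes that R² is the fraction of a variable's total sum of squares explained by the grouping, and that the standard deviation is the population form; neither is stated in this section. The function name and the toy values are hypothetical.

```python
import statistics

def variable_summary(values, labels):
    """Per-group mean, std. dev., min, max and share for one variable, plus R2.

    Assumptions: share = group range / overall range;
    R2 = (total SS - within-group SS) / total SS; population std. dev.
    """
    total_range = max(values) - min(values)
    grand_mean = statistics.fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    rows, within_ss = {}, 0.0
    for group in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == group]
        group_mean = statistics.fmean(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
        rows[group] = {
            "mean": group_mean,
            "std": statistics.pstdev(members),
            "min": min(members),
            "max": max(members),
            "share": (max(members) - min(members)) / total_range,
        }
    r2 = (total_ss - within_ss) / total_ss
    return rows, r2

# Toy variable with two clearly separated groups (not data from the paper).
values = [1, 3, 5, 101, 103, 105]
labels = [1, 1, 1, 2, 2, 2]
rows, r2 = variable_summary(values, labels)
```

A variable with a high R² and small shares separates the groups cleanly, whereas a share near 1 means that one group spans almost the whole range of that variable.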


Liu, L.; Peng, Z.; Wu, H.; Jiao, H.; Yu, Y.; Zhao, J. Fast Identification of Urban Sprawl Based on K-Means Clustering with Population Density and Local Spatial Entropy. Sustainability 2018, 10, 2683. https://doi.org/10.3390/su10082683