Examining the Density and Diversity of Human Activity in the Built Environment: The Case of the Pearl River Delta, China

Rapid urbanization in China has been accompanied by spatial inefficiency in patterns of human activity, of which ‘ghost towns’ are the most visible result. In this study, we measure the density and diversity of human activity in the built environment and relate this to various explanatory factors. Using the Pearl River Delta (PRD) as an empirical case, our research demonstrates the distribution of human activity by multi-source data and then explores its dynamics within these areas. This empirical study comprises two parts. The first part explores location information regarding human activity in urbanized areas and maps its density and diversity. In the second part, regression models are applied to explore how density and diversity are affected by urban scale, morphology and by a city’s administrative level. Results indicate that: 1) cities with smaller populations are more likely to be faced with lower density and diversity, but they derive greater marginal benefits from improving land use efficiency; 2) the compactness of the layout of urban land, an index reflecting the plane shapes of the built environment, is highly correlated with density and diversity in built-up areas; and 3) the administrative importance of a city has a significant and positive impact on the density of human activity, but no obvious influence on its diversity.


Introduction
Over the past 30 years, China's rapid urbanization has resulted in a considerable increase in the number and size of built-up areas. Meanwhile, land use in some cities has become wasteful and inefficient. These places have become 'ghost cities', where few people are active [1,2]. The Standard Ranking and Investment Times jointly issued the 'China ghost town ranking (2014)'. This ranking demonstrated built-up areas that were clearly inefficient, such as Ordos in Inner Mongolia and Chenggong in Yunnan. This inefficiency was also found in Changzhou New District in the Yangtze River Delta (YRD), Tsuihang and Huidong New District in the Pearl River Delta (PRD) and Chengde in the Beijing-Tianjin-Hebei region, which are considered to be the top three most developed regions in China.
As the emergence of ghost towns complicates the realization of sustainable urban development [3], Batty [1] stated that ghost towns have become the epitome of China's urbanization, the operating rules of which are worth pondering over. Chen et al. [2] have proposed that ghost towns are a direct consequence of China's finance-driven urbanization process. To examine this topic, Woodworth and Wallace [4] reported that an effective way to investigate the phenomenon of ghost towns is to consider land use to discern patterns revealing a surge in built-up areas relative to population. In other words, this approach focuses on examining ghost towns by measuring social and economic human activities in a built-up area.
However, data scarcity has long prevented researchers from measuring human activities in built-up areas and verifying the factors related to them. Most scholars' attention has centered on the efficiency of entire regions and has failed to offer a detailed investigation of human activity patterns in built-up areas [5]. Traditional studies on this topic commonly rely on data from statistical yearbooks, which tend to be outdated. Unsurprisingly, such data do not allow researchers to reveal the factors that bring about higher levels of density and diversity, which are critical features of human activity in built-up areas.
On another note, empirical studies that verify the factors affecting human activity in those areas are limited, even though several scholars have made progress in studying the expansion of construction land in China. For instance, Gao et al. [6] found that the dominant characteristics of urban land expansion in the YRD varied depending on the development and administrative level of the area. Li et al. [7] also revealed that cities at higher administrative levels show a faster increase in urban land expansion. Nevertheless, the relationship between human activities and urban morphology, or the impact of government and governance in China, has not been examined.
With the availability of big data and opportunities to process these data, multiple sources, including open data from the internet, can record activities on Earth from various equipment [8]. This progress has made it possible to measure individual activity in built-up areas and relate it to various explanatory factors. With this in mind, we have formulated two main research questions:

1. How can we determine the density and diversity of human activity in built-up areas?
2. Which factors, besides urban scale, indicate the high efficiency of a city?
These two questions, which concern how to describe the density and diversity of human activity in an urban area effectively and how to disclose the major factors causing it, constitute the overarching aim of this study. We feel it is important to elaborate on these two issues since policymakers around the world are observing an increase in the numbers of urban projects considered to be ghost towns (e.g., Masdar Eco-City in the United Arab Emirates and Songdo in South Korea) [4]. Thus, by addressing these two research questions, this study can lay the foundation for a better understanding of ghost towns and then provide a robust reference for global policymakers to promote urban sustainability through ensuring the density and diversity of human activity in land use. This paper is structured as follows. First, we present the theoretical background and the research context. Then, the study area, the definition of our methodology and the data acquisition process are discussed. We subsequently follow up with our empirical methods, which include mapping the density and diversity of human activities in the studied area as well as examining the factors that can explain these results by regression analysis. The paper concludes with an overview of our main findings and provides implications, which may serve as an agenda for policies regarding land usage.

International Perspectives on Ghost Towns
Thus far, we have not seen any widely recognized definition of ghost towns. A definition of ghost towns was suggested by Shepard [9] as "a new development that is running at significantly under capacity, a place with drastically fewer people and businesses than there is an available space for". However, this definition still has omissions. Sorace and Hurst [10] stated that ghost towns comprise a broad phenomenon of land urbanization. For example, according to 'China's ghost town ranking (2014)', ghost towns can be identified when either a town has recently experienced a massive amount of development or when a place has a sufficient existing built environment, but a fairly small amount of new construction.
As there is no consensus on the definition and standards of ghost towns [8,11,12], Woodworth and Wallace [4] have proposed three approaches to this topic:

1. Property market dynamics: Exploring how robust demands, the specific geography of the property market and households' financial strategies can contribute to an oversupply of housing and other types of property;
2. New-town projects: Explaining why new town projects that were planned on a large scale, led by a growth orientation and featuring an abundance of public infrastructure may drive the emergence of ghost towns;
3. Land use: Considering land use to discern patterns that reveal a surge in built-up space relative to population.
Due to the scarcity of reliable data on urban tenancy and vacancy in China, these three approaches were not fully taken into account in a further analysis [4]. However, some recent technological progress allows us to circumvent data limitations. With the deepening of information technology, data from multiple sources, including open-source data from the internet, enable us to record human activities occurring in space [4]. As a result, the last approach (i.e., considering land use to discern patterns that reveal a surge in built-up space relative to the population) can be further studied. In other words, it is possible to examine ghost towns by measuring social and economic human activities in built-up areas.

Focusing on Human Activity in Built-Up Areas in China
Human activity in built-up areas is a critical issue, but it has long been overlooked in research related to urban development in China. In reviewing the literature on land use mechanisms that have resulted in wasteful and inefficient suburbanization after 1978, we found that this phenomenon always hinges on developers, governments, land prices, the legal position of the owners of rural land and the requirement to secure agricultural land [7,12]. However, these mechanisms, which are related to the supply of construction land, are only one side of the coin, since they ignore the demand of human activities in urban planning, which has also contributed to the ghost town phenomenon in China.
Some recent research efforts have attempted to examine human activity at the micro-level. Chi et al. [8] used Baidu positioning data to present the spatial distribution of vacant housing areas. Jin et al. [13] used road junctions, points of interest and location-based service records to identify ghost towns. Zheng et al. [14] utilized remote sensing datasets including nighttime light imagery, land cover products and a population grid to develop a "ghost city" index (GCI). However, they only successfully distinguished different types of activity on construction land, failing to examine diversified individual human activities when illustrating the condition of construction land. Few studies have discussed the density and diversity of human activity in urbanized areas. As density and diversity are critical features of human activity in built-up areas, failing to make them explicit makes it difficult to understand the relationship between people and land, and is not conducive to a more thorough approach to the phenomenon of ghost towns.
Therefore, to address this gap, we decided to explore human activity in the built environment in well-designed places in China that have also encountered the problem of insufficient individual activity [1,15]. Thus, we should seek to figure out where human activity in built-up areas is lower or higher than the average, including the underlying general mechanisms. To reach this goal, we mainly deal with the density and diversity of human activity in built-up areas, rather than how construction land has been acquired or developed, topics that have already been addressed extensively in the academic literature [6,7,12,16,17].

Measuring Human Activity in Built-Up Areas
Measuring human activity in built-up areas is indispensable for the evaluation of land use intensity. Land use intensity was used as a breakthrough point to explore the efficiency of land use in earlier decades [18], which could be defined as the degree of urban land intensity [5]. There is, however, not a single definition of land use intensity [19]. In fact, various definitions for the term land use efficiency or load efficiency of land use are used in different fields [20][21][22][23]. For example, density has been considered one of the most important indices for measuring land use efficiency [24,25]. In China, scholars have also begun to pay attention to the intensive utilization of land throughout the urbanization process. Generally, studies have employed economic statistical yearbook data to measure the intensity of land use [4,13,14], which can demonstrate the pros and cons of an urban built environment on the macro scale. However, the low efficiency of urban construction land is mostly reflected in some specific areas, rather than the whole city. Previous studies put an excessive emphasis on the region as a whole and fail to give a thorough investigation into the specific area. Moreover, they ignore human activities that express real efficiency. The results of such an evaluation could differ from the daily patterns of behavior if they only rely on economic statistical yearbook data without exploring individual spatial activities. Further discussion about human activity and urban density will be presented in Section 3.2.2 of this study.

Human Activity Factors in Built-Up Areas
As mentioned in Section 2.1, there is not one overriding factor that can explain the emergence of ghost towns. Based on this condition and to avoid the construction of 'ghost towns', in this study we further explore factors influencing human activity in urbanized areas. Although human activity, as a crucial element of spatial planning, has been mentioned in a significant amount of the literature, there is limited research directly examining which factors have had an impact on it [8]. Based on the key studies focusing on human activity, which are highly relevant to the context of China, this paper proposes three potential factors as pivotal elements in making land use policies, namely urban scale, morphology and administration [7,12,26-28].
The first factor is urban scale. Changes in urban size inevitably have a profound impact on the agglomeration economy [29]. Some scholars believe that large cities perform better in terms of urban economics, following the theory developed by Fujita et al. [30] and Glaeser [31], while other scholars suggest that small cities and towns still play essential roles in Europe [32,33]. Moreover, various studies have discussed the relationship between urban scale and land use intensity. For example, in some cases land utilization efficiency improves when the scale of a city increases [34,35]. However, the relationship between urban scale and human activity in built-up areas has not been thoroughly analysed in the field of urban studies [5]. In this regard, this paper aims to analyse the potential impact of city size on human activity in urbanized areas.
Secondly, urban morphology is proposed as another factor. Urban morphology can be considered as an embodiment of various elements such as development policies, urban design and the accessibility of transport infrastructure [36,37]. Although this still needs to be verified empirically in more detail, urban planners normally assume that compactness in urban morphology is closely related to human activity [25][26][27]. The main reason is that compactness in urban morphology is conducive to decreasing the likelihood of automobile-oriented designs that render urban space dangerous for walking and performing other slow outdoor activities [38]. Correspondingly, in the theory and practice of new urbanism and smart growth, dealing with the problem of suburban sprawl by creating compact urban spaces was the main task [39][40][41]. In line with existing work on the plane form of the built environment [42][43][44][45], this study will examine the compactness of the layout of urban areas, the shapes of which can be observed in layout maps of land use.
Finally, we identify urban administration as the third factor. Political choices can affect urban planning and the socio-economic activities in a city [46,47]. In China, administrative factors within the country have exerted an important impact on city growth since ancient times [48]. Historically, the power held within the state, with its different tiers of government, has had a notable impact on a city's economic gathering activities [49][50][51]. During China's recent urbanization, the development of an administrative center to help a new town prosper was recognized as a common approach. It is also important to point out that state interventions are becoming gradually more professional and a fragmented governance regime will result in land fragmentation in developed regions in China [16,52,53]. Through the leasing of land, local governments have accumulated the revenue needed to improve infrastructure construction and investment [19]. This has also resulted in urban sprawl and thus has led to serious social, economic and environmental problems [54]. Based on the existing work on the possible impact of administration on land use [7,12], this paper will verify the relationship between human activity in urbanized areas and administration.
The abovementioned studies have provided the basis for our hypothesis regarding the main factors behind human activity. Although there are other factors, such as a beautiful urban landscape or magnificent architecture, human activity in new towns is most directly affected by a town's population, layout morphology and administrative governance. The reason is that many examples of well-designed areas in China are still struggling with the phenomenon of ghost cities [1]. Moreover, narrowing the research scope will be helpful in informing planners as to which factors are significant in their planning toolboxes.

Study Area
Planning with the purpose of land use efficiency improvement in mind is more likely to succeed in the land market of highly developed city regions than in their less developed counterparts in the western and central parts of China [55][56][57][58]. Nevertheless, according to Mingpao and Cable News Network (CNN), ghost towns were also found in the PRD and Beijing-Tianjin-Hebei region, which are both considered developed regions. For this study, we chose the PRD, an advanced city region in southern China adjacent to Hong Kong and Macao [59], as the case for our empirical study. The PRD has a population of 56 million, with a Gross Domestic Product (GDP) of 578 million yuan in 2014, making it one of the most developed and active areas in China. According to data released by the Ministry of Land and Resources, the PRD accounted for nearly 20% of the total construction land in China in 2012.
This study relies primarily on the 'Master Plan for the Pearl River Delta Region (2015-2020)' drafted by the Guangdong provincial government. The main results of this research were recognized by the Urban and Rural Planning and Design Institute of Guangdong Province in 2015. In this paper, the PRD is defined as the area that consists of Guangzhou, Shenzhen, Foshan, Dongguan, Zhaoqing, Zhuhai, Zhongshan, Jiangmen and Huizhou. Due to a lack of data, this paper excludes some parts of Zhaoqing, including Huaiji, Fengkai, Deqing and Guangning towns, none of which were included in the 'Regional Planning of Pearl River Delta (2004-2020)'. Thus, there are, in total, 43 units in the PRD according to existing administration and planning documents. This modification makes it possible for us to update statistics and verify the potential impact of the administrative hierarchy of each spatial unit on human activity. It should be noted that some of these units have exclaves due to the sprawling construction land in this mega-city region (Figure 1). This feature also makes measuring morphology for these units within the PRD a challenging task.

Research Approach
Open data can facilitate the measurement of human activity in the built environment. Guided by data from the statistical yearbook, analyses of regional construction land typically ignore individual behavior, which is one of the weaknesses in previous research. In recent years, the emergence of open data has, to a certain extent, solved the long-standing problem of data shortage, making quantitative analyses feasible [60][61][62][63]. Geo-data mining from open sources has been a huge breakthrough in measuring human activity, economy, culture, transport, and entertainment [64]. It goes hand in hand with information technology [56,61,65-68], which makes it possible to carry out quantitative analyses of crowd activities and spatial distribution with the help of Weibo.com, the Chinese equivalent of Twitter, and other Social Networking Services (SNS) tools. At present, there are many studies on residents' travel, the data for which are usually derived from telecom operators, social networking sites, taxis, and bus IC cards [69][70][71]. Scholars have also explored the features of land use in urban areas through the lens of social media [23,72-74]. Real-time internet data can update and pinpoint the location of users, which makes the analysis of human activity in urbanized areas possible. As mentioned before, the aim of this study is to find the density and diversity of human activity in built-up areas and its major causes. Figure 2 presents a schematic of our approach, including data acquisition, the mapping of density/diversity and factor examination.

Data Collection and Processing
Human activity is not limited to daily residential life; it also includes business and recreational activities. Moreover, the available data on 'ghost towns', such as the number of property vacancies and the amount of unused land, often refer to an entire city and therefore do not accurately reflect specific neighborhoods. Because such city-wide data also cover high-value areas, analysts may be unable to identify accurately where ghost-area development occurs. This study takes open data as the primary source and census data as a supplement. The data sources are as follows: (1) residential activity data, which came from online/TV shopping records and bus routes, from which we acquired spatial location data by geocoding address information; (2) business activity data, which came from a directory of the enterprises, public facilities and commercial facilities existing in 2014; (3) recreational activity data, which came from a list of entertainment facilities and from the open platforms Weibo.com and Panoramio.com, from which we obtained microblogs and pictures posted by users with geographic information. Microblogs were retrieved through the microblog Application Programming Interface (API) service ('2/place/nearby_timeline') using an App Key and App Secret. It should be noted that these open data can be acquired on the internet without barriers to global access. The available contents include time, geographical coordinates, published text content, user ID, distance from the center point, gender and user location. They do not include specific names or addresses, so no personal privacy problem arises. Data collection was completed in July 2015, contributing to the Master Plan for the region of Pearl River Delta (2015-2020). Specifically, data from Weibo.com, online shopping and TV shopping were all compiled in July 2015.
The directory of enterprises includes all active firms at the end of 2014. Thus, diversified open source data will bring about a certain bias when measuring the land use efficiency in the PRD. It should be noted that this paper only provides one perspective regarding construction land efficiency measurement using open data. The types of activity data available to scholars in different countries are different. Human activities can be measured through a variety of data. The behavioral data that reflect human activities can be used as data sources, although scholars do not have to restrict themselves to the data sources, acquisition methods and data types used in this paper. For example, the data from the heatmap of google.com (Google-Maps-iOS-Utils) can be used as residential activity and business activity data, the geographic data with location information from facebook.com can be used as social activity data, the data from flickr.com can be used as recreational activity data.
After data mining and spatial orientation, this article explores the spatial characteristics of human activity within the built environment. ArcGIS, serving as an information platform for sustainable land management [75], is extensively used.
Data processing is divided into four steps. (1) Geocoding the online/TV shopping data and the addresses of the public service facilities, bus routes and enterprises. (2) Converting the text information of each address into longitude/latitude coordinates through the geocoding API service in Baidu Location Based Services (LBS) (http://lbsyun.baidu.com/) on the open platform of the web API page. With the help of the Locoy spider software, we sent HyperText Transfer Protocol (HTTP) requests in batches, received the returned data and completed the address resolution of the text information. (3) Converting the coordinate data into location data that can be used on the GIS platform, on the basis of which we then divided the spatial units of construction land to match the density analysis. (4) Merging the data into a 2.5 km × 2.5 km grid through the merge function in ArcGIS to make GIS analysis possible. This grid size corresponds to an urban residential area, which is expected to include 30,000-50,000 people according to 'Urban Residential Area Planning and Design Standard GB 50180-93' (2002 edition).
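The gridding in step (4) can be illustrated with a minimal sketch. It assumes that the points have already been projected to planar coordinates in metres; the function names, the grid origin and the sample coordinates are hypothetical and are not taken from the ArcGIS workflow used in the study.

```python
import math
from collections import Counter

CELL_SIZE_M = 2500.0  # 2.5 km grid cell, as in step (4)

def cell_index(x_m, y_m, origin_x=0.0, origin_y=0.0, cell=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    col = math.floor((x_m - origin_x) / cell)
    row = math.floor((y_m - origin_y) / cell)
    return col, row

def count_per_cell(points, **grid_options):
    """Count how many activity points fall in each grid cell."""
    return Counter(cell_index(x, y, **grid_options) for x, y in points)

# Hypothetical projected coordinates in metres (not real data).
sample_points = [
    (100.0, 200.0),
    (2400.0, 2499.0),
    (2600.0, 100.0),
    (5100.0, 7600.0),
]
counts = count_per_cell(sample_points)
```

Each activity type would be counted separately in this way, giving one value per activity and grid cell for the later density and diversity calculations.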

Mapping the Density and Diversity of Human Activity
There are various measures for establishing land use efficiency in different fields [18][19][20][21]. To explore the efficiency of urban land use in grid areas, we used the measurements of the density and diversity of human activity.
On the basis of data mining, geocoding and spatial dimension reduction, this study carries out an examination of urban activity in construction land through a comprehensive measurement. High density, diversity and properly mixed functionality have been considered the three dominant features of a city since first identified by Jacobs [76] in 1961, and have become key issues in the development of urban studies and theories. According to Jacobs, the diversity and density of urban spaces are closely related and deeply intertwined, just like a pair of twins, to quote her original expression. Furthermore, ensuring necessary urban density and diversity is crucial to maintaining the vitality of human activity [76]. Therefore, this research maps the density and diversity of human activity in built-up areas, and the mapping consists of two aspects.
The first aspect is the spatial measurement of singular factors. It includes the estimated density distribution of daily, business and recreational activities. By using open platforms to geocode address information, such as enterprise directories and lists of multidimensional data, it is possible to obtain the regional distribution of activities. We then try to draw connections between the construction land and the population density distribution in relation to a single type of activity on the scale of an entire city.
The second aspect is the spatial composition of comprehensive data, which represents the summarized value of the weighted level of human activity. After gridding the various indicators and normalizing each one, we used an inverse-variance weighted average to obtain a comprehensive evaluation of activity density on construction land and to distinguish high-scoring from low-scoring areas.

With regard to singular factors in the intensity of land use, we first normalized the gridded data of each type of activity i in grid j to the 0-1 range by dividing by the maximum value over all grids, and defined the normalized activity i in grid j as $M_{ij}$. Based on the normalized values, the weighted density of activity i in grid j, $D_{ij}$, is computed as:

$$D_{ij} = \frac{\alpha_i M_{ij}}{S} \qquad (1)$$

In the above formula, the weight $\alpha_i$ of activity i is given by the method of inverse-variance weighting, while S is the area of each grid, which is the same for all grids. An inverse-variance weighted average is a method of aggregating random variables so as to minimize the variance of the weighted average; here, the variables are the eight kinds of human activity data from different sources. Each random variable is weighted in inverse proportion to its variance as:

$$\alpha_i = \frac{1/\sigma_i^2}{\sum_{k} 1/\sigma_k^2} \qquad (2)$$

where $\sigma_i^2$ is the variance of activity i, computed as:

$$\sigma_i^2 = \frac{1}{N} \sum_{j=1}^{N} \left( M_{ij} - \bar{M}_i \right)^2 \qquad (3)$$

with N the number of grids and $\bar{M}_i$ the mean of $M_{ij}$ over all grids.

With respect to a comprehensive evaluation, we combined the eight types of normalized indicators of activity data and demographic space data using the weighted average method, leading to a comprehensive density evaluation in each grid, as in the formula below:

$$D_j = \sum_{i} D_{ij} = \frac{1}{S} \sum_{i} \alpha_i M_{ij} \qquad (4)$$

We then evaluated the diversity of the comprehensive data. At the same scale, we measured the diversity index of each activity and conducted a comprehensive activity evaluation. With respect to diversity measurement, the study refers to the classic Shannon index, which is widely used to measure diversity, for example in ecology and genetics.

We defined the weighted ratio of activity i in grid j as shown below:

$$P_{ij} = \frac{\alpha_i M_{ij}}{\sum_{k} \alpha_k M_{kj}}$$

Thus, the comprehensive evaluation of activity diversity, which indicates the summarized diversity of human activity in a city, is calculated as follows:

$$H_j = -\sum_{i} P_{ij} \ln P_{ij} \qquad (5)$$

According to formulas (1), (4), and (5), we calculated the density and diversity at the grid scale and visualized them with ArcScene.
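The weighting, density and diversity calculations described above can be sketched in a few lines. This is a minimal illustration rather than the study's implementation: the exact form of each formula is an assumption based on the standard definitions of inverse-variance weighting and the Shannon index, the function names are hypothetical, and the sample counts are invented for two activities and three grids instead of the eight activity types used in the study.

```python
import math

def normalize_by_max(values):
    """Scale one activity's per-grid values to the 0-1 range by its maximum."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]

def variance(values):
    """Population variance of one activity across all grids."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def inverse_variance_weights(normalized):
    """Weights proportional to 1/variance, rescaled to sum to 1.

    Assumes every activity has a non-zero variance across grids.
    """
    inverse = {name: 1.0 / variance(vals) for name, vals in normalized.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}

def composite_density(normalized, weights, cell_area):
    """Weighted sum of normalized activities per grid, divided by grid area."""
    n_cells = len(next(iter(normalized.values())))
    return [
        sum(weights[a] * normalized[a][j] for a in normalized) / cell_area
        for j in range(n_cells)
    ]

def shannon_diversity(normalized, weights):
    """Shannon index of the weighted activity shares in each grid."""
    n_cells = len(next(iter(normalized.values())))
    scores = []
    for j in range(n_cells):
        shares = [weights[a] * normalized[a][j] for a in normalized]
        total = sum(shares)
        h = 0.0
        for s in shares:
            if total > 0 and s > 0:  # empty shares contribute nothing
                p = s / total
                h -= p * math.log(p)
        scores.append(h)
    return scores

# Hypothetical counts for two activity types in three grids (not real data).
raw_counts = {"shopping": [10, 5, 0], "enterprises": [2, 2, 4]}
normalized = {name: normalize_by_max(vals) for name, vals in raw_counts.items()}
weights = inverse_variance_weights(normalized)
density = composite_density(normalized, weights, cell_area=6.25)  # 2.5 km x 2.5 km
diversity = shannon_diversity(normalized, weights)
```

In this toy example, the third grid contains only one type of activity, so its diversity score is zero even though its density is the highest of the three.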

Examining the Related Factors
As mentioned previously, factors such as urban scale, morphology and administration may have an impact on human activity in urbanized areas [7,12,28-30]. To find out which factors matter, our study explores the relationships among them using a linear regression model. The indices of the compactness of the layout of the urban area, city size and administrative level are the independent (explanatory) variables, while the indices of density and diversity of human activity, both of which indicate the state of urban function in built-up areas in the context of this study, are the dependent variables.
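As a simplified illustration of the regression step, the sketch below fits an ordinary least squares line with a single explanatory variable. It is not the model specification of this study, which involves several factors; the function name, the choice of predictor and the sample values are hypothetical.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical values for four spatial units (not real data):
# an explanatory factor (e.g., log of population) and a density score.
log_population = [5.0, 5.5, 6.0, 6.5]
density_score = [0.20, 0.30, 0.45, 0.50]
slope, intercept, r_squared = fit_simple_ols(log_population, density_score)
```

A positive slope in such a fit would indicate that the response tends to increase with the explanatory factor; with several factors, a multiple regression from a statistics package would be the more appropriate tool.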
Considering these factors, we created a grid with 43 spatial units, comprised of the current administrative county-level urban units and master planning documents regarding the PRD. For example, the region of Guangzhou was divided into six urban units, including the downtown areas of Guangzhou, Zengcheng, Conghua, Panyu, Huadu, and Nansha.
To study the sprawl of construction land within mega-city regions (Figure 1), more work is needed in the measurement of urban morphology. The compactness of the urban form has attracted attention among academics and practitioners, mainly regarding the impact of a city's form on transport, mobility and the environment [77][78][79]. Longley et al. [42], Batty [43], Huang et al. [80], Colantoni et al. [81] and Pili et al. [45] applied indices to the shapes of a city's layout, such as the ratio of the length of its diameter to its area, average distance and land fragmentation, in order to describe continuous or discontinuous settlements. In line with existing work [42-45,82], we aimed to measure the compactness of the layout of urban areas, the shapes of which can be observed in layout maps of land use. The specific problem of the measurement of urban morphology, along with the selection and definition of factors, will be resolved in the regression analysis.
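To make the notion of layout compactness concrete, the sketch below computes one common shape index, the isoperimetric quotient 4πA/P², for a simple polygon. This index is used here only as an assumption for illustration; it is not necessarily the compactness index adopted in this study, and the function names and sample shapes are hypothetical.

```python
import math

def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon.

    vertices: list of (x, y) tuples in planar coordinates, in boundary order.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter

def compactness(vertices):
    """Isoperimetric quotient: 1 for a circle, smaller for elongated shapes."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2

# Two hypothetical layouts with the same area (1 square unit).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
strip = [(0.0, 0.0), (4.0, 0.0), (4.0, 0.25), (0.0, 0.25)]
square_score = compactness(square)
strip_score = compactness(strip)
```

The elongated strip scores lower than the square although both cover the same area, which is the kind of contrast a compactness index is meant to capture. Units with exclaves would need each part handled separately or a different index.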

Distribution of Human Activity
The normalized distribution of eight kinds of human activity in the built environment is shown in Figure 3. The density of residential activity representing online/TV shopping data is shown in Figure 3a.
Such daily activities occur frequently in the central areas of Guangzhou, while in the peripheral areas there is a slight drop in frequency. In addition, residential activity is relatively concentrated on the west coast of the PRD. Compared with other types of activities, the Luohu district of Shenzhen holds the highest density of daily online/TV shopping activities. The density of employment activity representing enterprises is shown in Figure 3b. The polarization is relatively obvious in the PRD. In the city region, employment activity is less centralized than the daily activities of online/TV shopping. Other features of recreational activities in the built-up areas of the PRD can be observed as well. As shown in Figure 3, compared with the distribution of online/TV shopping, enterprises, Weibo users and bus routes, dispersion can be observed more clearly in the activities represented by pictures from Panoramio.com, commercial facilities and public facilities. The high density of these scattered activities can also be seen in the cases of Huizhou and Zhuhai. Comparatively speaking, these three kinds of activities are more dispersed, and polarization is less significant. Overall, the activity densities of Guangzhou, Shenzhen, Zhuhai, and Dongguan are high.

Comprehensive Evaluation of Activity Density
As shown in Figure 4a, a high density of activities on construction land is mainly observed in the central cities of Guangzhou, Foshan, Shenzhen, and Dongguan. Stretch zones along Guangzhou-Foshan and Shenzhen-Dongguan are gradually appearing. The density of activities in the central areas of Shenzhen is higher than that in Guangzhou. Meanwhile, Huadu and Conghua districts surround the high-density area that lies within the center of Guangzhou. Activity density in Zhaoqing, Jiangmen, Huizhou, and other places is significantly lower than the average level. In addition, we find that Xijiang, the new town of Gaoming and other development zones are low-efficiency regions with low activity density. These initial findings help to further verify the presence of possible "ghost towns" in the PRD.

Comprehensive Evaluation of Activity Diversity
As shown in Figure 4b, the distribution of diversity is less concentrated and, compared to activity density, there are no significant differences. High-scoring regions are found within the Guangzhou-Foshan and Shenzhen-Dongguan stretch zones and in central Zhongshan city. In the periphery of the PRD, which comprises Zhaoqing, Jiangmen and Huizhou, the diversity of activity is relatively lower. Shenzhen is the region with the highest diversity, but its comprehensive score of diversity is still lower than that of Guangzhou, which has a certain degree of spatial differentiation. In addition, a high diversity of activity still appears in the downtown area, and activities in peripheral areas are mainly individual activities. Furthermore, regions with dense recreational activity, such as Zhuhai and Huizhou, show low diversity.

Region of Low Human Activity Density
At this stage, it is possible to detect the lowest-efficiency regions according to the density and diversity mapped above. Possible 'ghost towns' might be among the cities with the lowest levels of density and diversity of human activity. As shown in Figure 5, regions with a low bearing strength of activity were concentrated in the periphery of the PRD, such as Zhongshan, Jiangmen, and Huizhou, mainly including Longmen, Boluo and Huidong towns in Huizhou, Enping district in Jiangmen and Tsuihang District in Zhongshan. Satellite imagery shows many blank areas in these regions. This indicates that a large proportion of construction land is being exploited, and the phenomenon of low density and diversity of activity in built-up areas caused by large-scale urban construction is gradually becoming clear. Among them, Tsuihang intends to become the next 'Qianhai' in Zhongshan, with great development prospects due to the construction of the Shenzhen-Zhongshan second channel highway. However, as its administrative hierarchy is lower than that of Nansha and Qianhai, its ability to gather resources from the local government is limited. Tsuihang is a serious example of a lack of human activity and the inefficient use of construction land. Meanwhile, Huidong town sits on the coastline of the Pearl River Delta, but the development of its construction land is still based on an outward expansion mode and also shows low levels of density and diversity of human activity.

Selecting the Variables
Next, we must examine the mechanisms explaining the density and diversity of human activity. To verify the impacts of possible factors, namely the urban scale, morphology and administration, on human activity [7,12,26,27,28,34,35], measurements should be pre-processed before the regression analysis.
First, the population and the built-up area are taken to represent the urban scale of the 43 units. Because most of the 43 units are not independent cities in the region, the 'urban scale' here is used mainly to verify the relationship between urban scale and activity. This is still meaningful for local governments at the county level, since the population size is targeted at areas of construction land in the master plans for most cities in China [7,12].
Secondly, using the urban administrative hierarchy and the number of administrative agencies, we can examine the possible impacts of policy on human activity. Based on the 'Chinese Urban Division Standard', each urban unit was assigned an administrative level from one to five. Sub-provincial cities, such as the central areas of Guangzhou or Shenzhen, were defined as five. Central areas in other prefecture-level regions were defined as four. The special administrative unit of Shunde district, listed by the Guangdong provincial government, was defined as three. The suburban areas of Guangzhou or Shenzhen were defined as two, and other county-level units were defined as one. The number of administrative agencies was obtained from the list of government institutions in China.
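The five-level coding described above can be written as a simple lookup. The following is a minimal sketch only: the category labels and the function name are hypothetical, chosen for illustration, while the level values follow the text.

```python
# Hypothetical category labels; only the level values come from the text.
ADMIN_LEVEL = {
    "sub_provincial_center": 5,  # central areas of Guangzhou or Shenzhen
    "prefecture_center": 4,      # central areas of other prefecture-level regions
    "special_unit": 3,           # Shunde district
    "sub_provincial_suburb": 2,  # suburban areas of Guangzhou or Shenzhen
    "county_level": 1,           # other county-level units
}


def admin_hierarchy(category: str) -> int:
    """Return the ordinal administrative level for a unit category."""
    try:
        return ADMIN_LEVEL[category]
    except KeyError:
        raise ValueError(f"unknown unit category: {category!r}") from None


# Illustrative units; the category assigned to each is an assumption.
units = {"Shunde": "special_unit", "Enping": "county_level"}
scores = {name: admin_hierarchy(cat) for name, cat in units.items()}
print(scores)
```

Treating the level as an ordinal score, as here, assumes that the steps between levels are comparable, which is a modeling choice rather than something the standard itself guarantees.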
Finally, it is necessary to apply suitable indices to represent layout-compactness in urban morphology, following existing works [42][43][44][45][82]. The first measurement, proposed by Batty [43], is the ratio of the average urban distance ($d$) to the maximum urban distance ($d_{max}$), namely Compact_d/d_m $= d/d_{max}$. However, there is a dimensional weakness in this approach. Ignoring the size of the points, assume that there is a square plaque formed by four point settlements (Figure 6). The layout-compactness values are equal regardless of whether the side length of the square plaque is one or two: the former is $[(1 + 1 + 1 + 1 + 1.414 + 1.414)/6]/1.414 \approx 0.805$ and the latter is $[(2 + 2 + 2 + 2 + 2.828 + 2.828)/6]/2.828 \approx 0.805$. This result is clearly inconsistent with the real situation shown in Figure 6, because the larger square is less compact due to the increase in its average distance. This shows that defining layout-compactness in this way leads to problems in measurement. The second measurement uses the geometric characteristics of the plaque area ($a$) and plaque perimeter ($p$), namely Compact_p/a $= \sqrt{a/\pi}/(p/2\pi)$. However, this formula also risks failing in the PRD, which is overspread with urban construction land. Assume that there is a city formed by four nearby plaques (Figure 7), where the spacing of the former layout is one and the spacing of the latter is two. In this case, the layout-compactness values are equal, because the plaque areas and perimeters are both fixed. Therefore, this extensive method of measuring layout-compactness is also problematic, because it is unclear whether the summed perimeters of many isolated plaques can substitute for the outline enclosing them.
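The scale problem of the first index can be checked numerically. The sketch below treats the four settlements as points at the corners of a square, as in the example above, and evaluates the ratio of the mean pairwise distance to the maximum pairwise distance for two side lengths. The function names are illustrative and are not taken from the cited works.

```python
from itertools import combinations
from math import dist, isclose


def compact_d_dmax(points):
    """Mean pairwise distance divided by the maximum pairwise distance."""
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    return (sum(distances) / len(distances)) / max(distances)


def square(side):
    """Four point settlements at the corners of a square of the given side."""
    return [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]


small = compact_d_dmax(square(1.0))
large = compact_d_dmax(square(2.0))

# The ratio is dimensionless, so doubling every distance leaves it unchanged.
print(round(small, 3), round(large, 3), isclose(small, large))
```

Because both the numerator and the denominator scale linearly with the side length, the index cannot distinguish a tight layout from an enlarged copy of it, which is the weakness described above.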
Given the problems with defining layout-compactness discussed above, we propose a third formula, which is a comprehensive improvement on the two methods that Batty [43] defined. This third measurement relates the average distance ($d$) to the area ($a$), namely Compact_d/a $= d/\sqrt{a/\pi}$. In a given city, the greater the average distance is, the less compact the city is. The layout-compactness values of Compact_d/a range from zero to one. This formula can overcome the problems caused by Compact_d/d_m or Compact_p/a when measuring plentiful isolated plaques. Thus, there are seven candidate independent variables that describe the three aspects of urban scale, administration and morphology (Table 1).
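The contrast between the area-perimeter index and the distance-area index can be illustrated with the four-plaque case. This is a sketch under stated assumptions: each plaque is a unit square, the average distance is taken between plaque centroids, and the raw values are not rescaled to the zero-to-one range mentioned above, since the normalization is not specified in the text.

```python
from itertools import combinations
from math import dist, pi, sqrt

SIDE = 1.0  # assumed side length of each square plaque


def centroids(gap):
    """Centroids of four square plaques in a 2 x 2 layout separated by gap."""
    step = SIDE + gap
    return [(0.0, 0.0), (step, 0.0), (0.0, step), (step, step)]


def compact_p_a(area, perimeter):
    """Radius of the equal-area circle over the radius of the equal-perimeter circle."""
    return sqrt(area / pi) / (perimeter / (2 * pi))


def compact_d_a(points, area):
    """Mean pairwise distance over the radius of the equal-area circle."""
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    return (sum(distances) / len(distances)) / sqrt(area / pi)


area = 4 * SIDE ** 2      # total plaque area, independent of the gap
perimeter = 4 * 4 * SIDE  # total plaque perimeter, independent of the gap

results = {
    gap: (compact_p_a(area, perimeter), compact_d_a(centroids(gap), area))
    for gap in (1.0, 2.0)
}
for gap, (p_a, d_a) in results.items():
    print(gap, round(p_a, 3), round(d_a, 3))
```

In this setup the area-perimeter value is identical for both spacings, while the distance-area value grows as the plaques move apart, which is the behavior the third formula is intended to capture.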

Modeling the Regression Analysis
The first step is to test whether the seven candidate independent variables are significantly correlated with the dependent variables (i.e., density and diversity). We find that, except for Compact_p/a, all candidate independent variables have statistically significant correlations with the dependent variables.
Subsequently, we aim to verify whether the variables of each descriptive aspect continue to hold in the regression model and whether the number of explanatory variables decreases due to possible multi-collinearity. To develop a better regression model, the correlation among independent variables should be small, so that the contribution of each single coefficient to the dependent variable can be interpreted. Thus, a multi-collinearity detection of independent variables is needed to determine the proper ones and to establish a regression model. Two kinds of multi-collinearity detection are used. The first method is to calculate the Kappa value; the other is to calculate the variance inflation factor (VIF) of the regression model. Here, we use the first method to eliminate variables with high multi-collinearity and then apply the VIF test to the developed regression model as a form of validation. The correlation significance is also considered when removing a variable with high multi-collinearity.
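The VIF check can be sketched without any statistical library by using the identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The numbers below are synthetic and purely illustrative; they are not the study's data, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Synthetic predictors: x3 is almost a copy of x1, x2 is only loosely related.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]

vifs = vif([x1, x2, x3])
print([round(v, 1) for v in vifs])
```

A predictor that is nearly a linear function of the others, such as x3 here, receives a very large VIF, which is the signal used to drop or replace a candidate variable.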
The independent variable combination is considered to have weak multi-collinearity when the Kappa value is smaller than 10. According to the results, when taking the correlation between an independent variable and a dependent variable (i.e., density) into consideration, the independent variable combination of Scale_pop, Admin_hierarchy, and Compact_d/a is the most comprehensive. This regression model (Table 2) can explain 91.1% of the activity density of construction land within the PRD. In addition, the VIF test is performed to detect any multi-collinearity within the developed model as a form of validation. Similarly, taking the diversity of human activity as the dependent variable, the independent variable combination of Scale_pop, Compact_d/a and Compact_p/a is the most suitable set of explanatory factors. The regression model (Table 3) can explain 67.7% of activity diversity. Generally speaking, the size of the population, Scale_pop, and the compactness defined by the average commuting distance, Compact_d/a, in the given area are two important variables for both the density and diversity of human activity.

Explaining Dynamics for the Regression Results
It is necessary to verify whether urban scale has a positive impact on human activity in land use. In China, the role of urban scale in urbanization strategy has been controversial in recent decades. Scholars such as Fei [83] and Shi [84] assume that small towns, instead of large cities, are capable of containing a larger population during the process of urbanization. This opinion was accepted by many professional planners in China because large cities have brought about many problems, such as traffic congestion, environmental pollution and inadequate welfare facilities. By contrast, Lu Lu argues that developing big cities to accommodate a growing population is better for the future of urban China [85]. Although the Planning Document of New Path for Urbanization in China (2014-2020) has stated that the development of large metropolises, medium-sized and small cities will be coordinated in the new era of urbanization, it does not make a clear choice between small cities and large cities.
This empirical study indicates that cities with a larger population are more likely to avoid becoming 'ghost cities', whereas smaller cities face the challenge of inefficient construction land. This means that most small cities in the PRD have difficulty attracting a sufficiently large population. Despite this, smaller cities also derive greater marginal benefits from improving land use efficiency, according to the coefficients of the regression models in Tables 2 and 3, in which both values of B for Ln (Scale_pop) are less than one.
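The link between a coefficient below one and larger marginal benefits for smaller cities can be illustrated as follows. This sketch assumes a log-log specification, in which B acts as an elasticity; the text does not confirm the exact functional form, and the value of B used here is hypothetical rather than taken from Tables 2 or 3.

```python
def marginal_gain(pop, b, scale=1.0):
    """dY/dPop for the assumed log-log form Y = scale * pop ** b."""
    return scale * b * pop ** (b - 1)


B = 0.5  # hypothetical elasticity, NOT a coefficient from Tables 2 or 3

# With b < 1 the exponent (b - 1) is negative, so the gain from one
# additional resident shrinks as the population grows.
for pop in (1e5, 1e6, 1e7):
    print(f"{pop:>12,.0f}  {marginal_gain(pop, B):.6f}")
```

Under this assumption, the same absolute increase in population yields a larger increase in the dependent variable for a small unit than for a large one, which matches the interpretation given above.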
It can be seen that the newly defined index, Compact_d/a, fits the regression analysis better when used to measure dispersed urbanized areas in the PRD. For a city region consisting of many isolated construction areas, the decrease in an urban perimeter does not imply a decrease in the complexity of the boundary [43] and the morphology of the urban boundary may not totally reflect the built-up fabric [81]. Longer commuting distances in one city within a given area may harm human activity within construction land because of inconvenient passenger transport. Thus, the index of Compact_d/a reflects the dispersed built-up areas in the PRD more accurately.
We also found that state power influences the activity density in construction land, which supports the hypothesis presented by Zhang et al. [86] that hierarchical administrative governments have an important impact on urban growth. Specifically, administrative hierarchy has more of an impact on the density of human activity in the built environment than the number of administrative units. Tian and Shen [16] also pointed out that the degree of government control is one of the factors affecting the implementation of master plans. However, administrative hierarchy has no obvious influence on the diversity of human activity in the PRD.
It should be noted that there are many aspects of urban models that could be further explored, particularly their coupling and convergence properties [87,88]. In other words, other factors could also have an impact on density. For example, some people believe that it is the city of Dayawan's proximity to a nuclear power plant that resulted in the 'ghost town' in the new district of Huidong. Though this opinion is not widely held, it does show us that there are some special dynamics in the activities of built-up areas within China that need to be considered. With this in mind, the regression results can only propose some general factors of influence on urban activity, while the possible existence of these special dynamics must also be kept in mind.

Conclusion and Discussion
This study adopts the method of data mining to obtain open data and attempts to answer the question of how the density and diversity of human activity in the urbanized areas of the PRD can be explained. We find that a low density and diversity of human activity usually emerges in peripheral areas like Tsuihang, Huidong, Longmen, and Enping. In addition, the real estate bubble and the large-scale expansion of construction land have clearly created a growing number of problems in these areas.
The contribution we make in this study lies mainly in two aspects:

1) This study successfully processes three kinds of data (i.e., the data of residential, business and recreational activities) to evaluate patterns of human behavior and identify ghost towns. This offers a more solid and innovative angle to determine the existence and features of ghost towns than research that explores this topic on a micro-level and solely relies on residential activities, such as in Chi et al. [8], Jin et al. [11] and Zheng et al. [12];
2) In this study, we explicitly analyzed how three core factors (i.e., urban scale, compactness of urban morphology and administrative hierarchy) affect the diversity and density of human activity in the PRD. This helps to further the understanding of ghost towns because Woodworth and Wallace [4], Shepard [9] and Sorace and Hurst [10] are in agreement that there is not yet any consensus in the definition and factors leading to this phenomenon. Furthermore, our analysis provides a valuable reference for cities worldwide in terms of considering how to avoid ghost town development and ensure urban sustainability.
When it comes to innovative insights, this study provides two levels of implications for policymaking regarding land and data usage. The first level of implications focuses on policy suggestions relevant to China. First of all, the urban scale does have an impact on activity density and diversity in the PRD, leading us to present some policy implications. It should be noted that congestion tolling also has a similar impact on regulating urban density, since driving and location are equal [22]. This means that, while planning to avoid excessive traffic congestion, the optimum population size for megacities such as Guangzhou and Shenzhen should also be considered. Secondly, the compactness of the urban morphology should also be brought forth as an important strategy in planning policies. The sprawl of construction land in the PRD is gradually threatening its polycentric urban structure. Meanwhile, the tendency toward decentralized and fragmented land use is even more severe, as the PRD encounters the problem of land use regulations for its collectively owned land, which promotes inefficient sprawl [50]. Finally, administrative hierarchy is another important factor that affects the activity density of construction land in the PRD. It is possible that the issue of 'ghost towns' in Tsuihang or Huidong New District might be easier to resolve if the provincial government of Guangdong province were relocated there, but there is a very limited chance that high administrative levels can be 'supplied' for so many of these kinds of cities, and it is impossible to change the existing Chinese political system. Moreover, administrative hierarchy has no obvious influence on the diversity of human activity in the PRD.
The second level of implications revolves around policy recommendations that may be considered internationally. Moreno and Blanco [89] have pointed out that failed urban planning contributes to the emergence of unsustainable cities around the world. For example, in Latin America, unsuccessful planning stimulated mismatches between human activity, housing and infrastructure, and then resulted in segregation and unsustainable interaction patterns. In other words, this mismatch is partly the result of dispersion between human activity, construction land and transport. To overcome this deficiency, our analysis in Section 5.3 pointed out that improving the compactness of urban morphology is a critical step. Thus, we suggest that the compactness of urban morphology should be included in urban planning strategies all over the world. In doing so, citizens will be able to access necessary infrastructure within the areas where they live, as a result of which the diversity and density of human activity, and thus sustainable urban development, are promoted.
This study has several limitations that future research could address. Firstly, although we reviewed various relevant studies and multiple existing methods at the beginning of the article to ensure rationality and effectiveness, the factor selection of human activity in the built environment still suffers from incompleteness in establishing possible correlations. Secondly, our findings are particularly relevant to China and rely on the Chinese context. Future studies should examine how they apply elsewhere in the world. Thirdly, this study mainly used open source data, which may be vulnerable to selection bias. Some citizens, such as elderly people or users unfamiliar with social media, may have been overlooked during our data collection. To overcome these limitations, future research could extend the scope of the literature review to determine a more comprehensive relevant index system. Meanwhile, examining the overall demographics of the open data source and then attempting to control selection biases through statistical modelling is a topic that requires continuous discussion [90].