Evaluation and Planning of Urban Green Space Distribution Based on Mobile Phone Data and Two-Step Floating Catchment Area Method

Abstract: Urban green space is closely related to the quality of life of residents. However, the traditional approach to its planning often fails to address its actual service capacity and users' demand. In this study, mobile phone location data are used to identify more specific features of the spatial distribution of urban residents, and the population distribution is mapped onto traffic analysis zones. On this basis, the two-step floating catchment area method (2SFCA) is adopted in combination with urban green space planning to evaluate the per capita area of green space and its accessibility in practice. A classification of per capita area and the spatial distribution of green spaces within the study area are then obtained; urban districts that currently have low accessibility to green areas are thus identified and can be treated as key areas for the future planning of green areas. The study concludes that mobile phone data can be used to map the spatial distribution of residents more accurately, while the 2SFCA offers a more comprehensive quantitative measure of the supply of and demand for green spaces. Together, the two can serve as an important basis for decision-making in the planning of urban green spaces. Since urban green space can be regarded as a kind of public facility, the methodology of the present study is also believed to be applicable to studies of other types of urban facilities.


Background
Various international studies point out the important role that urban green space can play in the interaction between humans and the environment [1]. Green spaces are key elements of the urban landscape and of urban sustainability [2]. Scholars have affirmed the value of urban green spaces in terms of health [3], economic [4], social [5], and climatic benefits [6,7]. With the continued increase of urbanization, it is projected that 90% of the world's population will be living in cities by the end of the 21st century [8]. Therefore, the reasonable planning of green spaces will have a direct bearing on the quality of life of urban residents [9].
An extensive research literature exists on the topic of urban green spaces [10,11]. Some studies are based on on-site surveys and questionnaires, but in general they increasingly involve the application of GIS technologies [12,13]. The spatial analysis component of GIS can facilitate the evaluation of an urban green space system from multiple perspectives, such as accessibility, disaster prevention and risk control, and attractiveness, of which accessibility is regarded as one of the most important criteria [14,15,16]. However, most accessibility evaluations are based on simple buffer analysis of green spaces or network analysis of roads. Most of these methods take into consideration neither the number of users nor the area and service capacity of green spaces [17].
To address such problems, Wang Fahui et al. put forward an optimized 2SFCA method to study public facility planning from both the supply and demand sides [18,19]. Since green space can be deemed as a kind of public facility, goods, or service, the approach is also applicable for the planning of urban green spaces. In the application of the 2SFCA method, some key data inputs include the number of urban residents that a public facility serves, and also the accurate spatial distribution of urban residents. Traditionally, these data are acquired through social surveys and lack timeliness and precision.
In recent years, with the rapid development of computer and information technology, it has become possible to use urban big data to analyze the spatial behavior of urban residents and the mechanisms behind urban functioning. According to official reports released by the Ministry of Industry and Information Technology, the number of mobile phone (cell phone) users in China had reached 1,306,000,000 by December 2015, which means that there are 95.5 cell phones per 100 persons on average. At present, mobile devices such as cell phones, together with the various location-based services (LBS) provided by apps installed on smart devices, have become a source of urban data with high availability and practical value. In fact, mobile phone data have long been used to study the behavior of urban residents, from the temporal and spatial analysis of human behavior based on a small sample group of 30 people by Ahas [20] to studies by Ratti and Reade et al. that cover large numbers of residents in Milan and Rome [21,22]. These studies advanced not only in data size but also in visualization and representation. In recent years, mobile data have been used to identify the commuting of residents and, further, the functional zones of the city [23,24,25,26]. Comparison with actual census data has shown the results of such analysis to be relatively accurate, with evidently improved data size and precision. In general, various studies have shown that mobile phone data can, to a certain extent, reflect the temporal and spatial distribution of urban residents and their movement trajectories, which in turn makes it possible to acquire data on the distribution of urban residents with high precision.

Background of Case Study
The case in the present study is Wuhan, a city in central China, where the local government is preparing a plan for micro green spaces in the Wuchang district. Micro green space falls into the category of green space and can be regarded as a type of public facility or public good and service. As urban planning in China moves from incremental planning toward inventory planning after several decades of rapid urban development, the possibility of building new large-scale public space in a highly dense urban central area becomes increasingly small [27]. Micro green space has therefore gained importance because of its small size and flexible presence: it can be distributed extensively despite the pressure on land resources in cities. Additionally, micro green spaces are close to the daily lives of urban residents, as they are usually accessible on foot and are used more frequently than larger urban parks. The present study attempts to use mobile phone data to identify the spatial distribution of urban residents and thus the actual demand for green spaces. On this basis, the study analyzes the problems of accessibility in relation to the spatial distribution of existing urban green spaces, so as to support the planning of new micro green spaces.

Methodology
The workflow of the study is presented in Figure 1. The major steps are: mapping the spatial distribution of the population using mobile phone call data; acquiring the distribution of urban green spaces from existing urban planning; and, subsequently, evaluating the accessibility of green spaces using the 2SFCA method.

Two-Step Floating Catchment Area Method
Accessibility has been regarded as one important factor in evaluating the rationality of public facility distribution [28]. The conventional approach to measuring accessibility is usually the supply-demand ratio method. However, this approach has at least two weaknesses. First, spatial differences at the more microscopic level inside a region are neglected. Second, the mobility between various areas is ignored [19].
The basic rule of 2SFCA is that the supply points and the demand points are used in turn as the centers of floating catchment searches. In the first search, each public facility (j) is used as the center point, all settlements (k) within the threshold time (d_0) are searched, and the ratio (R_j) between the service capacity of the facility and the population served within the corresponding area is calculated. In the second search, each settlement (i) is used as the center point, all public facilities within the threshold time (d_0) are searched, and the ratios (R_j) of these facilities are summed to obtain the service capacity available (the accessibility) at point (i). The above rule is described in Equation (1), where S stands for supply and P for demand. An example is given in Figure 2.
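Written out with the symbols defined above, the two searches can be combined into the form commonly given for the 2SFCA method in the literature. The following is a sketch under that assumption: the accessibility symbol T_i^F is borrowed from the evaluation section of this paper, and d_{ij} (the travel time or distance between settlement i and facility j) is notation introduced here for illustration.

```latex
\begin{equation*}
T_i^F \;=\; \sum_{j \in \{\, d_{ij} \le d_0 \,\}} R_j
      \;=\; \sum_{j \in \{\, d_{ij} \le d_0 \,\}}
            \frac{S_j}{\sum_{k \in \{\, d_{kj} \le d_0 \,\}} P_k}
\end{equation*}
```

The inner sum is the first search (population served by facility j) and the outer sum is the second search (facilities reachable from settlement i).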
Although it is easy to use, the above method has some weaknesses. In particular, it considers only a single threshold distance: all areas beyond the threshold are disregarded, and a uniform influence is applied within it. To address this problem, researchers put forward an improved 2SFCA method that assigns different weights to different travel times, so as to reflect indirectly the gradual attenuation of service with distance [29].
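The two searches, and the effect of adding weights, can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names, the travel-time matrix, the sample figures, and in particular the linear decay function are assumptions chosen for illustration (the improved method in [29] may use a different weighting scheme).

```python
def two_step_fca(supply, demand, dist, d0, weight=None):
    """Accessibility score for every demand point.

    supply[j]  : service capacity S_j of facility j
    demand[k]  : population P_k of settlement k
    dist[k][j] : travel time (or distance) from settlement k to facility j
    d0         : catchment threshold
    weight     : optional function of travel time giving a weight in [0, 1];
                 None reproduces the basic (binary) 2SFCA
    """
    if weight is None:
        weight = lambda d: 1.0

    def w(d):
        # Nothing beyond the threshold is counted.
        return weight(d) if d <= d0 else 0.0

    # Step 1: supply-to-demand ratio R_j of every facility.
    ratios = []
    for j, s_j in enumerate(supply):
        served = sum(p_k * w(dist[k][j]) for k, p_k in enumerate(demand))
        ratios.append(s_j / served if served > 0 else 0.0)

    # Step 2: sum the ratios of the facilities reachable from each settlement.
    return [
        sum(r_j * w(dist[i][j]) for j, r_j in enumerate(ratios))
        for i in range(len(demand))
    ]


def linear_decay(d0):
    """Hypothetical weight: 1 at the facility, falling linearly to 0 at d0."""
    return lambda d: max(0.0, 1.0 - d / d0)


# Illustrative data only: two facilities (capacity in m^2), three settlements.
supply = [10000.0, 4000.0]
demand = [1000, 3000, 500]
dist = [[5, 20], [10, 8], [25, 12]]  # minutes

access_basic = two_step_fca(supply, demand, dist, d0=15.0)
access_weighted = two_step_fca(supply, demand, dist, d0=15.0,
                               weight=linear_decay(15.0))
print(access_basic)
print(access_weighted)
```

In both variants the population-weighted sum of the scores equals the total supply, provided every facility serves at least one settlement; the weights only change how that supply is shared among settlements.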
In 2SFCA, the two most important variables are the service capability of public facilities (total supply), which is based on their spatial distribution, and the population distribution (total demand), which is determined by resident settlements. In the evaluation of the accessibility of an urban green space, total demand is defined as the number of users in the catchment area, while total supply is defined as the area of the green space. The planning of an urban green space system, regarded as a means of providing a public facility or public service, should take into consideration its equity in serving users. From a macro perspective, it should ensure that all residents of the city have access to its services; from the service supply perspective, every resident within the catchment area of a green space is entitled to its service. Based on this principle, when evaluating the distribution of urban green spaces, the whole population covered by the catchment area of an urban green space, and not only its frequent users, can be regarded as its users. For this reason, the present study uses the spatial distribution of all the residents in the study area without further categorizing them into specific age or employment groups.

Population Distribution Estimation Based on Mobile Phone Location Data
The phone call records used in the present study were acquired from a partner telecommunication operator whose market share is about 60%. Mobile phone data of 7,300,000 users in Wuhan, China, for November 2015 are used. The data were pre-processed to eliminate all privacy-related information. The basic format is a multi-field table tagged with the user ID. Data from the busiest base stations during the month are categorized into three time periods, namely work-time (7 a.m. to 7 p.m., Monday through Friday), off-time (7 p.m. to 7 a.m., Monday through Friday, plus Saturdays and Sundays), and all-time. Through the user ID, temporal and spatial locations are associated with base stations; based on the geographic location of the base stations, the user density at a specific time in a specific urban area can then be analyzed. Because base stations are densely distributed in central urban areas with dense population, especially in megacities such as the case study city, the error margin of a user's location can be contained within several hundred meters. Subsequently, the spatial distribution of base stations is established in ArcGIS, and a field table is produced that identifies each base station by its ID. The user statistics associated with each base station ID are expressed as the most common number of users at work-time or off-time; these user numbers are then assigned to each base station according to its ID, which yields the spatial mapping of user numbers at each base station for the various time periods (Figure 3). The data collected are validated against the population data of the administrative districts of Wuhan. First, base stations are grouped into 13 districts according to their geographic location, and the user numbers of each district in the three time periods are then calculated.
Because the telecom operator's data may not cover all residents, the proportion of users in each administrative district relative to all users is used for comparison and validation; the result is presented in Figure 4. In general, the proportions from the three periods are basically consistent and are strongly and positively related to the population of each administrative district, although they deviate to different degrees within each district. The most evident geographic feature of Wuhan is that the city is divided into three areas (the three towns) by two rivers; if the base stations are grouped according to the three towns and then used for comparison, the deviation is further narrowed (Table 1). It can also be seen in Figure 4 that, when the proportional data of the three towns are overlapped, two straight lines evidently exist, which is in line with the division of the three towns.
The analysis shows that the correspondence of population proportions becomes stronger, and can be regarded as essentially consistent, when the division into three large areas is adopted and the influence of geographical factors is thus taken into account. Compared with data sources such as population census statistics, mobile phone location data hold evident advantages in statistical accuracy and timeliness. Thus, mobile phone location data can reveal the spatial and temporal distribution of urban residents at macro and meso levels within the administrative districts of a city. Finally, considering that off-time is usually the period with the most users and the most intensive use of green space, the population distribution at off-time was selected as the data input for the evaluation of green space accessibility in the present study.
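The validation steps described above (splitting records into time periods, counting users per district, and comparing district shares with census shares) can be sketched in Python as follows. This is an illustration only: the record layout, the district labels, the census figures, and the use of a Pearson correlation as the measure of agreement are all assumptions, not details taken from the study.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def time_period(ts):
    """Work-time is 07:00-19:00 on Monday-Friday; everything else is off-time."""
    on_weekday = ts.weekday() < 5  # Monday is 0
    return "work" if on_weekday and 7 <= ts.hour < 19 else "off"


def users_by_district(records, period):
    """Count distinct users per district for 'work', 'off' or 'all'."""
    users = defaultdict(set)
    for user_id, district, ts in records:
        if period == "all" or time_period(ts) == period:
            users[district].add(user_id)
    return {d: len(ids) for d, ids in users.items()}


def shares(counts):
    """Convert absolute counts into proportions of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical records: (user ID, district of the base station, timestamp).
records = [
    ("u1", "A", datetime(2015, 11, 2, 21, 0)),
    ("u1", "A", datetime(2015, 11, 3, 23, 0)),
    ("u2", "A", datetime(2015, 11, 2, 9, 0)),
    ("u3", "B", datetime(2015, 11, 7, 10, 0)),
    ("u4", "B", datetime(2015, 11, 4, 22, 0)),
    ("u5", "C", datetime(2015, 11, 8, 15, 0)),
]
census = {"A": 300000, "B": 500000, "C": 200000}  # hypothetical figures

off_shares = shares(users_by_district(records, "off"))
census_shares = shares(census)
districts = sorted(census)
r = pearson([off_shares[d] for d in districts],
            [census_shares[d] for d in districts])
print(off_shares, round(r, 3))
```

Comparing shares rather than absolute counts is what makes the check independent of the operator's market share, as the text notes.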

Evaluation of Green Space Accessibility Based on 2SFCA
One objective of this study is to evaluate the accessibility of green spaces within the research area. Therefore, existing green spaces are treated as the supply points of public facilities, and resident settlements in the city are treated as demand points.
First, the research area is divided into several traffic analysis zones according to the urban road network (see Figure 5), and telecom base stations are assigned to these traffic analysis zones. User data assigned to base stations during off-time are used to identify the population distribution. As for service capacity, the green spaces in the master plan of Wuhan city were selected as the research objects (see Figure 4), and their respective sizes were used as the indicator of service capacity. In the 2SFCA calculation, different search radii were assigned to green spaces in accordance with their sizes; in other words, the larger a green space is, the more traffic analysis zones it covers. Green spaces were divided into three categories, namely city-level, district-level, and small-sized green spaces, with service search radii of 2000 m, 1000 m, and 500 m, respectively.
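The size-dependent search radii can be sketched in Python as below. The three radii come from the text; everything else is an assumption made for illustration, including the data layout, the sample coordinates, the use of straight-line distance between centroids, and the choice to apply each green space's own radius in both searches.

```python
from math import hypot

# Search radii from the text, keyed by green space category.
SEARCH_RADIUS_M = {"city": 2000.0, "district": 1000.0, "small": 500.0}


def variable_radius_2sfca(green_spaces, zones):
    """2SFCA in which every green space has its own catchment radius.

    green_spaces: dicts with 'level', 'area_m2', 'x', 'y' (centroid, metres)
    zones       : dicts with 'population', 'x', 'y' (centroid, metres)
    Returns the green area per person (m^2) available to each zone.
    """
    # Zones whose centroid lies inside the catchment of each green space.
    reach = []
    for g in green_spaces:
        radius = SEARCH_RADIUS_M[g["level"]]
        reach.append([
            i for i, z in enumerate(zones)
            if hypot(z["x"] - g["x"], z["y"] - g["y"]) <= radius
        ])

    # Step 1: area of each green space divided by the population it serves.
    ratios = []
    for g, covered in zip(green_spaces, reach):
        served = sum(zones[i]["population"] for i in covered)
        ratios.append(g["area_m2"] / served if served > 0 else 0.0)

    # Step 2: every zone collects the ratios of the green spaces covering it.
    access = [0.0] * len(zones)
    for r_j, covered in zip(ratios, reach):
        for i in covered:
            access[i] += r_j
    return access


# Illustrative layout only.
green_spaces = [
    {"level": "city", "area_m2": 200000.0, "x": 0.0, "y": 0.0},
    {"level": "small", "area_m2": 3000.0, "x": 1500.0, "y": 0.0},
]
zones = [
    {"population": 20000, "x": 300.0, "y": 0.0},
    {"population": 10000, "x": 1400.0, "y": 0.0},
    {"population": 5000, "x": 2600.0, "y": 0.0},
]
access = variable_radius_2sfca(green_spaces, zones)
print(access)
```

In this layout the third zone lies outside both catchments and therefore receives a score of zero, which is the kind of gap the evaluation is meant to expose.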
As for the specific 2SFCA procedure, in the first step each green space is defined as a supply point; using the simple buffer tools of GIS, the service area (C_0) within its service radius (d_0) is defined. Demand points (k) are then searched within the service area, all communities within the service range are retrieved, and their total population is calculated. The total area of the green space is divided by this total population to obtain the supply-demand ratio (R_j), i.e., the service capability of the green space. In the second step, for every demand point (i), searches corresponding to the service radii (d_0) of all the green spaces are performed, and the total area of the green spaces (j) covered by these searches is retrieved. This area is then divided by the total community population to obtain T_i^F, which stands for the spatial accessibility of green space resources at this particular community, i.e., the green space available to this demand point.
Although population size has been spatially related to base stations, the population distribution data are processed a second time, because the service radius of a base station may be larger than a traffic analysis zone, or a base station may be located on the edge of a zone, either of which may lead to considerable error when the zones are used directly for population calculations. Specifically, base station positions are processed with spatial analysis tools in ArcGIS to generate Thiessen polygons, and the population per unit area within the Thiessen polygon of each base station is calculated. Then, the sum of the population in the polygons within each traffic analysis zone is obtained. The calculation process is presented in Figure 6. At this point, the population size acquired is still the subscriber number of this telecom operator. A final conversion is made, using the ratio between the subscriber number and the total statistical population, to obtain a population distribution closer to the actual situation.
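The reallocation of base station users to traffic analysis zones, followed by the final conversion to population, can be sketched in Python as follows. The sketch assumes that the Thiessen polygons and their intersections with the zones have already been computed in a GIS; the function name, identifiers, and figures are hypothetical, and users are assumed to be spread evenly within each polygon.

```python
from collections import defaultdict


def zone_population(station_users, polygon_area, overlaps, total_population):
    """Area-weighted allocation of subscribers to zones, scaled to population.

    station_users   : {station_id: subscribers observed at the station}
    polygon_area    : {station_id: area of the station's Thiessen polygon}
    overlaps        : {(station_id, zone_id): area of polygon inside the zone}
    total_population: statistical population of the whole study area
    """
    # Subscribers per unit area in every Thiessen polygon.
    density = {s: station_users[s] / polygon_area[s] for s in station_users}

    # Each zone receives the subscribers of the polygon parts it contains.
    zone_subscribers = defaultdict(float)
    for (station_id, zone_id), area in overlaps.items():
        zone_subscribers[zone_id] += density[station_id] * area

    # Final conversion from subscribers to population.
    scale = total_population / sum(zone_subscribers.values())
    return {z: n * scale for z, n in zone_subscribers.items()}


# Illustrative figures only (areas in km^2).
station_users = {"S1": 6000, "S2": 3000}
polygon_area = {"S1": 2.0, "S2": 1.0}
overlaps = {("S1", "Z1"): 1.5, ("S1", "Z2"): 0.5, ("S2", "Z2"): 1.0}

population = zone_population(station_users, polygon_area, overlaps,
                             total_population=15000)
print(population)
```

A station on the edge of a zone, such as S1 here, contributes to both neighboring zones in proportion to the polygon area falling in each, which is the error the second processing step is intended to reduce.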

Discussion
The final per capita area of green space obtained through the above method is given in Figure 7. In the diagram, the per capita area of green space decreases from dark blue to dark red, as represented by the colors of the legend. According to the master plan of Wuhan, the lower limit of the per capita area of green space is 5 m² per person. In terms of the actual supply-demand relation of green space, nearly half of the research area fails to satisfy this lower limit. Given the objective of the case study, the local areas with the largest number of users and the smallest area of green space can be identified as the areas with the most urgent demand for micro green spaces. From Figure 7, it can be clearly seen that the three types of areas colored from orange to dark red (see legend) are the key areas where micro green spaces should be added. This result provides a credible quantitative basis for the follow-up surveys, planning, and design practices of the project team.
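Screening zones against the 5 m² lower limit can be sketched in Python as below. The threshold is taken from the text; the zone identifiers and figures are hypothetical, and ranking by the total missing green area (shortfall per person times population) is one possible way, assumed here, of combining "most users" with "least green space".

```python
LOWER_LIMIT_M2 = 5.0  # lower limit of per capita green space from the master plan


def priority_zones(per_capita, population, limit=LOWER_LIMIT_M2):
    """Zones below the limit, ranked by total missing green area (m^2)."""
    shortfall = {
        z: (limit - per_capita[z]) * population[z]
        for z in per_capita
        if per_capita[z] < limit
    }
    return sorted(shortfall.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only.
per_capita = {"Z1": 1.2, "Z2": 6.5, "Z3": 4.0, "Z4": 0.0}
population = {"Z1": 8000, "Z2": 12000, "Z3": 20000, "Z4": 1500}

ranked = priority_zones(per_capita, population)
share_below = len(ranked) / len(per_capita)
print(ranked, share_below)
```

A sparsely populated zone with no green space at all (Z4 in this example) ranks below a dense zone with some green space, which reflects the emphasis on the number of users affected.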

The traditional approach to urban green space planning is mostly based on a per capita index, which often fails to guide the spatial distribution of green spaces effectively [32]. Moreover, site selection for green spaces is usually based on service radius without consideration of the actual population distribution. By combining the 2SFCA method with the actual population distribution (acquired through mobile phone location data), the city's green space plan, and traffic analysis zones based on urban roads, the present study analyzes and evaluates per capita green space in an attempt to evaluate the actual service capacity of green spaces on the basis of their sizes. The authors believe the methodology of the present study (Figure 1) can be applied to the accessibility evaluation of other public facilities and used as a basis for site selection in their planning. When it is applied to other kinds of facilities, the variables used to calculate service capacity may differ. For instance, the service capacities of hospitals or schools cannot simply be represented by land area, but rather by the number of beds, the floor area of buildings, etc.

Conclusions
In general, the present study represents an effort to offer a more comprehensive and quantitative basis for urban planning. Because the urban planning discipline has been criticized as unscientific for relying too heavily on qualitative analysis, researchers have begun to turn to quantitative analysis [33]. In the transition from qualitative to quantitative studies, a crucial problem is how to obtain and use data. For this reason, big data and smart cities have attracted extensive attention from researchers in disciplines ranging from urban planning to urban geography. An era of massive data is coming or has already arrived.
As data become easier to acquire, how to use them in research and design becomes the more urgent question. The present study is a step in this direction.
Specifically, the authors believe that the study realizes progress in the following two aspects: first, the application of the 2SFCA method in performing the actual supply-demand analysis of urban public green spaces; second, the application in urban planning of residents' spatial distribution obtained through mobile phone location data analysis.
The study demonstrates that combining valid data with model-based quantitative analysis can further improve the rationality of early-stage analysis before urban planning and also provide directions for subsequent design practices. Mobile phone data and the 2SFCA method offer perspectives from the supply and demand side, respectively, in evaluating the differences in the accessibility of green spaces. When data are available, the approach can also be applied in other big cities and to the accessibility evaluation of other types of public facilities whose service capacities are measured by the size of the population that they serve.
One limitation of the present study is that the catchment area of a green space is calculated using a simple centroid range approach. In future studies, analysis based on the urban road network will be conducted, in which residents' actual paths to green spaces are used to measure their travel distances. Further analysis of mobile phone data is also expected to evaluate the actual use of green spaces and the characteristics of their users, so as to provide more specific references for decision-making in planning practice.