Measuring the Degree of Balance between Urban and Tourism Development: An Analytical Approach Using Cellular Data

Abstract: This study presents an analytical approach for measuring the degree of balance between urban and tourism development, which has previously been analyzed only qualitatively and was difficult to measure. With the help of 1012 million cellular data records generated by 20 million users in two weeks, we tracked the behavior of residents, commuters, and tourists at a set of historical conservation areas in central Shanghai. We calculated the degree of balance and visualized it via ternary graphs. Moreover, the relationships between key urban features derived from multi-sourced urban data and the degrees of balance of tourism development were analyzed via multinomial logistic analysis. Insights gained from this analysis will help to achieve a more scientific decision-making process toward balanced urban development for historical conservation areas. The achievements of this study contribute to the development of human-centered planning by providing continuous measurements of an "unmeasurable" quality.
Author Contributions: Conceptualization, C.S. and Y.Y.; methodology, Y.Y.; software, M.L.; validation, M.L., Y.Y. and C.S.; formal analysis, M.L.; investigation, M.L.; resources, C.S.; data curation, C.S.; writing—original draft preparation, M.L.; writing—review and editing, Y.Y.; visualization, M.L.; supervision, Y.Y.; project administration, Y.Y.;


Introduction
Tourism has been proven to be a key contributor to economic growth in many countries [1,2], especially in developing or newly developed countries with a rising population and governmental economic goals. The development of tourism can motivate the improvement of the city image, including the rehabilitation of cultural heritage and the natural environment. In addition, socio-economic aspects such as employment opportunities [3], recreation, and cultural activities [4] are promoted as well. However, tourism can bring adverse effects through congestion and cultural distance problems between tourists and locals [5,6]. Inefficient land use or urban planning can also be a result of over-proliferated tourism [7]. Therefore, balanced urban tourism, which mitigates those conflicts and enhances the benefits to the locality, is considered the ideal vision for future urban tourism development and is an enduring topic in urban and tourism development research. The common goals of economic, sociocultural, and resource sustainability and cultural heritage protection should be shared by different stakeholders, including the community, the tourism industry, and local residents [8]. Considering that the development of tourism is a dynamic process between supply and demand, the exploitation of tourism facilities may directly activate tourism demand and trigger potential influences on related aspects of the local area [9]. Therefore, to guarantee balanced development, especially for areas with abundant tourism resources, urban and tourism management experts need to evaluate the dispersal of tourists and local residents appropriately.
In response to this demand, Global Positioning System (GPS) tracking devices have been applied to acquire the positioning records of tourists and local residents [10,11]. This tracking technique has revealed many new insights. Nevertheless, distributing tracking devices is a time-consuming and inefficient process, and it is difficult to collect large samples at the city scale. Therefore, cellular data, which can record the spatiotemporal distribution of human behavior, might bring new research potential for measuring the degree of balance between urban and tourism development.
In this context, this paper attempts to develop an analytical approach, supported by multi-sourced urban data, that enables urban planners and tourism managers to evaluate the degree of "balance," especially for areas with tourism resources, and thus supports a more evidence-based decision-making process towards balanced urban development. To test the effectiveness of introducing new urban data, the approach is applied to a set of research sites. The paper is organized into six sections. After the introduction, Section 2 introduces background research, and Section 3 explains the methodology for processing cellular and other urban data. Section 4 describes the results on balancing tourism and different categories of urban issues. Then, in Section 5, the results are discussed in order to answer the research questions. Suggestions for balanced urban development for historical conservation areas, limitations, and future research are also mentioned.

Measuring the Degree of Balance between Urban Development and Tourism Planning: Previous Attempts and Difficulties
To monitor the impact of tourism on local communities, Faulkner and Tideswell [12] proposed a conceptual framework that includes the ratio of tourists to local residents as an important indicator. A higher value of this ratio indicates a higher likelihood of disturbing local residents, regardless of the actions of tourists. This concept mainly focuses on tourists in relation to local residents and has also been adopted in many other studies [13].
Nevertheless, it is hard to obtain an overall picture of the distribution and behavioral characteristics of tourists from which to derive an indicator of balanced development, although direct and indirect explorations have been made. Cros [14] inspected tourism congestion through interviews. Kearsley and Coughlan [15] attempted to clarify the mechanisms of tourism behavior through questionnaire surveys. These two approaches obtain samples that are too limited to draw a convincing conclusion. The majority of researchers use the conventional official statistics on EU tourism provided by Eurostat [16] and the World Tourism Organization on a yearly basis, which offer much broader sample coverage but limited spatiotemporal detail.
Moreover, the tourist-resident ratio has the limitation of excluding other stakeholders. As Waligo, Clarke, and Hawkins [17] revealed, sustainable tourism needs to consider more stakeholders from industries, the government, etc. Moyle et al. [18] also argued that the concept of tourism impact could be discussed in various scales, referring to the long-term or short-term, and individual or accumulative influences of continuous interactions between tourists and local communities of tourist destinations, or local enterprises. Hence, there are other groups which could be involved in this research on balanced tourism, such as commuters who play an important role in supporting both tourism activities and local residents. Further explorations in this direction are still needed.

New Research Potentials Generated from Emerging Multi-Sourced Urban Data
Batty [19] emphasizes the transition from understanding cities as a state to understanding them as a multidimensional system. It is essential to capture data on tourists, with all their geographic and temporal variations, in a more efficient way [20]. Emerging data accompanying the improvement of information and communication technologies have opened up new research potential that could not be achieved with conventional approaches [21,22]. Specifically, social media data can be used to generate international destination patterns [23]; web search engines can forecast travel demand [24]; and GPS data can provide high-resolution spatial and temporal data [25]. Edwards and Griffin [26] used GPS devices to track 154 participants and see how tourists moved within Melbourne and Sydney. Li et al. [11] applied GPS data to detect the spatial and temporal distribution of tourists as an indicator of balanced tourism development.
Sustainability 2021, 13, 9598
Among these new urban data, cellular data may play an important role, as they enable a graphic representation of the intensity of urban activities and their evolution through space and time [27]. The advantage of cellular data in identifying tourists' patterns has been noted, as such data can provide broad geographic and temporal coverage of behavior patterns at fine-grained resolution [10,28,29]. Benefiting from a high cell-phone penetration rate, cellular data have a large sampling rate that is close to a full representation of the whole population. The unique random phone ID enables researchers to recognize different groups through their particular patterns of trip frequency and origin-destination [30], which makes it possible to distinguish local residents from temporary populations such as commuters and visitors [31]. In this context, cellular data have been applied to optimize tourism development by identifying tourists' group movement patterns [32] and measuring tourists' spatiotemporal preferences in destination visiting [33]. Cellular data have also been applied to identify the effect of tourists' party size on their tourism behavior [34]. In short, the advantages of cellular data have been well recognized, and they also bring research potential for analyzing the behavior of stakeholders other than tourists in order to understand the mechanism of balanced tourism development.

Research Questions and Framework
To promote a more balanced urban development for areas with abundant tourism resources, this study attempts to develop a quantitative analytical approach with the help of cellular data to measure the degree of "balance" and inform better urban management strategies. Our two main research questions are: (1) How can the degree of "balance" in urban areas based on the dispersal of human flow be evaluated? (2) How do different categories of urban features affect balanced urban development?
The analytical framework illustrated in Figure 1 contains four phases. First, cellular data were collected and cleaned while the research sites were set. Meanwhile, the points of interest (PoIs) representing urban facilities and the built environment data recording urban planning indices were crawled via the Baidu Map API. After that, the cellular data and the PoI and built environment data were processed separately. The positioning distribution of residents, commuters, and tourists within the selected research sites was computed from the cellular data of a mobile phone network. Meanwhile, key urban features, including land use and urban morphological factors, were calculated based on the collected PoIs and built environment dataset. Based on this, we were able to visualize the degree of balance between tourism and urban development using ternary graphs. Then, we ran a multinomial logistic regression in order to explain the impact of various urban features on the balanced development of specific research sites.

Research Cases
Shanghai was chosen as an example of a historical city which has undergone fast urbanization. Shanghai is one of the largest and most developed cities in China, and tourism development is one of the main factors promoting its economic growth. The total income of tourism and related services in 2018 was 50.9 billion CNY (Chinese Yuan), accounting for 6.4% of the GDP of Shanghai [35]. Fast tourism development has brought pressure on historical areas. In this context, Shanghai Resources Bureau [36] has published a conservation agenda for twenty historical conservation areas (with twelve of them located in the city center) to promote a more sustainable balance between tourism and urban development.
Twelve urban historical conservation areas in the Shanghai city center were selected for this research, as shown in Table 1, and they are divided into 25 subsets according to current zoning settings and land-use occupancy. Another eight suburban historical conservation areas were also included. Their neighborhood features and spatial distribution are illustrated in Figure 2. Specifically, Laochengxiang and Longhua Road belong to the old Shanghai town center and are famous for ancient temples. The Bund and the Hengshan-Fuxin were British and French concessions, respectively, in the 20th century, and both are successful cases of re-adaptation for commercial use. Jiangwan is a cultural district where historical universities and institutions are located. The others belong to the category of public concession with many garden villas. The eight suburban areas are traditional Chinese water towns. This varied history and land use lead to different morphological features for each historical conservation area. However, they share a common feature in that all of them are attractive to both foreign and domestic tourists.

Cellular Data Processing
The number of positioning points for tourists, commuters, and local residents was calculated from cellular data according to specific spatiotemporal patterns. The cellular database used herein was provided by China Mobile, an operator covering approximately 65% of mobile users in Shanghai [37]. This database consists of call detail records (CDRs) in Shanghai covering 14 days in March 2018. Because the balance between urban and tourism development in this study is a long-term pursuit that does not target peak days, we focus mainly on people's daily lives and how built environment features affect this issue. Therefore, only the degree of balance on weekdays and weekends was considered in the current study. In addition, positioning data recording the position of Internet of Things (IoT) devices had been removed in China Mobile's data preparation process. Because these IoT devices, e.g., shared bikes and electric meters, usually follow quite different movement routes, it is easy to distinguish IoT records from the public's daily behaviors.
Moreover, this study relies on CDRs, as internet signaling data were not available in 2018. Nevertheless, the CDRs include hourly active confirmation data, event data of phone messages, and location area code (LAC) switching data beyond a certain range. These three main sources of CDRs can provide 30-50 location-based service (LBS) records per person per day, which is sufficient for this study.
In total, around 1012 million cellular data records generated by 20 million users in two weeks were provided. The recording intervals were not fixed but were within 30 min. As shown in Figure 3, the median service area of cellular base stations within historical conservation subsets in central Shanghai was 0.03 km², while the median area of historical conservation subsets is 1.04 km², which enabled an acceptable precision in estimating users' locations. The Voronoi diagram in Figure 4 indicates the fine-grained distribution of cellular base stations in central Shanghai.
Anonymous IDs allow recognition of data points generated by the same tourist [30], while the trajectory generated in a certain period of time is aggregated by linking discrete personal roaming data points. Different groups of users were classified via positioning points following the criteria below:
(1) First, we classified the local and non-local IDs at the scale of the whole city of Shanghai. IDs that consistently stayed in Shanghai at night were regarded as residents of Shanghai. IDs that travelled to Shanghai from outside were considered to be non-local visitors.
(2) Then, we identified the residential, working, and recreational places for each resident of Shanghai. The location where one ID stayed between 8:00 p.m. and 8:00 a.m. over a period of more than seven days was marked as the "home" of this ID. Correspondingly, the location where one ID stayed between 8:00 a.m. and 8:00 p.m. for over seven days was identified as the place where this ID works. Apart from the residential and working places, the remaining place where one ID spent the most time was identified as the day's place of "recreation." Figure 4 shows a series of trips and places visited for one local ID. Nanjingxi Road Historical Area could be identified as his or her place of recreation on one weekday.
(3) Based on that, the four kinds of people related to the 33 historical conservation sites, i.e., local residents, local commuters, local tourists, and non-local tourists, were identified. Specifically, the local residents and local commuters are the IDs with residential and working places in these historical conservation sites, respectively. The local tourists are the IDs with recreational places in the sites. The non-local tourists are non-local visitors spending at least three hours in the sites.
(4) With the help of ArcGIS, we were able to identify the number of people within the 33 historical conservation sites. Specifically, the numbers of local residents, local commuters, local tourists, and non-local tourists were derived for each site. According to the proportions of residents, commuters, and tourists, the degree of balance of urban and tourism development among these historical conservation areas can be measured.
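Step (2) of the criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing pipeline used in the study: the record layout `(user_id, timestamp, place_id)`, the function names, the attribution of early-morning records to the previous night, and the use of record counts as a proxy for time spent are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict
from datetime import timedelta

MIN_DAYS = 7  # the text requires "more than seven days"


def anchor_places(records, min_days=MIN_DAYS):
    """Infer a "home" and a "work" place per ID.

    records: iterable of (user_id, datetime, place_id) tuples (assumed layout).
    Night window: 8:00 p.m. to 8:00 a.m.; day window: 8:00 a.m. to 8:00 p.m.
    """
    night_days = defaultdict(set)  # (user, place) -> distinct nights observed
    day_days = defaultdict(set)    # (user, place) -> distinct days observed
    for user, ts, place in records:
        if ts.hour >= 20:
            night_days[(user, place)].add(ts.date())
        elif ts.hour < 8:
            # Assumption: records before 8:00 a.m. belong to the previous night.
            night_days[(user, place)].add(ts.date() - timedelta(days=1))
        else:
            day_days[(user, place)].add(ts.date())

    result = defaultdict(lambda: {"home": None, "work": None})
    for label, table in (("home", night_days), ("work", day_days)):
        best = {}  # user -> (number of days, place)
        for (user, place), days in table.items():
            n = len(days)
            if n > min_days and n > best.get(user, (0, None))[0]:
                best[user] = (n, place)
        for user, (_, place) in best.items():
            result[user][label] = place
    return dict(result)


def daily_recreation(records, anchors):
    """Per (user, date), the most frequently recorded place other than home/work.

    Assumption: the number of records stands in for the time spent at a place.
    """
    counts = defaultdict(Counter)
    for user, ts, place in records:
        known = anchors.get(user, {})
        if place in (known.get("home"), known.get("work")):
            continue
        counts[(user, ts.date())][place] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```

With real CDRs, `place_id` would be a base-station cell or a conservation subset obtained by a spatial join, and the night-time rule from step (1) would first separate residents of Shanghai from non-local visitors.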

Visualizing the Degree of Balance among Different Historical Conservation Areas
The 'balanced' urban development here represents the situation in which a historical conservation area reaches an equilibrium among three groups of people, i.e., local residents, commuters, and tourists. The identification of these three groups is based on cellular data. The ternary graph (Figure 5) is an analytical tool that visualizes the degree of 'balance' by expressing the three groups of people as proportions. It is a barycentric plot of three variables that sum to a constant, and it is widely used to show the composition of systems with three indicators [38]. In a ternary plot, the values of the three variables a, b, and c must sum to a constant, usually represented as 1.0 or 100%. Because the three values cannot vary independently (there are only two degrees of freedom), all combinations of the three variables can be graphed in only two dimensions. In recent years, some urban researchers have adopted the ternary graph as a tool to test equilibrium among three urban functions [39].
In this study, the proportions of the three groups of people ($RH_i$, $RT_i$, $RW_i$) were calculated based on the above cellular data processing results. As shown in Figure 4, the sides of "HOME," "RECREATION," and "WORK" are represented on the $RH_i$, $RT_i$, and $RW_i$ scales, respectively, with each vertex representing a 100% ratio of "HOME," "RECREATION," or "WORK." The data points are plotted according to the calculated proportions.
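To make the two-degrees-of-freedom argument concrete, the following minimal sketch normalizes three head counts into proportions and maps them to a point in an equilateral triangle. The function names and the placement of the vertices (HOME at the origin, RECREATION at (1, 0), WORK at the top) are assumptions for illustration; the published figures may use a different orientation.

```python
import math


def proportions(residents, tourists, commuters):
    """Return (RH, RT, RW), the shares of the three groups, summing to 1."""
    total = residents + tourists + commuters
    if total == 0:
        raise ValueError("no people observed in this subset")
    return residents / total, tourists / total, commuters / total


def ternary_to_xy(rh, rt, rw):
    """Map barycentric shares to 2D coordinates.

    Assumed vertex layout: HOME (0, 0), RECREATION (1, 0),
    WORK (0.5, sqrt(3)/2). The point is the share-weighted average
    of the three vertices, so rh enters only through rh = 1 - rt - rw.
    """
    x = rt + 0.5 * rw
    y = (math.sqrt(3) / 2) * rw
    return x, y
```

A subset with equal shares lands at the centroid of the triangle, (0.5, √3/6), which corresponds to the 1:1:1 composition.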
The threshold ratio defining the balanced degree of historical subsets in this ternary graph is based on the concept of the tourist-to-resident ratio. This ratio indicates travel intensity, with the threshold for balanced tourism development depending on residents' attitude towards tourism [6]. Li, Xie, and Wang's [40] empirical study shows that historical areas with a ratio of tourists to local residents between 1.0 and 1.5 can be regarded as "balanced," where tourists and local residents can benefit from each other. Areas with a ratio above 1.5 tend to be tourism-oriented neighborhoods. Moreover, there is also some research on the job-to-housing ratio. Studies find that a job-to-housing ratio ranging from 0.75 to 1.25 leads to a "balanced" community [41,42]. Based on this, we propose a hypothetical ratio of 1:1:1 (33.3%:33.3%:33.3% in the ternary graph) as the ideal balanced point for the three groups of people. The balanced area based on this ideal point can be visualized as the grey hatched triangle with the buffer zone. In other words, the points within the grey hatched triangle close to the center are identified as subsets with balanced development.
The points located on the periphery of the graph indicate less balanced or unbalanced development areas. In addition, the color of the data points indicates the share of non-local tourists among all tourists in the area ($RT_{i-NL}$). Blue represents a higher ratio of non-local tourists, while yellow represents a lower ratio.
Moreover, areas were labeled as "always unbalanced" if the points are never located within the hatched grey area during weekdays and weekends. Areas were labeled as "sometimes balanced" if they are located in the hatched area for either weekdays or weekends. The historical conservation area which was plotted within the balanced triangle for both weekdays and weekends was classified as being "always balanced."
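The labeling rule can be sketched as follows. The exact geometry of the hatched balanced zone is not given in the text, so the membership test below (every share within a fixed `tolerance` of one third) is a hypothetical stand-in, and both the tolerance value and the function names are assumptions rather than the thresholds used in the study.

```python
def is_balanced(rh, rt, rw, tolerance=0.1):
    """Hypothetical test: all three shares lie within `tolerance` of 1/3."""
    return all(abs(share - 1 / 3) <= tolerance for share in (rh, rt, rw))


def balance_label(weekday, weekend, tolerance=0.1):
    """Combine the weekday and weekend compositions into one of three labels.

    weekday, weekend: (RH, RT, RW) tuples for the same subset.
    """
    hits = [
        is_balanced(*weekday, tolerance=tolerance),
        is_balanced(*weekend, tolerance=tolerance),
    ]
    if all(hits):
        return "always balanced"
    if any(hits):
        return "sometimes balanced"
    return "always unbalanced"
```

For example, a subset that is near 1:1:1 on weekdays but dominated by tourists on weekends would be labeled "sometimes balanced" under this rule.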

Detecting Contributing Features to the "Degree of Balance" through Statistical Analysis
A multinomial logistic regression was constructed to explore the relationship between related urban attributes and degree of balance mapped via the ternary graph. This statistical model was based on the Hedonic price model [43]. Specifically, the degrees of balance are typically regressed against a large set of predictor variables with external factors including urban scale features and internal factors including local facilities and functions based on PoIs and built environment features.

Independent Variables Related to Urban Development
Independent variables were selected according to elements related to urban management (Table 2). The detailed distribution of these variables is shown in Figure A1 in Appendix A. First, urban scale features include the distance to the city center (DISC) and to the Huangpu River (DISR). Renmin Square is taken as the symbol of the city center here. Second, local urban facilities and land-use functions serving both tourism and residences were included. Public transportation is an economical way for tourists to move between destinations. As mentioned by Le-Klaehn and Hall [44], most non-local tourists depend on public transportation, and tourist destinations with effective and accessible public transport networks are more likely to attract tourists. Therefore, the numbers of bus stations (BUS) and metro stations (MET) per square meter for each target subset are included in the model. Moreover, hotels (HOT), which are an important factor in boosting local tourism, were added. In addition, commercial space (COM), public service (PBS), and catering space (CTS) were included, as they can also promote tourism by integrating tourists with local residents through socializing and meeting in these spaces. These supplied services, including urban infrastructure and amenities, determine the lowest realized tourism levels [9]. Third, built environment features related to urban planning management are included in the model, such as mean height (MHEI), area of the district (ADIS), area of buildings (ABUI), and median urban block area (MEUBA). Finally, street accessibility (ACC) measured via space syntax was used as the last predictor variable in the MLR model. Space syntax mainly focuses on detecting how spatial configurations affect behavior within the urban environment.
The measurement of choice is defined as the number of least-angle-change paths between all other links that pass through a given segment [45]:
$$\mathrm{Choice}(p_i) = \sum_{j} \sum_{k} \frac{g_{jk}(p_i)}{g_{jk}}, \quad j < k$$
where $g_{jk}(p_i)$ is the number of shortest paths between nodes $p_j$ and $p_k$ that contain $p_i$, and $g_{jk}$ is the number of all shortest paths between nodes $p_j$ and $p_k$. It reflects the potential of travel for pedestrians or drivers [46]. Specifically, the choice value calculated by the sDNA software represents the accessibility of individual streets. Conversion from streets to historical conservation subsets was then achieved through buffer analysis in GIS.
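The choice measure is a betweenness-type quantity, which the toy sketch below computes by brute force on a small undirected graph. It is not the sDNA implementation: sDNA works on street segments with least-angle-change paths, whereas this example assumes plain hop-count shortest paths, an adjacency-dictionary input, and hypothetical function names.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """Breadth-first search returning hop distances and shortest-path counts."""
    dist = {source: 0}
    sigma = {source: 1}  # number of shortest paths from source to each node
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def choice(graph, node):
    """Sum over pairs (j, k), j < k, of g_jk(node) / g_jk."""
    info = {n: shortest_path_counts(graph, n) for n in graph}
    dist_i, sigma_i = info[node]
    others = [n for n in graph if n != node]
    total = 0.0
    for j, k in combinations(others, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j or node not in dist_j or k not in dist_i:
            continue  # the pair is disconnected, or node is unreachable
        if dist_j[node] + dist_i[k] == dist_j[k]:
            # Shortest j-k paths through node = (j to node) * (node to k).
            total += sigma_j[node] * sigma_i[k] / sigma_j[k]
    return total
```

On a three-node path a-b-c, `choice` of the middle node is 1.0 because the only shortest path between the end nodes passes through it.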

Multinomial Logistic Regression (MLR)
The multinomial logistic regression model is estimated in RStudio using the "nnet" package [47]. $P_{i1}$, $P_{i2}$, and $P_{i3}$ are the probabilities associated with the three "degrees of balance" for the $i$th individual area, represented as:
$$P_{ij} = \frac{\exp(x_i \beta_j)}{\sum_{k=1}^{3} \exp(x_i \beta_k)}, \quad j = 1, 2, 3$$
Here, $x_i$ is the $i$th row of the model matrix $X$, while $\beta_j$ is the vector of regression coefficients for categorical label $j$ ($j$ = 1, 2, 3). $P_{i1}$ (label: always unbalanced) is the baseline of this MLR model, so $\beta_1 = 0$. Essentially, the MLR forms two logits and can be transformed into a linear regression problem of fitting the two log odds:
$$\log\frac{P_{i2}}{P_{i1}} = x_i \beta_2, \qquad \log\frac{P_{i3}}{P_{i1}} = x_i \beta_3$$
Namely, the proposed model can also be interpreted in the following form:
The log odds of two "degrees of balance" = f(DISC, DISR, MET, BUS, COM, PBS, . . . , ACC).
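The relation between the coefficients and the three probabilities can be illustrated with a short Python sketch. The study fits the model in R with "nnet"; the code below only evaluates the baseline-category formula for given coefficients, and the function name and any numeric coefficients passed to it are hypothetical.

```python
import math


def mlr_probabilities(x, beta2, beta3):
    """Return (P1, P2, P3) for one area under a baseline-category logit model.

    x: predictor values for the area (include a leading 1.0 for an intercept).
    beta2, beta3: coefficient vectors of the two non-baseline categories;
    the baseline category has all coefficients fixed at zero.
    """
    eta2 = sum(a * b for a, b in zip(x, beta2))
    eta3 = sum(a * b for a, b in zip(x, beta3))
    denom = 1.0 + math.exp(eta2) + math.exp(eta3)
    return 1.0 / denom, math.exp(eta2) / denom, math.exp(eta3) / denom
```

By construction, `log(P2 / P1)` equals the linear predictor `eta2`, which is the log-odds form of the model.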

Ternary Graphs as a Tool to Classify the Balanced Degrees for Urban Historical Conservation Areas
From Figure 6A,B, the points in the ternary graph show a movement trend towards the "RECREATION" vertex from weekdays to weekends. QB (Qibao Watertown), SY1, and HQ3 (Shanghai Zoo) show the largest movement from Figure 6A to Figure 6B, suggesting that these three have the largest rate of increase in tourists. However, TL1, YY1, and NJ2 remain almost the same between Figure 6A and Figure 6B.
High $RT_{i-NL}$ indicates that PL, PG, and QB have a significantly higher proportion of non-local tourism than the others. WT1 is near the Huangpu River and has a famous waterfront skyline. LC1 is known for Yu Garden and Chenghuang Temple. Both WT1 and LC1 are famous tourism destinations in Shanghai. $RT_{i-NL}$ of all the historical conservation areas decreased from weekdays to weekends, likely due to the rise of local tourists on weekends. HF1, HQ2, and SY2 are destinations preferred by locals rather than non-local tourists.
As shown in Figure 7, the clustered pattern in the spatial distribution of the balanced degree indicates that the development of HF, HQ, and JW is relatively homogeneous. However, LC shows considerable divergence: LC1 and LC2 are residential-biased, while LC3 and LC4 are comparatively balanced.

Multinomial Logistic Regression (MLR) Classification
A multinomial logistic regression model was used to assess the marginal effect of urban features, i.e., urban scale features, local facilities and functions, and built environment features, on the likelihood of a subset falling into each balance category. As shown in Figure 8, several predictors are highly correlated: the weekday and weekend ratios of non-local tourists ($RT_{i-NL}$); the distances to the city center (DISC) and to the Huangpu River (DISR); and the area of buildings and the numbers of public services, catering services, and hotels. Therefore, $RT_{i-NL}$ of weekdays, DISR, ABUI, HOT, and CTS were removed to achieve the highest prediction accuracy from the models.
All other urban features with statistical significance have small standard errors, indicating that these urban features play an important role in affecting the degree of balanced development. As discussed in Section 3.5.2, the coefficients in the table estimate log odds ($\beta_2$ for $\log(P_{i2}/P_{i1})$ and $\beta_3$ for $\log(P_{i3}/P_{i1})$). The exponentiated coefficient therefore represents the odds ratio, i.e., the constant multiplicative effect of the specific predictor variable on the odds. If the exponential value is higher than one, an increase in the predictor raises the likelihood of the category in the numerator of the odds relative to the category in the denominator.
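The reading of an exponentiated coefficient can be sketched numerically. The coefficient values used with this helper and the tolerance for "around 1" are hypothetical and are not taken from Table 4.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(beta)


def direction(beta, tol=0.05):
    """Classify the effect of a coefficient on the odds versus the baseline.

    tol is an assumed tolerance for treating an odds ratio as "around 1".
    """
    ratio = odds_ratio(beta)
    if abs(ratio - 1.0) <= tol:
        return "little effect"
    return "raises odds" if ratio > 1.0 else "lowers odds"
```

For instance, a hypothetical coefficient of 0.7 gives an odds ratio of about 2.01, meaning that a one-unit increase in the predictor roughly doubles the odds of the numerator category relative to the baseline.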
In Table 4, $\exp(\beta_2)$ and $\exp(\beta_3)$ are compared to identify variables that effectively distinguish "Always Balanced" subsets from the others ("Balanced Sometimes" and "Always Unbalanced"). The exponential values of DISC and ACC are around 1, which indicates that predictor variables like distance to the city center (DISC) and local accessibility (ACC) have less influence on the classification. This also suggests that features on the urban scale have little impact on local balanced urban development. The count of bus stations (BUS), the total area of subsets (ADIS), and the median urban block area (MEUBA) have negative coefficients in both logits. With more bus stations and a larger subset area, the probability of a subset being always balanced increases. The count of metro stations (MET), the commercial spaces (COM), the count of public services (PBS), and the ratio of non-local tourists ($RT_{i-NL}$) have exponential values higher than one in both logits. With more metro stations, commercial spaces, public services, and non-local tourists, the probability of a subset being always balanced decreases. The mean height (MHEI) has opposite effects in the two exponential values, making it difficult to draw inferences.

Multinomial Logistic Regression (MLR) Classification
A multinomial logistic regression model was used to assess the marginal effect of urban features, i.e., urban scale features, local facilities and functions, and built environment features, on the likelihood of categories to be balanced. In Figure 8, due to the high correlation between ratio of local tourists of weekdays and weekends, distance to city center and Yangtze River, area of buildings, number of public services, catering services, and hotels, therefore, of weekdays, DISR, ABUI, HOT, CTS are removed to acquire the highest accuracy of prediction from the models.
All other urban features with statistical significance have small standard errors, indicating that these urban features play an important role in affecting the degree of balanced development. As discussed in Section 3.5.2, the coefficients in the table are for estimating In Table 4, ( ) and ( ) are compared to identify variables that effectively distinguish "Always Balanced" subsets from the others ("Balanced Sometimes" and "Always Unbalanced"). The exponential value of DISC and ACC are around 1, which indicates that predictor variables like distance to city center (DISC) and local accessibility (ACC) have less influence on the classification. This also suggests that features on the urban scale have little impact on local balanced urban development. The count of bus stations (BUS), the total area of subsets (ADIS) and medium of urban block area (MEUBA) have negative coefficients with these two logits. With more bus stations and larger area of subsets, the probability of the subsets being always balanced increases. The count of metro stations (MET), the commercial spaces (COM), the count of public services (PBS), and the ratio of non-local tourists ( ) have both exponential value higher than one. With more metro stations, commercial spaces, public services, and more non-local tourists, the probability of the subsets being always balanced decreases. The mean height (MHEI) has a different effect with the two exponential values, making it difficult to draw inferences.
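The odds-ratio reading of the coefficients described above can be illustrated with a minimal sketch. The coefficient values are hypothetical placeholders rather than the estimates in Table 4, and the helper name `odds_ratio_effect` and the tolerance `tol` used to call a ratio "around 1" are assumptions introduced only for illustration.

```python
import math

def odds_ratio_effect(beta: float, tol: float = 0.05) -> tuple[float, str]:
    """Return exp(beta) and a coarse reading of its direction.

    tol is an assumed band around 1 within which the predictor is
    treated as having little influence on the classification.
    """
    odds_ratio = math.exp(beta)
    if abs(odds_ratio - 1.0) <= tol:
        direction = "little influence"
    elif odds_ratio > 1.0:
        direction = "raises odds of numerator category"
    else:
        direction = "lowers odds of numerator category"
    return odds_ratio, direction

# Hypothetical coefficients for illustration only (not values from Table 4).
example_betas = {"feature_a": 0.40, "feature_b": -0.40, "feature_c": 0.01}
for name, beta in example_betas.items():
    ratio, reading = odds_ratio_effect(beta)
    print(f"{name}: exp(beta) = {ratio:.3f} -> {reading}")
```

A negative coefficient corresponds to an exponential value below one, which is why the features with negative coefficients in both logits are read as favoring the reference category.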
The odds ratios $\exp(\beta_3 - \beta_2)$ in Table 5 and $\exp(\beta_3)$ in Table 4 are compared to identify variables that effectively distinguish "Always Unbalanced" subsets from the others ("Balanced Sometimes" and "Always Balanced"). The number of metro stations (MET), mean height (MHEI), and the ratio of non-local tourists ($RT_{i-NL}$) have exponential values higher than one in both cases, indicating that areas with more metro stations, higher mean height, and a higher ratio of non-local tourists on weekends tend to be unbalanced all the time. The median of urban block area (MEUBA) is less than one in both exponential values, which indicates that subsets with large urban block areas are less likely to be unbalanced all the time. The count of bus stations (BUS), the commercial spaces (COM), the count of public services (PBS), and total area (ADIS) show opposite effects in the two exponential values, so they are not sufficient to differentiate the levels here.
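The reason why the difference of the two coefficient vectors compares the third category with the second can be written out in one step. This is a sketch that assumes both logits in Table 4 share the same reference category (indexed 1 here) and that $x_i$ denotes the feature vector of subset $i$; the indexing and the symbol $x_i$ are assumptions, since Section 3.5.2 is not reproduced in this part.

$$
\log\frac{P_{i3}}{P_{i2}}
  = \log\frac{P_{i3}}{P_{i1}} - \log\frac{P_{i2}}{P_{i1}}
  = x_i^{\top}\beta_3 - x_i^{\top}\beta_2
  = x_i^{\top}(\beta_3 - \beta_2)
$$

Under these assumptions, a one-unit increase in the $k$-th predictor multiplies the odds of category 3 against category 2 by $\exp(\beta_{3k} - \beta_{2k})$.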
Classification performance was evaluated via a confusion matrix. Table 6 presents the confusion matrix for all subsets classified into the three degrees of balance by the above MLR models. The accuracy, i.e., the number of correctly classified subsets divided by the total number of observations, is 81.8%, which indicates that the model produces accurate results, especially for identifying the always balanced subsets.
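The accuracy figure follows directly from the confusion matrix, as the following minimal sketch shows. The 3x3 counts are hypothetical and are not the counts reported in Table 6; the function name `accuracy_from_confusion` is likewise an assumption.

```python
from typing import Sequence

def accuracy_from_confusion(matrix: Sequence[Sequence[int]]) -> float:
    """Accuracy = correctly classified (diagonal) / all observations."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix has no observations")
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total

# Hypothetical counts (rows: actual class, columns: predicted class).
# These are NOT the counts reported in Table 6.
example = [
    [5, 1, 0],
    [1, 3, 1],
    [0, 1, 4],
]
print(f"accuracy = {accuracy_from_confusion(example):.3f}")
```

Per-class performance can be read from the same matrix by dividing each diagonal entry by its row total.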

A New Perspective for Measuring the Degree of Balance between Urban Development and Tourism Planning
The issue of balanced tourism development is capturing scholarly interest due to the increasing number of cities facing conflict between tourism and local community development. Evaluating the current degree of balance is always a difficult task requiring complex data and analyses. Therefore, empirical analyses in this direction are relatively rare [12,13].
This study employed ternary graphs to build a quantitatively grounded qualitative classification that visualizes the degree of balance, which was an intangible issue in the past. As stated by UNWTO [6], balanced tourism and urban development should never be limited to tourism-related industries. It is important to involve related stakeholders to ensure that local residents have a thorough understanding of the positive effects of tourism. Based on a review of existing literature, this study extends previous analyses focusing on the tourist-resident ratio by adding commuters to construct a ternary graph. This new approach is able to demonstrate the balance among three stakeholders within the local community: tourists, commuters, and residents.
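How a subset is placed on a ternary graph can be sketched as follows. This is a minimal illustration under stated assumptions: the vertex layout, the function name `ternary_point`, and the example counts are hypothetical, and the thresholds that delimit the "balanced" region are not reproduced here.

```python
import math

def ternary_point(tourists: float, commuters: float, residents: float):
    """Map three non-negative counts to shares and 2D ternary-plot coordinates."""
    total = tourists + commuters + residents
    if total <= 0:
        raise ValueError("at least one group must be present")
    t, c, r = tourists / total, commuters / total, residents / total
    # Assumed layout: tourists vertex at (0, 0), commuters vertex at (1, 0),
    # residents vertex at (0.5, sqrt(3)/2).
    x = c + 0.5 * r
    y = (math.sqrt(3) / 2) * r
    return (t, c, r), (x, y)

# Hypothetical counts for one subset (illustration only).
shares, (x, y) = ternary_point(tourists=200, commuters=300, residents=500)
print(f"shares = {shares}, position = ({x:.3f}, {y:.3f})")
```

Because the three shares always sum to one, each subset maps to a single point inside the triangle, and a subset dominated by one group lies close to that group's vertex.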
In general, the ternary graph approach proposes a new perspective to monitor the degree of balance. This approach might be helpful for further strategies to protect unique local characteristics while stimulating cultural tourism [48]. The successful application in Shanghai historical conservation areas shows its potential to be applied in many other historical cities facing the same demand of keeping a balance between urban and tourism development. Insights achieved from this tangible and continuous analytical approach may assist efficient policymaking in both urban development and tourism planning.

Urban Management Implications towards Balanced Development for Historical Conservation Area
First, management strategies promoting balanced urban development should not be reduced to restraining the tourism industry near historical conservation areas. Instead, it is important to strengthen local-scale infrastructure that nearby residents can easily access and use. As found in our study, a large number of metro stations contributes to a business-biased and unbalanced area, which agrees with existing empirical studies [49]. In contrast, bus stations tend to show positive effects on balanced development. This might be because metro stations are usually integrated with business-dominant development areas, encouraging a large inflow of tourists [50], whereas bus stations are usually used by local residents and commuters rather than tourists. Therefore, appropriate urban management bringing benefits and accessibility for local residents might play an important role in promoting balanced urban development for historical conservation areas.
Existing urban management strategies often tend to remove all commercial facilities from historical conservation sites, as intensive commercial development does bring negative effects. Our study confirms that too many commercial and public services can negatively affect the balanced development of local communities. Nevertheless, there is no evidence that integrating any commercial facilities and services will result in an always unbalanced historical conservation area. This suggests that integrating moderate public and commercial services with historical conservation areas could encourage communication between residents and visitors and support all the stakeholders in the neighborhood, making this a desirable strategy for urban development.
In terms of built environment features, greater building height might lead to unbalanced local communities. Therefore, building height should be restricted, which matches the consensus in urban planning [51]. In addition, our empirical study reveals that larger block size and larger subset area may promote balanced tourism development, which is contrary to the advocacy of smaller block areas and denser street networks [52]. A possible explanation is that slightly larger blocks imply less pedestrian movement and commercial potential, which keeps residential life within the block. Therefore, relatively larger street blocks should be encouraged in urban management to avoid over-tourism in historical areas.

Large-Scale Analysis with a Human-Perspective: Advantages Based on Cellular Data
This study also revealed the advantages of distinguishing people's spatio-temporal patterns through a large-scale analysis of cellular data. Cellular data can be collected for millions of participants within urban regions, while details of individual behavior based on traced routes can be identified with an acceptable spatio-temporal resolution. In addition, the combination of spatial and temporal distribution provides an approach to draw personal profiles of cell phone users. We are able to distinguish tourists, commuters, and local residents among millions of cell phone users with high accuracy. Compared with GPS tracking using a limited number of GPS trackers [53,54], a large-scale analysis with a human perspective provides new research potential for identifying tourists' behavior.
As stated by Brown [55], tourism development activity has always been a responsive decision process rather than a preplanned outcome. This new analytical framework based on cellular data is capable of measuring the intangible balanced development and of providing more evidence-based and scientifically oriented decision support for the urban management of historical conservation areas.

Limitations and Future Steps
First, some individuals own multiple mobile phones, which is difficult to avoid. Due to privacy protection requirements, all user IDs are encrypted and anonymous in the original data, and personal information cannot be obtained. Nevertheless, empirical studies show that users holding more than one mobile phone are a minority and thus would not affect the result too much. Second, although China Mobile holds the highest market share in Shanghai, analyses that compare the number of mobile devices to the target population still pose some challenges and uncertainties. A combined verification of cell phone data from all three mobile communication companies with other Internet LBS data might help to address these two limitations in the future. Third, due to technical restrictions, the spatial resolution of cell phone positioning is relatively limited. Adding cell phone GPS records could provide higher accuracy in future studies. In addition, the "balanced" area identified in a ternary graph is based on a hypothetical ratio generated from previous literature, which leaves room for refinement. An integrated study could provide more concrete information in this direction. Moreover, the two-week time span is relatively short. A longer observation covering at least one whole year would help in understanding the effect of seasons in our next study.
In addition, current data-driven approaches that rely on large amounts of cellular data cannot reveal people's motivation and detailed behavioral effects. Instead, we first classify people into groups, i.e., residents, commuters, and tourists, following the selection principles. Based on the hypothesis that people within one group share more similar behavioral patterns in local communities, we then evaluate their impact on the balance degree of local communities. A similar analytical pattern has been widely applied in a series of studies [31,56-58]. Although it cannot provide detailed behavior records for everyone, this data-driven approach is still able to reveal the big picture with the help of big data.
Nevertheless, personal motivation and behavioral effects were not included in the current analysis. Classical methods, e.g., public surveys and in-depth interviews, can address this issue well, but they are time-consuming and can only be applied on a small scale. Large-scale cellular analysis is capable of covering a whole city but misses personal motivation and behavioral effects. We hope future technical innovations will make it possible to capture personal motivation, behavioral effects, and spatio-temporal distribution together, and thus promote a more comprehensive analysis.
It is also worth mentioning that the current definitions and measurements of the balance degree between urban and tourism development are taken from previous studies. Changing the balancing standards may produce different results. The standards applied in Shanghai might not perform well in other cities, as cities in different cultures and regions may have different criteria. Nevertheless, this limitation might not affect the key contribution of our study. As we claimed in the abstract, the focus of this study is to explore a methodological approach for measuring the degree of balance between urban and tourism development. This analytical approach based on cellular data works well, as shown in the current paper. However, it is important to consider how the standard is set when applying it to other kinds of cases.

Conclusions
The main contribution of this research is an analytical framework using ternary graphs to evaluate the degree of balance between urban and tourism development in an area with tourism resources, which was previously hard to visualize and measure. This paper has also identified some noteworthy features that affect the balanced development of local communities. An understanding of these urban features will assist efficient decision-making to promote balanced urban development.
The framework is not targeted at restraining the development of tourism in historical conservation areas, but at keeping all the invested infrastructure balanced and fully functional. Balanced tourism has been shown not only to promote local economies and offer employment opportunities, but also to correlate closely with the socio-economic development of cities and the convenience of life of local residents. Moreover, some urban planning strategies need to be modified to foster more balanced tourism.
Using cellular data in urban management brings a new perspective to measuring the degree of development balance for historical conservation areas. Future research should build a more comprehensive user portrait with combined application of fine-grained cellular data and multi-sourced urban data to assist data-informed urban management.