1. Introduction
Urban green spaces, parks in particular, benefit people’s psychological and social well-being as well as their physical health [1,2,3]. Visits to green parks provide the opportunity to directly experience the benefits of “natural” environments, especially for people with limited contact with nature. Consequently, quantifying residents’ visits to urban green spaces, and identifying the recreational interests and other factors that influence those visits, is vital for urban park management and planning. Visitor surveys, on-site counters and direct observation are the traditional methods for estimating the number of visitors [4,5]. These methods usually pick a representative sample of urban green spaces and collect information on their use, for example, visitor numbers and characteristics, as well as the activities and behavior of park visitors. However, these approaches are typically time-consuming and site-specific, and therefore have restricted spatial coverage [6]. The advent of freely available social media data offers new ways to analyze urban green park visits. Social media data and other “big data” sources grow in size every year, and can be used to investigate how the public interacts with real-world environments and to assess individual preferences across time and space [7].
Green parks are widely regarded as settings that encourage people to take part in physical exercise. In this research, we analyzed visitors’ spatiotemporal behavior in green parks, taking into account 122 green parks with approximately 250,000 visitor check-ins, using the location-based social network (LBSN) service Sina Weibo, also known as Weibo [8]. Weibo was launched in August 2009 and is one of China’s largest social media sites, as well as a popular microblogging platform that allows people to post and share their daily activities with their circle of friends. The term “check-in” refers to a user voluntarily sharing their location on the LBSN, by publishing it from a location-sensing smartphone or tablet while participating in certain activities [9].
In this research, we examined check-in behavior associated with visits to green spaces. We used a sample of check-in data, including likes and dislikes, as a summary of the behavior and preferences of the general public across various activities while using park resources. We also fitted a linear regression model that includes variables with a mild correlation with the dependent variable. Specifically, the aim of the study was to answer the following questions: (1) What types of parks do people choose to visit, and what are the characteristics of each category of park? (2) How does the time of day influence the behavior of the general public in relation to green parks? (3) How does the season influence the behavior of the general public in terms of green park visits? (4) Does gender have an impact on visits to green parks? To understand the link between green parks and visitor behavior, we analyzed users’ spatiotemporal trends in reference to urban green spaces using a large-scale dataset covering the period from July 2014 to June 2017. The research was conducted in the city of Shanghai, in eastern China. Our main results describe the distribution of users’ check-ins in different green parks, and notable behavioral changes based on the time of day (i.e., a 24-h period) and the day of the week. We also investigated the seasonal impact on public behavior in relation to green spaces to show real-life patterns. By examining these factors, we were able to analyze the use of urban green parks and its variation across a range of temporal scales. This method of capturing visitor activity in urban green spaces is an effort to address the drawbacks of previous studies, and it paves the way for socio-ecological research using crowd-sourced and social network data, instead of relying on subjective and observational data with limited coverage.
2. Literature Review
Urban areas are generally marked by degraded ecosystems, rising environmental damage, higher temperatures and a decreasing number of urban green parks. Urban green spaces (UGS) are important for improving the quality of life in metropolitan areas, for balancing thermal comfort and providing an appropriate thermal budget [10], and for helping the public recover from the physical and emotional pressures of their daily lives [11]. Green spaces are vital for improving living conditions in urban areas by improving air quality and the area’s esthetics, which eventually leads to higher real estate values and a reduction in energy use for cooling. UGS can also serve as a cumulative resource for the development of sustainable urban areas [12]. UGS also provide children with much-needed space to play, which is significant for their social, cognitive and mental functioning [13].
Urbanization has become one of the worldwide agendas for development. Building on the United Nations’ Sustainable Development Goal 11 (SDG 11), which seeks to make cities and human settlements inclusive, safe, resilient and sustainable, the United Nations extended the sustainable development goal (SDG) agenda by adopting the New Urban Agenda in 2016 [14]. Urban sustainable development and social justice rely on different requirements, including equitable spatial distribution, planning, environmental facilities, urban strategic planning, quality of green space, and socio-economic facilities. The social benefits provided by UGS are crucial for maintaining and increasing the personal choices of urban residents [15]. Public urban green spaces (PUGS) are important for the mitigation of high summer surface temperatures [16], and also help to reduce pollution and noise levels [17]. The temperature difference between urban and green areas is high in summer and low in winter. Moreover, in summer, the difference is greater during the day than during the night, while in winter, the opposite is true.
In recent years, several efforts have been made to apply social media data to urban planning, with purposes ranging from basic tasks to quite complex analyses, for example, the verification of urban form and features. Data from Facebook, Weibo, Twitter and other big data networks are used to assess the mobility and behavior of people, at scales ranging from the inter-urban to the global [18]. It would be extremely difficult to capture the finer spatial and temporal scales of these two phenomena using traditional methods such as questionnaires or on-site observations. Furthermore, LBSN data can be used for socio-spatial analysis [19] with techniques such as extracting the components of users’ tweets, or exploring their sentiments and disparities over space and time [20], also taking into consideration health factors such as physical exercise or diet. Campagna [21] introduced the idea of “Social Media Geographic Information” as a means to investigate people’s insights, opinions and interests in space and time in order to support spatial planning and geo-design through space–time analysis. By using data sources other than surveys, such as cellular data, it is feasible to better define the spatiotemporal elements of urban environments [22]. Meanwhile, other authors have worked on smart city transportation, technologies and applications [23,24,25].
The kernel density estimation (KDE) method has been used to spatially model geolocation data, and it offers a flexible and convenient framework for the spatial estimation of density. KDE has also been used for the assessment of environmental attributes, for example, healthcare properties [26], the food environment [27] and green space access. It can be used to measure accessibility or distance by transforming point data into a continuous surface, thus allowing the density of a feature to be estimated at any point on the map [28]. The researchers in [29] investigated the relationships among location, walking speed and adequate levels of physical activity using KDE. KDE was also used to study sites such as food stores in connection with factors such as overweight, body mass index (BMI) and dietary intake [30]. In [31], the authors used the KDE approach to analyze the relative importance of external factors correlated with the temporal and spatial distribution of users in urban green spaces. The identification and analysis of accidents or catastrophes and their effect on daily urban processes is another interesting potential use of social media network data [32]. KDE and point density were also compared in [33].
3. Materials
3.1. Study Area
Shanghai lies on the alluvial plain of the Yangtze River Delta, with an average elevation of around 4 m. From east to west, there is a slight terrain gradient. The land is flat, except for some south-western foothills. Shanghai has a northern subtropical humid monsoon climate, with four distinct seasons and abundant sunshine and rainfall. In such a metropolitan area, extreme events and urbanization have had a substantial influence on public health promotion services and the economy. Shanghai is ranked 10th among the world’s agglomeration regions and is one of the most prominent, and its urbanization is proceeding relatively quickly. In the United Nations’ (UN) forecasts of urbanization, Shanghai’s urban population is ranked as the world’s second largest, and China’s largest [34].
In 2016, Shanghai was divided into 16 regions: 15 districts (i.e., Baoshan, Changning, Fengxian, Hongkou, Huangpu, Jiading, Jing’an, Jinshan, Minhang, Pudong New District, Putuo, Qingpu, Songjiang, Xuhui and Yangpu) and one county (i.e., Chongming) [35]. Seven districts (i.e., Changning, Hongkou, Huangpu, Jing’an, Putuo, Xuhui and Yangpu) are located in Puxi (literally, Huangpu West). These seven areas are known as Shanghai’s downtown or city center [36]. The study area in this research includes 10 districts (i.e., Baoshan, Changning, Hongkou, Huangpu, Jing’an, Minhang, Pudong New District, Putuo, Xuhui and Yangpu), as shown in Figure 1. The locations of the green parks can also be seen within the study area.
3.2. Dataset
The dataset that we used to examine the number of visits to green parks came from the popular Chinese microblog Weibo. Weibo is often described as China’s counterpart to Twitter and is a major social media platform in China. Weibo has a massive number of users, representing the biggest dataset of obtainable geotags. According to the latest Weibo annual report, more than 500 million active users were registered on the platform in 2018, and it reached 462 million monthly active users in December 2018 [37]. The demographics of Weibo users do not match those of the total population. The public interface of the Weibo Location Based Service (LBS) was launched on 28 May 2012, and Weibo users have since been able to share their location on the internet in real time. As a scalable and readily available large-scale crowdsourced dataset, Weibo check-in data were the most appropriate data that we could obtain to estimate actual park visitors. Moreover, an earlier study of 87 city parks in Shanghai, China showed that there is an important link between data from the Jiepang social media platform and official visitor numbers [38].
In addition, earlier research has shown that Weibo data offer a good representation of the interests and behaviors of individuals in urban areas [39]. Using Weibo check-in data, a new model was proposed in [40] that integrates the parameters of urban environmental function networks, thereby enriching the definition of the structure of urban networks. Although using Weibo check-ins as a proxy for visits is still uncommon, earlier research using data from similar social media platforms (for example, Facebook, Flickr and Instagram) found significant positive relationships between official visit statistics and the number of visitors reported on these sites [6,41].
Figure 2 represents the criteria that we deployed for collecting the check-in data.
Data collection was carried out through the Weibo application programming interface (API). The dataset was obtained from Weibo for a time frame of 3 years, from July 2014 to June 2017. After retrieving the location information, we found that some of the retrieved locations were not green parks (e.g., sidewalks, former homes of famous people, and sculptures). We checked these locations one by one and removed those that were not in the green park category, but we retained the parking lots and park areas linked to green parks. Some larger parks have several location IDs (for example, for garden areas, kids’ play areas and barbecue areas); these were combined into one location ID [42]. After pre-processing, filtering and cleaning, a total of 250,632 geo-tagged visits to green parks in 10 districts of Shanghai were included. The data were collected using Python (version 2.7.12) and were filtered to exclude invalid records and fake users, according to the following criteria:
The geographical location of data should exist only in Shanghai;
The lowest number of check-ins per green park should be 100 within the time period of the study;
Every record should have a geo-location (latitude and longitude), user id, time, gender, day, month and year;
Parks that are separated into several geo-locations inside the green spaces were combined into one geo-location.
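The four criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors’ collection script: the record keys (`user_id`, `location_id`, `time`, `gender`, `lon`, `lat`), the `park_of` mapping from location IDs to merged parks, and the rough bounding box used for Shanghai are all hypothetical.

```python
from collections import Counter

# Rough, illustrative bounding box for Shanghai (assumed values, not from the study).
LON_MIN, LON_MAX = 120.85, 122.20
LAT_MIN, LAT_MAX = 30.67, 31.88

# Hypothetical field names; "time" is assumed to carry day, month and year.
REQUIRED = ("user_id", "location_id", "time", "gender", "lon", "lat")
MIN_CHECKINS = 100


def is_complete(rec):
    """A record is kept only if every required field is present and non-empty."""
    return all(rec.get(key) not in (None, "") for key in REQUIRED)


def in_shanghai(rec):
    """Check the geo-location against the assumed bounding box."""
    return (LON_MIN <= float(rec["lon"]) <= LON_MAX
            and LAT_MIN <= float(rec["lat"]) <= LAT_MAX)


def filter_checkins(records, park_of, min_checkins=MIN_CHECKINS):
    """Apply the four criteria; park_of maps a location ID to its merged park ID."""
    kept = []
    for rec in records:
        if is_complete(rec) and in_shanghai(rec):
            merged = dict(rec)
            # Locations without an entry in park_of are treated as their own park.
            merged["park_id"] = park_of.get(rec["location_id"], rec["location_id"])
            kept.append(merged)
    counts = Counter(rec["park_id"] for rec in kept)
    return [rec for rec in kept if counts[rec["park_id"]] >= min_checkins]
```

Merging location IDs before applying the 100 check-in threshold matters: a large park split across several IDs could otherwise be dropped even though its combined total passes.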
3.3. Park Type Classification
The green parks in Shanghai were divided into six commonly used categories: (1) community parks (n = 30), (2) cultural relic parks (n = 8), (3) large urban parks (n = 12), (4) natural parks (n = 9), (5) neighborhood parks (n = 46), and (6) recreational parks (n = 17) (Table 1). These parks were classified on the basis of their dominant functions and type of administration [43,44].
Figure 1 reveals the distribution of the different types of urban green parks throughout the study area. Amongst the 122 green parks in our study area, the greatest proportion belonged to neighborhood parks, followed by community parks.
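As a quick arithmetic check on the category counts reported above, the following sketch computes each category’s share of the 122 parks. The dictionary keys are shorthand labels introduced here for illustration.

```python
# Category counts as reported in the text (Table 1).
PARK_COUNTS = {
    "community": 30,
    "cultural relic": 8,
    "large urban": 12,
    "natural": 9,
    "neighborhood": 46,
    "recreational": 17,
}

total = sum(PARK_COUNTS.values())

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100.0 * count / total, 1) for name, count in PARK_COUNTS.items()}

# Categories ordered from most to fewest parks.
ranked = sorted(PARK_COUNTS, key=PARK_COUNTS.get, reverse=True)

for name in ranked:
    print(f"{name:15s} {PARK_COUNTS[name]:3d} {shares[name]:5.1f}%")
```

The counts sum to 122, with neighborhood parks accounting for roughly 38% and community parks for roughly 25% of the sample.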
4. Methodology
4.1. Data Preparation
The data that we collected cover all check-ins from July 2014 to June 2017 made within Shanghai’s boundary. The data downloaded were included in many JSON files.
Figure 3 represents the data preparation process.
In this study, the Weibo dataset included information such as a unique user ID and the check-in date and time. In addition, information about the geographic location (latitude and longitude) and gender was collected through the Weibo API. We therefore assume that daily trends in the LBSN dataset reflect users’ daily activities, behaviors on social media, and spatiotemporal patterns [45]. A typical Weibo “check-in” is represented as: check-in (B2094554D064ABF44293) = {1758115961, ####, B2094554D064ABF44293, Mon July 25 14:47:41 +0800 2016, m, 121.484566, 31.270601}, where B2094554D064ABF44293 denotes the “location_id,” 1758115961 denotes the “user_id,” #### denotes the “user_name,” Mon July 25 14:47:41 +0800 2016 denotes the “day, month, date, time, and year,” m denotes the “gender,” and 121.484566, 31.270601 denotes the geo-location (longitude and latitude). JSON (JavaScript Object Notation) is a lightweight, language-independent text format for data interchange, and open-source reader and writer modules are available for most major programming languages. Using the selected software [46], the data were converted into a CSV (comma-separated values) file format, so that all user data, including geo-locations, could be identified and stored in the database regardless of the publication date.
Table 2 shows an example of a “check-in” in CSV format.
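The JSON-to-CSV step can be sketched with the Python standard library. This is a minimal illustration, not the software cited in [46]: the JSON layout is hypothetical, and only `location_id`, `user_id`, `user_name` and `gender` are names taken from the text, while `created_at`, `lon`, `lat` and the CSV column set are assumptions. The sample record reuses the check-in quoted above.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical JSON layout built from the example check-in in the text.
RAW = """[{"user_id": "1758115961", "user_name": "####",
  "location_id": "B2094554D064ABF44293",
  "created_at": "Mon July 25 14:47:41 +0800 2016",
  "gender": "m", "lon": 121.484566, "lat": 31.270601}]"""

FIELDS = ["user_id", "location_id", "year", "month", "day",
          "weekday", "hour", "gender", "lon", "lat"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def to_row(rec):
    """Flatten one check-in record into the CSV columns listed in FIELDS."""
    # Timestamp format as printed in the text (full month name, UTC offset).
    ts = datetime.strptime(rec["created_at"], "%a %B %d %H:%M:%S %z %Y")
    return {
        "user_id": rec["user_id"],
        "location_id": rec["location_id"],
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "weekday": WEEKDAYS[ts.weekday()],
        "hour": ts.hour,
        "gender": rec["gender"],
        "lon": rec["lon"],
        "lat": rec["lat"],
    }


def json_to_csv(raw):
    """Convert a JSON array of check-ins into CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for rec in json.loads(raw):
        writer.writerow(to_row(rec))
    return buf.getvalue()


print(json_to_csv(RAW))
```

Splitting the timestamp into year, month, day, weekday and hour at this stage keeps the later temporal analysis a matter of simple grouping. The user name is deliberately left out of the CSV columns in this sketch.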
In view of the heterogeneity problem, only green parks that include more than 100 check-ins were chosen to establish the user sample, in order to confirm a fairly high level of representativeness.
4.2. Social Media Data Analytics
In this research, we analyzed Weibo-based geo-location datasets in 10 districts of Shanghai, China (July 2014 to June 2017).
Figure 4 shows a check-in behavior analysis framework in which the LBSN data analysis method includes the framework, the data pre-processing and cleaning, the temporal and spatial analysis of the LBSN data, and the statistics that indicate the worth of the LBSN data.
Across the three-year period, the number of visitors to the urban green spaces was obtained as statistical data. A spatiotemporal assessment, utilizing statistical graphs and tables, was performed based on the results, and the distribution of check-ins was examined using the KDE method.
4.3. Temporal Analysis
To track variations in user behavior, we grouped the check-in time stamps into three temporal scales: daily, weekly and seasonal. The daily pattern shows the hourly distribution of check-ins during the day, and the weekly pattern shows the distribution of check-ins across the days of the week. Seasonal trends and climate factors were also taken into account, as they may influence the use of green parks. Winter trends could provide significant data regarding park usage throughout the colder months. Rather than defining seasons by specific dates, we used simple categories based on months: March–May is spring; June–August is summer; September–November is autumn; and December–February is winter.
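The three temporal groupings described above can be sketched as follows. The month-to-season mapping follows the text; the function names and the use of Python `datetime` objects as input are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Month-based seasons as defined in the text.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def temporal_bins(ts):
    """Return the (hour of day, day of week, season) bins for one check-in."""
    return ts.hour, WEEKDAYS[ts.weekday()], SEASON_BY_MONTH[ts.month]


def summarize(timestamps):
    """Count check-ins per hour, per weekday and per season."""
    hourly, weekly, seasonal = Counter(), Counter(), Counter()
    for ts in timestamps:
        hour, weekday, season = temporal_bins(ts)
        hourly[hour] += 1
        weekly[weekday] += 1
        seasonal[season] += 1
    return hourly, weekly, seasonal


sample = [
    datetime(2016, 7, 25, 14, 47, 41),
    datetime(2015, 12, 5, 16, 30, 0),
    datetime(2017, 1, 1, 22, 5, 0),
]
hourly, weekly, seasonal = summarize(sample)
print(dict(hourly), dict(weekly), dict(seasonal))
```

Because winter spans December to February, it crosses the calendar year boundary; a lookup table avoids the off-by-one errors that range comparisons on month numbers tend to introduce.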
SPSS v25 was used for statistical analysis. SPSS is a widely used program for statistical analysis in social science, and it is also used by industry and health researchers, survey agencies, government and education specialists, data mining companies and marketing agencies. It provides various methods, including hypothesis testing and reporting, ad hoc analysis and data management, to facilitate analysis. We used Tableau 2019.2 for its visualization techniques to explore and analyze relational databases and data cubes.
4.4. Statistical Analysis
To assess the significance of the explanatory variables, it was necessary to statistically explore the predictors (i.e., explanatory variables) and their effect on the response variable (i.e., the number of check-ins). For this model, we used the following regression equation:
Table 3 displays the parameters and the explanatory variables used in our regression model.
Afterward, by applying the linear regression model, our fitted value equation converted to:
We implemented the linear regression model by including variables that have a mild correlation (correlation values from 0.10 to 0.50) with the dependent variable, which in our case was the number of check-ins. As for the model’s inference, the p-value of the model’s F-statistic showed that the model itself is significant. Note that not all predictors have a significant p-value, because the model was selected on the basis of the highest adjusted R² (Table 4).
Table 3 defines the model coefficients, in which the modification shows that the number of check-ins increased by an average of approximately 2.73% for each unit, with a very small p-value. Likewise, for each unit increase in Huangpu, Jing’an and Yangpu, the average check-in period increased by approximately 1.84%, 1.38% and 1.66%, respectively, with very small p-values. The ANOVA results are presented in Table 5.
All independent variables are significant predictors according to their p-values [47], as shown in Table 4. For the statistical analysis, we used the statistical programming language R [48] and the program RStudio [49] to perform basic descriptive and regression analysis.
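For reference, the quantities discussed in this section take the following standard textbook forms for a multiple linear regression with k explanatory variables and n observations. The notation here is illustrative and is not taken from the authors’ own equations or from Table 3; y_i stands for the number of check-ins and x_ij for the j-th explanatory variable.

```latex
% Model and fitted values
y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i,
\qquad
\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j x_{ij}

% Adjusted coefficient of determination, used to compare candidate models
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - k - 1}

% Overall F-statistic for the null hypothesis \beta_1 = \dots = \beta_k = 0
F = \frac{R^2 / k}{\left(1 - R^2\right) / (n - k - 1)}
```

Selecting a model by the highest adjusted R² penalizes additional predictors only mildly, so a model chosen this way can retain predictors whose individual p-values are not significant.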
4.5. Spatial Analysis
We used the KDE method to create a smooth density surface for check-in hotspots in the geographic area. KDE is a non-parametric technique for estimating the density underlying a random sample of data [50]. KDE smooths each data point into a small density bump, and then sums all of these bumps to obtain the final density estimate. KDE is widely accepted for modeling spatial distributions [51,52,53]; it describes the spatial density distribution combined with the distance–decay effect, and reveals hotspots by transforming scattered point data into a continuous density surface [54]. KDE has been used previously as a spatiotemporal technique [55,56] to inspect several features of social media (including, but not restricted to, LBSN) data analytics, such as users’ online activity and movement trends [57], check-in behavior [58], city boundary descriptions [59,60] and point-of-interest recommendations [61]. It can also be used to explore the distribution of destinations in communities, enabling researchers to see where destinations are densely clustered and where they are more sparsely scattered. Ultimately, this method seeks to create a smooth density surface over the geographical space from spatial point events [46]. Other authors have used the KDE method to analyze spatiotemporal patterns in green parks [62,63].
KDE can effectively capture the spatial structure of visitor density within a study area. KDE is a statistical method for estimating a smooth and continuous distribution from a finite set of observations [64]. The data taken into account in our analysis were in the form of geo-tagged check-ins. Let $E$ be a collection of historical check-in data, i.e., $E = \{e_1, \ldots, e_n\}$, where $e_i = \langle x, y \rangle$, $1 \le i \le n$, is the geo-location of the $i$-th check-in, made by an individual at time $t$. The sum of the kernel functions is scaled so that the resulting smooth surface integrates to one. This results in a bivariate KDE of the following form:
$$f_{KD}(e \mid E, h) = \frac{1}{n} \sum_{i=1}^{n} K_h(e, e_i),$$
where $e$ denotes the location at which the density is evaluated, $E$ is the check-in dataset, $K_h$ is the kernel function and $h$ is the bandwidth. The estimated density $f_{KD}$ depends strongly on $h$, which controls the smoothness of the density surface placed around each data point $e_i$ in $E$.
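A bivariate kernel density estimate of this kind can be sketched in plain Python. The sketch assumes an isotropic Gaussian kernel with a single fixed bandwidth `h` and treats longitude and latitude as planar coordinates, which is a simplification; the kernel and bandwidth actually used in ArcGIS for this study are not specified here, and the sample coordinates are invented for illustration.

```python
import math


def make_kde(points, h):
    """Return f(x, y), a bivariate Gaussian KDE over `points` with bandwidth h.

    Assumes an isotropic Gaussian kernel:
    K_h(e, e_i) = exp(-|e - e_i|^2 / (2 h^2)) / (2 pi h^2).
    """
    if not points:
        raise ValueError("at least one point is required")
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * h * h)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * h * h))
        return norm * total

    return density


# Illustrative check-ins (lon, lat): three close together, one farther away.
checkins = [
    (121.4846, 31.2706),
    (121.4850, 31.2710),
    (121.4842, 31.2701),
    (121.5200, 31.2300),
]
f = make_kde(checkins, h=0.005)
hot = f(121.4846, 31.2706)   # inside the cluster
cold = f(121.6000, 31.1000)  # far from every check-in
print(hot > cold)
```

Evaluating `f` on a regular grid of locations yields the continuous density surface described above; a production workflow would normally project the coordinates first so that the bandwidth is expressed in meters rather than degrees.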
ArcGIS 10.0 (Environmental Systems Research Institute, Inc., Redlands, CA, USA) was used to evaluate the spatial distribution of the check-ins, with a Shanghai map developed in 2016 using the WGS 1984 geodetic coordinate system. The base map also included the major transportation lines (i.e., the line layer) and the administrative districts (i.e., the polygon layer and the district layers), together with recent OpenStreetMap subway lines and entrances (i.e., the point layer).
5. Results
Shanghai is among the fastest growing metropolises in the world, with a population of 22,125,000 in an area of 4015 km² [65]. The total number of green parks within the city of Shanghai is 366 [66], which encourages the city’s inhabitants to participate in various healthy activities. For this analysis, 122 of these green parks were chosen after processing the data collected from Weibo. The distribution of the different categories of green parks included in this research is shown in Figure 5, where the different colors indicate different categories of parks.
We utilized KDE to examine the spatial distribution of the check-in data, and ArcGIS for visualization, to investigate the Weibo geo-location check-in data.
Figure 6 shows the overall check-in density in Shanghai between July 2014 and June 2017, where the areas colored in red indicate a higher human density, a higher level of activity, and a higher level of social media usage. It is therefore no surprise that the downtown green parks have large clusters of activity.
Figure 7 presents the overall check-in density across the six categories of parks. In answer to our first research question, the most check-ins were found in neighborhood parks and recreational parks, so the density is higher in these categories. A likely reason is that neighborhood parks outnumber the other types of parks and are located near residential areas.
Figure 8a shows the temporal differences in the number of visitors during a 24-h period. Although visitors accessed the parks during all periods of the day, the maximum numbers of check-ins were made between 4:00 p.m. and 6:00 p.m. and at 10:00 p.m. for all of the parks considered in the research, and check-in activity remained high until midnight. These results suggest that Shanghai’s green parks are popular recreational destinations for the people of Shanghai.
Figure 8b shows the percentage of the total number of check-ins for every hour, and it also shows that people visit neighborhood parks the most, up until midnight.
Although a nearly equal number of check-ins were made every weekday, more check-ins were made on Saturdays and Sundays, and this weekly pattern is consistent across all categories of parks.
Figure 9a describes the general pattern of weekly check-ins, while
Figure 9b reveals the weekly trend in all categories of green parks, further highlighting that the most check-ins were made on weekends across all park types.
We investigated the data by gender (i.e., male and female) in 10 districts of Shanghai, to examine check-in rates and behavior. Figure 10a displays the overall check-in pattern among the districts, and Figure 10b shows the number of check-ins across the different categories of parks. Notably, female visitors were more active users of Weibo than male visitors.
A comparison of genders in terms of their check-ins (regarding both frequency and behavior), across weekdays and districts, was used to investigate the differences between male and female visitors in Shanghai. Table 6 and Table 7 present the outcomes of this comparison across the days of the week, seasons, districts, and categories of parks.
The seasonal differences in check-ins to green parks were investigated for autumn, winter, spring and summer. In accordance with similar studies, significant seasonal differences in user check-ins were identified [62,67]. In answer to our third research question, the seasonal pattern shows a higher percentage of check-ins in green parks during summer and spring. It is worth noting that check-ins during winter were slightly lower than in autumn.
Figure 11b shows the number of check-ins across the different categories of green parks during the different seasons, and indicates that neighborhood parks and recreational parks dominate the number of check-ins in all seasons.
Finally, the overall statistics regarding the number of check-ins are shown in Table 8, separated by seasons, districts and daily trends.
6. Conclusions and Recommendations
In this research, we applied a big data methodology to revisit the issue of environmental justice linked to the spatial provision of urban green parks in Shanghai. We used geo-tagged Weibo check-in data as an indicator of park visits within 10 districts of Shanghai, and investigated the number of check-ins accordingly. We focused on the different categories of parks in Shanghai because, to the best of our knowledge, we are the first to address this problem in such a crowded metropolitan area. We carried out an in-depth experimental analysis of check-in behavior using intensity maps and patterns from LBSN data. The results reveal the distribution of users in parks, and show that neighborhood green parks are much more crowded than other green spaces. Analysis of seasonal impacts on people’s behavior toward green spaces in different categories of parks shows that the number of check-ins is much higher in summer and spring than in autumn and winter. The hourly distribution of Weibo check-ins shows that the peak time to visit green parks is from midday to midnight. Lastly, gender-based variations were measured in relation to green park visits, and the findings reveal that female visitors are more active in their use of social media services when visiting green parks.
Kernel density estimation is a technique for estimating a probability density function, and it allows the user to examine the underlying probability distribution more effectively than a traditional histogram. Unlike the histogram, the kernel estimator produces a smooth curve, uses the locations of all the sample points, so that the information contained in the sample is better represented, and can reveal multimodality. In KDE, events are weighted according to their distances, and two parameters are required. The first is the bandwidth, which controls the distance over which each event contributes; it is a free parameter, and its selection has a strong effect on the resulting estimate. The second is the kernel (weighting) function K, most often a normal (Gaussian) function. With a well-chosen bandwidth, KDE smooths out local noise to a certain degree while providing a non-parametric estimate of the probability distribution.
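The effect of the bandwidth can be illustrated with a small one-dimensional sketch. It assumes a Gaussian kernel and uses invented sample values forming two clusters; the helper names `kde_1d` and `count_modes` are hypothetical. A narrow bandwidth keeps both clusters visible as separate modes, whereas a very wide bandwidth merges them into a single peak between the clusters.

```python
import math


def kde_1d(samples, h):
    """Return a 1-D Gaussian kernel density estimate with bandwidth h."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def count_modes(density, lo, hi, steps=400):
    """Count strict local maxima of `density` on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [density(x) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Invented sample: two clusters of observations, around 0 and around 10.
samples = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]

narrow = kde_1d(samples, h=1.0)   # resolves both clusters
wide = kde_1d(samples, h=10.0)    # oversmooths into one broad peak

print(count_modes(narrow, -5.0, 15.0), count_modes(wide, -5.0, 15.0))
```

The same trade-off applies to the bivariate case: too small a bandwidth produces a noisy surface with a spike at every check-in, and too large a bandwidth hides genuine hotspots.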
The wide spatial coverage of this research offers valuable information that can improve urban green space development and planning in other major cities. The results indicate that planners should pay more attention to the importance of small neighborhood green parks in urban green space arrangements. This article provides evidence of the recreational value of these parks through their high check-in intensity. The presence of accessible, well-maintained, small green spaces in the urban park system is also crucial for meeting the recreational needs of local residents. Supporting factors, including the area and location of a green park, can attract more visitors, as this research reveals that urban green parks in the city center or downtown area attract more visitors than parks in other parts of the city. These findings can also be useful in the development of green spaces in smart cities, by considering visitors’ preferences. In addition, similar research should be carried out in other megacities to assess the common issues affecting the utilization of urban green spaces.
7. Limitations and Future Work
Based on the outcomes of the present study, LBSN data can deliver a new perspective, in addition to providing observations of gender differences and check-in intensity. LBSN check-in data have some major benefits, such as high spatial accuracy and minimal cost. However, some constraints are associated with this type of data, such as low sampling frequency, gender bias and location category bias. LBSN data are therefore more likely to supplement than to replace traditional sources of data. In future work, applying similar techniques to the content that green park visitors share or post on social media could help to understand visitors’ feelings or sentiments, as well as to evaluate the numerous benefits offered by urban green parks.