1. Introduction
Driven by rapid economic growth under the reform and opening-up policy, China has experienced the largest urbanization process in human history [1]. Although rapid urbanization has improved people’s material living conditions, it has also created many problems for urban planning and development, such as single-function urban fringe areas and severe traffic congestion within cities, which inconvenience residents’ daily life and travel [2]. Disorderly urban expansion also threatens the natural ecological environment and disrupts ecological balance, resulting in economic, social, and ecological problems [3]. Therefore, the functional zoning of cities is essential for their scientific and rational construction and planning. China’s latest policies emphasize the importance of urbanization and stress the need to identify and delineate the functions of urban land use. Such identification not only supports rational urban construction and planning but also helps optimize the urban layout, allocate urban resources efficiently, and form a more efficient and orderly urban structure, thereby providing a solid foundation for the steady progress of urbanization [4,5,6,7].
Traditional methods of identifying urban land use functions rely on data from population censuses, land surveys, industrial statistics, and socio-economic statistics. However, these sources are often confidential, which makes them difficult to access, and they are updated on long cycles. This greatly limits their practicality, so research based on them often remains at a macro level and is difficult to apply widely. With the advent of the network era and the rapid development of communication technologies, big data has emerged as a valuable resource. It includes remote sensing images as well as environmental and meteorological data obtained through professional scientific research equipment, which offer high accuracy, wide coverage, strong timeliness, and low acquisition cost [8]. Compared with traditional remote sensing images, network big data obtained through communication devices, such as POI (point-of-interest) data, GPS (global positioning system) trajectory data, mobile phone signaling data, and bus card swiping data generated in users’ daily lives, offers better timeliness and finer spatial granularity [9]. Currently, more and more scholars analyze urban space by mining big data, exploring urban economic activities, resident mobility, and travel trajectories from a micro perspective. For example, Frias Martinez et al. applied clustering analysis to Twitter social media activity data to analyze urban land use functions in Manhattan, London, and Madrid [10]; Jiang et al. used the frequency of POI data in the Boston metropolitan area to identify urban land types and, on that basis, estimate the density of employment distribution [11]; Hu et al. used Landsat remote sensing images and POI data to delineate functional land in Beijing [12]; Ye et al. combined social media data with street-level image data to identify urban land use functions in Beijing [13]; Liu et al. used time series of taxi trajectories to identify the functional layout of land use in Chengdu [14]; Sun et al. recognized land use functions in Beijing by analyzing text posted on Sina Weibo [15]; and Huang et al. delineated the urban functional areas of Beijing and Wuhan by combining nighttime light data with daytime multi-view remote sensing images [16]. As technology progresses, methods for acquiring and analyzing big data are becoming increasingly mature, enabling deeper exploration of urban spatial layout and a more intuitive reflection of urban spatial structure.
With the maturation of big data mining technology, large volumes of sample data with precise positioning information have been mined in cities. Integrating big data with urban spatial analysis to comprehensively analyze hidden information has become a research hotspot [17]. For example, Yuan et al. represented the potential activity trajectories of crowds by the time, location, and type of their activities and combined them with kernel density analysis of POI data to identify urban functions in Beijing [18]; Feng et al. combined a logistic regression model with POI data to extract the urban functional zoning of the Wuhan urban area [19]; Zhai et al. combined the Place2vec model with the K-means clustering algorithm to partition the functional layout of land use in Wuxi, achieving an overall accuracy of 74.24% [20]; Yan et al. combined a KD-tree-based clustering algorithm and Thiessen (Voronoi) polygons with POI data to partition urban functional areas accurately [21]; and Sun et al. used the Word2vec, LDA, and Block2vec models to identify land use functions in the central urban area of Wuhan and found that the Word2vec model achieved the best recognition accuracy [22]. In addition, the TF-IDF (term frequency-inverse document frequency) quantification method, widely used in text mining and machine learning, has attracted the attention of some scholars [23,24,25]. TF-IDF is a weighting technique for extracting key information in data mining: a term receives a high weight when it occurs frequently in one document but rarely across the whole collection of documents. The method has distinctive advantages in analyzing policy texts at the macro level and extracting keywords of different levels of importance [26].
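When TF-IDF is transferred to urban function identification, each block unit can be treated as a "document" and the POI categories inside it as "terms". The following Python sketch illustrates that idea under stated assumptions and is not the exact procedure of any study cited here: TF is taken as the share of a category among a block's POIs, IDF as the natural logarithm of the number of blocks divided by the number of blocks containing that category, and the function names, input format, and sample categories are all hypothetical.

```python
import math
from collections import Counter


def tfidf_by_block(block_pois):
    """Compute TF-IDF weights of POI categories for each block.

    block_pois: dict mapping a block id to the list of POI category labels
    found in that block (hypothetical input format).
    """
    n_blocks = len(block_pois)
    # Document frequency: in how many blocks each category appears at least once.
    doc_freq = Counter()
    for categories in block_pois.values():
        doc_freq.update(set(categories))

    weights = {}
    for block_id, categories in block_pois.items():
        counts = Counter(categories)
        total = sum(counts.values())
        weights[block_id] = {
            # TF = category share within the block; IDF = log(N / df).
            cat: (n / total) * math.log(n_blocks / doc_freq[cat])
            for cat, n in counts.items()
        }
    return weights


def dominant_function(weights):
    """Label each block with its highest-weighted category (None if it has no POIs)."""
    return {b: (max(w, key=w.get) if w else None) for b, w in weights.items()}


# Hypothetical example with three blocks.
blocks = {
    "A": ["residential", "residential", "commercial"],
    "B": ["commercial", "commercial", "park"],
    "C": ["residential", "school"],
}
labels = dominant_function(tfidf_by_block(blocks))
print(labels)  # {'A': 'residential', 'B': 'park', 'C': 'school'}
```

In block B, a single park POI outweighs two commercial POIs because parks occur in only one of the three blocks and therefore receive a higher IDF. This sensitivity to rare categories is one reason the weighting of POI categories deserves care in practice.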
Theoretical research on urban land use functions has reached a relatively mature stage, and its technical methods have undergone significant transformation. However, current research on urban functional zoning identification has focused primarily on applying new urban planning technologies, evaluating current layout features, and identifying functional problems. In-depth analysis of the identification results is lacking, and the results are insufficiently connected with regional urban planning policies and development models. As a result, such research struggles to provide feedback for future planning and design or guidance for urban policy formulation and management. Urban function identification based on POI data often originates in surveying and mapping, cartography, and related disciplines, whereas in the current era of territorial spatial planning, land use functions must also be studied from the perspective of urban planning [27]. Moreover, current research relies primarily on manual comparative statistical analysis to evaluate the accuracy of the final recognition results, which is strongly influenced by the researcher’s subjective judgment. The new first-tier cities are selected by First Financial News from 337 Chinese cities at the prefecture level and above, based on five major indicators: concentration of commercial resources, urban hub function, urban human activity, lifestyle diversity, and future plasticity. These cities are highly representative, and their development can, to some extent, be seen as a microcosm of China’s urbanization. However, current research on new first-tier cities focuses predominantly on cultural, commercial, financial, and other fields, with limited attention to the functional layout of urban land use. Furthermore, existing research on the functional structure of urban land use is largely confined to individual cities, and comprehensive comparative analysis of land use functional layouts across multiple cities is inadequate.
This study uses OSM and POI data to identify the current land use functions in the central urban areas of five new first-tier cities. The aim is to improve understanding of the current status and distribution patterns of land use functions in major Chinese cities and to provide a reference for optimizing urban and territorial spatial planning. Building on the identification results, the study then analyzes three aspects, namely urban functional spatial elements, urban functional mixing degree, and urban structural similarity, to determine whether the land use functions of these five cities follow common regularities. The objective is to establish a data foundation and theoretical support for formulating planning policies for less developed cities in China, thereby supporting the steady and smooth progress of the country’s urbanization.
6. Discussion
6.1. Feasibility Analysis of Methods
Previous studies have demonstrated the advantages of multi-source big data in urban land use research and refinement [42]. Among these sources, POI data represent urban functions in daily life as spatial points, carrying spatial information such as the longitude and latitude coordinates and addresses of geographic entities, along with attribute information such as main category, administrative division, and name. Compared with other big data sources, POI data are closely related to land use types, raise fewer privacy concerns, are easy to obtain, and are relatively straightforward to process. They significantly reduce the cost of preliminary research, effectively reflect urban spatial structure, and are widely applied in the identification and analysis of urban functional areas [11,12,43]. A more accurate analysis of urban functional zoning requires refined research units. Li et al. used grid cells as research units to identify functional areas in the central urban area of Wuhan, a simple and fast division method [33]. However, urban blocks are typically irregular polygons, so grid cells may not accurately reflect actual blocks. OSM (OpenStreetMap) road network data, with high positioning accuracy and basic spatial information including longitude, latitude, road type, and name, provide a more realistic division of block units and are therefore a valuable data source for land use analysis [44,45,46]. Based on OSM and POI data, this study overcomes the limitations of traditional static data by using the TF-IDF quantitative method from machine learning to identify urban land use functions. This approach reduces manual intervention and reflects urban land use more scientifically and objectively. The identification results for the five cities are broadly consistent with those of previous studies. For example, Yu et al. identified the urban functional areas of Chengdu and observed that the primary functional zones were distributed in concentric rings around the city center, with residential land dominating the main urban areas and mixed functional areas comprising 50% of the total area, which closely matches the findings of this study [47]. Ding et al. identified the functional areas in the central urban area of Nanjing and found that residential land comprised 42.3% of the study area, somewhat lower than the 50.88% in this study; this difference may be attributed to the broader extent of our study area, which contains a greater diversity of functional elements and a larger proportion of various land uses [48]. Wu et al. developed an urban functional area identification model using kernel density estimation and delineated six single or mixed functional area types in Tianjin; the distribution of different land types gradually disperses from the city center to the periphery, which is largely in line with the findings of this work [49]. In conclusion, because OSM and POI data are readily available, this method can easily be applied to other cities.
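Using irregular blocks rather than grid cells means that each POI must be assigned to the block polygon that contains it. The sketch below shows one way to do this with a standard ray-casting point-in-polygon test in plain Python. It assumes that POI coordinates and block vertices are already in the same projected coordinate system; the block ids, coordinates, and helper names are hypothetical, and the spatial join actually used in this study may differ (a GIS library with a spatial index would typically be used at city scale).

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray from (x, y) toward +x and count edge crossings.

    polygon: list of (x, y) vertices in order; the last edge closes back to the first.
    An odd number of crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's horizontal line can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_pois_to_blocks(pois, blocks):
    """Group POI categories by the block that contains them.

    pois: list of (x, y, category); blocks: dict of block id -> polygon.
    POIs outside every block are dropped; points exactly on a boundary are
    resolved by the half-open convention of the test, not by an explicit rule.
    """
    assigned = {block_id: [] for block_id in blocks}
    for x, y, category in pois:
        for block_id, polygon in blocks.items():
            if point_in_polygon(x, y, polygon):
                assigned[block_id].append(category)
                break
    return assigned


# Hypothetical L-shaped block B1 wrapped around a rectangular block B2.
blocks = {
    "B1": [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)],
    "B2": [(1, 1), (4, 1), (4, 3), (1, 3)],
}
pois = [
    (0.5, 2.0, "residential"),
    (2.5, 2.0, "commercial"),
    (2.0, 0.5, "industrial"),
    (5.0, 5.0, "park"),  # outside both blocks
]
print(assign_pois_to_blocks(pois, blocks))
# {'B1': ['residential', 'industrial'], 'B2': ['commercial']}
```

A square grid cell laid over this area would mix POIs from both blocks, whereas the polygon test keeps each POI with the irregular block it actually falls in.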
6.2. Current Situation Analysis
The study found that residential land occupies most of the main urban areas in all five cities. This is because these cities are highly urbanized, and large rural populations are moving into the central urban areas, increasing the demand for urban residential land. In addition, for historical reasons, Hangzhou, Nanjing, and Chengdu began relocating residential functions to the periphery earlier in their urban development; therefore, their proportions of residential land are lower than those of the central urban areas of Xi’an and Tianjin. As for green space and open space land, Hangzhou lies on the southern edge of the Yangtze River Delta and in the Qiantang River Basin, with abundant water and forest resources. It not only has natural landscapes such as West Lake and Xixi Wetland, but its natural endowment also facilitates the creation and development of green spaces and parks throughout the city, giving it the highest proportion of green space and open space land. Although Xi’an, one of the four great ancient capitals of China, has cultural relics such as the Daming Palace and the Weiyang Palace, its complex terrain and variable climate result in the lowest proportion of green space and open space land.
Combining these observations with the land use function identification results, most land units with low functional diversity in the study areas of the five new first-tier cities were found to contain various urban landscapes. Two reasons may explain this: (1) these land units may contain buildings and facilities with a single function, such as commercial, residential, or industrial uses, and a lack of comprehensive planning and design produces a uniform, less dynamic urban landscape, which lowers the degree of functional diversity; (2) land ownership and planning management impose constraints, since divergent interests and planning objectives among landowners and planning authorities may fragment and decentralize the functions within land units.
6.3. Shortcomings and Prospects
First, the processing of POI data can still be improved. From the initial removal of records to the subsequent weight assignment, the procedure involves a degree of human judgment. To address this problem, deep learning algorithms could be used to filter POI data further, ensuring its authenticity and objectivity and minimizing duplicate, overlapping records. In addition, OSM road data are sparse or missing in suburban and rural areas, which leads to an inappropriate division of functional units and poor local recognition results.
Second, the indicators used to analyze and evaluate the five new first-tier cities could be enriched, particularly for comprehensive comparison among the cities. Future work could add indicators such as location entropy (the location quotient) and nearest-neighbor distance to further assess the current status of urban land use functions and explore their patterns, enabling more objective and comprehensive problem assessments and optimization suggestions.
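The two indicators proposed above can be sketched as follows. This is a hedged illustration rather than part of the present analysis: location entropy is computed in its usual location-quotient form, the share of a function within a unit divided by its share in the whole study area, and nearest-neighbor distance is summarized with the Clark-Evans nearest neighbor index, the observed mean nearest-neighbor distance divided by the value expected for a random pattern of the same density. The input formats, function names, and numbers are hypothetical, and no edge correction is applied.

```python
import math


def location_quotient(block_counts, category):
    """Location quotient of one POI category for every block.

    block_counts: dict of block id -> {category: POI count} (hypothetical format).
    LQ > 1 means the category is more concentrated in the block than in the study area overall.
    """
    total_all = sum(sum(c.values()) for c in block_counts.values())
    total_cat = sum(c.get(category, 0) for c in block_counts.values())
    lq = {}
    for block_id, counts in block_counts.items():
        n_block = sum(counts.values())
        if n_block == 0 or total_cat == 0:
            lq[block_id] = 0.0
        else:
            lq[block_id] = (counts.get(category, 0) / n_block) / (total_cat / total_all)
    return lq


def nearest_neighbor_index(points, area):
    """Clark-Evans ratio R = observed mean NN distance / (0.5 / sqrt(n / area)).

    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion.
    Brute-force O(n^2) search; no edge correction.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


counts = {"A": {"commercial": 3, "residential": 1}, "B": {"residential": 4}}
print(location_quotient(counts, "commercial"))  # {'A': 2.0, 'B': 0.0}

grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(nearest_neighbor_index(grid, area=4.0))  # 2.0
```

The regular four-point grid yields R = 2.0, well above 1, which is the signature of a dispersed rather than clustered pattern.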
Third, to explore the current state of urbanization in China more extensively, future studies could include a broader range of cities, allowing urbanization to be examined from a more macroscopic viewpoint.
Finally, this study identified certain similarities in the land use functions of the five cities. However, because cities differ in their policies and historical conditions, these land use characteristics are difficult to generalize across China, let alone worldwide. Future research could comprehensively evaluate various land use functional layouts to determine the optimal layout for urban planning.
7. Conclusions
The central urban land use functions of the five new first-tier cities were identified by processing OSM and POI data for each city and applying the TF-IDF algorithm. Influenced by geographical conditions, resource endowments, and planning policies, the cities differ in the functional layout of their land use. However, the five cities share common characteristics: residential land, commercial service land, public management and public service land, green space land, and open space land together account for over 90% of the total, by both number of units and area.
The accuracy of the central urban land use function identification results for the five new first-tier cities was tested by constructing confusion matrices. The Kappa coefficients fall within the range [0.61, 0.80], indicating a high level of agreement (classed as substantial on the commonly used Landis and Koch scale), which further supports the feasibility of identifying urban land functions from multi-source big data. Most of the land units with low functional mixing in the study areas of the five new first-tier cities contain various urban landscapes.
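For reference, the Kappa coefficient compares the observed agreement p_o on the diagonal of the confusion matrix with the agreement p_e expected by chance from the row and column totals, kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes both overall accuracy and Kappa from a square confusion matrix; the function name is hypothetical, and the 3 x 3 matrix of sample counts is invented for illustration and is not a result of this study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] = number of validation samples whose reference class is i
    and whose identified class is j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    p_o = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row total x column total) / total^2.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[c] * col_totals[c] for c in range(k)) / total ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical counts for residential, commercial, and green space samples.
confusion = [
    [40, 5, 5],
    [6, 30, 4],
    [4, 5, 21],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(round(accuracy, 4), round(kappa, 4))  # 0.7583 0.6298
```

Kappa is lower than overall accuracy because it discounts the agreement that would arise by chance given how the samples are distributed across classes.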
The current state of central urban land use in the five new first-tier cities was evaluated using the land use mixing degree and the urban structure similarity. Chengdu and Tianjin exhibit the highest land use function mixing degree, followed by Xi’an, Nanjing, and Hangzhou. In Chengdu, Tianjin, and Xi’an, plots at all five levels of mixing degree show an obviously dispersed distribution, whereas in Nanjing and Hangzhou, plots with highly mixed land use functions are concentrated in a few administrative districts. According to the urban structure similarity results, Hangzhou and Nanjing have the most similar land use function structure, followed by Xi’an and Nanjing, while Chengdu and Nanjing are the least similar. In general, the closer two cities are in the types and quantities of their land use functions, the more similar their land use function structures.
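The formulas behind these two measures are not restated in this section, so the following Python sketch uses common forms as assumptions: mixing degree as the Shannon entropy of land use shares within a plot, normalized by the logarithm of the number of land use types in the classification scheme, and structure similarity as the cosine similarity between two cities' land use structure vectors, which is the form of the structural similarity coefficient often used in regional studies. Function names and sample figures are hypothetical.

```python
import math


def mixing_degree(areas, n_types):
    """Normalized Shannon entropy of land use within one plot.

    areas: dict of land use type -> area (or POI count) in the plot.
    n_types: number of land use types in the classification scheme (>= 2).
    Returns 0 for a single-use plot and 1 for an even mix of all n_types.
    """
    total = sum(areas.values())
    shares = [a / total for a in areas.values() if a > 0]
    entropy = sum(p * math.log(1 / p) for p in shares)
    return entropy / math.log(n_types)


def structure_similarity(city_a, city_b):
    """Cosine similarity of two land use structure vectors (dict of type -> area)."""
    types = set(city_a) | set(city_b)
    dot = sum(city_a.get(t, 0.0) * city_b.get(t, 0.0) for t in types)
    norm_a = math.sqrt(sum(city_a.get(t, 0.0) ** 2 for t in types))
    norm_b = math.sqrt(sum(city_b.get(t, 0.0) ** 2 for t in types))
    return dot / (norm_a * norm_b)


print(round(mixing_degree({"residential": 50, "commercial": 50}, n_types=4), 3))  # 0.5
print(round(mixing_degree({"residential": 80}, n_types=4), 3))  # 0.0

city_x = {"residential": 50, "commercial": 30, "green": 20}
city_y = {"residential": 25, "commercial": 15, "green": 10}
print(round(structure_similarity(city_x, city_y), 3))  # 1.0
```

Because cosine similarity depends only on proportions, two cities with the same mix of land use types score 1.0 even if one is twice the size of the other, which matches the observation that similarity rises as cities converge in the types and quantities of their land use functions.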