Identifying Urban Functional Areas in China’s Changchun City from Sentinel-2 Images and Social Sensing Data

The urban functional area is critical to understanding the complex urban system and to resource allocation and management. However, such information is difficult to obtain because urban surveys focus on geographic objects and urban space is highly mixed. The function of a place is determined by the activities that take place there. This study employed mobile phone signaling data to extract temporal features of human activities through the discrete Fourier transform (DFT). Combined with features extracted from point of interest (POI) data and Sentinel-2 images, these features were used to identify the urban functional areas of Changchun City with a random forest (RF) model. The results indicate that integrating features derived from remote sensing and social sensing data can effectively improve identification accuracy, and that features derived from dynamic mobile phone signaling data yield higher identification accuracy than those derived from POI data. Human activity characteristics on weekends distinguish functional areas better than those on weekdays. The identified urban functional layout of Changchun is consistent with the actual situation. The residential functional area has the highest proportion, accounting for 33.51%, and is mainly distributed in the central area, while industrial functional areas and green space are distributed in the surrounding areas.


Introduction
The term "urban functional area" refers to zones divided according to their dominant function [1]. It is associated with land use, and its external form is closely related to its economic, social, and cultural functions [2]. Delimiting urban functional areas can significantly facilitate urban research and planning [3,4]. In particular, as urbanization advances worldwide, ways of life are shifting rapidly from traditional rural patterns to urban ones [5,6]. The United Nations Sustainable Development Goals (SDGs) seek to make cities inclusive and sustainable. SDG 3 (good health and well-being) and SDG 11 (sustainable cities and communities) reflect a commitment to investing effort and resources in goals that include making cities more livable, promoting well-being, and advancing sustainable development [7]. Understanding the spatial pattern of urban functional areas supports better allocation of resources and improvement of the urban living environment. However, because most urban surveys focus on geographical objects rather than on functional areas at a larger scale, functional area maps are difficult to obtain for most cities [8], and the complexity and mixing of urban functions make accurate and efficient mapping of urban functional areas very difficult [9].
The remainder of the paper is organized as follows. The second section describes the study area and the data used. The third section describes the research workflow and the features used to identify urban functional areas. The fourth section compares the performance of different feature combinations and analyzes the identification results. The fifth section discusses the influencing factors and uncertainty in the identification. The last section presents the research conclusions and suggestions for future work on the identification of urban functional areas.

Study Area
Changchun is the capital of Jilin Province in China. It has convenient transport links and a clear locational advantage: the Beijing-Harbin railway and most of the expressways and high-grade highways in Jilin Province pass through the city. The central area of Changchun was taken as the research area (Figure 1). It is the area designated for concentrated construction under the current urban master plan, with a total planning area of 610 square kilometers, about 0.25% of the total area of Jilin Province. This region has a highly concentrated population and serves as the province's economic and social hub, accounting for 12.64% of the province's population in 2015 [32].

Remote Sensing Image
Sentinel-2A imagery was selected as the remote sensing data source. Launched in June 2015, the Sentinel-2A satellite carries a 13-band MultiSpectral Instrument (MSI); four of its bands have a spatial resolution of 10 m, and its revisit period is 10 days [33,34]. Its imagery has been widely used in urban studies [35]. Level-1C data acquired on 27 September 2020 were downloaded from the Sentinels Scientific Data Hub [36].

Social Sensing Data
Remote Sens. 2021, 13, 4512 4 of 15
Social sensing data used in the study include POI and mobile phone signaling data. Baidu Map, launched in 2005, has become the main online map system in China, providing intelligent positioning and POI retrieval services [37] with high POI coverage (about 150 million POIs in total). The Baidu Map application programming interface (API) was therefore used to obtain POI data for the research area in 2020 (Figure 2). The mobile phone signaling data were provided by China Unicom and cover one complete week, from Monday 6 July to Sunday 12 July 2020. Location and time records are generated whenever mobile phone users make calls, send or receive SMS messages, move, or switch their phones on or off [38]. Based on the recorded geographic location, each signaling record was assigned to a cell tower and aggregated at hourly intervals to obtain mobile phone time series data. The shorter the distance between adjacent cell towers, the higher the tower density and the more accurately mobile phone signaling can be located. In the study area, the mean nearest-neighbor distance between cell towers was 97 m, and 98% of the cell towers were less than 500 m from their nearest neighbor (Figure 3).
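The hourly aggregation and the tower-spacing statistics described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: it assumes each signaling record has already been matched to a cell tower and is available as a hypothetical `(tower_id, timestamp)` pair, and that tower coordinates are in a projected system measured in meters. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

import numpy as np

WEEK_START = datetime(2020, 7, 6)   # Monday 6 July 2020, start of the study week
HOURS_PER_WEEK = 7 * 24


def hourly_series(records):
    """Aggregate (tower_id, timestamp) records into a 168-element hourly
    count vector per cell tower; records outside the study week are dropped."""
    series = defaultdict(lambda: np.zeros(HOURS_PER_WEEK, dtype=int))
    for tower_id, ts in records:
        hour_index = int((ts - WEEK_START).total_seconds() // 3600)
        if 0 <= hour_index < HOURS_PER_WEEK:
            series[tower_id][hour_index] += 1
    return dict(series)


def nearest_tower_distances(xy):
    """Distance (m) from each tower to its nearest neighboring tower,
    using brute-force pairwise distances on projected coordinates."""
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)       # exclude each tower's distance to itself
    return dist.min(axis=1)


def tower_spacing_summary(xy, threshold_m=500.0):
    """Mean nearest-neighbor distance and share of towers closer than threshold_m."""
    d = nearest_tower_distances(xy)
    return float(d.mean()), float((d < threshold_m).mean())
```

The all-pairs distance matrix grows quadratically with the number of towers; for a large tower set, `scipy.spatial.cKDTree(xy).query(xy, k=2)` returns each point itself plus its nearest neighbor and scales better.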

Urban Function Type Definition
Land use is the basic unit that reflects and carries urban functions. Based on China's land use classification standard, Gong et al. [39] created the essential urban land use categories (EULUCs), which contain two classification levels. The classification system is based on China's urban land survey and management standard, which is consistent with the land use survey conducted in the central urban area of Changchun, China. This study therefore divided urban functions into five types according to the first-level classification of EULUC: residential, commercial, industrial, public, and green space. Green space refers to green open space and water bodies. The study area was divided into 500 m × 500 m urban cells following the method of Tu et al. [29]. Su et al. [40] analyzed in depth the impact of different sampling strategies on urban land use classification and found that the preferred sample purity was 60-90%. To ensure a sufficient number of samples, an urban cell was labeled with a functional type only when that type covered more than 60% of its area, according to the field survey data of land types in Changchun in 2016; the labels were then verified for 2020 using the Sentinel-2A image and POI data. A total of 367 samples were selected, of which 252 were used for training and the remaining 115 for testing (Table 1).
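The purity-based labeling rule can be sketched as follows. The input layout is hypothetical: for each cell, the surveyed area of each of the five function types. The sketch takes the share relative to the cell's total surveyed area; the text does not say whether the denominator is the surveyed area or the full cell area, so this is an assumption.

```python
import numpy as np

FUNCTIONS = ["residential", "commercial", "industrial", "public", "green space"]
PURITY_THRESHOLD = 0.6  # dominant function must cover more than 60% of the cell


def label_cells(area_by_function):
    """Return the dominant function of each cell, or None when no function
    exceeds PURITY_THRESHOLD of the cell's total surveyed area.

    area_by_function: array of shape (n_cells, 5), surveyed area of each
    function type inside each 500 m x 500 m cell (hypothetical layout).
    """
    areas = np.asarray(area_by_function, dtype=float)
    labels = []
    for row in areas:
        total = row.sum()
        if total == 0:
            labels.append(None)          # no survey data for this cell
            continue
        top = int(np.argmax(row))
        labels.append(FUNCTIONS[top] if row[top] / total > PURITY_THRESHOLD else None)
    return labels
```

Cells that return `None` are simply not used as samples; only cells with a sufficiently pure dominant function enter the training and testing sets.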

Methodology
An identification framework for urban functional areas based on remote sensing images and social sensing data was proposed. The main work comprised four steps. (1) The original data were pre-processed: the Sentinel-2A L1C images underwent atmospheric correction, the Baidu POI data were reclassified according to the functional area categories, and the mobile phone signaling records were assigned to cell towers to obtain spatialized mobile phone signaling time series. (2) Texture analysis, spatial analysis, and the discrete Fourier transform (DFT) were applied in ENVI, ArcGIS, and Spyder to extract features, which were then assigned to urban cells. (3) The optimal feature combination was determined by comparing the overall accuracy of repeated classifications. (4) The random forest model was trained on the samples using the optimal feature combination, and the optimal model was used to identify urban functional areas (Figure 4). Examples of urban cells and related features (Experiment_dataset.csv), code for evaluating the identification performance of different feature combinations (Comparison_of_classification_accuracy.py), code for RF parameter tuning (RF_parameter_optimization.py), and the optimized RF model (RF_trained.sav) are available in the Supplementary Materials.

Sentinel-2A Data Processing and Feature Extraction
The Level-1C product was processed with the Sen2Cor v2.5.5 tool provided by the European Space Agency. Sen2Cor pre-processes Level-1C top-of-atmosphere (TOA) image data, applying scene classification and atmospheric correction, and converts the result into an orthorectified Level-2A bottom-of-atmosphere (BOA) reflectance product [41]. Typical spectral indices, such as the normalized difference vegetation index (NDVI) [42] and the normalized difference built-up index (NDBI) [43], were obtained by band calculation. The gray-level co-occurrence matrix (GLCM) was used to generate texture features (entropy, correlation, and angular second moment) for the red (band 4), green (band 3), blue (band 2), near-infrared (band 8), and mid-infrared (band 11) bands. Finally, the mean values of the spectral bands, spectral indices, and texture features within each urban cell were used as features [44].
Figure 4. The overall process of integrating remote sensing data and social sensing data to identify urban functional areas.
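The band calculations and GLCM texture measures described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the ENVI workflow used in the study: band 11 (20 m) is assumed to have been resampled to the 10 m grid, the GLCM uses a single horizontal offset of one pixel and 16 gray levels, and entropy uses the natural logarithm; none of these settings are specified in the text. Computing the measures per window or per cell and averaging within each urban cell gives cell-level features.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with 0 where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(b8, b4):
    # NDVI = (NIR - red) / (NIR + red), Sentinel-2 bands 8 and 4
    return normalized_difference(b8, b4)


def ndbi(b11, b8):
    # NDBI = (SWIR - NIR) / (SWIR + NIR); band 11 assumed resampled to 10 m
    return normalized_difference(b11, b8)


def glcm_features(band, levels=16):
    """Entropy, correlation and angular second moment (ASM) of a symmetric,
    normalized GLCM built from horizontally adjacent pixel pairs (offset 1)."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    if hi == lo:
        q = np.zeros(band.shape, dtype=int)
    else:
        # Quantize values into integer gray levels 0 .. levels - 1
        q = np.minimum(((band - lo) / (hi - lo) * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # count left -> right pairs
    glcm = glcm + glcm.T                                        # symmetric GLCM
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))
    asm = np.sum(p ** 2)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i * sd_j == 0:
        corr = 1.0  # constant window: correlation undefined, treated as 1.0 here
    else:
        corr = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    return {"ent": float(entropy), "cor": float(corr), "asm": float(asm)}
```

Library routes exist as well, for example `skimage.feature.graycomatrix` for building the co-occurrence matrix; the hand-written version above just makes each measure explicit.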

POI Data Processing and Feature Extraction
Baidu Map POI data mainly serve internet navigation, so their classification system differs from the classification of urban functional areas. The original Baidu POI data contained 19 groups, including food, hotels, shopping, tourist attractions, education, and training, among others. The POIs were rescreened according to the categories of urban functional areas, and types weakly related to urban functions were deleted. For instance, convenience stores, shops, and small supermarkets in the shopping category were removed because they do not occupy an independent land-use parcel. A total of 30,474 POIs were retained. Spatial analysis in ArcGIS 10.2 was used to obtain the total number and the proportion of each POI type. The number of POI categories, the total number of POIs [45], and the mean kernel density of each POI category [20] in each urban cell were calculated as features.
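The count-based POI features and a kernel density surface can be sketched as below. The kernel follows the quartic (biweight) form that ArcGIS Kernel Density is documented to use; the search radius and the sampling locations inside each cell are assumptions, since the text does not report them, and all helper names are hypothetical.

```python
import numpy as np


def poi_cell_features(poi_categories, all_categories):
    """Total POI count, number of distinct categories, and the proportion of
    each category among the POIs of one urban cell."""
    counts = {c: 0 for c in all_categories}
    for c in poi_categories:
        counts[c] += 1          # raises KeyError if a POI was not reclassified
    total = sum(counts.values())
    features = {
        "poi_total": total,
        "poi_n_categories": sum(1 for v in counts.values() if v > 0),
    }
    for c, v in counts.items():
        features[f"prop_{c}"] = v / total if total else 0.0
    return features


def quartic_kernel_density(points, locations, radius):
    """Quartic-kernel density of `points` evaluated at `locations`:
    sum over points of 3 / (pi r^2) * (1 - (d / r)^2)^2 for d < r."""
    points = np.asarray(points, dtype=float)
    locations = np.asarray(locations, dtype=float)
    d2 = ((locations[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1) / radius ** 2
    k = np.where(d2 < 1, 3 / (np.pi * radius ** 2) * (1 - d2) ** 2, 0.0)
    return k.sum(axis=1)
```

Averaging `quartic_kernel_density(category_points, sample_locations_in_cell, radius)` over sample locations inside a cell yields one mean-kernel-density feature per POI category.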

Mobile Phone Signaling Data Processing and Feature Extraction
The mobile phone time series exhibit periodicity, and their temporal profiles differ clearly between functional areas in both curve shape and peak timing. This study assumes that similar functional areas exhibit similar temporal characteristics [3]. At 11 p.m., the signaling volume rises above that of the two adjacent hours, mainly because mobile phone applications update automatically at that time and generate more signaling data. Because crowd activity is weakest from 23:00 to 5:00 the next day, data from this period were excluded from the analysis (Figure 5).
The mobile phone time series were fitted by DFT to extract temporal features. Fitting a limited number of Fourier harmonics suppresses noise while faithfully capturing the periodic variation of a time series. The approach has proven effective for smoothing and interpolating NDVI time series [46][47][48], and the resulting coefficients have been used as temporal indicators [43].
Assuming that the mobile phone time series is expressed as $y = (y_1, y_2, y_3, \cdots, y_n)$, it can be decomposed into a sum of trigonometric terms:
$$y_t \approx a_0 + \sum_{j=1}^{N} \left[ a_j \cos(w_j t) + b_j \sin(w_j t) \right], \quad t = 1, 2, \cdots, n$$
where $a_0$ is the mean, $a_j$ and $b_j$ are the coefficients of the $j$th-order cosine and sine functions (each order represents one harmonic), $w_j$ is the frequency of the trigonometric functions, and $N$ is the number of harmonics. According to the daily variation of the time profile, $w_j$ was set to $2j\pi/n$. $N$ was set to 5 and 2 to fit the weekday and weekend mobile phone time series, respectively.
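The harmonic decomposition y_t ≈ a_0 + Σ_{j=1..N} [a_j cos(w_j t) + b_j sin(w_j t)] with w_j = 2jπ/n can be computed with the real-form DFT formulas a_0 = mean(y), a_j = (2/n) Σ_t y_t cos(w_j t), and b_j = (2/n) Σ_t y_t sin(w_j t), valid for 1 ≤ j < n/2. The sketch below is one possible reading of the setup, not the authors' code: it assumes that hours 05:00-22:59 are retained and that the weekday and weekend series are formed by concatenating the retained hours of consecutive days (n = 90 and n = 36). Under that reading, N = 5 and N = 2 make the highest harmonic complete exactly one cycle per day. Function names are hypothetical.

```python
import numpy as np

# Assumed hour-of-day mask: keep 05:00-22:59, drop 23:00-04:59
ACTIVE_HOURS = np.array([5 <= h < 23 for h in range(24)])


def split_week(hourly168):
    """Drop night hours from a Monday-start 168-hour series and return the
    concatenated weekday (Mon-Fri) and weekend (Sat-Sun) series."""
    days = np.asarray(hourly168, dtype=float).reshape(7, 24)[:, ACTIVE_HOURS]
    return days[:5].ravel(), days[5:].ravel()


def harmonic_fit(y, n_harmonics):
    """Real-form DFT coefficients for y_t ~ a0 + sum_j [a_j cos(w_j t) + b_j sin(w_j t)],
    with t = 1..n and w_j = 2*j*pi/n. Returns (a0, a, b, fitted)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 1 <= n_harmonics < n / 2:
        raise ValueError("n_harmonics must satisfy 1 <= N < n/2")
    t = np.arange(1, n + 1)                       # time index t = 1, ..., n
    j = np.arange(1, n_harmonics + 1)[:, None]    # harmonic orders 1..N as a column
    w = 2 * np.pi * j / n                         # w_j = 2 j pi / n
    cos_t, sin_t = np.cos(w * t), np.sin(w * t)   # each of shape (N, n)
    a0 = y.mean()                                 # a_0: mean of the series
    a = 2 / n * cos_t @ y                         # a_j = (2/n) sum_t y_t cos(w_j t)
    b = 2 / n * sin_t @ y                         # b_j = (2/n) sum_t y_t sin(w_j t)
    fitted = a0 + a @ cos_t + b @ sin_t
    return a0, a, b, fitted
```

For a tower's weekly series `s`, `weekday, weekend = split_week(s)` followed by `harmonic_fit(weekday, 5)` and `harmonic_fit(weekend, 2)` returns coefficients that can serve as temporal features. Because the harmonics are orthogonal over a full period, these coefficients coincide with the least-squares fit.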
In total, 39 features were extracted from the Sentinel-2A, Baidu POI, and mobile phone signaling data to describe the urban cells (Table 2).

Prediction Model
In this study, the random forest model was employed as the prediction model because it is insensitive to multicollinearity, is robust, and predicts well for various types of observation data [49,50]. Random forest is an ensemble of classification and regression trees (CART) built within a bagging framework. Each tree is generated from randomly selected samples and features, which keeps the trees independent and helps prevent overfitting. For classification, the output is the category predicted most often by the individual trees [51].
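The bagging-and-voting mechanism described above can be illustrated with individual CART trees. This is a teaching sketch, not the model used in the study: it assumes class labels are integer-encoded as 0..K-1, uses scikit-learn's `DecisionTreeClassifier` with `max_features="sqrt"` for the random feature subset at each split, and the helper names `fit_forest` and `predict_majority` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, max_features="sqrt", seed=0):
    """Grow CART trees on bootstrap samples; each split considers a random feature subset."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample (with replacement)
        tree = DecisionTreeClassifier(max_features=max_features,
                                      random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Class receiving the most votes across trees (labels encoded as 0..K-1)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)   # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

In practice `sklearn.ensemble.RandomForestClassifier` performs these steps internally; note that scikit-learn combines trees by averaging their predicted class probabilities rather than by the hard majority vote shown here.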
Real-world datasets usually consist of many ordinary samples and a few abnormal but important samples [52]. This study collected 162 samples of residential functional areas but only 17 samples of commercial functional areas. Class-imbalanced data favor the majority class, and the cost of classifying an abnormal example as a normal one is usually much higher than that of the reverse error [52]. The Synthetic Minority Oversampling Technique (SMOTE) was used to tackle class imbalance at the data level [53]. Rather than replicating minority samples, SMOTE analyzes the minority samples, synthesizes new samples from them, and adds these to the dataset. SMOTE and random forest were implemented in Python with the Scikit-Learn ecosystem, a free machine learning library for the Python programming language (SMOTE itself is provided by the scikit-learn-compatible imbalanced-learn package).
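The core SMOTE step, interpolating between a minority sample and one of its nearest minority-class neighbors, can be sketched in NumPy as follows. This is a simplified illustration with hypothetical names (`smote`, `n_new`), not the implementation used in the study; in Python the standard implementation is `imblearn.over_sampling.SMOTE`, whose `fit_resample(X, y)` method returns the oversampled data.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples:
    x_new = x + u * (x_nn - x), where x_nn is one of x's k nearest
    minority neighbors and u ~ U(0, 1)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    m = len(X_min)
    if m < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, m - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]      # k nearest minority samples per row
    base = rng.integers(0, m, size=n_new)         # random minority sample per new point
    nn = neighbors[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))                    # interpolation weight in [0, 1)
    return X_min[base] + u * (X_min[nn] - X_min[base])
```

With 162 residential and 17 commercial samples, fully balancing these two classes would mean generating 145 synthetic commercial samples. Oversampling should generally be applied only to training data (for example, inside each cross-validation fold) so that synthetic points do not leak into evaluation folds.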
The number of features considered when splitting a node (max_features) and the number of decision trees (n_estimators) are the two main parameters affecting the accuracy of RF classification [35,54].
During model parameter optimization, the GridSearchCV method was used to determine, through cross-validation, the parameter combination with the highest accuracy on the validation set. The importance of a feature is the sum of the impurity decreases at all branch nodes split on that feature across the trees of the RF [55]. To compare the prediction accuracy of each combination and to optimize the random forest parameters, stratified K-fold cross-validation was used, with the F1_macro index as the measure of model performance. F1_macro is the mean of the F1 scores of all categories, with every category weighted equally. The F1 score is the harmonic mean of precision ($P$) and recall ($R$). For a specific category C, it is defined as follows [56]:
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times P \times R}{P + R}$$
where TP is the number of samples correctly classified as C, FP is the number of samples wrongly classified as C, and FN is the number of samples belonging to C but wrongly classified as another category.
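The per-class precision, recall, and F1, together with their unweighted mean (F1_macro), can be computed directly from true and predicted labels. A minimal NumPy sketch with a hypothetical helper name `f1_scores`; `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same macro score when the same set of labels is considered.

```python
import numpy as np


def f1_scores(y_true, y_pred, labels):
    """Per-class (precision, recall, F1) and the unweighted mean F1 (F1_macro)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))   # correctly classified as c
        fp = np.sum((y_pred == c) & (y_true != c))   # wrongly classified as c
        fn = np.sum((y_pred != c) & (y_true == c))   # c samples classified as other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class[c] = (float(p), float(r), float(f1))
    macro = float(np.mean([v[2] for v in per_class.values()]))
    return per_class, macro
```

Because every class contributes equally to F1_macro, a rare class such as the commercial function weighs as much as the residential class in the final score.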

The Overall Identification Performance of Different Combinations
Seven combinations were set up (Table 3) to compare the classification accuracies of random forest under various feature combinations. The F1_macro scores were calculated by random forest with three-fold stratified cross-validation, and the process was repeated 50 times, yielding 150 F1_macro values for each combination. Classification based on single-source features was less accurate than classification based on combinations of features from different sources. Among single-source features, those derived from Sentinel-2A data achieved the highest classification accuracy, with an average of 79.62%, while those derived from POI data achieved the lowest, with an average of 74.83%. Among multi-source combinations, accuracy was lowest when only features derived from social sensing data were combined. The combination of Sentinel-2A, POI, and mobile phone signaling features achieved the highest classification accuracy (88.06%) (Figure 6).
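The evaluation protocol above (three-fold stratified cross-validation repeated 50 times, giving 150 F1_macro values per combination) maps onto scikit-learn's `RepeatedStratifiedKFold` and `cross_val_score`. The sketch runs it over all seven non-empty combinations of three feature groups; the synthetic data and the column ranges assigned to each source are placeholders, not the study's 39 features, and `compare_combinations` is a hypothetical name.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def compare_combinations(X, y, groups, n_repeats=50, seed=0):
    """F1_macro scores of RF for every non-empty combination of feature groups,
    using stratified 3-fold CV repeated n_repeats times (3 * n_repeats scores each)."""
    results = {}
    names = list(groups)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            cols = np.concatenate([np.asarray(groups[g]) for g in combo])
            cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=n_repeats, random_state=seed)
            rf = RandomForestClassifier(random_state=seed)
            results[" + ".join(combo)] = cross_val_score(
                rf, X[:, cols], y, scoring="f1_macro", cv=cv)
    return results


# Hypothetical stand-in data: 252 cells x 39 features, 5 function classes
X, y = make_classification(n_samples=252, n_features=39, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
groups = {"Sentinel-2A": range(0, 20), "POI": range(20, 29), "Phone": range(29, 39)}
```

`{k: v.mean() for k, v in compare_combinations(X, y, groups).items()}` gives the mean F1_macro per combination. The forest is left at default settings here, since parameter tuning is a separate step in the workflow.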

Parameter Tuning of Random Forest
Because using features from all three data sources together gave the highest classification accuracy, the GridSearchCV method was applied to this feature set to find the model with the highest classification accuracy. The corresponding model parameters 'max_features' and 'n_estimators' were 9 and 38, respectively, with an F1_macro value of 0.88 (Figure 7).
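The grid search can be sketched with scikit-learn's `GridSearchCV`, scored by `"f1_macro"` under stratified K-fold cross-validation. The candidate grids for `max_features` and `n_estimators` are not stated in the text, so the values passed in any call are hypothetical, as are the synthetic data and the helper name `tune_rf`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_rf(X, y, max_features_grid, n_estimators_grid, n_splits=3, seed=0):
    """Exhaustive search over max_features x n_estimators, scored by F1_macro
    under stratified K-fold cross-validation."""
    search = GridSearchCV(
        estimator=RandomForestClassifier(random_state=seed),
        param_grid={
            "max_features": list(max_features_grid),
            "n_estimators": list(n_estimators_grid),
        },
        scoring="f1_macro",
        cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed),
    )
    return search.fit(X, y)  # refits the best model on all of X, y


# Hypothetical stand-in data with 39 features and 5 classes
X, y = make_classification(n_samples=252, n_features=39, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
```

For example, `tune_rf(X, y, range(1, 40), range(10, 201, 4))` would search every `max_features` value up to 39 features (hypothetical ranges). If SMOTE is part of training, wrapping it and the classifier in `imblearn.pipeline.Pipeline` keeps the oversampling inside each fold; the grid keys then take the step prefix, e.g. `"rf__max_features"` for a step named `rf`.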

The importance scores of features derived from Sentinel-2A images, mobile phone signaling data, and POI data accounted for 45%, 34%, and 22% of the total scores, respectively. Features derived from the Sentinel-2A blue band scored high in importance: the spectral mean of the blue band (B2_mean) ranked first, and the texture features "B2_ent" and "B2_asm" ranked fifth and eighth, respectively, making these three the top three of all Sentinel-2A-derived features. Mobile phone signaling-derived features ranked second and third. Among the mobile phone signaling-derived features from different time periods, three of the top five came from weekend data, including the first- and second-ranked ones (Figure 8).
The obtained model was applied to the testing samples to evaluate its ability to generalize to data independent of the training set, yielding an F1_macro value of 74.69%. Among individual categories, the F1 scores of green space, residential, and industrial functions were high, at 86.49%, 81.25%, and 74.51%, respectively. The recall of the public function was 83.33%, but because samples of other categories were misclassified as public, its precision was only 52.63%. Only one industrial sample was wrongly classified as commercial, whereas four commercial samples were wrongly classified as public or residential (Table 4).
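Per-source importance shares like those reported above can be obtained by summing the impurity-based importances of a fitted forest (`feature_importances_` in scikit-learn, normalized to sum to 1) within each data source. The helper names and the source labels are hypothetical. The reported shares (45%, 34%, 22%) sum to 101%, presumably because of rounding.

```python
import numpy as np


def importance_share_by_source(importances, sources):
    """Sum per-feature importances by data source and return each source's
    share of the total importance."""
    imp = np.asarray(importances, dtype=float)
    src = np.asarray(sources)
    total = imp.sum()
    return {str(s): float(imp[src == s].sum() / total) for s in np.unique(src)}


def top_features(importances, names, k=5):
    """Names of the k most important features, highest first."""
    order = np.argsort(importances)[::-1][:k]
    return [names[i] for i in order]
```

With a fitted model, `importance_share_by_source(model.feature_importances_, feature_sources)` gives the per-source split, and `top_features` gives the ranking used to compare individual features.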

Spatial Layout of Urban Functional Areas in Changchun
The optimized model was used to identify the urban function of each grid cell in the central urban area. The residential, commercial, industrial, public, and green space functional areas accounted for 33.51% (204.42 km²), 1.97% (12.01 km²), 26.46% (161.38 km²), 10.42% (63.55 km²), and 27.65% (168.64 km²) of the central urban area, respectively. The identification results were consistent with the actual urban function layout of Changchun City. Specifically, residential functional areas were mainly located in the central region, while industrial areas and green space were distributed in the surrounding areas. The southeast of the city had a favorable ecological environment owing to its more concentrated green space (Figure 9). Public land was concentrated in the south and southeast, where university campuses occupied the largest area (Figure 9).
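The area statistics follow from summing the areas of the cells assigned to each function. A full 500 m × 500 m cell covers 0.25 km²; the sketch assumes cells cut by the study-area boundary carry their clipped area. As a check, the five reported areas sum to 610.00 km², the planning area, and each share equals area / 610 (e.g., 204.42 / 610 ≈ 33.51%). The helper name `area_summary` is hypothetical.

```python
import numpy as np


def area_summary(labels, cell_area_km2):
    """Total area (km^2) and share (%) of each predicted function."""
    labels = np.asarray(labels)
    area = np.asarray(cell_area_km2, dtype=float)
    total = area.sum()
    return {
        str(c): (float(area[labels == c].sum()), float(100 * area[labels == c].sum() / total))
        for c in np.unique(labels)
    }
```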

Discussion
This study evaluated the performance of different source feature combinations in identifying urban functions and predicted the urban function layout of the central urban area of Changchun City, China. Integrating remote sensing data with social sensing data significantly improved identification accuracy: with all features, accuracy reached 88.06%, higher than with any single-source feature set. Residential functional areas account for the largest proportion of the city, while green space and industrial functional areas each exceed 25%. Green space functional areas are concentrated on the outskirts of the city, resulting in unequal access to green space for residents in different locations. Previous studies have found that cities in northeast China have a greater supply of green space but lower accessibility than those in southern China [57,58]. Evaluating spatial disparities in access to urban green space in conjunction with the residential layout would therefore be meaningful, providing a reference for location-specific planning and policy interventions that make the supply of urban green space more equitable and inclusive [57].
Comparing the prediction accuracy of different feature sets and analyzing the feature importance of the optimal model shows that features derived from remote sensing data perform best, whereas features derived from POI data perform worst. POIs of the same type may be located in different areas and support different functions; for instance, restaurants exist in residential, commercial, and industrial areas [10]. At present, POI data do not include scale information, such as floor area or building area. In addition, the numbers of points of different types are unbalanced, so a single residential area may be represented by many redundant POI points.
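The limitations above are easy to see in a common form of POI feature, the per-cell share of each POI type. The study's exact POI features are not described in this section, so the sketch below is an assumption: it uses hypothetical POI records and simple count ratios. Note that "restaurant" appears in all three cells, and that three redundant residential points make that type dominate cell `c1` regardless of the actual floor area involved.

```python
from collections import Counter, defaultdict

# Hypothetical POI records: (cell_id, poi_type). NOT data from this study.
POIS = [
    ("c1", "residential"), ("c1", "residential"), ("c1", "residential"),
    ("c1", "restaurant"),
    ("c2", "restaurant"), ("c2", "shop"), ("c2", "shop"),
    ("c3", "factory"), ("c3", "restaurant"),
]


def poi_type_ratios(pois):
    """Per-cell share of each POI type (counts normalised to sum to 1).
    Counts carry no scale information such as floor or building area."""
    counts = defaultdict(Counter)
    for cell, poi_type in pois:
        counts[cell][poi_type] += 1
    ratios = {}
    for cell, c in counts.items():
        total = sum(c.values())
        ratios[cell] = {t: n / total for t, n in c.items()}
    return ratios


ratios = poi_type_ratios(POIS)
print(ratios["c1"])  # {'residential': 0.75, 'restaurant': 0.25}
```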
The discrete Fourier transform was employed to fit the curves of mobile phone signaling time series on weekdays and weekends. The resulting features can distinguish the temporal patterns of different functional areas, reaching a prediction accuracy of 79.25%. In addition, features derived from weekend mobile phone signaling were more important in the prediction model than those derived from weekdays, indicating that weekend features better represent the population activity patterns of the various functional areas. Residents' activities vary significantly between weekdays and weekends [59]; for example, people are more likely to visit shopping malls or public cultural venues, such as libraries and theatres, on weekends.
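The idea of DFT-based temporal features can be sketched as follows, assuming each urban cell has an evenly sampled daily profile (here 24 hourly values, e.g. averaged separately over weekdays and weekends) and that the features are the amplitude and phase of the mean term plus the first few harmonics. The number of harmonics (`n_harmonics=3`), the hourly sampling, and the synthetic profile are assumptions for illustration, not the study's settings. The DFT is written out directly in standard-library Python to make the computation explicit.

```python
import cmath
import math


def dft_features(series, n_harmonics=3):
    """Amplitude and phase of the mean (k=0) and first n_harmonics DFT
    components of an evenly sampled real series (e.g. 24 hourly counts)."""
    n = len(series)
    feats = []
    for k in range(n_harmonics + 1):
        # X_k = sum_t x_t * exp(-2*pi*i*k*t/n)
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        # One-sided amplitude: |X_0|/n for the mean, 2|X_k|/n for k >= 1 (k < n/2).
        amp = abs(coef) / n if k == 0 else 2 * abs(coef) / n
        feats.append((amp, cmath.phase(coef)))
    return feats


def reconstruct(feats, n):
    """Fitted curve from the retained harmonics, evaluated at t = 0..n-1."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n + ph) for k, (a, ph) in enumerate(feats))
        for t in range(n)
    ]


# Synthetic hourly profile (NOT study data): mean 10, a 24-h cycle of
# amplitude 4, and a 12-h cycle of amplitude 2 with phase 1.0 rad.
profile = [
    10 + 4 * math.cos(2 * math.pi * t / 24) + 2 * math.cos(2 * math.pi * 2 * t / 24 + 1.0)
    for t in range(24)
]
feats = dft_features(profile, n_harmonics=3)
fitted = reconstruct(feats, 24)

# A cell's feature vector could concatenate weekday and weekend features, e.g.
# [v for pair in dft_features(weekday) + dft_features(weekend) for v in pair].
for k, (amp, ph) in enumerate(feats):
    print(k, round(amp, 3))
```

Keeping only a few low-order harmonics smooths the daily curve while preserving its overall level, main peak timing, and secondary cycles, which is what makes the resulting features comparable across cells.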
Validation on the test samples reveals a high risk of misclassifying public and commercial functional areas. In principle, crowd activity in different functional areas is highly differentiated, but the spatial extent of urban functional areas varies. Commercial and public functional areas are often small or attached to buildings with other functions [3], whereas the raw mobile signaling data are spatially referenced only by the locations of cell towers. Moreover, mobile networks are usually unevenly distributed, depending on population density and transport facilities [26,60]: cell towers are generally dense in the city center and sparse on the urban fringe. Similarly, how well POIs represent a function depends on their point density. For industrial functional areas, a large plant may span two urban cells while only one POI marks its location, leaving the other cell without social sensing data that directly describe its characteristics. It is therefore necessary to determine a reasonable aggregation scale for social sensing data and to compare the results with survey data, particularly for tasks that require absolute counts, such as origin-destination (O/D) volumes in a traffic survey [27]. Remote sensing data, in contrast, are spatially consistent, which gives them a relative advantage. As smart cities are developed, however, more precise location data, such as individual Global Positioning System (GPS) trajectories or indoor positioning data, can be used to perceive human activities, and combining such data with remote sensing data can describe urban functions, and the relationships between them, at a fine spatial resolution. Populations of different ages and occupations in the city also exhibit different activity characteristics.
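Misclassification of this kind is usually read off a confusion matrix of the test samples through per-class producer's accuracy (the share of reference samples of a class that were labelled correctly) and user's accuracy (the share of predictions of a class that are correct). The labels in the sketch below are invented solely to show the computation and are not the study's test results; the class names are likewise assumed.

```python
from collections import Counter

CLASSES = ["residential", "commercial", "industrial", "public", "green"]


def per_class_accuracy(y_true, y_pred, classes):
    """Overall accuracy plus (producer's, user's) accuracy per class,
    computed from paired reference and predicted labels."""
    pairs = Counter(zip(y_true, y_pred))  # sparse confusion matrix
    overall = sum(pairs[(c, c)] for c in classes) / len(y_true)
    stats = {}
    for c in classes:
        ref_total = sum(n for (t, _), n in pairs.items() if t == c)
        pred_total = sum(n for (_, p), n in pairs.items() if p == c)
        correct = pairs[(c, c)]
        stats[c] = (
            correct / ref_total if ref_total else float("nan"),   # producer's
            correct / pred_total if pred_total else float("nan"),  # user's
        )
    return overall, stats


# Hypothetical test labels (NOT study results): one commercial and one
# public sample are predicted as residential.
y_true = (["residential"] * 4 + ["commercial"] * 2 + ["industrial"] * 3
          + ["public"] * 2 + ["green"] * 3)
y_pred = (["residential"] * 4 + ["residential", "commercial"] + ["industrial"] * 3
          + ["residential", "public"] + ["green"] * 3)

overall, stats = per_class_accuracy(y_true, y_pred, CLASSES)
print(f"overall: {overall:.3f}")
for c, (pa, ua) in stats.items():
    print(f"{c}: producer's {pa:.2f}, user's {ua:.2f}")
```

A low producer's accuracy for small classes such as commercial and public, together with a lowered user's accuracy for the class absorbing them, is the signature of the problem described above.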
Incorporating the social characteristics of mobile phone users, such as age and occupation, would help improve recognition accuracy. Population activity characteristics also differ between the urban fringe and the central areas.
Indicators representing the distance from an urban cell to the urban center (single-center or multi-center) could be introduced in future research to enrich the features of urban cells. Additional methods are also required to fully exploit the temporal dynamics of social sensing data. For instance, social sensing data with spatial and temporal dimensions could be treated as multispectral remote sensing data, so that the effectiveness of remote sensing image processing methods could be evaluated on them.
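The distance-to-center indicator suggested above could be computed per cell along the following lines. This is a sketch under stated assumptions: coordinates are in a projected system measured in metres, the center locations are hypothetical (the study does not define them), and the first center in the list is taken as the main center. The nearest-center distance serves the multi-center case and the distance to the main center serves the single-center case.

```python
import math


def distance_to_centres(cell_xy, centres):
    """Distance-to-centre features for one urban cell.

    Returns (distance to the nearest centre, distance to centres[0]), i.e. a
    multi-centre and a single-centre indicator. Coordinates are assumed to be
    projected, in metres."""
    dists = [math.dist(cell_xy, c) for c in centres]
    return min(dists), dists[0]


# Hypothetical centres (NOT taken from the study): main centre at the origin,
# a secondary centre 10 km to the north-east.
CENTRES = [(0.0, 0.0), (6000.0, 8000.0)]

nearest, to_main = distance_to_centres((6000.0, 5000.0), CENTRES)
print(f"nearest centre: {nearest:.0f} m, main centre: {to_main:.0f} m")
```

Both values could be appended to each cell's feature vector so the classifier can learn fringe-versus-center differences in activity patterns.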

Conclusions
The urban functional area is an important basis for urban resource allocation and management. Most urban land surveys focus on geographical objects rather than urban functional areas, and because urban systems are complex, most cities lack data on urban functional areas. In this study, social sensing data were used in conjunction with remote sensing data to reflect social characteristics: the DFT was employed to extract temporal characteristics from mobile phone signaling data, and the classification accuracies of different feature combinations were then compared using the random forest (RF) model. Finally, the optimized RF model was used to map the layout of functional areas in Changchun. The findings indicate that integrating remote sensing and social sensing data can significantly improve the identification accuracy of urban functional areas. In the optimized model, features from remote sensing data ranked highest in importance, followed by those from mobile signaling data and then POI data. Dynamic weekend features distinguished the functional areas better than weekday features. Residential areas account for the highest proportion (33.51%) of Changchun's central urban area and are mainly distributed in the central region, while industrial areas and green spaces are mainly distributed on the periphery.
The combination of social sensing data and remote sensing data achieved high recognition accuracy for urban functional areas. However, urban areas are complex, highly heterogeneous systems, and several issues remain to be addressed in future work, such as how to obtain social sensing data that account for population characteristics, how to describe spatial proximity, and how to compensate for the spatial sparsity of social sensing data.