Soft Sensor with Deep Learning for Functional Region Detection in Urban Environments

The rapid development of urbanization has increased traffic pressure and made the identification of urban functional regions a popular research topic. Some studies have used point of interest (POI) data and smart card data (SCD) to conduct subway station classifications; however, the unity of both the model and the dataset limits the prediction results. This paper not only uses SCD and POI data, but also adds Online to Offline (OTO) e-commerce platform data, an application that provides customers with information about different businesses, like the location, the score, the comments, and so on. In this paper, these data are combined to and used to analyze each subway station, considering the diversity of data, and obtain a passenger flow feature map of different stations, the number of different types of POIs within 800 m, and the situation of surrounding OTO stores. This paper proposes a two-stage framework, to identify the functional region of subway stations. In the passenger flow stage, the SCD feature is extracted and converted to a feature map, and a ResNet model is used to get the output of stage 1. In the built environment stage, the POI and OTO features are extracted, and a deep neural network with stacked autoencoders (SAE–DNN) model is used to get the output of stage 2. Finally, the outputs of the two stages are connected and a SoftMax function is used to make the final identification of functional region. We performed experimental testing, and our experimental results show that the framework exhibits good performance and has a certain reference value in the planning of subway stations and their surroundings, contributing to the construction of smart cities.


Introduction
The increasing population in urban areas has led residents to demand more for daily life and travel. To meet the needs of residents and build better smart cities, governments need to use the Internet of Things and communication technologies to obtain real-time data for further decision-making and planning [1,2]. One of the most important problems is identifying urban functional regions. Urban functional regions were first proposed in the Athens Charter, which claims that planners should address four types of city areas: the residential region, the work region, the recreation region, and the transportation region. With the development of cities around the world, other functional regions have emerged that make the urban spatial structure more complicated, and these new functional regions vary with the specific features of each city [3,4]. Urban functional regions can be defined by some types of activities or spatial interactions that may occur in a region [3]. In urban areas, one of the fastest growing means of transportation are subways. The rapid development of the subway has provided residents with great convenience. Due to the diversity of individual purposes and preferences, different subway stations have gradually reflected their unique functions [5,6], and the same station functional groups show similar rules [7]. The accuracy of the classification results for subway stations is closely

Image Recognition
Remote sensing images are widely used to help identify land use and urban functional regions [22,23]. Remote sensing techniques are more effective in identifying the physical features of the land, such as land surface features, but they do not have many abilities to identify human social activities [24][25][26][27]. The division of urban functional areas must consider actual human activities more, which will change over time and cannot be fixed [28], instead of only relying on the land surface features or the plans proposed by the government. Hu et al. [29] noted that urban areas of interest (AOIs) are related to people in the urban environment. These areas are closely related to humans, and people often visit these places. For example, social media data, such as pictures, can record human connections to these environments. These AOIs can be defined by human activities, providing useful information for urban planners and decision makers. However, this method also has limitations, such as failure to record all types of regions. Although current studies can identify more regions by combining high-level semantic information, it is still difficult to identify different human social activities [30,31].

Geodemographic Classification
Compared to remote sensing, geodemographics can be used to describe and analyze individuals by where they live, and geodemographic classification is a spatially explicit classification of socioeconomic data, contributing to describing the sociodemographic structure of urban regions [32,33]. Although there are still some problems with geodemographic classification, for example, a few dimensions cannot provide enough information to fully describe an area, this has still become a mainstream method [34]. Using this method can identify many functional areas, such as education, retail, medical, and work zones [35][36][37]. Martin et al. [38] used geographic areas to solve the problems derived from workplace population data. Many related studies have shown that this type of classification method is efficient, although there is still considerable opportunity for improvement [39].

Big Data and Smart Card Data
In more recent studies, POI data and SCD have been combined for research. Long et al. [40] converted the bus credit card data into two-dimensional time series data for each bus station's flow and constructed a city functional area recognition model based on the bus credit card data and POI data. Liu et al. [41] proposed a method to automatically identify and characterize parcels using OpenStreetMap (OSM) and points of interest (POI) data. Wang et al. [42] presented a new model integrating geographic information systems (GIS) with artificial neural networks (ANNs) to predict multiple transitions among land use types and urban subclasses. Bao et al. [43] first used k-means clustering to divide bicycle sharing stations into five categories based on surrounding POIs and then combined them with SCD for Latent Dirichlet Allocation (LDA) analysis to infer the purpose of cycling. Wang et al. [44] proposed the semantic framework of IS2Fun by using doc2vec to derive the relationship between the travelers and the stations, to infer the function of the subway stations. Tang et al. [45] used SCD to derive passenger travel patterns and relied on the LDA model with POI data to separately obtain mobile semantics and location semantics. After standardization and other processes, the improved k-means algorithm was used to cluster and obtain the function of the subway stations. Zhao et al. [46] developed an identification of land-use characteristics using bicycle sharing data.
SCD includes too much information, but previous studies lack a methods such as a feature maps to integrate them together and display them clearly and easily. In addition, there are many studies that have combined more data or innovative new methods for exploration. For instance, Bao et al. [47] not only used bike-sharing trip data and POI data but also added bicycle infrastructure data, weather data, and sociodemographic characteristics. Some even combined land data with spatial information [48][49][50]. Zhai et al. [51] proposed the method of place2vec. However, in previous studies, POI data and SCD were basically used separately, and they were not input to a model together for calculation. Some research failed to achieve data diversity, and they rarely combined e-commerce platform data. For the analysis, POI data and SCD can help to divide the functional area, but the accuracy will be weakened.
Over time, a considerable amount of research has focused on land use in cities or the division of cities into functional areas. The methods used in this research have been changing. Image recognition was used mainly in the early stages to identify urban functional areas; however, it is difficult to finish complex identification tasks. Later, geodemographic classification became popular. This method combines more information, but it still has considerable potential to be improved. Recently, research has used big data and SCD, primarily due to comprehensiveness and timeliness. Nonetheless, the diversity of data needs to be considered more carefully so that accuracy can be increased.

Method
As shown in Figure 1, we propose a two-stages framework, to identify the functional region of subway stations. In the passenger flow stage, the SCD feature is extracted and converted to a feature map, and a ResNet model is used to get the output of stage 1. In the built environment stage, the POI and OTO feature are extracted, and a deep neural network with stacked autoencoders (SAE-DNN) model is used to get the output of stage 2. Final we connect the outputs of two stages and use a SoftMax function to make the final identification of functional region.
Sensors 2020, 20, x FOR PEER REVIEW 4 of 17 model together for calculation. Some research failed to achieve data diversity, and they rarely combined e-commerce platform data. For the analysis, POI data and SCD can help to divide the functional area, but the accuracy will be weakened. Over time, a considerable amount of research has focused on land use in cities or the division of cities into functional areas. The methods used in this research have been changing. Image recognition was used mainly in the early stages to identify urban functional areas; however, it is difficult to finish complex identification tasks. Later, geodemographic classification became popular. This method combines more information, but it still has considerable potential to be improved. Recently, research has used big data and SCD, primarily due to comprehensiveness and timeliness. Nonetheless, the diversity of data needs to be considered more carefully so that accuracy can be increased.

Method
As shown in Figure 1, we propose a two-stages framework, to identify the functional region of subway stations. In the passenger flow stage, the SCD feature is extracted and converted to a feature map, and a ResNet model is used to get the output of stage 1. In the built environment stage, the POI and OTO feature are extracted, and a deep neural network with stacked autoencoders (SAE-DNN) model is used to get the output of stage 2. Final we connect the outputs of two stages and use a SoftMax function to make the final identification of functional region.

SCD Processing
Several researchers have applied SCD to identify the functional region, but their methods chose some indicators of passengers and missed some high-level features of passengers. To obtain deep level features, this paper presents a method to create a passenger feature map which retains all the SCD records without indicators. To explore the rules of passenger flow at each station, taking a single station as an example, the SCD collected is counted by the hour, and the total number of people entering and leaving the station in each hour interval is recorded separately. Then, the changes in passenger flow at each station can be observed at different time periods. Weekdays and weekends are separated to count the number of people entering and leaving the station at different intervals on different days because the characteristics of passenger flow at some stations differ greatly on weekdays and weekends.
Assume i is the index of entering station, j is index of the exiting station and the values of i and j are 1 to 358 (Beijing Subway System has 358 stations). The i-th station weekends' inbound feature matrix A (i) and the i-th station weekends' inbound feature matrix M (i) are as follows:

SCD Processing
Several researchers have applied SCD to identify the functional region, but their methods chose some indicators of passengers and missed some high-level features of passengers. To obtain deep level features, this paper presents a method to create a passenger feature map which retains all the SCD records without indicators. To explore the rules of passenger flow at each station, taking a single station as an example, the SCD collected is counted by the hour, and the total number of people entering and leaving the station in each hour interval is recorded separately. Then, the changes in passenger flow at each station can be observed at different time periods. Weekdays and weekends are separated to count the number of people entering and leaving the station at different intervals on different days because the characteristics of passenger flow at some stations differ greatly on weekdays and weekends.
Assume i is the index of entering station, j is index of the exiting station and the values of i and j are 1 to 358 (Beijing Subway System has 358 stations). The i-th station weekends' inbound feature matrix A (i) and the i-th station weekends' inbound feature matrix M (i) are as follows: where a (i, j) represents the vector of the total number of people coming in from the i-th subway station and leaving from the j-th subway station on weekdays.
where C (i, j, k) represents the total number of people who enter from the i-th subway station and exit from the j-th subway station in the k-th period on weekdays. Where k represents time intervals, with an interval of one hour, and the value of k ranges from 0 to 23. m(i, j) is similar to a (i, j), but it represents the vector of the total number of people coming in from the i-th subway station and leaving from the j-th subway station at different time periods on weekends.
represents the total number of people who enter from the i-th subway station and exit from the j-th subway station in the k-th period on weekends.
In addition to recording the number of people entering a specific station, it is also necessary to know the number of people leaving the station. The i-th station weekdays' outbound feature matrix B(i) and the i-th station weekends' outbound feature matrix N (i) are as follows: where b(i, j) represents the vector of the total number of people coming in from the j-th subway station and leaving from the i-th subway station on weekdays.
where D (i, j, k) represents the total number of people who enter from the j-th subway station and exit from the i-th subway station in the k-th period on weekdays. And n (i, j) is similar to b(i, j), but it indicates the vector of the total number of people coming in from the j-th subway station and leaving from the i-th subway station at different time periods on weekends.
where Q(i, j, k) represents the total number of people who enter from the j-th subway station and exit from the i-th subway station in the k-th period on weekends.
After the above steps, four matrixes are obtained: weekday station-entering statistics, weekday station-exiting statistics, weekend station-entering statistics, and weekend station-exiting statistics.
R (i) is the input. To train the features in a convolutional neural network (CNN)-based model, assuming that the feature map of site i is R (i), the composition of the feature map is shown below: The passenger flow feature map R(i) is a superposition of four matrices, forming a 3D vector ( Figure 2).

ResNet Model
To identify the travel pattern of passenger flow, R(i) is used as an input of the identification model, as: where e denotes to a machine learning identification model. In this section, due to the good learning ability of the residual neural network (ResNet) [52][53][54], it is used to construct the identification model. ResNet is a special CNN model which make use of very deep hidden layers by using residual units (ResUnits), as where is the input of the l-th ResUnit, denotes to the output of the l-th ResUnit and ( ) represents to the residual function. As shown in Figure 3, 14 layers are made up of 6 ResUnits, an input convolution layer, and an output convolution layer to construct the prediction model. In each ResUnit, we use 64 filters of 3 × 3 with zero-padding to stack two convolution layers. 64 filters are used in the input layer while one filter is utilized in the output layer. Consequently, we get an output as .

Built Environment Stage
In the built environment stage, we introduce two kind of data, including POI and OTO; which could reflect the station feature in different dimensions. POI includes the static information of

ResNet Model
To identify the travel pattern of passenger flow, R(i) is used as an input of the identification model, as: where e denotes to a machine learning identification model. In this section, due to the good learning ability of the residual neural network (ResNet) [52][53][54], it is used to construct the identification model. ResNet is a special CNN model which make use of very deep hidden layers by using residual units (ResUnits), as where X l is the input of the l-th ResUnit, X l+1 denotes to the output of the l-th ResUnit and F(·) represents to the residual function. As shown in Figure 3, 14 layers are made up of 6 ResUnits, an input convolution layer, and an output convolution layer to construct the prediction model. In each ResUnit, we use 64 filters of 3 × 3 with zero-padding to stack two convolution layers. 64 filters are used in the input layer while one filter is utilized in the output layer. Consequently, we get an output as y 1 i .

ResNet Model
To identify the travel pattern of passenger flow, R(i) is used as an input of the identification model, as: where e denotes to a machine learning identification model. In this section, due to the good learning ability of the residual neural network (ResNet) [52][53][54], it is used to construct the identification model. ResNet is a special CNN model which make use of very deep hidden layers by using residual units (ResUnits), as where is the input of the l-th ResUnit, denotes to the output of the l-th ResUnit and ( ) represents to the residual function. As shown in Figure 3, 14 layers are made up of 6 ResUnits, an input convolution layer, and an output convolution layer to construct the prediction model. In each ResUnit, we use 64 filters of 3 × 3 with zero-padding to stack two convolution layers. 64 filters are used in the input layer while one filter is utilized in the output layer. Consequently, we get an output as .

Built Environment Stage
In the built environment stage, we introduce two kind of data, including POI and OTO; which could reflect the station feature in different dimensions. POI includes the static information of buildings, and OTO includes the dynamic information of passenger preference.

Built Environment Stage
In the built environment stage, we introduce two kind of data, including POI and OTO; which could reflect the station feature in different dimensions. POI includes the static information of buildings, and OTO includes the dynamic information of passenger preference.

Point of Interest Data Processing
The POI data are processed by using the latitude and longitude coordinates of both subway stations and all POI points in Beijing to calculate the number of POIs around different subway stations within a certain range and how many POIs each category has. Based on Cervero et al., Kuby et al., and Zhao et al. [55][56][57], for this study we chose to set the range as 800 m. Using the latitude and longitude distance formula, the distance between each POI point and subway station can be calculated. Taking each station as an example, only the businesses that are less than or equal to 800 m away from the station were considered, and the number was determined according to the category to which these businesses belong. Thus, the classification of different types of businesses around each station can be known. The i-th stations' POI feature vector is defined as: where S (i, j) represents the total number of j-th category of the i-th subway station.

Online to Offline (OTO) Data Processing
OTO data can help in understanding the distribution of different types of businesses around subway stations and the overall satisfaction with these businesses. Taking a single subway station as an example, the process is as follows: count the number of businesses at different categories around this subway station first, then calculate an average of the ratings of the businesses in each category, and finally calculate a comprehensive average score for this subway station according to the weight of different categories. Assume i is a subway station, m is the category, v(i) = (V (i,1), V (i, 2), . . . , V (i, 8)), where V (i, j) represents the percentage of the m category of the i-th subway station. g(i) = (G (i,1), G(i, 2), . . . ,G(i, j)), where G (i, j) represents the average score of j-th categories of the i-th subway station, and H (i) represents the comprehensive score of the i-th subway station. The formulas are as follows: The vector matrix Z of the final OTO data is of the form:

SAE-DNN Model
For suspected anomaly recognition, a deep neutral network with stacked autoencoders (SAE-DNN) model was built in this study. We conducted feature extraction about the feature vector in the SAE stage, and input the trained weight into the DNN, contributing to the high accuracy of suspected anomaly detection.
In unsupervised learning of efficient coding, an autoencoder, which is an artificial neural network, is used [58]. In order to solve the problem that "back propagation without a teacher", Hinton and the PDP group became the first people to introduce the concept of the autoencoder [59]. Otherwise, the input data was used as the teacher [60]. Lately, autoencoders have played a pretty important role in learning generative models of data [61].
It is typically made up of three parts. The first part is an input layer, containing the input vector X, where X = {W i , Z i }. The second part is a set of hidden layers, containing the transformed feature vector H SAE , which is defined as an encoder shown in Equation (11). The final part is an output layer, including the reconstruction vector R SAE , which is defined as a decoder shown in Equation (12).
δ and δ represent the linear and weighted combinations shown in Equations (12) and (13). The output vectors should match the input vectors, which have the same dimensions and values as the input vectors. T(·) denotes to the activation function. Tanh and rectifier are applied as the activation functions displayed in Equations (15) and (16), respectively [62].
Recti f er(α) = max(0, α), where input vector X represents a set of training datasets {x 1 , x 2 , x 3 , . . . ,x n }; H SAE is a set of encoders {h 1 , h 2 , h 3 , . . . ,h n }; R denotes a set of reconstruction results {r 1 , r 2 , r 3 , . . . ,r n }; f(X) is the encoder function with weight (w i ) and bias (b i ); g(H) represents the decoder function with weight ( w i ) and bias (b i ); and α denotes δ or δ. The function of Equation (17) is minimizing the reconstruction error between the input vector X and the reconstruction vector R SAE . L X, R SAE = minL X, RR SAE , The reason why we fine-tuned the parameters of w i and b i , shown in Equations (18) and (19) to minimize the loss function L(X, R SAE ), is that we want to match the reconstruction results R and the input vector X.
where w i and w i denote to the original weight and updated weight for the i-th node in each hidden layer; bi and bi denote to the original bias and updated bias for the i-th node in each hidden layer; and η is the learning rate. Hence, we obtain an output as y 2 i = R SAE .

Final Prediction
We connected the output of the above two stages, y i is the connection result, Then we used the SoftMax function to get the identification result, that is, the exponential normalization function, which is used in the multiclassification process. It maps the output of multiple neurons into the (0, 1) interval, which can be understood as a probability for multiple classifications.

Parameters Learning
The overall framework is to solve an optimization problem, the decision variables are the parameters of 2 stages, objective function is the mean squared error (MSE) of predicted value, as: where y i denotes the real value of the outbound flow of target station, andŷ i denotes the predicted value of the outbound flow of target station. And θ denote to the parameters of 2 stages, which can be learned through the Adam optimizer via backpropagation.

Study Area
As the capital of China, Beijing is the national political and cultural center. It consists of 16 municipal districts and covers an area of 16,412 km 2 . As of 2016, the total population of Beijing reached 21.729 million. Beijing's development prospects are becoming increasingly better, and numerous job opportunities and platforms constantly attract outsiders. As a result, Beijing's resident population is also increasing. To better deal with the thriving development of Beijing and the rising demand for tourism, the local government officially opened and operated subway lines on 15 January, 1971, and Beijing became the first city in China to open a subway. An increasing number of subway lines have opened since, making it convenient for residents to travel and reducing traffic pressure. At the end of July 2017, there were 20 Beijing subway lines under construction, totaling 354.8 km 2 . It is expected that by 2020, the Beijing metro will form a rail network with nearly 30 lines and a total length of more than 1000 m. By the end of January 2020, the Beijing metro had a total of more than 380 stations serving the core areas, connecting the city center and suburbs (Figure 4). It is expected that by 2020, the Beijing metro will form a rail network with nearly 30 lines and a total length of more than 1000 m. By the end of January 2020, the Beijing metro had a total of more than 380 stations serving the core areas, connecting the city center and suburbs (Figure 4).

Smart Card (SCD)
Smart card refers to the plastic card with embedded microchip which people swipe to get in and out of the subway station. Smart card data can provide information about the riding time and travel path of a passenger. The SCD record is derived from the Automatic Fare Collection System (AFC) of the Beijing metro and covers all the credit card data of the Beijing metro for up to six months. However, the specific private information of each card is not available. This paper uses a full half-year SCD (from November 2017 to April 2018), approximately 8 million per day, excluding the problematic data of credit card records caused by subway equipment failures, such as an outbound station time earlier than entering station time. After filtering, these data serve as the data source for this paper.

Smart Card (SCD)
Smart card refers to the plastic card with embedded microchip which people swipe to get in and out of the subway station. Smart card data can provide information about the riding time and travel path of a passenger. The SCD record is derived from the Automatic Fare Collection System (AFC) of the Beijing metro and covers all the credit card data of the Beijing metro for up to six months. However, the specific private information of each card is not available. This paper uses a full half-year SCD (from November 2017 to April 2018), approximately 8 million per day, excluding the problematic data of credit card records caused by subway equipment failures, such as an outbound station time earlier than entering station time. After filtering, these data serve as the data source for this paper.

Points of Interest (POI)
The POI data for this research were downloaded from the Gaode Map (https://www.amap.com/) application programming interface (API). All 2019 POIs located in Beijing are collected, and all building information is divided into nineteen categories, including name, address, company, shopping, organization, infrastructure, construction and real estate, transportation, education, hotel, tourism service, food, car, life service, cultural venue, entertainment, medical care, banking and finance, sports and others. Each POI corresponds to a latitude and longitude coordinate, a large classification, and a small classification. Additionally, this paper also collects the latitude and longitude coordinates of all subway stations, for a total of 358.

Online to Offline (OTO): Meituan
In addition to the subway SCD and POI data, this paper also uses crawler software to collect Meituan data as the OTO data. Meituan is a platform that contains information about various businesses. People can see the basic introductions related to this business, and they can rate on Meituan as well as fill out a public evaluation based on their service experience. On the Meituan page, different subway station names are entered, the crawler jumps to the page, and the first 31 pages of the business are selected, which is approximately 1000 records. The information contains the store names, ratings, and number of reviews of these businesses. To eliminate some duplicate business information, deduplication processing is carried out to obtain the final data. These stores have a total of 18 categories: online shopping, local shopping, home decoration, pet, wedding, cate, parenting, travel, training, entertainment, cinema, sports, life, health and beauty, food, hotel, real estate, and medical care. Some of them have similarities, so, after reorganization, they are divided into the following 8 categories: shopping, family, entertainment, life, food, hotel, real estate, and medical care.

Ground Truth
This paper refers to Beijing's Overall Urban Plan (2016-2035) issued by The People's Government of Beijing Municipality [63]. Two kinds of classifications are used to test the model. jobs-housing based classification classifies all subway stations into four categories: residential, work, hybrid, and transportation. In addition, multi daily activities-based classification classifies all subway stations into eight categories: science education, health, leisure entertainment, sport, greenbelt, residential regions, work residential regions, and transport residential regions.

Baseline Method
The framework we proposed is compared with a variety of competing methods grouped into the following categories: The classical methods, liner regression (LR), random forest regression (RFR), and support vector regression (SVR). Deep learning methods, including deep neural networks (DNNs), and Conv based methods, convolutional neural networks (CNNs), ConvLSTM and ConvGRU.

Evaluation Metrics
The mean square error (MSE) and R-square are chosen to represent the difference between the test value of subway station functional region and output value from the proposed framework.
where y i andŷ i are the available ground truths and the corresponding predicted values, respectively; T is the number of all available ground truths, y i denotes to the average value of y i .

Comparison with Baseline Methods
We compare the overall performance between SAE-DNNs + ResNet and 19 baseline methods. As shown in Table 1, SAE-DNNs + ResNet achieves the lowest RMSE (40.25), and the highest R 2 (0.86) among all the methods, which is 18.82% (MSE), and 3.61% (R 2 ) relative improvement over the best performance among baseline methods. More specifically, methods including classical methods perform poorly, as they purely mine deep level features. SAE-DNN and ResNet further capture deep level features, and thus achieve better performance. The bold denotes to the best performance.

Comparison with Variants of the Framework
We further analyzed the impact of hyperparameters in the framework, including the structure of SAE-DNNs and the residual units, and filters. In the following discussion, we change one hyperparameter while keeping other hyperparameters unchanged. Table 2 shows the performance of different SAE-DNNs structures, the structure of 4 hidden layers and [200, 400, 400, 200] hidden units perform best. As can be seen in Figures 5 and 6, the RMSE first declines and then goes up with the increase of the quantity of residual units, or filters. We discovered that when six residual units and 64 filters are used, our method exhibited the best performance.   In this study, training data sets were divided into seven types to test whether multi-dimensional features can improve the performance of the prediction model. Table 3 illustrates that when using two kinds of data sources, the performances were better than only using one, and considering all three data sources, the model performed best. It proved that the functional region of subway station is not only determined by the passenger features, but only determined by the built environment features. Introducing both static information about buildings and dynamic information about passenger preference could promote the accuracy of the framework. The bold denotes to the best performance.  In this study, training data sets were divided into seven types to test whether multi-dimensional features can improve the performance of the prediction model. Table 3 illustrates that when using two kinds of data sources, the performances were better than only using one, and considering all three data sources, the model performed best. It proved that the functional region of subway station is not only determined by the passenger features, but only determined by the built environment features. Introducing both static information about buildings and dynamic information about passenger preference could promote the accuracy of the framework. The bold denotes to the best performance. In this study, training data sets were divided into seven types to test whether multi-dimensional features can improve the performance of the prediction model. Table 3 illustrates that when using two kinds of data sources, the performances were better than only using one, and considering all three data sources, the model performed best. It proved that the functional region of subway station is not only determined by the passenger features, but only determined by the built environment features. Introducing both static information about buildings and dynamic information about passenger preference could promote the accuracy of the framework.  Table 4 shows a confusion matrix of jobs-housing based classification. The accuracy of work regions reached 0.876 and the accuracy of hybrid regions reached 0.902. However, the accuracy of residential regions and transport regions were lower than these, reaching 0.833 and 0.818 respectively. There are many shops and schools around some communities, leading to classification of some residential regions as hybrid regions, which caused the lowest accuracy for residential regions. The most common misjudgments are classifying residential regions as hybrid regions, classifying hybrid regions as residential regions, and classifying work regions as hybrid regions. We could find that there are still some errors in identifying the boundaries of hybrid regions, and this is a good direction for future work.  Table 5 shows the confusion matrix of multi daily activities-based classification. Mostly the identification accuracy was higher than 0.8, only the greenbelt identification accuracy was 0.667; 33.3% greenbelts are classified as leisure entertainment, considering that some scenic spots have both entertainment and green land, in fact, they are acceptable to be classified into any category. Table 5. Confusion matrix of multi daily activities-based classification. Science education (A), health (B), leisure entertainment (C), sport (D), greenbelt (E), residential regions (F), work regions (G) and transport regions (H).

Comparation with Other Researches
We compared similar research from the past five years, including city, data, and results, as shown in Table 6. It can be concluded that methods containing travel data (bicycle sharing and smart card) perform better. Using multi-sources data, our proposed method could adapt to two classification scenarios (jobs-housing-based classification and multi daily activities-based classification), other methods have not been proved this. The trend of functional region classification is to apply more dimensional data to solve more classification problems.

Conclusions
Nowadays, big data analytics and deep learning methods are increasingly used in transportation and land planning. Meanwhile, more and more travel companies publish open source data. These changes make it possible to conduct relevant analyses and quantify the urban functional regions.
This paper proposes a two-stage framework and uses SCD, POI data, and OTO data to identify the functional regions of subway stations. In the passenger flow stage, the SCD features are extracted and convertedin to a feature map. Then this map is fed into an identification model to capture passenger flow pattern and a ResNet model is used to build a prediction model. In built environment stage, the POI and OTO features are extracted, including static information about buildings and dynamic information about passenger preference. A SAE-DNN model is built for suspected anomaly recognition. Finally, we connect the outputs of two stages and use a SoftMax function to make the final identification of the functional region. We compare our methods with other classical methods, change hyperparameter, and try different datasets.
Experimental results show that the method, the hyperparameter, and the dataset used in this paper have the best MSE and R 2 , which means that this method achieves the best performance, and this model can be used in multiple scenarios. The classification results can help to better plan the surrounding areas of the subway stations, build supporting facilities to facilitate residents' travel life, and construct smart cities. In the future, people can use this model to further optimize data collection, then to select areas for new subway stations and to plan for the upcoming subway stations, that is, to construct surrounding facilities in advance according to their corresponding functional area divisions.
In order to achieve better effects, there are some points we can improve. First, we need to solve the errors that appear when identifying the boundary of hybrid regions. Additionally, several different classification types need to be done to test this model. Next, in order to obtain more stable conclusions, we need to extend our analysis by training models based on data from different seasons. Finally, combining various kinds of travel data can be effective to weaken the impacts of random factors on the recognition results. Even though this study fails to collect other traffic data from the same periods (such as bus and bicycle sharing data), multivariate data fusion can greatly enhance the accuracy and reliability of this model, which has good application prospects.