An Integrated Framework Combining Multiple Human Activity Features for Land Use Classification

Urban land use information is critical to urban planning, but the increasing complexity of urban systems makes the accurate classification of land use extremely challenging. Human activity features extracted from big data have been used for land use classification, and fusing different features can help improve the classification. In this paper, we propose a framework to integrate multiple human activity features for land use classification. Features were fused by constructing a membership matrix reflecting the fuzzy relationship between features and land use types using the fuzzy c-means (FCM) clustering method. The classification results were obtained by the fuzzy comprehensive evaluation (FCE) method, which regards the membership matrix as the fuzzy evaluation matrix. This framework was applied to a case study using taxi trajectory data from Nanjing, and the outflow, inflow, net flow and net flow ratio features were extracted. A series of experiments demonstrated that the proposed framework can effectively fuse different features and increase the accuracy of land use classification. The classification accuracy achieved 0.858 (Kappa = 0.810) when the four features were fused for land use classification.


Introduction
Urban land use information is the foundation of urban planning, and it plays an important role in government management, policy formulation and resource allocation [1][2][3][4][5][6]. Although the government has land use registration information, it is difficult to update and acquire land use information in a timely manner because urban land use and spatial structure are changing rapidly in developing countries such as China [7][8][9]. To solve this problem, a fast and accurate method for urban land use classification needs to be developed.
Remote sensing techniques classify urban land use with spectral and texture information, and they have a good ability to reveal the physical characteristics of the earth's surface, such as water and buildings [10][11][12][13]. However, it is hard to distinguish land use types in more detail relying solely on remote sensing images [14][15][16], for example to separate residential from commercial land among buildings, because detailed urban land use is usually associated with social functions [17][18][19]. Human activities interact with the social functions of distinct regions [20]. Many scholars have studied the impact of urban land use on human activities, such as traffic demand forecasting and commuting patterns research [21][22][23][24][25]. Conversely, it is also feasible to identify social functions and infer urban land use based on human activities [26][27][28][29][30][31][32]. The traditional sources of human activity information rely on travel surveys [33,34]. The survey data record the activities of subjects during the observation period and play a major role in classical urban studies, but data acquisition is time-consuming, which has limited the development of related studies [35][36][37].
With the rapid development of information and communication technologies (ICT), massive amounts of crowdsourced data (e.g., mobile phone record data, taxi trajectory data and social media check-in data) are now captured. These data are plentiful and accessible, and they contain a wealth of information about human activities and socioeconomics, providing strong support for understanding urban land use [38][39][40]. Reades et al. [41] analyzed the relationship between mobile phone data and business land and identified different mobile phone usage patterns between business and residential land. Calabrese et al. [42] successfully classified the campus environment based on the Wi-Fi network. Qi et al. [43] qualitatively analyzed the relationship between taxi trajectory data and the social functions of a city. These studies demonstrate that the time series representing human activity variations are very useful for land use classification. After that, Soto and Frias-Martinez [44,45] built a time series of the hourly calling volume feature on weekdays and weekends and introduced clustering methods to classify urban land use. Liu et al. [26] constructed a time series of the differences between the volumes of pick-up and drop-off points and classified land use in Shanghai using the k-means clustering method. Time series of human activity features have since been widely applied to land use classification [46][47][48][49].
Integrating human activity features can capture different aspects of human activities and provide more information for land use classification. Toole et al. [50] constructed a new time series by adding the total calling volume feature to the time series of the hourly calling volume feature. Pei et al. [51] extended the new time series by introducing more information on the hourly calling volume feature and showed that the classification accuracy based on the new time series is better than that of the hourly calling volume feature or the total calling volume feature used alone. These studies demonstrate the feasibility and advantages of fusing human activity characteristics in land use classification, but the feature combination is limited to two features. Therefore, it is meaningful to explore the performance of integrating more features in land use classification.
Feature combination is often implemented by connecting the time series of each feature. Pan et al. [52] fused the outflow and inflow features of the taxi trajectory data by splicing the outflow time series and inflow time series. Liu et al. [53] also spliced the time series of the outflow and inflow features in the land use classification of Shanghai. However, a high-dimensional time series will be formed when more features are fused. In many cases, the similarity between time series is closely related to the classification results. When the time series is of high dimensions, the traditional distance functions (e.g., Euclidean distance) become unreliable, which will affect the classification accuracy [54][55][56]. Therefore, it is necessary to find a new method of combining multiple features for land use classification.
In this study, we propose an integrated framework to fuse features for land use classification, and it was inspired by the fuzzy comprehensive evaluation (FCE) method. Time series were built for each feature and clustered by the fuzzy c-means (FCM) clustering method. A membership matrix was constructed to fuse features based on the clustering results, and the FCE method was utilized to determine the land use type based on this matrix. The proposed framework can combine multiple human activity features without generating a high-dimensional time series, and it has been applied in the land use classification of Nanjing.
The remainder of this paper is organized as follows. Section 2 introduces the framework that combines multiple human activity features for land use classification. Section 3 introduces a case study using taxi trajectory data from Nanjing. The framework is discussed in Section 4. Section 5 summarizes our study and discusses future work.

Method
A flowchart of the framework is shown in Figure 1, and it includes the following three steps and a training process. First, human activity features were extracted from the taxi trajectory data, and then the time series of each feature were built. Next, the FCM method was utilized to cluster the time series of each feature. The centers of land use types were calculated to match cluster centers with land use types, and membership degree was used to construct the membership matrix, which is regarded as the fuzzy evaluation matrix in the FCE method. Finally, the classification results were obtained based on the FCE method. A training process was performed to determine the weight set in the FCE method. The specific implementation was as follows.
ISPRS Int. J. Geo-Inf. 2019, 8, x FOR PEER REVIEW

Extracting Features and Constructing Time Series
Different human activity features can be extracted from the same data source. We regard a journey of passengers as a flow from the pick-up point to the drop-off point; then, the pick-up point and drop-off point can represent the outflow and inflow of the region, respectively. The outflow, inflow, net flow ($\text{inflow} - \text{outflow}$) and net flow ratio ($\frac{\text{inflow} - \text{outflow}}{\text{inflow} + \text{outflow}}$) features can be extracted from the taxi trajectory data [43,57]. The construction of the time series is flexible. We can not only aggregate the data into a week or a day [41,42] but also divide a week to obtain greater detail, such as distinguishing human activity patterns on weekdays and weekends [58] and distinguishing human activity patterns on normal workdays, Fridays, Saturdays and Sundays [51]. In addition, the interval of the time series can be set as needed, such as 10 minutes or one hour [59,60].
The study area should be divided into unclassified areas, and the division method can be flexibly selected, such as dividing based on grids or traffic analysis zones (TAZs). For each unclassified region, if $F$ human activity features are extracted, the time series of each feature can be built as
$$z_i^0 = \left(N_i^1, N_i^2, \ldots, N_i^H\right) \tag{1}$$
where $z_i^0$ is the unnormalized time series of feature $i$ ($i = 1, 2, \ldots, F$), $N_i^t$ is the value of feature $i$ over the period $t$, and $H$ is the dimension. Normalization of the time series is key to ensuring that the classification results correspond well to the land use types [53]. Thus, $z_i^0$ is normalized using the Z-score:
$$z_i = \frac{z_i^0 - \mu_i}{\sigma_i} \tag{2}$$
where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the time series of feature $i$, respectively.
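The feature construction and normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `flow_features` and `zscore` are hypothetical, the population standard deviation is assumed (the text does not say which estimator was used), and periods without any trips are assigned a net flow ratio of 0 to avoid division by zero.

```python
from statistics import mean, pstdev

def flow_features(pickups, dropoffs):
    """Build the four feature series for one cell.

    pickups[t] and dropoffs[t] are trip counts in period t (hypothetical inputs).
    """
    outflow = [float(p) for p in pickups]
    inflow = [float(d) for d in dropoffs]
    net_flow = [i - o for i, o in zip(inflow, outflow)]
    # Assumption: a period with no trips gets ratio 0 instead of dividing by zero.
    net_flow_ratio = [
        (i - o) / (i + o) if (i + o) > 0 else 0.0
        for i, o in zip(inflow, outflow)
    ]
    return {
        "outflow": outflow,
        "inflow": inflow,
        "net_flow": net_flow,
        "net_flow_ratio": net_flow_ratio,
    }

def zscore(series):
    """Z-score normalization, z = (z0 - mu) / sigma, with the population sigma."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # Assumption: a constant series is mapped to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]
```

Each normalized series has zero mean and unit variance, so cells with very different trip volumes become comparable by the shape of their temporal pattern.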

Constructing the Membership Matrix
The time series of each feature are clustered by the FCM algorithm after the construction of time series.The FCM algorithm is chosen because it introduces fuzzy partitioning and membership degree theory in clustering, which allows the unclassified area to simultaneously belong to different land use types [61].At the same time, the FCM algorithm has a solid theoretical foundation and broad applications [62][63][64].
Given the time series of feature $i$, the FCM algorithm returns a list of cluster centers $v_{i,j}$ and membership degrees $u_{i,j}$, where $v_{i,j}$ is the cluster center $j$ ($j = 1, \ldots, L$), and $u_{i,j}$ is the membership degree of the unclassified area to cluster center $j$. $L$ is the number of clusters, which is set to the number of land use types that can be obtained from the land use data of the study area. For each unclassified area, $u_{i,j}$ satisfies the conditions in Equation (3).
$$\sum_{j=1}^{L} u_{i,j} = 1, \quad 0 \le u_{i,j} \le 1 \tag{3}$$
To construct the membership matrix for each unclassified region, cluster centers need to be matched with land use types. In this case, $u_{i,j}$ can represent the membership degree of the unclassified region to land use type $j$ when feature $i$ is utilized for land use classification, and the membership matrix $U$ can be constructed using the membership degrees in the clustering results of each feature.
$$U = \begin{pmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,L} \\ u_{2,1} & u_{2,2} & \cdots & u_{2,L} \\ \vdots & \vdots & \ddots & \vdots \\ u_{F,1} & u_{F,2} & \cdots & u_{F,L} \end{pmatrix} \tag{4}$$
The following steps are carried out to match the cluster centers to land use types. First, the land use data of the study area are divided into a training set and a test set at a ratio of 3:1. The same land use type occupies the same proportion in the two sets. If there are $S$ unclassified areas in the training set, and the number of unclassified areas belonging to land use type $j$ is $M_j$, then $S = \sum_{j=1}^{L} M_j$. Next, the center of land use type $j$ is calculated according to Equation (5).
$$c_{i,j} = \frac{1}{M_j} \sum_{k=1}^{M_j} z_{i,j,k} \tag{5}$$
where $z_{i,j,k}$ is the time series of feature $i$ extracted from the unclassified region $k$ belonging to land use type $j$. Finally, the land use type of each cluster center is determined by locating the minimum distance between $c_{i,j}$ and $v_{i,j}$ when feature $i$ is applied to land use classification.
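As an illustration of this step, the sketch below implements the textbook FCM update rules for a single feature and then labels each cluster with the nearest land use type center. It is a simplified stand-in rather than the authors' code: the fuzzifier `m = 2`, the random initialization, the stopping tolerance and the Euclidean distance are all assumptions, and the names `fcm`, `type_centers` and `match_clusters` are hypothetical. Nearest-center matching can in principle assign two clusters to the same land use type; the text does not say how such cases were resolved.

```python
import math
import random

def fcm(series, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length series.

    Returns (centers, memberships); memberships[k][j] is the degree of
    area k in cluster j, and every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(series), len(series[0])
    # Random initial memberships, normalized so that each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([r / total for r in row])
    centers = []
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the series.
        centers = []
        for j in range(n_clusters):
            w = [U[k][j] ** m for k in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[k] * series[k][t] for k in range(n)) / wsum for t in range(dim)]
            )
        # Memberships are recomputed from the distances to the centers.
        new_U = []
        for k in range(n):
            d = [max(math.dist(series[k], c), 1e-12) for c in centers]
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            total = sum(inv)
            new_U.append([v / total for v in inv])
        shift = max(
            abs(new_U[k][j] - U[k][j]) for k in range(n) for j in range(n_clusters)
        )
        U = new_U
        if shift < tol:
            break
    return centers, U

def type_centers(train_series, train_labels, n_types):
    """Mean series of each land use type in the training set (Equation (5))."""
    centers = []
    for j in range(n_types):
        members = [s for s, y in zip(train_series, train_labels) if y == j]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers

def match_clusters(cluster_centers, land_use_centers):
    """Label each cluster with the index of the nearest land use type center."""
    return [
        min(range(len(land_use_centers)), key=lambda j: math.dist(v, land_use_centers[j]))
        for v in cluster_centers
    ]
```

Running `fcm` once per feature and reordering the membership columns with `match_clusters` yields one row of the membership matrix per feature for every unclassified area.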

Determining Land Use Types
In this step, $F$ features are considered evaluation indices, and $L$ land use types are considered remarks. The evaluation of the unclassified area based on each index constitutes the fuzzy evaluation matrix in the FCE method, and the membership matrix $U$ built in Section 2.2 is used as the fuzzy evaluation matrix to determine the land use type of the unclassified area. The FCE method uses membership degree theory and can comprehensively evaluate objects affected by multiple factors [65], and it has been applied to address classification uncertainty in mineral prospectivity mapping, quality analysis and other fields [66][67][68]. The model of the FCE method can be expressed as
$$E = W \circ U \tag{6}$$
where $E$ is the evaluation result, and $W$ is the weight set. $W = (w_1, \ldots, w_i, \ldots, w_F)$ and satisfies $\sum_{i=1}^{F} w_i = 1$, where $w_i$ is the weight of feature $i$ and $F$ is the number of features. $\circ$ is an operator.
The operators often employed in the FCE method include $M(\vee, \wedge)$, $M(\vee, \cdot)$, $M(\oplus, \wedge)$ and $M(+, \cdot)$. We calculated Equation (6) with $M(+, \cdot)$ because it can make full use of all evaluation information, and it is a relatively ideal operator [69]. Thus, the evaluation result can be acquired according to Equation (7).
$$E = (e_1, e_2, \ldots, e_L), \quad e_j = \sum_{i=1}^{F} w_i u_{i,j} \tag{7}$$
where $e_j$ is the membership degree of the unclassified region to land use type $j$. The land use type is determined according to the principle of maximum membership [70], so the land use type of the unclassified region is set to the land use type corresponding to the largest $e_j$.
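With the weighted-average operator, the evaluation reduces to a weighted sum of the rows of the membership matrix followed by an arg-max. The sketch below shows this for one unclassified area; the function name `fce_classify` and the numbers in the usage note are illustrative assumptions, not values from the case study.

```python
def fce_classify(weights, membership_matrix):
    """Weighted-average FCE for one area.

    weights: F feature weights that sum to 1.
    membership_matrix: F rows (features) by L columns (land use types).
    Returns (index of the land use type with the largest e_j, list of e_j).
    """
    n_types = len(membership_matrix[0])
    e = [
        sum(w * row[j] for w, row in zip(weights, membership_matrix))
        for j in range(n_types)
    ]
    # Principle of maximum membership: choose the type with the largest e_j.
    best = max(range(n_types), key=lambda j: e[j])
    return best, e
```

For example, with weights (0.2, 0.3, 0.4, 0.1) and membership rows (0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.7, 0.2) and (0.5, 0.4, 0.1), the evaluation result is approximately (0.27, 0.53, 0.20), so the second land use type is selected. Because every row and the weight set each sum to 1, the entries of the result also sum to 1.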
The weight set $W = (w_1, \ldots, w_i, \ldots, w_F)$ is determined by a training process, and the training set has been divided in Section 2.2. If $Z_n$ is the $n$th unclassified area in the training set, $G_n$ is the classification result of $Z_n$, and $G_n'$ is the real land use type, the weight set can be acquired by minimizing the objective function of Equation (8).
$$\min_{W} \sum_{n=1}^{S} I(Z_n) \tag{8}$$
where $S$ is the number of $Z_n$, and $I(Z_n)$ is an indicator function that equals 0 when $G_n = G_n'$ and 1 otherwise, so that the objective counts the misclassified areas in the training set.
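The text does not state how the minimization is carried out. One simple possibility, shown here purely as an assumption, is an exhaustive search over a regular grid of weight sets on the simplex, counting training errors for each candidate. The helper names (`simplex_grid`, `classify`, `train_weights`) and the step of 0.1 are hypothetical.

```python
from itertools import product

def simplex_grid(n_features, steps=10):
    """Yield weight tuples whose entries are multiples of 1/steps and sum to 1."""
    for combo in product(range(steps + 1), repeat=n_features - 1):
        rest = steps - sum(combo)
        if rest >= 0:
            yield tuple(c / steps for c in combo) + (rest / steps,)

def classify(weights, U):
    """Weighted-average evaluation followed by the maximum-membership rule."""
    scores = [
        sum(w * row[j] for w, row in zip(weights, U)) for j in range(len(U[0]))
    ]
    return max(range(len(scores)), key=lambda j: scores[j])

def train_weights(matrices, true_types, steps=10):
    """Return (weights, errors) minimizing the number of misclassified areas.

    matrices: one F x L membership matrix per training area.
    true_types: the real land use type index of each training area.
    """
    best_w, best_err = None, None
    for w in simplex_grid(len(matrices[0]), steps):
        err = sum(classify(w, U) != g for U, g in zip(matrices, true_types))
        if best_err is None or err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

With four features and a step of 0.1 the grid contains 286 candidate weight sets, so the search is cheap. Several weight sets may tie at the minimum; this sketch keeps the first one encountered.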

Case Study Using Taxi Trajectory Data from Nanjing
The proposed framework was applied to classify land use in Nanjing. The outflow, inflow, net flow and net flow ratio features were extracted from the taxi trajectory data. In studies based on taxi trajectory data, the outflow and inflow features have been integrated for land use classification, but they are rarely combined with other features. In this study, we not only integrated the outflow and inflow features but also fused each of the net flow and net flow ratio features with them separately. The four features were also fused together for the land use classification. In the results section, the classification results of the framework based on different feature combinations are compared. A comparative experiment was also conducted to compare the classification results of the framework with those of another method.

Study Area and Data Preparation
Nanjing, a megacity in the Yangtze River Delta, covers an area of 6587 km² and governs 11 districts (Figure 2a). In 2016, the resident population was approximately 8.27 million, and the urbanization rate reached 82%. In this study, nine districts (Gulou, Jianye, Qinhuai, Xuanwu, Jiangning, Luhe, Pukou, Qixia and Yuhuatai) were selected as the study area.
The taxi trajectory data (5 December to 25 December 2016) come from the Nanjing Information Center (http://www.njinfo.gov.cn/). Each data entry includes the plate number, record time, longitude and latitude of the taxi location, status (whether carrying passengers) and speed. The record interval is approximately 10-30 s. We extracted pick-up and drop-off points based on the change of status and divided the study area into 500 m × 500 m cells. The resolution was determined through comparative experiments at different resolutions. To ensure that the features in the cells were stable and had statistical significance, cell filtering was necessary. In this study, only cells in which the total number of pick-up and drop-off points exceeded 50 were retained, which left 2114 cells (Figure 2b). We aggregated the taxi trajectory data to one week and distinguished between weekdays and weekends because human dynamics differ greatly between weekdays and weekends [44,45]. The 1-hour interval, which has been widely used in many studies, was chosen [26,37]. The time series of each feature were then built.
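The preprocessing described above can be sketched as follows. This is an illustrative outline under several assumptions: the record layout `(plate, time, x, y, occupied)` is hypothetical, coordinates are assumed to be already projected to metres (the raw data store longitude and latitude), and the series uses 48 bins, hours 0 to 23 for weekdays and 24 to 47 for weekends, which is one plausible reading of the aggregation described in the text.

```python
from collections import defaultdict
from datetime import datetime

def extract_events(records):
    """Turn status changes into pick-up and drop-off events.

    records: iterable of (plate, time, x, y, occupied) tuples (hypothetical schema).
    """
    events = []
    last_status = {}
    for plate, time, x, y, occupied in sorted(records, key=lambda r: (r[0], r[1])):
        previous = last_status.get(plate)
        if previous is not None and previous != occupied:
            # vacant -> occupied is a pick-up, occupied -> vacant is a drop-off
            events.append(("pickup" if occupied else "dropoff", time, x, y))
        last_status[plate] = occupied
    return events

def hourly_counts(events, cell_size=500.0):
    """Count events per grid cell in 48 hourly bins (weekday 0-23, weekend 24-47)."""
    counts = defaultdict(lambda: {"pickup": [0] * 48, "dropoff": [0] * 48})
    for kind, time, x, y in events:
        cell = (int(x // cell_size), int(y // cell_size))
        bin_index = time.hour + (24 if time.weekday() >= 5 else 0)
        counts[cell][kind][bin_index] += 1
    return counts

def filter_cells(counts, min_total=50):
    """Keep cells whose pick-ups plus drop-offs exceed min_total."""
    return {
        cell: v
        for cell, v in counts.items()
        if sum(v["pickup"]) + sum(v["dropoff"]) > min_total
    }

# Tiny made-up example: taxi A on a Monday, taxi B on a Saturday.
demo_records = [
    ("A", datetime(2016, 12, 5, 8, 0, 0), 100.0, 100.0, False),
    ("A", datetime(2016, 12, 5, 8, 0, 20), 120.0, 100.0, True),
    ("A", datetime(2016, 12, 5, 8, 30, 0), 1200.0, 700.0, True),
    ("A", datetime(2016, 12, 5, 9, 5, 0), 1300.0, 800.0, False),
    ("B", datetime(2016, 12, 10, 22, 0, 0), 10.0, 10.0, True),
    ("B", datetime(2016, 12, 10, 22, 10, 0), 600.0, 10.0, False),
]
demo_counts = hourly_counts(extract_events(demo_records))
```

The pick-up and drop-off series of each retained cell are the outflow and inflow features from which the net flow and net flow ratio are derived.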
The land use data for 2016 were obtained from the Geographical Information Monitoring Cloud Platform (http://www.dsac.cn/), and they were divided into five land use types: commercial land, residential land, industrial land, open space and others. To facilitate the comparison of the land use data and classification results, the land use data were mapped to cells (Figure 3). The proportion and number of cells per land use type are shown in Table 1. Note that open space, which includes parks and scenic spots, occupies a large area, but taxis are not allowed to enter it in most cases. Consequently, many cells located in open space were deleted because the total number of pick-up points and drop-off points in these cells was too small, meaning that only a small number of open space cells could be used. The land use data were divided into a training set and a test set at a ratio of 3:1, and 528 cells belonging to the test set were used to evaluate the classification results.
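A 3:1 split in which every land use type keeps the same proportion in both sets is a stratified split. The sketch below shows one way to produce it; the function name `stratified_split`, the fixed random seed and the rounding rule are assumptions, since the text does not describe how the cells were sampled.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_ratio=0.75, seed=0):
    """Split cell indices so each land use type keeps the same share in both sets.

    labels[i] is the land use type of cell i. Returns (train_indices, test_indices).
    """
    by_type = defaultdict(list)
    for index, land_use in enumerate(labels):
        by_type[land_use].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting within each land use type matters here because the types are unevenly represented, with only a small number of open space cells surviving the filtering.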

Results
Table 2 shows the classification accuracy of each feature combination. The combination of the outflow and inflow features had an overall accuracy of 0.742, and the kappa coefficient was 0.659. When the net flow and net flow ratio features were added separately, the overall accuracy was enhanced by 0.061 and 0.042, and the kappa coefficient was increased by 0.079 and 0.053, respectively. The classification accuracy achieved 0.858 (Kappa = 0.810) when four features were fused. The classification results were significantly improved. These results demonstrate that the proposed framework can effectively fuse features, and combining the four features can significantly enhance the overall accuracy of land use classification.
The classification results of the feature combinations are displayed in Figure 4. Differences can be seen in the three highlighted typical areas. The confusion matrices are shown in Figure 5. Region #1 belongs to industrial land, but it was misidentified by the combination of the outflow and inflow features (Figure 4a) and the combination of the outflow, inflow and net flow features (Figure 4b). When the net flow ratio feature was combined with the two feature combinations separately, region #1 was correctly classified in Figure 4c,d. Region #2 is a typical open space, and region #3 belongs to others, but both regions were misidentified in Figure 4a,c when applying the outflow and inflow features and applying the outflow, inflow and net flow ratio features. The mixing of open space with others was also obvious in Figure 5a,c. However, this situation was improved in Figure 5b,d when the net flow feature was fused with the two feature combinations separately. Region #2 and region #3 were also correctly identified in Figure 4b,d. As shown in Figure 4d, the classification results based on the four features accurately classified the three typical areas, and the confusion matrix shown in Figure 5d was significantly better than those of the other feature combinations. Thus, it is reasonable to fuse distinct human activity features for land use classification, and fusing the four features based on the proposed framework can improve land use classification.
Table 3 shows the classification accuracies of land use types. Producer's accuracy indicates the probability that the real land use types in land use data are correctly identified, and user's accuracy is the probability that the land use types in the classification results are correctly classified. Regarding all the land use types, both the producer's and user's accuracies were enhanced when the net flow feature and the net flow ratio feature were combined with the outflow and inflow features, respectively. The highest producer's and user's accuracies for each land use type were obtained when applying all the features to the land use classification. The producer's accuracies of all land use types were higher than 0.735, and the user's accuracies were above 0.700. The residential and industrial land had higher accuracies than those for commercial land, open space and others, and both the producer's and user's accuracies for residential and industrial land exceeded 0.870. The reasons for the differences among the accuracies of the land use types will be discussed in the next section.
The above results prove that the proposed framework can effectively fuse different features and obtain an overall accuracy of 0.858 (Kappa = 0.810). However, the advantages of the framework in land use classification remain to be verified. Thus, we conducted the next experiment. The expectation-maximization (EM) algorithm and a time series spliced by the outflow time series and inflow time series were used to classify land use in Nanjing, referring to the aggregated IS method in the study of Liu et al. [53]. This method was named OI_EM in this study, and it obtained an overall accuracy of 0.720 (Kappa = 0.632), which was lower than the classification accuracy of the proposed framework. Figure 6 shows the classification results based on the OI_EM method and the framework. Differences can be seen in the three highlighted typical areas. The confusion matrix of the OI_EM method is displayed in Figure 7. Table 4 shows the accuracies of the land use types.
As shown in Figure 6, region #1 and region #2 are typical residential land, and region #3 belongs to open space. They were correctly identified by the proposed framework in Figure 6b. Figure 7 shows that residential land was easily confused with commercial land and others when applying the OI_EM method, similar to region #1 and region #2, which were both misidentified in Figure 6a. Many cells were also misclassified as commercial land by the OI_EM method, such as region #3, which belongs to open space. Compared with Figure 7, the mixing of land use types in the confusion matrix of the framework (Figure 5d) was significantly weaker than that of the OI_EM method. For all the land use types, the framework (Experiment D in Table 3) also had higher accuracy than the OI_EM method (Table 4). These comparisons demonstrate the advantages of the framework in land use classification.
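The accuracy measures used in this section (overall accuracy, kappa coefficient, producer's accuracy and user's accuracy) can all be computed from a confusion matrix. The sketch below uses the standard definitions; the function name `accuracy_report` is hypothetical, and the convention that rows hold the reference land use types and columns the classified types is an assumption.

```python
def accuracy_report(cm):
    """Compute (overall accuracy, kappa, producer's accuracies, user's accuracies).

    cm[i][j] is the number of cells whose reference type is i and whose
    classified type is j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    overall = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    producers = [cm[i][i] / row_totals[i] if row_totals[i] else 0.0 for i in range(k)]
    users = [cm[j][j] / col_totals[j] if col_totals[j] else 0.0 for j in range(k)]
    return overall, kappa, producers, users
```

For a made-up two-class matrix `[[40, 10], [5, 45]]`, the overall accuracy is 0.85, the chance agreement is 0.5 and kappa is therefore 0.7; the producer's accuracies are 0.8 and 0.9.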

Discussion
Combining human activity features can help increase the accuracy of urban land use classification [71]. However, feature combination in many studies is limited to two features, and few studies have effectively combined multiple human activity features [50,51]. The framework proposed in this paper can provide more information for land use classification by combining multiple human activity features based on the FCM algorithm and the fuzzy comprehensive evaluation (FCE) method, which can obtain better accuracy than the method fusing two features. The classification results in Nanjing indicated that the framework can effectively integrate the outflow, inflow, net flow and net flow ratio features of taxi trajectory data and achieve an accuracy of 0.858 (Kappa = 0.810). At the same time, the framework had higher accuracy than the OI_EM method (OA = 0.720, Kappa = 0.632) that integrated the outflow and inflow features. Therefore, the proposed framework can effectively integrate multiple human activity features and improve land use classification.
In the framework, features participated in the classification process with different weights. The weight sets of the feature combinations (Table 5) show that the weight of the inflow feature was larger than that of the outflow feature in all the experiments. In Experiments B and D, which used the net flow feature, the weight of the net flow feature exceeded 0.300. To determine the reasons for the different feature weights, we input each of the four features into the proposed framework separately. The classification accuracies of the features in Table 6 show that each feature can be used to identify urban land use types. This finding agrees with previous studies showing that the social function of urban areas can be inferred from passengers' pick-up/set-down dynamics [26,52]. The four features had different land use classification accuracies. The net flow feature had the highest accuracy, followed by the inflow and net flow ratio features, whereas the outflow feature exhibited the poorest accuracy. Therefore, features with high accuracies, such as the inflow and net flow features, generally received large weights in the feature combinations.
By comparing the classification accuracies of features (Table 6) and feature combinations (Table 2 in Section 3.2), the feature combinations were found to have higher accuracies than each feature fused in the combinations. For example, Experiment D, which fused the four features (OA = 0.858, Kappa = 0.810), had a higher accuracy than the outflow (OA = 0.563, Kappa = 0.440), inflow (OA = 0.691, Kappa = 0.593), net flow (OA = 0.741, Kappa = 0.655) and net flow ratio features (OA = 0.636, Kappa = 0.517). This indicates that fusing features based on the proposed framework can increase land use classification accuracy.
The accuracies of land use types (Table 3 in Section 3.2) show that the framework produced higher classification accuracies for residential and industrial land than for commercial land, open space and others. To determine the reasons, we plotted the centers of the land use types and the cluster centers of each feature in Figure 8. For residential and industrial land, the cluster centers are similar to the centers of the land use types, and the peaks of these curves are obvious. For commercial land, open space and others, the cluster centers differ from the centers of the land use types, and the characteristics of these curves are not prominent. In addition, the curves of open space and others differ only slightly from each other. Thus, it is difficult to correctly identify commercial land, open space and others.
In addition, some factors affecting the classification accuracy of the framework need to be discussed. First, the FCM algorithm was used to cluster the time series of each feature, and its accuracy has an important impact on the framework. Many studies have improved the FCM algorithm [72,73], so using an improved FCM method may help increase the classification accuracy of the framework. Second, the study area was divided into 500 m × 500 m cells in this study, but this division method cannot guarantee the continuity of land use types in cells. The land use types in the cells at the junction of different land use types are highly mixed, which makes it difficult to identify the land use types of these cells. Moreover, we mapped the land use data to cells by calculating the proportion of each land use type in the cell and assigning the land use type with the largest proportion to the cell. Highly mixed cells therefore also affect the determination of the true land use types. The framework supports different division methods, so choosing a better division method may help increase the classification accuracy of the framework.
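The labelling rule described above, in which a cell receives the land use type with the largest proportion, can be sketched as follows. This is an illustrative sketch only: the function name and the areas are hypothetical, and the share returned alongside the label is an added convenience for spotting highly mixed cells, not part of the paper's procedure.

```python
def dominant_land_use(area_by_type):
    """Return (label, share) for the land use type covering most of a cell.

    area_by_type maps a land use type to the area (in square metres)
    that it occupies inside one grid cell.
    """
    total = sum(area_by_type.values())
    if total <= 0:
        raise ValueError("cell contains no land use area")
    label = max(area_by_type, key=area_by_type.get)
    return label, area_by_type[label] / total


# Hypothetical 500 m x 500 m cell (250,000 square metres) at a junction
# of three land use types.
cell = {"residential": 110_000, "commercial": 90_000, "open space": 50_000}
label, share = dominant_land_use(cell)
print(label, round(share, 2))
```

In this made-up cell the assigned type covers less than half of the area, which illustrates why cells at the junction of several land use types are hard to label and to classify.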

Conclusions
Big data records human activities in urban areas and enables us to infer land use types by considering collective activity features. Fusing different human activity features can provide more information for land use classification. In this study, we proposed an integrated framework to combine multiple features for land use classification. By clustering the time series of each feature with the fuzzy c-means (FCM) clustering method, a membership matrix reflecting the fuzzy relationship between features and land use types was built for each unclassified area. The fuzzy comprehensive evaluation (FCE) method was used to determine the land use type of each unclassified area based on the membership matrix. The results of the case study in Nanjing indicated that the proposed framework can effectively fuse different features and increase the accuracy of land use classification. When the outflow, inflow, net flow and net flow ratio features were applied to land use classification, the classification results achieved an overall accuracy of 0.858 (Kappa = 0.810). For all the land use types, the producer's accuracies were higher than 0.735, and the user's accuracies exceeded 0.700. Both the producer's and user's accuracies for residential and industrial land were higher than 0.870.
The human activity features fused in the framework can come from different data sources, and we can build a membership matrix for each unclassified area based on multisource big data. Some studies have applied multisource data to land use classification and building function identification [74][75][76], which suggests that integrating multisource data can contribute to land use classification. In the future, we will explore the performance of the framework in fusing multisource data for land use classification.
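The fusion step summarized above, in which the membership matrix serves as the fuzzy evaluation matrix, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the weighted-average operator, which is one common choice in FCE and may differ from the operator used in the framework, and the weights and memberships are hypothetical values rather than entries from Table 5 or the case study.

```python
LAND_USE_TYPES = ["residential", "industrial", "commercial", "open space", "others"]


def fuzzy_comprehensive_evaluation(weights, membership):
    """Fuse per-feature memberships into one score per land use type.

    weights: one weight per feature (assumed to sum to 1).
    membership: one row per feature, one column per land use type.
    Returns (scores, index of the land use type with the largest score).
    """
    if len(weights) != len(membership):
        raise ValueError("need exactly one weight per feature row")
    n_types = len(membership[0])
    # Weighted-average operator: score_j = sum_i weights[i] * membership[i][j].
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_types)
    ]
    best = max(range(n_types), key=scores.__getitem__)
    return scores, best


# Hypothetical weights and memberships for one unclassified cell.
weights = [0.15, 0.30, 0.35, 0.20]  # outflow, inflow, net flow, net flow ratio
membership = [
    [0.30, 0.20, 0.30, 0.10, 0.10],  # outflow
    [0.50, 0.10, 0.20, 0.10, 0.10],  # inflow
    [0.60, 0.10, 0.10, 0.10, 0.10],  # net flow
    [0.40, 0.20, 0.20, 0.10, 0.10],  # net flow ratio
]
scores, best = fuzzy_comprehensive_evaluation(weights, membership)
print(LAND_USE_TYPES[best], [round(s, 3) for s in scores])
```

Because each membership row would come from a separate FCM clustering, features from another data source could in principle be added as further rows with their own weights, without changing the evaluation step.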

Figure 1. Flowchart of the proposed framework.

Figure 4. Classification results of feature combinations. (a) Outflow and inflow features; (b) outflow, inflow and net flow features; (c) outflow, inflow and net flow ratio features; (d) outflow, inflow, net flow and net flow ratio features.

Figure 5. Confusion matrices of feature combinations. (a) Outflow and inflow features; (b) outflow, inflow and net flow features; (c) outflow, inflow and net flow ratio features; (d) outflow, inflow, net flow and net flow ratio features.

Figure 6. Classification results based on different methods. (a) The method using the EM algorithm and the time series that integrates the outflow and inflow features (OI_EM method); (b) framework in this study.

Figure 7. Confusion matrix of the OI_EM method.

Figure 8. Centers of land use types and cluster centers. (a)-(d) Centers of land use types; (e)-(h) cluster centers.

Figure 3. Land use in Nanjing.

Table 1. Proportion and number of cells per land use type.

Table 2. Classification accuracy of feature combinations. Columns: Exp., Outflow, Inflow, Net flow, Net flow ratio, OA, Kappa. OA is overall accuracy.

Table 3. Classification accuracies of land use types based on different feature combinations. PA represents the producer's accuracy; UA represents the user's accuracy.

Table 4. Classification accuracy of land use types based on the OI_EM method. PA represents the producer's accuracy; UA represents the user's accuracy.

Table 5. Weight sets of feature combinations.

Table 6. Classification accuracies of features.