1. Introduction
Urban land use information is the foundation of urban planning, and it plays an important role in government management, policy formulation and resource allocation [1,2,3,4,5,6]. Although the government has land use registration information, it is difficult to update and acquire land use information in a timely manner because urban land use and spatial structure are changing rapidly in developing countries such as China [7,8,9]. To solve this problem, a fast and accurate method for urban land use classification needs to be developed.
Remote sensing techniques classify urban land use with spectral and texture information, and they are well suited to revealing the physical characteristics of the earth's surface, such as water and buildings [10,11,12,13]. However, it is hard to distinguish more detailed land use types from remote sensing images alone [14,15,16]; for example, residential and commercial buildings are difficult to tell apart, whereas detailed urban land use is usually associated with social functions [17,18,19]. Human activities interact with the social functions of distinct regions [20]. Many scholars have studied the impact of urban land use on human activities, for example in traffic demand forecasting and research on commuting patterns [21,22,23,24,25]. Conversely, it is also feasible to identify social functions and infer urban land use from human activities [26,27,28,29,30,31,32]. Traditionally, human activity information has been obtained from travel surveys [33,34]. Survey data record the activities of subjects during the observation period and have played a major role in classical urban studies, but data acquisition is time-consuming, which has limited the development of related studies [35,36,37].
With the rapid development of information and communication technologies (ICT), massive amounts of crowdsourced data (e.g., mobile phone records, taxi trajectories and social media check-ins) are now being captured. These data are plentiful and accessible, and they contain a wealth of information about human activities and socioeconomic conditions, providing strong support for understanding urban land use [38,39,40]. Reades et al. [41] analyzed the relationship between mobile phone data and business land and identified different mobile phone usage patterns on business and residential land. Calabrese et al. [42] successfully classified the campus environment based on the Wi-Fi network. Qi et al. [43] qualitatively analyzed the relationship between taxi trajectory data and the social functions of a city. These studies demonstrate that time series representing variations in human activity are very useful for land use classification. Subsequently, Soto and Frias-Martinez [44,45] built time series of the hourly calling volume on weekdays and weekends and applied clustering methods to classify urban land use. Liu et al. [26] constructed a time series of the differences between the volumes of pick-up and drop-off points and classified land use in Shanghai using the k-means clustering method. Time series of human activity features have since been widely applied to land use classification [46,47,48,49].
Integrating human activity features can capture different aspects of human activities and provide more information for land use classification. Toole et al. [50] constructed a new time series by appending the total calling volume feature to the time series of the hourly calling volume feature. Pei et al. [51] extended this combined time series with more information on the hourly calling volume and showed that classification based on it is more accurate than using the hourly calling volume feature or the total calling volume feature alone. These studies demonstrate the feasibility and advantages of fusing human activity features in land use classification, but the combination is limited to two features. Therefore, it is worth exploring the performance of integrating more features. Features are usually combined by concatenating the time series of each feature. Pan et al. [52] fused the outflow and inflow features of taxi trajectory data by concatenating the outflow and inflow time series. Liu et al. [53] likewise concatenated the outflow and inflow time series in their land use classification of Shanghai. However, concatenation produces a high-dimensional time series when more features are fused. In many cases, the classification result depends closely on the similarity between time series, and when the time series are high-dimensional, traditional distance functions (e.g., Euclidean distance) lose their ability to discriminate between them, which reduces classification accuracy [54,55,56]. Therefore, a new method of combining multiple features for land use classification is needed.
In this study, we propose an integrated framework, inspired by the fuzzy comprehensive evaluation (FCE) method, to fuse features for land use classification. A time series was built for each feature and clustered with the fuzzy c-means (FCM) clustering method. A membership matrix was constructed from the clustering results to fuse the features, and the FCE method was used to determine the land use type based on this matrix. The proposed framework can combine multiple human activity features without generating a high-dimensional time series, and it has been applied to land use classification in Nanjing.
The remainder of this paper is organized as follows. Section 2 introduces the framework that combines multiple human activity features for land use classification. Section 3 introduces a case study using taxi trajectory data from Nanjing. The framework is discussed in Section 4. Section 5 summarizes our study and discusses future work.
2. Method
A flowchart of the framework is shown in Figure 1. It consists of three steps and a training process. First, human activity features were extracted from the taxi trajectory data, and a time series was built for each feature. Next, the FCM method was used to cluster the time series of each feature; the centers of the land use types were calculated to match cluster centers with land use types, and the membership degrees were used to construct the membership matrix, which serves as the fuzzy evaluation matrix in the FCE method. Finally, the classification results were obtained with the FCE method. A training process was performed to determine the weight set used in the FCE method. Each step is described in detail below.
2.1. Extracting Features and Constructing Time Series
Different human activity features can be extracted from the same data source. We regard a passenger journey as a flow from the pick-up point to the drop-off point, so the pick-up and drop-off points represent the outflow and inflow of a region, respectively. The outflow, inflow, net flow and net flow ratio features can be extracted from the taxi trajectory data [43,57]. The construction of the time series is flexible. The data can be aggregated into a week or a day [41,42], or a week can be divided to obtain greater detail, for example by distinguishing human activity patterns on weekdays and weekends [58] or on normal workdays, Fridays, Saturdays and Sundays [51]. In addition, the interval of the time series can be set as needed, such as 10 minutes or one hour [59,60].
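To make the four features concrete, the Python sketch below counts pick-ups (outflow) and drop-offs (inflow) per region and period and derives the other two features. It assumes trips have already been mapped to region and period indices, and it assumes net flow is inflow minus outflow and net flow ratio is net flow divided by (inflow + outflow); these definitions, the 24 one-hour periods and the name `build_feature_series` are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np


def build_feature_series(trips, n_cells, n_periods=24):
    """Build one time series per feature for every region (hypothetical helper).

    trips: iterable of (pickup_cell, pickup_period, dropoff_cell, dropoff_period).
    Returns a dict: feature name -> array of shape (n_cells, n_periods).
    Assumed definitions: net flow = inflow - outflow,
    net flow ratio = net flow / (inflow + outflow).
    """
    outflow = np.zeros((n_cells, n_periods))
    inflow = np.zeros((n_cells, n_periods))
    for p_cell, p_t, d_cell, d_t in trips:
        outflow[p_cell, p_t] += 1  # a pick-up is a flow leaving the region
        inflow[d_cell, d_t] += 1   # a drop-off is a flow entering the region
    net = inflow - outflow
    total = inflow + outflow
    # Periods without any trip get a ratio of 0 instead of dividing by zero (assumption).
    ratio = np.divide(net, total, out=np.zeros_like(net), where=total > 0)
    return {"outflow": outflow, "inflow": inflow,
            "net_flow": net, "net_flow_ratio": ratio}
```

For example, two trips picked up in region 0 during hour 8 give that region an outflow of 2 and a net flow ratio of -1 for hour 8; setting the ratio to 0 in empty periods is likewise an assumption.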
The study area should be divided into various unclassified areas, but the division method can be flexibly selected, such as dividing based on grids or traffic analysis zones (TAZs). For each unclassified region, if $m$ human activity features are extracted, the time series of each feature can be built as

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iT}) \tag{1}$$

where $X_i$ is the unnormalized time series of feature $i$ ($i = 1, 2, \ldots, m$), $x_{it}$ is the value of feature $i$ over period $t$, and $T$ is the dimension. Normalization of the time series is key to ensuring that the classification results correspond well to the land use types [53]. Thus, $X_i$ is normalized using the Z-score:

$$x'_{it} = \frac{x_{it} - \mu_i}{\sigma_i} \tag{2}$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the time series of feature $i$, respectively.
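The per-series Z-score normalization can be sketched with NumPy as follows. Each row holds one region's time series for one feature. Using the population standard deviation (`ddof=0`) and mapping constant series to zeros are our assumptions, since the text specifies neither.

```python
import numpy as np


def zscore_rows(series):
    """Z-score normalize each row (one region's time series) independently."""
    series = np.asarray(series, dtype=float)
    mu = series.mean(axis=1, keepdims=True)
    sigma = series.std(axis=1, keepdims=True)  # population std (ddof=0), assumed
    # Constant rows (sigma == 0) are mapped to zeros rather than divided by zero.
    safe_sigma = np.where(sigma > 0, sigma, 1.0)
    return np.where(sigma > 0, (series - mu) / safe_sigma, 0.0)
```

For instance, `zscore_rows([[1, 2, 3], [5, 5, 5]])` turns the first row into approximately `[-1.22, 0, 1.22]` and the second into zeros.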
2.2. Constructing the Membership Matrix
After the time series are constructed, the time series of each feature are clustered with the FCM algorithm. The FCM algorithm is chosen because it introduces fuzzy partitioning and membership degree theory into clustering, which allows an unclassified area to belong to different land use types simultaneously [61]. In addition, the FCM algorithm has a solid theoretical foundation and broad applications [62,63,64].
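Given the time series of feature $i$, the FCM algorithm returns a list of cluster centers $C_i = \{c_{i1}, c_{i2}, \ldots, c_{iK}\}$ and membership degrees $U_i = \{u_{i1}, u_{i2}, \ldots, u_{iK}\}$. $c_{ik}$ is the $k$th cluster center ($k = 1, 2, \ldots, K$), and $u_{ik}$ is the membership degree of the unclassified area to cluster center $c_{ik}$. $K$ is the number of clusters, which is set to the number of land use types that can be obtained from the land use data of the study area. For each unclassified area, $u_{ik}$ satisfies the conditions in Equation (3).

$$0 \le u_{ik} \le 1, \qquad \sum_{k=1}^{K} u_{ik} = 1 \tag{3}$$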
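For readers unfamiliar with FCM, the following is a minimal NumPy sketch of the standard fuzzy c-means iteration, which alternates center and membership updates. The fuzzifier `m = 2`, Euclidean distance, random initialization and the stopping tolerance are common defaults that we assume here; the paper does not state which settings or implementation it used. Each row of the returned membership array satisfies the conditions in Equation (3).

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on the rows of X (a sketch, not the paper's code).

    X: (n_samples, n_dims) array, e.g. one normalized time series per region.
    Returns (centers, U): centers is (n_clusters, n_dims); U is
    (n_samples, n_clusters) with every row in [0, 1] and summing to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against a sample lying on a center
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In the framework, `X` would hold the normalized time series of one feature for all regions and `n_clusters` would equal the number of land use types; the clustering is run once per feature.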
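To construct the membership matrix for each unclassified region, cluster centers need to be matched with land use types. Once cluster center $c_{ik}$ is matched with land use type $k$, $u_{ik}$ can represent the membership degree of the unclassified region to land use type $k$ when feature $i$ is used for land use classification, and the membership matrix $R$ can be constructed from the membership degrees in the clustering results of each feature:

$$R = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1K} \\ u_{21} & u_{22} & \cdots & u_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mK} \end{pmatrix} \tag{4}$$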
The cluster centers are matched to land use types as follows. First, the land use data of the study area are divided into a training set and a test set at a ratio of 3:1, with each land use type occupying the same proportion in both sets. If there are $N$ unclassified areas in the training set and $N_k$ of them belong to land use type $k$, then $\sum_{k=1}^{K} N_k = N$. Next, the center of land use type $k$ is calculated according to Equation (5):

$$\bar{X}_{ik} = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{ikj} \tag{5}$$

where $X_{ikj}$ is the time series of feature $i$ extracted from the $j$th unclassified region belonging to land use type $k$. Finally, when feature $i$ is used for land use classification, each cluster center $c_{ik}$ is assigned the land use type $l$ that minimizes the distance between $c_{ik}$ and $\bar{X}_{il}$.
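The matching step and the assembly of one region's membership matrix can be sketched as follows. The function names are hypothetical, Euclidean distance is assumed for the minimum-distance rule, and summing memberships when two clusters are matched to the same land use type is our own assumption, since the text does not say how such collisions are handled.

```python
import numpy as np


def land_use_centers(X_train, y_train, n_types):
    """Mean training time series of each land use type, as in Equation (5).

    Every type 0..n_types-1 must occur in y_train, otherwise its mean is undefined.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    return np.vstack([X_train[y_train == k].mean(axis=0) for k in range(n_types)])


def match_clusters(cluster_centers, type_centers):
    """Give each cluster center the land use type whose center is nearest.

    Returns cluster_type, where cluster_type[c] is the land use type of cluster c.
    Euclidean distance is assumed; two clusters may receive the same type.
    """
    cluster_centers = np.asarray(cluster_centers, dtype=float)
    type_centers = np.asarray(type_centers, dtype=float)
    d = np.linalg.norm(cluster_centers[:, None, :] - type_centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def membership_matrix(U_per_feature, cluster_type_per_feature, region, n_types):
    """One region's membership matrix: rows are features, columns are land use types.

    U_per_feature[i] is the (n_regions, n_clusters) FCM membership array of feature i.
    Memberships of clusters matched to the same type are summed (our assumption).
    """
    R = np.zeros((len(U_per_feature), n_types))
    for i, (U, cluster_type) in enumerate(zip(U_per_feature, cluster_type_per_feature)):
        # R[i] is a view, so np.add.at updates row i of R in place.
        np.add.at(R[i], cluster_type, np.asarray(U)[region])
    return R
```

A stratified 3:1 split like the one described can be produced beforehand with scikit-learn's `train_test_split(X, y, test_size=0.25, stratify=y)`, after which `land_use_centers` is applied to the training part only.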
2.3. Determining Land Use Types
In this step, the $m$ features are considered evaluation indices, and the $K$ land use types are considered remarks. The evaluations of the unclassified area based on each index constitute the fuzzy evaluation matrix in the FCE method, so the membership matrix $R$ built in Section 2.2 is used as the fuzzy evaluation matrix to determine the land use type of the unclassified area. The FCE method uses membership degree theory and can comprehensively evaluate objects affected by multiple factors [65], and it has been applied to address classification uncertainty in mineral prospectivity mapping, quality analysis and other fields [66,67,68]. The model of the FCE method can be expressed as

$$B = W \circ R \tag{6}$$

where $B$ is the evaluation result and $W = (w_1, w_2, \ldots, w_m)$ is the weight set, with $0 \le w_i \le 1$ and $\sum_{i=1}^{m} w_i = 1$. $w_i$ is the weight of feature $i$, $m$ is the number of features, and $\circ$ is an operator.
The operators commonly employed in the FCE method include $M(\wedge, \vee)$, $M(\cdot, \vee)$, $M(\wedge, \oplus)$ and $M(\cdot, \oplus)$. We calculated Equation (6) with $M(\cdot, \oplus)$ because it makes full use of all evaluation information and is a relatively ideal operator [69]. Thus, the evaluation result can be obtained according to Equation (7):

$$B = (b_1, b_2, \ldots, b_K), \qquad b_k = \sum_{i=1}^{m} w_i u_{ik} \tag{7}$$

where $b_k$ is the membership degree of the unclassified region to land use type $k$. The land use type is determined according to the principle of maximum membership [70], so the unclassified region is assigned the land use type with the largest $b_k$.
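The weighted-average evaluation and the maximum-membership rule reduce to a vector-matrix product followed by an argmax. The sketch below assumes the membership matrix is stored with one row per feature and one column per land use type; `fce_classify` is a hypothetical name.

```python
import numpy as np


def fce_classify(R, w):
    """Weighted-average fuzzy comprehensive evaluation for one region.

    R: (m_features, K_types) membership matrix; w: (m_features,) weights summing to 1.
    Returns (B, label): B[k] = sum_i w[i] * R[i, k], and label is the index of the
    largest B[k] (principle of maximum membership).
    """
    R = np.asarray(R, dtype=float)
    w = np.asarray(w, dtype=float)
    # The bounded sum min(1, .) never clips here, because the weights sum to 1
    # and every membership is at most 1.
    B = np.minimum(1.0, w @ R)
    return B, int(np.argmax(B))
```

With `R = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]`, weights `[0.3, 0.7]` give B = [0.32, 0.58, 0.10] and type 1, whereas weights `[0.9, 0.1]` give B = [0.56, 0.34, 0.10] and type 0, which illustrates how the weight set can change the decision.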
The weight set $W$ is determined by a training process using the training set defined in Section 2.2. If $a_n$ is the $n$th unclassified area in the training set, $\hat{y}_n$ is the classification result of $a_n$, and $y_n$ is its real land use type, the weight set can be acquired by minimizing the objective function in Equation (8):

$$\min_{W} \; F(W) = \frac{1}{N} \sum_{n=1}^{N} \left[ 1 - I(\hat{y}_n = y_n) \right] \tag{8}$$

where $N$ is the number of $a_n$. $I(\cdot)$ is an indicator function: $I = 1$ when $a_n$ is correctly classified; otherwise, $I = 0$.
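The text does not state how Equation (8) is minimized. Because an indicator-based objective is piecewise constant and not differentiable, one simple option, assumed here purely for illustration, is an exhaustive grid search over weight vectors on the probability simplex. With four features and a step of 0.05 this evaluates 1,771 candidate weight sets. The function names are hypothetical, and ties are resolved by keeping the first weight set found, another arbitrary choice.

```python
import itertools

import numpy as np


def simplex_grid(m, step=0.05):
    """Yield every weight vector of length m with entries that are multiples of step and sum to 1."""
    n = int(round(1 / step))
    for parts in itertools.product(range(n + 1), repeat=m - 1):
        if sum(parts) <= n:
            yield np.array(list(parts) + [n - sum(parts)]) / n


def train_weights(R_train, y_train, step=0.05):
    """Minimize the training misclassification rate by exhaustive grid search.

    R_train: (N, m, K) membership matrices of the N training regions.
    y_train: (N,) true land use types.
    Returns (best_w, best_err).
    """
    R_train = np.asarray(R_train, dtype=float)
    y_train = np.asarray(y_train)
    best_w, best_err = None, np.inf
    for w in simplex_grid(R_train.shape[1], step):
        # Weighted-average FCE for all regions at once: B[n, k] = sum_i w[i] * R[n, i, k].
        pred = np.einsum("i,nik->nk", w, R_train).argmax(axis=1)
        err = np.mean(pred != y_train)
        if err < best_err:  # strict '<' keeps the first of several tied weight sets
            best_w, best_err = w, err
    return best_w, best_err
```

A finer step or a derivative-free optimizer could replace the grid; the grid is only meant to show that the objective depends on the weights through the classification decisions alone.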
4. Discussion
Combining human activity features can help increase the accuracy of urban land use classification [71]. However, feature combination in many studies is limited to two features, and few studies have effectively combined multiple human activity features [50,51]. The framework proposed in this paper provides more information for land use classification by combining multiple human activity features based on the FCM algorithm and the fuzzy comprehensive evaluation (FCE) method, and it obtains better accuracy than the method that fuses two features. The classification results in Nanjing indicated that the framework can effectively integrate the outflow, inflow, net flow and net flow ratio features of taxi trajectory data and achieve an overall accuracy (OA) of 0.858 (Kappa = 0.810). The framework was also more accurate than the OI_EM method (OA = 0.720, Kappa = 0.632), which integrated the outflow and inflow features. Therefore, the proposed framework can effectively integrate multiple human activity features and improve land use classification.
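The OA and Kappa values reported here are standard confusion-matrix statistics. As a reference for reproducing such numbers, the sketch below computes overall accuracy, Cohen's kappa and the per-class producer's and user's accuracies from integer class labels; it is a generic implementation with a hypothetical name, not code from the study.

```python
import numpy as np


def accuracy_report(y_true, y_pred, n_types):
    """Return (OA, kappa, producer's accuracies, user's accuracies).

    Confusion matrix rows are reference (true) types and columns are predicted types.
    A type that is never predicted gives a NaN user's accuracy.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_types, n_types))
    np.add.at(cm, (y_true, y_pred), 1)
    n = cm.sum()
    oa = np.trace(cm) / n
    # Expected chance agreement from the row and column marginals.
    pe = (cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    producers = np.diag(cm) / cm.sum(axis=1)  # correct / reference total per type
    users = np.diag(cm) / cm.sum(axis=0)      # correct / predicted total per type
    return oa, kappa, producers, users
```

For labels `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, this gives OA = 0.75 and Kappa = 0.5.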
In the framework, features participated in the classification process with different weights. The weight sets of the feature combinations (Table 5) show that the inflow feature received a larger weight than the outflow feature in all the experiments. In Experiments B and D, which used the net flow feature, the weight of the net flow feature exceeded 0.300. To determine the reasons for the different feature weights, we input each of the four features into the proposed framework separately. The classification accuracies of the features in Table 6 show that each feature can be used to identify urban land use types. This finding agrees with previous studies showing that passengers' pick-up/drop-off dynamics can reveal the social functions of urban areas [26,52]. The four features had different land use classification accuracies. The net flow feature had the highest accuracy, followed by the inflow and net flow ratio features, whereas the outflow feature had the lowest accuracy. Therefore, features with high accuracies, such as the inflow and net flow features, generally received large weights in the feature combinations.
By comparing the classification accuracies of the individual features (Table 6) and the feature combinations (Table 2 in Section 3.2), we found that each feature combination was more accurate than any of the features it fused. For example, Experiment D, which fused the four features (OA = 0.858, Kappa = 0.810), was more accurate than the outflow (OA = 0.563, Kappa = 0.440), inflow (OA = 0.691, Kappa = 0.593), net flow (OA = 0.741, Kappa = 0.655) and net flow ratio (OA = 0.636, Kappa = 0.517) features. This indicates that fusing features with the proposed framework can increase land use classification accuracy.
The accuracies of the land use types (Table 3 in Section 3.2) show that the framework produced higher classification accuracies for residential and industrial land than for commercial land, open space and others. To investigate why, we plotted the centers of the land use types and the cluster centers of each feature in Figure 8. For residential and industrial land, the cluster centers are similar to the centers of the land use types, and the peaks of these curves are distinct. For commercial land, open space and others, the cluster centers differ from the centers of the land use types, and the curves lack prominent characteristics. In addition, the curves of open space and others differ only slightly from each other. Thus, it is difficult to correctly identify commercial land, open space and others.
In addition, some factors affecting the classification accuracy of the framework need to be discussed. First, the FCM algorithm was used to cluster the time series of each feature, so the quality of its clustering has an important impact on the framework. Many studies have improved the FCM algorithm [72,73], and using an improved FCM method may therefore help increase the classification accuracy of the framework. Second, the study area was divided into 500 m × 500 m cells in this study, but this division method cannot guarantee that land use is continuous within a cell. Cells at the junction of different land use types are highly mixed, which makes it difficult to identify their land use types. Moreover, we mapped the land use data to cells by calculating the proportion of each land use type in a cell and assigning the type with the largest proportion to the cell, so highly mixed cells also affect the determination of the real land use types. Because the framework supports different division methods, choosing a better division method can help increase its classification accuracy.
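The grid division and the largest-proportion rule described above can be sketched as follows, assuming coordinates are already projected to metres and that the area of each land use type inside each cell has been computed beforehand (e.g., by a GIS overlay). The function names are hypothetical; returning the largest share alongside the label is our addition, offered as one simple way to flag the highly mixed cells discussed here.

```python
import numpy as np


def cell_index(x, y, x0, y0, n_cols, cell=500.0):
    """Map projected coordinates (metres) to row-major grid cell ids.

    (x0, y0) is the lower-left corner of the study area; points are assumed to lie
    inside the grid (no bounds checking).
    """
    col = np.floor((np.asarray(x, dtype=float) - x0) / cell).astype(int)
    row = np.floor((np.asarray(y, dtype=float) - y0) / cell).astype(int)
    return row * n_cols + col


def majority_land_use(area_by_type):
    """Assign each cell the land use type with the largest area share.

    area_by_type: (n_cells, K) area of each land use type inside each cell.
    Returns (labels, max_share); a low max_share marks a highly mixed cell.
    Cells with no land use area get all-zero shares (and label 0).
    """
    area = np.asarray(area_by_type, dtype=float)
    total = area.sum(axis=1, keepdims=True)
    share = np.divide(area, total, out=np.zeros_like(area), where=total > 0)
    return share.argmax(axis=1), share.max(axis=1)
```

For example, a cell containing 300 units of type 0 and 100 units of type 1 is labelled type 0 with a largest share of 0.75.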
5. Conclusions
Big data record human activities in urban areas and enable us to infer land use types from collective activity features. Fusing different human activity features can provide more information for land use classification. In this study, we proposed an integrated framework to combine multiple features for land use classification. By clustering the time series of each feature with the fuzzy c-means (FCM) clustering method, a membership matrix reflecting the fuzzy relationship between features and land use types was built for each unclassified area. The fuzzy comprehensive evaluation (FCE) method was used to determine the land use type of each unclassified area based on the membership matrix. The results of the case study in Nanjing indicated that the proposed framework can effectively fuse different features and increase the accuracy of land use classification. When the outflow, inflow, net flow and net flow ratio features were applied to land use classification, the classification results achieved an overall accuracy of 0.858 (Kappa = 0.810). For all the land use types, the producer's accuracies were higher than 0.735 and the user's accuracies exceeded 0.700. Both the producer's and user's accuracies for residential and industrial land were higher than 0.870.
The human activity features fused in the framework can come from different data sources, so a membership matrix can be built for each unclassified area based on multisource big data. Some studies have applied multisource data to land use classification and building function identification [74,75,76], suggesting that integrating multisource data benefits land use classification. In the future, we will explore the performance of the framework in fusing multisource data for land use classification.