A Deep Learning Approach to Urban Street Functionality Prediction Based on Centrality Measures and Stacked Denoising Autoencoder

Abstract: In urban planning and transportation management, the centrality characteristics of urban streets are vital measures to consider. Centrality can help in understanding the structural properties of dense traffic networks that affect both human life and activity in cities. Many cities classify urban streets to provide stakeholders with a group of street guidelines for possible new rehabilitation such as sidewalks, curbs, and setbacks. Transportation research has long treated street networks as connections between different urban areas. Street functionality classification defines the role of each element of the urban street network (USN). Potential factors such as land-use mix, accessible services, design goals, and administrative policies can affect the movement patterns of urban travelers. In this study, nine centrality measures are used to classify the urban roads in four cities by evaluating the structural importance of street segments. In our work, a Stacked Denoising Autoencoder (SDAE) predicts a street's functionality, and logistic regression is then used as the classifier. The proposed classifier can differentiate between four classes adopted from the U.S. Department of Transportation (USDT): principal arterial road, minor arterial road, collector road, and local road. The SDAE-based model showed that regular grid configurations with repeated patterns are more influential in forming the functionality of road networks than those with less regularity in their spatial structure.


Introduction
Urban commutes by automobile are an important daily task for most city dwellers. The urban street system is a vital network that connects places and people within and across urban areas. It can be effectively modeled as a network using graph theory, and commutes become network-constrained movements [1], with each section of the network responsible for moving traffic toward its destination. Having information about the elements of the network with particular objectives is very important for planners and engineers. These objectives range from serving long-distance mobility to providing local access, and addressing them requires methods that adequately model street functionality classification (SFC). To consider centrality measures in urban networks, patterns of streets and roads are analyzed based on graph theory, which helps to extract the spatial topology attributes of streets. The goal of centrality measurement is to find the most important central places in the network and the attributes that give them a pivotal role in monitoring the efficiency and accessibility of transportation networks [21-24]. Centralities can help analysts understand complex networks more effectively. Centrality measures inspired our work: they help domain users explore urban transportation data and provide important initial features from road networks for higher-level analysis of the urban transportation network [25,26]. Several experimental studies demonstrate the relationship between the spatial configuration of city structure and traffic in city streets [6,9,23,27-29]. A great deal of research has tried to unveil the movement patterns of individuals by examining the structural features of street networks. Many parameters, including the network's geometrical features, driver movement behavior, and the spatial distribution of urban land uses, influence traffic distribution. Furthermore, research has shown that these parameters are themselves influenced by the network's spatial structure.
For instance, the spatial structure of the urban network has a strong influence on human behavior and the demand for intra-city trips [6,30,31]. The Self-Organizing Map (SOM) neural network, first used in city network generalization by Kohonen [32], considered the topological, geometrical, and semantic features of the streets. Jiang and Harrie [1] utilized the SOM neural network for clustering streets based on their features. In 2012, Zhou [33] applied SOM together with a Back-Propagation Neural Network (BPNN) for urban network generalization based on the city plan before city design; the results showed improvement over those achieved with SOM alone.
Deep multilayer neural networks have many nonlinear levels of learning, allowing them to compactly learn highly nonlinear and highly varying functions, and they have attracted considerable attention in various earth science research areas [34-38], particularly in urban planning problem analysis [17,39,40]. Lv et al. [40] applied a deep neural network known as a Stacked Autoencoder (SAE) to traffic flow prediction on a big dataset, achieving over 93% accuracy. They compared their model with traditional techniques such as Support Vector Machine (SVM), Random Forest (RF), and Back-Propagation Neural Network (BPNN), and the SAE [41] achieved better results. In another study, inclement weather information was added to the input features of a deep learning model for traffic flow prediction [42], in search of a correlation between weather conditions and traffic flow. The ability of deep learning to process big data [34,43,44], to capture more correlation between datasets, and to handle the complexity and nonlinearity of datasets inspired us to rethink analyzing the functionality of streets. In our work, because the input data has a tabular structure, a greedy layer-wise deep learning model is applied for unsupervised feature learning to capture the nonlinearity of the input data, and logistic regression is then used as a classifier. The SDAE model, one of the best greedy layer-wise unsupervised models, was introduced to solve the problem of training deep networks [45]. To evaluate model performance, we compare several machine learning models, namely logistic regression, Multi-Layer Perceptron (MLP) [46], SVM [47], and RF [48], with the SDAE on four cities with different spatial structures. The four cities used are Tehran, Iran; Isfahan, Iran; Enschede, Netherlands; and Paris, France; Figure 1 shows their spatial structures.
Our study proposes a method that investigates the fundamental driver of a network's behavior, namely the spatial structure of the street network. First, we show that the functionality of a street can be extracted from spatial measurements with acceptable accuracy, and then we suggest the proposed classification method for classifying streets that do not belong to a specific functional class. In this work, the USN is first modeled as a network using graph theory [1]; then the spatial structure of the USN is described using nine centrality measures to better capture the complexity of the network for SFC. For SFC itself, deep learning, a powerful nonlinear learning approach, is applied to take advantage of its ability to capture the complexity of the USN and street functionality. Our study investigates the structural properties of each functionality class in the real world. The major contributions of this paper are summarized as follows.

• Considering the challenge of street functional classification based on the spatial structure of streets, mainly centrality measures.

• Developing an unsupervised deep learning model to improve the accuracy of the street functional classification compared to traditional techniques.

• Analyzing the importance of each centrality measure in street functional classification using the random forest technique.

• Investigating the impacts of street network regularity on street functional classification.
In this study, we choose a Stacked Denoising Autoencoder (SDAE) [41], as it is one of the best greedy layer-wise, unsupervised learning models [45,49]. Although our input data is labeled, we use the SDAE to learn features and weights in an unsupervised, greedy, layer-wise manner, while supervised fine-tuning further adjusts the network's weights for classification using the labeled data. For the fine-tuning classifier we applied logistic regression, though other techniques could also be used. We compare our deep learning model with four traditional machine learning models.
The rest of the paper describes our work in detail. Section 2 explains the SDAE and describes the centrality measures we used. Section 3 presents our methodology and evaluates our implementation using four different datasets. Numerical results from our experiments are discussed in Section 4, and Section 5 presents our conclusions.

Materials and Methods
This section explains street network modeling based on graph theory and the extraction of the centrality measures used for training the model. Additionally, our SDAE deep learning model is discussed. Figure 2 is a schematic of the whole SFC process based on centrality measures using a deep learning model. In the first stage, the USN of the four cities studied in this work is modeled using graph theory. Then, nine centrality measures are calculated for all cities, and the functional classes are extracted from the database. The SDAE deep learning model is then applied to SFC, and the results are compared to traditional machine learning models. Finally, the importance of each centrality measure is assessed using the random forest technique, and the impact of street network regularity on SFC, measured by the mixing rate, is discussed.

USN Modeling Using Graph Theory
To model the urban street network, first the exact concept of each street type should be defined, and second, the mathematical model for network representation should be chosen. There are three methods for defining the basic element of a street network: the axial line [50], the street segment, and the stroke [51]. Axial lines were used in space syntax theory for modeling streets; an axial line represents the longest channel people move through in a city. The street segment is the connection between two intersections in the street network. The stroke is another way to define the concept of a street. The original idea of building road segments into a stroke was proposed by Thomson and Richardson [51]. The basic principle is very simple: "Elements that appear to follow in the same direction tend to be grouped", which follows the "principle of good continuation" in visual perception [33]. In this process, a simple geometric criterion, the deflection angle, i.e., the deviation from 180 degrees of the angle between two road segments, is employed to judge which two road segments should be concatenated. A small deflection angle between 40 and 60 degrees [52] is suggested as a threshold to ensure that all strokes follow the principle of good continuation. Moreover, in comparison with other existing methods for modeling the street entity, the stroke usually provides better results in predicting movement patterns of individuals in urban networks [33] and is well suited for traffic management and scheduling in urban networks [1].
The process of building strokes starts from an arbitrary road segment. When reaching an intersection with at least two other road segments, it is necessary to decide which one to connect to; three potential strategies exist: self-fit, self-best-fit, and every-best-fit [1]. The every-best-fit strategy compares every pair of road segments and selects the pair with the smallest deflection angle for concatenation. Optimal results are obtained with every-best-fit, as this strategy considers all possible concatenations at each intersection (for more information, refer to the work in [33]). The street network can then be represented by a connectivity graph consisting of vertices and edges. In a primal graph, intersections become nodes and streets become edges; in a dual graph, streets are nodes and intersections are edges. In our study, we use the every-best-fit method to construct strokes and a dual graph to represent street entities, considering the direction of the streets. Moreover, the streets are assigned weights in proportion to their real-world lengths. Results of using this method are shown in Figure 1.
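As an illustration of the stroke-building step, the sketch below pairs road segments at a single intersection using the every-best-fit idea. This is our own reconstruction, not the authors' implementation: the segment representation and the 45-degree threshold (a midpoint of the 40-60 degree range suggested in [52]) are assumptions.

```python
import math

def deflection_angle(seg_a, seg_b):
    """Deviation from 180 degrees of the angle between two road segments
    meeting at a shared intersection. Each segment is a pair of (x, y)
    endpoints with the shared intersection point listed first."""
    (jx, jy), (ax, ay) = seg_a
    (_, _), (bx, by) = seg_b
    ang_a = math.atan2(ay - jy, ax - jx)
    ang_b = math.atan2(by - jy, bx - jx)
    between = abs(math.degrees(ang_a - ang_b)) % 360
    if between > 180:
        between = 360 - between
    return 180 - between  # 0 means perfect continuation

def every_best_fit(segments, threshold=45.0):
    """Greedy every-best-fit at one intersection: repeatedly concatenate
    the pair of segments with the smallest deflection angle, as long as
    that angle stays below the threshold."""
    pairs, remaining = [], list(range(len(segments)))
    while len(remaining) >= 2:
        best = min(
            ((i, j) for i in remaining for j in remaining if i < j),
            key=lambda ij: deflection_angle(segments[ij[0]], segments[ij[1]]),
        )
        if deflection_angle(segments[best[0]], segments[best[1]]) > threshold:
            break
        pairs.append(best)
        remaining = [k for k in remaining if k not in best]
    return pairs

# Three segments meeting at the origin: 0 and 1 are nearly collinear.
segments = [((0, 0), (1, 0)), ((0, 0), (-1, 0.05)), ((0, 0), (0, 1))]
pairs = every_best_fit(segments)
```

With these toy segments, the nearly collinear pair (0, 1) is concatenated into one stroke, while segment 2 (perpendicular) is left unpaired.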

Centrality Measures
In order to analyze the spatial structure of a network, there is a need for some measures to quantify the structural features of each street in those networks. The quantification measures used to evaluate the structural features of networks are known as "centrality measures". In this study, a total of nine measures are used to evaluate the structural importance of each street. In order to assess the structural importance of a street, it is necessary to make use of more than a single measure. As a single measure considers the importance of the street from a single point of view, utilizing a variety of evaluation measures can help us look at the problem space from different aspects. The measures used in this study are discussed below.

Betweenness Centrality
Betweenness centrality (C_B) measures how often a node is traversed by the shortest paths connecting all pairs of nodes in the network. C_B for node i is defined by Equation (1):

C_B(i) = \frac{1}{(N-1)(N-2)} \sum_{j \neq i \neq k} \frac{n_{jk}(i)}{n_{jk}},  (1)

where n_jk is the number of shortest paths between nodes j and k, N is the total number of nodes, and n_jk(i) is the number of those shortest paths that contain node i. Generally, nodes with higher betweenness are more involved in directing and transferring flow in the network, so they play a significant role in node communications [53]. In this study, betweenness centrality is used to identify those streets which have a bridging role between different topological shortest paths. According to this definition, the measure seems well suited to detecting high-traffic or arterial streets. Figure 3 depicts all centrality measures, including betweenness centrality (Figure 3a), for the Tehran, Iran USN.
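For illustration, betweenness centrality can be computed with networkx on a toy weighted, directed graph; the graph and its weights below are invented, not data from the study.

```python
import networkx as nx

# Toy directed street graph (nodes could be strokes in a dual representation);
# edge weights stand in for real-world street lengths.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0),
    ("A", "E", 3.0), ("E", "D", 3.0), ("D", "A", 1.0),
])

# Normalized betweenness: the share of all-pairs shortest paths that pass
# through each node, i.e. C_B as referenced in Equation (1).
cb = nx.betweenness_centrality(G, weight="weight", normalized=True)
```

Here "B" lies on many weighted shortest paths (the cheap A-B-C-D corridor), while "E" lies on none, so cb["B"] is high and cb["E"] is zero.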

In/Out-Degree and Weighted In/Out-Degree
Based on the definition by Freeman [54], the degree of a focal node i, deg_i, is its adjacency in the network, i.e., the number of nodes directly connected to the focal node i:

\deg_i = \sum_{j=1}^{N} a_{ij},

where i is the focal node, j ranges over all other nodes, N is the total number of nodes, and a is the adjacency matrix. To use this measure in a directed and weighted network, the direction and the weights of the connections must be considered. Regarding direction, the measure splits into two: indegree, deg_i^{in}, the number of connections leading to a given node, and outdegree, deg_i^{out}, the number of connections that can be accessed from that node. The degree has generally been extended to the sum of weights when analyzing weighted networks [55]. Regarding weights, the measure again splits into two: weighted indegree, wdeg_i^{in}, and weighted outdegree, wdeg_i^{out}, i.e., the sum of the connection weights leading to a given node or accessible from it, where w is the weighted adjacency matrix and w_{ij} > 0 whenever node i is connected to node j. In the urban street network, the degree of each street indicates the number of streets with direct access to it, which measures the accessibility of streets in the urban network. To use this measure in the weighted and directed street network, the direction and weight (length) of streets are also taken into account. Figure 3b-e shows the indegree, outdegree, weighted outdegree, and weighted indegree centrality measures for Tehran.
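The four degree variants can be read directly off a directed, weighted graph with networkx; the three-edge graph below is a made-up illustration.

```python
import networkx as nx

# Toy directed, weighted street graph; weights approximate street lengths.
G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 2.0), ("C", "B", 4.0), ("B", "D", 1.5)])

deg_in   = dict(G.in_degree())                   # number of incoming connections
deg_out  = dict(G.out_degree())                  # number of outgoing connections
wdeg_in  = dict(G.in_degree(weight="weight"))    # sum of incoming edge weights
wdeg_out = dict(G.out_degree(weight="weight"))   # sum of outgoing edge weights
```

Node "B" receives two streets (total length 6.0) and feeds one street (length 1.5), so its in/out measures differ accordingly.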

Clustering Coefficient
The local clustering coefficient, C_Lcc, is the probability that two adjacent nodes of a given node are themselves connected. It is calculated as the number of present connections over the number of possible connections between the node's neighbors, so the outcome ranges between 0 and 1: 0 if no connections exist between the neighbors and 1 if all possible connections exist [6]:

C_{Lcc} = \frac{Na_{ij}}{Np_{ij}},

where Na_{ij} is the number of actual connections and Np_{ij} is the number of possible connections between nodes i and j. The existence of a connection between adjacent nodes means they can send and receive network flow directly, without any intermediation. A street with a high clustering coefficient means that its neighboring streets have direct access to each other; as a result, there is no need to cross that street to reach the others, so its traffic decreases. Conversely, if there is no direct connection between the neighbors of a street, that street has a more substantial role in passing people to their destinations. Figure 3f shows the clustering coefficient centrality measure for Tehran.
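A quick illustration of the actual-over-possible-connections idea with networkx, on an invented five-node undirected graph:

```python
import networkx as nx

# B's two neighbours (A, C) are directly connected, so B's coefficient is 1;
# D's two neighbours (C, E) are not connected, so D's coefficient is 0.
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
cc = nx.clustering(G)   # actual / possible connections among each node's neighbours
```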

Weighted Average Centrality Rank (WACR)
This measure is developed to evaluate the amount of control each node has over the network flow. A higher value indicates a more important status of the node in flow transmission across the entire network [20]. In its formulation, w_{is} is the ratio of the degree of node s to the sum of the degrees of all of i's adjacent nodes, and m is the size of the neighbor set N_i. Figure 3i shows the WACR centrality measure for Tehran.

Page-Rank Centrality
PageRank is a key technology behind the Google search engine that decides the relevance and importance of individual web pages. It is computed on a web graph in which nodes and links represent individual web pages and hotlinks [56]. The web graph is a directed graph, i.e., a hotlink from page A to B does not imply another hotlink from B to A. The basic idea of PageRank is that a highly ranked node is one that highly ranked nodes point to [56], a recursive definition. PageRank is used to rank individual web pages in a hyperlinked database and is defined formally as follows [6]:

PR(i) = \frac{1-d}{n} + d \sum_{j \in IN(i)} \frac{PR(j)}{n_j},

where n is the total number of nodes; IN(i) is the set of inlink neighbors of node i (i.e., those nodes that point to node i); PR(i) and PR(j) are the rank scores of nodes i and j, respectively; n_j denotes the number of outlink nodes of node j; and d is a damping factor, usually set to 0.85 for ranking web pages.
In the urban street network, this measure means that if a person chooses routes randomly in the network, streets with a high PageRank are more likely to lie on those routes. Figure 3h shows the PageRank centrality measure for Tehran.
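The "highly pointed-to nodes accumulate rank" behavior is easy to see with networkx on an invented four-node directed graph:

```python
import networkx as nx

# Three nodes all point to D, and D points back to A only.
G = nx.DiGraph([("A", "D"), ("B", "D"), ("C", "D"), ("D", "A")])

# alpha plays the role of the damping factor d from the PageRank definition.
pr = nx.pagerank(G, alpha=0.85)
```

Node "D" receives links from every other node and therefore collects the largest share of the rank, which sums to 1 over all nodes.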

Closeness Centrality
Closeness is defined as the inverse of farness, which in turn is the sum of distances to all other nodes [54]. The intent behind this measure is to identify the nodes that can reach the others quickly. It measures to what extent node i is near all the other nodes along the shortest paths and is defined by Equation (9):

C_C(i) = \frac{N-1}{\sum_{j \neq i} d_{ij}},  (9)

where d_ij is the shortest path length between i and j, defined in a valued graph as the smallest sum of edge lengths over all possible paths between i and j. Figure 3g shows the closeness centrality measure for Tehran.
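On a simple path graph the "nodes that reach others quickly" intuition is immediate; networkx implements closeness as (N-1) divided by the sum of shortest-path distances:

```python
import networkx as nx

# Five nodes in a line (0-1-2-3-4); the middle node has the smallest farness.
G = nx.path_graph(5)
cl = nx.closeness_centrality(G)
```

The middle node 2 has distance sum 1+1+2+2 = 6, giving closeness 4/6, the maximum on this graph; the two endpoints are symmetric and score lowest.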

Regularity Measurement: Spatial Configuration of an Urban Street Network
The spatial configuration of an urban network varies considerably between cities because they are constructed at different times and in different contexts. Social, cultural, and political factors affect the configuration and arrangement of streets in different cities. In some cities, the configuration of roads is well disciplined and follows a uniform pattern, while in others it is chaotic and no specific pattern can be found. To assess different configurations, we need a quantitative measure, because two visually different configurations may be similar from the structural regularity point of view. The mixing rate is such a measure, determining the level of structural regularity in a network.
To explain the mixing rate, consider a person who walks randomly through a network; during this random walk, they may visit new nodes or pass nodes seen before. After several steps, the frequency of crossing a given node decreases until it converges to a constant value. The frequency of crossing a node is the number of times the walker passes that node divided by the total number of moves. It has been proved that this frequency is proportional to the node's degree and is calculated as in Equation (10) [57]:

f(i) = \frac{\deg(i)}{2E},  (10)

where deg(i) is the node's degree and E is the total number of links. The rate of reduction in the crossing frequencies of the network's nodes is a measure used to evaluate the structural regularity level of a network [58]. This measure is defined as Equation (11):

\mu = \limsup_{t \to \infty} \max_{i,j} \left| P^{t}(i,j) - f(j) \right|^{1/t},  (11)

where P^t(i,j) is the probability that a walker starting at node i stands at node j after t steps, and sup is the minimum of the upper bounds, or supremum: the supremum of a set S, which is a subset of A, if it exists, is the smallest quantity that is greater than or equal to each member of S. The mixing rate is a measure between 0 and 1 which is determined by the level of regularity in the spatial structure.
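The convergence of visiting frequencies can be checked numerically. The sketch below is our own illustration on a made-up 4-node network: a random walk's distribution converges to deg(i)/(2E) as in Equation (10), and the second-largest eigenvalue modulus of the (symmetrized) transition matrix is used here as the standard spectral proxy for the mixing rate, which may differ in detail from the exact measure of [58].

```python
import numpy as np

# Adjacency matrix of a small undirected network (4 nodes, 4 links,
# containing a triangle so the walk is aperiodic).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)
E = A.sum() / 2                  # total number of links
P = A / deg[:, None]             # row-stochastic random-walk transition matrix

# Start a walker at node 0 and iterate; the occupation probabilities
# converge to the stationary frequencies deg(i) / (2E).
p = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(200):
    p = p @ P

# Spectral proxy for the mixing rate: second-largest eigenvalue modulus of
# the symmetrized transition matrix D^{-1/2} A D^{-1/2} (largest is 1).
evals = np.linalg.eigvals(np.diag(deg ** -0.5) @ A @ np.diag(deg ** -0.5))
mu = sorted(np.abs(evals))[-2]
```

A value of mu close to 0 means the walk forgets its starting point quickly (a highly regular, well-mixed structure), while values near 1 indicate slow mixing.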

Stacked Denoising Autoencoder
Autoencoders are a specific type of feedforward neural network with a symmetric structure, consisting of one input layer, one hidden layer, and one output layer [59,60]. The middle layer, called the bottleneck, is trained so that the network reconstructs its inputs as closely as possible. The role of the bottleneck is to compress the input data into a lower-dimensional code, forcing the autoencoder to learn the most informative (latent) features. The autoencoder has three functional sections: an encoder that encodes the input predictors and learns latent features, a decoder that reconstructs the representation from the latent features, and a loss function that calculates the reconstruction error [59,60]. Figure 4 gives an illustration of an autoencoder. In an autoencoder, both the encoder and the decoder are fully connected feedforward neural networks, and the sizes of the input features and the reconstructed features must be the same. The bottleneck layer, however, is the heart of an autoencoder, and changing it allows one to manipulate the architecture and enhance performance [59,61]. Generally, the number of nodes per layer, especially in the bottleneck layer, is the most important hyperparameter for autoencoders: the smaller the bottleneck relative to the number of input predictors, the more meaningful the learned representation. Backpropagation is the technique typically used for training autoencoders. Equations (12) and (13) give the basic structure of an autoencoder, an encoder and a decoder defined by two weight matrices and two bias vectors:

y = f(x) = s_1(Wx + b),  (12)

z = g(y) = s_2(W'y + b'),  (13)

where f and g are the encoding and decoding functions, respectively, determined by the weights W, W' and biases b, b', and s_1 and s_2 denote the activation functions, which are usually nonlinear. The activation function receives several inputs from the preceding layer, computes their weighted sum, and produces the output based on its function.
The sigmoid function is the most common activation function for autoencoders [59]. The objective of an autoencoder is to optimize the weights and biases so as to minimize the reconstruction error. The two most common loss functions used for autoencoders are cross-entropy [49] and mean squared error [62]. Backpropagation is used to update the weights and biases to minimize the reconstruction error.
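A minimal numpy sketch of a single-bottleneck autoencoder trained by backpropagation on mean squared error; this is our own toy illustration (layer sizes, learning rate, iteration count, and random data are arbitrary), not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data standing in for nine normalized centrality features per street.
X = rng.random((200, 9))

n_in, n_hidden = 9, 5                                   # bottleneck < input size
W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)

def forward(X):
    y = sigmoid(X @ W1 + b1)                            # encoder, Equation (12)
    z = sigmoid(y @ W2 + b2)                            # decoder, Equation (13)
    return y, z

_, z0 = forward(X)
mse_before = float(((z0 - X) ** 2).mean())

lr = 0.5
for _ in range(2000):                                   # full-batch gradient descent
    y, z = forward(X)
    err = z - X                                         # reconstruction error
    dz = err * z * (1.0 - z)                            # backprop through sigmoid
    dy = (dz @ W2.T) * y * (1.0 - y)
    W2 -= lr * y.T @ dz / len(X); b2 -= lr * dz.mean(axis=0)
    W1 -= lr * X.T @ dy / len(X); b1 -= lr * dy.mean(axis=0)

_, z1 = forward(X)
mse_after = float(((z1 - X) ** 2).mean())
```

Training drives the reconstruction MSE well below its initial value even though the 5-unit bottleneck cannot copy the 9-dimensional input exactly.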
As discussed before, one of the ways to enhance the performance of an autoencoder is by adding more layers, creating stacked autoencoders (SAE). The SAE is an unsupervised deep learning model with several layers to better model the complexity of input data. A method to force an SAE to learn useful features is adding random noise to its inputs and making it recover the original noise-free data.
This way, the autoencoder cannot simply copy the input to its output because the input also contains random noise. This is called a Stacked Denoising Autoencoder (SDAE) [40,59].
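The input-corruption step can be implemented in a few lines; this is a sketch with masking noise (randomly zeroed entries), and `corrupt` is a hypothetical helper name, with the 30% fraction chosen arbitrarily for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(X, noise_fraction=0.1):
    """Masking noise: randomly zero a fraction of the input entries.
    A denoising autoencoder is then trained to reconstruct the clean X."""
    mask = rng.random(X.shape) >= noise_fraction
    return X * mask

X = rng.random((100, 9)) + 0.01       # strictly positive toy features
X_noisy = corrupt(X, noise_fraction=0.3)
```

Because the clean inputs are strictly positive, every zero in `X_noisy` marks a corrupted entry, and roughly 30% of the entries end up masked.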
An SDAE enjoys all the benefits of any deep network, such as greater expressive power. This greedy layer-wise training technique was first proposed by Hinton [45]. A common algorithm used for optimizing the weights and biases of autoencoders is stochastic gradient descent (SGD). At each step, the gradient of the objective function with respect to the parameters gives the direction of steepest slope and allows the algorithm to adjust the parameters toward a minimum of the function; backpropagation is applied to compute the necessary gradients [40,59]. To use the SDAE network for street functionality classification, we add a standard classifier as the top layer. In this paper, we put a logistic regression layer on top of the network for supervised street functionality classification. The SDAE plus the classifier comprise the whole deep architecture illustrated in Figure 4.

Results
This section describes how we compared the results of our proposed model with four other machine learning models. We describe the datasets we used, the metrics employed, the models we compared with, how we tuned those models, and finally the results of street functionality classification.

Data Description
To extract the centrality measures, the main axes of all roads are modeled using the street segment method, meaning that the road between two consecutive junctions is considered a separate segment in the network. The modeled street segments were used to construct strokes with the every-best-fit method, and street directions were also considered in building the strokes. In this study, datasets for four different cities are utilized. The streets of each city are grouped into four classes based on functionality: Principal Arterial road (PAr), Minor Arterial road (MAr), Collector road (Cr), and Local road (Lr). For each street, nine centrality features are determined; the statistical information for all nine centrality measures is summarized in Table 1. After preprocessing the data, the structural feature vector is computed for each stroke, and the vectors are normalized to the range [-1, 1]. As a single stroke consists of several street segments with different real-world functionality, the structural measures of each stroke are assigned to all of its constituent street segments. Across all machine learning model implementations, 75% of the measures are used for training and 25% for testing.
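The preprocessing described above (scaling features to [-1, 1] and a 75/25 train/test split) might look as follows with scikit-learn; the random placeholder data stands in for the real centrality table.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((400, 9))             # nine centrality features per stroke/segment
y = rng.integers(0, 4, size=400)     # four functional classes: PAr, MAr, Cr, Lr

X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, y, test_size=0.25, random_state=0)
```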

Algorithm Set-Up
We chose four common machine learning models, namely Logistic Regression (LR), Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), and Random Forest (RF), to compare against the proposed SDAE model for SFC. Machine learning and deep learning classifiers usually have parameters that must be set by the user, known as hyperparameters. Hyperparameter tuning involves choosing values for these hyperparameters that yield the optimal performance of a model. The hyperparameters used for each of the traditional machine learning models and for our SDAE model are given in Table 2. The hyperparameter values are selected with the grid-search technique [63] combined with cross-validation [64]: each hyperparameter is given a list of discrete values, and for each combination of hyperparameters, 5-fold cross-validation is employed. The data is broken into five equal parts; each part in turn is used as test data while the remaining four parts are used for training. The set of hyperparameters achieving the highest classification accuracy is chosen.
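The grid search with 5-fold cross-validation can be sketched with scikit-learn's GridSearchCV; the candidate grid below echoes the SVM ranges used in this work, but the dataset is a synthetic stand-in, so the selected values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the centrality feature table (4 functional classes).
X, y = make_classification(n_samples=200, n_features=9, n_informative=6,
                           n_classes=4, random_state=0)

grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10], "gamma": [0.1, 1, 10]},
                    cv=5, scoring="accuracy")
grid.fit(X, y)
```

`grid.best_params_` then holds the C/gamma combination with the highest mean cross-validated accuracy.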
Overfitting is a common problem in all machine learning and deep learning models: the model performs very well on training data but not on unseen test data. Regularization is a technique to generalize a model and, in turn, improve its performance on unseen test data, ameliorating the overfitting problem. Regularization has the same effect in machine learning and deep learning but works differently: in machine learning it penalizes the coefficients, whereas in deep learning it penalizes the weight matrices of the nodes [45]. To design the architecture of our SDAE model, different numbers of hidden layers and different numbers of neurons per layer were tested. The SDAE is trained with stochastic gradient descent on 75% of a dataset, and the experiment is repeated 20 times to assess the variability of the process. The trained model is finally evaluated on the test data, the remaining unseen 25% of the dataset. In this experiment, we varied the number of neurons per hidden layer from 1 to 20 and the number of hidden layers from two to three. Based on the 95% confidence interval and standard error estimated over the 20 iterations, we settled on 10 neurons for the first layer and five neurons for the second (bottleneck) layer. Generally, there are three ways to control overfitting in an SAE model: adding a penalty (regularization) term to the loss function to control the weights, adding noise to the input data to force the SAE to learn more informative features, and sparse decoding by randomly deactivating some of the nodes in each layer. To avoid overfitting, noise levels from 0% to 30% were added to the input data (first layer only) to realize the stacked denoising autoencoder model; based on the results, 10% was the optimal value.
Moreover, the regularization term was set to 0.001, chosen by grid search over small candidate values (down to 0.0001), to keep the weights of the deep learning model from growing and overfitting. LR is a common benchmark machine learning model and the simplest one used in this work. The regularization term of the LR model was tested over the range 0.1-10 with a grid-search strategy, and 1 was chosen as the optimal value. To optimize the LR model, two techniques were tested: SGD and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) [65]; LBFGS proved the better optimization technique for this work. To configure the MLP model, the number of hidden layers, the number of neurons per layer, the type of activation function, the optimization technique, and the learning rate were tested in various combinations; with the LBFGS optimization technique, three hidden layers of 100 neurons each, activated by the ReLU activation function, were the best hyperparameters for the MLP model. For the SVM, an RBF kernel was tested as one of the best kernel methods in kernel-based machine learning. The values of C, a regularization term, and Gamma, the RBF kernel hyperparameter, were found by grid search: 1 and 10, respectively. Finally, RF, one of the best ensemble models, was applied. In the RF training process, there are two ways to control overfitting: limiting the number of nodes per leaf or limiting the depth of the trees. Based on a grid search, the optimal values of these two hyperparameters for RF were a maximum depth of 5 and a minimum of 4 nodes per leaf.

Experimental Results
Our proposed SDAE model, along with four machine learning models, was used to classify the functionality of street networks in four different cities. In this section, the results of all models are shown and discussed. To consider the influence of the amount of input data on the different machine learning algorithms and the convergence of the training process, learning curves for the different models are provided. Each input predictor, or input feature, plays a different role and possesses a different level of importance for the final performance of a model; the importance of each input predictor is therefore considered and discussed. Moreover, the impact of street network regularity on the classification results for each city is discussed.
Accuracy assessment is an essential step in evaluating the performance and efficiency of different classifiers. In this section, the results of the different implemented models on real data are calculated. To evaluate the performance of the different classifiers, we examine the confusion matrix, R², and the Root Mean Squared Error (RMSE). A confusion matrix is a table often used to describe the performance of classification techniques (for more information about the confusion matrix, refer to the work in [66]). Overall Accuracy (OA) based on this matrix is the proportion of the total number of correct predictions. The F-measure is based on precision (P) and recall (R), where P is the proportion of predicted positive cases that were correct and R is the true positive rate, and is calculated as F1 = 2PR/(P + R). RMSE [67] is regularly employed in model evaluation studies; it provides a complete interpretation of the error distribution in the range of [0, 1], where values close to zero are better. Moreover, to evaluate the best prediction, R² is used [68]; this metric also ranges over [0, 1], where values close to 1 are better. For the rest of the paper, OA for the training data is denoted "OA-Tr" and OA for the testing dataset "OA-Te". Moreover, the F1 scores for the PAr, MAr, Cr, and Lr classes are denoted "F1-PAr", "F1-MAr", "F1-Cr", and "F1-Lr", respectively.
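The evaluation metrics above can be computed with scikit-learn. This is an illustrative sketch on a tiny invented label vector (class indices 0-3 standing in for PAr, MAr, Cr, and Lr), not the paper's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, mean_squared_error, r2_score

# Hypothetical labels: 0=PAr, 1=MAr, 2=Cr, 3=Lr.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])

cm = confusion_matrix(y_true, y_pred)                  # rows: true, cols: predicted
oa = np.trace(cm) / cm.sum()                           # Overall Accuracy
f1_per_class = f1_score(y_true, y_pred, average=None)  # F1-PAr ... F1-Lr
rmse = np.sqrt(mean_squared_error(y_true, y_pred))     # Root Mean Squared Error
r2 = r2_score(y_true, y_pred)
print(oa, f1_per_class.round(3), rmse, round(r2, 3))
```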
The classification results of the implemented machine learning and deep learning models are presented in Tables 3-6 for the four cities. The results for all datasets reveal that machine learning and deep learning can predict and classify street functionality based only on the structural properties of streets. As shown in Tables 3-6, SDAE achieved more than 90% prediction accuracy for the training and test sample streets of the cities, higher than all the other models in this experiment.
The overall accuracy for the test datasets (OA-Te) is 92%, 89%, 92%, and 92% for Isfahan, Enschede, Tehran, and Paris, respectively. These results suggest that the spatial structure of streets plays an important role in forming the functionality of urban roads, because the functionality of streets is detectable using structural and spatial properties alone. Based on the F1-scores reported for each street functionality class, it can be stated that every functionality class possesses a specific, distinguishable structural pattern. The graph in Figure 5 shows the prediction results of the SDAE model on the test samples of each functionality class. As the graph shows, the majority of errors occurred between the most similar functionality classes. Thus, misclassification mostly happens between classes that are similar, rather than between those that are structurally completely different. For example, Cr is misclassified as Lr because these classes are structurally and spatially similar, and the same holds for the two classes of primary roads. Figure 5 also reveals that PAr is detected with higher accuracy than MAr, whose spatial properties are very close to those of PAr; the same analysis explains why Cr received lower accuracy than Lr.

Discussion
In addition to SFC, this section discusses the impact of city regularity on classification, the importance of each centrality measure, and the role of deep learning in processing such big data.

Regularity of Cities and Its Influence on Classification Results
Evaluating the effect of structural patterns on forming the functionality of roads in the real world is very important. It appears that the spatial configuration and structure significantly affect how these patterns are distributed. Different levels of regularity in network structures have different impacts on shaping the functionality of urban roads. To investigate this idea, the level of structural regularity of all cities was calculated using the mixing-ratio Equations (10) and (11), and the classification results were compared with the level of regularity of each network. Figure 6 plots the structural regularity (mixing rate) of the cities on the X-axis and their total classification accuracy on the Y-axis. Based on the results in Figure 6, it can be concluded that the hierarchical nature of roads in urban areas can also be revealed from their structural features. The results indicate that in networks with a regular configuration (where the same pattern is repeated across the network), there is a strong relationship between a road's functionality and its structural properties. In other words, in regular networks like Tehran and Isfahan, which showed a higher mixing rate and overall accuracy, the spatial structure strongly influences the functionality class of real-world streets. On the other hand, in the cities of this study where the arrangement of streets is less regular, like Paris or Enschede with mixing rates below 72% and lower overall accuracy, the spatial structure has a weaker influence on shaping street functionalities, leading to weak structural patterns in each functionality class.
In conclusion, for the case studies in the present research, the level of structural regularity in the spatial configuration of urban networks is an effective factor in forming the functionality of urban roads, although more investigation is required in other cities. Figure 7 presents the contribution of each feature to the classification for all four test datasets; the horizontal axis indicates the feature and the vertical axis the percentage of importance. The most important feature for each city is the tallest column in the graph. The importance of each feature is calculated using the random forest approach: the decrease in node impurity weighted by the probability of reaching that node, where the node probability is the number of samples reaching the node divided by the total number of samples. The higher the value, the more important the feature [69]. Concerning the results in Figure 7, betweenness is the most important feature for Paris and Enschede, and one of the most important for Tehran and Isfahan. Generally, this means betweenness is the most important feature in predicting street functionality and even movement patterns. This result reveals that the shortest path plays a pivotal role in how people reach their destinations: more crowded streets contribute more to the shortest paths in the network, which is why betweenness ranks among the most important features in all cities. Consequently, calculating the betweenness of streets yields better predictions of traffic volume and can support better designs for new street construction.
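The impurity-based importance described above corresponds to scikit-learn's `feature_importances_` attribute. The sketch below is illustrative only: the data are synthetic, the RF hyperparameters mirror those reported earlier (maximum depth 5, minimum 4 samples per leaf), and the feature names are hypothetical stand-ins for the nine centrality measures (the paper names betweenness, closeness, and weighted in/out-degree; the rest are placeholders).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical names standing in for the nine centrality features.
names = ["betweenness", "closeness", "in_degree", "out_degree", "w_in_degree",
         "w_out_degree", "feature_7", "feature_8", "feature_9"]

X, y = make_classification(n_samples=500, n_features=9, n_informative=6,
                           n_classes=4, random_state=0)

rf = RandomForestClassifier(max_depth=5, min_samples_leaf=4, random_state=0)
rf.fit(X, y)

# feature_importances_ is the impurity decrease weighted by node probability,
# normalized to sum to 1 across features.
for name, imp in sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:14s} {imp:.3f}")
```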

The Importance of Each Centrality Feature
After betweenness, the weighted in- and out-degrees are the most important features, mostly for Isfahan and Tehran. Street degree (in- and out-degree) reflects having more connections and thus more routes for people to reach their destinations. For Isfahan and Tehran, with their higher level of regularity, betweenness is not the single most important feature, because the high organization and regularity of the street network means all streets contribute comparably to carrying traffic. Among the weighted degrees, streets with higher weights have more connections to Principal Arterial roads, which in regular cities like Isfahan and Tehran contributed more to traffic prediction. Closeness has the lowest importance for all the cities in our study, because people mostly drive along the shortest path to their destination rather than along the roads with the highest accessibility and most connections (the definition of closeness).
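For readers unfamiliar with how betweenness is obtained, the sketch below implements Brandes' algorithm for unweighted betweenness centrality on a toy adjacency list. This is a minimal illustration only: real street networks are directed and weighted (as the weighted degrees above suggest), and the graph here is an invented four-node example, not a city network.

```python
from collections import deque

def betweenness(adj):
    """Unweighted betweenness centrality (Brandes' algorithm).

    adj: dict mapping each node to a list of its neighbors.
    For an undirected graph, each pair of endpoints is counted twice.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors.
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Toy undirected graph: a star with center b and leaves a, c, d.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
bc = betweenness(adj)
print(bc)  # b lies on every shortest path between the leaves
```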

Deep Learning for Big Data Processing
In this fast-growing scientific world, with its ever-increasing ability to collect big data, deep learning plays an important role in big data solutions. Beyond its large volume, the large variety and complexity of big data pose a serious challenge for shallow neural networks and traditional machine learning. In this work, we applied a large dataset for four cities with high complexity and nonlinearity. The RMSE of linear regression is 0.69, 0.72, 0.65, and 0.72 for Isfahan, Enschede, Tehran, and Paris, respectively; these values point to the high nonlinearity of our datasets.
Unlike traditional machine learning and shallow neural networks, deep learning models take advantage of massive compositions of nonlinear functions that can learn the representation and complexity of features. Most importantly, greedy layer-wise unsupervised models such as SDAE can learn latent features (the most important features of the input predictors to feed into the classifier) at a higher level of representation. Tables 3-6 show SDAE outperforming LR, the shallow neural network (MLP), SVM, and random forest. In addition, LR is used as a standalone classifier to show that the features generated by SDAE at a higher level of representation are more separable than the raw input data; based on the results in Tables 3-6, SDAE+LR outperformed LR alone as a supervised classifier. This is because the SDAE learns features in a nonlinear manner through its greedy layer-wise technique and generates new features that are combinations of the existing ones, which LR then classifies using the corresponding labeled data.
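The SDAE+LR pipeline can be sketched as follows. This is a simplified numpy illustration of the idea, not the authors' implementation: each denoising autoencoder layer corrupts its input with Gaussian noise, is trained to reconstruct the clean input (here with tied weights and plain gradient descent), and its hidden code becomes the input to the next layer; LR then classifies the top-level codes. The layer sizes, noise level, and synthetic data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_dae_layer(X, n_hidden, noise=0.3, lr=0.5, epochs=200):
    """Greedily train one denoising autoencoder layer (tied weights)."""
    n = X.shape[1]
    W = rng.normal(0.0, 0.1, (n, n_hidden))
    bh, bo = np.zeros(n_hidden), np.zeros(n)
    for _ in range(epochs):
        Xn = X + rng.normal(0.0, noise, X.shape)  # corrupt the input
        H = sigmoid(Xn @ W + bh)                  # encode
        R = sigmoid(H @ W.T + bo)                 # decode with tied weights
        dR = (R - X) * R * (1 - R)                # reconstruct the *clean* input
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (Xn.T @ dH + dR.T @ H) / len(X)
        bh -= lr * dH.mean(axis=0)
        bo -= lr * dR.mean(axis=0)
    return W, bh

# Synthetic stand-in: 9 centrality features, 4 functionality classes.
X, y = make_classification(n_samples=600, n_features=9, n_informative=6,
                           n_classes=4, random_state=0)

# Stack two DAE layers greedily, then classify the top-level codes with LR.
W1, b1 = train_dae_layer(X, 16)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_dae_layer(H1, 8)
H2 = sigmoid(H1 @ W2 + b2)
clf = LogisticRegression(max_iter=1000).fit(H2, y)
print("train accuracy:", clf.score(H2, y))
```

A full SDAE would also fine-tune the stacked encoder end-to-end with the labels, which this sketch omits.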
When using machine learning and deep learning models, we want to keep errors as low as possible. There are two major sources of error: bias and variance. Variance is the amount by which the estimate changes as the training dataset changes, while bias is the error due to overly simple (e.g., linear) assumptions about the dataset. To assess the variance and bias of the predictions of the different models, the datasets were split into separate training and validation sets; in this research, 5-fold cross-validation was used. The learning curve is the best technique to check these two sources of error; learning curves are depicted for the deep learning models in Figure 8 and for all machine learning models on Isfahan in Figure 9. The horizontal axis is the training set size and the vertical axis is accuracy (error could be used instead). The learning curves in Figure 8 show that the SDAE model provided the best results and the best convergence for all tested cities. Good convergence means the algorithm can model the nonlinearity and complexity of the input features and does not need more training samples to capture their behavior.
Figure 9 depicts the learning curves of all machine learning models for Isfahan. The MLP-3-100-RELU-LBFGS model was able to learn, but its low accuracy indicates a bias problem. SVM converged well with low variance, but its bias left it too weak to capture the complexity and nonlinearity of the features. RF provided higher accuracy, but there remains a large gap between the training and validation curves in the middle and even at the end (the green and red lines), which means the model still needs more training data to converge to a good error rate. The learning curve for the logistic regression model makes clear that it could not capture the complexity and nonlinearity of the training features: its prediction accuracy is low and its training error high (a bias problem). Moreover, this amount of data appears to be more than these models can exploit, as there is almost no gap between the two learning curves even at the beginning; the models could not capture the complexity of the input features, and the results are accordingly poor.
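Learning curves like those in Figures 8 and 9 can be produced with scikit-learn's `learning_curve` utility. This sketch is illustrative (synthetic data, logistic regression as the example estimator) and uses the same 5-fold cross-validation mentioned above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for one city's street features and labels.
X, y = make_classification(n_samples=600, n_features=9, n_informative=6,
                           n_classes=4, random_state=0)

# Accuracy at five increasing training-set sizes, 5-fold cross-validated.
sizes, train_sc, val_sc = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

for n, tr, va in zip(sizes, train_sc.mean(axis=1), val_sc.mean(axis=1)):
    print(f"n={int(n):3d}  train={tr:.2f}  val={va:.2f}")
# Low scores on both curves signal high bias (underfitting); a large,
# persistent gap between them signals high variance (need more data).
```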

Conclusions
Many studies have investigated the role of spatial structures in individual movement patterns. The current study provides two new perspectives on the spatial structural study of urban networks. Nine different structural measures were used to examine the effect of spatial structure on forming the functionality of urban roads. Different machine learning techniques (logistic regression, MLP, SVM, and random forest), alongside a deep learning model, the Stacked Denoising Autoencoder, were applied to reveal the patterns existing in each functionality class of urban roads. To this end, a training set of street segments, each defined by its feature vector and functionality class, was fed to the different models.
The results show that, with the acceptable accuracy provided by SDAE, it is possible to predict the functionality of streets based on their spatial structural properties alone. This means that each real-world functionality class exhibits a specific spatial structural pattern, and the structural properties of streets within a functionality class are quite similar. It can be concluded that the spatial structure of urban networks is an effective factor in forming the role and importance of each street in the real world. In other words, the structural importance of some urban roads has caused them to be used more frequently than others, which in turn leads to physical changes in those roads to adapt to high traffic demands. Consequently, most of these roads become prominent, high-capacity roads, commonly known as arterial roads. Other roads, because of their spatial positions in the city network, are used by fewer travelers; this leads them to become access roads, known as minor roads.
The classification results also presented a hierarchy strikingly similar to the conceptual road hierarchy of the real world. In other words, although the classification was performed using structural properties alone, its results were arranged exactly as urban roads are. In all functionality classes predicted by the machine learning and deep learning models, the majority of errors occurred between the most similar classes. This is remarkable given that the training dataset contained no information about the ordering of the functionality classes; the resulting hierarchy emerged purely from structural properties. Furthermore, the results showed that in regular networks, in which a spatial pattern is repeated in different parts of the city, the deep learning model predicted the real-world functionality class more accurately. This implies that regular networks exhibit a more prominent spatial structural pattern in each functionality class than less regular networks; that is, in regular networks the spatial structure and configuration have a greater impact on forming street roles in the real world. Conversely, in less regular networks the significance of spatial structure in forming street functionality is reduced, producing weak structural patterns in each functionality class. In conclusion, the level of structural regularity in urban networks is a key factor in forming the functionality and importance of streets in the real world. Furthermore, SDAE demonstrated that for processing big data with nonlinearity and complexity, deep learning models outperform traditional machine learning and ensemble models.