A Framework to Analyze Function Domains of Autonomous Transportation Systems Based on Text Analysis

Abstract: With the development of information and communication technologies, the current intelligent transportation systems (ITSs) will gradually become automated and connected, and can be treated as autonomous transportation systems (ATSs). Function, which unites cutting-edge technology with ATS services as a fundamental component of ATS operation, should be categorized into function domains to more clearly show how ATS operates. Existing ITS function domains are classified mostly based on the experience of experts or the needs of practitioners, using vague classification criteria. To ensure tractability, we aim to categorize ATS functions into function domains based on text analysis, minimizing the reliance on subjective experience. First, we introduce the Latent Dirichlet Allocation (LDA) topic model to extract text features of functions into distribution weights, reflecting the semantics of the text data. Second, based on the LDA model, we categorize ATS functions into twelve function domains by the k-means method. The comparison between the proposed function domains and the existing counterparts of other ITS frameworks demonstrates the effectiveness of the LDA-based classification method. This study provides a reference for text processing and function classification of ATS architecture. The proposed functions and function domains reveal the objectives of future transportation systems, which could guide urban planners or engineers in designing better control strategies when facing new technologies.


Background
The current intelligent transportation systems (ITSs) will evolve towards connectivity and automation, paving the way for autonomous transportation systems (ATSs) to handle increasingly complex information in the future [1,2]. Through the logic of autonomous perception, learning, decision, and response, ATS will operate autonomously to satisfy the needs of users in the system, including travelers and managers. The development of technology has not only provided new services for the transportation system but also brought about a massive amount of data [3], which must be carefully managed. As a society evolves, the mobility demands of people and goods, as well as the mobility supply given by carriers and infrastructures, become more numerous and diverse. Such rapid expansion can greatly complicate efforts to supply the demand for transportation. Incremental pressures are placed on modern transportation systems to provide more intelligent and autonomous services to alleviate traffic congestion, optimize resource allocation, minimize user costs, and enhance service quality. ATS can accommodate increasingly autonomous and intelligent services due to the seamless connectivity of the next-generation network and its broad integration with developing technologies, such as big data, artificial intelligence, and federated learning. Consequently, operation costs can be decreased and service quality improved.

Paper Contribution
This study proposes an analytical framework to construct the function architecture of ATS with reference to existing ITS architectures. Considering that functions are characterized by text data, we introduce a topic model to extract features of the functions. The features are clustered by the k-means and k-medoids algorithms, and the results are evaluated by the silhouette coefficient. According to the topic of each cluster, we define function domains, which can further clarify the relationships among ATS elements. In all, the contributions of this study are as follows: (1) We propose a framework to obtain ATS functions by analyzing the requirements of ATS.
(2) We classify ATS functions into twelve function domains based on the LDA model. (3) The comparison between ATS function domains and those from traditional ITS architectures demonstrates the applicability of the proposed function domains.

Organization
The remaining sections of this paper are structured as follows. Section 2 provides a literature review of ITS architectures and text analysis, and indicates the research gap. Section 3 provides an illustration of ATS with a specific example. Section 4 presents data and methodology using a flow chart, and explains the methods in detail. Section 5 describes the results, including the silhouette coefficient of each cluster and topic of each domain. Section 6 discusses the results compared with ITS architectures in China, America, and Europe, which demonstrates the rationality of ATS function domains. Section 7 concludes the article and proposes future work.

Traditional Classifications of Functions
Traditionally, different countries and regions classify ITS functions based on different objectives and criteria, formulating distinct ITS frameworks [4]. The International Organization for Standardization (ISO) has established standards referring to different representative frameworks [5], such as the Architecture Reference for Cooperative and Intelligent Transportation (ARC-IT), the Connected Vehicle Reference Implementation Architecture, and the European ITS Framework Architecture. These ITS architectures have provided practical guidance in both the research field and engineering applications. However, differences remain across countries and regions; a representative example is ARC-IT. The United States proposed ARC-IT, taking advice from transportation practitioners, systems engineers, system developers, technical experts, and consultants. ARC-IT consists of four views: user view, function view, physical view, and communications view [6]. The user view describes the organizations in the system and the relationships among them. The function view demonstrates function elements (also known as processes) and their logical interaction modes in the system. The physical view describes physical objects such as systems and equipment and their relations with functions. The communications view illustrates the set of protocols needed to communicate between physical objects.
The ITS framework of the European Union, known as the FRAME architecture, has been proposed since the 1990s, and it has been updated to version 4.1 [7]. The architecture contains a number of views, including functional view, physical view, communications view, organizational view, and information view. Regarding the functional view, the architecture arranges functions according to their complexity in a hierarchy. Besides this, the description of function domains consists of the functionality of domains and the links among domains. Specific function domains are shown in the discussion section.
The ITS framework in China is mainly composed of user services, logical framework, physical framework, and standards [8]. The designers define eight service domains based on ISO and then propose users' demand for ITS. The function set is divided into three levels: system function, process, and sub-process, which are used to illustrate the connection between the physical framework and the logical framework. The Chinese ITS framework serves as a foundation for the development of local transportation systems [9][10][11][12]. With the development of autonomous driving, Chinese scholars have proposed a corresponding framework, which integrates a cooperative vehicle infrastructure system (CVIS) into the ITS framework [13].
Based on ITS, many subsystems have been developed considering emerging technologies, such as vehicle tracking in ITS [14], accident management systems [15], and communication systems [16]. However, few studies focus on the whole transportation system, neglecting its adaptability to future technologies and requirements.
In addition, although the elements in different ITS frameworks are similar and function is the core among them, the traditional classifications of functions are still based on the experience of designers, which lacks specific criteria.

Text Analysis
Clustering is a traditional method to divide objects into different categories without preset labels. Multiple clustering algorithms exist, including hierarchical clustering [17], division-based k-value clustering [18], density-based spatial clustering, grid-based statistical information clustering, and model-based latent class analysis [19]. The selection of a clustering algorithm is commonly determined by the data type.
However, functions are characterized by text data, which cannot be clustered directly by traditional algorithms, as text features must first be extracted to calculate similarity. Latent Dirichlet Allocation is a three-level hierarchical Bayesian model [20,21], which has been widely applied to various fields of text analysis [22][23][24]. Guo et al. used LDA to make a cluster analysis of 266,544 user comments on hotels and to dig out the factors that have an important influence on hotel operation [25]. Tirunillai et al. analyzed reviews of 15 companies' products, mining potential information to develop business strategies [26]. After feature extraction by LDA, a weight distribution is obtained to denote the original function text. Text similarity can then be measured by different metrics, including Euclidean distance, Manhattan distance, Hamming distance, etc.
Inspired by the above studies, we introduce LDA to extract features of function text, which can measure the similarity among functions. In addition, we calculate the similarity of weight distribution obtained from LDA and utilize the k-value clustering algorithm to obtain function domains.

Autonomous Transportation Systems
Referring to the existing transportation system, ATS is composed of five elements, i.e., component, demand, service, function, and technology. Component consists of four parts: user, facility, vehicle, and environment. Produced by the users in the component, demand includes the activities in ATS. Service meets users' needs based on demand. Function is supported by technology and related equipment to implement services. These five elements work together to ensure that ATS can operate autonomously in the logic of autonomy and adapt to changes in technology and demand. To meet the requirements of the whole transportation system, elements will change significantly in both expression forms and operation mechanisms in different generations of technology.
To further illustrate ATS, we take a trip as an example, as shown in Figure 1. The components involved in a trip include users, vehicles, the highway, and the environment. The basic demand of travelers, the main users, is to get somewhere. The goals of this demand are diverse, including safety, convenience, efficiency, environmental protection, and economy. A series of services are provided to satisfy these needs, such as travel planning, automatic driving, facilities maintenance, and traffic management. To provide these services, ATS integrates technologies to support functions, including sensor technology, data calculation, and communication. Functions operate in the logic of perception, learning, decision, and response, which are the characteristics of ATS. For instance, to provide a travel-planning service, it is necessary to collect users' travel intention and traffic network data, which are analyzed to generate plans for users. Function is directly driven by technology, which is the key element in the framework. Functions will update as technology advances, and other elements will change accordingly. Therefore, it is necessary to explore the mechanisms of interaction among functions based on the above architecture to make sure that ATS operates and iterates autonomously.

Methodology
This study consists of four parts, namely, data collection, data processing, data clustering, and results analysis, as indicated in Figure 2. Aiming at supporting services, this paper considers functions with the logic of autonomy, combining related frameworks and literature, in the first part. In the second part, we employ the topic model Latent Dirichlet Allocation (LDA) to extract text features after tokenization and obtain topic distribution weights. In the clustering part, we input the distribution-weight matrix into the k-means and k-medoids algorithms, and the results are compared through a word cloud. To better analyze the clusters in the fourth part, we visualize the results in low-dimensional space. The final clustering results are output as function domains after comparison with their counterparts in ITS frameworks. The pseudo-code of the whole methodology is defined in Algorithm 1.

Algorithm 1: Function-domain clustering
• Begin
1. Divide function text into words with Jieba
2. Extract topic distribution weights with LDA to form the matrix G
3. Set the number of clusters K
4. Calculate the similarity among samples in G using Equation (7)
5. Cluster the samples with the k-means and k-medoids algorithms
6. Reduce the dimension of G with t-SNE
7. Repeat Step 4 and Step 5 using low-dimensional G
8. Compare the results from Step 5 and Step 7
9. Obtain the final clustering results C
• End
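As a rough illustration of the flow of Algorithm 1, the sketch below replaces each real component with a stand-in: a whitespace tokenizer instead of Jieba, keyword-overlap weights instead of the Baidu LDA model, and a dominant-topic assignment instead of k-means. Every name, topic, and sentence in it is illustrative, not from the paper's dataset.

```python
import math

def tokenize(text):
    # stand-in for Jieba word segmentation
    return text.lower().split()

def topic_weights(tokens, topics):
    # stand-in for LDA: normalized keyword-overlap weights per topic
    raw = [sum(t in kws for t in tokens) for kws in topics]
    s = sum(raw) or 1
    return [r / s for r in raw]

def euclidean(a, b):
    # similarity index between two weight rows
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

topics = [{"monitor", "sensor", "collect"}, {"plan", "schedule", "decide"}]
functions = ["monitor surroundings with sensor",
             "collect sensor data",
             "decide travel plan",
             "schedule route plan"]
G = [topic_weights(tokenize(f), topics) for f in functions]  # weight matrix

# grouping by dominant topic stands in for the clustering step
clusters = [max(range(len(topics)), key=lambda k: g[k]) for g in G]
print(clusters)  # → [0, 0, 1, 1]
```

The first two toy functions share the "sensing" topic and the last two share the "planning" topic, so the stand-in pipeline groups them accordingly.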

Figure 2. Flow chart of the study.


Data
According to the definition of ATS services, we build an analytical framework to obtain functions in the logic of autonomy, i.e., perception, learning, decision, and response. The function set can be denoted by { f 1 , f 2 , . . . , f n }. Denoting the service set as {s 1 , s 2 , . . . , s u }, any service s i requires the combination of a subset of functions { f i 1 , f i 2 , . . . , f i t }. When analyzing the functions, we use the framework considering related references and externalize each layer of logic, as shown in Figure 3. To better utilize the data, we define ATS functions with six attributes, including function provider P, process information I, service object O, realization approach A, logic L, and technology T, as shown in Table 1. Based on these attributes, one function f j can be described as (P j , I j , O j , A j , L j , T j ), where the symbols denote the six attributes, which all have corresponding text values. P j and O j come from one of the basic elements in ATS (i.e., component), while T j belongs to the technology used in ATS.

Regarding how to obtain functions using the analytical framework, we take an automatic parking service as an example, as shown in Figure 3. To support this service, perception is the first thing to consider, which requires acquiring surrounding information, i.e., monitoring real-time surroundings. Learning is a process of transferring and analyzing data to provide detailed information to decision units in vehicles, which needs the functions store, import, and analyze surrounding information. 'Decision' denotes making a decision or generating plans according to analysis results, which requires the function generate scheme in this service. 'Response' represents carrying out decisions through a series of activities including releasing orders and controlling devices. In the service automatic parking, a vehicle needs to control the parking device to perform the scheme generated from the decision units.

The mentioned functions are shown in Table 1 as semi-structured text data. Each function is determined by six attributes to form redefinitions using a unified paradigm. For instance, to support the automatic parking service, the vehicle provides the perception function, i.e., monitoring real-time surroundings. The surrounding environmental information is utilized by the subsequent functions through data collection, data analysis, equipment control, etc. Sensor technology is essential technology in this function. The perception module in automatic parking is accomplished through the above process.
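The six-attribute representation maps naturally onto a small record type. A minimal sketch for the perception function of the automatic-parking example follows; the `obj` and `approach` values are illustrative guesses, not taken from Table 1.

```python
from typing import NamedTuple

class ATSFunction(NamedTuple):
    """One ATS function record with the six attributes (P, I, O, A, L, T)."""
    provider: str      # P: function provider
    information: str   # I: processed information
    obj: str           # O: service object (illustrative value below)
    approach: str      # A: realization approach (illustrative value below)
    logic: str         # L: layer in the logic of autonomy
    technology: str    # T: supporting technology

monitor = ATSFunction(
    provider="vehicle",
    information="real-time surroundings",
    obj="traveler",
    approach="data collection",
    logic="perception",
    technology="sensor technology",
)
print(monitor.logic)  # → perception
```

A function set is then simply a list of such records, which later carries the extra cluster label C j after clustering.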

Feature Extraction
In order to extract features from the function data, we use a tokenization tool named Jieba to process the unstructured part of the text. Tokenization is a basic processing step in text mining [27], which divides sentences into words or phrases. The Natural Language Toolkit and Stanford CoreNLP are commonly used in English text mining [28]. However, Chinese text differs from English text in that there are no spaces between words in a sentence, and combinations of words may carry very different meanings. Tokenization of Chinese text is therefore more complicated and needs more training. Jieba is the most widely used Chinese tokenization tool, with high accuracy [29][30][31]. Therefore, we use Jieba in the following process.
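To see why dictionary knowledge matters for unspaced Chinese text, here is a toy forward-maximum-matching segmenter. Jieba's actual algorithm (a prefix dictionary plus an HMM for unknown words) is far more sophisticated; the vocabulary and sentence below are purely illustrative.

```python
def fmm_tokenize(sentence, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(sentence):
        # try the longest candidate first; fall back to a single character
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in vocab or j == i + 1:
                tokens.append(sentence[i:j])
                i = j
                break
    return tokens

vocab = {"自动", "驾驶", "自动驾驶", "交通", "系统"}
print(fmm_tokenize("自动驾驶交通系统", vocab))  # → ['自动驾驶', '交通', '系统']
```

Note that "自动驾驶" (autonomous driving) is kept as one token rather than split into "自动" and "驾驶", which is exactly the ambiguity a dictionary-based segmenter resolves.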
Latent Dirichlet Allocation is a three-level hierarchical Bayesian model. LDA models each item as a finite mixture over underlying topics [20,21] and has yielded insightful experimental findings in various fields [32][33][34][35]. In LDA, a word, the basic unit of discrete data, is defined to be an item from a vocabulary denoted by {1, . . . , V}. An N-word document is characterized by the notation w = (w 1 , w 2 , . . . , w N ). The corpus is a set of M documents represented by D = {w 1 , w 2 , . . . , w M }. As shown in Figure 4, LDA models documents as a random mix of potential topics with unique word distributions. For each document w in the corpus, LDA assumes the following generative process:

1. Select N ∼ Poisson(ξ);
2. Select θ ∼ Dir(α);
3. For each of the N words w n : (a) Select a topic z n ∼ Multinomial(θ); (b) Select a word w n from p(w n | z n , β), a multinomial probability conditioned on the topic z n .

A k-dimensional Dirichlet random variable θ can take values in the (k − 1)-simplex (a k-vector θ lies in the (k − 1)-simplex if θ i ≥ 0 and ∑ i θ i = 1), with the following probability density on the simplex:

p(θ | α) = ( Γ(∑_{i=1}^{k} α i ) / ∏_{i=1}^{k} Γ(α i ) ) θ 1^{α 1 − 1} · · · θ k^{α k − 1}, (1)

where α is a k-vector with components α i > 0, and Γ(x) is the gamma function. Given the parameters α and β, the joint distribution of a topic mixture θ, N topics z, and N words w is:

p(θ, z, w | α, β) = p(θ | α) ∏_{n=1}^{N} p(z n | θ) p(w n | z n , β), (2)

where p(z n | θ) is simply θ i for the unique i such that z n^i = 1. Subsequently, integrating over θ and summing over z, the marginal distribution of a document is:

p(w | α, β) = ∫ p(θ | α) ( ∏_{n=1}^{N} ∑_{z n} p(z n | θ) p(w n | z n , β) ) dθ. (3)

The above equation denotes the probability distribution of a document given α and β. With the marginal probabilities of single documents, the probability of a corpus is calculated by:

p(D | α, β) = ∏_{d=1}^{M} p(w d | α, β). (4)

This paper uses the LDA database built by Baidu for feature extraction. When enough experimental data are available, it is preferable to train LDA on part of the data and build a dedicated topic library for higher accuracy. However, the amount of function text data is less than the requirement of a self-built topic database. Therefore, we select a developed database in a related field, namely, the Baidu LDA topic database. Baidu LDA news is trained on massive news data with a vocabulary of 294,657 words and 2000 topics, and it is extensively utilized in both research and industry. Therefore, we select Baidu LDA for function-feature extraction.
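The Dirichlet density above can be checked numerically. A minimal standard-library sketch follows; the test point and parameters are arbitrary, and with α = (1, 1, 1) the distribution is uniform on the 2-simplex, so the density equals Γ(3) = 2 everywhere.

```python
import math

def dirichlet_pdf(theta, alpha):
    """Density p(theta | alpha) on the (k-1)-simplex, per the formula above."""
    assert abs(sum(theta) - 1.0) < 1e-9, "theta must lie on the simplex"
    norm = math.gamma(sum(alpha)) / math.prod(math.gamma(a) for a in alpha)
    return norm * math.prod(t ** (a - 1) for t, a in zip(theta, alpha))

# Uniform case: alpha = (1, 1, 1) gives density Gamma(3) = 2 at every point.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # → 2.0
```

Requires Python 3.8+ for `math.prod`; in a topic model, `theta` is exactly the per-document topic mixture that LDA infers.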

Clustering Method
After processing by LDA, the feature of one function f j can be expressed as the row vector (g j1 , g j2 , . . . , g jm ), and the weight matrix of the whole function set can be denoted by:

G = [ g 11 g 12 . . . g 1m ; g 21 g 22 . . . g 2m ; . . . ; g n1 g n2 . . . g nm ], (5)

This study inputs the weight matrix G into the clustering algorithm. Assuming that each dimension carries the same weight, we calculate the spatial distance as the similarity of functions. Various distance formulas can represent the similarity between points in a spatial coordinate system, including Euclidean distance and cosine distance. Due to the high-dimensional topic distribution, this paper employs Euclidean distance as the similarity index to simplify the calculation.
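The Euclidean similarity over rows of G can be sketched directly; the three-topic weight rows below are toy values, not real LDA output.

```python
import math

def euclidean(g_a, g_b):
    """Euclidean distance between two rows of the topic-weight matrix G."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_a, g_b)))

# Toy weight rows for three hypothetical functions over three topics.
G = [[0.8, 0.1, 0.1],
     [0.7, 0.2, 0.1],
     [0.1, 0.1, 0.8]]

# Functions 0 and 1 share a dominant topic, so they are much closer than 0 and 2.
print(euclidean(G[0], G[1]) < euclidean(G[0], G[2]))  # → True
```

This pairwise distance is the quantity the clustering algorithms below minimize within clusters.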
The k-means and k-medoids methods are utilized to cluster before and after dimensionality reduction [36][37][38]. The extracted weight matrix is high-dimensional, which has a great impact on clustering results [39]. Therefore, this paper adopts t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimension of the weight matrix [40]. The silhouette coefficient is used to evaluate the results [41]. The above methods are demonstrated as follows.

K-Means
K-means, a traditional unsupervised algorithm, partitions objects into k clusters, enabling each object to cluster around the nearest centroid [42]. The k-means method outputs clusters with centroids that minimize the sum of errors over all k clusters, i.e., the function below:

E = ∑_{i=1}^{k} ∑_{x ∈ C i} d(x, µ(C i )), (6)

where C 1 , C 2 , . . . , C k denote the clusters, µ(C i ) denotes the mean of cluster C i , and d(x, µ(C i )) represents the similarity between the observation x and µ(C i ).

There are different methods of defining the dissimilarity d(x, µ(C i )), and a Euclidean metric is typically used. The metric is defined as follows:

d(x, µ) = sqrt( ∑_{j} (x j − µ j )^2 ), (7)

where x j and µ j denote the j-th components of a point and the centroid of a cluster, respectively.
The steps of the k-means algorithm can be described as follows [43]:

• Step 1: Choose centroids C 1 , C 2 , . . . , C k from the dataset D randomly as the initialized centroids of the k clusters.
• Step 2: Allocate the remaining data to the clusters with the closest centroids by calculating the distance between the points and the centroids.
• Step 3: Take the mean value of each cluster as the new centroid.
• Step 4: Repeat Steps 2 and 3 until there is no change in the error function or the loop satisfies the pre-set number of iterations.
• Step 5: Output the final centroids and clusters as results.
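Steps 1-5 can be condensed into a toy implementation; this is a pure-Python sketch with a fixed seed and an obviously separable dataset, not the optimized routine a production run would use.

```python
import random

def kmeans(data, k, iters=100, seed=0):
    """Plain k-means following Steps 1-5 above (toy implementation)."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)                          # Step 1
    for _ in range(iters):                                   # Step 4: repeat
        clusters = [[] for _ in range(k)]
        for x in data:                                       # Step 2: assign
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(x, centroids[c])))
            clusters[i].append(x)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl
               else centroids[i]
               for i, cl in enumerate(clusters)]             # Step 3: re-center
        if new == centroids:                                 # converged
            break
        centroids = new
    return centroids, clusters                               # Step 5

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
cents, cls = kmeans(data, k=2)
print(sorted(len(c) for c in cls))  # → [3, 3]
```

On this toy data the two tight groups are recovered regardless of which points the random initialization picks.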

K-Medoids
Unlike k-means, the k-medoids algorithm uses actual data points, called medoids, as cluster centers instead of the mean. The error function calculates the dissimilarities between each data point and its corresponding medoid as follows:

E = ∑_{i=1}^{k} ∑_{x ∈ C i} d(x, m(C i )), (8)

where C 1 , C 2 , . . . , C k denote the clusters and m(C i ) represents the medoid of cluster C i . Several algorithms are derived from k-medoids, and partitioning around medoids (PAM) is one representative among them. The basic steps can be described as follows [42]:

• Step 1: Select k points from the dataset D randomly as the initialized medoids of the k clusters.
• Step 2: Allocate the remaining data to the clusters with the closest medoids by calculating the distance between the points and the medoids.
• Step 3: Choose a non-medoid point from D randomly and compute the new sum of errors with it in place of a current medoid.
• Step 4: If the new error calculated in Step 3 is lower than the old value, then exchange the old medoid with the new one.
• Step 5: Repeat Steps 2 to 4 until there is no change in the error function or the loop satisfies the pre-set number of iterations.
• Step 6: Output the final medoids and clusters as results.
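A randomized PAM-style sketch of Steps 1-6 follows, under the same toy-data assumptions as the k-means sketch; real PAM evaluates swaps exhaustively rather than sampling them.

```python
import math, random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_error(data, medoids):
    # sum of distances from every point to its nearest medoid (Equation-(8)-style)
    return sum(min(dist(x, m) for m in medoids) for x in data)

def k_medoids(data, k, iters=200, seed=0):
    """Randomized swap search following Steps 1-6 above (toy version)."""
    rng = random.Random(seed)
    medoids = rng.sample(data, k)                                  # Step 1
    error = total_error(data, medoids)
    for _ in range(iters):                                         # Step 5
        i = rng.randrange(k)
        candidate = rng.choice([x for x in data if x not in medoids])  # Step 3
        trial = medoids[:i] + [candidate] + medoids[i + 1:]
        trial_error = total_error(data, trial)
        if trial_error < error:                                    # Step 4
            medoids, error = trial, trial_error
    return sorted(medoids)                                         # Step 6

data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (9.0, 9.0), (9.2, 9.0), (9.0, 9.2)]
print(k_medoids(data, k=2))
```

Because medoids are restricted to actual data points, the returned centers are members of `data`, one from each well-separated group.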

t-Distributed Stochastic Neighbor Embedding
The t-distributed stochastic neighbor embedding is adopted to reduce the dimension of the weight matrix, and its main steps can be demonstrated as follows [44]:

1. For the data X = {x 1 , x 2 , . . . , x n }, the conditional probability p j|i represents the similarity of data point x j to x i , which can be calculated by the formula:

p j|i = exp(−||x i − x j ||^2 / 2δ i^2) / ∑_{k≠i} exp(−||x i − x k ||^2 / 2δ i^2), (9)

where δ i stands for the variance of the Gaussian that is centered on data point x i .

2. In the high-dimensional space, the joint probability p ij is a symmetrized conditional probability, which can be defined as:

p ij = (p j|i + p i|j ) / (2n), (10)

where n denotes the number of data points.

3. In the low-dimensional space, the joint probability q ij is defined based on Student's t-distribution with one degree of freedom:

q ij = (1 + ||y i − y j ||^2)^{−1} / ∑_{k≠l} (1 + ||y k − y l ||^2)^{−1}. (11)

4. To measure the similarity between the joint probability P in the high-dimensional space and the joint probability Q in the low-dimensional space, the Kullback-Leibler divergence between P and Q is defined as:

C = KL(P || Q) = ∑_i ∑_j p ij log(p ij / q ij ). (12)

Following this, a gradient descent algorithm is used to minimize the above cost function, and the gradient is computed by:

∂C/∂y i = 4 ∑_j (p ij − q ij )(y i − y j )(1 + ||y i − y j ||^2)^{−1}. (13)

5. The low-dimensional data points are updated by the formula:

Y^{(t)} = Y^{(t−1)} + η ∂C/∂Y + α(t)(Y^{(t−1)} − Y^{(t−2)}), (14)

where the learning rate η and momentum α(t) are optimization parameters that should be pre-set.
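The probabilities p ij and q ij and the KL cost can be computed directly for toy data. The sketch below omits the gradient-descent update and the per-point search for δ i, fixing a single Gaussian bandwidth instead; it only checks that a structure-preserving low-dimensional map has a lower KL cost than a shuffled one.

```python
import math

def squared_dists(X):
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]

def p_conditional(X, sigma=1.0):
    """p_{j|i}: Gaussian-kernel similarities in the high-dimensional space."""
    D, n = squared_dists(X), len(X)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w = [math.exp(-D[i][j] / (2 * sigma ** 2)) if j != i else 0.0
             for j in range(n)]
        s = sum(w)
        for j in range(n):
            P[i][j] = w[j] / s
    return P

def p_joint(X, sigma=1.0):
    """Symmetrized p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    P, n = p_conditional(X, sigma), len(X)
    return [[(P[i][j] + P[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def q_joint(Y):
    """q_ij from a Student t-kernel (one degree of freedom) in the low-dim map."""
    D, n = squared_dists(Y), len(Y)
    w = [[1.0 / (1.0 + D[i][j]) if j != i else 0.0 for j in range(n)]
         for i in range(n)]
    s = sum(map(sum, w))
    return [[w[i][j] / s for j in range(n)] for i in range(n)]

def kl(P, Q):
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(len(P)) for j in range(len(P)) if P[i][j] > 0)

# Two tight pairs in 3-D; a 1-D map that keeps the pairs together costs less
# than one that mixes them (a full t-SNE would optimize Y via the gradient).
X = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5)]
P = p_joint(X)
good = kl(P, q_joint([(0.0,), (0.1,), (9.0,), (9.1,)]))
bad = kl(P, q_joint([(0.0,), (9.0,), (0.1,), (9.1,)]))
print(good < bad)  # → True
```

This makes the objective concrete: minimizing Equation (12) means choosing low-dimensional positions whose t-kernel similarities q ij match the high-dimensional p ij.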

Silhouette Coefficient
In both the k-means and k-medoids algorithms, the most critical choice is the number of clusters K, and we choose the silhouette coefficient to measure the results. The silhouette coefficient ranges from −1 to 1, with larger values indicating better results in terms of intra-cluster homogeneity and inter-cluster separation [42]. The formula for the silhouette coefficient is:

S(i) = (b(i) − a(i)) / max{a(i), b(i)}, (15)

where i is an object in a cluster; S(i) is the silhouette coefficient of i; a(i) is the average dissimilarity of i to the remaining objects within the same cluster; and b(i) is the minimum of the average dissimilarities of i to the objects in each of the other clusters.

We calculate similarities among functions through the weight of the topic distribution G (Equation (5)) as the input of the clustering algorithm. We obtain a clustering result C j for each function and describe function f j as (P j , I j , O j , A j , L j , T j , C j ). Therefore, the function set can be described as:

F = { (P j , I j , O j , A j , L j , T j , C j ) | j = 1, 2, . . . , n }. (16)

The value of C j varies in {C 1 , C 2 , . . . , C k }, where k represents the clustering number in the algorithm. To analyze the characteristics of function domains, we consider C as the label to divide the function set F into different categories {F 1 , F 2 , . . . , F k }, which are the basis of the function domains in the following process.
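The silhouette formula can be transcribed directly; this sketch uses toy data and, as in common implementations, scores singleton clusters as 0 by convention.

```python
import math

def silhouette(data, labels):
    """Mean silhouette S(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    def d(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    clusters = {l: [x for x, m in zip(data, labels) if m == l]
                for l in set(labels)}
    scores = []
    for x, l in zip(data, labels):
        own = [p for p in clusters[l] if p != x]
        if not own:                      # singleton cluster: score 0
            scores.append(0.0)
            continue
        a = sum(d(x, p) for p in own) / len(own)      # intra-cluster cohesion
        b = min(sum(d(x, p) for p in c) / len(c)      # nearest other cluster
                for m, c in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Well-separated clusters score near 1; mixing the labels ruins the score.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette(data, [0, 0, 1, 1]) > 0.9)  # → True
print(silhouette(data, [0, 1, 0, 1]) < 0)    # → True
```

Sweeping K and keeping the labeling with the highest mean silhouette is exactly the selection procedure used in the Results section.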


Results
After processing with the Jieba and LDA models, we obtain the tokenized text and the topic keywords. Figure 5a,b show the word clouds of the function-text words and the topic keywords, respectively. The most frequently appearing words are traffic, product, service, information, management, and data. From Figure 5b, we can observe that traffic, information, data, analysis, management, etc., also appear frequently. The similarity between the two figures demonstrates the soundness of the LDA processing.

The optimal cluster number K = 12 and the clustering method, k-means, are determined by the silhouette coefficient. The average silhouette coefficient for each cluster number is shown in Figure 6a, from which we can draw the following observations:

• The results after dimension reduction are better than those obtained from the original data.

• Results from the k-means method are superior to those from the k-medoids method both before and after dimension reduction.

• When the cluster number lies in the interval [7, 12], the variation in the index from k-means clustering is small.

To further compare the two algorithms, we record the index changes during the study. In Figure 6b, the k-means clustering results are rather stable at the beginning; only when there are more than 12 clusters do fluctuations appear.
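The silhouette-based selection of K described above can be sketched as follows. This is a minimal illustrative example using scikit-learn, with synthetic blob data standing in for the LDA topic-weight vectors; the helper name and parameter values are ours, not the study's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the LDA topic-weight vectors (one row per function).
X, _ = make_blobs(n_samples=200, centers=3, n_features=12, random_state=0)

def best_k_by_silhouette(X, k_range):
    """Return the cluster count with the highest average silhouette coefficient."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

k_best, scores = best_k_by_silhouette(X, range(2, 13))
print(k_best)  # the K whose partition best separates the clusters
```

Swapping `KMeans` for a k-medoids implementation in the same loop reproduces the algorithm comparison in Figure 6.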
According to the above results, we set K = 12 and select the k-means algorithm. We visualize the results in two-dimensional space in Figure 7. It is evident that the clusters are evenly distributed and that the partition is clear, with distinct cluster centers and perimeters. We verify the results by mapping the cluster labels back onto the function text. In conclusion, the results obtained by the k-means algorithm after dimensionality reduction are more reasonable.

According to the results, we extract the keywords of the functions in the 12 clusters. To establish the topic of each cluster, we plot the word cloud of the keywords in each topic according to their frequency and mark the frequently occurring topic keywords, as shown in Table 2. The combination of keywords suggests the topic of each cluster.

Discussion
According to the results above, we infer the function domains from the keyword frequency and the functions grouped in the same cluster. From topic 1 to topic 12, the function domains can be named as shown in Table 3.

Table 3. Description of function domains.

No. Function Domain Description
1 — This domain mostly provides perception-related functions to collect the data of vehicles. Functions provided by vehicle equipment and roadside equipment capture data on the ego vehicle, driver, and surrounding vehicles.

In Table 3, we describe each function domain from the perspective of its components and the connections among domains. Functions within a single domain share common characteristics such as theme, target, effect, or logic of realization. To support one service, functions from different domains normally need to collaborate. For instance, when dealing with an emergency, the system needs to arrange ambulance routes and guide the traffic flow, which involves functions in transportation management and vehicle operation. The interaction among domains is therefore inevitable, as seen in Figure 8.
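The cross-domain collaboration illustrated by the emergency example can be sketched with a small counting routine. The service, function, and domain names below are hypothetical placeholders chosen to mirror the example, not identifiers from the ATS architecture:

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping from each service to the functions that support it,
# and from each function to the domain that contains it.
service_functions = {
    "emergency_handling": ["arrange_ambulance_route", "lead_traffic_flow"],
    "trip_planning": ["collect_traffic_info", "arrange_ambulance_route"],
}
function_domain = {
    "arrange_ambulance_route": "vehicle_operation",
    "lead_traffic_flow": "transportation_management",
    "collect_traffic_info": "traffic_information_collection",
}

# Count how often each pair of domains must collaborate across services;
# the resulting pair counts are the edge weights of a Figure 8-style graph.
interactions = Counter()
for funcs in service_functions.values():
    domains = sorted({function_domain[f] for f in funcs})
    for pair in combinations(domains, 2):
        interactions[pair] += 1
print(interactions)
```

Aggregating such pair counts over the full service set would yield the inter-domain interaction structure shown in Figure 8.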

We compared the descriptions in Table 3 with other architectures. Figure 8 shows three transportation-system architectures as references. Matching colors indicate function domains that are the same or similar, while domains in red have no match. From the figure, we can see that most of the domains we define are similar to those in the existing architectures, such as public transport information management, transportation infrastructure management, traffic information collection, commercial traffic management, freight transport, emergency response, data management and collaboration, and traffic management. Compared with the other classifications, the most obvious difference lies in the vehicle-related domains. As the vehicle is one of the most significant elements in transportation, each architecture sets an independent domain for vehicle operation. In ATS, the vehicle carries more functions with autonomy, which can replace some functions currently provided by humans. Additionally, as technology advances, ATS may require additional autonomous functions equipped on vehicles. Therefore, it makes sense to define three vehicle domains according to the operation of the vehicle.
The other distinction is environmental information management, a domain that the other architectures do not define. Beyond the fact that the function-domain classification in ATS is more detailed, this reflects the increasingly important influence of the environment on traffic.
The relationships between functions become more apparent once the function set is divided into 12 function domains. In the function set, service is the basic starting point, and there is only a top-down relationship between the functions that work together to support a service. Within function domains, there are obvious similarities between functions. Functions in the same domain have competitive relationships, while functions in different domains may be cooperative, competitive, or entirely unrelated. As the system evolves, some functions will grow stronger, while uncompetitive ones may disappear. Such relationships are hard to classify without function domains. Function domains based on function similarity are thus necessary for the evolution of the ATS framework, since the domains indicate the intrinsic operation of the system.

Conclusions
In this study, we briefly introduced the ATS architecture and proposed a method to construct function domains composed of similar functions. To obtain the ATS functions, we designed a framework to analyze the services in ATS and reconstructed the functions with six attributes that we defined. Based on the function text, we introduced the LDA model to extract function features and clustered the functions into 12 categories. Comparing them with the function domains of other architectures from China, America, and Europe, we defined 12 ATS domains that are essential to the whole architecture. The function domains are connected by functions, as shown in Section 5, which further reflects the logic of system operation.
By analyzing the basic elements of ATS, we provide an approach to developing the architecture with mathematical techniques, one that could be applied in various fields involving text data. The proposed functions and function domains are fundamental to designing equipment in the ATS, since they provide the objectives and classify the connections among ATS elements. With respect to functions, this could guide designers and researchers to better design future transportation systems. The function domains classify the functions into different fields and identify the relationships among functions as well as other ATS elements. Future research should focus on the quantitative analysis of the relationships among different ATS elements, as well as the data transformation between them.