Relationship between Spatio-Temporal Travel Patterns Derived from Smart-Card Data and Local Environmental Characteristics of Seoul, Korea

With the incorporation of an automated fare-collection system into the management of public transportation, coupled with smart-card technology, not only the quality of transportation services but also the quality of the data collected from users can be improved. The data collected from smart cards allow researchers to analyze big data sets and draw meaningful information from them. This study aims to identify the relationship between travel patterns derived from smart-card data and urban characteristics. Using seven days of transit smart-card data from the public-transportation system in Seoul, the capital city of the Republic of Korea, we investigated the temporal and spatial boarding and alighting patterns of users. The major travel patterns, classified into five clusters, were identified using the K-Spectral Centroid (K-SC) clustering method. We found that the temporal pattern of urban mobility reflects daily activities in the urban area and that the spatial pattern of the five clusters was closely related to urban structure and urban function, that is, to local environmental characteristics extracted from land-use and census data. This study confirmed that travel patterns at the citywide level can be used to understand the dynamics of the urban population and the urban spatial structure. We believe that this study provides valuable information about general patterns and points to the possibility of linking individual travel patterns to the spatial traits of urban areas.


Introduction
The identification of urban structure is a topic that has long been studied by urban geographers and planners [1][2][3][4]. Measuring urban structure and identifying the underlying activity patterns are important for supporting evidence-based urban planning policy. Identifying activity centers, clusters, and their characteristics not only gives urban planners a better understanding of the current structure of a city but also allows them to assess how well their plans are reflected in it [5].
To better understand urban spatial structure, researchers have increasingly scrutinized urban mobility dynamics and their impact on urban environments, since the way people move about a city is closely related to its spatial structure [6][7][8][9][10]. In the past, data were insufficient to analyze urban movement. In recent years, however, smart-card data from public transportation systems have opened new opportunities to map and understand urban dynamics. As such data become more available, they increasingly facilitate spatial and temporal analyses of urban characteristics. Additionally, with advances in technology, finer-resolution geospatial data have become available for modeling urban structures and dynamics [11,12].
Many studies have focused on the interrelationship between travel patterns and local environmental characteristics [13][14][15][16]. Much of the research on the link between urban form and travel patterns belongs to the category of aggregate analysis [17,18]. Data aggregation can help screen out idiosyncratic travel behaviors and identify the fundamental aspects of human urban mobility [19]. Related studies have classified the travel patterns of public transportation passengers using transit data. Ma et al. [20] defined five travel patterns extracted from the transit data of Beijing, China, through K-Means++ clustering. Goulet-Langlois et al. [21] classified 11 travel patterns of the users of London's public-transportation network. Based on travel patterns extracted from transit data, several studies have segmented urban areas to identify the underlying urban structure and regional functions. Yuan et al. [7] proposed a topic-modeling-based method to cluster segmented regions into functional zones using taxi trajectories, public-transit data, points of interest (POIs), and road networks. Cats et al. [5] revealed urban structure dynamics using the spatio-temporal distribution of public transportation passenger flows. Roth et al. [22] revealed the polycentric structure of London using smart-card data from the London Underground.
While previous studies have highlighted the potential of smart-card data for classifying travel patterns and urban forms, less is known about how point-based mobility data are distributed within urban areas. Data on the movement of people are generally collected at point locations such as bus stops and subway stations. In most previous studies, the point-based information (boarding and alighting counts) was simply summed over each unit area. This is problematic because, for most passengers, the boarding or alighting point is not the final origin or destination. Therefore, we developed a road-based mobility distribution model that distributes point-based ridership within unit areas.
At the same time, it is important to examine whether mobility data are statistically meaningful for investigating the interrelation between human mobility and regional characteristics. In Seoul, where public transportation is used more commonly than in comparable cities, it is worthwhile to analyze public transportation data to identify the interrelation between urban mobility patterns and city characteristics. Moreover, owing to the convenience of Seoul's public transportation system, transit smart-card usage had reached 98.9% by 2014 [23]. The transit smart-card system generated about 20 million records daily, storing the location and time of each ride within the Seoul metropolitan area [24]. We therefore used smart-card data from the Korean automated fare collection (AFC) system to identify general travel patterns throughout the city and investigated the interrelationship between travel patterns and urban spatial characteristics using land-use and socio-demographic data. Our study provides information on the spatial and temporal characteristics of intra-urban mobility and on the effect of built environments on travel patterns, and vice versa. We believe that this gives insight into the general travel patterns of geographical areas as well as the traits of those areas.
The rest of the paper is organized as follows: Section 2 describes the study area and the datasets considered: smart-card data and local environmental data (land-use and socio-demographic data); Section 3 presents the methodology defined for the mobility-pattern analysis and its relationship to local environments; and the discussion and conclusion are given in Sections 4 and 5, respectively.

Study Area
Seoul is a vibrant capital city that has historically attracted people and commerce. As the political and economic center of South Korea, Seoul is both the country's capital and its largest metropolis. In 2015, Seoul had a population of 10.3 million, and its metropolitan area had 25.5 million people, half the population of Korea [25]. With an area of 605.2 km², 0.6% of the total area of Korea, and a population density of 17,014 persons per square kilometer, Seoul is one of the most densely populated cities in the world. It comprises 25 gu (local government districts), and the study area covers the entire city (Figure 1). The Korean public transit system, including buses and subways, serves a large part of inter- and intra-city travel. Nine major subway lines and six bus categories run throughout the city. In 2014, public transportation accounted for about 55% of travel in Seoul (24.6% by bus, 10.5% by subway and train, and 19.4% by bus and subway) [26]. Accordingly, buses and the subway play an essential role in urban trips in Seoul. Since transit smart-card usage among public transportation users in Seoul is close to 100%, as mentioned earlier, transit data from the study area are considered a reliable source of information for explaining the flows of urban passengers.

Smart-Card Data
The source data hold transit smart-card records from 18-24 March 2015 in Seoul. We used a data set covering seven consecutive days from Monday to Sunday. The data set used to describe urban mobility patterns covers the two major public transport modes, i.e., bus and subway. The smart-card data were gathered from 7 million smart cards of public transit users and contain about 20 million transport records daily. From the smart-card data, we could extract the time and location of boarding and alighting regardless of the public transportation mode. Because we focused on the origin and destination points of a trip, we reorganized the smart-card records into a table containing the initial boarding information and the final alighting information. The arranged data consist of a transaction number, a boarding time, a boarding location ID, an alighting time, an alighting location ID, and the number of passengers. The configuration of the arranged transit data is given in Table 1.

Table 1. Configuration of the arranged transit data.

| Transaction | Boarding time (YYYYMMDDhhmmss) | Boarding location ID | Alighting time (YYYYMMDDhhmmss) | Alighting location ID | Passengers |
|---|---|---|---|---|---|
| 1 | 20150318092455 | 4196066 | 20150318092714 | 4196117 | 1 |
| 2 | 20150318214734 | 4196061 | 20150318215035 | 4196065 | 1 |
| 3 | 20150318225205 | 12174 | 20150318225855 | 8001017 | 1 |
| 4 | 20150318075726 | 70647 | 20150318082805 | 10585 | 1 |
| 5 | 20150318070948 | 216 | 20150318084718 | 4101317 | 1 |
| … | … | … | … | … | … |
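The reorganization into one row per trip can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the raw record layout is not described in the text, so the `Leg` structure (one row per boarding-alighting leg, with all legs of a trip sharing a transaction number) and the function name `to_trip_table` are hypothetical. Only the output columns follow Table 1.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Leg:
    transaction: int   # hypothetical: every leg of one trip shares this number
    board_time: str    # YYYYMMDDhhmmss, the format shown in Table 1
    board_id: int
    alight_time: str
    alight_id: int
    passengers: int = 1


def to_trip_table(legs):
    """Collapse transfer legs into one row per trip:
    (transaction, first boarding time, first boarding ID,
     last alighting time, last alighting ID, passengers)."""
    rows = []
    # Fixed-width YYYYMMDDhhmmss strings sort chronologically as plain strings.
    ordered = sorted(legs, key=lambda leg: (leg.transaction, leg.board_time))
    for txn, group in groupby(ordered, key=lambda leg: leg.transaction):
        trip = list(group)
        first, last = trip[0], trip[-1]
        rows.append((txn, first.board_time, first.board_id,
                     last.alight_time, last.alight_id, first.passengers))
    return rows
```

For example, if transaction 5 were recorded as two legs with a transfer at an invented stop 9999, the function would return the single row shown for transaction 5 in Table 1.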

Local Environmental Data
The data sets used in this study also include socio-demographic statistics and a land-use/land-cover (LULC) map provided by Statistics Korea and the Korean Ministry of Environment (MOE). The socio-demographic statistics are provided at the level of census output areas (OAs), which are smaller than administrative districts, and contain four main categories: population, household, housing, and business [27]. These data provide detailed information on the size, distribution, and structure of the population, housing, and businesses in Korea. The LULC data contain detailed information on urban land uses such as residential, industrial, commercial, and recreational areas. Table 2 details the variables used.

Unit Area and Data Preparation
The socio-demographic data used in this study were provided at the OA level. In Korea, the OA is the smallest geographic unit for publishing statistical data and is designed to contain approximately 500 residents [28]. Although many studies use such geographical areas as the unit of analysis, the choice of spatial unit can greatly influence the results of a study, a phenomenon known as the modifiable areal unit problem (MAUP) [29].
OAs and grids are widely used as units of analysis. In urban environments, however, segmenting an area by its road network is more natural than using other criteria. Since people usually live in, and travel among, road-segmented regions, we aggregated the OAs of the study area along the road network using the road-based segmentation method [30]. For this purpose, we extracted road-network data provided by the Korean Ministry of Land, Infrastructure and Transport (MOLIT). The OAs aggregated by the road network (AOAs) serve as the basic unit of our study, on the assumption that road-segmented regions are the bases of daily activity and human mobility (Figure 2).
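One simple way to aggregate OAs into road-bounded blocks is to assign each OA to the block that contains its centroid and to sum its attributes there. The sketch below assumes exactly that, with blocks given as plain coordinate polygons; the road-based segmentation method of [30] may work differently, and the names `aggregate_oas` and `point_in_polygon` are illustrative only.

```python
from collections import defaultdict


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_oas(oas, blocks):
    """Assign each OA (id, centroid, attributes) to the road-bounded block
    containing its centroid and sum its attributes into that AOA."""
    aoas = defaultdict(lambda: defaultdict(float))
    for _oa_id, (cx, cy), attrs in oas:
        for block_id, poly in blocks.items():
            if point_in_polygon(cx, cy, poly):
                for key, value in attrs.items():
                    aoas[block_id][key] += value
                break  # an OA belongs to exactly one block
    return {block: dict(values) for block, values in aoas.items()}
```

OAs whose centroid falls in no block are silently skipped here; a production version would need to handle them explicitly.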

Methods
Methods include region clustering to identify the human-mobility patterns of Seoul and correspondence analysis (CA) to discover the relationships between human-mobility patterns and local environments. Figure 3 depicts the overall processing scheme of the study. Each step will be explained in detail in the following sections.

Aggregate Human-Mobility Patterns from Smart-Card Records

Road-Based Mobility Distribution
Boarding and alighting counts in the transit records are point-based, as listed in Table 1. Spatial clustering, however, requires boarding and alighting counts for each unit area, so the counts must be aggregated into unit areas (i.e., AOAs). To distribute the point-based ridership into each AOA, we used the street-weighting method, which utilizes the vector street network [31,32]. The average walking distance of public transportation users, i.e., the distance from a station or bus stop to the user's origin or destination, ranged from 432 to 525 m over the study area [33]. According to the Seoul Metropolitan City Planning Decree, the station influence area is defined as a 500-m radius around a station [34]. Accordingly, the point-based boarding and alighting data were redistributed onto the streets within a 500-m radius, on the assumptions that public transportation users travel on foot after getting on or off and that the street network represents pedestrian routes (Figure 4).
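A minimal sketch of length-weighted street distribution follows. It assumes projected coordinates in metres and represents each street segment by its midpoint and a hypothetical AOA tag, which is a simplification (the street-weighting method of [31,32] may clip segments to the buffer instead). Each station's count is shared among the segments within 500 m in proportion to their length, and the shares are summed per AOA.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

RADIUS_M = 500.0  # station influence radius used in the text


@dataclass
class Segment:
    mid_x: float   # segment midpoint, projected metres (assumption)
    mid_y: float
    length: float  # segment length in metres
    aoa: str       # hypothetical tag: the AOA the segment lies in


def distribute(stations, segments, radius=RADIUS_M):
    """Spread each (x, y, count) station over street segments whose midpoint
    lies within `radius`, weighting by segment length; total shares per AOA."""
    totals = defaultdict(float)
    for sx, sy, count in stations:
        nearby = [s for s in segments
                  if math.hypot(s.mid_x - sx, s.mid_y - sy) <= radius]
        total_len = sum(s.length for s in nearby)
        if total_len == 0:
            continue  # no street in range: this sketch drops the count
        for s in nearby:
            totals[s.aoa] += count * s.length / total_len
    return dict(totals)
```

Because counts from stations with no segment in range are dropped, totals are not conserved in that edge case; a full implementation would need an explicit rule for it.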
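After the boarding and alighting counts of all stations and bus stops are distributed onto the streets, the counts are recalculated within each AOA and accumulated at hourly intervals. Mathematically, we denote the transit data for each AOA $R_i$, from time $t_1$ to $t_N$, as given in Equation (1):

$$ s^{b}_{R_i} = \left( s^{b}_{R_i}(t_1), \ldots, s^{b}_{R_i}(t_N) \right), \qquad s^{a}_{R_i} = \left( s^{a}_{R_i}(t_1), \ldots, s^{a}_{R_i}(t_N) \right) \qquad (1) $$

where $s^{b}_{R_i}$ and $s^{a}_{R_i}$ are the time-stamped sequences of boarding and alighting counts, respectively, in $R_i$.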
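The hourly accumulation into the boarding and alighting sequences can be sketched as below. It assumes the distributed counts are available as (AOA, timestamp, kind, count) events, with kind "b" for boarding and "a" for alighting, and that the seven-day window starts on 18 March 2015, the first date appearing in Table 1. The event layout and the name `hourly_sequences` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

HOURS = 24 * 7  # seven days of hourly bins, t_1 ... t_N with N = 168


def hourly_sequences(events, start="20150318"):
    """Accumulate (aoa, 'YYYYMMDDhhmmss', kind, count) events into hourly
    boarding ('b') and alighting ('a') sequences per AOA."""
    t0 = datetime.strptime(start, "%Y%m%d")
    seqs = defaultdict(lambda: {"b": [0.0] * HOURS, "a": [0.0] * HOURS})
    for aoa, ts, kind, count in events:
        t = datetime.strptime(ts, "%Y%m%d%H%M%S")
        idx = int((t - t0).total_seconds() // 3600)  # hour index since start
        if 0 <= idx < HOURS:                         # ignore events outside the week
            seqs[aoa][kind][idx] += count            # unknown kind raises KeyError
    return dict(seqs)
```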

Clustering Analysis
Cluster analysis is a common approach for discovering groupings in a set of patterns. Clustering methods generally use distance measures, such as the Euclidean or Manhattan distance, to define the similarity among objects. However, clustering in high-dimensional spaces is often problematic with such measures, because distance functions are not always suitable for measuring correlation among objects [35]. As the dimensionality of the data grows, the contrast between near and distant objects becomes negligible, and some attribute values become insignificant for a given cluster [36]. Therefore, to derive a new distance-measurement scheme, we started from the structure of the transit data arranged in Section 3.1.1, which consist of hourly boarding and alighting counts over seven days. If each column of the data set were treated as an independent variable, 24 × 7 × 2 = 336 variables (24 h, 7 days, boarding and alighting) would have to be used for clustering. Since human mobility behavior is highly regular and diurnal, it is important to find distinct patterns by comparing data at the same time of day [37]. To do so, the distance measure shown in Equation (2) was used. Based on one day (24 h), given two time series $s$ and $s'$, the distance $d(s, s')$ between them is calculated as

$$ d(s, s') = \min_{\alpha} \frac{\lVert s - \alpha s' \rVert}{\lVert s \rVert} \qquad (2) $$

where $\lVert \cdot \rVert$ is the $l_2$ norm. The distance between a pair of time series indicates their similarity: the smaller the distance, the higher the similarity.
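A sketch of the Equation (2) distance, assuming it takes the scale-invariant K-SC form of [38] without a time shift. The optimal scaling has the closed form alpha = (s · s') / ||s'||², so the distance reduces to the sine of the angle between the two profiles: two regions with the same daily shape but different ridership volumes are at distance zero. The function name is illustrative.

```python
import numpy as np


def ksc_distance(s, s_prime):
    """d(s, s') = min_alpha ||s - alpha * s'|| / ||s|| for two 24-h profiles.
    Assumes both profiles are non-zero."""
    s = np.asarray(s, dtype=float)
    s_prime = np.asarray(s_prime, dtype=float)
    alpha = s @ s_prime / (s_prime @ s_prime)  # least-squares optimal scaling
    return float(np.linalg.norm(s - alpha * s_prime) / np.linalg.norm(s))
```

For instance, `[1, 2, 3]` and `[2, 4, 6]` have the same shape and are at distance 0, whereas orthogonal profiles such as `[1, 0]` and `[0, 1]` are at the maximum distance of 1.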
Since the peaks in the transit data at a given interval are also important, each time series $s$ was compared with the series $s'$ for the same period. As our data contain seven days and both boarding and alighting components, the distance measure used can be represented as shown in Equation (3), in which $D(R_i, R_j)$ denotes the similarity measure of the boarding/alighting data between the ith and jth regions over the given days.

Next, to find clusters of time series that share a distinct temporal pattern, the K-SC clustering algorithm was chosen. Like K-means clustering, K-SC is an iterative algorithm, but it uses a time-series distance metric to calculate cluster centroids. K-SC computes more accurate and informative cluster centroids by matching the variation of time-series data [38], which makes it readily applicable to identifying common travel patterns from transit data.
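A compact sketch of K-SC [38] using the scale-invariant distance of Equation (2) is given below. Several simplifications are assumptions: each region is represented by a single non-zero profile (how the seven days and both components are combined in Equation (3) is not reproduced), no time shift is searched, and seeding is a deterministic farthest-first choice rather than random initialization. The centroid update uses the K-SC result that the centroid minimizing the sum of squared distances is the eigenvector of M = Σ (I − x xᵀ / ||x||²) with the smallest eigenvalue.

```python
import numpy as np


def ksc_distance(x, y):
    """d(x, y) = min_a ||x - a*y|| / ||x||; assumes x and y are non-zero."""
    alpha = x @ y / (y @ y)
    return float(np.linalg.norm(x - alpha * y) / np.linalg.norm(x))


def ksc_centroid(members):
    """Centroid minimising sum_i d(x_i, mu)^2: eigenvector of
    M = sum_i (I - x_i x_i^T / ||x_i||^2) with the smallest eigenvalue."""
    n = members.shape[1]
    m = np.zeros((n, n))
    for x in members:
        m += np.eye(n) - np.outer(x, x) / (x @ x)
    _, vecs = np.linalg.eigh(m)          # eigenvalues in ascending order
    mu = vecs[:, 0]
    return mu if mu.sum() >= 0 else -mu  # fix the arbitrary eigenvector sign


def farthest_first(series, k):
    """Deterministic seeding (an assumption of this sketch)."""
    idx = [0]
    for _ in range(1, k):
        gaps = [min(ksc_distance(x, series[j]) for j in idx) for x in series]
        idx.append(int(np.argmax(gaps)))
    return [series[j] for j in idx]


def ksc(series, k, n_iter=100):
    """Alternate assignment under d(x, mu) with spectral centroid updates."""
    series = np.asarray(series, dtype=float)
    centroids = farthest_first(series, k)
    labels = None
    for _ in range(n_iter):
        dists = np.array([[ksc_distance(x, mu) for mu in centroids]
                          for x in series])
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Keep the previous centroid if a cluster becomes empty.
        centroids = [ksc_centroid(series[labels == c]) if np.any(labels == c)
                     else centroids[c] for c in range(k)]
    return labels, np.array(centroids)
```

With synthetic morning-peaked and evening-peaked profiles at different volumes, this groups the profiles by shape rather than by magnitude, which is the property that motivates K-SC over plain K-means here.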

Cluster Validity Measures
Determining the most appropriate number of clusters is one of the most challenging problems in cluster analysis. Like other variants of K-means, the K-SC algorithm requires the user to specify the number of clusters (K). Numerous approaches have been suggested for finding the optimal number of clusters [39][40][41][42][43]. A common method is to measure the quality of the clustering for each candidate number of clusters with a criterion [44], and various validation indices have been proposed for this purpose. Because the evaluation relies on the data itself, we used frequently used internal clustering validation indices, namely the Calinski-Harabasz index [39], Hartigan's index [40], and the average silhouette [41], while running K-SC with different numbers of clusters. These indices were compared to determine the optimal number of clusters for our study.
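Because K-SC uses a non-Euclidean distance, one option is to compute the average silhouette from a precomputed matrix of pairwise distances rather than from raw coordinates; whether the study did so is not stated, so the sketch below is an assumption. For each object, a is its mean distance to the rest of its own cluster, b is its smallest mean distance to another cluster, and its silhouette is (b − a) / max(a, b).

```python
import numpy as np


def average_silhouette(D, labels):
    """Mean silhouette width from a precomputed distance matrix D.
    Assumes at least two clusters; singleton clusters score 0."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the object itself
        if not same.any():
            scores.append(0.0)               # singleton cluster convention
            continue
        a = D[i, same].mean()                # cohesion within own cluster
        b = min(D[i, labels == c].mean()     # separation from nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```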

Relating Mobility Pattern and Local Environmental Characteristics
Principal component analysis (PCA) was applied to the local environmental data (population census, business, and land use). This step was incorporated because there are too many mutually related variables. Applying PCA allowed us to reduce the number of variables and reorganize the data sets. After the variables were reduced, multiple correspondence analysis (MCA) was performed to find their relationship with the mobility patterns.

Variable Reduction Using Principal Component Analysis (PCA)
PCA is a classical statistical technique for data reduction. Its purpose is to structure many variables into a smaller number of components while retaining much of the information in the original data. Because census variables are often correlated with each other, PCA can efficiently remove the collinearity among variables and uncover latent variables [45]. There are 142 local environmental variables in total, including demographic, business, and LULC characteristics, as summarized in Table 2. To simplify the variable structure for further analysis, PCA was applied to extract fewer uncorrelated components, which are used as key variables in the next step.
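A minimal PCA sketch on standardized variables via the singular value decomposition follows. The retention rule (keep the fewest components whose cumulative explained variance reaches 80%) is an assumption for illustration, since the text does not state the criterion used; the function name is also illustrative.

```python
import numpy as np


def pca(X, var_threshold=0.8):
    """PCA of standardized variables (rows = unit areas, columns = variables).
    Returns component scores, loadings, and explained-variance ratios.
    Assumes no constant columns (zero standard deviation)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each variable
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)                    # explained-variance ratios
    n = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    scores = Z @ Vt[:n].T                              # component scores per unit area
    return scores, Vt[:n], ratio
```

With two perfectly collinear variables and one independent variable, the first component absorbs the shared variance (two thirds of the total) and two components are retained, which illustrates how PCA removes collinearity.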

Multiple Correspondence Analysis (MCA)
MCA is an extension of correspondence analysis, a statistical technique for analyzing the interrelationships among several categorical dependent variables [46]. Correspondence analysis is suited to analyzing contingency tables, which capture the associations among variables. Its distinct advantage is that it can reveal relationships within the dependent variables or within the independent variables, as well as between the dependent and independent variables. It is also useful for representing the interrelations among variables on a map [46,47]. We applied MCA because it can reveal the relationships between mobility patterns and other variables without prior knowledge and provide insight into those relationships. Moreover, MCA can visualize the relationships between variables on a plane.
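MCA can be computed as correspondence analysis of the indicator (one-hot) matrix of the categorical variables, which is the formulation sketched below. The example categories (a mobility-cluster label and a binned component score) are hypothetical; the text does not say how continuous PCA scores were categorized. The column principal coordinates place categories on the plane so that associated categories lie close together.

```python
import numpy as np


def indicator_matrix(rows):
    """Complete disjunctive (one-hot) coding of categorical columns."""
    n_vars = len(rows[0])
    categories = [sorted({r[j] for r in rows}) for j in range(n_vars)]
    labels = [f"{j}:{c}" for j in range(n_vars) for c in categories[j]]
    Z = np.array([[1.0 if r[j] == c else 0.0
                   for j in range(n_vars) for c in categories[j]]
                  for r in rows])
    return Z, labels


def mca(rows, n_dims=2):
    """MCA as correspondence analysis of the indicator matrix.
    Returns category labels, column principal coordinates, and inertias."""
    Z, labels = indicator_matrix(rows)
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    coords = (Vt[:n_dims].T * sv[:n_dims]) / np.sqrt(c)[:, None]
    return labels, coords, sv ** 2
```

As a sanity check, the total inertia of an indicator matrix with J categories over Q variables is (J − Q) / Q, so two binary variables give a total of 1.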

Clustering (Analysis of Human Activities from the Aggregated Perspective)
Clustering Validity
Figure 5 shows the values of the three measures, the Calinski-Harabasz index (CH), Hartigan's index (HA), and the average silhouette (AS), as a function of the number of clusters. We experimented with K = 2 to 10, and each measure was normalized to the range from zero to one. Higher values indicate better clustering, but the measures do not always agree. As shown in Figure 5, CH and AS show opposite trends, whereas HA reaches its highest value at K = 5. Since this implies that K = 5 gives the best clustering results, we chose K = 5 as the number of clusters for our data sets.
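Hartigan's index and the zero-to-one normalization used for Figure 5 can be sketched as follows. The sketch uses the common definition H(k) = (W_k / W_(k+1) − 1)(n − k − 1), where W_k is the within-cluster dispersion with k clusters and n the number of objects; how W_k was measured under the K-SC distance is not stated, so the dispersion values in the example are hypothetical.

```python
def hartigan(W, n):
    """Hartigan's index for k = 1 .. len(W) - 1, where W[i] is the
    within-cluster dispersion obtained with k = i + 1 clusters."""
    return [(W[i] / W[i + 1] - 1.0) * (n - (i + 1) - 1)
            for i in range(len(W) - 1)]


def minmax(values):
    """Rescale a list of index values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```

For example, with hypothetical dispersions `[100, 50, 40]` for 10 objects, `hartigan` gives `[8.0, 1.75]`, which `minmax` rescales to `[1.0, 0.0]`.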

Temporal Pattern of Public-Transit Passengers
Figure 6 presents the total boarding and alighting counts over time in the study area, i.e., the temporal pattern of public transportation use. On weekdays, the temporal pattern is bimodal over the day. During the morning rush hour, alighting peaks between 08:00 and 09:00, whereas the boarding counts during 07:00-08:00 and 08:00-09:00 are similar. During the evening rush hour, boarding peaks between 18:00 and 19:00, whereas the alighting counts during 18:00-19:00 and 19:00-20:00 are similar. This offset reflects the travel time between boarding and alighting and clearly shows that public transportation is used for commuting around general working hours in Korea (09:00-18:00). Meanwhile, the temporal patterns during the weekend are much smoother than those of weekdays and similar for boarding and alighting. It can be visually identified that the temporal patterns fall into two groups, i.e., weekdays (Mon-Fri) and the weekend (Sat-Sun).
It can be numerically confirmed that the temporal patterns can be separated into two groups through Pearson's correlation coefficients, as shown in Figure 7. In the case of weekdays, boarding or alighting patterns have strong correlations (0.991-0.999). For the weekend, temporal patterns are also highly correlated (0.989-0.990). Meanwhile, the correlation between boarding and alighting patterns of weekdays is relatively low (0.584-0.624); and the correlation between the boarding and alighting pattern of the weekend is higher than that of weekdays (0.829-0.847). Since the patterns between weekdays and the weekend are distinct, we further analyzed the commuting patterns using two categories (i.e., weekday and weekend). alighting. The fact that the temporal patterns are grouped into two i.e., weekdays (Mon-Fri) and the weekend (Sat-Sun) can be visually identified. It can be numerically confirmed that the temporal patterns can be separated into two groups through Pearson's correlation coefficients, as shown in Figure 7. In the case of weekdays, boarding or alighting patterns have strong correlations (0.991-0.999). For the weekend, temporal patterns are also highly correlated (0.989-0.990). Meanwhile, the correlation between boarding and alighting patterns of weekdays is relatively low (0.584-0.624); and the correlation between the boarding and alighting pattern of the weekend is higher than that of weekdays (0.829-0.847). Since the patterns between weekdays and the weekend are distinct, we further analyzed the commuting patterns using two categories (i.e., weekday and weekend).  To investigate patterns of weekdays and the weekend, counts of five weekdays (Mon-Fri) and the weekend (Sat-Sun) were averaged and the results are shown in Figures 8 and 9. Each cluster for K = 5 is labeled as C1, C2, C3, C4, and C5, respectively. 
To investigate the weekday and weekend patterns, the counts of the five weekdays (Mon-Fri) and of the two weekend days (Sat-Sun) were averaged; the results are shown in Figures 8 and 9. The clusters for K = 5 are labeled C1, C2, C3, C4, and C5. They were sorted so that C1 is the largest and C5 the smallest in the morning peak hour (the boarding peak volume is in ascending order). In Figure 8, which represents the weekday temporal pattern, boarding in C1 increases gradually from the morning and peaks between 18:00 and 19:00. Meanwhile, alighting in C1 soars in the morning and decreases after that. Moreover, the peaks of C1 are the highest for both boarding and alighting. This means that C1 has many incoming passengers in the morning and many outgoing passengers in the evening. In the case of C2, morning boarding is slightly higher than in C1, but the evening peak is less than half that of C1. C2 also has the fewest public-transport passengers in the middle of the day.
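The averaging and ordering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: profiles are 24-hour lists keyed by day name, the morning peak is taken to be 07:00-09:00 (an assumption; the exact window is not given here), and the helper names are hypothetical. Following the parenthetical above, clusters are labeled C1, C2, ... in ascending order of morning-peak boarding volume.

```python
from typing import Dict, List, Sequence

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")
WEEKEND = ("Sat", "Sun")


def average_profile(daily: Dict[str, List[float]], days: Sequence[str]) -> List[float]:
    """Hour-by-hour mean of the 24-hour profiles of the given days."""
    rows = [daily[d] for d in days]
    return [sum(hour) / len(rows) for hour in zip(*rows)]


def order_clusters(boarding: Dict[str, List[float]], peak_hours=range(7, 9)) -> List[str]:
    """Cluster ids sorted by ascending boarding volume in the morning peak.

    The 07:00-09:00 window (hours 7 and 8) is an assumed definition.
    """
    volume = {c: sum(p[h] for h in peak_hours) for c, p in boarding.items()}
    return sorted(volume, key=volume.get)


# Hypothetical counts for one spatial unit: every hour is 10, except Friday (20).
daily = {d: [10.0] * 24 for d in WEEKDAYS + WEEKEND}
daily["Fri"] = [20.0] * 24
print(average_profile(daily, WEEKDAYS)[0])  # (4 * 10 + 20) / 5 = 12.0
print(average_profile(daily, WEEKEND)[0])   # 10.0

# Hypothetical averaged weekday boarding profiles of two clusters.
boarding = {
    "a": [0.0] * 7 + [50.0, 50.0] + [0.0] * 15,
    "b": [0.0] * 7 + [10.0, 10.0] + [0.0] * 15,
}
order = order_clusters(boarding)
labels = {c: f"C{i + 1}" for i, c in enumerate(order)}
print(labels)  # {'b': 'C1', 'a': 'C2'}
```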
The boarding and alighting patterns of C3 are similar to those of C1 and C2, but C3 differs in its boarding at 22:00: it has a third peak at night, which means that more people stay in C3 than in C2 in the evening. In the case of C4, the morning boarding peak exceeds the evening peak. C5 shows a similar tendency, but its morning boarding and evening alighting are higher than those of C4. This signifies that many people leave C4 and C5 in the morning and arrive in the evening and at night.
Figure 9 shows the temporal pattern of boarding and alighting on the weekend. Unlike the weekday pattern, the weekend diurnal pattern is unimodal: whereas the weekday patterns peak in the morning or evening, the weekend patterns peak during the daytime. It can be inferred that people's movement patterns differ significantly between weekdays and the weekend. In addition, the shapes of the boarding and alighting curves on the weekend are relatively similar.
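The bimodal weekday versus unimodal weekend distinction can be made concrete by counting peaks in an hourly profile. The rule below is a deliberately simple assumption, not the paper's method: an hour counts as a peak if it is a strict local maximum and reaches at least `min_frac` of the profile's maximum. The function names and toy profiles are hypothetical.

```python
from typing import List


def find_peaks(profile: List[float], min_frac: float = 0.5) -> List[int]:
    """Hours that are strict local maxima and reach min_frac * max(profile).

    Simplifications (assumed): plateaus are not peaks, and hours 0 and 23
    are treated as non-adjacent, so the first and last hour are never peaks.
    """
    top = max(profile)
    return [h for h in range(1, len(profile) - 1)
            if profile[h] > profile[h - 1]
            and profile[h] > profile[h + 1]
            and profile[h] >= min_frac * top]


def shape(profile: List[float]) -> str:
    """Label a diurnal profile by its number of significant peaks."""
    n = len(find_peaks(profile))
    return {0: "flat", 1: "unimodal", 2: "bimodal"}.get(n, "multimodal")


# Hypothetical hourly profiles (00:00-23:00), not real data.
weekday = [5, 2, 1, 1, 3, 20, 60, 95, 70, 40, 35, 38,
           42, 40, 42, 48, 60, 85, 90, 60, 45, 40, 30, 15]
weekend = [6, 3, 1, 1, 2, 8, 15, 25, 35, 45, 55, 60,
           62, 63, 60, 58, 55, 52, 48, 42, 35, 28, 20, 10]

print(find_peaks(weekday), shape(weekday))  # [7, 18] bimodal
print(find_peaks(weekend), shape(weekend))  # [13] unimodal
```

Lowering `min_frac` would also surface smaller bumps, such as a night-time third peak, at the cost of picking up noise.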
On the weekend, C1 received a large influx of people between 09:00 and 14:00, and many people moved out of C1 from late afternoon until night. The highest boarding and alighting peaks are again in C1, which means that many people visit C1 on the weekend as well as on weekdays. C2 has the lowest share of public-transport users and exhibits no discernible peaks. C3 shows a pattern similar to that of C1, with higher alighting counts than C1 from evening to night; C3 is thus the region that attracts a substantial number of people even later in the evening. C4 and C5 show trends opposite to those of C1 and C2. Although the boarding and alighting volumes of C5 are higher than those of C4, their patterns are similar: many people board public transportation in the morning and alight in the evening, i.e., many people in C4 and C5 leave these areas in the morning and return in the evening to night.
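The contrast between areas that draw people in (C1-like) and areas that people leave in the morning and return to at night (C4/C5-like) can be summarized by the net inflow, alighting minus boarding, within a time window. The sketch below assumes hypothetical morning (07:00-10:00) and evening (17:00-20:00) windows and hypothetical helper names; it is an illustration of the idea, not a procedure taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed windows: start hour inclusive, end hour exclusive.
WINDOWS: Dict[str, Tuple[int, int]] = {"morning": (7, 10), "evening": (17, 20)}


def net_inflow(boarding: List[float], alighting: List[float],
               window: Tuple[int, int]) -> float:
    """Alighting minus boarding over [start, end): > 0 means people arrive."""
    start, end = window
    return sum(alighting[start:end]) - sum(boarding[start:end])


def role(boarding: List[float], alighting: List[float]) -> str:
    """Classify an area by the sign of its morning and evening net inflow."""
    m = net_inflow(boarding, alighting, WINDOWS["morning"])
    e = net_inflow(boarding, alighting, WINDOWS["evening"])
    if m > 0 and e < 0:
        return "attractor"  # people arrive in the morning, leave in the evening
    if m < 0 and e > 0:
        return "generator"  # people leave in the morning, return in the evening
    return "mixed"


# Hypothetical 24-hour profiles of an employment-centre-like area.
hours = range(24)
work_b = [30.0 if 17 <= h < 20 else 5.0 for h in hours]  # evening boarding
work_a = [30.0 if 7 <= h < 10 else 5.0 for h in hours]   # morning alighting
print(role(work_b, work_a))  # attractor
print(role(work_a, work_b))  # generator (boarding and alighting swapped)
```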

Spatial Pattern of Public-Transit Passengers
Through clustering, we also obtained the spatial pattern of public-transport travel. The spatial distribution of the clusters is shown in Figure 10. C1 contains the main central business district (CBD) and the major business districts, such as Gangnam and Yeouido. C2 and C3 are located adjacent to the C1 areas, while C4 and C5 are located on the edge of the city. The spatial distribution of C1 is consistent with the employment centers in Seoul identified in other research that relied on statistical data only [48,49]. In addition, the spatial pattern of the clusters resembles the urban structure in the spatial restructuring plan developed by the Seoul Metropolitan Government [50]. This allows us to interpret our results and to explain the relationship between travel patterns and local environments: the spatial pattern of human mobility provides information on the influence of built-environment structures on travel patterns.
Figure 10. Spatial distribution of clusters (spatial pattern of mobility).
Sustainability 2018, 10, x FOR PEER REVIEW 12 of 17

Descriptive Analysis
Table 3 shows the mean values of a set of socio-demographic, land-use, and transportation characteristics for each identified cluster. C1 has the highest concentration of companies, more than twice that of the other clusters, and consequently the lowest population. C2 has the largest industrial area. C5 has the largest number of residents and the largest residential area; it also has the smallest commercial area and the lowest aging value. C3 and C4 have the third and second highest populations, respectively, and almost half of their area is residential. An important difference between these two clusters is the concentration of companies: C3 has the second highest number of businesses. In terms of transportation characteristics, C1 and C5 are characterized by a large number of public-transportation users. C2 has the fewest passengers even though its population and residential area are larger than those of C1. Since C1 has more than twice as many employees as C2, the number of employees appears to have a large influence on the number of passengers. In addition, C2 has the fewest subway stations, and the number of public-transportation users was observed to be positively correlated with the number of subway stations.
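A Table 3-style summary is a per-cluster mean of area attributes. The sketch below shows that group-by step in plain Python under stated assumptions: each spatial unit is a dictionary carrying its cluster label and attribute values, and the field names and numbers are hypothetical placeholders, not the study's variables or data.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_means(units: List[dict], label: str,
                  fields: List[str]) -> Dict[str, Dict[str, float]]:
    """Mean of each field per cluster label (a Table 3-style summary)."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for u in units:
        c = u[label]
        counts[c] += 1
        for f in fields:
            sums[c][f] += u[f]
    return {c: {f: sums[c][f] / counts[c] for f in fields} for c in counts}


# Hypothetical spatial units with made-up attribute values.
units = [
    {"cluster": "C1", "population": 10_000, "employees": 80_000, "stations": 4},
    {"cluster": "C1", "population": 14_000, "employees": 60_000, "stations": 2},
    {"cluster": "C5", "population": 50_000, "employees": 9_000, "stations": 1},
]
table = cluster_means(units, "cluster", ["population", "employees", "stations"])
print(table["C1"])  # {'population': 12000.0, 'employees': 70000.0, 'stations': 3.0}
```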
Regarding the transportation characteristics, boarding around the morning peak appears to be a function of population, and alighting around the morning peak a function of the number of employees/businesses in the area. In addition, the average daily number of passengers appears to have a linear relationship with the sum of the population and the number of employees, except in C1. Excluding C1, the number of passengers has a positive linear relationship with population and a negative linear relationship with the number of employees. C1 is the least populated region in Seoul, yet it has many passengers (movements), with boarding and alighting throughout the area. C1 also has the largest number of stations, although its total area is the smallest. This might be attributed to C1 having the highest numbers of businesses and employees among the clusters, together with its concentration of public-transportation facilities. Meanwhile, weekend travel in C1 becomes similar to that in C3; the two clusters are geographically close, as shown in Figure 10. The weekday patterns of C1 and C3 presumably differ significantly because of the large number of employees in C1, but on the weekend the difference shrinks because commuting has little impact. C3 has the largest recreational-facility area and the smallest passenger gap between weekdays and the weekend. C4 and C5 are both primarily residential, but the ridership of C5 is higher than that of C4; the higher residential density of C5 probably contributes to its larger number of passengers.
The local-environment data set was reduced from 142 dimensions to 3 by discarding eigenvectors with insignificant eigenvalues: three components were retained based on visual inspection of the scree plot and rotated using the varimax method. The three-component solution explains 53.69% of the total variance. The first principal component (PC1) explains 40.18% of the total variance and is dominated by total population and number of households.
The second principal component (PC2), which explains 9.69% of the total variance, is influenced by the number of businesses, accommodation businesses, food services, etc. The third component (PC3) accounts for 3.81% of the total variance and is dominated by studio apartments, unrelated households, rooms for monthly rent, and single-member households. Based on these loadings, the three components can be interpreted as residential function (family unit), commercial function, and residential function (single unit), respectively.
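The dimension reduction above (PCA, a scree-based choice of components, and varimax rotation) can be sketched with NumPy on a toy data set. Assumptions: PCA is run on the correlation matrix (i.e., standardized variables), which the text does not state; the varimax routine is the common SVD-based iteration of Kaiser's criterion; and the six-variable data are synthetic, standing in for the 142 local-environment variables. Rotation redistributes variance among the retained components but leaves their combined share unchanged, so the per-component shares can differ before and after rotation. As a consistency check on the reported figures, 40.18 + 9.69 + 3.81 = 53.68, which matches the stated 53.69% up to rounding.

```python
import numpy as np


def pca_loadings(X: np.ndarray, k: int):
    """PCA on the correlation matrix of X (rows = areas, cols = variables).

    Returns the unrotated loadings of the first k components and the share
    of total variance explained by every component, in descending order.
    """
    C = np.corrcoef(X, rowvar=False)  # assumes no constant columns
    eigval, eigvec = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])
    return loadings, explained


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6):
    """Varimax rotation via the usual SVD iteration; returns (rotated, R)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        target = L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt  # orthogonal rotation matrix
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R, R


# Synthetic data: 200 areas x 6 variables driven by two latent factors.
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
X = np.repeat(f, 3, axis=1) + 0.3 * rng.normal(size=(200, 6))

loadings, explained = pca_loadings(X, k=2)
rotated, R = varimax(loadings)
print("variance explained by 2 components:", round(explained[:2].sum(), 3))
print(np.round(rotated, 2))
```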

Relationship between Mobility Pattern and Local Environments
The average values of the newly constructed variables (PC1, PC2, and PC3) for each travel-pattern cluster are plotted in Figure 11. In C1, the residential function (family unit) component is the lowest and the commercial function component is higher than in the other clusters. This indicates that these local environmental characteristics tend to attract people in the morning but drive them out in the evening. In C5, the residential function component is the highest and the commercial function component the lowest among the clusters, indicating that C5 areas tend to be residential-dominated rather than commercial. C2, C3, and C4 have moderate characteristics. Of these, C3 has a pattern similar to that of C1: negative PC1 and positive PC2 and PC3. This is likely linked to the similar weekend temporal patterns of C1 and C3 discussed in the previous section.
To analyze the relationship between mobility patterns and environmental characteristics further, a correspondence analysis (CA) was applied. Each PC value was categorized as high, medium, or low according to the normalized values of the three components. The graph from the correspondence analysis (Figure 12) shows the relationship between the clusters identified by mobility patterns and the local environmental characteristics. In the graph, trip patterns are represented by black rectangles and socio-demographic variables by various other shapes. C1 lies quite close to PC2-high, indicating that this cluster tends to be located in commercial areas. Both C2 and C3 are close to PC1-medium, but C3 is closer to PC3-high than the other clusters are, indicating that C3 tends to have a single-unit residential function, such as one-person households. C5 is close to PC3-low, indicating that C5 has a family-unit residential function. A visual inspection of the CA biplot revealed similarities between the positions of the trip patterns and those of the local environmental variables.
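The procedure above (discretize each PC score, cross-tabulate against the mobility clusters, then run CA) can be sketched with NumPy. Several details are assumptions because the text does not specify them: the high/medium/low cut is taken at sample terciles, the table stacks the three PCs side by side (rows = clusters, columns = PC1-low ... PC3-high, cells = counts of spatial units), and both rows and columns are placed in principal coordinates for the biplot. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def tercile_levels(scores: np.ndarray) -> np.ndarray:
    """0/1/2 = low/medium/high, cut at the sample terciles (assumed rule)."""
    cuts = np.quantile(scores, [1 / 3, 2 / 3])
    return np.digitize(scores, cuts)


def contingency(clusters: np.ndarray, pc_scores: np.ndarray, n_clusters: int) -> np.ndarray:
    """Stacked table: rows = clusters, columns = PC1-low ... PCk-high (unit counts)."""
    n, k = pc_scores.shape
    table = np.zeros((n_clusters, 3 * k))
    for j in range(k):
        levels = tercile_levels(pc_scores[:, j])
        np.add.at(table, (clusters, 3 * j + levels), 1)  # accumulates repeats
    return table


def correspondence_analysis(N: np.ndarray):
    """Simple CA via SVD of standardized residuals.

    Returns row and column principal coordinates and the principal inertias.
    """
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U * s / np.sqrt(r)[:, None]
    cols = Vt.T * s / np.sqrt(c)[:, None]
    return rows, cols, s ** 2


# Synthetic data: 300 spatial units, 5 clusters, 3 PC scores each;
# cluster 0 gets higher PC2 scores so that an association exists.
rng = np.random.default_rng(1)
clusters = rng.integers(0, 5, size=300)
scores = rng.normal(size=(300, 3))
scores[:, 1] += 1.5 * (clusters == 0)

N = contingency(clusters, scores, n_clusters=5)
rows, cols, inertia = correspondence_analysis(N)
labels = [f"PC{j + 1}-{lv}" for j in range(3) for lv in ("low", "medium", "high")]
print("share of inertia on axes 1-2:", round(inertia[:2].sum() / inertia.sum(), 3))
for name, (x, y) in zip(labels, cols[:, :2]):
    print(f"{name:10s} {x:+.2f} {y:+.2f}")
```

Plotting the first two columns of `rows` and `cols` on the same axes gives a biplot in the spirit of Figure 12; in this toy setup, cluster 0 would be expected to sit near PC2-high.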

Conclusions
This study aimed to identify travel patterns from transit smart-card data and their association with local environmental characteristics. Previous studies have focused on explaining trip purposes and their relation to land-use types. In contrast, we identified the major travel patterns at the citywide level while simultaneously assessing their interrelationships with local environmental characteristics. Using seven-day transit smart-card data from Seoul, we investigated the temporal and spatial patterns of boarding and alighting on public transportation and identified the links between travel patterns and other local environmental factors in urban environments.
We found that the temporal pattern of urban mobility reflects daily activities in urban areas. The major travel patterns, classified into five clusters, were identified using K-SC clustering, and these five patterns describe the main representative activities well. In C1, many people arrive in the morning and leave in the evening and at night, whereas C5 shows the opposite pattern. This result closely matches general daily routines. Moreover, each cluster shows its own distinctive pattern, which reflects the daily activity within it. Accordingly, the regional functions of the clusters can be estimated from their travel patterns.
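K-SC clustering, as introduced by Yang and Leskovec (2011), compares time series by shape rather than volume, using the distance d(x, y) = min over a and q of ||x - a*y_q|| / ||x||, where a is a scale factor and y_q is y shifted by q steps. The sketch below implements that distance and the assignment step only; the centroid-update step, which in K-SC recomputes each centroid from an eigenvector problem, is omitted. Whether the study allowed time shifts is not stated here, so `max_shift` defaults to 0 (an assumption reflecting that time of day matters for boarding and alighting). All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def shift(y: Sequence[float], q: int) -> List[float]:
    """Shift y by q steps (q > 0 delays), padding with zeros."""
    n = len(y)
    return [y[i - q] if 0 <= i - q < n else 0.0 for i in range(n)]


def ksc_distance(x: Sequence[float], y: Sequence[float],
                 max_shift: int = 0) -> Tuple[float, float, int]:
    """Scale- (and optionally shift-) invariant distance; returns (d, a, q)."""
    nx = math.sqrt(sum(v * v for v in x))
    if nx == 0:
        raise ValueError("x must not be all zeros")
    best = (math.inf, 0.0, 0)
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        # Optimal scale for fixed q: a = <x, y_q> / ||y_q||^2.
        a = sum(u * v for u, v in zip(x, yq)) / yy if yy > 0 else 0.0
        resid = math.sqrt(sum((u - a * v) ** 2 for u, v in zip(x, yq)))
        d = resid / nx
        if d < best[0]:
            best = (d, a, q)
    return best


def assign(series: List[Sequence[float]], centroids: List[Sequence[float]]) -> List[int]:
    """K-SC assignment step: index of the closest centroid for each series."""
    return [min(range(len(centroids)), key=lambda k: ksc_distance(s, centroids[k])[0])
            for s in series]


# A profile and a 10x larger copy have distance ~0 (same shape, other volume).
print(ksc_distance([1.0, 4.0, 2.0, 0.5], [10.0, 40.0, 20.0, 5.0]))
print(assign([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
             [[3.0, 2.0, 1.0], [2.0, 4.0, 6.0]]))  # [1, 0]
```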
The spatial pattern of the five clusters is closely related to urban function as indicated by other data sources, namely urban planning maps and other research. From the clusters' spatial patterns, population tendencies, and regional urban functions, we could identify that C1 is located in the city's central business district and that C5 is located on the edge of the city in residential areas. Furthermore, the residential population, the number of employees, and the number of stations were shown to be related to public-transportation usage. Hence, when designing a new public-transportation facility, it is important to consider factors such as population and existing transportation facilities comprehensively. For instance, the C2 area in the case-study city should be considered a priority if new transportation facilities are to be built.
In conclusion, this study found that the clustered travel patterns were differentially related to environmental variables. Considering socio-demographic characteristics from the census together with land-use information may be useful for identifying the function of each region and estimating its relationship with travel patterns. Travel patterns at the citywide level can be used to understand the dynamics of a city's urban population. This framework could be evaluated further in other cities or countries to assess its applicability and to support the use of our approach for improving the understanding of urban mobility based on smart-card data.