Characterization of Public Transit Mobility Patterns of Different Economic Classes

Abstract: This paper analyzes public transit mobility of different economic classes of Curitiba, Brazil, exploring an official smart card dataset provided by the city. With the population divided into subsets corresponding to economic strata, we characterized vital spatial-temporal transit usage patterns, such as departure times and destinations reached by different economic classes. We also constructed a network representing the common origin and destination of public transit users, enabling the discovery of distinct patterns. Among the results, we observe that with the increase in wealth, the morning activity is postponed (by 2 h on average), and the spatial distribution of the trips becomes more localized compared with lower classes. By comparing our results with a household travel survey from Curitiba, we also show that our model captures realistic mobility patterns fairly well while exploring a cheaper and larger-scale data source. Understanding how people in different economic classes appropriate urban spaces helps to support, e.g., more sustainable economic development proposals.


Introduction
The amount of data generated, collected, and shared by public administrations, companies, non-profit organizations, and even the scientific community has increased considerably in recent years [1]. The use of the knowledge extracted from these digital sources has made a significant difference in businesses and people's lives [2]. In this direction, some studies propose ways in which governments can use strategic data to serve their citizens better, addressing challenges such as health care costs, job creation, natural disasters, and urban mobility (subway [3], taxi [4], and public transit systems [5]).
Urban mobility refers to the movement of people and goods within a city's geographical space and is a component of the quality of life to which its inhabitants aspire [6]. This topic has gained considerable attention internationally and also in Brazil. According to Miranda et al. [5], the leading cause of urban transit mobility problems in Brazil is the increasing use of private transport instead of public transit. Paradoxically, public transit is nevertheless often overcrowded at different times of the day, which makes this challenge even harder.
While several studies exist on public transit mobility for different cities worldwide (Shanghai [7], New York [8], Berlin, and San Francisco [9]), studies of Brazilian cities are still scarce. Thus, this study focuses on public transit mobility patterns of users of different economic classes in Curitiba, Brazil. For that, we use official transit usage data based on smart card access. Specifically, this study provides:

• Approaches for inferring the home and most common destination neighborhoods of each user, and for classifying users into different economic classes exploring Census data;
• Characterization of fundamental transit usage patterns of residents of Curitiba, Brazil, studying, for instance, departure times, distances traveled, and destinations reached by different economic classes;
• Construction and study of a network representing the common origin and destination of transit users. This structure enabled uncovering relevant transit mobility patterns of users of distinct economic classes.

The remainder of the study is organized as follows. Section 2 presents the related studies. Section 3 shows some background information regarding the city of study. Section 4 presents the methodology applied in this study. Sections 5 and 6 discuss the results and the validation of our approach, respectively. Finally, Section 7 presents the conclusions.

Related Work
Urban transit mobility is a concept that can be studied through the lens of the use of private/individual transport (e.g., cars, motorcycles, and bicycles) or the use of public transportation/transit (e.g., buses and subways). We can also study urban transit mobility at different scales, for instance, global (infectious diseases [13], epidemic scenarios [14]), continental [15], national [16], or regional (Medellin and Manizales in Colombia [10]; Belo Horizonte in Brazil [17]).
Turning our attention to regional urban mobility, we should think about how best to ensure people's access to key spots of the city (e.g., business areas, schools, hospitals, and recreation areas). In this direction, Yang et al. [18] describe the integration of smart card data (SCD) of the city's subway and points of interest (POI) to study mobility in the city of Shenzhen, China. The POIs were used to semantically contextualize information derived from the SCD, allowing the purpose of the trip to be deduced. By linking these data, the study showed that it was possible to infer how different groups are associated with varying travel behaviors, travel purposes, and socioeconomic activities.
In a similar direction, Huang et al. [4] analyze the relationship between POIs and mobility networks. Based on two cities' taxi systems in China, the authors built transit networks by assigning urban regions as vertices and their connections as edges weighted by the number of mobility flows. Spatial communities were identified based on the strength of movement between regions. POIs were mapped to vertices in the network and were considered independent variables to classify the spatial community categories. Among the results, the study demonstrates the importance of data integration to a better understanding of urban mobility patterns.
Focusing on the study of the urban mobility during large scale events, Marques-Neto et al. [17] explore mobile phone records on three different types of events: a major football match, a rock concert, and a celebration of New Year, all events in Belo Horizonte, Brazil. By analyzing the spatio-temporal dynamics of the participants' movement patterns, the authors improved the understanding of human mobility caused by large-scale events. Such analysis helped develop an application designed to help mobile operators plan their infrastructure for large events.
Still using information from mobile phones, the study of Xu et al. [19] proposes an analytical framework to better understand human mobility patterns and their relationship to the population's socioeconomic status by coupling large-scale mobile phone and urban socioeconomic datasets. In a case study conducted in Singapore and Boston, the authors compare the properties of mobility patterns and analyze how they vary across socioeconomic classes. The results indicate that, in general, wealthier groups of phone users tend to make shorter trips in Singapore and longer ones in Boston. One possible reason is the location of the two cities' wealthy neighborhoods, which are central and peripheral, respectively. For mobility indicators that reflect the diversity of travel patterns and individual activities, the authors concluded that the level of wealth in both cities is not a factor that restricts the way people travel, since different socioeconomic groups presented very similar characteristics. In summary, the authors suggest that the relationship between mobility and socioeconomic status may vary across cities. This relationship is influenced by spatial housing arrangement, employment opportunities, and human activities.
A few studies have also investigated public transit usage, and these are the most similar to our present study. For instance, Lotero et al. [10] analyze urban transit mobility in two Colombian cities using an origin-destination survey conducted with their residents. The authors showed that spatial and temporal patterns vary among different socioeconomic groups, concluding that as wealth increases, morning activity is delayed, noon displacement becomes smoother, and the spatial distribution of travel becomes localized. Zhang et al. [3] propose an analytical framework to compare urban transit mobility patterns extracted from a transit smart card dataset (bus and subway) and a dataset of Global Positioning System (GPS) taxi trajectories. Using Singapore as a case study, and using the origin-destination matrices extracted from both datasets, they conclude that the spatial distributions of travel demand extracted from the two modes of transport exhibit high correlations but with a higher degree of heterogeneity regarding the use of space by public transit. The work of Oviedo et al. [11], on the other hand, uses household socio-demographic and origin-destination surveys to examine the contribution of Lima's Bus Rapid Transit (BRT) system to job accessibility in the city, especially for low-income public transport users. In a comparative analysis of two periods, before and after the implementation of the BRT system, the authors concluded that the BRT line reduced the travel time for users to reach jobs in comparison with the city's traditional public transport. However, BRT coverage declines in areas with high concentrations of poor populations. An analysis by socioeconomic groups showed positive effects of the BRT system on accessibility for the higher-income areas, which did not happen for the lower-income classes.

Background Information
This section briefly describes relevant information regarding the city studied in this work and how it is administratively structured. Also, we bring information about the economic stratification of Brazil.

The City of Curitiba
Curitiba was officially founded on 29 March 1693 and is the capital of Paraná, one of the three states that compose the southern region of Brazil. According to estimates from the Brazilian Institute of Geography and Statistics (IBGE) (IBGE is the primary provider of data and information about the country; website: https://www.ibge.gov.br/en/institutional/the-ibge.html), the municipality of Curitiba has 1,751,907 inhabitants in a total area of 434.967 km². The North-South extension is 35 km, and the East-West extension is 20 km.
Administratively, Curitiba is divided into 10 Regional Administrations covering all its neighborhoods. The integrated transit system serves all these neighborhoods. This division and the main bus routes are shown in Figure 1.

Public Transit in Curitiba
The city of Curitiba has no subway system. All public transit in the city is based on the bus system, with exclusive commuting corridors where Bus Rapid Transit (BRT) runs. This BRT passes through various integration terminals, which receive local neighborhood buses allowing for system integration.
The system also has inter-neighborhood lines enabling users to move from one neighborhood to another without going through the city's central region or riding BRT lines. BRT only stops at designated stations and integration terminals. Local and inter-neighborhood buses stop at specific spots on the streets and in some integration terminals.
Another essential feature of Curitiba's transit is the integrated fare. Paying only one ticket, citizens can compose their route, moving around the city. It is also important to mention that users only tap the smart card to enter the system; the place where they disembark is not recorded.
Since 2002 Curitiba has been using the smart card for public transit. The implementation of this system was necessary to reduce the cash flow in circulation in the transit system, to speed up the boarding and passage of users through the turnstiles, to discipline and measure the use of the transit system by categories that enjoy free access and exemptions, in addition to reducing operating system costs [21].
According to statistics from Urbanization of Curitiba S.A. (URBS) (URBS is the company responsible for the strategic actions of planning, operation, and supervision involving the public transit service) [22], in 2018, an average of 1,365,615 passengers were transported on weekdays. These passengers are residents of Curitiba and of its metropolitan area who travel to Curitiba for some activity. According to the same report, about 60.96% of the fares were paid using the smart card. There are four card categories: User Card, Exempt Card, Student Card, and Single Use Card. The 1,928,184 smart cards active in 2018 are divided among the first three; the data provided by URBS has no information on Single Use Cards, since the user does not need to associate personal information with this card.

Economic Stratification in Brazil
In Brazil, there are several criteria available to classify society according to the average family income. However, in this study, we used the rule of the Secretary of Strategic Affairs (SAE), an institution from the federal government established in 2012 [23]. The criterion was based on information acquired through the 2010 Brazilian Population Census conducted by IBGE [24]. This national survey performed by IBGE aimed to portray the Brazilian population's socioeconomic characteristics and establish a basis for all public and private planning for the decade between 2010-2020.
SAE classifies groups of population in Brazil according to average monthly family income in eight economic strata. Table 1 presents a summary of this SAE grouping. According to this classification and the estimated income per neighborhood, Curitiba presents only four of the eight classes: lower middle class, middle class, upper-middle class, and low high class. There may be divergences in intra-neighborhood family incomes. However, the classification considers the average raised during the last Brazilian population Census. The appendix of this paper (Table A1) presents the demography from Curitiba's neighborhoods according to their associated economic class.

Methodology
In this section, we describe the studied dataset and the essential methodological steps.

Studied Dataset
For the analysis of urban transit mobility, we used URBS data regarding the use of transit using any of the identified bus smart cards. A formal request to URBS is necessary to have access to this dataset. The dataset used in this study has daily records of the bus smart card uses, covering 69 days (53 weekdays and 16 weekend days) from September 5 to November 22 in 2018. For each entry (or use), the dataset provides Line Code, Line Name, Vehicle Code, Card Number, Date of Use, Date of Birth of the User, and Gender.
This dataset has several complementary information files, such as the existing bus lines and bus stops that the lines serve, with GPS and vehicle timetable. Some data are static and are updated once a day. Other data, such as vehicle locations, are updated almost in real-time, as vehicles send their location on average every 5 s. All of this information is made available through .json files.
In this study, two complementary files related to Curitiba's public transit were used:

• Vehicles: a file that contains the location of all vehicles that circulated in Curitiba on the date before its disclosure. The information contained in this file is distributed in five fields: Vehicle Code, Latitude, Longitude, Usage Date, and Line Code.
• Points: a file that contains the information of all existing bus stops in Curitiba. This file has nine information fields: Point Name, Point Number, Latitude, Longitude, Sequence, Group, Direction, Type, and Line Code.

Dataset Preprocessing
This stage's main objective was to establish the origin and destination information of users of public transit. Since only the entry into the system is recorded, as a result of this preprocessing, we kept only the data from which we were able to infer the origin and destination based on the usage history of each smart card. Figure 2 shows data preprocessing steps, from obtaining raw data to creating the database used in our study. Each of these steps is detailed below. We used 58 files about smart card use and 67 Vehicles files. The smart card use represents in total 17,511,710 records, and the vehicles file 129,461,200 records.
To infer the neighborhood where each smart card was used, we used different approaches. When users enter the transit system at designated stations or integration terminals (exclusive bus stops for BRT and some special lines), the bus line code and the vehicle code linked to this entrance are directly related to the bus stop location. Therefore, the identification of the neighborhood happens with high precision. When users enter at regular bus stops, we search for a match using line code, vehicle code, date, and time to obtain a GPS coordinate for each smart card entry.
Although there are cases where the GPS position recording interval is longer, on average, the GPS coordinates updates of the buses are recorded around every 5 s, making the data more detailed than needed for our purpose: infer the approximate area where the smart card was used. Thus, we keep only one update per minute for each vehicle, since within one minute, it is considered that the bus has not traveled a considerable distance. After this filter, the new vehicle information represents 18,594,495 records, which enable the inference procedure to be executed more efficiently without compromising the quality considerably.
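The one-update-per-minute filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (keys `vehicle`, `timestamp`, `lat`, `lon`), the vehicle codes, and the choice of keeping the first fix in each minute are assumptions.

```python
from datetime import datetime


def downsample_per_minute(records):
    """Keep only the first GPS fix per (vehicle, minute).

    `records` is an iterable of dicts with the hypothetical keys
    'vehicle', 'timestamp' (a datetime), 'lat', and 'lon'.
    """
    seen = set()
    kept = []
    # Sort so that "first fix in the minute" is well defined.
    for rec in sorted(records, key=lambda r: (r["vehicle"], r["timestamp"])):
        minute = rec["timestamp"].replace(second=0, microsecond=0)
        key = (rec["vehicle"], minute)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Illustrative fixes (made-up vehicle codes and coordinates).
fixes = [
    {"vehicle": "BA001", "timestamp": datetime(2018, 9, 5, 7, 0, 2), "lat": -25.430, "lon": -49.270},
    {"vehicle": "BA001", "timestamp": datetime(2018, 9, 5, 7, 0, 7), "lat": -25.431, "lon": -49.270},
    {"vehicle": "BA001", "timestamp": datetime(2018, 9, 5, 7, 1, 1), "lat": -25.433, "lon": -49.271},
    {"vehicle": "BB002", "timestamp": datetime(2018, 9, 5, 7, 0, 4), "lat": -25.500, "lon": -49.290},
]
sampled = downsample_per_minute(fixes)
```

In this toy input, the second fix of vehicle BA001 falls in the same minute as the first and is dropped, so three of the four fixes remain.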
It is important to note that, when matching the line code, vehicle code, date, and time attributes between the smart card records and the bus records, we observed that some buses had the line code and/or the vehicle code filled in with a generic value, which makes it impossible to guarantee the identification of the GPS location of the card entry. These records were removed.
After a smart card entry was enriched with latitude and longitude, it was associated with the neighborhood information of the corresponding bus stop. To do so, the inferred latitude and longitude were first matched with the nearest bus stop of the specific line of the smart card entry. This procedure was done using the R-tree data structure. The R-tree was proposed by Antonin Guttman and is widely used as a spatial access method, allowing the indexing of multi-dimensional information such as geographic coordinates [25]. At the end of this step, we could infer the neighborhood for 5,388,638 smart card entries.
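To make the stop-matching step concrete, the sketch below finds the nearest stop of a given line for an inferred coordinate. It deliberately swaps the R-tree for a plain linear scan with the haversine distance, which gives the same answer on small inputs but does not scale like a spatial index. The field names, line codes, and coordinates are illustrative assumptions, not values from the URBS files.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_stop(lat, lon, line_code, stops):
    """Return the closest stop served by `line_code`, or None if the line has no stops.

    Linear scan used here in place of the R-tree index mentioned in the text.
    """
    candidates = [s for s in stops if s["line"] == line_code]
    if not candidates:
        return None
    return min(candidates, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


# Hypothetical stops: line codes and coordinates are made up for illustration.
stops = [
    {"line": "203", "lat": -25.4296, "lon": -49.2713, "neighborhood": "Centro"},
    {"line": "203", "lat": -25.5130, "lon": -49.2950, "neighborhood": "Pinheirinho"},
    {"line": "500", "lat": -25.4300, "lon": -49.2700, "neighborhood": "Centro"},
]
match = nearest_stop(-25.44, -49.27, "203", stops)
```

Restricting the candidates to the line of the card entry before searching mirrors the text: the nearest stop overall may belong to a line the user did not board.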
Some people lend their smart cards, with or without financial advantages, to others who are traveling on the same bus line, generating noise in the records. In these cases, several entries are made within minutes on the same line, generating many entries for the same smart card. To minimize the impact of those cards on our analysis, we decided to delete cards with more than 150 entries. The rationale behind this value is that the study period is 69 days (16 of them weekend days) and that each user typically uses at least two tickets per day. Following similar reasoning, cards with fewer than ten entries during the analysis period were deleted. They represent a small number of entries, on average one per week. By disregarding them, we also aim to make the results more robust. After these exclusions, the number of entries with an inferred neighborhood became 5,225,573.
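The two exclusion rules above amount to keeping cards whose entry count lies between 10 and 150, inclusive. A minimal sketch, assuming each entry is a dict with a hypothetical `card` key:

```python
from collections import Counter

MIN_ENTRIES = 10   # cards with fewer entries are dropped
MAX_ENTRIES = 150  # cards with more entries are dropped


def filter_cards(entries):
    """Keep only entries whose card has between MIN_ENTRIES and MAX_ENTRIES uses."""
    counts = Counter(e["card"] for e in entries)
    return [e for e in entries if MIN_ENTRIES <= counts[e["card"]] <= MAX_ENTRIES]


# Toy data: card A has 5 entries, B has 10, C has 151, D has 150.
entries = (
    [{"card": "A"}] * 5
    + [{"card": "B"}] * 10
    + [{"card": "C"}] * 151
    + [{"card": "D"}] * 150
)
kept = filter_cards(entries)
kept_cards = {e["card"] for e in kept}
```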
Then we separated the entries into two groups: home neighborhood and destination neighborhood. The data analysis showed a significant growth in the number of entries starting around 5 a.m. This growth reaches its peak at around 7 a.m., and the phenomenon ends at 10 a.m. Because of this, we considered entries between 5 a.m. and 10 a.m. for the home neighborhood identification, as this is a time frame in which users probably use transit to go to work or study. For the destination neighborhood, we considered only the entries made after 10 a.m. Recall that users in Curitiba are only obliged to use their cards to enter the system. Thus, one reasonable way to identify the destination point is to assume that the stop where the user boards for a later trip (an entry after 10 a.m., in our case) is near where the user got off the trip from the home neighborhood.
Using the smart card entries made between 5 a.m. and 10 a.m., we grouped the data by card and selected the most frequent neighborhood as the home neighborhood. We also grouped the entries made after 10 a.m. by card and applied the same threshold and procedure to identify the most frequent destination neighborhood. By analyzing the number of entries per user in both scenarios, we could verify that these thresholds preserve a significant number of smart cards, not compromising the analysis. After this procedure, we obtained 14,632 unique smart cards with home and most typical destination neighborhoods. The economic classes of these neighborhoods were added according to the classification presented in the appendix of this work (Table A1).
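The home and destination inference can be illustrated with a short sketch. It assumes each entry carries a card identifier, the hour of use, and the already inferred neighborhood (all hypothetical field names), treats the 10 a.m. boundary as belonging to the destination window, and keeps only cards that have entries in both windows. Ties for the most frequent neighborhood are broken by first occurrence, which is an arbitrary choice of this sketch.

```python
from collections import Counter, defaultdict


def infer_home_and_destination(entries):
    """Map each card to (home, destination) using the most frequent neighborhoods.

    Entries with 5 <= hour < 10 vote for the home neighborhood;
    entries with hour >= 10 vote for the destination neighborhood.
    Entries before 5 a.m. are ignored.
    """
    morning = defaultdict(Counter)
    later = defaultdict(Counter)
    for e in entries:
        if 5 <= e["hour"] < 10:
            morning[e["card"]][e["neighborhood"]] += 1
        elif e["hour"] >= 10:
            later[e["card"]][e["neighborhood"]] += 1

    result = {}
    # Only cards observed in both time windows get an origin-destination pair.
    for card in morning.keys() & later.keys():
        home = morning[card].most_common(1)[0][0]
        destination = later[card].most_common(1)[0][0]
        result[card] = (home, destination)
    return result


entries = [
    {"card": "u1", "hour": 6, "neighborhood": "Pinheirinho"},
    {"card": "u1", "hour": 7, "neighborhood": "Pinheirinho"},
    {"card": "u1", "hour": 6, "neighborhood": "Centro"},
    {"card": "u1", "hour": 17, "neighborhood": "Centro"},
    {"card": "u1", "hour": 18, "neighborhood": "Centro"},
    {"card": "u1", "hour": 13, "neighborhood": "Pinheirinho"},
    {"card": "u1", "hour": 3, "neighborhood": "Centro"},   # ignored: before 5 a.m.
    {"card": "u2", "hour": 6, "neighborhood": "Centro"},   # no later entry: dropped
]
pairs = infer_home_and_destination(entries)
```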
Despite the possible limitations of our approach, we found evidence that it is a good approximation. We constructed a set of manually classified smart card entries based on interviews with the smart card owners. In this way, we obtained their home neighborhood and most commonly visited neighborhoods. This information was gathered from 8 users of different economic classes and with different city mobility patterns.
Our approach identified all the information correctly for this dataset. The construction of this dataset is very challenging for several reasons, such as identifying volunteers and obtaining their agreement to disclose private information; however, this experiment is vital to help us get a sense of the quality of our approach.

Network and Metrics
A possible tool for representing and studying mobility patterns is a graph, which consists of the mapping system's elements into vertices and edges to form networks (human urban mobility [26]; social networks [27]). It is an efficient way to model and explain the mechanisms of collective behavior [28]. Several studies on mobility patterns have demonstrated network relevance for mobility studies (identification of communities [29]; city zoning [30]; demand for urban passenger transportation [31]).
To analyze Curitiba's public transit mobility, we use the origin-destination information, described in Section 4.2, to construct urban transit mobility networks. We used as origin the inference of home neighborhood and as the destination the most frequented neighborhood of each smart card. We created a matrix M of the order N × N, where N = 74, that is, the neighborhoods of Curitiba.
The elements $M_{ij}$ of the matrix indicate the number of trips observed from origin node i to destination node j. With this information, we can calculate the strength of node i, i.e., the number of trips from node i, given by Equation (1):

$$ s_i = \sum_{j=1}^{N} M_{ij} \quad (1) $$

Another analysis we can do with origin-destination information is how connected the city's neighborhoods are. For this analysis, we use the adjacency matrix A, where $A_{ij} = 1$ if at least one trip from i to j was observed and $A_{ij} = 0$ otherwise. With the matrix A, we can calculate other characteristics of this network, such as the degree of the nodes, given by Equation (2):

$$ k_i = \sum_{j=1}^{N} A_{ij} \quad (2) $$

and the shortest path between nodes i and j, which in this case is defined as the shortest sequence of nodes $(i, l_1, l_2, \ldots, l_{n-1}, j)$ such that $A_{i l_1} = A_{l_1 l_2} = \ldots = A_{l_{n-1} j} = 1$, which also defines the distance between i and j as $d_{ij} = n$.
Combining these two matrices, M and A, allows visualization of the resulting transit mobility network, shown in Figure 3, where node sizes represent their strength and the thickness of the connections represents their weight.
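As a small worked illustration of these definitions, the sketch below builds M and A from a list of (origin, destination) index pairs and computes strength, degree, and shortest-path distance with a breadth-first search. The function names and the toy data with four nodes are assumptions made for this example; the study itself uses N = 74 neighborhoods.

```python
from collections import deque


def build_matrices(od_pairs, n):
    """Return (M, A): trip counts and the binary adjacency matrix."""
    M = [[0] * n for _ in range(n)]
    for i, j in od_pairs:
        M[i][j] += 1
    A = [[1 if M[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    return M, A


def strength(M, i):
    """Number of trips leaving node i (row sum of M)."""
    return sum(M[i])


def degree(A, i):
    """Number of distinct destinations reached from node i (row sum of A)."""
    return sum(A[i])


def shortest_path_length(A, src, dst):
    """Number of edges on a shortest directed path, or None if unreachable."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(A[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                if v == dst:
                    return dist[v]
                queue.append(v)
    return None


# Toy example: two trips 0->1, one trip 1->2, one trip 0->3.
M, A = build_matrices([(0, 1), (0, 1), (1, 2), (0, 3)], n=4)
```

Here node 0 has strength 3 but degree 2, since its two trips to node 1 count once in A, and node 2 is reached from node 0 in two steps.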

Discovery of Patterns by Economic Classes
To observe the transit mobility of each economic class separately, we use averages of the measures previously introduced. The first one corresponds to the average degree $\langle k \rangle_\alpha$ of the transit mobility network:

$$ \langle k \rangle_\alpha = \frac{1}{N_\alpha} \sum_{i} k_i^{\alpha} \quad (3) $$

where α denotes one of the classes (α = 4, ..., 7) or the whole network, and $N_\alpha$ is the number of non-isolated nodes in class α. The second average is related to strength:

$$ \langle s \rangle_\alpha = \frac{1}{N_\alpha} \sum_{i} s_i^{\alpha} \quad (4) $$

Another measure is the average path length:

$$ \langle d \rangle_\alpha = \frac{1}{N_\alpha (N_\alpha - 1)} \sum_{i \neq j} d_{ij}^{\alpha} \quad (5) $$

To analyze the urban displacement performed by individuals from each of the economic classes, we used the radius of gyration $r_g$ [32], calculated according to Equation (6):

$$ r_g = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_i - r_{cm})^2} \quad (6) $$

where n is the number of card entries, $r_i$ is the geographical location of entry i given by latitude and longitude, and $r_{cm}$ is the center of mass given by the midpoint of all card entries. The Euclidean distance was used to calculate the distance between each entry and the center of mass $(r_i - r_{cm})$.

Finally, to identify the centrality of places for each economic class, we calculate the betweenness centrality. The betweenness centrality of a node v is the sum of the fraction of all-pairs shortest paths that pass through v [33]:

$$ c_B(v) = \sum_{s,t \in V} \frac{\sigma(s, t \mid v)}{\sigma(s, t)} \quad (7) $$

where V is the set of nodes, $\sigma(s, t)$ is the number of shortest (s, t)-paths, and $\sigma(s, t \mid v)$ is the number of those paths passing through some node v other than s, t. If s = t, $\sigma(s, t) = 1$, and if $v \in \{s, t\}$, $\sigma(s, t \mid v) = 0$.
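The radius of gyration can be computed directly from its definition. The sketch below assumes the card entry locations have already been projected to planar coordinates in kilometers (the projection itself is not shown and is an assumption of this example), so that the Euclidean distance applies as described in the text.

```python
import math


def radius_of_gyration(points):
    """Radius of gyration of a non-empty list of (x, y) points, in the units of the input.

    The center of mass is the mean position of all points; the result is the
    root mean square Euclidean distance of the points to that center.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq_dist = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq_dist)


# Two entries 2 km apart: the center of mass lies midway, 1 km from each.
rg = radius_of_gyration([(0.0, 0.0), (0.0, 2.0)])
```

A user whose entries all fall on the same spot has a radius of gyration of zero, while entries spread far from their mean position yield larger values.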

Results
To better understand the use of urban space through public transit, some analyses are necessary. This section presents the analyses performed in this direction.

Users Economic Information
We start our analysis by understanding the economic characteristics regarding the population under study for better modeling of transit mobility performed by these different classes. Table 2 presents in a summarized way the distribution of the population of Curitiba according to their economic class and the number of distinct smart card users per class, with the total trips analyzed in this study. More detailed information on Curitiba's population can be found in the table available in the appendix (Table A1).

Stratified Transit Mobility Networks
We now turn our attention to the mobility performed by the studied users. Figure 3 shows our public transit mobility networks corresponding to classes 4, 5, 6, and 7 of Curitiba (the only ones observed in the city according to the 2010 Census). All networks contain all nodes, that is, all of Curitiba's neighborhoods, positioned in the images according to their actual positions in the city. However, we do not necessarily observe activity in all neighborhoods for all classes. There are cases where users of class α (α = 4, ..., 7) neither live in nor visit certain neighborhoods. In this case, the effective number of non-isolated nodes in the transit mobility network of class α is $N_\alpha < N$. The thickness of an edge reflects its weight, i.e., thicker edges are heavier. In each class represented, the larger nodes are those with greater strength that belong to the studied class. With the help of Figure 3, we can see very distinct transit mobility patterns among the classes, so the transit mobility network of each economic class has different structural properties. Some of the main properties are presented in Table 3, considering the aggregated network (all observed transit mobility) and each economic class separately. We show the total number of non-isolated nodes (N), the total number of edges, and the average measures introduced above. Analyzing Table 3, we can quantify the differences observed visually between the transit mobility networks of the economic classes of Curitiba. Although class 7 has almost twice as many unique users in our dataset as class 6, both have a high average degree $\langle k \rangle_\alpha$, making them the most connected networks. That is, users in these economic strata tend to visit a higher diversity of neighborhoods.
If we compare classes 5 and 6, which have practically the same number of unique users in our dataset and the same average strength $\langle s \rangle_\alpha$, class 5 has a much less connected network according to its average degree. Looking at the average path length $\langle d \rangle_\alpha$ of each class, we find that class 4 trips are more concentrated than those of the other classes, especially class 7, corroborating what is observed in Figure 3.

Temporal Patterns of Urban Transit Mobility
This section presents results about the temporal patterns of public transit mobility in Curitiba by different economic classes. For these analyses, we considered all trips made by the 14,632 unique users in our dataset.
Figure 4 shows the distribution of the number of smart card uses throughout the day. On weekdays, it is possible to observe two peaks, one between 5 a.m. and 7 a.m. and another between 4 p.m. and 6 p.m. At midday, there is a third peak, but it is minor compared to the other two. On weekends, the behavior is slightly different, with a peak between 5 a.m. and 7 a.m. and then a somewhat uniform distribution in the remainder of the period, except for class 4, which has a very sharp curve.
Looking at the results for individual classes during weekdays in more detail, we observe that the early-morning peak occurs later as the class increases. Class 4 peaks at 5 a.m., classes 5 and 6 at 6 a.m., and class 7 at 7 a.m. At midday, there is a small peak, but class 4 does not appear in it. The afternoon shows behavior similar to the morning: the lower the class, the earlier its displacement peak. While the peak for class 4 occurs at 4 p.m., for class 7 it occurs at 6 p.m. It is important to note that the peak window for class 4 is wider in this period, showing greater dispersion in transport utilization for this class.
On weekends, we can observe a very different behavior for class 4 compared to the other classes. Class 4 behaves similarly on weekdays and weekends but brings its afternoon peak forward to 3 p.m. This characteristic can be attributed to the fact that Saturday is also a working day, notably in industry, for most users of class 4, which may not be the case for users of other classes. Classes 5, 6, and 7 show the same morning behavior observed during weekdays, and the differences appear later in the day. From 10 a.m., these three classes show a homogeneous displacement, also with a small peak at 3 p.m.
For all classes, the vast majority of smart card users use the system only twice a day (more than 80%, Figure 5a), with an interval of more than 10 h between these uses (Figure 5b). This supports the hypothesis that public transit users use the system for displacements with specific purposes, such as going to work.
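The two quantities behind this observation, the number of uses per card per day and the interval between uses, can be derived as in the following sketch. The input layout (a list of `(card, timestamp)` tuples) is an assumption, and the interval is measured here between the first and last use of the day, which coincides with the interval between uses when a card is used exactly twice.

```python
from collections import defaultdict
from datetime import datetime


def daily_usage(entries):
    """Return {(card, date): (number_of_uses, hours_between_first_and_last_use)}."""
    by_card_day = defaultdict(list)
    for card, ts in entries:
        by_card_day[(card, ts.date())].append(ts)

    stats = {}
    for key, stamps in by_card_day.items():
        stamps.sort()
        gap_hours = (stamps[-1] - stamps[0]).total_seconds() / 3600
        stats[key] = (len(stamps), gap_hours)
    return stats


# Illustrative entries: card A is used twice on one day, card B once.
entries = [
    ("A", datetime(2018, 9, 5, 6, 30)),
    ("A", datetime(2018, 9, 5, 17, 30)),
    ("B", datetime(2018, 9, 5, 8, 0)),
]
stats = daily_usage(entries)
```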

Spatio-Temporal Patterns of Urban Transit Mobility
Now we analyze spatio-temporal characteristics of urban movements observed in the studied dataset. We aggregate smart card use in three time intervals: morning, from 5 a.m. to midday; afternoon, from midday to 6 p.m.; and evening, from 6 p.m. to midnight. The period between midnight and 5 a.m. was disregarded because the number of trips in this interval is negligible, representing only 0.05% of the total. In this period, Curitiba's public transit works with a minimal number of lines in operation, only 20, representing less than 10% of the lines in operation during the day. Figure 6 shows heatmaps of the geographical location of smart card use in each time interval by individuals from the four economic classes identified in Curitiba. According to the heatmaps, the city's central region receives the largest number of individuals, in line with the fact that it is the most economically active region in the city and has easy access to other regions through public transit. Except for class 4, all classes show an intra-class similarity between the afternoon and evening periods, which indicates a very similar geographic distribution of where users of these classes stay. In the case of class 4, whose neighborhoods are geographically more distant from the other regions, there is a large displacement in the morning, with a more uniform distribution in the afternoon and evening. A possible explanation is that the longer distances imply longer travel times, which prevent displacements between origin and destination at intermediate times of the day.
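The aggregation into three time intervals can be sketched as below. The function and field names are illustrative assumptions; the boundaries follow the text, with midday assigned to the afternoon and 6 p.m. to the evening, and entries between midnight and 5 a.m. discarded.

```python
from collections import Counter


def time_interval(hour):
    """Map an hour of the day (0-23) to the interval used in the analysis."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return None  # midnight to 5 a.m. is disregarded


def count_by_class_and_interval(entries):
    """Count entries per (economic class, interval), skipping disregarded hours."""
    counts = Counter()
    for e in entries:
        interval = time_interval(e["hour"])
        if interval is not None:
            counts[(e["econ_class"], interval)] += 1
    return counts


entries = [
    {"econ_class": 4, "hour": 5},
    {"econ_class": 4, "hour": 16},
    {"econ_class": 7, "hour": 7},
    {"econ_class": 7, "hour": 18},
    {"econ_class": 7, "hour": 2},  # disregarded
]
counts = count_by_class_and_interval(entries)
```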

Urban Displacement
In this section, we analyze the urban displacement performed by individuals from each of the economic classes. Using the radius of gyration r_g, with distances computed in kilometers, Figure 7 shows that individuals make shorter displacements as wealth increases.
Figure 7 also shows that classes 7 and 4 are very distinct. Taking the 50% mark of each class as a reference, class 7 travels around 4 km, whereas class 4 has a displacement of almost double, 8 km. For classes 5 and 6, 50% of the users have a radius of gyration of up to approximately 4.2 and 4.5 km, respectively, varying considerably less than the other classes.
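As an illustration of how a radius of gyration in kilometers can be obtained, the sketch below uses the common definition: the root mean square distance between a user's visited locations and their center of mass. The exact formulation used in this study is defined elsewhere, so the choices here are assumptions: great-circle (haversine) distances, and a center of mass taken as the plain average of latitudes and longitudes, which is a reasonable approximation at city scale. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root mean square distance (km) of visited points to their center of mass.

    points: list of (lat, lon) pairs, one per recorded smart card use.
    Assumption: the center of mass is the arithmetic mean of the coordinates.
    """
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    squared = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points)
    return math.sqrt(squared / n)
```

Computing this value per user and then plotting the cumulative distribution per class yields curves from which marks such as the 50% level can be read.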

Centrality of Places
This section analyzes the centrality measures mentioned above on the transit mobility networks constructed in our study. Table 4 shows that, according to the in-strength (i-s) metric, "Centro" (CE), located in the central region of Curitiba, is the neighborhood with the highest network centrality in almost all classes. This is because the largest displacement of the smart card users studied is towards this region. However, this is not the case for class 4, where "Centro" is the second most important neighborhood (Top 2), behind "Pinheirinho". That is because "Pinheirinho" contains a terminal that is convenient for users of class 4 to move throughout the city. This terminal enables fast access to all regions of Curitiba through bus transfers and, for this reason, it is a popular terminal in the city [22]. As such a transfer takes place inside the bus terminal, there is no need to pay a new fare for the trip and, consequently, there is no record of smart card use. Note that, in these cases, our approach, which relies on smart card activity, does not capture those bus changes. However, this does not compromise the analysis because we are interested in the main neighborhoods where people perform activities, not in those they merely pass through.
Analyzing the out-strength (o-s) metric, we observe that for classes 4 and 5, the largest outbound displacement occurs in the neighborhood with the largest population for these classes (see Table A1 in Appendix A). For classes 6 and 7, the picture is different. In class 6, the neighborhood with the largest outbound displacement is the Top 4 in terms of population, whereas in class 7 it is the Top 3. In other words, in both classes, the users who use public transport the most (using smart cards) do not belong to the most populous neighborhoods of their class; however, they do belong to neighborhoods with lower family income.
Looking now at the betweenness (bet) centrality, we can see that for almost all classes, the most relevant neighborhoods are also those with the largest out-strength centrality. In classes 5 and 6, the neighborhoods with the highest betweenness (Top 1) also have bus terminals with high demand [22], enabling transfers without paying a new fare to reach all city regions quickly. This makes these neighborhoods important bridges in urban displacement. For class 7, "Centro" appears again as the central neighborhood, now in the betweenness metric. This may be explained by the number of bus lines circulating in the city center, which facilitates movement. It is also interesting to note that, for class 4, all neighborhoods had a betweenness equal to zero.
The central places uncovered with different metrics and for different classes can be useful in different ways. The in-strength and out-strength measures allow a broader understanding of urban transit mobility, as they show the locations (neighborhoods) where users are coming from and where they are going. Note that the central places identified for individual classes differ from those obtained from the aggregated network. This information helps to understand the mobility demands of different economic classes, contributing, for example, to better urban interventions in the public transit system.
Betweenness, which identifies places that act as important bridges between different areas of the network, is a useful metric for implementing actions aimed at the city's population at the strategic points it indicates. These actions could range from strategic information dissemination to vaccination campaigns, to mention a few. Note that these strategies could be adapted to reach specific economic classes, based on the information obtained by our approach.
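To make the three metrics concrete, the following sketch computes in-strength, out-strength, and betweenness on a tiny hypothetical origin-destination network. It is not the implementation used in this study. The neighborhood labels and weights are invented, and the betweenness shown is the unnormalized, unweighted variant computed with Brandes' algorithm, which is an assumption because the text does not state whether edge weights enter the shortest paths.

```python
from collections import defaultdict, deque


def strengths(edges):
    """Return (in_strength, out_strength) for edges {(origin, destination): weight}."""
    in_s, out_s = defaultdict(float), defaultdict(float)
    for (origin, dest), weight in edges.items():
        out_s[origin] += weight  # trips leaving the origin
        in_s[dest] += weight     # trips arriving at the destination
    return dict(in_s), dict(out_s)


def betweenness(edges):
    """Unnormalized, unweighted directed betweenness (Brandes' algorithm)."""
    nodes = {n for pair in edges for n in pair}
    adj = defaultdict(list)
    for origin, dest in edges:
        if origin != dest:  # self-loops never lie on a shortest path
            adj[origin].append(dest)
    bet = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        order, queue = [], deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bet[w] += delta[w]
    return bet


# Hypothetical network: trips A->B (10), B->C (4), C->B (2)
edges = {("A", "B"): 10, ("B", "C"): 4, ("C", "B"): 2}
in_strength, out_strength = strengths(edges)
bet = betweenness(edges)
```

In this toy network, B receives the most trips and is the only node lying on a shortest path between two others (A to C), so it is the only one with nonzero betweenness. A network in which no neighborhood sits between two others yields zero betweenness everywhere.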

Validation
This section aims to validate our approach with other official data as well as a random graph model.

Household Travel Survey Data
In 2016, the Curitiba Institute for Urban Planning and Research (IPPUC) conducted a household travel survey [12] (IPPUC survey) to draw an overview of transportation displacements and demands in Curitiba and its metropolitan region, and to support the definition of transit mobility strategies for the capital with greater confidence.
The research was carried out in three distinct stages: home interviews, volumetric counting, and vehicle speed measurement, as well as an opinion poll in which residents could assess the quality of public transit and the road system as a whole. We use these data to create an origin-destination network, as we did with the smart card dataset investigated in this study.

Random Graph Model
We also introduced a random graph model that creates a network with the same structure as the one used so far. This random model has the same nodes as our graph, the 74 neighborhoods. The edges connecting the neighborhoods were generated randomly, keeping the same number of edges as in our graph (14,632). To study the similarity between the graphs, 100 random models were generated.
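One possible reading of this random model is sketched below. Since 74 neighborhoods allow only 74 × 74 = 5,476 ordered origin-destination pairs, 14,632 edges cannot all be distinct, so the sketch assumes that repeated pairs accumulate as edge weights and that self-loops are allowed. Both points are assumptions, as are the uniform sampling, the placeholder node names, the seed, and the function name `random_od_network`.

```python
import random
from collections import Counter

N_NEIGHBORHOODS = 74   # nodes, as in the studied graph
N_EDGES = 14632        # edges, as in the studied graph
N_MODELS = 100         # number of random realizations


def random_od_network(nodes, n_edges, rng):
    """Place n_edges unit edges on uniformly random ordered node pairs.

    Repeated pairs accumulate, so the result maps (origin, destination)
    to an integer weight. Assumption: self-loops are allowed.
    """
    weights = Counter()
    for _ in range(n_edges):
        origin = rng.choice(nodes)
        dest = rng.choice(nodes)
        weights[(origin, dest)] += 1
    return weights


# Placeholder node names standing in for the 74 neighborhoods
nodes = [f"n{i}" for i in range(N_NEIGHBORHOODS)]
rng = random.Random(42)  # fixed seed, only for reproducibility of the sketch
models = [random_od_network(nodes, N_EDGES, rng) for _ in range(N_MODELS)]
```

Each realization can then be compared with the observed networks, and the resulting similarity values averaged over the 100 realizations.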

Comparisons
In this section, we compare the three origin-destination networks. Figure 8 provides a visual comparison of the displacement identified by these three networks.
We can see that the network built from the IPPUC survey data is more connected than the one from our study. Analyzing the data, we observed that, in the home interview phase, interviewees who traveled by public transit in more than one stage reported every connection (step) they used.
Curitiba has a public transit system that allows transfers to different buses at transit terminals and stations while paying only one fare, i.e., registering only once with the smart card. Because our research uses the smart card records in the system, we were unable to observe the steps reported in the IPPUC survey (the average number of these steps ranges from 1.02 to 1.92 across the neighborhoods of Curitiba). In our study, each displacement appears as a single step, since the dataset does not offer the richness of detail that an interview can extract. In the network produced from the IPPUC survey, all stages from origin to destination are represented. For a better validation of the results, we performed some quantitative analyses. Figure 9a, which shows the distribution of node degree centrality, makes this difference visible: the IPPUC survey network has a higher degree precisely because of the details given by the interviewees.
The random model shows a very different behavior, demonstrating that what is captured by our approach is far from random. This difference supports the conclusion that our methodology is well structured. The betweenness comparison in Figure 9b further highlights the difference between the displacement pattern reported in the IPPUC survey and the pattern observed in our study. The betweenness curve of our study is higher and has higher peaks precisely because it does not include all the displacement steps reported in the IPPUC survey. Again, as expected, the betweenness of the random model behaves differently from the other two cases.
To verify the similarity between these three networks, we used Pearson's correlation coefficient and the Root Mean Square Error (RMSE). Comparing our study and the IPPUC survey with Pearson's correlation coefficient, we obtained a value of 0.4582 (p-value < 0.001), indicating that both studies capture a similar phenomenon and can potentially be complementary. Comparing our network and the random model, we obtained a Pearson's correlation of 0.0064 (with a confidence interval of ±0.0056). The correlation between the IPPUC survey network and the random model was 0.0114 (with a confidence interval of ±0.0081). Pearson's correlation between the random network and the other two networks thus shows a lack of linear correlation in both cases.
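A minimal sketch of this kind of comparison is given below, assuming that each network is flattened into a vector of origin-destination weights in a fixed order before correlating. The toy networks, the helper names, and the normal-approximation interval with z = 1.96 are all assumptions; the text does not state how the networks were vectorized or which confidence level was used.

```python
import math
import statistics


def flatten(od, nodes):
    """Flatten an OD dict {(origin, destination): trips} into a fixed-order vector."""
    return [od.get((o, d), 0) for o in nodes for d in nodes]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_ci(values, z=1.96):
    """Mean and interval half-width over repeated runs (normal approximation)."""
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), half_width


# Hypothetical toy networks over three neighborhoods
nodes = ["A", "B", "C"]
smart_card = {("A", "B"): 10, ("B", "C"): 4, ("C", "A"): 1}
survey = {("A", "B"): 8, ("B", "C"): 5, ("C", "A"): 2, ("A", "C"): 1}

r = pearson(flatten(smart_card, nodes), flatten(survey, nodes))
```

For the random model, `pearson` would be evaluated once per realization and the list of coefficients passed to `mean_ci` to obtain a mean with a ± half-width.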
We now use the RMSE to assess the similarity between the networks. Comparing our approach with the IPPUC survey, the value obtained was 17.27. Comparing our approach with the random network, the RMSE was 132.0031, and comparing the IPPUC survey with the random model, it was 130.4139. The much lower error between our approach and the IPPUC survey indicates, again, that both networks capture a similar phenomenon. As with Pearson's coefficient, the RMSE between the random network and the other two networks shows the lack of similarity between them.
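The RMSE comparison can be illustrated with the short sketch below, which assumes the networks have already been flattened into equal-length vectors of origin-destination weights on a comparable scale. The vectors are hypothetical, and how the original networks were scaled before computing the error is not stated in the text.

```python
import math


def rmse(x, y):
    """Root Mean Square Error between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical flattened OD weights for four origin-destination pairs
ours = [0, 10, 0, 4]
survey = [0, 8, 1, 5]
random_like = [4, 3, 4, 3]  # roughly uniform weights, as a random model would give

err_survey = rmse(ours, survey)       # small: similar structure
err_random = rmse(ours, random_like)  # larger: dissimilar structure
```

Because RMSE depends on the magnitude of the weights, the values are only meaningful relative to each other, which is why the comparison against the random model serves as the baseline.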

Final Discussions and Conclusions
This work presents a study on public transit urban mobility in Curitiba based on data available from the city's public agencies. We analyze the geographical distribution of travel origins and destinations and the temporal pattern of these trips, adding information about the economic status of smart card users. A summary of the results can be seen in Table 5.
Table 5, centrality results:
Class 5. i-s: "Centro" neighborhood, the region that receives the largest displacement of the smart card users studied; o-s: the largest outbound displacement occurs in the neighborhood with the largest population for this class, "Cidade Industrial"; bet: "Cidade Industrial" neighborhood has a bus terminal enabling transfers without paying a new fare to reach all city regions quickly.
Class 6. i-s: "Centro" neighborhood, the region that receives the largest displacement of the smart card users studied; o-s: "Bairro Alto" is not the most populous neighborhood for this class, but it is one of the neighborhoods in this class with the lowest family income; bet: "Bairro Alto" neighborhood has a bus terminal enabling transfers without paying a new fare to reach all city regions quickly.
Class 7. i-s: "Centro" neighborhood, the region that receives the largest displacement of the smart card users studied; o-s: "Centro" is not the most populous neighborhood for this class, but it is one of the neighborhoods in this class with the lowest family income; bet: the number of bus lines circulating in "Centro" neighborhood facilitates the movement of users.
The transit mobility networks associated with each of the economic classes are quite different. The most connected networks are those of classes 6 and 7 (the higher-income brackets). Although class 5 has the same strength as class 6, it connects to fewer neighborhoods. Class 4, on the other hand, makes smaller displacements when we compare the path length traveled by its users.
By observing travel's temporal behavior, approximately 80% of users use the public transit system up to twice a day, with an average interval of 10 h. There are two peaks in demand throughout the day; one very early and one in the late afternoon. During weekdays these peaks are delayed as economic class increases. This behavior shows that higher classes may have greater flexibility in their working hours, which does not happen in a factory production line, for example, where many employees tend to belong to class 4. It is also possible that class 4 has more people doing informal jobs, but these are hypotheses that this work does not capture. For weekends, while class 4 presents the same behavior compared to weekdays, classes 5, 6, and 7 change their transportation use time, being practically constant after 10 a.m.
In the spatio-temporal analysis, we observed that the city's central region is the most accessed, regardless of economic class. We found that higher classes move less according to the radius of gyration: while 80% of individuals in class 7 travel distances less than or equal to 5 km, the same share of class 4 reaches 9 km. The other classes (5 and 6) show a significant similarity of displacement in two of the three observed periods (afternoon and evening).
Based on the network centralities, we observe that for classes 4 and 5, the most populous neighborhoods of these classes are also the most important in terms of out-strength and betweenness. In contrast, for classes 6 and 7, the situation is slightly different. The most relevant neighborhoods are not among the Top 3 in terms of population, but they are among the Top 7 in the lowest family income category. In this study, "Centro", a neighborhood in the city's central region, is the most relevant one in all metrics. A notable feature of this neighborhood is its high intra-neighborhood displacement, explained by the number of bus lines in this region, which facilitates displacement.
Finally, when comparing our study with the recently conducted household travel survey and with a random model, we observed significant similarities between our study and the survey. An interview undeniably provides more detailed information. Still, the measures of both networks show high similarities, indicating that our approach, easier and cheaper to conduct than a survey, could be a complementary tool for city managers. Comparing the random model with our study and with the survey, we found a lack of linear correlation in both cases.
Our study shares similar conclusions with the study of Lotero et al. [10]. However, as mentioned earlier, our smart card-based approach is easier and cheaper to implement than a household travel survey (the data source used by [3,10]). Interviews and questionnaires are expensive to conduct and often take a considerable amount of time. We also observe behavioral patterns that differ from those reported for other cities in previous studies. Oviedo et al. [11] found that the implementation of the BRT system in Lima enabled greater accessibility for the higher-income population, unlike the lower-income classes. Our study, on the other hand, observed that users from all economic classes of Curitiba move to all regions of the city, even though they present different characteristics in terms of mobility. Such a conclusion demonstrates the importance of studying particular cities before generalizing results.
The methodology and analysis covered in this study can be adapted to any city or region with transit access data. This information should make it possible, directly or indirectly, to better understand the use of urban space and to compare particular areas. By using different databases, it was possible to enrich the characterization of public transit mobility patterns of different economic classes, which is essential to help create better urban policies that consider these aspects.
The mobility pattern centered on individual motorized transportation proves to be unsustainable, both in terms of environmental protection and in meeting the displacement demands of current urban life. The traditional response to congestion problems, by increasing road capacity, stimulates the use of the car and generates new congestion, feeding a vicious cycle responsible for the degradation of air quality, global warming, and compromised quality of life in cities, such as a significant increase in noise levels, loss of time, degradation of public space and stress [34]. Thus, considering different economic classes, a view of urban transit mobility allows us to understand how people in these classes use urban spaces, providing subsidies for more sustainable economic development proposals.
Although the difference in income between classes is very subtle according to Census data, the difference in transit mobility patterns between these classes is noticeable. Besides helping to propose new interventions to promote more sustainable development, there are many other advantages to incorporating socioeconomic differences when studying users' mobility. For example, this could help develop new models of contagion processes or new mechanisms to disseminate strategic information in urban areas.

Conflicts of Interest:
The authors declare no conflict of interest. Table A1 summarizes some key demographics of Curitiba City.