This section presents the research methodology. It first gives an overview of the methodology and then determines the relationship between the urban mobility indicators and the urban mobility index as part of the urban mobility assessment process.
3.1. The Methodology Overview
This section defines the methodology for establishing an expert system for the urban mobility index calculation (Figure 1). An expert system is defined as a system in which human knowledge is embedded and applied to solve problems in a specific area, in a way similar to a human expert [72]. A corresponding subset of information was extracted from the telecommunication activity records of the public mobile communication networks. It contained the time of the telecommunication activity, the location of the base station to which the user was connected while performing the activity, and a designation concealing the real identity of the user. This information is used to define three urban mobility indicators: the number of trips (migrations), the trip duration, and the Euclidean distance. These indicators are the inputs to the model. The first indicator is the number of trips. An algorithm determines this indicator, and the result is a trip matrix whose fields contain the total number of trips between specific pairs of base stations (in the associated urban space) in a predefined timeframe. The second indicator is the trip duration, calculated as the median duration of the trips carried out between a specific base station pair over a certain period of time. The third indicator is the (Euclidean) distance between the base station pairs. The calculated indicator values are the inputs to the urban mobility estimation, i.e., to the expert system-based model developed in the next step.
That step defines an expert system that puts the predefined indicators into appropriate relations in order to assess the overall urban mobility on the basis of the “knowledge learned”. The learning includes a process of “knowledge extraction” from a wide range of experts in the field of urban mobility. Technically, the knowledge is acquired through a survey, and the expert system is established from the survey results by using a fuzzy logic method, namely the adaptive neuro-fuzzy inference system (ANFIS) machine learning technique.
The result is an expert system that defines and develops a model for calculating an urban mobility index based on the indicators derived from the telecommunication activity records of the public mobile communication network users. According to the proposed model, the urban mobility index is calculated for each base station pair (of the corresponding urban areas). As such, it represents a share of the total mobility index and is therefore called a partial index (partiality refers to taking into account only a part of the observed agglomeration space). The values of the partial urban mobility indices are then used to calculate the urban mobility index of the whole area for a predefined timeframe.
In step 1.1, the input to the model is a subset of the CDR records that are anonymized (depersonalized) ex ante and from which all information is removed except for the temporary identification of the user, the time of the telecommunication activity, and the geographical location of the base station to which the user was connected while performing the activity. The data are taken from [73]. In step 1.2, the data file is transformed into a table. In step 1.3, the base stations stored in the associated table are identified. In step 1.4, the coverage area of each base station is identified using the Voronoi procedure [42,74,75]. The indicators are calculated primarily at the level of all base station pairs. In step 2.1, the Euclidean (straight-line) distance between each pair of base stations is calculated. The distance between the two locations is calculated using the haversine formula [76]. In the next step, the data is divided into subsets, since different timeframes can be defined within one day, depending on the purpose of the analysis. In step 4.1, the changes of location are identified for each trip and user in every individual dataset [51]. Within the same step, indicator B, the trip duration, is defined as the time difference between the recorded start of the activity at the origin cell and the recorded start of the activity at the destination cell for each individual trip. It is calculated by using Relation (1):
$$B = \Delta t = t_{d} - t_{o} \qquad (1)$$
where:
$B$ is the indicator of the duration of the journey for each individual trip;
$t_{o}$ is the time at which the user activity was recorded at the base station identified as the trip origin;
$t_{d}$ is the time at which the user activity was recorded at the base station identified as the trip destination;
$\Delta t$ is the travel time between the two base stations.
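Steps 2.1 and 4.1 can be illustrated with a minimal sketch. It assumes base station locations given as latitude and longitude in degrees and activity timestamps given as `datetime` objects; the function names and the Earth radius constant are illustrative assumptions, not taken from the study.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_duration_min(t_origin, t_destination):
    """Relation (1): time between the recorded activity start at the
    origin cell and at the destination cell, in minutes."""
    return (t_destination - t_origin).total_seconds() / 60.0


# Hypothetical trip: activity at the origin at 08:00, at the destination at 08:25.
duration = trip_duration_min(datetime(2020, 1, 1, 8, 0),
                             datetime(2020, 1, 1, 8, 25))
```

The haversine formula returns the great-circle distance, which at urban scales serves as the straight-line distance between base stations.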
Within steps 5.1 and 5.2, the tables are merged and the redundant data removed. The result of this procedure is a set of trips with the assigned distances between the base station pairs. Next, in step 5.3, the travel speed of each individual trip is calculated as the quotient of the linear distance and Δt, as shown in Equation (2):
$$v = \frac{C}{\Delta t} \qquad (2)$$
where:
$v$ is the speed of movement for each individual journey within a timeframe;
$C$ is the Euclidean distance between the base station pair;
$\Delta t$ is the duration of the journey, for each individual trip within a timeframe.
In step 5.4, the data is filtered. Records in which the time difference between the two recorded events is less than the defined time threshold are removed. All trips shorter than 10 min or longer than 60 min are removed [38,51,74], as are records in which the linear distance between the base stations is less than the defined spatial threshold of 1 km [38,51,74]. Finally, records whose calculated travel speed is not realistic in the context of urban migrations are removed, i.e., all trips with a travel speed greater than 100 km/h.
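The speed calculation of step 5.3 and the filtering of step 5.4 can be sketched as follows. The thresholds are the ones stated in the text; the function and constant names are illustrative, and treating the boundary values (exactly 10 min, 60 min, 1 km, 100 km/h) as valid is an assumption.

```python
MIN_DURATION_MIN = 10.0   # trips shorter than this are removed
MAX_DURATION_MIN = 60.0   # trips longer than this are removed
MIN_DISTANCE_KM = 1.0     # spatial threshold
MAX_SPEED_KMH = 100.0     # upper bound for a realistic urban speed


def speed_kmh(distance_km, duration_min):
    """Equation (2): linear distance divided by trip duration."""
    return distance_km / (duration_min / 60.0)


def keep_trip(distance_km, duration_min):
    """Return True if the trip passes all filters of step 5.4."""
    # The duration check comes first, so a zero duration never reaches the division.
    if not (MIN_DURATION_MIN <= duration_min <= MAX_DURATION_MIN):
        return False
    if distance_km < MIN_DISTANCE_KM:
        return False
    return speed_kmh(distance_km, duration_min) <= MAX_SPEED_KMH


# Hypothetical trips as (distance in km, duration in min).
trips = [(5.0, 20.0), (5.0, 5.0), (0.5, 20.0), (30.0, 12.0)]
valid_trips = [t for t in trips if keep_trip(*t)]
```

In this hypothetical sample, only the first trip survives: the second is too short in time, the third too short in distance, and the fourth implies a speed of 150 km/h.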
Then, in step 5.5, the dataset is divided into two sets: the calibration set and the model validation set. The calibration set consists of 80% of the total dataset, and the validation set consists of the remaining 20%. All further steps are performed on both datasets for each timeframe. In step 6, the indicator values are calculated for each timeframe and stored in three matrices per timeframe: the trip matrix (O-D matrix), the distance matrix, and the time matrix. In step 6.1, the trip matrix (O-D matrix) is defined, containing the sum of all trips between the individual base station pairs within the appropriate time period. The indicator of the number of trips between the individual base station pairs, $A$, is calculated using Relation (3):
$$A = \sum_{k} p_{k} \qquad (3)$$
where:
$A$ is the total number of the recorded (registered) trips between the individual base station pairs (trip matrix);
$p_{k}$ is a trip between two base stations in an appropriate timeframe, all assumptions being met.
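A minimal sketch of steps 5.5 and 6.1 is given below. Each trip is assumed to be an (origin station, destination station) tuple for one timeframe. The text does not say how the 80/20 split is drawn, so the seeded random shuffle is an assumption, as are the function names.

```python
import random
from collections import Counter


def split_calibration_validation(trips, calibration_share=0.8, seed=0):
    """Step 5.5: split trips into a calibration and a validation set.
    The random shuffle is an assumed splitting strategy."""
    shuffled = list(trips)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * calibration_share))
    return shuffled[:cut], shuffled[cut:]


def trip_matrix(trips):
    """Step 6.1: count trips per (origin, destination) base station pair.
    The Counter acts as a sparse O-D matrix; missing pairs count as 0."""
    return Counter((origin, destination) for origin, destination in trips)


# Hypothetical filtered trips between base stations "a" and "b".
sample = [("a", "b")] * 3 + [("b", "a")] * 2
od = trip_matrix(sample)
```

A sparse mapping is used instead of a dense matrix because most base station pairs in a city have no trips in a given timeframe.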
In step 6.2, a distance matrix is calculated, containing the distances of all base station pairs between which a trip is generated within an appropriate time period. In step 6.3, a trip duration matrix is formed. It contains the median trip duration between all base station pairs for the corresponding time period, calculated according to the methodology described in step 4.1.
The indicator $\tilde{B}$ is calculated using Expression (4), as the median value of all travel times between base station pairs within the appropriate time period:
$$\tilde{B} = \operatorname{median}(B_{1}, B_{2}, \ldots, B_{n}) \qquad (4)$$
where:
$\tilde{B}$ is the median value of all travel times between each pair of base stations over a timeframe;
$B_{k}$ is the value of the trip duration for each individual trip in a timeframe;
$n$ is the number of trips registered between each base station pair in a timeframe.
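The trip duration matrix of step 6.3 can be sketched as a mapping from base station pairs to median durations. The input layout, (origin, destination, duration in minutes), and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def median_duration_matrix(trips):
    """Step 6.3: median trip duration per (origin, destination) pair
    for the trips of one timeframe."""
    durations = defaultdict(list)
    for origin, destination, duration_min in trips:
        durations[(origin, destination)].append(duration_min)
    return {pair: median(values) for pair, values in durations.items()}


# Hypothetical trips of one timeframe.
sample = [("a", "b", 10), ("a", "b", 20), ("a", "b", 50),
          ("b", "a", 30), ("b", "a", 40)]
durations = median_duration_matrix(sample)
```

For an even number of trips, `statistics.median` returns the mean of the two middle values, so the pair ("b", "a") above gets 35 min.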
In step 6.4, the data normalization is performed. The previous steps result in values stored in three matrices for each time period. Since the day is divided into eight time periods, the output of the previous step comprises 24 matrices (8 × 3), which are then used in the urban mobility index calculation. In order to ensure that the model is applicable to all urban areas, regardless of their size, the obtained data need to be normalized, i.e., reduced to the interval [0,1]. The value of an indicator in a given interval is normalized with the highest value of that indicator over all eight intervals. The purpose of normalizing in this way is to include specific data in the calculation, i.e., to take into consideration to what extent an individual indicator participates in the total mobility of an urban agglomeration in each individual timeframe, and to identify its maximum. The number of trips is normalized by first identifying the maximum value of the trip number indicator for each origin and destination pair over all eight time periods. These values are stored in a new matrix called the matrix of the maximum trip quantities. The normalized trip quantities are then calculated by dividing the indicator values in the eight matrices (one per time period) by the corresponding values of the maximum trip quantities matrix, thus obtaining the normalized trip matrices for each time period. In step 6.6, the trip duration indicators are normalized by identifying the maximum value of the trip duration indicator for each origin and destination pair over all predefined time periods; the resulting matrix is called the matrix of the maximum trip duration.
This is followed by the calculation of the normalized trip duration values, by dividing the indicator values in the eight matrices by the corresponding values of the maximum trip duration matrix, thus obtaining the normalized trip duration matrices for each time period. Then, in step 6.7, the distance indicators are normalized. Unlike the two previously described indicators, the value of the distance indicator does not change over time. Therefore, the length of each section between an origin and destination pair is normalized with the length of the total network, i.e., with the total length of all the sections included in the trips during all timeframes.
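The normalization steps can be sketched as follows. Each timeframe matrix is assumed to be a mapping from a base station pair to an indicator value. Reading "the total length of all the sections" as the sum of the distances of all distinct pairs with at least one trip is an interpretation, and the function names are illustrative.

```python
def normalize_by_pair_max(matrices):
    """Normalize time-varying indicators (trips, durations): divide each
    value by the maximum of the same pair over all timeframes."""
    pair_max = {}
    for matrix in matrices:
        for pair, value in matrix.items():
            pair_max[pair] = max(pair_max.get(pair, 0), value)
    return [
        {pair: (value / pair_max[pair] if pair_max[pair] else 0.0)
         for pair, value in matrix.items()}
        for matrix in matrices
    ]


def normalize_distance(distances):
    """Step 6.7: divide each pair's distance by the total network length
    (assumed here to be the sum over all pairs with trips)."""
    total = sum(distances.values())
    return {pair: d / total for pair, d in distances.items()} if total else {}


# Two hypothetical timeframes instead of the eight used in the text.
trips_per_timeframe = [{("a", "b"): 2}, {("a", "b"): 8}]
normalized_trips = normalize_by_pair_max(trips_per_timeframe)
normalized_dist = normalize_distance({("a", "b"): 1.0, ("b", "c"): 3.0})
```

With this scheme, each pair reaches the value 1 in the timeframe where its indicator peaks, which is what makes the result independent of city size.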
Next, in step 7, a partial urban mobility index is calculated. The parameter $UMI_{p}$, i.e., the partial urban mobility index for the corresponding base station pair in the appropriate timeframe, is calculated using the model defined in the following sections. The partial urban mobility index is defined as a function of the normalized indicator values, namely the total number of trips recorded between the individual base station pairs, the median trip duration between the individual base station pairs in a specific timeframe, and the Euclidean distance between the base station pairs, as shown in Expression (5). The result is a parameter within the [0,1] interval, indicating the value of the mobility estimate, with “0” indicating the lowest and “1” indicating the highest mobility level.
$$UMI_{p} = f(A_{n}, B_{n}, C_{n}) \qquad (5)$$
where:
$A_{n}$ is the normalized value of the number of trips for the corresponding base station pair for all timeframes within one day;
$B_{n}$ is the normalized value of the median trip duration for the corresponding base station pair for all timeframes within one day;
$C_{n}$ is the normalized value of the distance indicator for the corresponding base station pair.
In step 8, the urban mobility index (UMI) is calculated for each timeframe as the median of the partial urban mobility indices of all base station pairs (urban areas) between which a trip was made during the specific timeframe (O). The urban mobility index can have a value from zero to one, i.e., it can be presented as a percentage from zero to 100%. It is calculated by using Formula (6):
$$UMI_{O} = \operatorname{median}(UMI_{p,1}, UMI_{p,2}, \ldots, UMI_{p,N}) \qquad (6)$$
where:
$UMI_{O}$ is the urban mobility index for the timeframe;
$UMI_{p,k}$ is the partial urban mobility index for the corresponding base station pair within the appropriate timeframe;
$N$ is the total number of origin and destination pairs for which the mobility assessment was performed.
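As a minimal sketch of step 8, the index of a timeframe can be obtained as the median of the partial indices. In the methodology the partial indices come from the expert system model; the values below are hypothetical placeholders, and the function name is an assumption.

```python
from statistics import median


def urban_mobility_index(partial_indices):
    """Step 8: median of the partial urban mobility indices of all
    base station pairs with at least one trip in the timeframe."""
    values = list(partial_indices.values())
    if not values:
        raise ValueError("no base station pairs with trips in this timeframe")
    return median(values)


# Hypothetical partial indices for three base station pairs.
partial = {("a", "b"): 0.2, ("b", "c"): 0.6, ("a", "c"): 0.4}
umi = urban_mobility_index(partial)   # value in [0, 1]
umi_percent = 100.0 * umi             # the same index as a percentage
```

Using the median rather than the mean keeps a few extreme base station pairs from dominating the index of the whole area.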
In step 9, after calculating the partial urban mobility index, an additional index can be calculated in order to provide a broader picture of the urban mobility assessment and to complement the partial mobility index. This additional indicator is called the coefficient of the total mobility share ($K$). For each base station pair, two indicators are defined: the number of trips and the Euclidean distance between the stations. Total mobility can be defined as the product of the number of trips and the distance between the base stations of a pair. The unit of this parameter is the passenger/transport kilometer (pkm). This unit is common in traffic engineering and is based on the number of passengers actually transported and the distance actually traveled [11,12,13]. The coefficient is calculated using Expression (7):
$$K = A \cdot C \qquad (7)$$
where:
$K$ is the total mobility share coefficient for each origin and destination pair (base station pair);
$A$ is the value of the trip number indicator in a specific timeframe;
$C$ is the value of the distance indicator for the corresponding base station pair.
Therefore, by multiplying the values of the trip number indicators by the distance indicators, it is possible to calculate the total mobility, which we define as the sum of all products of the trip number and distance indicator values over all base station pairs within a specific timeframe. The sum of the total mobility values of all timeframes represents the total daily mobility. The total mobility at the level of an urban agglomeration within a specific timeframe is calculated by applying Formula (8), as the sum of all mobility quantities:
$$TM_{O} = \sum_{k=1}^{N} A_{k} \cdot C_{k} \qquad (8)$$
where the sum runs over all $N$ base station pairs, and $A_{k}$ and $C_{k}$ are the values of the trip number indicator and the distance indicator of pair $k$ in the timeframe.
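The total mobility of a timeframe, and the daily total, can be sketched as follows. The mappings from base station pairs to trip counts and to distances in km, as well as the function names, are assumptions made for illustration.

```python
def total_mobility_pkm(trip_counts, distances_km):
    """Total mobility of one timeframe: sum over all base station pairs
    of (number of trips x distance), in passenger-kilometers."""
    return sum(n * distances_km[pair] for pair, n in trip_counts.items())


def daily_mobility_pkm(trip_counts_per_timeframe, distances_km):
    """Total daily mobility: sum of the timeframe totals."""
    return sum(total_mobility_pkm(tc, distances_km)
               for tc in trip_counts_per_timeframe)


# Hypothetical data: two base station pairs, two timeframes.
distances = {("a", "b"): 2.5, ("b", "c"): 5.0}
morning = {("a", "b"): 10, ("b", "c"): 4}
evening = {("a", "b"): 6}
morning_total = total_mobility_pkm(morning, distances)
daily_total = daily_mobility_pkm([morning, evening], distances)
```

In this hypothetical case the morning timeframe contributes 10 × 2.5 + 4 × 5.0 = 45 pkm and the evening 15 pkm, giving a daily total of 60 pkm.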
In the final step of the process, step 10, the verification and validation are performed. The purpose of the verification in this part of the procedure is to confirm that the defined algorithms and software correctly calculate the segment for which each algorithm is intended. An illustration of the relationship between the indicators and the parameters in the urban mobility index calculation is shown in Figure 2.
3.2. The Relationship between the Mobility Indicators and the Urban Mobility Index Assessment
The relationship between the mobility indicators and the urban mobility index assessment is determined from input data collected through the survey method. The questionnaire was designed to collect the experts’ opinions through a set of questions about how and to what extent the combination of the values of the individual urban mobility indicators affects mobility. The questionnaire contains questions in the form of scenarios that include the urban mobility indicator values, defined so that they capture the characteristic indicator values within the predefined ranges, as well as the proposed mobility estimation procedure. On the basis of their own judgment and by applying the proposed mobility assessment procedure, the experts assign an appropriate value to each question. The survey results establish a link between the baseline values of the indicators from the scenario and the value of the mobility estimate.
In line with the previously defined methodology, urban mobility is estimated by using the three indicators obtained from the analysis of the telecommunication service billing records for the use of the public mobile communications network. The indicators are the number of trips, the trip duration, and the distance travelled. The number of trips relates to the whole system, i.e., the respondent estimates mobility in the context of, and based on, the total number of mobility participants in a specific time period. The other indicators, i.e., trip duration and distance travelled, refer to an individual trip, and the respondent assesses mobility by looking at these values from an end-user perspective. In the survey questionnaire, the number of trips is defined as follows: the indicator of the number of trips refers to the number of movements between the individual urban areas in a defined timeframe. It represents the number of all recorded movements, of all users, between the individual urban areas in an appropriate timeframe [3,36]. The values are categorized as follows [3,5,63,77]: A small number of trips is the number that, during a typical urban day, is mainly present in periods when there are usually no trips related to work, entertainment, recreation, commerce, or social events. A medium number of trips is the number that, in urban conditions, is characteristic of the period outside the peak traffic period, for example, the period when travelling is usually motivated by commerce, recreation, or social activities. A large number of trips is the number that, in urban conditions, is characteristic of the peak traffic periods, i.e., the periods of a typical day when residents travel to or from work.
The trip duration indicator refers to the duration of each individual trip. However, in order to apply this indicator to all urban surroundings, i.e., to cities of all sizes, it must be relativized by calculating the proportion of a single trip in relation to the longest registered trip, thus normalizing the duration of a single trip in relation to the longest trip in a characteristic day. In that sense, the longest trip is the one with the longest duration whose origin and destination remain within the same urban agglomeration, i.e., the coverage area. For this reason, the values of the trip duration indicator are defined as in Table 1.
The distance indicator is used as an approximation of the distance travelled [1,3,36]. In the questionnaire, the distance travelled indicator was defined as follows: the distance indicator refers to the distance travelled by the user during the trip between individual urban areas, i.e., it represents the distance between the trip origin and destination. The value of the trip distance indicator can be relativized by calculating the proportion of a single trip in relation to the longest registered trip, thus normalizing the distance travelled within a single trip with respect to the longest journey in a characteristic day. The longest trip is the one during which the distance travelled is the greatest, with the origin and destination remaining within the same urban agglomeration, i.e., the coverage area. For this reason, the values of the distance indicator are defined as in Table 2.
If the value of one of the indicators is equal to zero, the trip did not happen and the mobility is equal to zero, i.e., there is no mobility.
A mobility assessment was then defined. Respondents assigned one of the six predefined mobility ratings to the appropriate scenario, as explained in the following paragraphs. The rating categories, as well as the related descriptions further in the text, are based on the ratings related to the service level, service quality, and customer satisfaction [13,53,54]. With “extremely high mobility”, the traffic system is characterized by completely free traffic flow and full freedom of maneuvering: participants choose their speed, the freedom of movement is not disturbed, and waiting times are minimal. The users are extremely satisfied. With “high mobility”, the traffic system is characterized by free traffic flow: the freedom of maneuvering is not disturbed, the speed of movement is minimally restricted, the freedom of movement is limited to a small extent, and waiting times are rare and short. The users are satisfied. With “higher medium mobility”, the traffic system is characterized by stable traffic flow, limited maneuvering freedom, and limited speed. Freedom of movement is also limited and waiting times are noticeable. The users are mostly satisfied. With “lower medium mobility”, the traffic system is dominated by unstable traffic conditions, with low maneuverability, substantially limited speeds, clearly limited freedom of movement, and higher average waiting times. The users are mostly dissatisfied. With “low mobility”, the transport system is characterized by unstable traffic flow with queuing, almost constantly restricted maneuverability, significantly limited movement speed, delays, and long waiting times. The users are dissatisfied. With “extremely low mobility”, the transport system is characterized by forced traffic flow, with the maneuvering freedom and the freedom of movement completely disabled, movement speeds lower than the critical speeds, and extremely long waiting times. The users are extremely dissatisfied.
Survey questions were then formed. The questions cover all the combinations of the indicator values included in the mobility assessment. The experts answer the questions based on their own knowledge and experience, by assigning an appropriate value to the mobility estimate in each question. The questions are designed by using the indicator values shown in Table 3.
The questions are designed to encompass the values of all the indicators involved and follow the template: If <number of trips>, <trip duration indicator> and <travelled distance indicator>, mobility is <mobility estimate>.
For example, If <number of trips = low number of trips>, <trip duration indicator = medium trip duration> and <travelled distance indicator = long distance>, mobility is <mobility estimate = choose a value from 1 to 6, where “6” represents “extremely high mobility”, “5” represents “high mobility”, “4” represents “higher medium mobility”, “3” represents “lower medium mobility”, “2” represents “low mobility”, and “1” represents “extremely low mobility”>.
Each indicator has three possible values, whereas the mobility estimate has six values. Combining all the values of the three indicators (3^3) gives a total of 27 questions.
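The 27 scenarios can be enumerated with a short sketch. The level labels follow the wording of the example question and of the result analysis; the ordering low/short first, with the number of trips varying slowest, is an assumption. It is consistent with the question numbers cited in the text, e.g., Question 14 (medium, medium, medium) and Question 21 (high, short, long).

```python
from itertools import product

TRIP_LEVELS = ("low", "medium", "high")          # number of trips
DURATION_LEVELS = ("short", "medium", "long")    # trip duration
DISTANCE_LEVELS = ("short", "medium", "long")    # distance travelled

# Every combination of the three indicator levels: 3 * 3 * 3 = 27 scenarios.
scenarios = list(product(TRIP_LEVELS, DURATION_LEVELS, DISTANCE_LEVELS))

for number, (trips, duration, distance) in enumerate(scenarios, start=1):
    print(f"Q{number}: If number of trips = {trips}, "
          f"trip duration = {duration} and distance = {distance}, "
          f"mobility is <1..6>")
```

Strictly speaking, these are combinations of levels (a Cartesian product) rather than permutations, which is why the count is 3^3.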
The survey included 36 urban mobility professionals, i.e., academics, experts from the private sector, and public sector experts working at the city level.
Although it is common to include a small number of experts, usually six to 10, when using the knowledge extraction method, a high number of experts from different domains of urban mobility are included in the research. In addition, the survey covered different professions working in the field of urban mobility, i.e., 16 experts from the scientific community, 11 experts from the private sector, and nine experts from the public sector.
Interviews with the experts were conducted at planned workshops and through teleconferences. Each workshop was designed in such a way that in the introductory part, a brief presentation of the research objective was given, and the research background was explained. This was followed by a discussion including the potential ambiguities, after which the questionnaire was answered.
This section further explains how the indicator values obtained from the CDRs are mapped to the values presented to the experts, i.e., the link between the two sets. The analysis of the CDR records, i.e., the activities shown in Steps 1 through 6 of the methodology, provides normalized (numerical) values of the three target indicators (for example, the trip number indicator has a numeric value of 0.12). With the fuzzy approach, the “number of trips” indicator value is no longer expressed as the numeric value 0.12, but through the fuzzy membership function “a low number of trips”. During the knowledge extraction process, the experts were presented with the fuzzy membership function values of the indicators. This ensures the link between the indicator values calculated from the CDR database by the proposed methodology and the set of rules that defines the model, which is generated by analyzing the experts’ survey responses.
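The mapping from a normalized indicator value to a linguistic value can be illustrated with a minimal fuzzification sketch. The triangular shapes and the breakpoints 0, 0.5, and 1 are assumptions chosen for illustration only; they are not the membership (affiliation) functions of the study, whose ranges are given in its tables and tuned by ANFIS.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b.
    a == b or b == c gives a left or right shoulder."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed (hypothetical) membership functions on the interval [0, 1].
MEMBERSHIP = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0),
}


def fuzzify(x):
    """Degree of membership of x in each linguistic value."""
    return {label: triangular(x, *abc) for label, abc in MEMBERSHIP.items()}


def dominant_label(x):
    """Linguistic value with the highest degree of membership."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)


label = dominant_label(0.12)  # the example value from the text
```

Under these assumed functions, the value 0.12 belongs mostly to “low”, which matches the example of “a low number of trips” in the text.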
A preliminary analysis of the results is given below and shown in Table 4. For each scenario, it shows the minimum and maximum mobility rating assigned by the experts, the average rating, and the calculated standard deviation. The distribution of ratings (as a percentage) for each mobility assessment scenario is also presented.
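The per-scenario statistics can be computed with a short sketch. The ratings below are hypothetical, chosen only so that they reproduce the figures reported for Question 14 (range three to four, mean 3.11, 89% rank three); they are not the raw survey data. The use of the sample standard deviation is an assumption, since the text does not specify the variant.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Minimum, maximum, mean, sample standard deviation, and the
    percentage share of each rating (1 to 6) for one scenario."""
    counts = Counter(ratings)
    n = len(ratings)
    return {
        "min": min(ratings),
        "max": max(ratings),
        "mean": mean(ratings),
        "std": stdev(ratings),  # sample standard deviation (assumed)
        "share_pct": {r: 100.0 * counts[r] / n for r in range(1, 7)},
    }


# Hypothetical ratings for one scenario.
summary = summarize_ratings([3, 3, 3, 3, 3, 3, 3, 3, 4])
```

Ratings that nobody chose still appear in the distribution with a share of 0%, which keeps the rows comparable across scenarios.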
The preliminary data analysis was carried out in order to check the consistency of the answers and to identify the questions on which the experts pronouncedly agree or disagree. Certain differences in the experts’ opinions are expected, as they consider mobility from different perspectives, depending on their own expertise. The highest consensus was recorded for Question 14 (medium number of trips, medium trip duration, and medium distance) and Question 21 (high number of trips, short trip duration, and long distance). In Question 14, the assigned mobility ratings range from three to four, with an average score of 3.11, while in Question 21, the responses range from five to six, with an average value of 5.78. For two questions, Question 12 (medium number of trips, short trip duration, long distance) and Question 16 (medium number of trips, long trip duration, short distance), the experts assigned both the highest and the lowest ratings to the same question, with an average mobility rating of 4.67 in Question 12 and 2.78 in Question 16. In Question 7 (low number of trips, long trip duration, short distance), the largest share of experts (78%) rated the described mobility conditions with rank “one”, while in Question 16, the largest share (56%) rated mobility with rank “two”. The largest share of experts (89%) rated the mobility described in Question 14 with rank “three”. The largest share of experts (56%) rated the mobility described in Question 10 (medium number of trips, short trip duration, short distance) and Question 15 (medium number of trips, medium trip duration, long distance) with rank “four”.
The largest share of experts (56%) rated the mobility described in Question 12 with rank “five”. The largest share of experts (78%) rated the mobility described in Question 21 with rank “six”.