ISPRS Int. J. Geo-Inf. 2019, 8(1), 19; https://doi.org/10.3390/ijgi8010019

Article
Mobile Phone Indicators and Their Relation to the Socioeconomic Organisation of Cities
by 1,2,*,† and 3,4,†
1
CNRS, Centre Maurice Halbwachs, UMR 8097, Paris 75014, France
2
Centre for Advanced Spatial Analysis, University College London, London W1T 4TJ, UK
3
Open Lab, University of Newcastle, Newcastle upon Tyne NE4 5TG, UK
4
Orange Labs France, SENSe, Châtillon 92320, France
*
Correspondence: [email protected]; Tel.: +33-(0)1-80-52-14-42
† These authors contributed equally to this work.
Received: 23 November 2018 / Accepted: 7 January 2019 / Published: 9 January 2019

Abstract

Thanks to the use of geolocated big data in computational social science research, the spatial and temporal heterogeneity of human activities is increasingly being revealed. Paired with smaller and more traditional data, this opens new ways of understanding how people act and move, and how these movements crystallise into the structural patterns observed by censuses. In this article we explore the convergence between mobile phone data and more traditional socioeconomic data from the national census in French cities. We extract mobile phone indicators from six months' worth of Call Detail Records (CDR) data, while census and administrative data are used to characterize the socioeconomic organisation of French cities. We address various definitions of cities and investigate how they impact the statistical relationships between mobile phone indicators, such as the number of calls or the entropy of visited cell towers, and measures of economic organisation based on census data, such as the level of deprivation, inequality and segregation. Our findings show that some mobile phone indicators relate significantly to different aspects of the socioeconomic organisation of cities. However, we show that these relations are sensitive to the way cities are defined and delineated. In several cases, changing the city delineation rule can change the significance and even the sign of the correlation. In general, cities delineated in a restricted way (central cores only) exhibit traces of human activity which are less related to their socioeconomic organisation than cities delineated as metropolitan areas and dispersed urban regions.
Keywords: cities; interaction; mobility; correlation; mobile phone; deprivation; segregation; inequality

1. Introduction

1.1. The Single-City Focus of Urban Sensing

The quantitative analysis of mobile phone records [1], smart card traces [2], or credit card transactions [3,4], is increasingly revealing the regularities behind human daily practices, such as mobility or social interactions (e.g., [5,6]), very often in an urban context. The main advantages of passive big data are well known and consist of, among others, the reduction of collection and treatment cost, the increase of sample sizes, and the possibilities for more timely and recurring observations. In the case of mobility studies, for example, Batran et al. [7] note that: “While traditional survey methods provide a snapshot of the traffic situation in a typical weekday, mobile phone data can capture weekday and weekend travel patterns, as well as seasonal variation of a large sample of the population at a low cost and wide geographical scale”. The disadvantages of such datasets lie in the fact that they suffer from spatial and temporal sparseness [8,9], from a lack of—or an unknown degree of—representativeness [10,11], and from issues regarding anonymity [4].
Within the urban sensing literature, mobile phone data play a prominent role as they form a source of passively collected information (users do not need to make an explicit action to share their locations as would be the case in, for example, location-based services or social networks), for large shares of populations (a large proportion of the world population now owns a mobile device of any sort), captured at a rather high spatial resolution (in general, the density of cell towers is high in urban areas). Mobile phone data research in an urban context has been applied to a diversity of individual cities, or to international comparison of cities: Paris [12]; Maputo [7], Dhaka [8], Santiago [13], Boston and Singapore [14], London, Singapore and Beijing [2]. Research with a focus on a single city, or a set of single cities, bears the advantage that it can easily tap into local knowledge when questioning the quantitative results obtained. This leads to better insights that can be used in urban planning and policy.

1.2. Mobile Phone Indicators

One problem with the single-city focus is that it cannot ensure that observations made in one city (usually a capital city with a large population) remain valid for other cities. As a consequence, it is unclear whether findings can be generalized over different types of cities. The creation of mobile phone indicators avoids this problem. Since mobile phone indicators are typically computed for large user samples covering multiple cities, the aggregation of individual-level indicators in space makes it possible to compare findings between cities. In addition, mobile phone indicators can be paired with other datasets so that multi-variate methods can support interpretation. In the case of mobile phone data, creating individual indicators is possible at a nation-wide scale (as datasets are mostly provided by national operators) but it is not a straightforward task. For example, differences in the spatial resolution of observations make it hard to create comparable indicators for individual mobility [15], and it is known that home detection methods, which enable the spatial allocation and aggregation of individual users, still face severe challenges when it comes to validation and error estimation [9,16].
Regardless of the methodological challenges, creating mobile phone indicators and pairing them with census data is deemed promising by multiple statistics offices and has been performed in the academic literature on several occasions. Pappalardo et al. [17], for example, show how a mobile phone indicator on the diversity of movement (the mobility entropy) in France relates directly to the European Deprivation Index (EDI). Eagle et al. [18] describe the relation between regional calling patterns and economic development in the UK. Decuyper et al. [19] discuss the relation between calling and purchase behaviours and food security in a Central African country. Frias-Martinez et al. [20] investigate the relations between several mobile phone indicators (call, movement and purchase behaviour) and multiple census variables on education, demographics and purchasing power in a Latin American country. With the exception of Vanhoof et al. [15], who study relations between mobile phone and census indicators for different urban areas in France, one clear shortcoming of these studies is that their analyses are fixed at the national level only, missing the opportunity to explore the empirical relations between human mobility, social interactions, and the socioeconomic organisation of cities.

1.3. Sensitivity of Urban Scaling Laws to City Definitions

When comparing values of mobile phone indicators between cities, it is important to have a clear definition of what is considered a city. This is especially true since recent works on urban scaling and Zipf’s law on census data [21,22,23] have shown that the delineation of cities can substantially influence results and interpretations, mainly because the areas included in or excluded from different delineations have heterogeneous properties. Although this issue is traditionally overlooked (sometimes for good reason, because data are only available for a single delineation), it is reasonable to suspect that sensed human activity in general, and mobile phone indicators in particular, are similarly sensitive to city delineation.
Proceeding one step further to the relations between mobile phone indicators and census data, one can ask to which degree such relations will be influenced by city definitions. Indeed, what is unclear from previous work on mobile phone indicators is how statistical relations with census indicators, whether obtained from a multi-variate analysis at the nation level or in the form of urban scaling laws, are determined by the way cities are defined. Before this question gets answered, empirically produced relations will be insufficiently trustworthy to properly engage with theoretical hypotheses such as the ones about the link between mobility, human interactions and the socioeconomic organisation of cities.

1.4. Research Question and Relevance

Consequently, in this paper, we explore the degree to which relations between mobile phone indicators and census indicators are sensitive to particular ways of delineating cities. In doing so, we question the value of mobile phone indicators on mobility and human interactions for understanding the socioeconomic organisation of cities. To test the sensitivity to city definitions, we run a parametric simulation of different city delineations in France. Assuming that mobile phone indicators depict different types of spatial variation compared to socioeconomic urban indicators (e.g., calling patterns might be less influenced by infrastructural elements and built-up environment and therefore more homogeneously spread across the country than, for instance, wages) and building upon recent empirical work that highlights the influence of city definition when assessing scaling laws [21,23], our hypothesis is that city definitions will influence empirical relations between the two types of indicators in a non-trivial way. As discussed before, multiple works have uncovered relations between mobile phone indicators and census data but, to the best of our knowledge, all of them do so based on one city definition only. If relations are sensitive to city definitions, this would have considerable implications for their validity, interpretation, and potential use in (predictive) applications.
Other attempts at describing inequality in cities with big data resorted to mapping and rewiring credit card spendings [3,24], or geolocated tweets [25]. However, we believe that crossing big and small data with the delineation of cities is novel. This paper thus contributes to the literature by assessing the variability of mobile phone data with the socioeconomic structure of cities: we show for example that high numbers and diversity of contacts are “explained” for a large part by low levels of deprivation and segregation, and a higher inequality (economic diversity) in cities, especially when they are considered in their functional delineation of metropolitan areas (i.e., including commuters). In contrast, social activities such as the duration of calls and their nocturnal aspect are left unexplained by the socioeconomic organisation of cities, whichever way cities are delineated.

2. Data

This section introduces the census and mobile phone indicators we will use in our investigation. We limit the analysis of census data to three socioeconomic urban indicators chosen for their social relevance and easy interpretation. They relate to three dimensions of the economic organisation of cities: their level of poverty (or deprivation), their level of inequality (distribution of wages) and their level of segregation (spatial distribution of wages) and are introduced in Section 2.1. Regarding mobile phone indicators we deploy 15 mobile phone indicators derived from a dataset in France covering aspects of human mobility and social interaction. Their definitions and properties are described in Section 2.2.

2.1. Census Data on Segregation, Inequality, and Deprivation

The deprivation indicator is measured by the European Deprivation Index (EDI) created for France by Pornet et al. [26]. The EDI is an individual deprivation indicator constructed from a European survey specifically designed to study deprivation. It is a composite measure incorporating information on both subjective and objective poverty, and the weights of the different contributing factors are attributed specifically for France.
Wage inequality at the level of the city was computed similarly to Cottineau et al. [27]. The inequality index is computed using the Gini index method [28] on groups of similar wage earners as described in the CLAP database holding information on French firms and establishments. The inequality index measures the overall dispersion in the distribution of wages at the city level, and varies between 0 (extreme equality) and 1 (extreme concentration of wages).
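As an illustration of the kind of computation involved, the following is a minimal sketch of a Gini index on grouped wage data, using the trapezoid rule on the Lorenz curve. The function name, the input layout (one representative wage per group plus a group size) and the example figures are assumptions made for illustration; they are not taken from the CLAP database or from the procedure of Cottineau et al. [27].

```python
def gini(values, weights=None):
    """Gini index of `values`, optionally weighted by group sizes.

    `values` are representative wages per group (hypothetical input),
    `weights` the number of earners in each group.
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))          # order groups by wage
    total_w = sum(w for _, w in pairs)
    total_v = sum(v * w for v, w in pairs)
    if total_w == 0 or total_v == 0:
        return 0.0
    cum_v = 0.0
    area = 0.0                                    # area under the Lorenz curve
    for v, w in pairs:
        prev = cum_v
        cum_v += v * w
        # trapezoid between two consecutive points of the Lorenz curve
        area += (w / total_w) * (prev + cum_v) / (2 * total_v)
    return 1.0 - 2.0 * area


# Hypothetical city: three wage groups with their mean wage and head count.
print(gini([1200, 2000, 5000], [500, 300, 50]))
```

With identical wages the function returns 0; with all wages concentrated in a single earner among n it returns (n - 1)/n, which approaches 1 for large n.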
Wage segregation for cities was also computed similarly to Cottineau et al. [27], using Reardon’s [29] $R^O$ index of ordinal segregation for classes of wages retrieved, again, from the CLAP database. This segregation indicator measures the spatial dispersion of the distribution of wages between communes of the city, and varies between 0 (homogeneous city) and 1 (extreme segregation by wages in the city).
The distribution of deprivation, inequality and segregation in French metropolitan areas (Aires Urbaines) is depicted in Figure 1. It shows different spatial logics: a size effect for inequality, which is higher in large cities, and a regional differentiation for deprivation and segregation levels, which are higher in Northern cities, for example.

2.2. Constructing Indicators from the French Mobile Phone Data

To create mobile phone indicators, we use a French mobile phone dataset collected between 13 May and 15 October 2007. The dataset is owned by Orange and holds information from the Orange cellular network, which in 2007 consisted of about 18,275 cell towers nation-wide. The dataset itself consists of Call Detail Records (CDR), which contain the time, the cell tower used, the initiating user, the receiving user and the duration/length of each call or text made by about 18 million Orange subscribers. The French CDR dataset has been extensively studied before [9,15,16,17,30,31,32] and is one of the largest datasets that guarantee access to individual user data over such a long time period.
To construct mobile phone indicators from CDR data, a version of the open-source Python library Bandicoot [33] was implemented on the big data infrastructure of Orange Labs France. For each user and for each month in the observation period, a set of indicators (Table 1) is computed. Because indicators at the individual level entail a small but potential privacy risk, user indicators are aggregated at cell tower level. The aggregation is done for each user and for each month based on the presumed home location according to a home detection algorithm. We tested two home detection algorithms: the maximum amount of activities algorithm (home is the cell tower where the user made most mobile phone actions during a month) and the distinct days algorithm (home is the cell tower where the user was present on the maximum number of distinct days during a month), as defined by Vanhoof et al. [9,16]. The result, for each home detection method, is a distribution of values for all indicators for each cell tower in the Orange cell network. When comparing the results of the distinct days algorithm with those of the maximum activities algorithm, no substantial differences were found. This is not entirely surprising, as results in [9,16] already suggested that the performance of both algorithms differs only slightly when comparing population counts at the national level, with the performance of both algorithms being optimal in September [9]. Therefore, we choose to present the remaining results using the distinct days algorithm, limiting our analysis to September 2007 only. The use of home detection algorithms entails a separate pre-processing step and is prone to a degree of uncertainty at the nation-wide level and an unknown degree of error at the individual level (as any individual validation is impossible because of privacy regulations). For the French 2007 dataset, the nation-wide degree of uncertainty has been extensively described by Vanhoof et al. [9,16], while the individual-level error has been estimated by Vanhoof et al. [34] to be around 3 km for median users.
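The two home detection rules described above can be sketched as follows. This is a minimal illustration, not the Bandicoot or Orange Labs implementation: the record layout `(user, tower, day)`, the function names and the alphabetical tie-breaking rule are assumptions.

```python
from collections import Counter, defaultdict


def home_max_activities(records):
    """Home = tower with the most logged actions in the month."""
    counts = defaultdict(Counter)
    for user, tower, _day in records:
        counts[user][tower] += 1
    # sorted() makes tie-breaking deterministic (assumption: alphabetical)
    return {u: max(sorted(c), key=c.get) for u, c in counts.items()}


def home_distinct_days(records):
    """Home = tower where the user was seen on the most distinct days."""
    days = defaultdict(lambda: defaultdict(set))
    for user, tower, day in records:
        days[user][tower].add(day)
    return {u: max(sorted(t), key=lambda k: len(t[k])) for u, t in days.items()}


# Hypothetical month of records for one user: five actions at tower "A"
# on a single day, one action per day at tower "B" on three days.
records = [("u1", "A", 1)] * 5 + [("u1", "B", d) for d in (1, 2, 3)]
print(home_max_activities(records))   # {'u1': 'A'}
print(home_distinct_days(records))    # {'u1': 'B'}
```

The example shows how the two rules can disagree for a single user, even though the text reports no substantial differences once indicators are aggregated.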
The definition of most of the mobile phone indicators is straightforward, but some merit a proper mathematical definition (* in Table 1). The radius of gyration for example, is a measure of a user’s mobility surface defined as the spatial spread of the cell towers visited by a user relative to his or her centre of mass, which is defined as the mean point of all his/her visited cell towers [17]:
$$\text{Radius of gyration} = \sqrt{\frac{1}{N} \sum_{i \in L} n_i \left( r_i - r_{cm} \right)^2}$$
where $L$ is the set of cell towers visited by the user, $n_i$ is each cell tower’s visitation frequency, $N = \sum_{i \in L} n_i$ is the sum of all the single frequencies, and $r_i$ and $r_{cm}$ are the vector coordinates of cell tower $i$ and of the centre of mass, respectively.
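A minimal sketch of this computation is given below. It assumes planar (projected) tower coordinates and a centre of mass weighted by visitation frequency; the function name and the input layout `(x, y, n)` are hypothetical.

```python
import math


def radius_of_gyration(visits):
    """visits: iterable of (x, y, n) with tower coordinates and visit counts.

    Assumes planar coordinates (e.g. kilometres in a projected system).
    """
    visits = list(visits)
    total = sum(n for _, _, n in visits)
    if total == 0:
        return 0.0
    # frequency-weighted centre of mass (assumption)
    cx = sum(x * n for x, _, n in visits) / total
    cy = sum(y * n for _, y, n in visits) / total
    squared = sum(n * ((x - cx) ** 2 + (y - cy) ** 2) for x, y, n in visits)
    return math.sqrt(squared / total)


# Two towers 2 km apart, visited equally often: radius of 1 km.
print(radius_of_gyration([(0.0, 0.0, 1), (2.0, 0.0, 1)]))  # 1.0
```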
The entropy of visited cell towers is a reflection of the diversity of users’ movement pattern. It is defined as the Shannon entropy of the pattern of visited cell towers [15,17]:
$$\text{Entropy of cell towers} = -\frac{\sum_{l \in L} p(l) \log p(l)}{\log N}$$
where $L$ is the set of cell towers visited by the user, $l$ represents a single cell tower, $p(l)$ is the probability of a user being active at cell tower $l$, and $N$ is the total number of activities of one user.
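As a small worked example, the normalised entropy can be computed from per-tower activity counts as sketched below. The function name and the convention of returning 0 for users with at most one activity are assumptions; the same function would apply to counts per called contact.

```python
import math


def normalised_entropy(counts):
    """Shannon entropy of activity counts, divided by log N.

    counts: number of activities per cell tower (or per contact);
    N is the total number of activities of the user.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total <= 1:          # log N is zero or undefined (assumed convention)
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts)
    return h / math.log(total)


print(normalised_entropy([4]))        # all activity at one tower
print(normalised_entropy([2, 2]))     # two towers, two activities each
print(normalised_entropy([1, 1, 1, 1]))  # every activity at a new tower
```

The value is 0 when all activities occur at a single tower and reaches 1 only when every activity occurs at a different tower.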
The calculation of the entropy of called contacts is identical to the entropy of visited cell towers, only here, the pattern of called contacts replaces the visited cell towers [17]:
$$\text{Entropy of contacts} = -\frac{\sum_{c \in C} p(c) \log p(c)}{\log N}$$
where $C$ is the set of all contacts of a user, $c$ represents a single contact, $p(c)$ is the probability that the user is contacting contact $c$ when active, and $N$ is the total number of activities of one user.
Spatial patterns of the mobile phone indicators computed for September 2007 (averaged over individuals by cell tower) are presented in Figure 2. Most spatial patterns show a clear urban-rural dichotomy with, for example, the number of active days, called contacts, visited cell towers and calls being higher in city centres where, most likely, mobile phone use was more widely adopted by the general population than in rural areas. Another explanation for the number of antennas visited might be that antennas are more densely distributed in cities, thus artificially raising the value for a similar surface travelled by users. However, this bias was taken into account for the entropy of visited cell towers. The spatial pattern of the radius of gyration and of the distance between L1 and L2 stands out but is not surprising. Here, cell tower averages are influenced by users performing domestic tourism (see also [31]) who have a detected home at the seaside (L1) and a second plausible home location (L2) further away, resulting in a high radius of gyration value and a high L1–L2 distance. Another intriguing pattern is visible in the absolute number of calls at home. This pattern remains unexplained but could point to differences between regions in the adoption of mobile phone technology and of the Orange provider. It is interesting, however, to note that this regional pattern dissolves when looking at the percentage of calls at home. The relative number of calls at home is rather uniformly distributed in France except for the extremely remote and rural areas, where almost 100% of calls are performed at home.

3. Methods

In this section we explain the methods used to investigate the sensitivity to city definitions of the relations between mobile phone indicators and census data. Section 3.1 explains how we align both types of indicators, as they are gathered at different spatial resolutions. The following sections explain how we simulate 4914 different city definitions (Section 3.2), how we compute correlations between indicators for each of these city definitions (Section 3.3) and how we represent the obtained results (Section 3.4). In a final section (Section 3.5) we propose a way to summarise the 4914 different city delineations into a more interpretable set of 6 classes of city definitions, and we show that, for each of these classes, the three census indicators we use to describe the socioeconomic situation of cities are only weakly related, which points to their independence from one another.

3.1. Aggregation: From Cell Towers to Commune to City

In order to compare mobile phone indicators with census data, we need to find a common perimeter to aggregate the data. Since the majority of census data is available within boundaries defined by administrative units (communes in France), we choose to extrapolate the mobile phone indicators (available at the cell tower level) to match the commune boundaries. There is no information about the exact perimeter each cell tower covers, but it is reasonable to assume that phones will log in to the closest antenna available at all times, thus drawing coverage areas close to Voronoi polygons around the cell towers.
After building a layer of Voronoi polygons, we intersect it with the layer of commune polygons using the programming language R. In each of the resulting intersection polygons, we computed the share of area that the intersection polygon represents with respect to its original cell tower Voronoi polygon. We allocated the number of users from the original cell tower Voronoi polygons to the intersections based on the share of area they represented. Finally, the data for communes were aggregated based on the number of users in each intersection polygon and a weighted average of all the indicators of mobile phone activity based on the share of users, with respect to the commune total users.
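The allocation step can be illustrated with the sketch below, written in Python rather than R and starting from already computed intersection areas. The function name, the input layout and the assumption that the intersections of a tower add up to its full Voronoi polygon are illustrative choices, not the authors' code.

```python
from collections import defaultdict


def towers_to_communes(intersections, tower_users, tower_indicator):
    """Area-weighted transfer of tower-level data to communes.

    intersections: list of (tower, commune, area) for each intersection
    polygon. Assumes the intersections of a tower cover its whole Voronoi
    polygon, so polygon area = sum of its intersection areas.
    Returns {commune: (estimated users, user-weighted mean indicator)}.
    """
    tower_area = defaultdict(float)
    for tower, _commune, area in intersections:
        tower_area[tower] += area

    users = defaultdict(float)
    weighted = defaultdict(float)
    for tower, commune, area in intersections:
        share = area / tower_area[tower]       # share of the Voronoi polygon
        n = tower_users[tower] * share         # users allocated by area share
        users[commune] += n
        weighted[commune] += n * tower_indicator[tower]

    return {c: (users[c], weighted[c] / users[c]) for c in users if users[c] > 0}


# Hypothetical example: tower T1 straddles communes A and B, T2 lies in B.
inter = [("T1", "A", 3.0), ("T1", "B", 1.0), ("T2", "B", 2.0)]
print(towers_to_communes(inter, {"T1": 100, "T2": 50}, {"T1": 10.0, "T2": 20.0}))
```

In this example commune A receives 75 users with an indicator of 10, while commune B receives 75 users whose indicator is the user-weighted mean of both towers.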

3.2. Simulating City Definitions

Now that we have mobile phone indicators and census data available at the commune level, we can simulate different city definitions by grouping communes together and aggregating their indicators. We use a generative, parametric method to simulate a range of city definitions. This method is inspired by the official city definitions in France (which define a city centre based on a minimum density, a periphery based on a minimum share of commune dwellers commuting to the city centre, and then apply a population minimum) and was developed by Arcaute et al. [21] and Cottineau et al. [23]. The method simulates different city delineations by aggregating the French communes into a set of cities, iterating over three parameters: a density minimum d to define city centres (from 1 to 20 persons per ha, by steps of 0.5), a minimum percentage f of workers in a commune commuting to this city centre (from 0 to 100%, by steps of 5) and a minimum population p within the resulting city (from 0 to 50,000, by steps of 10,000). In total, the simulation renders 4914 different city definitions, i.e., spatial delineations of aggregated communes (4914 = 39 density thresholds × 21 flow thresholds × 6 population thresholds). For each city definition we compute, for all cities, the total population considered, the overall inequality (Gini coefficient of wage groups present in the city), the spatial segregation (of wage groups in the communes), the average deprivation index (EDI) and the weighted average of the mobile phone indicators, based on the commune values weighted by their number of users.
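The size of the parameter space can be checked with a short sketch that enumerates the three threshold ranges given above. The variable names are illustrative; the delineation algorithm itself is not reproduced here.

```python
from itertools import product

# Threshold ranges as stated in the text.
densities = [1 + 0.5 * i for i in range(39)]     # 1 to 20 persons per ha
flows = list(range(0, 101, 5))                   # 0 to 100 % of commuters
populations = list(range(0, 50001, 10000))       # 0 to 50,000 residents

# One (d, f, p) triple per simulated city definition.
grid = list(product(densities, flows, populations))

print(len(densities), len(flows), len(populations))  # 39 21 6
print(len(grid))                                     # 4914
```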

3.3. Correlations between Mobile Phone Indicators and Census Data

Having prepared mobile phone and census indicators, as well as a method to simulate different city definitions, we investigate the correlations between mobile phone and census indicators, and their sensitivity to city definition. Specifically, we investigate the relation between the three census variables (Gini index, Segregation index and EDI) and all mobile phone indicators in Table 1 for each of the 4914 city definitions. For each combination of census indicator, mobile phone indicator, and city definition, the Spearman correlation coefficient is calculated based on the point cloud of all cities resulting from that city definition. The Spearman correlation on ranks is preferred over the Pearson correlation because the latter assumes linear relationships, an assumption that we did not verify in the automated computation we performed. In Section 4.4, we combine the effect of all three aspects of the socioeconomic organisation of cities into a multiple linear regression of each mobile phone indicator.
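To make the choice of a rank correlation concrete, here is a minimal pure-Python sketch of the Spearman coefficient, computed as the Pearson correlation of average ranks. The function names are illustrative, and in practice a statistics library would normally be used instead.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")       # undefined for a constant variable
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but non-linear relation between two hypothetical city indicators.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because only ranks are used, any strictly increasing relation yields a coefficient of 1, whether or not it is linear.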

3.4. Representing Results for All 4914 City Definitions

The resulting correlations for the 4914 city definitions are represented either by plotting their distributions over all 4914 city definitions (Section 4.1) or by constructing heatmaps when discussing the impact of the simulation parameters d, f, and p (Section 4.3). The coordinates of the heatmaps are made up of the different thresholds for population (p), density (d) and flow (f), i.e., the different parameters of the city definition, and are coloured according to the obtained Spearman correlation coefficient. In this way they offer a more expressive view of the correlations and their sensitivity to different city definitions.
When discussing in further depth the relation between the different socioeconomic variables and mobile phone indicators (Section 4.4), we will reduce the number of studied city definitions to a manageable 6 case studies. These 6 case studies correspond to the centres of classes formed by hundreds of city definitions. The clustering of city definitions was performed on a fixed population minimum (p) of 10,000 residents (the threshold most frequently used in urban system studies [35]), because we want the focus to be on the density (d) and flow (f) thresholds, which produce a variation in the spatial extent of cities amongst city definitions. With the population minimum (p) fixed at 10,000 residents, the clustering is thus performed on 819 definitions only (39 density thresholds × 21 flow thresholds).

3.5. Clustering City Definitions into 6 Classes

Clustering is based on the similarity of commune membership in cities over different city definitions. Starting from the membership table of communes to cities in the different definitions, we compute a dissimilarity matrix of city definitions based on their vectors of about 36,000 asymmetric binary values (indicating whether each commune is included or not in a city) and a Gower distance [36]. We then apply a k-medoid clustering algorithm [37] to the dissimilarity matrix and, judging from the silhouette width and the groupings obtained, we identify 6 classes (Figure 3) and their centroids (medoids).
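The clustering step can be sketched as follows. Two simplifications are assumed here and should not be read as the authors' procedure: the Gower distance on asymmetric binary variables is written as a Jaccard distance (joint absences are ignored), and the k-medoid step uses a simple alternating scheme with a naive deterministic start rather than a full PAM implementation. All names and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """Dissimilarity of two binary membership vectors, ignoring 0-0 matches."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def assign(dist, medoids):
    """Label each item with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    medoids = list(range(k))                 # naive start: first k items
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:                  # keep the old medoid if empty
                new_medoids.append(medoids[c])
                continue
            # the member minimising total distance to its cluster
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy example: 4 city definitions described by 6 communes (1 = in a city).
definitions = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
dist = [[jaccard_distance(a, b) for b in definitions] for a in definitions]
print(k_medoids(dist, k=2))
```

Unlike a k-means centroid, each medoid is an actual city definition, which is what makes it usable as a representative case study for its class.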
The most numerous class (“Urban cores” in green on Figure 3) represents ways of defining cities which result in very small aggregates. These “urban cores” are obtained either from highly dense communes with little periphery (right of Figure 3, i.e., high density threshold and middle flow threshold) or from a wider centre with no periphery (top left, i.e., middle density threshold and high flow threshold), similarly to Unités Urbaines as defined by the French statistical office INSEE. This is the most restrictive way of thinking about cities, and the centroid for this class is obtained with a density minimum of 11 persons per ha and a minimum of 75% of commuters from peripheral communes working in the centre. The next three classes (going clockwise on Figure 3) stretch through similar values of density minima but have increasing peripheries (obtained by lowering the percentage of commuters needed to attach communes to the metropolitan centres). We call them “MetroCores”, “MetroMedium” and “MetroWide”. Their centroids have a similar density minimum (12–12.5) and a decreasing flow minimum (40%, 25% and 10%). With an even lower flow minimum on average, the “Dispersed” class is the furthest away from a strict view on cities. Indeed, almost all French communes are included in a “city” of some sort according to this definition, as only a few commuters are sufficient to aggregate peripheral communes to high population centres. This class is a limit case, represented by a centroid with a density minimum of 10 and a flow minimum of 0. Finally, the “Metropolitan Area” class includes definitions closest to that of the Aires Urbaines as defined by the French statistical office INSEE. They are characterized by relatively low density thresholds (from 1 to 8.5 residents per ha) and a limited range of flow thresholds (from 10 to 45%). The city definitions in this class, as exemplified by the centroid case, are generally more numerous than in the “Metro-*” cases, always have some periphery but exclude the most rural parts of the country.
Before continuing with the correlation between mobile phone indicators and indicators of the socioeconomic organisation of cities, we want to check that the three indicators we picked to represent this organisation are independent from one another. We look at the Spearman correlation between deprivation, inequality, and segregation for the centroid city definition of each class, as indicated also in Figure 3. We find only one correlation with an $R^2$ over 5% (in bold in Table 2): the negative correlation between inequality and segregation in the “Dispersed” class (i.e., the class which is the furthest away from plausible definitions of cities). This result enables us to consider deprivation, inequality and segregation as three rather orthogonal dimensions to characterise cities.

4. Results

4.1. Distributions of Correlation Coefficients for All 4914 City Definitions

Visualising the distributions of obtained Spearman correlation coefficients for all 4914 city definitions allows us to (partly) assess the sensitivity of relations to city definitions. For the relation between EDI and several mobile phone indicators, for example, Figure 4 shows the distribution of Spearman correlation coefficients over all city definitions. The figure suggests that for the relation between EDI and some mobile phone indicators (interactions per contact, percentage of calls at home, mean call duration), altering the city definition does not affect the direction of the correlation computed, although differences in significance occur for different city definitions. Remarkable is that for some mobile phone indicators, the relation with EDI can change direction depending on the city definition. One example is the relation between EDI and the number of calls in Figure 4. For this relation, a part of the city definitions results in positive correlation coefficients (meaning that the residents of the poorest cities, according to these definitions, have called more) but a another part of the city definitions results in negative correlation coefficients (meaning that the residents of the poorest cities have called less). The relation between EDI and mobile phone indicators is thus influenced by city definition, leading to differences in significance or even to changes in correlation directions between city definitions.
Regarding the correlation of mobile phone indicators with the Gini index of wages inequality (Figure 5), we find more robust relationships across city definitions, with the human activity sensed by mobile phone generally positively correlated with inequality. For example, the number of calls, their diversity (entropy of contacts) and the mobility range and diversity tend to increase in cities with a larger level of inequality (generally larger cities, cf. Figure 1). Only call-specific indicators are rather uncorrelated with inequality.
Regarding the correlation of mobile phone indicators with the segregation index of wages (Figure 6), we find that most human activity sensed by mobile phone tends to vary negatively with segregation. For example, the number of calls, their diversity (entropy of contacts) and the mobility range and diversity tend to decrease in cities with a higher level of segregation. However, these trends are more mixed, and the choice of urban definition affects the sign of the coefficient obtained.
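The sensitivity check described in this subsection can be sketched in a few lines of code. The following is a minimal illustration only, not the code used for the paper: it computes Spearman's rho as the Pearson correlation of average ranks, then summarises the share of positive and negative coefficients across city definitions. The function names, the definition labels and the toy values are all hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def sign_summary(coefficients):
    """Share of positive and negative coefficients across city definitions."""
    n = len(coefficients)
    return {
        "positive": sum(c > 0 for c in coefficients) / n,
        "negative": sum(c < 0 for c in coefficients) / n,
    }

# Hypothetical toy data: one (EDI, number of calls) pair of series per city definition.
toy_definitions = {
    "definition_a": ([1.0, 2.0, 3.0, 4.0], [10.0, 12.0, 15.0, 19.0]),
    "definition_b": ([1.0, 2.0, 3.0, 4.0], [19.0, 15.0, 12.0, 10.0]),
}
rhos = {name: spearman(edi, calls) for name, (edi, calls) in toy_definitions.items()}
summary = sign_summary(list(rhos.values()))
```

In this toy case the two definitions give opposite signs, which is the kind of reversal reported above for the number of calls. Significance testing is deliberately left out of the sketch.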

4.2. Distributions of Correlation Coefficients by Definition Cluster

Another outstanding question is whether the correlations between indicators are sensitive to the six classes of city definitions defined earlier in Figure 3. To investigate this, Figure 7 shows the correlation coefficient, in this case for the relation between Entropy of Contacts and the socioeconomic indicators of cities, for all city definitions belonging to each of the six classes.
In the case of the relation between the entropy of contacts and the Gini index, we find that correlations are similar for all six classes of city definitions. The correlations are always positive: the more unequal the urban cluster (whatever its delineation), the more diverse the average pool of contacts of its residents. Similarly, the correlation between the entropy of contacts and the deprivation level is negative regardless of the delineation of city clusters.
The correlation between the entropy of contacts and wage segregation differs between the MetroCore and UrbanCores classes and the other classes: the histograms of these two classes show both positive and negative relations, whereas the correlations are always negative for the other classes, and strongly so in the case of the Dispersed and MetroWide classes. This can probably be explained by the strong impact of urban delineation on the measurement of segregation. The results for compact definitions (MetroCore and UrbanCores) tend to diverge from those for extensive definitions (Dispersed and MetroWide).
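Grouping coefficients by definition class, as in Figure 7, amounts to a simple aggregation. The sketch below is illustrative only: the class names follow Table 2, but the coefficient values are invented and the function name is an assumption. It flags the classes whose coefficients are of mixed sign.

```python
from collections import defaultdict

def sign_share_by_class(records):
    """Summarise correlation coefficients per class of city definitions.

    records: iterable of (class_name, correlation_coefficient) pairs.
    """
    grouped = defaultdict(list)
    for class_name, rho in records:
        grouped[class_name].append(rho)
    return {
        name: {
            "n": len(rhos),
            "share_negative": sum(r < 0 for r in rhos) / len(rhos),
            "min": min(rhos),
            "max": max(rhos),
        }
        for name, rhos in grouped.items()
    }

# Hypothetical coefficients (entropy of contacts vs. wage segregation).
toy_records = [
    ("UrbanCores", 0.12), ("UrbanCores", -0.05),
    ("MetroCore", 0.03), ("MetroCore", -0.10),
    ("Dispersed", -0.31), ("Dispersed", -0.27),
    ("MetroWide", -0.22), ("MetroWide", -0.25),
]
by_class = sign_share_by_class(toy_records)

# Classes where the sign of the relation depends on the definition.
mixed = sorted(name for name, s in by_class.items() if 0 < s["share_negative"] < 1)
```

With these invented values, only the two compact classes end up in `mixed`, mirroring the qualitative pattern described above.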

4.3. Heatmaps of Correlations

The outstanding question therefore becomes: which city definitions lead to which correlations?
An answer to this question can be formulated by mapping the correlation coefficients obtained between two indicators onto the parameter space used for the city definition. The heatmap in Figure 8, for example, does so for the relation between the entropy of contacts and the segregation index of wages. Overall, results show that the entropy of mobile phone contacts and wage segregation are mostly negatively related. This is especially so for flow thresholds below 40% (where the explanatory power of the model is the highest). This threshold controls the intensity of the commuting relation between the urban core and its periphery. For low thresholds, large areas of periphery are included in the delineation (because the minimum share of active residents that the municipalities send to the urban core is below 40%). In these cities, the higher the segregation, the lower the diversity of contacts. It is as if segregated urban regions as a whole were restraining the social networks of their residents, or as if the restricted networks of residents were hampering economically mixed living.
The positive correlations are found for definitions with density thresholds over 15 residents per ha, flow thresholds over 60% and low population thresholds. This corresponds to very compact spatial definitions of cities, where only central and integrated areas of cities are included. In these cases, there is spatial correspondence between highly segregated cities and residents with diverse social networks. This inversion of the correlation points to the impact of density on socioeconomic structures. Indeed, it seems to suggest that because the areas considered here are very dense and integrated, it is possible for diverse residents to overcome segregation (or even to thrive on it), perhaps by being more mobile.
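The mapping of coefficients onto the parameter space can be sketched as follows. This is an illustrative arrangement of the data behind a heatmap such as Figure 8, not the plotting code of the paper: each city definition is assumed to be keyed by a hypothetical (density, flow, population) threshold triple, and all threshold and coefficient values below are invented.

```python
def correlation_grids(results):
    """Arrange correlations into one flow-by-density matrix per population threshold.

    results: dict mapping (density, flow, population) -> correlation coefficient.
    Returns {population: (densities, flows, matrix)}, where matrix[row][col]
    is the coefficient for flows[row] and densities[col] (None if missing).
    """
    grids = {}
    for population in sorted({p for _, _, p in results}):
        cells = {(d, f): r for (d, f, p), r in results.items() if p == population}
        densities = sorted({d for d, _ in cells})
        flows = sorted({f for _, f in cells})
        matrix = [[cells.get((d, f)) for d in densities] for f in flows]
        grids[population] = (densities, flows, matrix)
    return grids

def positive_cells(results):
    """Threshold triples for which the correlation is positive."""
    return sorted(key for key, r in results.items() if r > 0)

# Hypothetical coefficients keyed by (density, flow in %, population) thresholds.
toy_results = {
    (5, 20, 0): -0.30, (15, 20, 0): -0.25,
    (5, 70, 0): -0.05, (15, 70, 0): 0.10,
    (5, 20, 50): -0.28, (15, 20, 50): -0.26,
    (5, 70, 50): -0.08, (15, 70, 50): -0.02,
}
grids = correlation_grids(toy_results)
```

Each matrix in `grids` could then be passed to any plotting library to draw one panel per population threshold, while `positive_cells(toy_results)` lists the definitions where the sign flips.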

4.4. Multiple Regression of Mobile Phone Indices with Socioeconomic Indicators

In order to combine the explanatory power of all three aspects of the socioeconomic organisation of urban clusters, we have regressed the value of each aggregated mobile phone indicator on the values of deprivation, inequality and segregation for each definition of cities. In this section, we report the results only for the 6 centres of classes defined previously (with a population cutoff of 10,000 residents), and only for coefficients significant at the 0.05 level.
Most social behaviours as sensed by mobile phone activity and aggregated at the city level seem to be influenced by (or at least correlate with) the three socioeconomic dimensions of cities (Figure 9). For the number and diversity of contacts called, for example, we see that, regardless of the city definition used, socioeconomic indicators work in the same direction: more deprivation and more segregation significantly reduce, on average, the number of contacts called and their diversity, whereas higher inequality in the city has the opposite effect. There is sociological evidence relating the reduction of social networks to individual and neighbourhood deprivation [38], which could explain the statistical relations observed in this case. Combined with the observation that in some cases (clusters corresponding to very large urban perimeters rather than dense urban cores), deprivation correlates positively with the intensity of contacts (interaction per contact called: middle left graphs of Figure 9), this could match the usual observation that poorer actors have networks composed of more strong ties (more intense relations) and fewer weak ties (less diverse relations) than richer actors [39]. Higher deprivation and a stronger spatial concentration of wages could thus reduce the size and diversity of the social network with which an average individual interacts virtually through mobile contacts. Importantly, although the mobile phone data date from 2007, this pattern may not simply reflect the deterrent cost of calls. Cost is probably the cause of the negative coefficient of the deprivation index on the mean duration of calls, but in that case segregation plays no role for most city definitions.
Finally, we find it intriguing that the urban level of wage inequality would foster the number and diversity of contacts. This might be an effect of higher professional interdependency between the richest and the poorest in more unequal cities [40], or simply an indirect effect of city size (which correlates positively with wage inequality, cf. Figure 1). In this case, the larger pool of potential contacts would increase the average actual diversity of contacts of individuals.
It is interesting to note that the intensity of the coefficients of the multiple regression varies slightly between the different clusters of city definitions but the overall picture is the same, except in the urban core clusters (those composed of very dense city cores with little or no commuting periphery). Using these types of definition, the only significant variable at play is the level of inequality, which covaries positively with the entropy and number of contacts called. The absence of effect of deprivation and segregation in dense French city cores could indicate that the centrality and density of the residence have a positive effect on the size and diversity of the social network, which is reflected in the phone behaviour observed. More generally, the predictive power of regressions (R²) is always the weakest for urban cores, whereas it peaks for definitions of clusters as metropolitan areas and dispersed areas. This suggests that activity behaviours of the inner parts of metropolises (urban cores), as sensed by mobile phones throughout the day, cannot be well described by any of their static socioeconomic properties.
As for the physical mobility behaviours sensed by mobile phone activity and aggregated at the city level, we see the exact same pattern as for the number and diversity of contacts. Therefore, the network of physical encounters seems to be influenced by the same variables as the social network of contacts, which is an interesting result.
Finally, we chose the mean duration of calls and the percentage of nocturnal calls to show that not all mobile phone behaviours covary with the socioeconomic structure of cities (cf. the values of R² < 10% in the bottom graphs of Figure 9, whereas most other multiple regressions reach between 10 and 40%). The percentage of nocturnal calls, for example, is orthogonal to all three indicators for most city definitions, whereas the mean duration of calls seems to be significantly impacted only by the mean deprivation level of cities.
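For readers who want to reproduce this type of analysis on their own data, the regression step can be sketched as an ordinary least squares fit of one mobile phone indicator on the three socioeconomic indicators, with R² as the measure of predictive power. The sketch below is an assumption-laden illustration, not the authors' implementation: it solves the normal equations directly, omits the significance tests reported in Figure 9, and uses invented city-level values constructed to follow an exact linear relation.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def ols(predictors, response):
    """Least squares fit with an intercept; returns (coefficients, R squared).

    predictors: one row per city, e.g. [deprivation, inequality, segregation].
    """
    rows = [[1.0] + list(row) for row in predictors]  # prepend the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return beta, 1.0 - ss_res / ss_tot

# Hypothetical cities for one definition: [deprivation, inequality, segregation].
toy_predictors = [
    [0.1, 0.30, 0.05], [0.4, 0.35, 0.10], [0.2, 0.40, 0.20],
    [0.8, 0.32, 0.15], [0.5, 0.45, 0.02], [0.9, 0.38, 0.25],
]
# Toy response built from an exact linear rule, so the fit recovers it exactly.
toy_entropy = [2.0 - 1.0 * d + 0.5 * g - 0.25 * s for d, g, s in toy_predictors]
beta, r_squared = ols(toy_predictors, toy_entropy)
```

Running the same fit once per city definition (or per class medoid) and comparing the coefficients and R² values across definitions gives the kind of comparison discussed in this section. Real data would of course not fit exactly, and coefficients should be accompanied by significance tests.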

5. Discussion & Conclusions

In this paper, we have tried to take advantage of small and big data to bridge a gap between what is known about ‘night-time’ residential socioeconomic characteristics of urban areas in France, their ‘day-time’ production of inequality and segregation through wages, and the average social and spatial networks of citizens sensed with passive mobile phone data. We did so using French municipalities as a common unit of aggregation. We then studied the effect of city definition on the distributions and correlations of the indicators obtained, building on previous work on the impact of city delineation on urban scaling results. Indeed, even though there is evidence of geographic concentration of inequalities with city size [41], it was shown that this statistical relation varied with the city delineation chosen [27] and did not hold for other aspects of inequality such as spatial segregation. In the present paper, we add to this evidence by showing that the fixed socioeconomic characteristics of urban clusters also do not relate monotonically with sociospatial activities sensed by mobile phone data. For example, segregation can be positively or negatively correlated with the average diversity of places visited and the number of interactions per contact, depending on the urban delineation chosen among about 5000 possible ones based on variations of density, commuting and total population criteria. Moreover, we have found that the combination of our three socioeconomic indicators was predictive of social and spatial activity levels to varying degrees. For example, about a third of the variation in the number and diversity of contacts is “explained” by low levels of deprivation and segregation and by higher inequality (economic diversity) in cities, especially when they are considered in their functional delineation of metropolitan areas (i.e., including commuters).
However, social activities such as the duration of calls and their nocturnal aspect are left unexplained by the socioeconomic organisation of cities.
This study has enabled us to further assess the quality of mobile phone data for social science, to examine its relationship with traditional small data such as census and administrative data, as well as to look at their geographical variability. However, it is prone to some biases, such as the aggregation of data at the cell tower level and the resulting Modifiable Areal Unit Problem (MAUP) effects (plus the fact that dense areas are better described, with more antennas, than rural areas). Furthermore, the use of CDR datasets in general, and of this 2007 CDR dataset in particular, is debatable: the data we have used are rather old (2007) and hard to update at this scale (six months' worth of sensing) given the new laws regarding individual data privacy. Mobile phone behaviours have changed with the spread of smartphones and mobile data usage, which makes this study hard to replicate nowadays. One outstanding question is to what degree changes in mobile phone usage over time can influence findings. As stated before, CDR data collect traces from calling and texting. Due to the absence of a continuous data record between 2007 and now, it is unknown how mobile phone usage amongst French Orange users has evolved over time and how this would influence research findings. For example, since calling and texting can nowadays also be performed through web applications (something that is not captured in CDR data), the intuition could be that CDR data are becoming less relevant. Another intuition could be that, because users use their phones more, the advent of web applications on the mobile phone has actually increased the amount of calling and texting, thereby improving the information available in CDR data. Both intuitions could be true, but no scientific work has been published on this yet.
The recent collection of Data Detailed Records (DDR) data, a data record similar to CDR data that captures data roaming connections at cell tower level, can potentially help in answering questions regarding web-related mobile phone usage, but it falls short in other ways. Since DDR data collect cell tower information every time a user connects to mobile internet, the intuition is that they offer locational data on individual users more frequently, improving for example the quality of mobile phone indicators related to mobility. On the other hand, most DDR data records do not show which web application is used, nor do they provide any information on the intended receivers of communication. In contrast to CDR data, this property of DDR data makes it impossible to construct any indicators related to social activities. In addition to the unknown changes in mobile phone usage over time, there is the obvious but crucial element of heterogeneity in mobile phone usage between (sub)populations. Different subgroups of populations will use their mobile phones in different ways, leading to differences in the quality of indicators created from their CDR records. For all the efforts that have gone into mobile phone research so far, this heterogeneity and its influence on research findings remain poorly understood and continue to create ambiguity when it comes to reproducing findings based on datasets from other operators, countries, or time periods. The consequence is that all research based on mobile phone data, or other large-scale datasets for that matter, is irrevocably determined by the time period and mode of data collection, and is (currently) characterized by a merely superficial understanding of the heterogeneity between subpopulations. Caution is therefore highly advised when extrapolating insights, and follow-up studies on other datasets are strongly recommended.
However, our point is to emphasize that observed correlations of geographical data need to account for delineation sensitivity or to justify why a specific spatial delineation is preferred over others. In the absence of such justification, the exploration of delineations helps highlight robust correlations (which hold in all configurations), systematic variations (which respond to some characteristics of the urban space taken into account) and random variations (which reveal the spuriousness of correlations reported on a single spatial delineation).

Author Contributions

The authors contributed equally to the design, analysis, interpretation and writing of the paper.

Funding

This research received no external funding beyond our employment contracts.

Acknowledgments

We are grateful to our funding institutions at the time of this project (UCL, CNRS, University of Newcastle and Orange Labs). We also want to thank Elsa Arcaute for inspiring discussions at the start of this collaboration, as well as Clement Lee and Thomas Louail for their valuable comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gonzalez, M.C.; Hidalgo, C.A.; Barabasi, A.L. Understanding individual human mobility patterns. Nature 2008, 453, 779–782.
  2. Zhong, C.; Batty, M.; Manley, E.; Wang, J.; Wang, Z.; Chen, F.; Schmitt, G. Variability in regularity: Mining temporal mobility patterns in London, Singapore and Beijing using smart-card data. PLoS ONE 2016, 11, e0149222.
  3. Lenormand, M.; Louail, T.; Cantú-Ros, O.G.; Picornell, M.; Herranz, R.; Arias, J.M.; Barthelemy, M.; San Miguel, M.; Ramasco, J.J. Influence of sociodemographic characteristics on human mobility. Sci. Rep. 2015, 5, 10075.
  4. De Montjoye, Y.A.; Radaelli, L.; Singh, V.K. Unique in the shopping mall: On the reidentifiability of credit card metadata. Science 2015, 347, 536–539.
  5. Alessandretti, L.; Sapiezynski, P.; Sekara, V.; Lehmann, S.; Baronchelli, A. Evidence for a conserved quantity in human mobility. Nat. Hum. Behav. 2018, 2, 485–491.
  6. Pappalardo, L.; Simini, F.; Rinzivillo, S.; Pedreschi, D.; Giannotti, F.; Barabási, A.L. Returners and explorers dichotomy in human mobility. Nat. Commun. 2015, 6, 8166.
  7. Batran, M.; Mejia, M.; Kanasugi, H.; Sekimoto, Y.; Shibasaki, R. Inferencing Human Spatiotemporal Mobility in Greater Maputo via Mobile Phone Big Data Mining. ISPRS Int. J. Geo-Inf. 2018, 7, 259.
  8. Lu, S.; Fang, Z.; Zhang, X.; Shaw, S.L.; Yin, L.; Zhao, Z.; Yang, X. Understanding the representativeness of mobile phone location data in characterizing human mobility indicators. ISPRS Int. J. Geo-Inf. 2017, 6, 7.
  9. Vanhoof, M.; Reis, F.; Ploetz, T.; Smoreda, Z. Assessing the quality of home detection from mobile phone data for official statistics. J. Off. Stat. 2018, 34, 935–960.
  10. Longley, P.A.; Adnan, M.; Lansley, G. The geotemporal demographics of Twitter usage. Environ. Plan. A 2015, 47, 465–484.
  11. Arai, A.; Fan, Z.; Matekenya, D.; Shibasaki, R. Comparative perspective of human behavior patterns to uncover ownership bias among mobile phone users. ISPRS Int. J. Geo-Inf. 2016, 5, 85.
  12. Schneider, C.M.; Belik, V.; Couronné, T.; Smoreda, Z.; González, M.C. Unravelling daily human mobility motifs. J. R. Soc. Interface 2013, 10, 20130246.
  13. Dannamann, T.; Sotomayor-Gómez, B.; Samaniego, H. The time geography of segregation during working hours. arXiv, 2018; arXiv:1802.00117.
  14. Xu, Y.; Belyi, A.; Bojic, I.; Ratti, C. Human mobility and socioeconomic status: Analysis of Singapore and Boston. Comput. Environ. Urban Syst. 2018, 72, 51–67.
  15. Vanhoof, M.; Schoors, W.; Van Rompaey, A.; Ploetz, T.; Smoreda, Z. Comparing Regional Patterns of Individual Movement Using Corrected Mobility Entropy. J. Urban Technol. 2018, 25, 27–61.
  16. Vanhoof, M.; Lee, C.; Smoreda, Z. Performance and sensitivities of home detection from mobile phone data. arXiv, 2018; arXiv:1809.09911.
  17. Pappalardo, L.; Vanhoof, M.; Gabrielli, L.; Smoreda, Z.; Pedreschi, D.; Giannotti, F. An analytical framework to nowcast well-being using mobile phone data. Int. J. Data Sci. Anal. 2016, 2, 75–92.
  18. Eagle, N.; Macy, M.; Claxton, R. Network diversity and economic development. Science 2010, 328, 1029–1031.
  19. Decuyper, A.; Rutherford, A.; Wadhwa, A.; Bauer, J.M.; Krings, G.; Gutierrez, T.; Blondel, V.D.; Luengo-Oroz, M.A. Estimating food consumption and poverty indices with mobile phone data. arXiv, 2014; arXiv:1412.2595.
  20. Frias-martinez, V.; Soto, V.; Virseda, J.; Frias-martinez, E. Can cell phone traces measure social development. In Third Conference on the Analysis of Mobile Phone datasets, NetMob; Blondel, V., Decuyper, A., Deville, P., De Montjoye, Y.-A., Toole, J., Traag, V., Wang, D., Eds.; MIT Media Lab: Cambridge, MA, USA, 2013; pp. 62–65.
  21. Arcaute, E.; Hatna, E.; Ferguson, P.; Youn, H.; Johansson, A.; Batty, M. Constructing cities, deconstructing scaling laws. J. R. Soc. Interface 2015, 12, 20140745.
  22. Veneri, P. City size distribution across the OECD: Does the definition of cities matter? Comput. Environ. Urban Syst. 2016, 59, 86–94.
  23. Cottineau, C.; Hatna, E.; Arcaute, E.; Batty, M. Diverse cities or the systematic paradox of Urban Scaling Laws. Comput. Environ. Urban Syst. 2017, 63, 80–94.
  24. Louail, T.; Lenormand, M.; Arias, J.M.; Ramasco, J.J. Crowdsourcing the Robin Hood effect in cities. Appl. Netw. Sci. 2017, 2, 11.
  25. Shelton, T.; Poorthuis, A.; Zook, M. Social media and the city: Rethinking urban socio-spatial inequality using user-generated geographic information. Landsc. Urban Plan. 2015, 142, 198–211.
  26. Pornet, C.; Delpierre, C.; Dejardin, O.; Grosclaude, P.; Launay, L.; Guittet, L.; Lang, T.; Launoy, G. Construction of an adaptable European transnational ecological deprivation index: The French version. J. Epidemiol. Commun. Health 2012, 66, 982–989.
  27. Cottineau, C.; Finance, O.; Hatna, E.; Arcaute, E.; Batty, M. Defining urban clusters to detect agglomeration economies. Environ. Plan. B Urban Anal. City Sci. 2018.
  28. Fuller, M. The estimation of Gini coefficients from grouped data: Upper and Lower Bounds. Econ. Lett. 1979, 3, 187–192.
  29. Reardon, S.F. Measures of ordinal segregation. In Occupational and Residential Segregation; Research on Economic Inequality; Flückiger, Y., Reardon, S.F., Silber, J., Eds.; Emerald Group Publishing Limited: Bingley, UK, 2009; pp. 129–155.
  30. Grauwin, S.; Szell, M.; Sobolevsky, S.; Hövel, P.; Simini, F.; Vanhoof, M.; Smoreda, Z.; Barabási, A.L.; Ratti, C. Identifying and modeling the structural discontinuities of human interactions. Sci. Rep. 2017, 7, 46677.
  31. Vanhoof, M.; Hendrickx, L.; Puussaar, A.; Verstraeten, G.; Ploetz, T.; Smoreda, Z. Exploring the use of mobile phones during domestic tourism trips. Netcom 2017, 31, 335–372.
  32. Vanhoof, M.; Combes, S.; De Bellefon, M.P. Mining mobile phone data to detect urban areas. In Statistics and Data Science: New Challenges, New Generations, SIS 2017; Petrucci, A., Verde, R., Eds.; Firenze University Press: Firenze, Italy, 2017; pp. 1005–1012.
  33. De Montjoye, Y.A.; Rocher, L.; Pentland, A.S. Bandicoot: A python toolbox for mobile phone metadata. J. Mach. Learn. Res. 2016, 17, 6100–6104.
  34. Vanhoof, M.; Reis, F.; Smoreda, Z.; Plötz, T. Detecting home locations from CDR data: Introducing spatial uncertainty to the state-of-the-art. arXiv, 2018; arXiv:1808.06398.
  35. Cottineau, C. MetaZipf. A dynamic meta-analysis of city size distributions. PLoS ONE 2017, 12, e0183919.
  36. Gower, J.C. A general coefficient of similarity and some of its properties. Biometrics 1971, 27, 857–871.
  37. Kaufman, L.; Rousseeuw, P. Clustering by Means of Medoids; Faculty of Mathematics and Informatics: Sofia, Bulgaria, 1987.
  38. Cattell, V. Poor people, poor places, and poor health: The mediating role of social networks and social capital. Soc. Sci. Med. 2001, 52, 1501–1516.
  39. Granovetter, M. The strength of weak ties: A network theory revisited. Sociol. Theory 1983, 1, 201–233.
  40. Eeckhout, J.; Pinheiro, R.; Schmidheiny, K. Spatial sorting. J. Polit. Econ. 2014, 122, 554–620.
  41. Sarkar, S. Urban scaling and the geographic concentration of inequalities by city size. Environ. Plan. B Urban Anal. City Sci. 2018.
Figure 1. Maps of deprivation, inequality and segregation levels for metropolitan areas with more than 10,000 residents in 2011.
Figure 2. Maps of several mobile phone indicators at cell tower level. Indicators are computed for September 2007, for all users in the French CDR dataset. Users are aggregated at cell tower level by the Distinct Days Algorithm. Each dot on the map is a cell tower and displays the average indicator value of all users allocated to this cell tower. Cell towers with an average value more than 3 standard deviations above or below the nationwide average are omitted, hence the differing number of displayed cell towers (n) between maps.
Figure 3. Results of clustering: 6 classes of city definitions (Population > 10,000).
Figure 4. Distributions of the Spearman correlation coefficient for the relation between EDI and a selection of mobile phone indicators calculated for all 4914 city definitions. The histogram is colored green when correlation coefficients are positive and orange when negative.
Figure 5. Distributions of the Spearman correlation coefficient for the relation between the Gini index and a selection of mobile phone indicators calculated for all 4914 city definitions. The histogram is coloured green when correlation coefficients are positive and orange when negative.
Figure 6. Distributions of the Spearman correlation coefficient for the relation between the Segregation index and a selection of mobile phone indicators calculated for all 4914 city definitions. The histogram is colored green when correlation coefficients are positive and orange when negative.
Figure 7. Distributions of the Spearman correlation coefficient for the relation between the Entropy of contacts and the different socioeconomic indicators. Correlation coefficients are calculated for all 4914 city definitions but the histograms group results by the different classes of city definitions as defined in Figure 3. The bars in the histograms are coloured green when correlation coefficients are positive and orange when negative. The colours outlining the histograms correspond to the different classes as defined in Figure 3.
Figure 8. Heatmap of Pearson’s R for the relation between the entropy of contacts and the segregation index of wages for all city definitions represented in their parameter space. Each square on the heatmaps represents one of the 4914 city definitions projected in the space of definition criteria (x = density threshold, y = flow threshold, z = population threshold). It is coloured according to the value of the correlation coefficient between the variables for the given definition. Density thresholds (d) are for the city centers and in thousands of inhabitants/hectare, flow thresholds (f) are in percentage of population commuting to the city center, and population thresholds (p) are in thousands of inhabitants in the wider city. As can be deduced, the top row plots have a population threshold (p) of 0.
Figure 9. Significant coefficients in a multiple regression of mobile indicators by cluster medoid. NB: In this figure, the number of observations N refers to the number of clusters within the representative medoid definition, i.e., the number of cities included in the regression for a given definition.
Table 1. Description of mobile phone indicators.

| Mobile Phone Indicator | Description |
| --- | --- |
| Number of calls | Number of calls made or received |
| Active days | Number of active distinct days |
| Percentage nocturnal calls | Percentage of calls made between 7 p.m. and 9 a.m. |
| Duration of calls | Mean duration of all calls |
| Inter-event time | Mean duration between consecutive calls |
| Number of contacts | Number of contacts interacted with |
| Interaction per contact | Mean amount of interactions per contact |
| Entropy of contacts * (Equation (3)) | Entropy measure of calls to contacts |
| Number of visited cell towers | Number of cell towers used to make calls |
| Radius of gyration * (Equation (1)) | Radius of gyration of movement patterns based on visited cell towers |
| Entropy of visited cell towers * (Equation (2)) | Entropy measure of visited cell towers |
| Distance between l1 and l2 | Distance between most plausible and second most plausible ‘home’ cell tower given a home detection algorithm |
| Spatial uncertainty | Uncertainty measure of the detection of the most plausible home location |
| Calls at home | Number of calls made at the presumed home cell tower |
| Percentage calls at home | Percentage of calls made at the presumed home cell tower |
Table 2. Spearman correlations between three census indicators for the centroid definition of all the six classes.

| Class | Deprivation–Inequality | Inequality–Segregation | Segregation–Deprivation |
| --- | --- | --- | --- |
| UrbanAreas | 0.060 | −0.082 | −0.028 |
| Dispersed | 0.062 | −0.252 | −0.144 |
| UrbanCores | −0.044 | 0.192 | 0.186 |
| MetroMedium | −0.059 | −0.047 | −0.156 |
| MetroCore | −0.072 | 0.119 | −0.131 |
| MetroWide | −0.041 | −0.156 | −0.069 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).