Entropy as a Measure of Attractiveness and Socioeconomic Complexity in Rio de Janeiro Metropolitan Area

Lenormand, Maxime; Samaniego, Horacio; Chaves, Júlio César; da Fonseca Vieira, Vinícius; da Silva, Moacyr Alvim Horta Barbosa; Evsukoff, Alexandre Gonçalves

doi:10.3390/e22030368

by Maxime Lenormand ¹,†, Horacio Samaniego ²,³,⁴,†, Júlio César Chaves ⁵, Vinícius da Fonseca Vieira ⁶, Moacyr Alvim Horta Barbosa da Silva ⁵ and Alexandre Gonçalves Evsukoff ⁷,*

¹ TETIS, Univ Montpellier, AgroParisTech, Cirad, CNRS, INRAE, 34000 Montpellier, France

² Laboratorio de Ecoinformática, Instituto de Conservación Biodiversidad y Territorio, Campus Isla Teja s/n, Valdivia 5110290, Chile

³ Instituto de Ecología y Biodiversidad, Facultad de Ciencias, Universidad de Chile, Las Palmeras, Ñuñoa, Santiago 7800003, Chile

⁴ Instituto de Sistemas Complejos de Valparaíso, Subida Artillería 470, Valparaíso 2360448, Chile

⁵ Getulio Vargas Foundation, Praia de Botafogo 190, Rio de Janeiro, RJ, 22250-900, Brazil

⁶ Department of Computer Science, Federal University of São João Del Rey, São João Del Rey, MG, 36301-360, Brazil

⁷ Coppe/Federal University of Rio de Janeiro, P.O. Box 68506, Rio de Janeiro, RJ 22941-972, Brazil

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Entropy 2020, 22(3), 368; https://doi.org/10.3390/e22030368

Submission received: 15 January 2020 / Revised: 6 March 2020 / Accepted: 13 March 2020 / Published: 23 March 2020

(This article belongs to the Special Issue Information Theory for Human and Social Processes)

Abstract

Defining and measuring spatial inequalities across the urban environment remains a complex and elusive task which has been facilitated by the increasing availability of large geolocated databases. In this study, we rely on a mobile phone dataset and an entropy-based metric to measure the attractiveness of a location in the Rio de Janeiro Metropolitan Area (Brazil) as the diversity of visitors’ location of residence. The results show that the attractiveness of a given location measured by entropy is an important descriptor of the socioeconomic status of the location, and can thus be used as a proxy for complex socioeconomic indicators.

Keywords: mobile phone data; urban mobility; attractiveness; urban entropy; urban computing

1. Introduction

While cities have long been recognized as the cradle of modern civilization by providing a safe place for cultural development, the unequal distribution of wealth and services remains the most pressing issue threatening the sustainability of modern societies. Despite the large technological advances making our lives apparently easier, economic inequality has been on the rise worldwide since 1980. This has become such an issue that the most recent datasets show that the wealthiest 1% of the population capture twice as much of the global income growth as the bottom 50% [1]. While such distributional disparity among urbanites and social stratification is currently under deep scrutiny among economists, including the spatial component in such descriptions imposes additional methodological difficulties given the vagility of human nature and the heterogeneity of the spatial distribution of resources.

While different views exist regarding the origins of socio-spatial inequalities across cities [2], the consequences of poorly integrated societies deeply affect opportunities in key realms of social life and hamper social cohesion at local and societal levels [3,4,5]. While some discuss causal factors behind socio-spatial inequalities, evidence coming from natural experiments has shown direct impacts on particularly vulnerable groups [6]. Such evidence, among others, has tied inequalities to societal imbalances leading to critical states in terms of security, health, and wealth distribution [2,6,7,8,9], threatening social cohesion and precluding possibilities of enriching the social capital at particular locations [10,11,12,13]. Defining and measuring spatial inequalities remains a complex and elusive task for which scientists have recognized several dimensions that are, so far, poorly integrated into a general conceptual framework [4,14,15]. For instance, its precise understanding is often linked to the study objects at hand and the particular methodology employed to study them. Dimensions of inequalities often include the localized concentration of particular groups within cities, the spatial homogeneity of social groups, their accessibility, or more particularly, their distance to downtown [16]. Hence, devising appropriate tools to characterize the spatial distribution of complex socioeconomic factors may contribute to the urgently needed development of integrative urban planning.

The explosive use of Information and Communication Technologies (ICT), such as cellphones and large databases of user spending behavior, has made huge volumes of nonconventional data available for urban research purposes [17,18,19,20,21]. Knowing the cellphone tower to which we connect permits the reconstruction of our daily trajectories, providing a surprisingly high spatiotemporal resolution of our social interactions [22,23]. This approach has been widely used recently to assess a variety of topics, from individual mobility patterns [24] and land use patterns [25] to the detection of relevant places of high social activity within the city [26], thereby unveiling the structure and function of cities [25,27,28]. Devising an efficient mobility infrastructure has long been known as a means for city integration, and the increasing availability of ICT data allows for a new understanding of spatial integration patterns and their relationship to mobility as well as socioeconomic and ethnic stratification [29]. Such highly resolved datasets provide a contextual understanding of land use that is readily available to derive new measures of social integration in its spatial context, thereby contributing to accurate, and near-real-time, descriptions of urban dynamics [30,31,32,33,34,35]. Many of these studies are based on the concept of activity space [31,36,37], defined as the set of locations visited by a traveler throughout their daily activities. Different measures describing the activity space have been studied to understand daily mobility patterns [21,38]. Among these metrics, those based on Shannon entropy are particularly interesting to study human mobility patterns. Indeed, the concept of "Mobility Entropy" indicators has been widely used to measure the diversity of users’ movement patterns [39,40,41,42]. It can be used at different scales to evaluate the diversity of trips made by an individual [40,43] or the diversity of locations visited by an individual [39,42] or a group of individuals [29,44].

In this work, we rely on the concept of "Mobility Entropy" from the point of view of visiting locations in order to deepen our understanding of human mobility in the context of urban computing by focusing on the concept of attractiveness. We particularly look into mapping the entropy of urban structure using increasingly available mobile phone datasets as a tool to provide highly resolved descriptions of the relationship between attractiveness and several key aspects of the urban environment such as productivity, education, and ethnic origin in the Rio de Janeiro Metropolitan Area of Brazil. We focus here on the diversity of visitors’ residence to measure the attractiveness of a location and then compare our results to economic and social indicators to assess how entropy effectively relates to socioeconomic indicators. We show that entropy is an important descriptor of socioeconomic complexity across this vastly populated area.

2. Materials and Methods

2.1. The Study Area and Dataset

The study area is the Rio de Janeiro Metropolitan Area (RJMA), the second largest urban area in Brazil with 12,145,734 inhabitants. Administratively, the RJMA is part of Rio de Janeiro State, of which Rio de Janeiro city (Rio for short) is the state capital and the largest municipality, with 6,320,446 inhabitants and an area of 1,200.177 km$^{2}$.

The organization responsible for the demographic census in Brazil is the Brazilian Institute of Geography and Statistics (IBGE), which follows global standards to aggregate census tracts at the subdistrict, district, city, state, and country levels, such that this partitioning can be used for most regions in the world and at different scales. This study relies on such partitioning, dividing the study area into 49 locations (Figure 1), whereby the city of Rio is divided into 33 subdistricts aggregated into 5 districts. Districts are called Planning Areas (AP) and represent macro zones of the city: AP1, the center; AP2, the southern zone; AP3, the northern zone; AP4, Barra-Jacarepaguá; and AP5, the western zone.

Our analysis is based on mobile phone data provided by a Brazilian telecommunication operator. The dataset was collected during 363 days between January and December 2014 across the phone area code 21. We use $2.1 \times 10^{9}$ call records originating from $2.9 \times 10^{6}$ anonymized subscribers. Only outgoing voice call data were made available for this work. We first focused on the identification of the user’s residence. The algorithm to detect places of residence is based on the analysis of the most frequently visited locations on evenings and weekends (see Appendix A for more details). This step allows us to discard users not living in the RJMA and remove users with no significant activity for the analysis. In total, 350,685 residences were identified.

We then aggregate the data in space and time. Aggregated records represent the number of users $v_{i j} (t)$ living in location $i \in \{1, \ldots, N\}$ and visiting location $j \in \{1, \ldots, N\}$ at time t. We spatially aggregate the antennas’ Voronoi polygons in order to obtain $N = 49$ locations matching the 49 locations composing the RJMA shown in Figure 1. We also divide each day into four 6-hour shifts (Morning, Work, Afternoon, and Night) and label each time period $t \in \{1, \ldots, 1452\}$ as either weekday or weekend, including holidays. More details regarding the data preprocessing are available in Appendix A.
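As an illustration of this aggregation step, the following minimal sketch counts users per (residence, visited location, time period). It is not the authors' code: the record layout `(user_id, home, visited, t)`, the 0-based indices, the choice to count each user once per cell, and the helper names `aggregate_visits` and `period_index` are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate_visits(records, n_locations):
    """Build v[t][i][j]: number of distinct users living in i and seen in j during t.

    records: iterable of (user_id, home, visited, t) tuples (hypothetical layout),
    where home and visited are 0-based location indices and t is a 0-based
    time-period index.
    """
    seen = defaultdict(set)
    for user_id, home, visited, t in records:
        # Assumption: a user is counted once per (t, i, j), however many calls were made.
        seen[(t, home, visited)].add(user_id)

    v = {}
    for (t, i, j), users in seen.items():
        if t not in v:
            v[t] = [[0] * n_locations for _ in range(n_locations)]
        v[t][i][j] = len(users)
    return v

def period_index(day, shift, shifts_per_day=4):
    """Map a 0-based day and a 0-based shift to a 0-based time-period index."""
    return day * shifts_per_day + shift
```

With 363 days and four shifts per day, `period_index` ranges over 1452 values, which matches the number of time periods stated in the text.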

2.2. Entropy as a Measure of Attractiveness

For each time interval t, the probability that a user living in location i visits location j is given by

p_{i \to j} (t) = \frac{v_{i j} (t)}{\sum_{k = 1}^{N} v_{i k} (t)} .

(1)

This probability describes the production of visitors and is normalized by the total number of users living in location i. In this study, we are interested in the diversity of visitors’ location of residence as a measure of the attractiveness of the destination. We therefore need to compute the probability $p_{j \leftarrow i}$ that a user visiting location j lives in location i. To do so, we combine the probability $p_{i \to j}$ with census data to estimate $V_{i j} (t)$, the number of users living at location i and visiting location j at time t, using the following equation:

V_{i j} (t) = O_{i} p_{i \to j} (t),

(2)

where $O_{i}$ is the population of location i according to the 2010 IBGE census. We can now compute the probability $p_{j \leftarrow i} (t)$ that an individual visiting j at time t lives in i (Equation (3)).

p_{j \leftarrow i} (t) = \frac{V_{i j} (t)}{\sum_{k = 1}^{N} V_{k j} (t)} .

(3)

This second probability is thus related to the attraction of visitors, normalized at destination, and allows us to compute the normalized Shannon entropy as follows:

S_{j} (t) = \frac{- 1}{\log (N)} \sum_{k = 1}^{N} p_{j \leftarrow k} (t) \log (p_{j \leftarrow k} (t)) .

(4)

Large entropy values ($S_{j} (t) \approx 1$) mean that people visiting location j at time t are evenly distributed among all 49 locations, whereas a smaller value of entropy means that people visiting location j at time t tend to be mostly concentrated among few residence locations. The entropy has been widely used to analyze and model human mobility patterns. It can be used in spatial analysis to describe the diversity of individual movement patterns [41] or in spatial interaction modeling to estimate trip distributions by entropy maximization [45], to name a few. It is worth noting that we focus in this work on the analysis of entropy as a measure of attractiveness that can be used as a proxy for complex socioeconomic indicators.
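The chain from Equation (1) to Equation (4) can be sketched for a single time period as follows. This is a minimal illustration rather than the authors' implementation; the function name `visitor_entropy`, the list-of-lists input, and the convention that terms with zero probability contribute nothing to the sum are assumptions.

```python
import math

def visitor_entropy(v, population):
    """Normalized entropy S_j of visitors' residences for one time period.

    v[i][j]: users living in i and seen in j (phone data).
    population[i]: census population O_i of location i.
    Requires at least two locations so that log(N) is nonzero.
    """
    n = len(v)
    # Eq. (1): p_{i->j} = v_ij / sum_k v_ik (rows with no users give zeros)
    p_out = []
    for i in range(n):
        total = sum(v[i])
        p_out.append([v[i][j] / total if total > 0 else 0.0 for j in range(n)])
    # Eq. (2): V_ij = O_i * p_{i->j}
    V = [[population[i] * p_out[i][j] for j in range(n)] for i in range(n)]
    entropies = []
    for j in range(n):
        col_total = sum(V[k][j] for k in range(n))
        s = 0.0
        for k in range(n):
            if col_total > 0 and V[k][j] > 0:
                p = V[k][j] / col_total      # Eq. (3): p_{j<-k}
                s -= p * math.log(p)         # Eq. (4), before normalization
        entropies.append(s / math.log(n))    # divide by log(N)
    return entropies
```

For example, when every location sends the same estimated number of visitors to j, the function returns 1 for j, and when all visitors of j come from a single location it returns 0.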

It is important to keep in mind that a given entropy value can cover a large variety of situations regarding the distance traveled by visitors. Here, we characterize the relationship between traveled distance and entropy by computing the radius of attraction of a location j as the average distance traveled by people visiting j at time t:

R_{j} (t) = \sum_{k = 1}^{N} p_{j \leftarrow k} (t) d_{k j},

(5)

where $d_{k j}$ is the distance from location k to j along the road network between the locations’ centroids, computed using the Google Maps API [46]. This calculation is particularly important in the case of Rio due to the presence of mountains, lakes, and the Guanabara Bay, which makes road distances between certain locations very different from the Euclidean distances.
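Equation (5) is a probability-weighted average of distances, which the sketch below makes explicit. The function name `radius_of_attraction` and the inputs are assumptions: `V` stands for the estimated visitor matrix of Equation (2) and `dist` for a precomputed road-distance matrix, and the value used for `dist[j][j]` (distance from a location to itself) is left to the caller because the text does not specify it.

```python
def radius_of_attraction(V, dist):
    """Average distance R_j traveled by visitors of each location j (Eq. (5)).

    V[k][j]: estimated number of users living in k and visiting j.
    dist[k][j]: road distance from k to j, in any consistent unit.
    """
    n = len(V)
    radii = []
    for j in range(n):
        col_total = sum(V[k][j] for k in range(n))
        if col_total == 0:
            radii.append(0.0)  # no visitors: radius set to 0 by convention here
            continue
        # p_{j<-k} = V_kj / sum_k V_kj, then weight each distance by it
        radii.append(sum(V[k][j] / col_total * dist[k][j] for k in range(n)))
    return radii
```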

Finally, we also consider the ratio of the number of visitors to the resident population as a complementary measure of attractiveness:

δ_{j} (t) = \frac{\sum_{k = 1}^{N} V_{k j} (t)}{O_{j}} .

(6)
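A short sketch of Equation (6) is given below, under the same assumptions as before (`V` is the estimated visitor matrix of Equation (2), and `attractiveness_ratio` is a name chosen for the example). Note that, as the equation is written, the sum over k includes k = j, so residents detected in their own location count as visitors.

```python
def attractiveness_ratio(V, population):
    """Ratio delta_j of estimated visitors to resident population (Eq. (6)).

    V[k][j]: estimated number of users living in k and visiting j.
    population[j]: census population O_j of location j.
    """
    n = len(V)
    return [sum(V[k][j] for k in range(n)) / population[j] for j in range(n)]
```

A value above one then means that the location hosts more people during the period than its census population.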

2.3. Entropy, Economic and Sociodemographic Indicators

As our entropy index provides a synoptic representation of mobility across the RJMA, we finally seek to describe how it relates to well-known economic, social, and demographic indicators collected by the IBGE. We therefore evaluated how the diversity of visitors relates to the economic performance of the city by plotting the number of jobs and income levels against our entropy estimation. Sociodemography, in turn, was assessed by relating entropy to primary and secondary (high school) education levels among the population resident in each partitioned area. Finally, two developmental indices were chosen to evaluate entropy performance across the RJMA.

3. Results

3.1. Classification of Locations According to Their Attractiveness

We start our analysis by performing a clustering analysis to group together locations exhibiting similar features regarding their attractiveness. As a first step, we focus on two features across the urban landscape, the diversity at the origin location and the attractiveness at work locations. This led us to average the three indicators for each location (Equations (4), (5), and (6)) over the work-shift time periods on weekdays. Locations are clustered using the k-means algorithm based on the three standardized averaged metrics. The number of clusters was chosen based on the ratio between the within-group variance and the total variance (see Appendix B for more details). We obtained four clusters. Clustering results and the relationships between the different metrics are shown in Figure 2. We observe a positive relationship between metrics, in which attractiveness and radius of attraction tend to increase with the entropy. There is nevertheless a strong dispersion around these tendencies, with attractiveness and radius of attraction values that can double for a given entropy value.
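The clustering procedure described above can be sketched with a plain implementation of Lloyd's k-means on standardized features, together with the within-group to total variance ratio used to choose the number of clusters. This is a standard-library illustration, not the authors' code: the random initialization, the fixed seed, and all function names are assumptions, and a library implementation would normally be preferred in practice.

```python
import random

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = []
    for c, m in zip(cols, means):
        s = (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
        stds.append(s if s > 0 else 1.0)  # constant column: keep it at 0
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k centers drawn at random from the points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def within_total_ratio(points, labels, centers):
    """Within-group sum of squares divided by the total sum of squares."""
    grand = [sum(x) / len(points) for x in zip(*points)]
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return within / total
```

One would typically compute `within_total_ratio` for several values of k on the standardized (entropy, attractiveness ratio, radius) triples and keep the value beyond which the ratio stops decreasing markedly.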

Figure 3 shows the spatial distribution of the four resulting clusters across the whole studied area. Clusters are determined by a certain level of attractiveness and can be described as follows:

C1 (red) represents a low-attractive cluster composed of 17 locations. It is characterized by a low entropy, an attractiveness ratio lower than one, and a low radius of attraction. Locations in C1 are either far from the Rio city center or segregated areas inside the capital.
C2 (green) is a cluster of 22 locations, mostly located inside the city. This cluster is characterized by medium values of entropy of visitors and radius of attraction, while having an attractiveness ratio close to one.
C3 (blue) is an attractive group of 8 locations, mostly near the sea inside the capital. This cluster shares high entropy values, an attractiveness ratio between 1 and 2, and a large radius of attraction.
C4 (orange) is composed of only one location that can be considered as an outlier due to its very high attractiveness. The remaining three clusters do not change if this outlier is removed before clustering. This location is the business center (Centro) of the city, and is a very attractive cluster with a very large entropy ($S_{C 4} \approx 0.9$), attractiveness ratio ($δ_{C 4} \approx 12$), and radius of attraction. This location concentrates most of the jobs and visitors from all the RJMA.

Our methodology allows us to detect segregated areas with a very low diversity of visitors and attractiveness. Figure 4 shows the comparison of the clustering results with two social development indexes. We focus the discussion on the five locations shown in Figure 4a: (1) Complexo do Alemão, (2) Jacarezinho, (3) Rocinha, (4) Complexo da Maré, and (5) Cidade de Deus. The first four locations are classified by Rio City Hall as favela subdistricts [47] and are shown in purple in Figure 4a. The term "favela" is used here in the sense of subnormal agglomerate as defined by IBGE [48]: "a form of irregular occupation of land usually characterized by an irregular urban pattern, with scarce essential public services and located in areas that are not proper or allowed for housing use". In a broad sense, favela also includes urbanized areas (areas that were once subnormal agglomerates but have been urbanized) and also housing estates. The favela subdistricts shown in purple in Figure 4a are defined, according to Rio City Hall, as the locations with more than 50% of the population living in subnormal agglomerates. In Cidade de Deus, only 13% of the population is living in subnormal agglomerates, as it is mostly composed of housing estate buildings, while its socioeconomic indexes are similar to those of the favela subdistricts.

Figure 4b shows a zoomed-in image of the clustering results. The main favela subdistricts were classified in the low-attractive cluster (C1), as was Cidade de Deus. Complexo da Maré also has many housing estate buildings and 54% of its population living in subnormal agglomerates. It was classified in the medium-attractive cluster (C2), possibly because it is crossed by two of the main expressways that lead out of the city. In the dataset used in this work, a visitor is detected in a given location by a call recorded within the location, such that some detected visitors may merely be passing through the location to reach another destination.

Figure 4c,d show two social development indexes. Figure 4c shows the Municipal Human Development Index (MHDI), which is an adaptation of the Human Development Index (HDI) for municipalities. The MHDI data were obtained from the Atlas of Human Development in Brazil [49], where the MHDI computed in 2013 is available at the census tract level, as aggregated values for all municipalities, and at the district level in metropolitan areas. In Rio, the MHDI is available for the macro zones shown in Figure 1, and the values for the five locations of interest in Figure 4a were obtained from the census tract level. The classes and colors used in Figure 4c were suggested by the Atlas. All five locations highlighted in Figure 4a were classified as medium MHDI, and many locations classified in the high-attractive cluster (C3) have very high MHDI.

The MHDI is a global index intended to compare social development across the whole country. The Rio City Hall has adopted the Social Progress Index (IPS), which is more focused on the city characteristics and is based on 32 indicators in three dimensions. The data used in this work were computed in 2016 and obtained from the open data portal of Rio City Hall [50]. The colors and levels presented in Figure 4d are the ones used by the Rio City Hall. It can be seen from Figure 4d that all four locations assigned to the low-attractive cluster (C1) have low IPS (IPS < 50). The Complexo da Maré subdistrict has medium IPS (50 < IPS < 60) and was assigned to the medium-attractive cluster (C2). Moreover, most locations assigned to the high-attractive cluster (C3) have a very high IPS (IPS > 70). There is a very good agreement between the clusters computed from mobility and IPS, as cluster C1 corresponds to IPS < 50, cluster C2 corresponds to 50 < IPS < 70, and cluster C3 corresponds to IPS > 70.

In the next section, we discuss the relationship between the mobility indicators and the economic and social indicators selected for this study.

3.2. Economic Activity and Sociodemographic Factors

While transportation mobility has largely been recognized as a major player in the urban economy [51], the recent scrutiny of Call Detail Records (CDR) has expanded our understanding of how mobility relates to economic activity across cities [42,52]. We here evaluated how entropy relates to officially reported job numbers and income levels (Figure 5). In spite of the large informal job market known to occur in the RJMA, our analysis shows a positive and exponential relationship between formal jobs and entropy (Figure 5a). Similar patterns emerge when relating income level with entropy (Figure 5b) as well as with Gross Domestic Product (GDP) (see Appendix C).

Interestingly, opposite trends emerge when entropy is plotted against demographic indices, such as the percentage of the population having completed primary education and high school degrees. In Figure 6, “primary school” refers to the percentage of individuals having primary school or lower education level and “high school” refers to individuals having high school or higher education level. School degrees are positively correlated with income, meaning that higher income locations tend to have higher education levels. In the same way, race is correlated with income: there is a prevalence of white-skin individuals in higher income locations and a prevalence of black-skin individuals in lower income locations. As entropy is related to income (Figure 5), locations where a large fraction of the population has primary school or lower education level exhibit lower entropy values (Figure 6a), while the proportion of the population with high school or higher education level is positively associated with entropy (Figure 6b). This is strikingly similar to the pattern exhibited by ethnic origin. The percentage of black-skin population, like the percentage of primary school, shows a negative relation to entropy (Figure 6c), while areas with a larger percentage of white-skin population tend to exhibit higher entropy values (Figure 6d).

The entropy of visitors, computed from CDR, reflects the complexity of indicators usually computed using classical approaches. In fact, entropy seems to be positively associated with socioeconomic indicators such as MHDI and IPS (Figure 7), as shown in Figure 5 and Figure 6.

3.3. Temporal Evolution of the Attractiveness

To study the temporal evolution of entropy, attractiveness, and radius of attraction, we plot the normalized average metric values for each cluster across time shifts (Figure 8). Normalizations are performed using the reference values obtained for the work-shift time period on weekdays. We decided here to consider relative instead of absolute values in order to make average attractiveness of clusters of locations comparable over time. Entropy tends to globally decrease along the day on both weekdays and weekends for every location, whatever cluster it belongs to. However, it is interesting to note that the entropy is relatively higher during weekday nights and weekends for locations classified as low-attractive during weekday work shifts compared to highly attractive locations. Indeed, while locations of cluster C4 exhibit an entropy index 50% lower than the reference value, it actually represents 80% for cluster C2/C3 and more than 90% for locations belonging to cluster C1. A similar behavior is observed for the radius of attraction. The situation is slightly different however, for the attractiveness with an increase of the metrics during afternoons and night shifts on weekdays for the low-attractive cluster C1. It further reaches a plateau during the weekend days. The location of cluster C4 shows the opposite behavior, with a decreasing attractiveness along the day to reach a plateau during weekend days. The attractiveness remains more or less constant for locations belonging to cluster C2 and C3.
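The normalization used for this comparison can be illustrated with the short sketch below. It assumes that the metric is first averaged over the locations of a cluster for each time shift and then divided by the average obtained for the reference shift; the text does not state the order of these two operations, so this ordering, the function name `relative_profile`, and the shift labels are assumptions, and the numbers in the usage note are made up.

```python
def relative_profile(values_by_shift, reference):
    """Cluster-average metric per time shift, relative to a reference shift.

    values_by_shift: {shift_label: [metric value for each location in the cluster]}
    reference: label of the shift used as the reference (value 1.0 after scaling).
    """
    means = {shift: sum(vals) / len(vals) for shift, vals in values_by_shift.items()}
    ref = means[reference]
    return {shift: m / ref for shift, m in means.items()}
```

For instance, `relative_profile({"weekday/Work": [4.0, 2.0], "weekday/Night": [2.0, 1.0]}, "weekday/Work")` returns 1.0 for the work shift and 0.5 for the night shift, meaning that the night value is half of the reference.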

4. Discussion

The impact of socio-spatial inequalities on urban systems has largely been treated in the urban economics and sociological literature, but the increasing availability of large mobile phone databases has opened the possibility to provide a clearer picture of how different aspects of urban life impact economic and sociodemographic aspects of cities [19]. Going in this direction, this work presents the results of processing 2.1 billion records collected from 2 million users in the Rio de Janeiro Metropolitan Area, Brazil, during the whole year of 2014, placing this research among the largest analyses, to our knowledge, relating mobility to socioeconomic complexity in Brazil. We hereby illustrate the potential of combining mobile phone data with entropy-based metrics to measure the attractiveness of a location. This may prove useful to urban planners and managers when it comes to describing and planning for complex socioeconomic indicators. While it is known that mobility is in fact related to economic activity, this work presents an effective and simple way to measure such relationships from increasingly available ICT data such as mobile phone datasets.

While most capital cities in South America suffer from a disproportionate growth compared to other urban settlements [53], common patterns of spatial inequalities show that underprivileged populations establish themselves away from highly productive central zones [30,54], often with clear differences among the usage of urban infrastructure [55]. In this sense, the particular and complex topography of Rio de Janeiro would suggest the existence of shared usage patterns of the city among urbanites coming from different social contexts. The spatial partitioning employed in our study closely matches IBGE delineation; we are therefore able to compare official statistics with measures derived from CDR data and offer specific insights regarding the usage of ICT as proxies for the spatial distribution of complex socioeconomic indicators derived from mobile phone datasets. Our analysis shows that the attractiveness of a district measured with the diversity of visitors’ place of residence is correlated with the income and the number of jobs in spite of the large informal job market of Rio [32].

We also show that the attractiveness is lower in areas hosting a large percentage of the population of African descent and/or locations where primary school training is prevalent (Figure 6a,c). While this points to previous descriptions showing how available schooling options closely reproduce residential patterns of socio-spatial segregation [56,57], the spatial mismatch with the highly productive Centro area, where work opportunities are concentrated in the RJMA, leads us to think that residential segregation of the poorest is reinforced by new inequalities when taking into account daily mobility opportunities. Unfortunately, even when using state-of-the-art descriptors of urban diversity, we corroborate a well-known trend in which a large population of African descent remains an indicator of social inequality. This poses important planning challenges to historical areas such as the RJMA, where almost one million enslaved Africans were estimated to arrive in the XVIIth century [58].

The observed results concur with recent developments in the scientific literature that show how mobile phone information can be used to evaluate the socioeconomic state of spatially heterogeneous regions [43,59,60], especially in developing countries. Moreover, the RJMA is a very particular case study where socioeconomically isolated districts are placed in between richer areas, as well as in the periphery, which is more common in large cities of developing countries. This particular characteristic of the city allows us to validate the results, as the clusters accurately identified favelas and other socioeconomically isolated districts, as shown in Figure 4.

In summary, this manuscript serves to illustrate the potential of mobile phone data combined with entropy-based metrics for measuring the attractiveness of a location that can be used as a proxy for complex socioeconomic indicators. Even if the spatial partitioning used in this study tends to reduce the level of spatial uncertainty inherent in this type of data sources [61], it would be interesting to reproduce the results with different datasets coming from different sources of mobility information.

Author Contributions

Conceptualization, J.C.C., A.G.E., M.A.H.B.d.S., and V.d.F.V.; methodology, J.C.C., A.G.E., M.A.H.B.d.S., V.d.F.V., H.S., and M.L.; software, J.C.C.; validation, J.C.C., A.G.E., and M.L.; formal analysis, A.G.E., M.A.H.B.d.S., and M.L.; investigation, H.S. and M.L.; resources, J.C.C. and A.G.E.; data curation, J.C.C.; writing–original draft preparation, J.C.C., A.G.E.; writing–review and editing, A.G.E., H.S., and M.L.; visualization, J.C.C. and M.L.; supervision, A.G.E., M.L., and H.S.; project administration, A.G.E. and M.A.H.B.d.S.; funding acquisition, A.G.E. and M.A.H.B.d.S. All authors have read and agreed to the published version of the manuscript.

Funding

Júlio César Chaves, Vinícius da Fonseca Vieira, Moacyr Alvim Horta Barbosa da Silva, and Alexandre Gonçalves Evsukoff acknowledge the funding granted by The Rio de Janeiro State Research Agency (FAPERJ) and by the Getulio Vargas Foundation. The work of Maxime Lenormand was funded by the French National Research Agency (grant number ANR-17-CE03-0003). Horacio Samaniego was funded by FONDECYT-CONICYT Chile (grant no. 1161280).

Data availability

Data used in this work to compute entropy in Section 2.2 and sociodemographic indicators in Figure 6 are available in Dataverse (https://dataverse.harvard.edu/dataverse/MRRJ).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Data Preprocessing

Appendix A.1. Spatial Aggregation

In this work, in order to link mobility results directly to socioeconomic data, each geographic unit is the union of Voronoi polygons of antennas (Figure A1) matching the geographic limits of the 49 locations (Figure A2). This makes the set of regions outlined here directly related to the respective locations and, consequently, to the census data and many other sources.

Figure A1. Spatial distribution of antennas with their respective Voronoi polygons.

Figure A2. Spatial overlap between the aggregation of Voronoi cells and the districts’ spatial polygons.

Appendix A.2. Temporal Aggregation

In addition to the spatial partitioning, the results were aggregated for each day in the dataset. The four shifts were defined by considering the distribution of activities throughout the day. The hour with the smallest number of records was 04:00 AM, as shown in Figure A3.

Figure A3. Number of calls per hour and the partition of time shifts. Total number of calls made in the RJMA in 2014 (including weekdays and weekends).

Appendix A.3. Identification of the User’s Place of Residence

The presumed residence of each user was computed as the most visited Voronoi cell between 08:00 PM and 06:00 AM during workdays and over the entire day on Sundays and holidays. We additionally required that the user be regularly detected in this cell (at least five times) and that the number of visits to the most frequented cell always be greater than the number of visits to the second most frequented cell. The final dataset, containing only users with an identified residence, comprised 350,685 mobile phone users. As mentioned above, the data were aggregated spatially by assigning each Voronoi cell to one of the 49 districts. The identification of the user’s place of residence was then evaluated using data from the IBGE 2010 census. As can be observed in Figure A4, we obtained a good match between the census data and the residence identified with mobile phone data, with a Pearson correlation coefficient equal to 0.9.
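The residence rule can be illustrated with a short sketch. It is a reading of the rule under stated assumptions, not the code used in the study: the names `is_home_window` and `detect_homes` and the record layout `(user_id, cell_id, timestamp)` are hypothetical, Saturdays are treated like workdays, the night window is tested on each record's own timestamp, and "always greater" is read as strictly greater over the whole period.

```python
from collections import Counter, defaultdict
from datetime import datetime

MIN_VISITS = 5  # "at least five times" in the text


def is_home_window(ts, holidays=frozenset()):
    """True if a record falls in the residence-detection window."""
    # Sundays (weekday() == 6) and holidays count for the entire day.
    if ts.weekday() == 6 or ts.date() in holidays:
        return True
    # Other days (assumption: including Saturdays) count from 20:00 to 06:00.
    return ts.hour >= 20 or ts.hour < 6


def detect_homes(records, holidays=frozenset()):
    """records: iterable of (user_id, cell_id, datetime). Returns {user: cell}."""
    visits = defaultdict(Counter)
    for user, cell, ts in records:
        if is_home_window(ts, holidays):
            visits[user][cell] += 1

    homes = {}
    for user, counts in visits.items():
        ranked = counts.most_common(2)
        top_cell, top_n = ranked[0]
        second_n = ranked[1][1] if len(ranked) > 1 else 0
        # Keep the user only if the top cell is visited often enough
        # and strictly more often than the runner-up.
        if top_n >= MIN_VISITS and top_n > second_n:
            homes[user] = top_cell
    return homes
```

Users who fail either condition are simply absent from the returned dictionary, which mirrors the restriction of the final dataset to users with an identified residence.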

Figure A4. Number of mobile phone users with an identified residence in the RJMA as a function of the number of inhabitants in the 49 locations.
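The comparison with the census can be sketched as a Pearson correlation between two per-location vectors. The function below is a plain textbook implementation; its name and the toy numbers in the usage lines are illustrative assumptions and are not values from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, all left unnormalized:
    # the common 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Toy usage with made-up numbers for three hypothetical locations.
users_with_home = [120, 340, 560]
census_inhabitants = [10_000, 31_000, 52_000]
r = pearson(users_with_home, census_inhabitants)
```

A value close to 1 indicates that locations with more inhabitants also have proportionally more users with an identified residence.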

Appendix B. Clustering Analysis

See Figure A5.

Figure A5. Ratio between the within-group variance and the total variance as a function of the number of clusters.
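The quantity plotted in Figure A5 can be sketched for a given partition as follows. This is an illustration under assumptions: variance is read as a sum of squared deviations from the mean, the function name `variance_ratio` is hypothetical, and the clustering algorithm that produces the labels is not part of the sketch.

```python
def variance_ratio(points, labels):
    """Within-group sum of squares divided by the total sum of squares.

    points: list of equal-length numeric tuples (one per location).
    labels: list of cluster labels, one per point.
    """
    def mean(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    def sum_sq(rows, center):
        return sum((x - c) ** 2 for row in rows for x, c in zip(row, center))

    total = sum_sq(points, mean(points))

    # Group the points by cluster label.
    groups = {}
    for row, label in zip(points, labels):
        groups.setdefault(label, []).append(row)

    within = sum(sum_sq(rows, mean(rows)) for rows in groups.values())
    return within / total
```

With a single cluster the ratio equals 1, and it falls to 0 when every point forms its own cluster, so the curve is typically inspected for the number of clusters beyond which the decrease becomes small.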

Appendix C. Economic Activity

See Figure A6.

Figure A6. Gross Domestic Product (GDP) as a function of the entropy index. The entropy has been averaged over the work-shift time periods on weekdays.

Appendix D. Temporal Evolution

See Figure A7.

Figure A7. Temporal evolution of the three metrics. From the top to the bottom, Tukey boxplots of the entropy, attractiveness, and radius of attraction as a function of time by cluster. The values are normalized by the value obtained for the work shift during weekdays.

Figure 1. Rio de Janeiro Metropolitan Area (RJMA). The RJMA is composed of 49 locations—16 municipalities outside the Capital represented in grey and 33 subdistricts inside the Capital grouped into 5 districts.

Figure 2. Results of the clustering analysis. Log-log scatter plot of (a) the attractiveness and (b) the radius of attraction in terms of the entropy index. The inset in (a) shows the relationship after removing one outlier (cluster C4). Each dot represents a location within the study area. Indicators have been averaged over the work-shift time period during weekdays.

Figure 3. Map of the RJMA displaying the spatial distribution of the four clusters.

Figure 4. Zoomed-in view of Rio de Janeiro city. (a) Favela subdistricts (in purple) and business center (in orange) locations. We focus the discussion on five locations: (1) Complexo do Alemão, (2) Jacarezinho, (3) Rocinha, (4) Complexo da Maré, and (5) Cidade de Deus. (b) Spatial distribution of the clusters. (c) Municipal Human Development Index (MHDI) from 2013. (d) Social Progress Index (IPS) from 2016.

Figure 5. Economic analysis. Number of jobs (a) and income (in Brazilian Reals) (b) as a function of the entropy index. The entropy has been averaged over the work-shift time periods on weekdays.

Figure 6. Sociodemographic analysis. Percentage of primary school level education (a), high school level education (b), black people (c), and white people (d) as a function of the entropy index. The entropy has been averaged over the work-shift time periods on weekdays.

Figure 7. Social development indexes. MHDI (a) and IPS (b) as a function of the entropy index. The entropy has been averaged over the work-shift time periods on weekdays.

Figure 8. Temporal evolution of the three metrics. From the top to the bottom—entropy, attractiveness, and radius of attraction as a function of time by cluster. The values are averaged by cluster and normalized by the value obtained for the work shift during weekdays. A similar plot displaying boxplots instead of average values is available in Appendix D.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
