# Uncovering the Socioeconomic Structure of Spatial and Social Interactions in Cities


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. From Data to Networks

#### 2.2. Pulse of a Location

#### 2.3. Cluster Analysis

#### 2.4. Measuring Spatial and Social Interactions

## 3. Results

#### 3.1. Pulse of a Location and Socioeconomic Structure

#### 3.2. Socio-Spatial Interactions Analysis

## 4. Discussion

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Data Preprocessing

#### Appendix A.1. Call and Location History

#### Appendix A.2. Identification of the Users’ Place of Residence

- First, we focused on the user’s spatial events occurring during nighttime hours (between 9 pm and 8 am, both included). Only the days from Monday to Thursday were considered ($N=48$ h in total). We denote by ${N}_{u}$ the number of the user’s spatial events occurring during these nighttime hours.
- We then applied a **first filter** by keeping only users whose number of nighttime spatial events is higher than a fraction ${\delta}_{A}$ of the total number of nighttime hours, i.e., ${N}_{u}/N>{\delta}_{A}$.
- We identified the location where the user recorded the highest number of spatial events during nighttime hours and defined it as her or his home location.
- A **second filter** was then applied to keep only users whose fraction of nighttime events occurring at their home location is larger than ${\delta}_{R}$.
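The home-identification steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each spatial event is assumed to be a hypothetical `(day, hour, location)` tuple with at most one event per user and hour, nighttime is taken as the twelve hourly slots from 21:00 to 08:00 on Monday to Thursday (which gives $N=48$), and the names `home_location`, `delta_a`, and `delta_r` are illustrative. As a side note, with $N=48$ a threshold of ${\delta}_{A}=0.3$ would require ${N}_{u}>14.4$, i.e., at least 15 nighttime events.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical event encoding (assumption): (day, hour, location), with
# day 0 = Monday ... 6 = Sunday, hour 0..23, and at most one event per hour.
Event = Tuple[int, int, str]

NIGHT_DAYS = {0, 1, 2, 3}                              # Monday to Thursday
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 9))    # 9 pm to 8 am, both included
N = len(NIGHT_DAYS) * len(NIGHT_HOURS)                 # 48 nighttime hours


def home_location(events: List[Event], delta_a: float, delta_r: float) -> Optional[str]:
    """Return the user's home location, or None if a filter rejects the user."""
    night = [loc for day, hour, loc in events
             if day in NIGHT_DAYS and hour in NIGHT_HOURS]
    n_u = len(night)
    # First filter: keep the user only if N_u / N > delta_A.
    if n_u == 0 or n_u / N <= delta_a:
        return None
    # Home = location with the highest number of nighttime events.
    home, home_count = Counter(night).most_common(1)[0]
    # Second filter: the share of nighttime events at home must exceed delta_R.
    if home_count / n_u <= delta_r:
        return None
    return home


# Toy example (hypothetical data): 20 nighttime events at "A", 5 at "B",
# plus one daytime event that is ignored.
slots = [(d, h) for d in sorted(NIGHT_DAYS) for h in sorted(NIGHT_HOURS)]
user = [(d, h, "A") for d, h in slots[:20]] + [(d, h, "B") for d, h in slots[20:25]]
user.append((0, 12, "C"))
print(home_location(user, delta_a=0.3, delta_r=0.3))  # prints A
```

The threshold values in the call are placeholders; ${\delta}_{R}=0.3$ matches the value marked in Figure A1, while the ${\delta}_{A}$ value is only assumed here.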

**Figure A1.** Influence of the parameters ${\delta}_{A}$ and ${\delta}_{R}$. Number of reliable users during the first (**a**), second (**b**), and third (**c**) week as a function of ${\delta}_{R}$ for different values of ${\delta}_{A}$. The vertical bars indicate the value ${\delta}_{R}=0.3$.

**Table A1.** Number of users (all) and of reliable users per week of observation and in total.

Date | # Users (All) | # Reliable Users |
---|---|---|
15 to 21 March 2015 | 3,292,923 | 1,657,048 |
10 to 16 May 2015 | 3,292,647 | 1,598,571 |
2 to 8 August 2015 | 3,236,122 | 1,539,621 |
Total | 4,064,476 | 2,565,365 |

**Figure A2.** Comparison between census and XDR data. Each scatter plot, with its associated Pearson correlation coefficient, compares the number of inhabitants reported in the census with the number of inhabitants estimated from XDR data (i.e., reliable users) during the three weeks of observation, both expressed in thousands of individuals. Each point represents one municipality of Chile.
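The agreement shown in Figure A2 is summarized by Pearson's correlation coefficient. The sketch below computes it for two paired samples; the input values are hypothetical toy numbers, not data from the study, and `pearson_r` is an illustrative name.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Hypothetical per-municipality values in thousands of inhabitants (not study data).
census = [120.0, 45.5, 300.2, 80.0]
estimated = [60.1, 20.3, 150.7, 41.0]
print(round(pearson_r(census, estimated), 3))
```

Because $r$ is invariant to a positive rescaling of either variable, the fact that reliable users cover only part of the population does not by itself lower $r$; the coefficient measures whether the XDR estimate grows linearly with the census count across municipalities.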

**Figure A3.** Boxplots of the number of events per reliable user according to the week of observation. The dashed grey line indicates the minimum value (15 events, the minimum required to pass the first filter of the home identification). The dash-dotted line indicates the limit of 100 events. The maximum value is 168 (the number of hours in a week). Each boxplot shows the first decile, the lower hinge, the median, the upper hinge, and the ninth decile. The blue dots represent the outliers.
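The boxplots in Figure A3 use deciles instead of the usual whisker rule. A minimal sketch of the five statistics listed in the caption is given below; it assumes Tukey's definition of the hinges and Python's default (`"exclusive"`) quantile method, and the exact conventions of the plotting software used for the figure may differ. The function names and the input data are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple


def tukey_hinges(values: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper hinges: medians of the lower and upper halves of the
    sorted data, the median itself being included in both halves when n is odd."""
    s = sorted(values)
    half = (len(s) + 1) // 2
    return statistics.median(s[:half]), statistics.median(s[len(s) - half:])


def decile_boxplot_summary(values: Sequence[float]) -> Dict[str, float]:
    """The five statistics listed in the Figure A3 caption (whiskers at deciles)."""
    deciles = statistics.quantiles(values, n=10)  # 9 cut points: D1 ... D9
    lower, upper = tukey_hinges(values)
    return {
        "first_decile": deciles[0],
        "lower_hinge": lower,
        "median": statistics.median(values),
        "upper_hinge": upper,
        "ninth_decile": deciles[-1],
    }


# Hypothetical counts of hourly events per user, between the minimum of 15 and the maximum of 168.
events_per_user = list(range(15, 169))
print(decile_boxplot_summary(events_per_user))
```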

#### Appendix A.3. From Events to Networks

Date | # Reliable Users | # Spatial Events | # Social Events |
---|---|---|---|
15 to 21 March 2015 | 1,657,048 | 129,760,887 | 4,433,505 |
10 to 16 May 2015 | 1,598,571 | 126,359,359 | 4,207,538 |
2 to 8 August 2015 | 1,539,621 | 120,960,807 | 3,905,935 |
Total | 3,023,946 | 377,081,053 | 12,546,978 |
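A quick consistency check on the table of events: the spatial and social event totals equal the sums of the weekly rows, whereas the reliable-user total (3,023,946) is smaller than the sum of the weekly counts (4,795,240), presumably because a user observed in several weeks is counted once. The sketch below therefore computes the mean number of events per reliable user week by week only; the variable names are illustrative.

```python
# Weekly rows copied from the table above:
# (week, reliable users, spatial events, social events).
WEEKS = [
    ("15 to 21 March 2015", 1_657_048, 129_760_887, 4_433_505),
    ("10 to 16 May 2015", 1_598_571, 126_359_359, 4_207_538),
    ("2 to 8 August 2015", 1_539_621, 120_960_807, 3_905_935),
]
TOTAL_SPATIAL = 377_081_053
TOTAL_SOCIAL = 12_546_978

# The event columns of the Total row are the sums of the weekly rows.
assert sum(row[2] for row in WEEKS) == TOTAL_SPATIAL
assert sum(row[3] for row in WEEKS) == TOTAL_SOCIAL

for week, users, spatial, social in WEEKS:
    # Mean number of events per reliable user during that week.
    print(f"{week}: {spatial / users:.1f} spatial and {social / users:.2f} social events per user")
```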

**Figure A4.** Number of spatial events (in pink) and social events (in green) according to the hour of the day. Each line represents a week of observation.

## Appendix B. Socioeconomic Structure of the Locations

- Antofagasta in 2002 available at https://ideocuc-ocuc.hub.arcgis.com/datasets/fbde68b6c3d547c8adfcc17d196e1e88_0, last accessed 6 December 2022.
- Coquimbo y La Serena in 2002 available at https://ideocuc-ocuc.hub.arcgis.com/, last accessed 6 December 2022.
- Gran Concepción in 2002 available at https://ideocuc-ocuc.hub.arcgis.com/datasets/f62f12fae97548fd8c71cb405d40e5f2_0, last accessed 6 December 2022.
- Gran Santiago in 2012 available at https://ideocuc-ocuc.hub.arcgis.com/datasets/c264bc8bca7f45bc8ae74329557628b2_0, last accessed 6 December 2022.
- Puerto Montt and Puerto Varas in 2002 available at https://ideocuc-ocuc.hub.arcgis.com/datasets/91deae3707ff447f961b4e2a5cf2300d_0, last accessed 6 December 2022.
- Valparaíso in 2002 available at https://ideocuc-ocuc.hub.arcgis.com/datasets/b9458dbbc94343e58ea5fc9c5def03f9_0, last accessed 6 December 2022.

## Appendix C. Clustering Analysis

**Figure A5.** Ratio between the within-group variance and the total variance as a function of the number of clusters. The red line represents the selected number of clusters.
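The quantity plotted in Figure A5 can be computed for any partition of the locations. The sketch below returns the ratio $W/T$ between the within-cluster and total sums of squared deviations; this equals the ratio of variances because both are normalized by the same number of observations. It does not assume a particular clustering algorithm, the function names are hypothetical, and the toy "pulses" are invented for illustration.

```python
from typing import Dict, Hashable, List, Sequence


def _sum_sq(rows: List[Sequence[float]]) -> float:
    """Sum of squared deviations of the rows from their column-wise mean."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return sum((r[j] - mean[j]) ** 2 for r in rows for j in range(dim))


def within_total_ratio(X: List[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """W/T: within-cluster over total sum of squares for a given partition."""
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for row, label in zip(X, labels):
        groups.setdefault(label, []).append(row)
    within = sum(_sum_sq(rows) for rows in groups.values())
    return within / _sum_sq(X)


# Hypothetical 2-step "pulses" of four locations forming two obvious groups.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(within_total_ratio(X, [0, 0, 1, 1]))   # two clusters: close to 0
print(within_total_ratio(X, [0, 0, 0, 0]))   # one cluster: exactly 1
```

The ratio starts at 1 for a single cluster and decreases toward 0 as clusters are added; a common way to read such a curve is to look for the number of clusters beyond which the decrease becomes marginal.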

**Figure A7.** Pulses associated with the four main clusters. The solid lines represent the average pulse, while the dashed lines represent one standard deviation.

**Figure A8.** Pulses associated with the three additional clusters. The solid lines represent the average pulse, while the dashed lines represent one standard deviation.
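The curves in Figures A7 and A8 can be obtained by averaging the pulses of the locations in each cluster, time step by time step. The sketch below assumes that each pulse is a fixed-length sequence of numbers, that the dashed lines sit at one standard deviation above and below the average, and that the population standard deviation is used; all three are assumptions, and `cluster_pulse_stats` is a hypothetical name.

```python
import statistics
from typing import Dict, Hashable, List, Sequence, Tuple


def cluster_pulse_stats(
    pulses: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, Tuple[List[float], List[float]]]:
    """For each cluster, return the step-by-step mean pulse and its
    (population) standard deviation across the member locations."""
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for pulse, label in zip(pulses, labels):
        groups.setdefault(label, []).append(pulse)
    stats = {}
    for label, members in groups.items():
        columns = list(zip(*members))             # one tuple per time step
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) for c in columns]
        stats[label] = (mean, std)
    return stats


# Hypothetical 3-step pulses for three locations in two clusters.
pulses = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.5, 0.5, 0.5]]
mean, std = cluster_pulse_stats(pulses, ["A", "A", "B"])["A"]
upper = [m + s for m, s in zip(mean, std)]       # assumed dashed band: mean + 1 std
lower = [m - s for m, s in zip(mean, std)]       # assumed dashed band: mean - 1 std
print(mean, std)
```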

**Figure A9.** Boxplots of the fraction of reliable users per cluster. Each boxplot shows the minimum value, the first quartile, the median, the third quartile, and the maximum value.

## Appendix D. Null Model

**Figure A10.** Boxplots of $\overline{\Phi}$ for the spatial and social interaction matrices under the null model. Each boxplot is composed of 100 values of $\overline{\Phi}$, each obtained from ${\Phi}_{h}$ computed with one random assignment. Each boxplot shows the minimum value, the first quartile, the median, the third quartile, and the maximum value.
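The null model behind Figure A10 repeats the computation of an index on 100 random assignments. Because the index $\Phi$ is not defined in this part of the text, the sketch below uses a hypothetical stand-in, the share of interaction weight exchanged between locations of the same cluster, and assumes that a "random assignment" means shuffling the cluster labels among locations while preserving cluster sizes. Both choices are assumptions; only the resampling logic is meant to be illustrative.

```python
import random
import statistics
from typing import Dict, List, Tuple

# (origin location, destination location, interaction weight)
Edge = Tuple[str, str, float]


def within_cluster_share(edges: List[Edge], cluster: Dict[str, str]) -> float:
    """Hypothetical stand-in index (NOT the paper's Phi): share of the total
    interaction weight exchanged between locations of the same cluster."""
    total = sum(w for _, _, w in edges)
    same = sum(w for a, b, w in edges if cluster[a] == cluster[b])
    return same / total


def null_model(edges: List[Edge], cluster: Dict[str, str],
               n_draws: int = 100, seed: int = 0) -> List[float]:
    """Recompute the index after randomly reassigning cluster labels to
    locations, keeping the number of locations per cluster unchanged."""
    rng = random.Random(seed)
    locations = list(cluster)
    labels = [cluster[loc] for loc in locations]
    values = []
    for _ in range(n_draws):
        rng.shuffle(labels)
        values.append(within_cluster_share(edges, dict(zip(locations, labels))))
    return values


# Hypothetical toy network: strong ties inside clusters A and B, weak ties across.
cluster = {"l1": "A", "l2": "A", "l3": "B", "l4": "B"}
edges = [("l1", "l2", 10.0), ("l3", "l4", 10.0), ("l1", "l3", 1.0), ("l2", "l4", 1.0)]
observed = within_cluster_share(edges, cluster)
null = null_model(edges, cluster)
print(observed, statistics.median(null))
```

Comparing the observed value with the spread of the 100 null values indicates whether the observed clustering of interactions exceeds what random label assignments would produce.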

**Figure 1.** Average pulse associated with each of the four clusters. Plots displaying the standard deviations are available in Figures A7 and A8. The fraction of reliable users (i.e., mobile phone users with a validated home location) is stable across clusters (Figure A9 in Appendix C).

**Figure 2.** Socioeconomic characteristics of the clusters. (**A**) Fraction of surface area dedicated to each socioeconomic category per cluster (colored bars) and in total (white bar) for the whole country. (**B**) Map of the four clusters in Gran Santiago, the largest city. (**C**) Spatial distribution of the socioeconomic categories in Gran Santiago.

**Figure 3.** Socio-spatial interactions analysis. (**A**,**B**) Fraction of spatial (**A**) and social (**B**) interactions within and between clusters; the values of $\overline{\Phi}$ obtained with both matrices are displayed. (**C**) Temporal evolution of $\overline{\Phi}$ across the hours of the week for spatial interactions (in pink) and social interactions (in green).
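The within- and between-cluster matrices of Figure 3A,B aggregate location-level interactions by cluster. A minimal sketch is shown below, assuming directed interactions stored as hypothetical `(origin, destination, weight)` triples and a normalization by the total interaction weight; a row-wise normalization would be an equally plausible reading, and `cluster_fractions` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (origin location, destination location, weight)


def cluster_fractions(edges: List[Edge], cluster: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Aggregate location-to-location interactions into cluster-to-cluster
    fractions of the total interaction weight (assumed normalization)."""
    weight: Dict[Tuple[str, str], float] = defaultdict(float)
    for origin, destination, w in edges:
        weight[(cluster[origin], cluster[destination])] += w
    total = sum(weight.values())
    return {pair: w / total for pair, w in weight.items()}


# Hypothetical toy network with two clusters.
cluster = {"l1": "A", "l2": "A", "l3": "B", "l4": "B"}
edges = [("l1", "l2", 10.0), ("l3", "l4", 10.0), ("l1", "l3", 1.0), ("l2", "l4", 1.0)]
fractions = cluster_fractions(edges, cluster)
print(fractions)
```

Keeping the `(origin, destination)` order preserves directionality; for undirected social ties one would instead sort each pair of cluster labels before aggregating.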

**Figure 4.** Intra-city and inter-city socio-spatial interactions analysis. The index values are based on spatial (**A**) and social (**B**) interactions between locations in the same city (diagonal) or from one city to another. Not enough data were available (NA) to measure the spatial interactions between Concepción and Valparaíso.

