Crowdsensing for Characterizing Mobility and Its Impact on the Subjective Wellbeing in an Underdeveloped Region

Living in an underdeveloped region implies a higher cost of living: access to services, such as school, work, medical care, and groceries, is more costly than for those who live in regions with better infrastructure. We are interested in studying how mobility affects the cost of living and the subjective wellbeing of residents in underdeveloped regions. We conducted a four-week sensing campaign with 14 users in Camino Verde (an underserved region in Tijuana, Mexico). All of the participants used a mobile system that we developed to track their daily mobility. The participants were instructed not to change their daily routine for the study while they carried the tracking device. We analyzed 537 individual routes from different city points and calculated their mobility divergences by comparing the actual route chosen against the route suggested by Google Maps, which we used not as the optimal route but as the baseline. Our results allowed us to quantify and observe how Camino Verde residents are affected in their mobility in four crucial aspects: geography, time, economy, and safety. A posteriori qualitative analysis, using semi-structured interviews, complemented the quantitative observations and provided insights into the mobility decisions that people living in underserved regions have to make.


Introduction
Much of the research that has been carried out in the social sciences related to subjective wellbeing (SWB) focuses on its conceptualization, measurement, and understanding [1]. The use of subjective indicators, such as "happiness" or "subjective wellbeing", has become more prevalent in the last 50 years [2]. The desire to understand what makes a life a good life dates back to ancient times [3]. Yet we have very little knowledge about what makes a worthwhile life [4]. To date, most of the instruments used to measure wellbeing are time-consuming methods based on self-report to capture qualitative data; they are subject to the interpretation of social scientists (e.g., sociologists, economists, and urban planners). Accordingly, there is an untapped potential to develop tools that enable specialists to more objectively quantify aspects of wellbeing. It is necessary to understand how to measure people's wellbeing, or, in other words, to understand the factors that affect people's self-perception of happiness. The literature has shown that opportunistic sensing systems are useful in objectively measuring aspects of health, behaviors, and especially mobility. Most of these studies assume the availability of communications infrastructure and access to innovative technology "anytime and anywhere", which is hardly available in undeveloped regions. However, scientific evidence that supports the efficacy of mobile platforms in developing regions for measuring wellbeing is currently lacking. Real-life descriptions of mobile sensing solutions in undeveloped regions are scarce and urgently needed.
People's ability to move about in their environment affects how they perceive their wellbeing. Mobility is a critical and essential factor in maintaining life satisfaction and wellbeing in a complex urban society [5]. In other words, external factors, such as access to jobs, public services, and better infrastructure, affect individuals' subjective wellbeing. Limited access to essential services and limited mobility impacts their possibility of escaping poverty [6]. Indeed, geography plays a causal role in determining the economic growth of a household [7]. For example, a poor household located in an area with favorable geographic factors may eventually escape poverty. In contrast, another completely identical home located in an area with less access to these services will see little progress [8].
These challenges are exacerbated in undeveloped regions with low infrastructure development, like Mexico, which may commonly have poor road infrastructure, such as a lack of sidewalks, streets, and pedestrian crossings. Even in countries with observed economic growth, there are persistently poor areas [7]. More than 70% of the 3.3 billion people that make up the global urban population live in environments that are poor in public resources [9]. In Latin America, 260 million people live in the 198 largest cities [10]. This amounts to a little over 40% of the population living in only 198 cities. People with poor access to public resources have to be continually aware of other factors beyond time and money. If a person perceives a lack of security during commutes between home, work, school, and other usual places, their life becomes more difficult, as they have to be more aware of factors that can affect their safety during the mobility decision-making process. This situation ends up affecting their quality of life and their wellbeing.
We hypothesize that the mobility divergences obtained from individuals' mobility patterns may signal factors that affect individuals' wellbeing and can be used to measure it objectively. In this paper, we present the conclusions of a quantitative analysis of data that we obtained from a study that we carried out in Camino Verde, a disadvantaged area located inside the city of Tijuana in Baja California, Mexico. The observed mobility data presented divergences. The analysis focused on generating a route database from the study's data and comparing these routes to ones generated by Google Maps as a baseline. The routes generated by Google Maps serve as an example of the route that someone without semantic knowledge of the region would take. This analysis allowed us to observe how Camino Verde residents are affected in four aspects crucial to their SWB: time, economy, geography, and safety.
The observational study that we present in this paper serves to characterize mobility in urban regions where public environments have precarious conditions. With the observations from this study and others that we are conducting in parallel to characterize urban environment factors that affect the SWB of citizens, we will be better positioned to propose solutions to the identified problems. In the case of the mobility study alone, some preliminary ideas are to discuss the findings with local authorities in order to propose better bus routes and to prioritize the construction of sidewalks and public lights, among others. This paper represents, to the best of our knowledge, the first study that describes the use of opportunistic crowdsensing to measure mobility divergences when considering the constraints of underserved regions. The main contributions of this work are:

Related Work
Mobility decisions and possibilities of urban dwellers are affected by their geographic environment [11], so it follows that there will be significant differences between developed and underdeveloped urban regions. The majority of the world's population now lives in cities, with around 3.3 billion people living in urban environments. According to recent data [12], global urbanization is increasing, and developing countries experience the highest growth rates. Currently, around 70 percent live in underserved regions. Urban mobility has been widely studied through simulation [13] or more empirical studies [14]. However, except for some broad characterizations [9,15], not much attention has been paid to underserved urban zones.
Studies that aim to understand mobility patterns, such as the one conducted by Lathia et al. [16], which tracks users of public transportation systems through the check-ins and check-outs of transportation cards, are very difficult to duplicate in underserved cities where payments in public transportation are made exclusively in cash. Some studies have focused on the large-scale use of tracking devices [17,18] in vehicles, such as taxis, motorcycles, bicycles, and buses; these have produced invaluable insights. Again, in underserved regions, this type of study would be impractical, as public transportation is usually not widely available, or it is so basic that a tracking device would be considered an unnecessary luxury (as would also be the case for private vehicles that some residents may own). As these works show, our current understanding of urban mobility comes from studies conducted in highly developed countries. In other words, the urban areas in developing regions, where most people live, are ignored.
Additionally, there are very few studies [19] that focus on explaining the impact that a person's environment has on their subjective wellbeing. However, these studies do not focus on underserved regions. While other mobile sensing platforms (like FunToolkit [20], InCense [21]) are successful in collecting data, they are inadequate for our context where individuals do not have the resources to pay for a monthly data usage plan, and the neighborhood has limited Wi-Fi access and poses several security challenges.

RaMoS: A Mobile Sensing Platform for Opportunistic Sensing
We developed a platform, called RaMoS, over six months, following a user-centered design methodology that included contextual interviews, observations, and participatory design sessions conducted in an underdeveloped region of Mexico. Our contextual study revealed that participants could not wear a device in plain sight due to security reasons. Because of participants' stringent economic resources, the application cannot incur data fees and must work without assuming that it will be connected to a data network. We developed the system from scratch, considering these security and economic constraints of our population and a possibly low level of technological knowledge. Our platform uses a client-server architecture following a disconnected data model, as shown in Figure 1. Here, we provide a summary describing our platform and refer the reader to [22,23] for more detailed information.

The Client Side
The GeoLock subsystem is on the client side of the platform. An administrator uses a mobile app to set up the sensing campaign by specifying the participant's information (e.g., name, contact), connectivity details, and the number of days to run the campaign. The users are locked out of all interactions with the phone, and they only see a lock screen (Figure 2). The phone automatically collects data and stores timestamped GPS and altitude readings. On the lock screen, the users can consult contextual information, such as date, time, current battery usage, and the number of remaining days of the campaign. It is important to note that participants need no technological knowledge, as the phones are locked to dissuade any interaction with the device, which also conserves battery. The only thing that is required from participants is to carry their phones during their daily routines. The collected data are sent to the cloud via a WiFi network (when available) using the MQTT protocol. The mobile app is currently implemented for Android devices.

The Server Side
The RaMoS subsystem is a Web application that resides on the server side, where data from the mobile devices are collected and the administrator can visualize these data for a posteriori analysis. The administrator is presented with a list of the registered users and can select a GeoJSON file for a particular hour or combine a day's worth of GeoJSON files into a single view. In Figure 3, we can see the visualization of the mobility that one participant experienced during a day. Besides visualizing data, administrators can send unicast or broadcast announcements to users; this serves as an easy way for administrators to relay non-time-sensitive messages to users. There is also an OSRM-based routing engine providing an API to generate the expected routes between the participant's points of interest; this can be done using either the OSRM component or the Google Maps API. As we stated in the previous subsection, the mobile devices are not connected to the cellular network; they only connect to the server when they are within the coverage of a previously recognized WiFi hotspot. Connections are made using the MQTT protocol for data exchange between client and server.
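To illustrate how a day's worth of hourly GeoJSON files could be combined into a single view, the following is a minimal Python sketch. It assumes that each hourly file is a GeoJSON FeatureCollection of Point features; the function names `point_feature` and `merge_feature_collections` and the `time` property are hypothetical and are not taken from the RaMoS implementation.

```python
def point_feature(lon, lat, alt, timestamp):
    """Build a GeoJSON Point feature (GeoJSON orders coordinates as lon, lat, alt).

    The "time" property name is an assumption of this sketch.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat, alt]},
        "properties": {"time": timestamp},
    }


def merge_feature_collections(collections):
    """Merge several FeatureCollection dicts (e.g., one per hour) into one."""
    merged = {"type": "FeatureCollection", "features": []}
    for fc in collections:
        if fc.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
        # Keep the features in the order in which the hourly files are given.
        merged["features"].extend(fc.get("features", []))
    return merged
```

Passing the hourly collections in chronological order keeps the merged features ordered by time, which is what a route visualization would typically need.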

Field Study
We conducted a four-week sensing campaign with 14 users in a marginalized neighborhood called Camino Verde, which has the highest poverty and crime rates in the USA-Mexico border city of Tijuana, Mexico. Physical access to the zone is difficult, since it is located between a hill and a canyon where most of the streets do not have pavement. Public transportation is scarce and limited, and the police do not patrol the entire neighborhood. During the study, the participants were asked to carry a mobile device, which we provided, during their daily activities. In doing so, we were able to track their daily mobility patterns from Camino Verde to other Tijuana locations.

Limitations
A considerable limitation was the low number of participants due to the difficulties in recruiting them. We addressed this by focusing not on the participants as individuals, but on the routes that they generated, granting us a bigger n with which to work. Although there are many similarities between regions in Latin America, it is essential to remember that this study was only carried out in one region of Mexico. Finally, the qualitative data captured were limited due to low participation and the difficulties of working in such a challenging environment with a vulnerable population.

Pilot Study
We first conducted a pilot to test our system's use in a concrete scenario outside of the lab. We conducted the test for seven days, with four computer science students that were aged 24 to 38. We collected the data from the user devices and found out that some users had collected less than seven days of data due to application errors. Some users complained about the rapid battery depletion, which led us to adjust the data-sampling rate and redesign the user interface to warn users about low battery levels through sound notifications. Overall, this pilot test was useful for fine-tuning the system and making it more robust for field deployment.

Participants
We recruited 14 participants (see Table 1) through a research center for innovation and social advancement, called La Granja (run by the ToroLab Collective), with a building located inside Camino Verde. The participants voluntarily enrolled and consented to the study. Among the inclusion criteria was to have only participants living in Camino Verde, some of whom may commute between Camino Verde and other Tijuana locations daily. The exclusion criteria involved individuals using a motorized vehicle; although, using public transportation was allowed. We gave all of the participants a cheap (around USD 30) Android smartphone with the RaMoS platform installed as their tracking devices. The participants kept the phone as an incentive after completing the study. Only 12 out of the initial 14 participants finished the study.

Data Collection
In total, we captured over 537 mobility routes. We define a route as the series of geographical points given by GPS coordinates that were traveled by a participant. We collected 5960 h of mobility data, with an average of thirteen hours a day per participant.

Data Analysis
We logged the daily route of each user in order to analyze the quantitative data. We created our database from the GeoJSON files uploaded automatically by the users' devices. Each GeoJSON file was processed using the iCluster algorithm [24] to create a list of points, from which we obtained the daily routes traveled by each participant. Each real route is a vector of the user's locations, defined as $R = (P_0, P_1, \ldots, P_n)$, where each point $P_i$ in the real route is composed of a latitude, a longitude, an altitude, and a time at instance $i$, such that $P_i = (lat_i, lon_i, alt_i, t_i)$. From each real route, we obtained the suggested Google Maps route using the Maps API, with $P_0$ and $P_n$ as arguments. This, with some pre-processing, in turn gave us the suggested route $S = (S_0, S_1, \ldots, S_n)$,
where $P_0 = S_0$ and $P_n = S_n$; thus, both routes have the same starting and finishing points by latitude and longitude. From these routes, we extracted the distance, as seen in Table 2. Although the participants did not drive their own vehicles, the same table also describes how the fuel consumption and cost features would be extracted, in case such a comparison is needed.
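As an illustration of the distance feature, the following Python sketch sums the distances between consecutive route points. The source does not state which point-to-point distance was used; the haversine (great-circle) formula and the mean Earth radius of 6371 km are assumptions of this sketch, as are the function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_distance_km(route):
    """Sum of distances between consecutive points of a route.

    `route` is a list of (lat, lon, alt, t) tuples; altitude and time
    are ignored here.
    """
    return sum(
        haversine_km(a[0], a[1], b[0], b[1])
        for a, b in zip(route, route[1:])
    )
```

The same function can be applied to the real route and to the suggested route, so that both distances are computed under identical assumptions before they are compared.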

Distance
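The sum of the distances between consecutive points in the route. Formula: $distance = \sum_{i=0}^{n-1} d(P_i, P_{i+1})$, where $d$ is the distance between two consecutive points.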

Fuel Consumption
The fuel consumption was calculated by dividing the distance traveled by a reasonable gasoline consumption figure for a vehicle that is around ten years old.
We used our routes database to calculate the divergences between the participant's route from point A to point B and the route that was suggested by Google Maps for the same two points. It should be noted that we used routes in both pedestrian and vehicular (public transport) modes, each compared with its respective suggestion by Google Maps, i.e., we did not compare vehicular versus pedestrian routes. We calculated the areas that formed inside these divergences, naming these areas geodes (from GEOgraphic DifferencES), and manually tagged 537 of them; Figure 4 shows two such geodes. For each geode, we calculated its area, centroid, and the actual route distance (Table 3). We stipulated that if two geodes had a similar area and a similar central point (composed of a latitude and a longitude), they would probably be the same geode. Using the area and the center of these tagged divergences, we determined whether a divergence had previously been found. In this way, we created a data set that contained the total number of times that any of the participants took the same divergence.
The Geodes allow for both visualizing and quantifying the differences between the routes that Camino Verde residents take, according to their knowledge and possibilities, and the routes that a non-resident without knowledge of the region would take by consulting a route suggestion app. In the case of vehicular mobility, it is essential to remember that Camino Verde residents only used public transportation; therefore, the comparison was made under the assumption that non-residents would drive their own vehicle. There were two main reasons to make the comparisons in this way: first, it would not make any sense to compare with other people while using the same bus routes, and second, none of the route suggestion apps that we tried (including Google Maps) had the possibility of suggesting public transportation routes. In the case of pedestrian mobility, it was possible to obtain route suggestions and, thus, make comparisons. Ultimately, the point of making these comparisons is to observe how environmental conditions affect the mobility decisions of neighborhood residents. Because they already know the problems and how to avoid or at least mitigate them (e.g., what shortcuts to take, where not to go), the best point of comparison is with non-residents who will probably rely on external aids (such as route suggestions) for their mobility.
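The rule for recognizing a previously found geode (similar area and similar central point) can be sketched in Python as follows. The source does not give the similarity thresholds; the 10% relative tolerance on area, the 0.0005-degree tolerance on the centroid, and the greedy grouping strategy are all assumptions of this sketch, and the function names are hypothetical.

```python
import math


def same_geode(a, b, area_rel_tol=0.10, center_tol_deg=0.0005):
    """Return True if two geodes have a similar area and a similar centroid.

    Each geode is a dict with "area" (any consistent unit) and
    "centroid" as (lon, lat) in degrees. Both tolerances are assumed.
    """
    area_close = math.isclose(a["area"], b["area"], rel_tol=area_rel_tol)
    dlon = a["centroid"][0] - b["centroid"][0]
    dlat = a["centroid"][1] - b["centroid"][1]
    center_close = math.hypot(dlon, dlat) <= center_tol_deg
    return area_close and center_close


def count_recurrences(geodes, **tolerances):
    """Greedily group geodes; returns a list of [representative, count]."""
    groups = []
    for g in geodes:
        for group in groups:
            if same_geode(group[0], g, **tolerances):
                group[1] += 1  # divergence already seen: count it again
                break
        else:
            groups.append([g, 1])  # first time this divergence is found
    return groups
```

Because each new geode is compared only against the first member of every group, the result can depend on the order of the input; a clustering method would avoid this, at the cost of more complexity.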

Feature Description, Formula, and Variable
Geode: A geode is the area created by concatenating the points that compose the real route taken by a participant with those of the route suggested by Google Maps.
Area: The measurement of the surface delimited by the real route and the Google Maps route.
Centroid: The geographical center point of the geode. Formula: $centroid = \left( \frac{1}{n} \sum_{i} lon_i, \frac{1}{n} \sum_{i} lat_i \right)$
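The geode features can be sketched in Python as follows. The centroid follows the formula given in the table (the mean of the longitudes and latitudes). The source does not state how the area was computed; the shoelace formula on a local equirectangular projection, the constants of roughly 111.32 km per degree of longitude at the equator and 110.57 km per degree of latitude, and the function names are assumptions of this sketch.

```python
import math

KM_PER_DEG_LON_EQUATOR = 111.32  # approximate value (assumption)
KM_PER_DEG_LAT = 110.57          # approximate value (assumption)


def geode_polygon(real_route, suggested_route):
    """Concatenate the real route with the reversed suggested route.

    Both routes are lists of (lon, lat) pairs sharing start and end points,
    so the result is a closed ring around the divergence.
    """
    return list(real_route) + list(reversed(suggested_route))


def centroid(points):
    """Mean longitude and mean latitude of the points, as (lon, lat)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def area_km2(points):
    """Shoelace area of the ring after a local equirectangular projection."""
    lon0, lat0 = centroid(points)
    kx = KM_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat0))
    xy = [((lon - lon0) * kx, (lat - lat0) * KM_PER_DEG_LAT)
          for lon, lat in points]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Note that if the two routes cross each other, the shoelace formula returns the net signed area, so the enclosed regions partially cancel; handling that case would require splitting the ring at the crossing points.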

Quantitative Results
We found significant variations when compared to the routes that were suggested by Google Maps after analyzing the routes that were taken by Camino Verde residents during their daily mobility. It is important to note that the participants did not use Google Maps to get suggestions; they just had the tracking device capturing the routes they usually took. We used Google Maps suggestions for our analysis as a reference to compare the actual routes taken versus what people without knowledge of the specifics of the zone (such as shortcuts, hills, dangers, etc.) would take. In this section, we show the results of our quantitative analysis; these results were complemented with qualitative analysis (Section 6) in order to gain insights into the factors that affect mobility and the decisions that the residents have to take. We identified four significant types of problems that people living in these underserved regions face: in their time, through their geography, in their economy, and their sense of security. While we manually tagged over 537 routes, it is noteworthy that 50% are unique (269/537).

Mobility Divergences
Our results indicate that mobility divergences, for both pedestrian and vehicular modes, are not highly recurrent among participants; only 37.9% (102/269) of the geodes occurred more than once: 167 geodes occurred once, 50 geodes occurred twice, and 22 geodes occurred three times (Figure 5). This percentage means that the majority of divergences occurred only once. However, some divergences did occur multiple times; in particular, one was repeated twenty-two times. This divergence occurred between a frequent street crossing and a local university, when people walked between these two points. Our inference from the collected data is that this divergence occurred so often because the bus took a busier street instead of a shorter route, the latter being the official route that it should have always followed. On average, each participant incurred 41 divergences, representing 8% of mobility traces; five out of 12 participants took a route other than the one suggested by Google Maps most of the time, with divergences above the average. In the a posteriori analysis (see Section 6), the participants explained that, when walking, they would frequently take routes different from those suggested by Google Maps, as the suggestions would not consider certain factors, such as accessibility and security; they also indicated that the route in Google Maps was not the most efficient, as there were often alternative informal routes that they knew and that did not appear on maps.

Time
One of the first factors that we identified as affecting participants was time. The easiest way to identify how participants are affected in their time is through the direct relation between the distances traveled and the time spent. Our initial assumption was that people who live in underserved regions would always take the shortest route, which would mean the shortest travel time. The collected data showed that participants would often end up taking a much longer route, both when walking and when using public transportation. For pedestrian mobility, the longer routes are mostly due to the geographic situation, so we discuss them in the next subsection. In the case of vehicular mobility, most people use public transportation; this not only involves longer routes (as compared with driving), but also overhead from walking to the bus stop, waiting times, transfers between buses, and other factors that the participants could not control. In total, we calculated the expected travel time for 382 routes. Of these, 58.1% (222/382) of the routes provided by Google had a shorter travel time. In Figure 6, we can see a scatter plot of all the route travel times, where positive values belong to instances where the real route was shorter. In contrast, negative values belong to instances where the route suggested by Google Maps was shorter. This distribution means that, although a shorter (faster) route existed, the participants would take a longer route more frequently. To mention a concrete example, participant 35 (Figure 7) took longer routes than expected and had to spend more time commuting between different city locations using public transportation. The time dedicated to commuting represented a considerable portion of the day. The participants reported having spent anywhere from two to four hours daily on public transport.
Figure 7. Graph of relation between total divergences, real route distance, and suggested route distance.

Geography
After analyzing the collected data, we observed that the distance most participants had to walk to use public transportation was within 2 km. Thus, we defined this as our division length between pedestrian transport and motorized transport; this distance coincides with studies such as [25], which indicate that around 80 percent of trips shorter than one mile (1.6 km) are made on foot, and that this share drops significantly for trips of more than two miles (3.2 km). Of our original 382 routes, 148 (39%) had a distance shorter than 2 km. When the distance was shorter than 2 km, the participants' route was shorter than the one suggested 71.62% (106/148) of the time. In contrast, our dataset of routes longer than 2 km indicates that the route taken by participants was shorter than the suggested route only 23.08% (54/234) of the time.
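The split at 2 km and the share of routes in which the participant's route was shorter can be sketched in Python as follows. The source does not say whether the threshold was applied to the real or to the suggested distance; applying it to the real route distance is an assumption of this sketch, and the function names are hypothetical.

```python
def split_by_threshold(pairs, threshold_km=2.0):
    """Split (real_km, suggested_km) pairs at the pedestrian/motorized threshold.

    The threshold is applied to the real route distance (an assumption).
    """
    short = [(r, s) for r, s in pairs if r < threshold_km]
    long_ = [(r, s) for r, s in pairs if r >= threshold_km]
    return short, long_


def share_real_shorter(pairs):
    """Fraction of pairs in which the real route is shorter than the suggested one."""
    if not pairs:
        return 0.0
    return sum(1 for r, s in pairs if r < s) / len(pairs)
```

For example, `share_real_shorter(short)` and `share_real_shorter(long_)` give the two proportions that are contrasted in the paragraph above, once the route pairs are loaded from the route database.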
Among the calculated geodes, the largest one was over 20 square kilometers, while the smallest had an area of 0.00017 square kilometers. Only 17% of geodes had an area above 758,775 square meters. This means that, most of the time, the route suggested by Google Maps is only slightly different from the participants' actual route. The knowledge that participants had about the environmental conditions of the neighborhood (such as hills, unpaved roads, unmarked paths, and others) influenced the routes they took, as will be discussed in Section 6.
It could be expected that, when walking, people would take a shorter (or equally long) route than the one suggested. However, from our data analysis, we can see that people do not always take the shortest route, as other factors may influence their mobility decisions. One of these factors is the geography of where they are traveling. For example, in Figure 8, we can see a case where the participant (green-yellow route) walked a longer route than the one that was suggested by Google Maps; in this case, the length difference between both routes was 0.39 km. On the right side of Figure 8, we can see that there is a much smaller inclination in the route that the participant took, as opposed to a steeper hill in the route that was suggested by Google Maps.

Economy
Another area where people who live in underserved regions are negatively affected is their economy. Most of them do not own a vehicle, so they have to take public transportation. The current daily salary in the State of Baja California is approximately 176.72 MXN, and one round trip in public transport costs about 28 MXN in the city of Tijuana. Table 4 shows the percentage of a person's daily salary that each round trip represents. It is worthwhile to note that many people in the zone have to take more than one transport daily in order to get from their home to work and back, as discussed in Section 6.
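The relation between fares and salary can be worked out with a short Python sketch using the two figures given above (a daily salary of 176.72 MXN and a round-trip fare of 28 MXN). The function name `fare_share_percent` is hypothetical, and the values it produces are plain arithmetic on those two figures rather than a reproduction of Table 4.

```python
DAILY_SALARY_MXN = 176.72  # approximate daily salary stated in the text
ROUND_TRIP_MXN = 28.0      # approximate round-trip fare stated in the text


def fare_share_percent(round_trips,
                       daily_salary=DAILY_SALARY_MXN,
                       round_trip_cost=ROUND_TRIP_MXN):
    """Percentage of the daily salary spent on a number of round trips."""
    return 100.0 * round_trips * round_trip_cost / daily_salary


if __name__ == "__main__":
    for trips in (1, 2, 3):
        print(trips, round(fare_share_percent(trips), 2))
```

Under these figures, a single round trip takes roughly 16% of the daily salary, and a commute that requires two round-trip fares takes roughly 32%.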

Qualitative Analysis
We supplemented our quantitative data with semi-structured interviews; only three participants could continue with this phase. During the interviews, we showed participants the routes that they followed and asked some questions regarding their daily travels, such as how much they spent on their daily commutes, how that affected their schedules, and what they felt was their most significant problem for mobility. Most were surprised by their routes and were interested to see the map of their daily travels. All of the interviews were transcribed for future analysis.
To analyze the qualitative data, we used deductive analytical approaches [26] based on our initial research questions, which focused on understanding the reasons why participants did not follow the suggested routes from Google Maps. We used deductive coding to examine how observed behaviors and reported perceptions supported or contradicted our research questions. We additionally used inductive approaches for theme analysis of our data in several categories and subcategories (Table 5). To support our inductive analysis, we followed a qualitative approach that included techniques to derive grounded theory and affinity diagramming (e.g., open and axial coding). Using these techniques, quotes and events obtained from the interviews were grouped to uncover emergent themes related to geography, time, economy, and security. Further inquiry on these subjects elicited interesting comments from participants, providing insights into the quantitative findings. Regarding time and geography, the participants reported that they spent anywhere from two to four hours daily on public transport. For some participants, the time spent commuting represented up to one-sixth of their whole day, and even up to one-fourth when not counting a recommended eight hours of sleep. Additionally, the lack of certainty about the bus routes, as drivers do not always follow the official routes, and the absence of route schedules make time management a complicated task.
Geography was also a factor affecting time and overall mobility, as the participants had to avoid some routes where it was difficult to walk, more so if pushing a baby stroller, for instance. Some of the reported environmental factors that affected mobility decisions were the steepness of a hill, rainwater streams, muddy terrain, and the lack of pavement and sidewalks. The presence of empty lots, open playing fields, and places that are not appropriate for building provided opportunities for making unofficial pathways for walking shorter, although sometimes less safe, routes.
As for the impact on their economy, one participant, a student, reported that around fifty percent of their personal income went to public transportation, and sometimes even more, for instance, when arriving late for the bus and thus needing to take a taxi to make it to school on time. All of the participants reported a high percentage of their income going to travel-related activities.
Finally, the participants also reported being affected in their sense of security, as they have to be aware of adverse factors during their daily commutes. Several participants reported that avoiding certain places or zones perceived as insecure was essential when deciding which route to take. Furthermore, they indicated that this perception of insecurity came from factors such as a known drug selling point, a place frequented by gangs, a street without lights, a place where landslides occur when it rains, a street with many loose menacing dogs, and other adverse factors. This led them to make decisions to avoid not only certain regions, but also certain hours (e.g., trying to be home before it gets dark).

Conclusions and Discussion
Although underdeveloped urban regions make up a large percentage of urban areas, there is very little research into the mobility circumstances that affect people within these areas. Although there are studies [5] that indicate a link between the mobility and subjective wellbeing of people, they focus on a much different populace.
The lack of research into the mobility of people that live in underdeveloped regions can be partially attributed to a lack of infrastructure. Studies that have been done in other regions of the world [16] are impossible to replicate in regions that are so underdeveloped in infrastructure. We believe that studies such as this one can serve as a small piece of the foundation for further research into people's mobility in these areas. We intend to present our findings to the local authorities, so they can determine actions to help mitigate the problems that people face in their daily mobility. Some possible actions are: finding a more direct route that buses can take, identifying the need for new routes in public transportation, prioritizing the placement of public lights, etc.
This study served as a first step to understand how mobility can differ between what is expected and reality in an underdeveloped region, such as Camino Verde, Tijuana. While this study had few participants, we collected a large number of routes and compared them to what we could consider to be an expected route. The quantitative results show that people in these regions are affected in their mobility, mainly in four aspects: in their geography, time, economy, and safety. Further qualitative analysis provided insights into the decisions that have to be made in order to overcome or mitigate the affecting factors. For future work, we consider applying more questionnaires, both before and after the mobility data collection.
This study proposes a way to collect mobility data from people that live in underdeveloped regions. We believe that access to these data could prove useful in designing better cities and creating better navigation applications that use semantic data. As future work, more research should be done on participants' reports of the factors affecting their mobility, which, from the interviews, were classified in the categories of geography, time, security, and economy. Our qualitative analysis up to now is only speculative due to the lack of more participants for interviews. A future study could also use the same methods to collect data in a first-world city, thus allowing us to compare the two regions.