Measurement of Potential Victims of Burglary at the Mesoscale: Comparison of Census, Phone Users, and Social Media Data

Abstract: Because burglars generally target the property of inhabitants, potential victims must be measured accurately when analyzing burglaries, especially in small areas. Previous studies on burglary are mostly based on large units such as census tracts or communities; one difficulty is measuring the potential victims of burglary at the mesoscale. We compare how well census population, census households, nighttime mobile phone users, and nighttime social media data, namely the Tencent regional heatmap (TRH), measure potential victims of burglary on 150 m × 150 m grids. Based on rational choice theory, and controlling for the potentially confounding effects of risks and costs, we show that the TRH performed best, followed by census households and census population, whereas phone users performed poorly. The best-performing time period for TRH data was 03:00–05:00 on weekends. These findings could improve the measurement of potential victims of burglary at the mesoscale and provide scientific insight for crime prevention.


Introduction
The target of home burglary is a residential household. Census data such as resident population [1], households [2], and building characteristics [3] are often used to estimate the resident population, and thus the potential targets. However, such data are not available for small geographic areas in China and many other countries.
Emerging data sources, such as phone users [4,5], the Tencent regional heatmap (TRH) [6], taxi ridership, and subway ridership [7], have been used to estimate the ambient population and capture its dynamic distribution. Among these, phone-user data have mostly been applied in theft research. He et al. [8] found that population indicators calculated from mobile phone big data correlated significantly with the spatial variation of thefts. Song et al. [7] compared four types of data (phone users, residential population, subway ridership, and taxi ridership) as measures of the population at risk of theft at different times of day on 1 km × 1 km grids, and concluded that phone users and taxi ridership outperformed residential population in most periods of the day. More recently, TRH data have been found to help explain the distribution of drug crime hotspots [9] and to have a significant positive effect on street contact crime [10]. The former study [9] used the dynamic data to represent the overall population but did not exploit their time-varying characteristics; the latter [10] divided the day into three time periods to assess their varying effects on street contact crime. So far, this new ambient population measure has not been applied to burglary studies.
With the development of crime analysis from the macro- to the meso- and microscales [11,12], burglary also calls for more meso- and microscale research [13]. At the macro scale, social disorganization theory is commonly used to explore the factors influencing burglary [12,14]. In contrast, rational choice theory and crime pattern theory have been applied at the meso- and microscales. Previous studies have considered the influences of bus stops [15], road density [2,16], neighborhood commercial density [17], population composition [2], and target concentration [16,18]. Crime prevention through environmental design (CPTED) theory pays more attention to housing accessibility [16,18], housing usage patterns, and traffic accessibility [19]. Similarly, measurement, prevention, and intervention in police practice have been addressed in the literature [20,21]. However, measuring and representing potential burglary victims at the mesoscale has remained a persistent challenge.
While studies of burglary are abundant, few have compared different measures of potential burglary victims. In this context, this paper explores alternative measures based on phone users and TRH data. Phone-user counts are derived from the interactions between individual phones and base stations, whereas TRH is based on location information collected by mobile apps. Both cell phone data and TRH data have very high temporal resolution, but TRH has a much higher spatial resolution. In theory, nighttime cell phone and TRH data, collected when people have returned home, should better capture the actual residential population. In reality, residents of large cities in southern China engage in many activities even at night, such as working overtime, having midnight snacks, and partying at entertainment venues. Moreover, the regularity of these activities may differ markedly between weekday nights and weekend nights. All of these factors affect how well dynamic population data measure the residential population [10]. However, which time periods are best suited for burglary studies has not been tested empirically.
This study adopts rational choice theory to control for crime risks, costs, and rewards, and aims to identify the most appropriate measure of residential population as a surrogate for potential burglary targets. Four measures are compared: census population, census households, mobile phone users, and TRH.

Theories
As argued in the introduction, it is important to choose a theoretical framework suited to the actual social situation of the study area. Rational choice theory is considered suitable for crime research at the mesoscale, and it has been shown to explain burglary in China effectively [3,22]. It emphasizes offenders' perception and judgment of targets, and jointly considers the potential benefits, risks, and costs of approaching a target. These three elements, outlined below, are closely related to the state of potential victims.
(1) Potential benefits. Previous studies [23,24] used the wealth and exposure of targets in residential areas to represent their attractiveness to criminals. Because the proceeds of a burglary come from the victim's residence, the number of potential victims and their likely wealth are the primary conditions by which an offender gauges the benefits. In this study, census population, census households, nighttime mobile phone users, and the Tencent regional heatmap were used to represent potential targets. Housing type also informs the offender's estimate of a household's likely wealth. Three housing types were used in this study: residential district buildings, apartments and dormitory buildings, and commercial-residential buildings [17,25].
(2) Cost of traveling to the target. Traffic convenience affects the cost of burglaries. Following prior research [3], bus stops and road density were selected to represent traffic accessibility. The more convenient the transportation in the target area, the lower the travel cost.
(3) Risks of getting caught. Crime risks stem from civil defense, physical defense, and technical defense. The risk of surveillance and the standard of physical security are primary deterrents for burglars [26]. In China, surveillance cameras installed by the police are a direct supervisory force that increases the risk of burglars being discovered. Figure 1 outlines the application of rational choice theory in this study. Specifically, the study considered the potential benefits, risks, and costs of approaching potential victims; identified the corresponding indicators in the built environment and population environment; and analyzed the factors affecting the occurrence of burglaries. From the perspective of the built environment, the risk of exposure and the physical conditions of offender travel are described by the distribution of surveillance cameras, road density, and housing-type POIs. From the perspective of the social environment, residential population (census population, census households, mobile phone users, and the Tencent regional heatmap) was used to represent potential victims.
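Road density, one of the cost indicators above, is usually defined as the total road length inside a grid cell divided by the cell area. The exact GIS procedure is not specified here, so the following is only a minimal sketch under stated assumptions: roads are stored as straight segments in a projected coordinate system in metres, the grid origin is (0, 0), and each segment is clipped to a 150 m cell with the Liang-Barsky algorithm. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Segment = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in projected metres (assumed)


def clipped_length(seg: Segment, xmin: float, ymin: float, xmax: float, ymax: float) -> float:
    """Length of a straight segment inside an axis-aligned box (Liang-Barsky clipping)."""
    x0, y0, x1, y1 = seg
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parametric interval of the segment kept so far
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:  # parallel to this boundary and outside it
                return 0.0
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)  # segment enters the box at t
        else:
            t1 = min(t1, t)  # segment leaves the box at t
        if t0 > t1:
            return 0.0
    return (t1 - t0) * math.hypot(dx, dy)


def road_density(segments: Iterable[Segment], col: int, row: int, cell: float = 150.0) -> float:
    """Road density (km of road per km^2) of grid cell (col, row); grid origin assumed at (0, 0)."""
    xmin, ymin = col * cell, row * cell
    xmax, ymax = xmin + cell, ymin + cell
    length_m = sum(clipped_length(s, xmin, ymin, xmax, ymax) for s in segments)
    return (length_m / 1000.0) / ((cell / 1000.0) ** 2)


# Hypothetical road crossing cell (0, 0) horizontally: 150 m of road in a 0.0225 km^2 cell
print(round(road_density([(-50.0, 75.0, 200.0, 75.0)], 0, 0), 3))  # 6.667
```

Clipping, rather than assigning whole segments to the cell containing their midpoint, keeps long straight arterial roads from inflating the density of a single cell.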

Study Area
The study area is located in XT JieDao, ZG City, South China. ZG City, known as the "South Gate" of China, is a metropolis with a developed economy, a large population, and a very high degree of openness. Urbanization has proceeded very differently across the city, producing a complex population structure and diverse land use. The southern part of XT JieDao is a highly urbanized area, whereas the northern part is mostly mountainous and sparsely populated. We chose the southern part as the study area; it covers 4.4 square kilometers, and nonlocal residents account for nearly 70% of its population. As of November 2010, it had a nonlocal population of 118,900 and a local permanent population of 54,500. The crime rate, especially the burglary rate, in the study area is significantly higher than in the rest of the city. The study area comprises 15 communities, including 3 urban villages (Figure 2). The urban village is a phenomenon unique to urbanization in China (completely different from the "urban village" in the UK, an idealized model for sustainable development and planning). Urban villages are essentially islands surrounded by newly developed areas, and they differ markedly from other communities in appearance and internal composition. Urban villages consist of disorderly low-rent housing, mixed with various businesses providing basic services, small restaurants, retail shops, and street vendors along narrow alleys. The crowded back-to-back buildings may not comply with building codes, and urban villages are generally poorly maintained. Other communities, in contrast, consist of well-planned high-rise buildings and green spaces [27].

Data and Methods
We constructed negative binomial regression models with the grid cell as the unit of analysis to compare how well different indicators measure the potential victims (the actual resident population). The research data included burglary cases, census data, mobile phone users, the Tencent regional heatmap, surveillance cameras, POIs of housing types, traffic networks, and other basic geographic data. Because the nighttime population better represents the distribution of the residential population, mobile phone user and TRH data from 18:00 to 08:00 on both weekdays and weekends were selected. Following a previous study [7], these data were divided into five 3-hour periods. Descriptive statistics of all variables are shown in Table 1. To avoid the constraints of administrative boundaries, a regular grid was used. Following the grid division methods of prior research [28,29], we applied the optimal grid size formula of Griffith et al. [30], which gave an optimal side length of about 130 meters. Considering the actual conditions of the study area, the side length was set to 150 meters [10]. Along the edge of the study area, cells with less than 50% of their area inside the study area were removed. A total of 192 cells made up the sample for the subsequent analysis.
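The formula of Griffith et al. [30] is not reproduced in the text. Purely as an illustration, the sketch below uses a widely cited quadrat-analysis rule of thumb, cell area = 2A/n (so side length = sqrt(2A/n)), where A is the study area and n the number of points; whether this matches the formula actually used is an assumption. Taking A = 4.4 km² and n = 547 (the 284 + 153 + 110 burglary cases reported for 2017 to 2019) gives a side length close to the roughly 130 m reported above.

```python
import math


def quadrat_side_length(area_m2: float, n_points: int) -> float:
    """Cell side length (m) under the rule of thumb cell area = 2A / n (assumed formula)."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return math.sqrt(2.0 * area_m2 / n_points)


study_area_m2 = 4.4e6          # 4.4 km^2, south XT JieDao
n_cases = 284 + 153 + 110      # burglary cases in 2017, 2018, and 2019
side = quadrat_side_length(study_area_m2, n_cases)
print(f"{n_cases} cases over 4.4 km^2 -> side length of about {side:.0f} m")  # about 127 m
```

Rounding this value up to 150 m, as done in the study, trades a little spatial detail for fewer empty cells.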

Data and Processing
The data sources and related information are shown in Table 2. For burglary cases, we first used an automated procedure to locate the case points and then checked and adjusted each point manually to ensure accuracy (number of cases: 284 in 2017, 153 in 2018, and 110 in 2019). For surveillance cameras, points with abnormal or unclear positions were first eliminated, and the remaining errors were adjusted manually (of the 616 points in XT JieDao, 9 were eliminated, leaving 607; of these, 318 cameras were located in the southern part of XT JieDao). Before processing, the road network was updated manually by comparison with remote-sensing images and field surveys. Following prior research [7], all processed data were then aggregated to the grid by spatial intersection to obtain each cell's value for the corresponding variable.
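For point layers such as burglary cases and surveillance cameras, the intersection with a regular grid reduces to assigning each point to the cell that contains it and counting. The sketch below assumes projected coordinates in metres and a hypothetical grid origin; the case locations are invented for illustration. A GIS overlay would additionally drop cells removed at the study-area edge.

```python
from collections import Counter
from typing import Iterable, Tuple

CELL = 150.0  # grid side length (m)


def cell_index(x: float, y: float, x0: float, y0: float, cell: float = CELL) -> Tuple[int, int]:
    """(column, row) of the grid cell containing (x, y); (x0, y0) is the grid's lower-left corner."""
    return int((x - x0) // cell), int((y - y0) // cell)


def count_points(points: Iterable[Tuple[float, float]], x0: float, y0: float,
                 cell: float = CELL) -> Counter:
    """Number of points (e.g. geocoded burglaries or cameras) falling in each grid cell."""
    return Counter(cell_index(x, y, x0, y0, cell) for x, y in points)


# Hypothetical case locations in projected metres, grid origin at (0, 0)
cases = [(10.0, 10.0), (140.0, 20.0), (160.0, 20.0), (310.0, 310.0)]
counts = count_points(cases, 0.0, 0.0)
print(counts[(0, 0)], counts[(1, 0)], counts[(2, 2)], counts[(5, 5)])  # 2 1 1 0
```

Cells with no points simply do not appear in the `Counter`, which returns 0 for them; this matches the many zero-count cells that motivate the negative binomial model.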

Mobile phone users
Mobile phone user data were derived from base-station signaling data provided by one of China's three major carriers and comprised hourly counts of anonymous mobile users at each base station for the whole week of 12-18 May 2016. In south XT JieDao, adjacent cellular towers are within 500 meters of each other. As long as a mobile phone is powered on, its antenna automatically searches for base stations to obtain service, and this signal search is recorded in the mobile phone user data [31]. Because phones usually connect to the nearest base station, the Thiessen polygon approach was used to aggregate the user counts to the grid. To ensure the integrity and accuracy of each Thiessen polygon in the study area, all base stations within 1 km of the study area boundary were also collected. Note that the number of mobile phone users is a relative measure and does not represent the absolute number of users. Considering the differences between residents' activities on weekdays and weekends [32], the 114 hours from 00:00 on Monday to 17:00 on Friday were treated as weekdays, and the remaining 54 hours, from 17:01 on Friday to 23:59 on Sunday, as weekends.

Tencent Regional Heatmap
The Tencent regional heatmap (TRH) provides a map-based snapshot of the current relative population distribution in the study area. It can be viewed in the mini-program "Yichuxing" (within the social media software WeChat), which shows the dynamic spatial distribution of app users. Compared with traditional static census data, commuter survey data, and other dynamic population data, the TRH offers strong real-time performance, high accuracy, wide coverage, and easy access [33]. Tencent apps have a large user base, and whenever an app is in use or running in the background, the user's real-time location is collected [34].
Using a Python data-collection program, we obtained the dynamic population data for the whole week of 9-15 April 2018, at a spatial sampling interval of 25 m × 25 m and a temporal interval of 1 h. Although TRH data only capture users of Tencent-related products on smartphones, these products are used extremely widely in major Chinese cities, so coverage of the study area was good [34]. Because the TRH sampling resolution is finer than the grid, and to make the data easier to interpret, we followed prior studies [33] in converting the data to population at each sampling point, and then summed the sampling points falling within each grid cell through overlay analysis to obtain the cell's population.
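Because 150 m is an exact multiple of the 25 m sampling interval, each grid cell receives a 6 × 6 block of 36 TRH sampling points. The sketch below shows the summation step only; it assumes each sample has already been converted to a population value, as the conversion of prior studies [33] is not reproduced here. Coordinates, the lattice, and the one-person-per-point values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SAMPLE = 25.0  # TRH sampling interval (m)
CELL = 150.0   # analysis grid side length (m)


def aggregate_samples(samples: Iterable[Tuple[float, float, float]],
                      x0: float = 0.0, y0: float = 0.0,
                      cell: float = CELL) -> Dict[Tuple[int, int], float]:
    """Sum per-sampling-point population estimates into grid cells.

    Each sample is (x, y, population) in projected metres; `population` is assumed to be
    the value already converted from the TRH heat value at that point.
    """
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, pop in samples:
        totals[(int((x - x0) // cell), int((y - y0) // cell))] += pop
    return dict(totals)


# Hypothetical 25 m lattice (points at square centres) over a 300 m x 150 m strip, 1 person per point
lattice = [(SAMPLE / 2 + i * SAMPLE, SAMPLE / 2 + j * SAMPLE, 1.0)
           for i in range(12) for j in range(6)]
print(aggregate_samples(lattice))  # {(0, 0): 36.0, (1, 0): 36.0}
```

Placing sample points at square centres avoids ambiguity for points lying exactly on a cell boundary; real TRH points may need an explicit tie-breaking rule.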

Negative Binomial Regression Model
As is typical in crime studies, our dependent variable, the number of burglary cases, was overdispersed (its variance exceeded its mean). Therefore, the negative binomial model was more appropriate than the Poisson model [7,35]. The exponentiated coefficient of an independent variable is called the incidence rate ratio (IRR): for every one-unit increase in the independent variable, the expected number of cases is multiplied by the IRR, holding the other variables constant.
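The two ideas in this paragraph can be checked with simple arithmetic: a variance-to-mean ratio well above 1 signals overdispersion, and under the log link an IRR equals exp(β), with effects compounding multiplicatively over several units. The sketch below uses the Python standard library; the cell counts are hypothetical, while the IRRs of 1.67 and 2.82 are those reported later for the TRH and census-household models.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by the mean; values well above 1 signal overdispersion."""
    return variance(counts) / mean(counts)


def irr(beta: float) -> float:
    """Incidence rate ratio implied by a negative binomial (log-link) coefficient."""
    return math.exp(beta)


def percent_change(irr_value: float, units: float = 1.0) -> float:
    """Percent change in expected counts for an increase of `units` in the predictor.

    Effects are multiplicative under the log link, so k units scale counts by IRR**k.
    """
    return (irr_value ** units - 1.0) * 100.0


# Hypothetical cell counts: many zeros and a few high values, as in clustered burglary data
cell_counts = [0, 0, 0, 1, 0, 2, 0, 9, 0, 3]
print(f"dispersion index: {dispersion_index(cell_counts):.2f}")     # well above 1
print(f"IRR 1.67 -> {percent_change(1.67):.0f}% per unit")          # 67%
print(f"IRR 1.67 -> {percent_change(1.67, 2):.0f}% per two units")  # 179%, not 134%
print(f"IRR 2.82 -> {percent_change(2.82):.0f}% per unit")          # 182%
```

The two-unit line illustrates why percentage effects should not be added: a 67% increase per unit compounds to about 179% over two units.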
For model comparison, Akaike's information criterion (AIC) was used to assess model fit; the smaller the value, the better the model [7]. Variance inflation factors (VIFs) were used to test for multicollinearity among the independent variables before estimation; as a rule of thumb, VIFs below 10 were considered acceptable.
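Both diagnostics have standard definitions: AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood, and VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all other predictors with an intercept. The dependency-free sketch below computes both; in practice the values would come from a statistics package, and the log-likelihood and predictor values shown are hypothetical.

```python
from typing import List, Sequence


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[r][n] / m[r][r] for r in range(n)]


def r_squared(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS regression of y on the given predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vifs(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), regressing each predictor on all the others."""
    return [1.0 / (1.0 - r_squared(col, [c for i, c in enumerate(columns) if i != j]))
            for j, col in enumerate(columns)]


print(aic(log_likelihood=-100.0, n_params=5))  # 210.0
print(vifs([[1, -1, 1, -1], [1, 1, -1, -1]]))  # [1.0, 1.0]: uncorrelated predictors
print([round(v) for v in vifs([[1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]])])  # large: collinear
```

The mean and maximum VIFs reported for each model are summaries of these per-predictor values.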

Spatial Distribution of Burglary
The burglaries in south XT JieDao showed obvious clustering (Figure 3), with especially high case volumes in hotspot A (in the central-western area) and hotspot B (in the eastern area). Compared with hotspot A, hotspot B had a weaker concentration and a wider extent. There were also a few cases in the far east and far west. Our investigation found that hotspot A was a typical urban village bounded by three main roads. Hotspot B consisted of two adjacent urban villages, lying side by side from east to west and separated by a zigzag path, with little morphological difference between them. What hotspots A and B had in common was that the land inside the urban villages was mixed and disordered, with a large number of high-density, low-cost rental buildings mixed with commercial and residential buildings and schools, whereas the surrounding areas along the main roads were well built and functionally distinct, consisting of office buildings, commercial buildings, and government agencies. Burglary cases were clearly concentrated inside the urban villages. In addition, scattered burglary cases also occurred in regions C and D. Regions C and D are bounded by main roads on their west and east sides, respectively, and each contains many well-built high-rise residential buildings.
In terms of temporal distribution, the cases were concentrated in the morning and evening (Figure 4), because most people discover a burglary when they get up in the morning or return home from work in the evening. Note that the exact time of a burglary is uncertain, because burglaries occur without the owner's awareness [36]; the "time of incident" in crime data is often a best guess.

Measurement Result Comparison
Census population, census households, mobile phone users, and TRH data were used to represent the residential population and thus capture the potential victims of burglary. The first two are static, while the latter two are dynamic data that change in real time. Because residents move between their workplaces and homes, nighttime data may be most informative for representing the residential population; in particular, residents are generally at home before the morning rush hour. Meanwhile, residents' routines are shaped by work and overtime on weekdays and by recreational activities on weekends. To assess the performance of the dynamic data at different times of night, the 15 hourly observations from 18:00 to 08:00 were analyzed separately for weekdays and weekends.
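The hourly slots can be labeled with the weekday/weekend rule given for the mobile phone data (Monday 00:00 through Friday 17:00 as weekdays) and with five 3-hour night periods. The grouping 18-20, 21-23, 00-02, 03-05, and 06-08 is inferred from the 15 hourly observations and the reported 03:00-05:00 period, and is an assumption. The week of 9-15 April 2018 (the TRH week, starting on a Monday) is used only to generate timestamps.

```python
from datetime import datetime, timedelta
from typing import Optional

# Five 3-hour periods covering the 15 hourly observations from 18:00 to 08:00 (assumed grouping)
NIGHT_PERIODS = {
    (18, 19, 20): "18:00-20:00",
    (21, 22, 23): "21:00-23:00",
    (0, 1, 2): "00:00-02:00",
    (3, 4, 5): "03:00-05:00",
    (6, 7, 8): "06:00-08:00",
}


def day_type(ts: datetime) -> str:
    """Weekday = hourly slots from Monday 00:00 through Friday 17:00; everything else is weekend."""
    if ts.weekday() < 4 or (ts.weekday() == 4 and ts.hour <= 17):
        return "weekday"
    return "weekend"


def night_period(ts: datetime) -> Optional[str]:
    """Label of the night period containing this hourly slot, or None for 09:00-17:00."""
    for hours, label in NIGHT_PERIODS.items():
        if ts.hour in hours:
            return label
    return None


start = datetime(2018, 4, 9)  # a Monday
week = [start + timedelta(hours=h) for h in range(7 * 24)]
weekday_slots = sum(day_type(ts) == "weekday" for ts in week)
weekend_predawn = [ts for ts in week
                   if day_type(ts) == "weekend" and night_period(ts) == "03:00-05:00"]
print(weekday_slots, len(week) - weekday_slots)  # 114 54
print(len(weekend_predawn))                      # 6 (Saturday and Sunday, 03:00-05:00)
```

With hourly data, the first weekend slot is Friday 18:00, which is how the "17:01 on Friday" boundary behaves in practice.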
As the VIFs show (Table 3), collinearity among the independent variables was weak in all models (all mean VIFs were below 1.2 and all maximum VIFs below 2). Models were fitted for each of the five periods with the corresponding potential-victim variable, and their AIC values are shown in Figure 5. AIC measures model fit penalized for complexity: the smaller the AIC, the better the potential-victim variable captured the observed distribution of burglaries. According to Figure 5, the TRH data performed best, followed by census households and census population, whereas mobile phone users performed poorly. Both mobile phone users and TRH data performed better in the late-night and predawn hours. However, the TRH data performed more stably throughout the night and fit much better than the cell phone data.

Influencing Factors Analysis
According to the minimum AIC principle, the best-performing time period (03:00-05:00 on weekdays or weekends) was selected for the mobile phone user and TRH data. Four negative binomial regression models, one for each of the four potential-victim measures, were then constructed to analyze the factors influencing burglary; the results are shown in Table 4.

Table 4. Results of the negative binomial regression model (Model Ⅰ: census population; Model Ⅱ: census households; Model Ⅲ: mobile phone users; Model Ⅳ: TRH).

The AIC value of model Ⅳ was the smallest and its pseudo R² was the largest, both indicating that model Ⅳ fit better than the other three models. The residuals were clustered (Figure 6), with the largest residuals in and around the urban villages. Moran's I showed that models Ⅰ, Ⅱ, and Ⅲ all had significant but low spatial autocorrelation in their residuals (Moran's I = 0.22, 0.16, and 0.09, respectively), whereas model Ⅳ had no significant spatial autocorrelation (Moran's I = 0), further indicating that the 03:00-05:00 TRH data were the best measure of potential targets. All four residential population measures played an integral role in capturing the distribution of burglary victims: their IRRs were greater than 1 and larger than those of the other independent variables, indicating a positive effect on the occurrence of burglaries. The IRR suggested that burglaries in a grid cell increased by approximately 67% (IRR = 1.67) for every one-unit increase in the weekend 03:00-05:00 TRH. Note that although the TRH data reflect the population distribution more accurately, they are relative values rather than absolute population counts; only the census data correspond to actual population numbers. According to models Ⅰ and Ⅱ, burglaries in a grid cell increased by approximately 99% (IRR = 1.99) for every additional 100 persons and by 182% (IRR = 2.82) for every additional 100 households.
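Global Moran's I, used above to check the residuals, is I = (n / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ are deviations from the mean and W is the sum of the weights. The sketch below assumes binary rook-contiguity weights on the regular grid and marks cells removed at the study-area edge as `None`; the weighting scheme actually used, the type of model residual, and the permutation test for significance are not specified here and are not shown. The two toy grids are hypothetical.

```python
from typing import Optional, Sequence


def morans_i(grid: Sequence[Sequence[Optional[float]]]) -> float:
    """Global Moran's I with binary rook-contiguity weights on a regular grid.

    `None` marks cells outside the study area; they are skipped and have no neighbours.
    """
    values = {(r, c): v for r, row in enumerate(grid) for c, v in enumerate(row) if v is not None}
    n = len(values)
    mean = sum(values.values()) / n
    z = {key: v - mean for key, v in values.items()}
    cross, w_sum = 0.0, 0
    for (r, c), zi in z.items():
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # rook neighbours
            zj = z.get((r + dr, c + dc))
            if zj is not None:
                cross += zi * zj
                w_sum += 1
    squares = sum(v * v for v in z.values())
    return (n / w_sum) * (cross / squares)


checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
halves = [[1, 1, 0, 0] for _ in range(4)]
print(morans_i(checkerboard))  # approximately -1: perfect negative autocorrelation
print(morans_i(halves))        # about 0.667: clustered values
```

Residuals clustered in the urban villages would behave like the `halves` grid (positive I), while a well-specified model should leave residuals close to I = 0.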
Theoretically, surveillance cameras represent the power of supervision and guardianship, but they showed only weak or no effects in our models. This is consistent with the literature, which reports limited evidence that CCTV reduces burglaries, as it is easy to hide one's face from a camera.
Road density was not significantly correlated with burglaries in any model. This was probably related to the reality of the study area: the roads outside the urban villages were long, straight, and simple, while the roads inside the urban villages were mostly narrow alleys, many of them impassable by car. The heterogeneous composition of road types may have contributed to this result.
Bus stops showed weak negative effects in models Ⅰ, Ⅱ, and Ⅲ. One possible explanation is that offenders arrived by bus and then moved away from the stop into high-density residential areas to find suitable targets, so bus stops were negatively related to burglaries. Including the TRH data offset the effect of bus stops, rendering it insignificant in model Ⅳ.
In terms of residential types, only residential district buildings and commercial-residential buildings showed significant positive effects, in model Ⅳ. In south XT JieDao, apartments and dormitory buildings were often mixed with commercial-residential buildings, with blocks of residential district buildings embedded among them.
Apartments and dormitories may house short-term residents who are not registered. These residents are captured by the TRH data but not by the census population. In model Ⅳ, the IRR suggested that burglaries increased by 12% (IRR = 1.12) with every one-unit increase in commercial-residential buildings, and by 19% (IRR = 1.19) with every one-unit increase in residential district buildings. This suggests that burglars preferred residential areas to commercial districts, likely because they offered more suitable and more rewarding targets. Unexpectedly, none of the residential types was significantly correlated with burglaries in the first three models. This may be because the lower accuracy of those population measures blurred the information, which further supports the superiority of TRH data in measuring the distribution of potential victims.

Discussion
As Favarin noted, analyzing specific types of crime in small areas improves our understanding of where and how crime happens in urban areas [37]. We confirmed that TRH data, with their high spatial and temporal resolution, had a significant advantage in measuring the distribution of burglary targets at the mesoscale, while census population also played an important role in quantitative measurement. Compared with prior research [38], which analyzed potential victims using only one kind of spatiotemporal sampling data, our research combined the precise population distribution (TRH data) with absolute population values (census data), which is more realistic.
Although the TRH data are relative population values, their high spatial resolution allows them to reflect the real residential population at the micro- to mesoscale. TRH data from night hours were viable measures of burglary targets. More potential targets attract more burglars [39,40]. This is consistent with rational choice theory, which holds that the number of targets and their spatial distribution strongly influence an offender's choice [16].
Most burglaries occur when residents are not at home. The logic of using ambient population to represent burglary targets may therefore seem confusing, as one could argue that an ambient population deters burglars from committing a crime. Our study helps answer this concern. We found that the ambient population during more active hours did not explain burglary well; only the TRH data from the late-night and predawn hours, when people are likely asleep, were viable surrogate measures of potential burglary targets. The derived variable for potential targets was then applied to all 24 hours of the day.
The performance of mobile phone users on weekdays was similar to that on weekends, consistent with the findings of Song et al. [7]. However, the accuracy of the mobile phone data was limited. The service area of each cell tower was estimated with Thiessen polygons [35], which were considerably larger than the grid cells. In addition to this mismatch in spatial resolution, the cell phone data were from 2016, a few years before the TRH and crime data, and this temporal mismatch may also have contributed to the poor performance. Further, the cell phone data came from only one of the three largest service providers and may not be fully representative of all cell phone users, especially in a small study area.
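The resolution mismatch can be made concrete. If a tower's user count is spread uniformly over its Thiessen polygon and then summed per grid cell (an area-weighted apportionment, assumed here because the exact aggregation is not detailed), every cell inside one large polygon receives the same density, so any within-polygon variation is lost. In the sketch below, overlap areas are approximated with a k × k lattice of sample points per cell; the tower positions and counts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def nearest_station(stations: Sequence[Tuple[float, float]], x: float, y: float) -> int:
    """Index of the closest base station, i.e. the Thiessen polygon that contains (x, y)."""
    return min(range(len(stations)),
               key=lambda i: math.hypot(stations[i][0] - x, stations[i][1] - y))


def apportion_to_grid(stations: Sequence[Tuple[float, float]], counts: Sequence[float],
                      n_cols: int, n_rows: int, cell: float = 150.0,
                      k: int = 5) -> Dict[Tuple[int, int], float]:
    """Spread each station's user count uniformly over its Thiessen polygon, then sum per cell.

    Polygon-cell overlap areas are approximated by a k x k lattice of sample points per cell.
    """
    step = cell / k
    owners: Dict[Tuple[int, int], List[int]] = {}
    points_per_station: Counter = Counter()
    for col in range(n_cols):
        for row in range(n_rows):
            ids = [nearest_station(stations, col * cell + (i + 0.5) * step,
                                   row * cell + (j + 0.5) * step)
                   for i in range(k) for j in range(k)]
            owners[(col, row)] = ids
            points_per_station.update(ids)
    # each sample point receives an equal share of its station's count
    return {key: sum(counts[s] / points_per_station[s] for s in ids)
            for key, ids in owners.items()}


# Hypothetical: two towers 325 m apart over a 3 x 1 strip of 150 m cells
grid = apportion_to_grid([(75.0, 75.0), (400.0, 75.0)], [100.0, 50.0], n_cols=3, n_rows=1)
print({key: round(v, 1) for key, v in grid.items()})  # {(0, 0): 62.5, (1, 0): 51.8, (2, 0): 35.7}
```

The totals are preserved (150 users in, 150 out), but the middle cell's value is a blend of two towers, illustrating how polygons larger than the grid cells smooth the population surface.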
The performance of TRH data on weekdays differed slightly from that on weekends [41]. However, the best-performing time period was consistently 03:00-05:00 on both weekdays and weekends. This agrees with the daily activity patterns observed by Rizwan et al. using location-based social network (LBSN) datasets: they found that activity in Shanghai, China, was weakest from 03:00 to 05:00 [42], meaning that residents were most likely asleep during this period. Therefore, the TRH data for 03:00-05:00 are a good indicator of occupied homes, and hence a viable measure of potential targets for burglars.
One major advantage of TRH is its high resolution of 25 m × 25 m. The other is its broad representation of social media users: TRH data are captured by all apps in the Tencent family, the most dominant social media platform in China. Together, these two factors help explain why the TRH outperformed the other measures.
Census population and census households were consistent in many respects. However, households appeared to be more important than population, given that the IRR was much higher for household counts than for population counts. This result is logical, as burglars target households, not individual people.
The deterrent effect of surveillance cameras was not significant. In theory, cameras may discourage street crime [43]; however, their deterrent effect on burglary has not been consistently supported in the literature, so it is not surprising that this study found no significant effect of the cameras.
Road density was not significant, whereas bus stops mainly had a negative effect. Bus stops were mainly located along main and arterial roads, and the areas immediately adjacent to these roads were fairly affluent. The inner areas farther from the main roads consisted mostly of mid- to low-rise buildings, most of them cheap rental apartments. These characteristics of urban villages greatly increase the risk of victimization there [19,44,45]. As a result, bus stops showed negative or insignificant effects, consistent with the findings of Liu et al. [15].
The rapid drop in burglary cases from 2017 to 2019 might have affected the performance of the model. This drop coincided with a rapid increase in fraud cases. The convenience and growing popularity of mobile payments, such as WeChat Pay and Alipay, might have facilitated this shift.

Conclusions
Based on rational choice theory, and taking south XT JieDao in ZG City, a megacity in South China, as the study area, we compared census population, census households, mobile phone users, and the Tencent regional heatmap (TRH) as measures of potential victims of burglary at the mesoscale of 150 m × 150 m grids. The main conclusions are as follows: (1) The ranking of performance from high to low was TRH data, census households, census population, and mobile phone users.
(2) The performance of TRH data varied over time, with minor differences between weekdays and weekends. The best time period for TRH data was 03:00-05:00 on weekends.
Our research also has limitations. The housing type was determined from POI data, which is a viable but possibly imperfect source. Because the study area was relatively small, community-based census variables such as education level and population composition were omitted. In addition, both mobile phone data and TRH data are relative measures of the actual population distribution; they may underrepresent locations with limited cell phone coverage and people who are less inclined to use cell phones and apps. Another limitation is that the exact time of burglaries tends to be unknown. Further, the census data were from 2010 and the cell phone data from 2016. All of these factors limit the application of our findings to burglary prevention.
In sum, our main innovation was introducing dynamic population data with high spatial and temporal resolution to effectively measure burglary targets at the mesoscale. While we believe the findings are generally applicable, their robustness must be tested in future studies.

Data Availability Statement: Burglary data are not publicly available due to data protection regulations. The 2010 census data are available on request from the National Bureau of Statistics of P.R. China (http://www.stats.gov.cn/). The mobile phone user data, TRH data, and basic geographic information data used in this study can be purchased from the corresponding data providers (specified in the article). Due to the confidentiality of crime data and purchasing agreements with the other data providers, we cannot share the relevant data.